Earlier this year, I wanted to create a data explorer app using Shiny for Python. You see this kind of application a lot on government data portals: there are data visualizations, components to filter the data, the raw data itself, and a way to download the final data set so you can use it locally.
Shiny did not have the type of filtering behavior I wanted, so I needed to build my own. I got the opportunity to work with Joe Cheng at Posit to learn about building software and Shiny modules. The shiny_adaptive_filter package is available as a solution to the filtering problem that you can install and try in other contexts. Skip to the end for an example and install instructions.
Here are a few tips I learned through the process to level up your Shiny code to be more maintainable and reusable.
Note
If you are new to Shiny Modules, you can read about them in the Shiny Modules learning page.
One part that always bothered me when using components to interactively filter a data set was that nothing prompted me about which filter options were valid as I was interacting with the application. I would frequently end up choosing a combination of filters that returned an empty dataframe. Clicking around and not knowing you would end up with an empty dataframe was a bit jarring: a big chunk of my interface would just disappear.
The current set of Shiny filters, and the way most Shiny apps are written, have all the filtering components working independently of each other to filter the data. This is why they can return an empty dataframe.
What I wanted were filters that were aware of what the other filters were doing, and that update their own values to adapt to the other filters. There was no way to do this in Shiny for Python out of the box, so I had to implement the feature on my own, and hopefully share it with everyone else.
As data scientists, we typically write code in some kind of pipeline, so we end up making sequential modifications to a variable. My advice: do not reuse the same variable names, especially if you are going to be changing the variable's type throughout the code. Reuse adds to the cognitive load when trying to understand the implementation, and it also makes the code harder to reason about during maintenance.
Here's an actual example of code I wrote that illustrates this point. I want to take the intersection of data frame index values to get a final index I can use to subset my data frame, representing the choices the user selected in my filter components.
The function does these steps: load the data, convert the index to a `set`, intersect it with each filter's selection, and return a `list` of the values.

```python
@reactive.calc
def filter_idx():
    df = df_tips()  # <<
    idx = set(df.index)  # <<

    if input.filter_day():
        current_idx = df.loc[df["day"].isin(input.filter_day())].index
        idx = idx.intersection(set(current_idx))  # <<

    if input.filter_time():
        current_idx = df.loc[df["time"].isin(input.filter_time())].index
        idx = idx.intersection(set(current_idx))  # <<

    return list(idx)  # list because .loc[] would return TypeError  # <<
```
The `idx` variable undergoes 3 type changes: `Index`, `set`, and `list`!
If you just plan to use this function (technically a `@reactive.calc`), you would only care that it returns something you can use to subset a data frame. That's the benefit of abstraction, but for someone who will maintain the code base, and in my case pair-program on it, it makes the implementation extremely hard to follow.
It turns out pandas `Index` objects have an `.intersection()` method, so there is no reason to convert to a `set`; we can implement everything we need with the same data type.
```python
@reactive.calc
def filter_idx():
    df = df_tips()  # <<
    idx = df.index  # <<

    if input.filter_day():
        current_idx = df.loc[df["day"].isin(input.filter_day())].index
        idx = idx.intersection(current_idx)  # <<

    if input.filter_time():
        current_idx = df.loc[df["time"].isin(input.filter_time())].index
        idx = idx.intersection(current_idx)  # <<

    return idx  # <<
```
Much better! No more taking values of one type, doing inline type conversions to make a calculation, and returning an entirely different type. This makes the implementation much easier to reason about, and if you're skimming the code to track down a bug, there are no `set()` and `list()` calls to miss.
Python type hints are a way to quickly see what data type you are working with, especially when your code base is a bit larger and you need to know the inputs and outputs of a function. You won't need to rely on duck typing and "hope for the best": your intentions are clear. The problem from the previous tip can be mitigated by writing the code better, but adding type hints helps in larger code bases where you work with many different types.
Python is a dynamically typed language: variables do not need explicit type declarations. This is where Python's "duck typing" comes from; it is what makes Python flexible as a scripting language, but it can make larger projects more difficult to understand and maintain.
Python type hints were introduced in PEP 484 and implemented in Python 3.5. They help address duck typing ambiguity with type annotations. Type hints specify the expected types for variables, function arguments, and return values. Type hints don’t enforce types at runtime (you can even put in incorrect types). They can serve as documentation, improve code readability, and enable tools to catch type-related errors.
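As a minimal sketch (the function and names here are my own illustration, not from the original app), annotations can mark a function's inputs and output, and a checker will flag misuse even though nothing is enforced at runtime:

```python
from __future__ import annotations


def mean_of(values: list[float]) -> float:
    """Average a list of numbers; the hints document inputs and output."""
    return sum(values) / len(values)


# Not enforced at runtime, but a static checker flags a call like
# mean_of("abc"), since str is not list[float].
average: float = mean_of([1.0, 2.0, 3.0])
```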
Pyright and Mypy are two of the more popular static type checkers for Python. The Shiny team uses Pyright in `strict` mode in their code base. Sometimes you need to go out of your way to make the type checker happy, but the benefit is complete type safety and warnings as you work. This makes your code much easier to maintain, easier to bring new people onto, and easier to reason about as you work on different parts of the codebase.
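For example, strict mode can be enabled project-wide in a `pyrightconfig.json` at the project root (a minimal sketch; see the Pyright configuration docs for the full set of options):

```json
{
    "typeCheckingMode": "strict"
}
```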
Here are 2 different examples of basic type hints in Python.

If you end up completely redefining the type of a variable with the same name (not exactly the same situation as in Tip 1), you will get a `reportRedeclaration` message from the type checker.
```python
from __future__ import annotations

from typing import Any

import pandas as pd

idx_int: pd.Index[Any] = pd.Index([1, 2, 3])  # reportRedeclaration here
idx_int: pd.Index[int] = pd.Index([1, 2, 3])
```
You may need the `from __future__ import annotations` line at the top of your Python file. It allows type hints to be stored as strings rather than being evaluated immediately. Without it, you may get a `TypeError`; in this specific example, you would get a `TypeError: type 'Index' is not subscriptable` message.
Sometimes things are out of your control, and you will manually need to turn off a warning. One example comes from pandas, where I used their functions to determine the `dtype` of a `Series`.
```python
from pandas.api.types import is_numeric_dtype, is_string_dtype
```
Pyright reports a `reportUnknownVariableType` error, and there is not much I can do to fix it without making a change to the main pandas library, or to pandas-stubs. Pyright 1.1.229+ supports suppressing individual diagnostics:

```python
# pyright: ignore [reportUnknownVariableType]
```
The `cast()` function is used when you know (or assume) that a variable is of a specific type, even if Python or a static type checker might not infer it directly. You do need to be careful when using `cast()`: you are intentionally making an assumption about the variable's type, and if you get it wrong, the resulting bug can be difficult to track down.
One example from the `adaptive_filter` codebase was telling Pyright the type of data stored in the dataframe index. We used a function that returns the index of a dataframe so the type checker understands the data type stored in it.
```python
from typing import Any, cast

def return_index(df: pd.DataFrame) -> "pd.Index[Any]":
    return cast("pd.Index[Any]", df.index)
```
Now, anywhere we would normally access `df.index`, we call `return_index(df)` instead.
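As a small standard-library illustration (not from the filter codebase), `cast()` is purely for the type checker and performs no conversion at runtime:

```python
import json
from typing import cast

# json.loads() is typed as returning Any
raw = json.loads('{"name": "tips", "rows": 244}')

# cast() does nothing at runtime; it only tells the checker
# the shape we expect the value to have
config = cast(dict[str, object], raw)
```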
`TypeVar` and `Generic` are tools in the Python typing system that allow you to create more flexible (i.e., "generic") type hints: you put in a placeholder for a type and delay specifying the actual type until later, while still maintaining type safety.
If you are familiar with object-oriented programming (OOP) and inheriting objects from a common base class, or abstract base class (`abc`), then `TypeVar` and `Generic` are how you add type hints to your base class.
In our adaptive filter module, we had an `abc` for a `BaseFilter` class. One of the methods in the `abc` needs to return values from a filter component, but depending on the data stored in the filter, it may return values as a `str` or an `int`.
```python
from abc import ABC
from typing import TypeVar, Generic, Iterable

T = TypeVar("T")

class BaseFilter(ABC, Generic[T]):
    ...

    def _get_input_value(self) -> Iterable[T] | None:
        ...
```
When we implement each object that inherits from `BaseFilter`, we can pass in the actual type that `T` was a placeholder for.
```python
# class that can deal with categorical variables stored as a string
class FilterCatStringSelect(BaseFilter[str]):
    ...
```
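To see the mechanics outside of Shiny, here is a hypothetical generic container (the `Box` class is my own illustration): the checker binds `T` per instance, so each instance keeps its own element type.

```python
from typing import Generic, Iterable, TypeVar

T = TypeVar("T")


class Box(Generic[T]):
    """A tiny generic container; T is bound when an instance is created."""

    def __init__(self, items: Iterable[T]) -> None:
        self._items = list(items)

    def first(self) -> T:
        return self._items[0]


ints = Box([1, 2, 3])         # inferred as Box[int]; .first() is typed int
names = Box(["day", "time"])  # inferred as Box[str]; .first() is typed str
```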
Shiny modules are used to follow the DRY (Don't Repeat Yourself) principle. The same idea behind creating functions, abstracting away and reusing computations, applies to Shiny modules. The term "module" in Python typically refers to a `.py` file containing functions that get imported into another file. While "Shiny modules" are typically `.py` files that get imported into the app, the term "Shiny module" is not synonymous with a regular "module". Shiny modules are specifically used in a Shiny for Python application to encapsulate reactive components in a namespace, avoiding clashes between component `id`s, because each component in Shiny must have a unique `id`.
Writing functions isn't the only way to reduce repeated code; `for` loops are another common way to reuse code for repeated actions. So how do you know when you need to refactor your code into Shiny modules? What "code smells" should you look out for?
If you find yourself in any of the following situations, you may want to think about refactoring your code into a Shiny module:

- Creating lists of `id` values, then iterating over them and calling a function that makes a component.
- Tracking the `id` and some other input for the component. This is usually the `id` or `label`, but can also include things like a column name of a dataframe.
- Combining multiple lists of component information with the `zip()` function.

In the adaptive filter module, the initial implementation tracked 3 things: the `id`, the column name, and the type of variable stored in the column.
```python
filters = ["filter_size", "filter_id", "filter_total_bill"]
cols = ["size", "id", "total_bill"]
col_types = ["cat", "cat", "sliders"]

for fltr, col, col_type in zip(filters, cols, col_types):
    ...
```
All 3 bits of information needed to be tracked together:

- `filters`: get the user inputs from the UI.
- `cols`: tied to the `filters` variable, and used to extract the corresponding column from the data.
- `col_types`: determine how the data needed to be filtered. For example, `selectize` components always return values as a list of strings (`List[str]`), which needed to be converted to a numeric type to filter the data.

From a maintenance and end-user perspective, knowing the column name should be enough to figure out the rest of the parts. As long as you provide a way for the end user to override any default, the code as written forces them to manually track a lot of unnecessary information for their own application.
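For instance, the `List[str]` conversion mentioned above can be sketched as a small helper (the function name is hypothetical, not the package's actual code):

```python
def selectize_to_numeric(selected: list[str]) -> list[float]:
    """Selectize inputs arrive as strings; convert them before filtering numeric columns."""
    return [float(value) for value in selected]


selectize_to_numeric(["2", "3"])  # [2.0, 3.0]
```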
The previous "code smells" are listed in the Shiny for Python modules documentation, but there are other reasons you may want to consider modules:

- Reuse component `id`s without fear of clashing with the main application.
- Abstract away `@reactive` intermediate steps.
- Simplify the main application's `server()` and `ui`.

When you create a module, you specifically create a namespace for all the components inside. Whatever `id` names and calculations you need are all in their own namespace.
Using the same code example from above, we are manually tracking 3 parts for each component.
```python
filters = ["filter_size", "filter_id", "filter_total_bill"]
cols = ["size", "id", "total_bill"]
col_types = ["cat", "cat", "sliders"]
```
If we are given just the column name from `cols`, we can automatically create the `id` by prepending the `filter_` string. But if this component were just a function, we run the risk of clashing with an existing `id` chosen by the end user; e.g., what if they already have a `filter_size` component `id` for something else? If you think the answer is to add more underscores (`_`) to the `id` name and create something like `_filter__size`, then you really need to encapsulate the function into a module.
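The namespacing a module provides can be pictured with a tiny helper (a sketch of the idea, not Shiny's internal implementation): the resolved `id` is the module's `id`, a dash, then the component's `id`, so two modules can both own a `filter_size` without clashing.

```python
def resolve_id(module_id: str, component_id: str) -> str:
    """Namespace a component id the way Shiny modules do: '<module>-<component>'."""
    return f"{module_id}-{component_id}"


resolve_id("adaptive", "filter_size")   # 'adaptive-filter_size'
resolve_id("adaptive2", "filter_size")  # 'adaptive2-filter_size', no clash
```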
By default, we don't really need the end user to provide anything: the dataframe alone is enough to get the `cols` value, from which we can generate the `filters` list, and we can write our own function that calculates the `col_types`. We talk more about user overrides in Tip 5; for now, let's assume we only have 3 columns in our entire data set. All 3 of those calculations can be done in separate `@reactive` calls. A module can abstract away all these calculations outside the main `app.py`, making the main application easier to maintain.
Finally, if we wanted to add another adaptive filter component to the application, we would need to track the information in at least 3 places:
```python
# in the server function of the application
current_ids = df.loc[df[col].isin(filter_value)].index

# in one of the helper functions
if is_string_dtype(col):
    return "cat_str"

# in the server and/or ui of the application
ui.output_ui("table_size_filter"),

@render.ui
def table_id_filter():
    return ui.input_selectize(
        "filter_id",
        "id filter:",
        sorted(df_tips()["id"].unique().tolist()),
        multiple=True,
        remove_button=True,
        options={"plugins": ["clear_button"]},
    )
```
Creating a new component, or a new component type, requires changing the code in many locations in the application, and forgetting any one of them is an easy mistake to make. As the application grows, the places where the codebase needs to be updated for a new feature drift farther apart, i.e., more lines of code sit between the needed changes. Creating a module keeps coupled code closer together, so making changes or extensions is easier.
Testing is always a good idea. When working with Shiny, you want to separate the functions that require Shiny end-to-end and behavior testing from your main logic.

Not everything needs to be in the server function, and not everything needs to be inside a reactive. You can still call regular Python functions, so when possible, write regular functions and call them from a reactive. If you have written unit tests before or used the `assert` statement, then you can still write tests for your Shiny application.
If you are able to refactor your code into individual non-reactive functions, you can leverage the larger unit-testing infrastructure Python provides, e.g., pytest. This is a great general Shiny tip: where you can, create helper functions completely outside of any `@reactive` context, and then call the function inside a `@reactive`.
Here is a helper function that was used in our adaptive filters module. It takes a list of pandas `Index` objects and finds the `.intersection()` across all of them. This calculation is used many times throughout the application. It also makes a few data checks beforehand (represented by the `...` in the code below).
```python
def index_intersection_all(
    to_intersect: List["pd.Index[Any] | None"],
    default: "pd.Index[Any]",
) -> "pd.Index[Any]":
    ...

    intersection = intersect[0]
    for index in intersect:
        intersection = intersection.intersection(index)

    return intersection
```
We can test this function like a regular Python function.
```python
import pandas as pd

idx1 = pd.Index([1, 2, 3, 4, 5])
idx2 = pd.Index([2, 3, 4, 5, 6])
idx3 = pd.Index([3, 4, 5, 6, 7])
default = pd.Index([1, 2, 3, 4, 5, 6, 7])

def test_index_intersection_all():
    to_intersect = [idx1, idx2, idx3]
    expected = pd.Index([3, 4, 5])
    calculated = index_intersection_all(
        to_intersect,
        default=default,
    )
    assert (calculated == expected).all()
    assert calculated.equals(expected)
```
You can then leverage all the benefits and tools from pytest in testing your Shiny application and/or Shiny module, including test fixtures.
Testing the reactivity and end-to-end behavior in Shiny for Python is limited to Playwright. The Shiny for Python documentation has an article on end-to-end testing for Shiny. You can run your end-to-end tests with pytest and playwright:

```shell
pip install pytest pytest-playwright
```
You are a bit limited to the capabilities of Playwright, but Shiny does have a few wrappers for Playwright that make it easier to test your application.
First, Shiny provides a `controller` object. This gives you a more convenient way of finding input or output components by the `id` that was used in the application.

```python
from shiny.playwright import controller
```
From there, you can use the Shiny Testing API docs to find the corresponding component in your application that you want to test. For example, this is the documentation for the InputSelectize playwright controller.
We then create our test app and test function. This will run the Shiny application in a single browser tab, and then go to the app URL.
```python
from shiny.run import ShinyAppProc
from playwright.sync_api import Page
from shiny.pytest import create_app_fixture

app = create_app_fixture("app.py")

def test_basic_app(page: Page, app: ShinyAppProc):
    page.goto(app.url)
    ...
```
From there, we can use the various `.set()` and `.expect_*()` methods from the controller components to modify the application and test the results.
The tests will run through pytest:

```shell
pytest .
```
You do have the option to set different browsers and to run headlessly (the default) or headed (by passing `--headed`). Changing browsers and the "headedness" can help you see whether an issue is specific to the application or to the browser.
```shell
pytest . --browser chromium --headed  # chromium, headed
pytest . --browser firefox            # firefox
pytest . --browser webkit             # webkit/safari
```
Here’s an example of a test we used in a simple adaptive filter application.
```python
from shiny.playwright import controller
from shiny.run import ShinyAppProc
from playwright.sync_api import Page
from shiny.pytest import create_app_fixture

app = create_app_fixture("app.py")

def test_basic_app(page: Page, app: ShinyAppProc):
    page.goto(app.url)

    selectize_day = controller.InputSelectize(page, "adaptive-filter_day")
    selectize_day.set("Fri")
    selectize_day.expect_selected(["Fri"])

    selectize_time = controller.InputSelectize(page, "adaptive-filter_time")
    selectize_time.expect_choices(["Dinner"])
```
Since the adaptive filters are created within a Shiny module, we have to be mindful of the `id` given for the module's namespace. The actual component `id` will be the module's `id`, followed by a dash (`-`), then whatever `id` was used for the component inside the Shiny module.
For example, if we called the module with the `adaptive` id,

```python
filter_return = adaptive_filter_module.filter_server("adaptive", df=tips)
```
the component `id` in the application for the `day` column filter would be `adaptive-filter_day`, because inside the module our filter names use a `filter_colname` format.
From there, we can test clicking on a day of the week using an adaptive filter, and check whether another filter's values have changed to match the selection.
People will only use your tools if you make them easy to use. This is a bit of an art, and every situation is going to be different.
If this is code just for yourself, and only you will maintain the codebase in the future, then the user interface does not need to be as seamless. If you are going to put the codebase into other people's hands, and you are trying to get people to adopt your tool, then you do not want to put any more hurdles in their way. You will also need to think about the skill level of the average person who may use your tool.
Shiny is a tool for data scientists, and because of data science's popularity in the last decade, many data scientists do not come from a software engineering or computer science background. Sometimes there is a tradeoff between convenience for the user, the future developer who will extend the library, yourself in the future, and object-oriented dogma. Sometimes it might be okay to sacrifice the dogma to make everything else convenient.
The filters try their best to use simple heuristics to automatically return the correct filter type based on the contents of each column.
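Such a heuristic might look something like this sketch (illustrative only; the function name, return labels, and thresholds are my own, not the package's actual rules):

```python
def guess_filter_type(values: list) -> str:
    """Guess a filter type from a column's values (hypothetical heuristic)."""
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        # many distinct numeric values suggest a range slider; few suggest checkboxes
        return "slider" if len(set(values)) > 10 else "numeric_checkbox"
    return "string_select"


guess_filter_type(["Lunch", "Dinner"])  # 'string_select'
guess_filter_type([2, 3, 3, 2, 4])      # 'numeric_checkbox'
```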
We ended up writing the code so the user can provide customizations in one of 3 ways:

- pass `None` for a column's filter
- pass a custom component type to change which filter gets used
- pass a `label` parameter to a custom component type to rename the label

Here's an example of how the user can customize the components:
```python
override = {
    "total_bill": None,
    "day": "DAY!",
    "time": shiny_adaptive_filter.FilterCatStringSelect(),
    "size": shiny_adaptive_filter.FilterCatNumericCheckbox(label="Party Size"),
}
```
All of the filters are documented in the module for the app author to look up, and the interface is easy to use: the dictionary keys are the columns of the data set, and the values are any manual overrides the developer wants in their application.
```python
# in the ui
shiny_adaptive_filter.filter_ui("adaptive")

...

# in the server
adaptive_filters = shiny_adaptive_filter.filter_server(
    "adaptive", df=data, override=override
)

# a reactive value that can be used anywhere else in the app
adaptive_filters_idx = adaptive_filters["filter_idx"]
```
The implementation we used in adaptive filters borrows an old C trick: a `finish_init()` method that runs after the developer passes the inputs to the constructor.
```python
class BaseFilter(ABC, Generic[T]):
    def __init__(self, *, label: str | None = None):
        ...

    def finish_init(
        self,
        data: Callable[[], pd.DataFrame] | pd.DataFrame,
        id: str,
        column_name: str,
        *,
        session: Session | None = None,
    ):
        ...
        return self
```
This is so the user only needs to pass in the type of filter they want to override, or the component label that will be displayed in the application. The decision balances user convenience against developer convenience, but sacrifices one of the dogmas of object-oriented programming: a valid object gets created, yet it cannot be used until another method gets called. Any time the filter constructor gets called with the user's inputs, we must call the `.finish_init()` method to get a usable filter component object.
```python
shiny_adaptive_filter.FilterCatStringCheckbox(label=label)\
    .finish_init(df, id, col_str, session=session)
```
This tradeoff was made to avoid having the user pass in a partial (aka currying) or a lambda. Contrast the original `override` dictionary with either of the ones below, and you can see how sacrificing the object-oriented dogma may be worth it to make the tool easier for users.
```python
from functools import partial

override = {
    # using a partial
    "time": partial(shiny_adaptive_filter.FilterCatStringCheckbox, label="Time of Day"),
    # using a lambda
    "size": lambda data, id, colname, session: shiny_adaptive_filter.FilterCatNumericCheckbox(
        data, id, colname, label="Time of Day", session=session
    ),
}
```
I like to remind my students that just because the code runs without error doesn't mean it's correct, and just because it's correct doesn't mean it can't be improved.
I wanted a Shiny app to have filter behaviors that did not exist, and I set off creating a custom implementation I could share with others. This led me to create custom filtering component behaviors, refactor them into Shiny modules, and create a Python package to share with others. Along the way I got help from Joe Cheng, CTO at Posit, PBC and one of the main authors of Shiny, who taught me how to take my original proof-of-concept code and make it respectable from a software engineering point of view. Can the codebase be improved? Absolutely. But I hope the tips in this post help level up your software engineering skills as a data scientist.
Here's a minimal example of the adaptive filters at work. Set the `day` checkbox to `Fri` and see how the other filters "adapt" to the results. If you want to give it a try yourself, you can install the filters from PyPI:
```shell
pip install shiny_adaptive_filter
```
#| standalone: true
#| components: [editor, viewer]
#| viewerHeight: 700
import pandas as pd
from shiny import App, render, reactive, ui
import shiny_adaptive_filter as af
app_ui = ui.page_sidebar(
ui.sidebar(
af.filter_ui("adaptive"),
),
ui.output_data_frame("render_df"),
)
def server(input, output, session):
@reactive.calc
def data_filtered():
df = tips.loc[filter_idx()]
return df
@render.data_frame
def render_df():
return render.DataGrid(data_filtered())
override = {
"total_bill": None,
"tip": None,
}
filter_return = af.filter_server(
"adaptive",
df=tips,
override=override,
)
filter_idx = filter_return["filter_idx"]
data = {
'total_bill': [16.99, 10.34, 21.01, 23.68, 24.59],
'tip': [1.01, 1.66, 3.50, 3.31, 3.61],
'sex': ['Female', 'Male', 'Male', 'Male', 'Female'],
'smoker': ['No', 'No', 'No', 'No', 'Yes'],
'day': ['Sun', 'Sun', 'Sun', 'Fri', 'Sun'],
'time': ['Lunch', 'Dinner', 'Dinner', 'Dinner', 'Dinner'],
'size': [2, 3, 3, 2, 4]
}
tips = pd.DataFrame(data)
app = App(app_ui, server)
## file: requirements.txt
shiny_adaptive_filter