Earlier this year, I wanted to create a data explorer app using Shiny for Python. You see this kind of application a lot on government data portals: there are data visualizations, components to filter the data, the raw data itself, and a way to download the final data set so you can use it locally.
Shiny did not have the type of filtering behavior I wanted, so I needed to build my own. I got the opportunity to work with Joe Cheng at Posit to learn about building software and Shiny modules. The shiny_adaptive_filter package is available as a solution to the filtering problem that you can install and try in other contexts. Skip to the end for an example and install instructions.
Here are a few tips I learned through the process to level up your Shiny code to be more maintainable and reusable.
Note
If you are new to Shiny Modules, you can read about them in the Shiny Modules learning page.
One part that always bothered me when using components to interactively filter a data set was that nothing prompted me about which filter options were valid as I was interacting with the application. I would frequently end up choosing a combination of filters that returned an empty dataframe. Clicking around and not knowing you would end up with an empty dataframe was a bit jarring: a big chunk of my interface would just disappear.
The current set of Shiny filters, and the way most Shiny apps are written, have all the filtering components working independently of each other to filter the data. This is why they can return an empty dataframe.
What I wanted were filters that were aware of what the other filters were doing, and that update their own values to adapt to the other filters. There was no way to do this in Shiny for Python out of the box, so I had to implement the feature on my own, and hopefully share it with everyone else.
As data scientists, we typically write code in some kind of pipeline, so we end up making sequential modifications to a variable. My advice: do not reuse the same variable names, especially if you are going to be changing the variable's type throughout the code. Reuse adds to the cognitive load when trying to understand the implementation, and it also makes the code harder to reason about during maintenance.
Here's an actual example of code I wrote that illustrates this point. I want to take the intersection of data frame index values to get a final index I can use to subset my data frame, representing the choices the user selected in my filter components.
The function does these steps: load the data, convert the index to a `set`, intersect it with each filter's selection, and return a `list` of the values.

```python
@reactive.calc
def filter_idx():
    df = df_tips()  # <<
    idx = set(df.index)  # <<

    if input.filter_day():
        current_idx = df.loc[df["day"].isin(input.filter_day())].index
        idx = idx.intersection(set(current_idx))  # <<

    if input.filter_time():
        current_idx = df.loc[df["time"].isin(input.filter_time())].index
        idx = idx.intersection(set(current_idx))  # <<

    return list(idx)  # list because .loc[] would return TypeError  # <<
```
The `idx` variable undergoes 3 type changes: `Index`, `set`, and `list`!
If you just plan to use this function (technically a `@reactive.calc`), you would only care that it returns something you can use to subset a data frame. That's the benefit of abstraction, but for someone who will maintain the code base, and in my case pair-program on it, it makes the implementation extremely hard to follow.
It turns out pandas `Index` objects have an `.intersection()` method, so there is no reason to convert to a `set`; we can implement everything we need with the same data type.
```python
@reactive.calc
def filter_idx():
    df = df_tips()  # <<
    idx = df.index  # <<

    if input.filter_day():
        current_idx = df.loc[df["day"].isin(input.filter_day())].index
        idx = idx.intersection(current_idx)  # <<

    if input.filter_time():
        current_idx = df.loc[df["time"].isin(input.filter_time())].index
        idx = idx.intersection(current_idx)  # <<

    return idx  # <<
```
Much better! No more taking values of one type, doing inline type conversions to make a calculation, and returning an entirely different type. This makes the implementation much easier to reason about, and if you're skimming the code to track down a bug, there are no `set()` and `list()` calls to miss.
Python type hints are a way to quickly see what data type you are working with, especially when your code base is a bit larger and you need to know the inputs and outputs of a function. You won't need to rely on duck typing and "hope for the best": your intentions are clear. The problem from the previous tip can be mitigated by writing the code better, but adding type hints helps in larger code bases where you work with many different types.
Python is a dynamically typed language: variables do not need explicit type declarations. This is where Python's "duck typing" comes from; it is what makes Python flexible as a scripting language, but it can make larger projects more difficult to understand and maintain.
Python type hints were introduced in PEP 484 and implemented in Python 3.5. They help address duck typing ambiguity with type annotations. Type hints specify the expected types for variables, function arguments, and return values. Type hints don’t enforce types at runtime (you can even put in incorrect types). They can serve as documentation, improve code readability, and enable tools to catch type-related errors.
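As a minimal sketch (the function and names here are my own illustration, not from the original app), annotations can mark a function's inputs and output, and a checker will flag misuse even though nothing is enforced at runtime:

```python
from __future__ import annotations


def mean_of(values: list[float]) -> float:
    """Average a list of numbers; the hints document inputs and output."""
    return sum(values) / len(values)


# Not enforced at runtime, but a static checker flags a call like
# mean_of("abc"), since str is not list[float].
average: float = mean_of([1.0, 2.0, 3.0])
```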
Pyright and Mypy are two of the more popular static type checkers for Python. The Shiny team uses Pyright in `strict` mode in their code base. Sometimes you need to go out of your way to make the type checker happy, but the benefit is complete type safety and warnings as you work. This makes your code much easier to maintain, easier to bring new people onto, and easier to reason about as you work on different parts of the codebase.
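For example, strict mode can be enabled project-wide in a `pyrightconfig.json` at the project root (a minimal sketch; see the Pyright configuration docs for the full set of options):

```json
{
    "typeCheckingMode": "strict"
}
```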
Here are 2 different examples of basic type hints in Python.

If you end up completely redefining the type of a variable with the same name (not exactly the same situation as in Tip 1), you will get a `reportRedeclaration` message from the type checker.
```python
from __future__ import annotations

from typing import Any

import pandas as pd

idx_int: pd.Index[Any] = pd.Index([1, 2, 3])  # reportRedeclaration here
idx_int: pd.Index[int] = pd.Index([1, 2, 3])
```
You may need the `from __future__ import annotations` line at the top of your Python file. It allows type hints to be stored as strings rather than being evaluated immediately. Without it, you may get a `TypeError`; in this specific example, you would get a `TypeError: type 'Index' is not subscriptable` message.
Sometimes things are out of your control, and you will manually need to turn off a warning. One example comes from pandas, where I used their functions to determine the `dtype` of a `Series`.
```python
from pandas.api.types import is_numeric_dtype, is_string_dtype
```
Pyright reports a `reportUnknownVariableType` error, and there is not much I can do to fix it without making a change to the main pandas library, or to pandas-stubs. Pyright 1.1.229+ supports suppressing individual diagnostics:

```python
# pyright: ignore [reportUnknownVariableType]
```
The `cast()` function is used when you know (or assume) that a variable is of a specific type, even if Python or a static type checker might not infer it directly. You do need to be careful when using `cast()`: you are intentionally making an assumption about the variable's type, and if you get it wrong, the resulting bug can be difficult to track down.
One example from the `adaptive_filter` codebase was telling Pyright the type of data stored in the dataframe index. We used a function that returns the index of a dataframe so the type checker understands the data type stored in it.
```python
from typing import Any, cast

def return_index(df: pd.DataFrame) -> "pd.Index[Any]":
    return cast("pd.Index[Any]", df.index)
```
Now, anywhere we would normally access `df.index`, we call `return_index(df)` instead.
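As a small standard-library illustration (not from the filter codebase), `cast()` is purely for the type checker and performs no conversion at runtime:

```python
import json
from typing import cast

# json.loads() is typed as returning Any
raw = json.loads('{"name": "tips", "rows": 244}')

# cast() does nothing at runtime; it only tells the checker
# the shape we expect the value to have
config = cast(dict[str, object], raw)
```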
`TypeVar` and `Generic` are tools in the Python typing system that allow you to create more flexible (i.e., "generic") type hints: you put in a placeholder for a type and delay specifying the actual type until later, while still maintaining type safety.
If you are familiar with object-oriented programming (OOP) and inheriting objects from a common base class, or abstract base class (`abc`), then `TypeVar` and `Generic` are how you add type hints to your base class.
In our adaptive filter module, we had an `abc` for a `BaseFilter` class. One of the methods in the `abc` needs to return values from a filter component, but depending on the data stored in the filter, it may return values as a `str` or an `int`.
```python
from abc import ABC
from typing import TypeVar, Generic, Iterable

T = TypeVar("T")

class BaseFilter(ABC, Generic[T]):
    ...

    def _get_input_value(self) -> Iterable[T] | None:
        ...
```
When we implement each object that inherits from `BaseFilter`, we can pass in the actual type that `T` was a placeholder for.
```python
# class that can deal with categorical variables stored as a string
class FilterCatStringSelect(BaseFilter[str]):
    ...
```
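To see the mechanics outside of Shiny, here is a hypothetical generic container (the `Box` class is my own illustration): the checker binds `T` per instance, so each instance keeps its own element type.

```python
from typing import Generic, Iterable, TypeVar

T = TypeVar("T")


class Box(Generic[T]):
    """A tiny generic container; T is bound when an instance is created."""

    def __init__(self, items: Iterable[T]) -> None:
        self._items = list(items)

    def first(self) -> T:
        return self._items[0]


ints = Box([1, 2, 3])         # inferred as Box[int]; .first() is typed int
names = Box(["day", "time"])  # inferred as Box[str]; .first() is typed str
```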
Shiny modules are used to follow the DRY (Don't Repeat Yourself) principle. The same idea behind creating functions, abstracting away and reusing computations, applies to Shiny modules. The term "module" in Python typically refers to a `.py` file containing functions that get imported into another file. While "Shiny modules" are typically `.py` files that get imported into the app, the term "Shiny module" is not synonymous with a regular "module". Shiny modules are specifically used in a Shiny for Python application to encapsulate reactive components in a namespace, avoiding clashes between component `id`s, because each component in Shiny must have a unique `id`.
Writing functions isn't the only way to reduce repeated code; `for` loops are another common way to reuse code for repeated actions. So how do you know when you need to refactor your code into Shiny modules? What "code smells" should you look out for?
If you find yourself in any of the following situations, you may want to think about refactoring your code into a Shiny module:

- Creating lists of `id` values, then iterating over them and calling a function that makes a component.
- Tracking the `id` and some other input for the component. This is usually the `id` or `label`, but can also include things like a column name of a dataframe.
- Combining multiple lists of component information with the `zip()` function.

In the adaptive filter module, the initial implementation tracked 3 things: the `id`, the column name, and the type of variable stored in the column.
```python
filters = ["filter_size", "filter_id", "filter_total_bill"]
cols = ["size", "id", "total_bill"]
col_types = ["cat", "cat", "sliders"]

for fltr, col, col_type in zip(filters, cols, col_types):
    ...
```
All 3 bits of information needed to be tracked together:

- `filters`: get the user inputs from the UI.
- `cols`: tied to the `filters` variable, and used to extract the corresponding column from the data.
- `col_types`: determine how the data needed to be filtered. For example, `selectize` components always return values as a list of strings (`List[str]`), which needed to be converted to a numeric type to filter the data.

From a maintenance and end-user perspective, knowing the column name should be enough to figure out the rest of the parts. As long as you provide a way for the end user to override any default, the code as written forces them to manually track a lot of unnecessary information for their own application.
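For instance, the `List[str]` conversion mentioned above can be sketched as a small helper (the function name is hypothetical, not the package's actual code):

```python
def selectize_to_numeric(selected: list[str]) -> list[float]:
    """Selectize inputs arrive as strings; convert them before filtering numeric columns."""
    return [float(value) for value in selected]


selectize_to_numeric(["2", "3"])  # [2.0, 3.0]
```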
The previous "code smells" are listed in the Shiny for Python modules documentation, but there are other reasons you may want to consider modules:

- Reuse component `id`s without fear of clashing with the main application.
- Abstract away `@reactive` intermediate steps.
- Simplify the main application's `server()` and `ui`.

When you create a module, you specifically create a namespace for all the components inside. Whatever `id` names and calculations you need are all in their own namespace.
Using the same code example from above, we are manually tracking 3 parts for each component.
```python
filters = ["filter_size", "filter_id", "filter_total_bill"]
cols = ["size", "id", "total_bill"]
col_types = ["cat", "cat", "sliders"]
```
If we are given just the column name from `cols`, we can automatically create the `id` by prepending the `filter_` string. But if this component were just a function, we run the risk of clashing with an existing `id` chosen by the end user; e.g., what if they already have a `filter_size` component `id` for something else? If you think the answer is to add more underscores (`_`) to the `id` name and create something like `_filter__size`, then you really need to encapsulate the function into a module.
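The namespacing a module provides can be pictured with a tiny helper (a sketch of the idea, not Shiny's internal implementation): the resolved `id` is the module's `id`, a dash, then the component's `id`, so two modules can both own a `filter_size` without clashing.

```python
def resolve_id(module_id: str, component_id: str) -> str:
    """Namespace a component id the way Shiny modules do: '<module>-<component>'."""
    return f"{module_id}-{component_id}"


resolve_id("adaptive", "filter_size")   # 'adaptive-filter_size'
resolve_id("adaptive2", "filter_size")  # 'adaptive2-filter_size', no clash
```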
By default, we don't really need the end user to provide anything: the dataframe alone is enough to get the `cols` value, from which we can generate the `filters` list, and we can write our own function that calculates the `col_types`. We talk more about user overrides in Tip 5; for now, let's assume we only have 3 columns in our entire data set. All 3 of those calculations can be done in separate `@reactive` calls. A module can abstract away all these calculations outside the main `app.py`, making the main application easier to maintain.
Finally, if we wanted to add another adaptive filter component to the application, we would need to track the information in at least 3 places:
```python
# in the server function of the application
current_ids = df.loc[df[col].isin(filter_value)].index

# in one of the helper functions
if is_string_dtype(col):
    return "cat_str"

# in the server and/or ui of the application
ui.output_ui("table_size_filter"),

@render.ui
def table_id_filter():
    return ui.input_selectize(
        "filter_id",
        "id filter:",
        sorted(df_tips()["id"].unique().tolist()),
        multiple=True,
        remove_button=True,
        options={"plugins": ["clear_button"]},
    )
```
Creating a new component, or a new component type, requires changing the code in many locations in the application, and forgetting any one of them is an easy mistake to make. As the application grows, the places where the codebase needs to be updated for a new feature drift farther apart, i.e., more lines of code sit between the needed changes. Creating a module keeps coupled code closer together, so making changes or extensions is easier.
Testing is always a good idea. When working with Shiny, you want to separate the functions that require Shiny end-to-end and behavior testing from your main logic.

Not everything needs to be in the server function, and not everything needs to be inside a reactive. You can still call regular Python functions, so when possible, write regular functions and call them from a reactive. If you have written unit tests before or used the `assert` statement, then you can still write tests for your Shiny application.
If you are able to refactor your code into individual non-reactive functions, you can leverage the larger unit-testing infrastructure Python provides, e.g., pytest. This is a great general Shiny tip: where you can, create helper functions completely outside of any `@reactive` context, and then call the function inside a `@reactive`.
Here is a helper function that was used in our adaptive filters module. It takes a list of pandas `Index` objects and finds the `.intersection()` across all of them. This calculation is used many times throughout the application. It also makes a few data checks beforehand (represented by the `...` in the code below).
```python
def index_intersection_all(
    to_intersect: List["pd.Index[Any] | None"],
    default: "pd.Index[Any]",
) -> "pd.Index[Any]":
    ...

    intersection = intersect[0]
    for index in intersect:
        intersection = intersection.intersection(index)

    return intersection
```
We can test this function like a regular Python function.
```python
import pandas as pd

idx1 = pd.Index([1, 2, 3, 4, 5])
idx2 = pd.Index([2, 3, 4, 5, 6])
idx3 = pd.Index([3, 4, 5, 6, 7])
default = pd.Index([1, 2, 3, 4, 5, 6, 7])

def test_index_intersection_all():
    to_intersect = [idx1, idx2, idx3]
    expected = pd.Index([3, 4, 5])
    calculated = index_intersection_all(
        to_intersect,
        default=default,
    )
    assert (calculated == expected).all()
    assert calculated.equals(expected)
```
You can then leverage all the benefits and tools from pytest in testing your Shiny application and/or Shiny module, including test fixtures.
Testing the reactivity and end-to-end behavior in Shiny for Python is limited to Playwright. The Shiny for Python documentation has an article on end-to-end testing for Shiny. You can run your end-to-end tests with pytest and playwright:

```shell
pip install pytest pytest-playwright
```
You are a bit limited to the capabilities of Playwright, but Shiny does have a few wrappers for Playwright that make it easier to test your application.
First, Shiny provides a `controller` object. This gives you a more convenient way of finding input or output components by the `id` that was used in the application.

```python
from shiny.playwright import controller
```
From there, you can use the Shiny Testing API docs to find the corresponding component in your application that you want to test. For example, this is the documentation for the InputSelectize playwright controller.
We then create our test app and test function. This will run the Shiny application in a single browser tab, and then go to the app URL.
```python
from shiny.run import ShinyAppProc
from playwright.sync_api import Page
from shiny.pytest import create_app_fixture

app = create_app_fixture("app.py")

def test_basic_app(page: Page, app: ShinyAppProc):
    page.goto(app.url)
    ...
```
From there, we can use the various `.set()` and `.expect_*()` methods from the controller components to modify the application and test the results.
The tests will run through pytest:

```shell
pytest .
```
You do have the option to set different browsers and to run headlessly (the default) or headed (by passing `--headed`). Changing browsers and the "headedness" can help you see whether an issue is specific to the application or to the browser.
```shell
pytest . --browser chromium --headed  # chromium, headed
pytest . --browser firefox            # firefox
pytest . --browser webkit             # webkit/safari
```
Here’s an example of a test we used in a simple adaptive filter application.
```python
from shiny.playwright import controller
from shiny.run import ShinyAppProc
from playwright.sync_api import Page
from shiny.pytest import create_app_fixture

app = create_app_fixture("app.py")

def test_basic_app(page: Page, app: ShinyAppProc):
    page.goto(app.url)

    selectize_day = controller.InputSelectize(page, "adaptive-filter_day")
    selectize_day.set("Fri")
    selectize_day.expect_selected(["Fri"])

    selectize_time = controller.InputSelectize(page, "adaptive-filter_time")
    selectize_time.expect_choices(["Dinner"])
```
Since the adaptive filters are created within a Shiny module, we have to be mindful of the `id` given for the module's namespace. The actual component `id` will be the module's `id`, followed by a dash (`-`), then whatever `id` was used for the component inside the Shiny module.
For example, if we called the module with the `adaptive` id,

```python
filter_return = adaptive_filter_module.filter_server("adaptive", df=tips)
```
the component `id` in the application for the `day` column filter would be `adaptive-filter_day`, because inside the module our filter names use a `filter_colname` format.
From there, we can test clicking on a day of the week using an adaptive filter, and check whether another filter's values have changed to match the selection.
People will only use your tools if you make them easy to use. This is a bit of an art, and every situation is going to be different.
If this is code just for yourself, and only you will maintain the codebase in the future, then the user interface does not need to be as seamless. If you are going to put the codebase into other people's hands, and you are trying to get people to adopt your tool, then you do not want to put any more hurdles in their way. You will also need to think about the skill level of the average person who may use your tool.
Shiny is a tool for data scientists, and because of data science's popularity in the last decade, many data scientists do not come from a software engineering or computer science background. Sometimes there is a tradeoff between convenience for the user, the future developer who will extend the library, yourself in the future, and object-oriented dogma. Sometimes it might be okay to sacrifice the dogma to make everything else convenient.
The filters try their best to use simple heuristics to automatically return the correct filter type based on the contents of each column.
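Such a heuristic might look something like this sketch (illustrative only; the function name, return labels, and thresholds are my own, not the package's actual rules):

```python
def guess_filter_type(values: list) -> str:
    """Guess a filter type from a column's values (hypothetical heuristic)."""
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        # many distinct numeric values suggest a range slider; few suggest checkboxes
        return "slider" if len(set(values)) > 10 else "numeric_checkbox"
    return "string_select"


guess_filter_type(["Lunch", "Dinner"])  # 'string_select'
guess_filter_type([2, 3, 3, 2, 4])      # 'numeric_checkbox'
```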
We ended up writing the code so the user can provide customizations in one of 3 ways:

- pass `None` for a column's filter
- pass a custom component type to change which filter gets used
- pass a `label` parameter to a custom component type to rename the label

Here's an example of how the user can customize the components:
```python
override = {
    "total_bill": None,
    "day": "DAY!",
    "time": shiny_adaptive_filter.FilterCatStringSelect(),
    "size": shiny_adaptive_filter.FilterCatNumericCheckbox(label="Party Size"),
}
```
All of the filters are documented in the module for the app author to look up, and the interface is easy to use: the dictionary keys are the columns of the data set, and the values are any manual overrides the developer wants in their application.
```python
# in the ui
shiny_adaptive_filter.filter_ui("adaptive")

...

# in the server
adaptive_filters = shiny_adaptive_filter.filter_server(
    "adaptive", df=data, override=override
)

# a reactive value that can be used anywhere else in the app
adaptive_filters_idx = adaptive_filters["filter_idx"]
```
The implementation we used in adaptive filters borrows an old C trick: a `finish_init()` method that runs after the developer passes the inputs to the constructor.
```python
class BaseFilter(ABC, Generic[T]):
    def __init__(self, *, label: str | None = None):
        ...

    def finish_init(
        self,
        data: Callable[[], pd.DataFrame] | pd.DataFrame,
        id: str,
        column_name: str,
        *,
        session: Session | None = None,
    ):
        ...
        return self
```
This is so the user only needs to pass in the type of filter they want to override, or the component label that will be displayed in the application. The decision balances user convenience against developer convenience, but sacrifices one of the dogmas of object-oriented programming: a valid object gets created, yet it cannot be used until another method gets called. Any time the filter constructor gets called with the user's inputs, we must call the `.finish_init()` method to get a usable filter component object.
```python
shiny_adaptive_filter.FilterCatStringCheckbox(label=label)\
    .finish_init(df, id, col_str, session=session)
```
This tradeoff was made to avoid having the user pass in a partial (aka currying) or a lambda. Contrast the original `override` dictionary with either of the ones below, and you can see how sacrificing the object-oriented dogma may be worth it to make the tool easier for users.
```python
from functools import partial

override = {
    # using a partial
    "time": partial(shiny_adaptive_filter.FilterCatStringCheckbox, label="Time of Day"),
    # using a lambda
    "size": lambda data, id, colname, session: shiny_adaptive_filter.FilterCatNumericCheckbox(
        data, id, colname, label="Time of Day", session=session
    ),
}
```
I like to remind my students that just because the code runs without error doesn't mean it's correct, and just because it's correct doesn't mean it can't be improved.
I wanted a Shiny app to have filter behaviors that did not exist, and I set off creating a custom implementation I could share with others. This led me to create custom filtering component behaviors, refactor them into Shiny modules, and create a Python package to share with others. Along the way I got help from Joe Cheng, CTO at Posit, PBC and one of the main authors of Shiny, who taught me how to take my original proof-of-concept code and make it respectable from a software engineering point of view. Can the codebase be improved? Absolutely. But I hope the tips in this post help level up your software engineering skills as a data scientist.
Here's a minimal example of the adaptive filters at work. Set the `day` checkbox to `Fri` and see how the other filters "adapt" to the results. If you want to give it a try yourself, you can install the filters from PyPI:
```shell
pip install shiny_adaptive_filter
```
#| standalone: true
#| components: [editor, viewer]
#| viewerHeight: 700
import pandas as pd
from shiny import App, render, reactive, ui
import shiny_adaptive_filter as af
app_ui = ui.page_sidebar(
ui.sidebar(
af.filter_ui("adaptive"),
),
ui.output_data_frame("render_df"),
)
def server(input, output, session):
@reactive.calc
def data_filtered():
df = tips.loc[filter_idx()]
return df
@render.data_frame
def render_df():
return render.DataGrid(data_filtered())
override = {
"total_bill": None,
"tip": None,
}
filter_return = af.filter_server(
"adaptive",
df=tips,
override=override,
)
filter_idx = filter_return["filter_idx"]
data = {
'total_bill': [16.99, 10.34, 21.01, 23.68, 24.59],
'tip': [1.01, 1.66, 3.50, 3.31, 3.61],
'sex': ['Female', 'Male', 'Male', 'Male', 'Female'],
'smoker': ['No', 'No', 'No', 'No', 'Yes'],
'day': ['Sun', 'Sun', 'Sun', 'Fri', 'Sun'],
'time': ['Lunch', 'Dinner', 'Dinner', 'Dinner', 'Dinner'],
'size': [2, 3, 3, 2, 4]
}
tips = pd.DataFrame(data)
app = App(app_ui, server)
## file: requirements.txt
shiny_adaptive_filter