Building a reporting infrastructure with Quarto

Today’s post will guide you through creating an efficient reporting infrastructure with Quarto.
2023-01-27
On the left, text that says Python made easy with Posit. On the right, a minimalistic cartoon of a snake.

In a previous blog post, we discussed why Quarto was the ideal tool for report automation. Today’s post will guide you through creating an efficient reporting infrastructure based on Quarto. The end goal is a reporting infrastructure that is automated, comprehensive, and easy to maintain, reducing the time spent creating reports and answering questions about published reports.

 

Here are some tips for creating an efficient reporting infrastructure:

 

Maintain a good file structure

 

The ideal file structure for an organization’s network drive folders is highly subjective. No matter how you organize files, it is best practice to organize things consistently, allowing folders and their contents to be referencable in code and human-readable. Here are a few tips for a sound file structure:

  • Contain everything related to a single process or report in the same directory
  • Name files well. This guide has excellent advice for creating names that are sortable and readable by humans and code.
  • Use a template sub-folder structure for all directories.
  • Store different categories of content separately. Raw data, production code, validation files, output, etc., should all have their own folders within a directory.
    • Never modify raw data files. Instead, consider storing raw data in a parquet file, which functions similarly to a database and enables the storing of historical data as well as querying.
    • Always store SQL code in .sql files and never in scripts. Read the .sql file into R or Python for execution in automated processes.
    • If using the same functions across multiple processes, create a package for your code. Document and version control the package – perhaps with Posit Package Manager.

 

Use file templates

 

Build your reference document in Word or PowerPoint and read it into Quarto as a reference file. While Quarto can be used to build the report from start to finish, it is likely quicker to create a reference file with all the custom formatting and images needed. From there, content can be added to the appropriate slides and rendered by Quarto.

 

To create a file template, run the following command line code for PowerPoint and Word:

 

PowerPoint:

quarto pandoc -o template.pptx --print-default-data-file reference.pptx

 

Word:

quarto pandoc -o custom-reference-doc.docx \ --print-default-data-file reference.docx

 

Once we create the template file, we can move it to the active Quarto file directory (if not already present) and modify as needed. After modification, add the following to the Quarto file’s YAML header:

 

PowerPoint:

title: "Template Preview"
format:
  pptx:
    reference-doc: template.pptx

 

Word:

format:
  docx:
    reference-doc: custom-reference-doc.docx

 

Use the Quarto file to add content to the relevant slides or pages and output a polished document based on the template’s design. This method allows users to natively build their presentations and reports while using Quarto to populate. Time is saved, and report maintenance is easy in the future while still enabling automation.

 

Parameterize reporting

 

It is often necessary to report a defined set of metrics for different groups, teams, or business units in a professional setting. I’ve often seen multiple reports maintained and generated per grouping in situations like these. Consider a situation where we produce a set of metrics for ten business units. Each business unit is a separate report, meaning we need to generate ten PowerPoints. An organization may have a process where they pull data and paste it into a spreadsheet that contains a tab per business unit. Each tab has a table containing the metrics. If a change to the report is needed, it must be made in all spreadsheet tabs and PowerPoints.

 

In an efficient reporting infrastructure, we should maintain a single file capable of being iterated to create multiple reports. Ten business units would require only one Quarto file. We only need to make changes one time. This is possible thanks to Quarto parameters which enable different report variations to be created from a single Quarto file. Reports can be produced dynamically, whether multiple or a single report is needed.

 

To get started, we make a code chunk with the special comment #| tags: [parameters] and the initial parameter(s) with default values – this should precede all other code chunks. In the example below, business_unit is the variable we will iterate over in our reporting and any associated code.

 

business_unit = 'BU A'

 

Once we define our parameters, we should create our report like any Quarto document: define the layout, add the content, include code to generate tables and charts dynamically, etc. We reference the parameter in places where things specific to the business unit we create, like so:

 

import pandas as pd

# Use parameter to pull data for active business unit
query = 'select * from table where business_units = {}'.format(business_unit)
df = pd.read_sql_query(query)

# Generate metrics for active business unit
metric_one = df['Col A'].div('Col D')
print(metric_one)

 

After creating the Quarto file that generates our final report, we can now add the parameters that will be iterated to make the additional reports. There are a few ways to accomplish this, but my suggested method is to create a Bash/shell script.

 

The code sets the working directory to whatever folder the script is saved in and renders the defined Quarto template with each of the defined parameters in the params variable. Each rendered report is saved after the parameter used to generate it. The script below can be copied into a blank text file, altered, and saved. Windows and Linux users may need to change #! /bin/sh to #! /bin/bash.

 

#! /bin/sh
cd "$(dirname "$0")"

params=("BU A" "BU B" "BU C")

for n in "${params[@]}";
do
    quarto render /File Path/file.qmd --output "$n.pdf" -P parameter_name:"$n" 
done

 

With relative ease, we have eliminated the need to maintain multiple reports for the same metrics. We only need to maintain one file in the future. This method can be applied to any report that needs different variations created. One thing to note is --execute may need to be called within the script if we need to execute the code in the Quarto file.

 

Use variables for repeating items

 

Reports often have repeating information like version numbers or emails. Instead of typing the same information multiple times, we can use a dynamic variable. The benefit of dynamic variables is we only need to enter data once, and it can be repeatedly referenced, making maintenance effortless and reducing the likelihood of an error.

 

To use dynamic variables in Quarto, create a file in the working directory titled _quarto.yml. This file can be left blank. Next, add a second YAML file containing your variables titled _variables.yml. The YAML file should look like this:

 

version: 1.0
email:
  info: info@example.com
  support: support@example.com

 

After saving the YAML file, you can reference the variables in Quarto using short code notation like the one below:

 

Version {{< var version >}} is a minor upgrade.
Please contact us at {{< var email.info >}}.
For support contact {{< var email.support >}}

 

Notice how var precedes the variable name. If there is more than one variable, we can reference it by variable_name.sub_variable. For example, if our variable is email and we have a sub-variable called info, then we can reference the info variable by typing email.info.

 

Output to different formats effectively

 

Reports often need to be output to different file formats. You may need a single report to be rendered in both PowerPoint and PDF for separate audiences. Traditionally, situations like this would lead many to maintain multiple processes for the same report. With Quarto, we can maintain a single file and output to many different formats instead.

 

Conditional content

 

Preparing our Quarto document is the first step in outputting to different file formats. There may be some content we want to show or hide depending on the file format rendered. We can do this using the .content-visible and .content-hidden divs in Quarto.

 

Say we wanted only to show an interactive chart. If we render to HTML, we could include the following code chunk in our Quarto file:

 

::: {.content-visible when-format="html"}

Chart will only appear in HTML.

:::

 

The previous code chunk will only show the content if HTML is rendered. But say we wanted to show content for all file formats except for one, then we could add the following to our Quarto file:

 

::: {.content-visible unless-format="pdf"}

Chart will not appear in HTML.

:::

 

Similar to .content-visible seen in the previous code chunks, .content-hidden can also conditionally hide content based on the rendered file format. These divs enable content to be dynamically included as specified by the user, aiding output to different file formats.

 

Efficiently rendering different file formats

 

Within Quarto, the YAML header can be altered to change the output file format. As an example, we can go from an HTML file:

 

---
title: "Guide to Efficient Stakeholder Reporting"
format: html
---

 

to a pdf file by changing format: html to format: pdf.

 

---
title: "Guide to Efficient Stakeholder Reporting"
format: pdf
---

 

As easy as that was, it could have been more efficient. We don’t want to run and alter things manually; we want things to run automatically. We can efficiently automate rendering to different file formats like parameterized reports with Bash/shell scripting. Instead of maintaining multiple files or processes, you can now maintain one. The example script below may need to be modified if the file name needs to be changed.

 

#! /bin/sh
cd "$(dirname "$0")"

params=("html" "pdf")

for n in "${params[@]}";
do
    quarto render /File Path/file.qmd --to "$n" 
done

 

Make reports interactive

 

Reports are excellent at conveying static information to stakeholders but need to excel at answering follow-up questions. Embedding interactive components or adding an accompanying web application to your reporting can reduce follow-up questions and increase stakeholder engagement, resulting in less work and greater efficiency. Allowing stakeholders to interact with a graph to see specific dates, values, etc., can provide the data with much-needed context. Aside from potentially answering questions that may arise, it can lead to better questions that provide better insights and enhance your work’s value.

 

Embedding interactive components

 

Plotly Express, Leaflet, and Three.js can be used in Quarto to create interactive charts that help to contextualize data by providing additional detail over their static counterparts. An HTML file containing either interactive charts or an interactive version of the report can accompany the main report. While it is possible to embed interactive charts within some file formats like PowerPoint, that is outside today’s blog scope.

 

import plotly.express as px
import plotly.io as pio
pio.renderers.default = "plotly_mimetype+notebook_connected" # For use in VS Code

df = px.data.iris()
fig = px.scatter(df, x="sepal_width", y="sepal_length", 
                 color="species", 
                 marginal_y="violin", marginal_x="box", 
                 trendline="ols", template="simple_white")
fig.show()

 

Static image of an interactive chart that shows data from the iris dataset as a scatterplot with a regression line through each variable, box plots, and violin plots. The species are setosa, versicolor, and virginica and the variables are sepal width on the x axis and sepal length on the y axis.

 

Accompanying web applications

 

We can supplement reports with interactive web applications that provide greater context, metric drill-downs, in-depth analysis, and raw datasets. A well-built web application can explain why something happened instead of a static report, which usually tells what happened. Web apps also make data more accessible and engaging for technical and non-technical users by exposing relationships and trends within the data. Building a polished web app capable of responding to user input without being refreshed (reactivity) is easy with Shiny for Python. All Shiny for Python web apps can be deployed for stakeholder consumption using shinyapps.io or Posit Connect; the latter also supports other Python web app frameworks like Dash and Streamlit.

 

Here is a simplified example of how easy it is to create a reactive web app using Shiny for Python. The code block below provides a basic app with a chart and a table whose values change as the dropdown changes. An application like this can be used to dive deeper into an interesting y-o-y change providing insight into what caused the change and answering a potential question a stakeholder would have while viewing the report. You can find fully fledged (and rather impressive) Shiny for Python examples in the Shiny gallery – seriously, take a look. Someone built Wordle using Shiny for Python, and it is fantastic. You’ll also see a Shiny for Python web app running entirely in a web browser via Shinylive.

 

Shiny for Python Code
            from shiny import *
            import pandas as pd
            import numpy as np
            from plotnine import *

            df = pd.read_csv('sales.csv')
            df['Month'] = pd.to_datetime(df.Date).dt.to_period('m')

            app_ui = ui.page_fluid(
                ui.tags.style("#container {display: flex; flex-direction: column; align-items: center;}"),

                ui.tags.div(
                    ui.h2("Simple Demo"),
                    ui.row(ui.input_select(id="bu", label="Choose Business Units:",choices=['A', 'B', 'C'])),
                    ui.row(ui.column(5, ui.output_plot(id='rating', height='400px', width='600px'))),
                    ui.output_table('tbl1'),
                id="container")
            )

            def server(input: Inputs, output: Outputs, session: Session):
                @reactive.Calc
                def data():
                    dfn = df[df['Branch'] == input.bu()]
                    return dfn
                
                @output
                @render.plot
                def rating():
                    agg = data().groupby(['City', 'Gender', 'Payment']).agg({'Unit price':np.mean, 'Quantity':'sum', 'Total':'sum', 'Rating':np.mean }).reset_index()

                    return (ggplot(agg, aes(x='Gender', y= 'Unit price', fill = 'Payment')) 
                        + geom_col()
                        + scale_fill_manual(["#447099", "#EE6331", "#419599", "#72994E", "#9A4665"])
                        + theme_bw()
                        + labs(title = 'Unit Price by Gender and Payment Type', x='')
                        + theme(legend_title=element_blank(), legend_position='top')
                    )

                @output
                @render.table
                def tbl1():
                    t1 = data().groupby(['Month', 'Gender','Customer type', 'Payment']).agg({'Total':'sum', 'Quantity':'sum', 'Unit price':np.mean, 'Rating':np.mean,}).reset_index()

                    return t1 (
                                t1.style.set_table_attributes('class="dataframe shiny-table table w-auto"')
                                .hide(axis="index")
                                .format({"Total":"{0:0.1f}", "Unit price": "{0:0.1f}","Quantity": "{0:0.1f}", "Rating":"{0:0.1f}"})
                                .set_table_styles([dict(selector="th", props=[("text-align", "right")])])
                            )

            app = App(app_ui, server)

 

Build data validation into your process

 

Data validation and audit documentation can be built into your Quarto file using conditional content and file format rendering where it makes sense. If necessary, use a separate Quarto file for reconciliation and audit documentation executed following your reporting. Bundling reporting and reconciliation into a single and automated step can dramatically reduce the work required for audit. This is a great way to ensure documentation is not skipped, and materials are always current.

 

Use Quarto Projects

 

For more sizeable reporting needs where multiple Quarto documents are needed, it may make sense to create a Quarto project, which enables a few things:

  • Render all or some files in a directory with a single command (e.g. quarto render myproject)
  • Share a single YAML configuration across multiple documents – Redirect output artifacts to another directory
  • Freeze rendered output unless documents have changed

A project can be created by running the following in the terminal:

 

quarto create-project myproject

 

Once the project has been created, the _quarto.yml file found in the project directory can be edited. Other YAML files can be added to control several behaviors for the Quarto files within the directory. One of the best features of a Quarto project is the ability to render all documents within the project. If a project directory has ten Quarto files that need to be rendered, a terminal call can be used:

 

quarto render myproject --to pdf

 

Projects are a great way to minimize the management of larger reporting projects. Everything is sourced from a single directory, all options are stored in a single file, and a single call can render all files. This is peak efficiency!

 

Become knowledgeable in Quarto functionality

 

Quarto has many features to be aware of. Becoming knowledgeable about what can be done in Quarto ensures future reporting is always done efficiently and with ease. Here are a few useful features you may not have known about:

Spending time exploring the Quarto website is a worthwhile exercise to see everything Quarto is capable of and how to integrate it into your reporting infrastructure.

 

Publish and schedule reports using Posit Connect

 

Posit Connect makes automating code-based tasks and distributing insights incredibly easy. While a variety of data products can be published to Posit Connect, including web apps, APIs, reports, and models, it is an excellent tool for report automation.

 

Static (locally rendered output) and dynamic (updated with new data automatically) Quarto files can be published to Posit Connect in various ways: the command line, Git-backed content, and GitHub Actions. Once published, Quarto files can be scheduled for refresh and distribution to stakeholders through either email or the Posit Connect hub.

 

For example, build a monthly report, use Posit Connect to refresh it each month, and send it to an email distribution list without user involvement. You can’t get more efficient than that!

 

To learn more about Posit Connect and how it can improve your reporting infrastructure, schedule a demo.

 

Tags: Python