The gt package (the one that helps you make beautiful, publication-quality tables in R) has been updated to version 0.10.0. Is this a big release? It sure is. But it’s not as big as 0.9.0, which needed several blog posts just to cover everything that was new. This one post will cover, as best it can, all of the big new features in gt 0.10.0. Let’s get started by looking at nanoplots.
Nanoplots, tiny interactive plots in your gt table
Plots in a table need to be somewhat simple by design. Generally, there isn’t a lot of space to work with! We had these basic design requirements when we started developing the feature known as nanoplots:
compact display of marks and labels
basic interactivity
different plot types
customizability
Through some iteration we arrived at something that satisfies all of the design criteria. The new cols_nanoplot() function is the entry point to generating nanoplots in a gt table. Let’s introduce you to it by way of an example (using the new illness dataset). The cols_nanoplot() function can take input data from any number of columns you specify in the columns argument. In the example below, the columns that start with ‘day’ (seven columns in total) each have a single numeric value. The values are taken in the order of column specification in columns and are used in that order in every nanoplot (we’re making the default "line" plot type here). A new column will be generated for the nanoplot, and we’re giving it a specific name ("nanoplots") and also a label (with md("*Progression*"); yes, we can use Markdown here).
library(gt)

illness |>
  dplyr::slice_head(n = 10) |>
  gt(rowname_col = "test") |>
  tab_header("Partial summary of daily tests performed on YF patient") |>
  tab_stubhead(label = md("**Test**")) |>
  cols_hide(columns = c(starts_with("norm"), units)) |>
  cols_nanoplot(
    columns = starts_with("day"),
    new_col_name = "nanoplots",
    new_col_label = md("*Progression*")
  ) |>
  cols_align(align = "center", columns = nanoplots) |>
  tab_footnote(
    footnote = "Measurements from Day 3 through to Day 8.",
    locations = cells_column_labels(columns = nanoplots)
  )
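[Table output: "Partial summary of daily tests performed on YF patient". Stub column "Test" with rows Viral load, WBC, Neutrophils, RBC, Hb, PLT, ALT, AST, TBIL, and DBIL; a "Progression" column holding one line nanoplot per row. Footnote on "Progression": Measurements from Day 3 through to Day 8.]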
If you hover over the data points in a nanoplot, you’ll see values for the data points. They are automatically formatted to be compact (limited space here!), and we also have advanced options to help you control the formatting (customizability!). Also, hovering over the left edge of the nanoplot plot area will show the value range.
That was a pretty simple example that barely scratches the surface of what you can do with nanoplots. Right now, there are three types of nanoplots available: "line", "bar", and "boxplot". Here’s an example of the "bar" type of nanoplot, which uses the sza dataset to visualize solar altitude angles.
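A minimal sketch of a "bar" nanoplot along these lines, not necessarily the exact code behind the table. The latitude filter, the derived saa column, the bar colors, and the fmt_degrees() helper are all assumptions made for illustration.

```r
library(gt)

# Hover labels as whole degrees; `fmt_degrees` is a helper name invented here
fmt_degrees <- function(x) paste0(round(x), "\u00b0")

sza_bars <-
  sza |>
  # Keep one latitude and the morning half of the day (assumed filters)
  dplyr::filter(latitude == 20, tst <= "1200", !is.na(sza)) |>
  # Solar altitude is the complement of the solar zenith angle
  dplyr::mutate(saa = 90 - sza) |>
  dplyr::group_by(month) |>
  # Comma-separated strings are valid nanoplot input: one string per month
  dplyr::summarize(saa = paste(saa, collapse = ",")) |>
  gt(rowname_col = "month") |>
  tab_header(title = "Solar Altitude Angles") |>
  cols_nanoplot(
    columns = saa,
    plot_type = "bar",
    new_col_name = "nanoplots",
    new_col_label = "Solar altitude",
    options = nanoplot_options(
      data_bar_stroke_color = "GoldenRod",
      data_bar_fill_color = "DarkOrange",
      y_val_fmt_fn = fmt_degrees
    )
  )

sza_bars
```

The function given to y_val_fmt_fn receives the numeric values and returns the text shown on hover, so any vectorized formatter will do.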
[Table output: bar nanoplots of solar altitude angles, one row per month (jan through dec). Subtitle: "Average values every half hour from 05:30 to 12:00".]
This example demonstrates the use of the nanoplot_options() helper function, which is to be invoked at the options argument of cols_nanoplot(). Through that helper, layers of the nanoplots can be selectively removed, the aesthetics of the remaining plot components can be modified, and display values can even be customized. We were able to modify the display of the bar values (on hover) with the y_val_fmt_fn argument; we just had to supply a function to perform that numeric formatting.
Let’s have one more example with nanoplots, this time involving box plots. For that, we use plot_type = "boxplot". We’ll take a slice of the pizzaplace dataset and create a simple table that displays a box plot of pizza sales for a selection of days. If you can get string-based data of the form "2.6,3.6,0,1.5" in a column, that’s valid input for a nanoplot. This is easy to do with dplyr::summarize(), and the preparatory work in the following example does just that.
The other trick was to convert the string-based 24-hour-clock time values (e.g., "11:38:36") to the number of seconds elapsed in a day. Doing so gives us continuous values that can be incorporated into each box plot. And, by supplying a function to the y_val_fmt_fn argument within nanoplot_options(), we can transform the integer seconds values back to clock times for display on hover.
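Here is a rough sketch of how those two steps could fit together. The helper names clock_to_seconds() and seconds_to_clock(), the one-week slice, and the column label are assumptions for illustration rather than the code that produced the original table.

```r
library(gt)

# "HH:MM:SS" -> seconds elapsed since midnight (hypothetical helper)
clock_to_seconds <- function(x) {
  vapply(
    strsplit(x, ":", fixed = TRUE),
    function(p) sum(as.numeric(p) * c(3600, 60, 1)),
    numeric(1)
  )
}

# Seconds since midnight -> "HH:MM:SS", used for hover labels (hypothetical helper)
seconds_to_clock <- function(x) {
  x <- as.integer(round(x))
  sprintf("%02d:%02d:%02d", x %/% 3600L, (x %% 3600L) %/% 60L, x %% 60L)
}

pizza_box <-
  pizzaplace |>
  # First week of the year only (assumed slice)
  dplyr::filter(date <= "2015-01-07") |>
  dplyr::mutate(secs = clock_to_seconds(time)) |>
  dplyr::group_by(date) |>
  # One comma-separated string of sale times per day
  dplyr::summarize(
    sold = dplyr::n(),
    secs = paste(secs, collapse = ",")
  ) |>
  gt(rowname_col = "date") |>
  cols_nanoplot(
    columns = secs,
    plot_type = "boxplot",
    new_col_name = "nanoplots",
    new_col_label = "Time of sale",
    options = nanoplot_options(y_val_fmt_fn = seconds_to_clock)
  )

pizza_box
```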
These examples only show part of what’s possible with the feature. We intend to go much further with nanoplots in future releases. If you’d like to see a few more examples, take a look at the docs for cols_nanoplot().
Add columns/rows to your table, even start from an empty table
The nanoplots examples showed us something new in gt: making new columns. This wasn’t possible before but is very possible now. We can add new columns to a table with the cols_add() function, and it works quite a bit like dplyr’s mutate() function. You supply name-value pairs where the name is the new column name, and the value part describes the data that will go into the column. The latter can: (1) be a vector whose length equals the number of rows in the data table, (2) be a single value (which will be repeated all the way down), or (3) involve other columns in the table (as they represent vectors of the correct length).
The new columns are added to the end of the column series by default but can instead be added internally by using either the .before or .after arguments. If entirely empty (i.e., all NA) columns need to be added, you can use any of the NA types (e.g., NA, NA_character_, NA_real_, etc.) for such columns.
Let’s look at a simple example using a subset of the exibble dataset. We’ll add a single column to the right of all the existing columns and call it country. This new column needs eight values, and these will be supplied when using cols_add().
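A minimal sketch of that single-column case. The particular column subset and the eight country codes are placeholders chosen for illustration.

```r
library(gt)

exibble_country <-
  exibble |>
  dplyr::select(num, char, datetime, currency, group) |>
  gt() |>
  cols_add(
    # One value per row; exibble has eight rows
    country = c("TL", "PY", "GL", "PA", "MO", "EE", "CO", "AU")
  )

exibble_country
```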
We can add multiple columns with a single use of cols_add(). The columns generated can be formatted and otherwise manipulated just as any column could be in a gt table. The following example extends the first one by adding more columns and immediately using them in various function calls like fmt_flag() and fmt_scientific().
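As a sketch of what that might look like (the added column names and values are assumptions, not the original listing), three columns are added at once here: a vector of country codes, an all-NA column, and a column computed from an existing one.

```r
library(gt)

exibble_more <-
  exibble |>
  dplyr::select(num, char, currency) |>
  gt() |>
  cols_add(
    country = c("TL", "PY", "GL", "PA", "MO", "EE", "CO", "AU"),
    empty = NA_character_,  # a single NA is repeated down the column
    big_num = num^3         # refers to the existing `num` column
  ) |>
  # The new columns behave like any others in later calls
  fmt_flag(columns = country) |>
  sub_missing(columns = empty, missing_text = "") |>
  fmt_scientific(columns = big_num)

exibble_more
```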
It is possible to start with an empty table (i.e., no columns and no rows) and add one or more columns to that. The first cols_add() call for an empty table can have columns of arbitrary length, but note that subsequent uses of cols_add() must adhere to the rule of new columns being the same length as existing. Here, we start from nothing and then add two columns of values:
dplyr::tibble() |>
  gt() |>
  cols_add(
    numbers = 1:5,
    spelled = vec_fmt_spelled_num(1:5)
  ) |>
  tab_header("Starting from Scratch.")
Starting from Scratch.

numbers  spelled
1        one
2        two
3        three
4        four
5        five
Rows can be added too, and we do that with the new rows_add() function. We supply the new row data through name-value pairs or two-sided formula expressions. The new rows are added to the bottom of the table by default but can be added internally by using either the .before or .after arguments. Let’s have an example of this:
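A small sketch using name-value pairs. The column subset and the values for the new row are made up for illustration.

```r
library(gt)

exibble_rows <-
  exibble |>
  dplyr::select(row, num, char, currency) |>
  gt(rowname_col = "row") |>
  rows_add(
    # Each name is an existing column; the values form the new row
    row = "row_9",
    num = 9.999,
    char = "ube",
    currency = 4.99
  )

exibble_rows
```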
This adds a single row, but you can use vectors having multiple values to add multiple rows with a single use of the function.
Another way to use rows_add() is to start from virtually nothing (really, just the definition of columns) and build up a table using sporadic invocations of rows_add(). This might be useful in interactive or programmatic applications. Here’s an example where two columns are defined with dplyr’s tibble() function (and no rows are present initially); with two calls of rows_add(), two separate rows are added:
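Something like the following sketch, where the column names and row values are hypothetical:

```r
library(gt)

event_log <-
  # Two typed columns, zero rows
  dplyr::tibble(step = integer(0), event = character(0)) |>
  gt() |>
  rows_add(step = 1L, event = "start") |>
  rows_add(step = 2L, event = "completed")

event_log
```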
Adding columns and rows while in the gt API is actually pretty convenient. While the examples here are limited in showing everything that’s possible, you can find a few more in the docs for cols_add() and rows_add().
Units notation provides a simple way to express measurement units
Something you might see often in tables are measurement units. These are typically found in the column labels of a table, and they let the reader know what units the values below have (this is DRY for display tables). Previously, you could provide simple units, but it wasn’t easy to formulate those that involved more specialized typesetting. We now have a better solution for this in gt with what we call units notation. With this syntax, gt will ensure that any measurement units are formatted correctly no matter what the output type is. We can now format units in the table body with fmt_units(), we can attach units to column labels with cols_units(), and we can use units notation in the already-available cols_label() and tab_spanner() functions.
The units notation involves a shorthand of writing units that feels familiar and is fine-tuned for the task at hand. Each unit is treated as a separate entity (parentheses and other symbols included), and the addition of subscript text and exponents is flexible and relatively easy to formulate. Here are some examples:
"m/s" and "m / s" both render as "m/s"
"m s^-1" will appear with a raised "-1" exponent (i.e., superscripted)
"m /s" gives the same result, as "/<unit>" is equivalent to "<unit>^-1"
"E_h" will render an "E" with the "h" subscript
"t_i^2.5" provides a t with an "i" subscript and a "2.5" exponent
"m[_0^2]" will use overstriking to set both the subscript and superscript in the same area
"g/L %C6H12O6%" uses a chemical formula (enclosed in a pair of "%" characters) as a unit partial, and the formula will render correctly with subscripted numbers
Common units that are difficult to write using ASCII text may be implicitly converted to the correct characters (e.g., the "u" in "ug", "um", "uL", and "umol" will be converted to the Greek mu symbol; "degC" and "degF" will render as °C and °F).
We can transform shorthand symbol/unit names enclosed in ":" (e.g., ":angstrom:", ":ohm:", etc.) into proper symbols
Greek letters can be added by enclosing the letter name in ":"; you can use lowercase letters (e.g., ":beta:", ":sigma:", etc.) and uppercase letters too (e.g., ":Alpha:", ":Zeta:", etc.)
The components of a unit (unit name, subscript, and exponent) can be fully or partially italicized/emboldened by surrounding text with "*" or "**"
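One quick way to try out the notation listed above is to put a few strings in a column and render a copy of them with fmt_units(). This is only a sketch; the two column names are arbitrary.

```r
library(gt)

units_gallery <-
  dplyr::tibble(
    notation = c(
      "m/s", "m s^-1", "E_h", "t_i^2.5", "m[_0^2]",
      "g/L %C6H12O6%", "ug", "degC", ":angstrom:", ":beta:"
    ),
    # Same strings again; this copy gets rendered as units
    rendered = notation
  ) |>
  gt() |>
  fmt_units(columns = rendered)

units_gallery
```

The notation column stays as plain text, so each input string sits right next to its rendered form.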
The new cols_units() function lets you attach units to column labels, setting off the measurement units from the column label with a comma and a space (and this can be customized with .units_pattern). Here’s an example of that with a table generated from a summarized version of the pizzaplace dataset.
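A sketch of cols_units() on a pizzaplace summary. The summary columns, labels, unit strings, and the parenthesized pattern are assumptions for illustration.

```r
library(gt)

pizza_units <-
  pizzaplace |>
  dplyr::group_by(type) |>
  dplyr::summarize(
    sold = dplyr::n(),
    mean_price = mean(price)
  ) |>
  gt(rowname_col = "type") |>
  fmt_number(columns = mean_price, decimals = 2) |>
  cols_label(sold = "Sold", mean_price = "Mean price") |>
  cols_units(
    sold = "pizzas",
    mean_price = "USD pizza^-1",
    # "{1}" is the column label and "{2}" the units; the default is "{1}, {2}"
    .units_pattern = "{1} ({2})"
  )

pizza_units
```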
If you should have a column that contains text values already in units notation, that column could be formatted and subsequently rendered nicely by using the new fmt_units() function. It so happens that the illness dataset has a column (units) with values in the correct format. We’ll point fmt_units() toward that column, and that’ll make the rendered measurement units fit for publication.
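In sketch form (the choice of rows and day columns here is arbitrary):

```r
library(gt)

illness_units <-
  illness |>
  dplyr::slice_head(n = 10) |>
  dplyr::select(test, units, day_3, day_4) |>
  gt(rowname_col = "test") |>
  # The `units` column already holds text in units notation
  fmt_units(columns = units)

illness_units
```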
You can use units notation in cols_label(); this approach lets us express both the label text and the measurement units in a single string. To mark text as units notation, we wrap it in "{{" and "}}". Here’s an example of that using a portion of the towny dataset.
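A minimal sketch with towny. The selected columns, the five-row slice, and the label wording are choices made for illustration.

```r
library(gt)

towny_labels <-
  towny |>
  dplyr::select(name, land_area_km2, density_2021) |>
  dplyr::slice_max(density_2021, n = 5) |>
  gt(rowname_col = "name") |>
  fmt_number(columns = c(land_area_km2, density_2021), decimals = 1) |>
  cols_label(
    # Text inside {{ }} is treated as units notation
    land_area_km2 = "Land area, {{km^2}}",
    density_2021 = "Density, {{*persons* / km^2}}"
  )

towny_labels
```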
This can similarly be done with tab_spanner(). Simply use a string that has both label text and text in units notation in the label argument. Here is a towny-based example that shows how it’s done:
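A sketch of the spanner variant, again with an arbitrary selection of towny columns and rows:

```r
library(gt)

towny_spanner <-
  towny |>
  dplyr::select(name, starts_with("density")) |>
  dplyr::slice_head(n = 5) |>
  gt(rowname_col = "name") |>
  fmt_integer(columns = starts_with("density")) |>
  tab_spanner(
    # Label text and units notation share one string
    label = "Population density, {{*persons* / km^2}}",
    columns = starts_with("density")
  )

towny_spanner
```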
The notation here provides several conveniences for defining units, and it gives us nicely formatted units no matter what the table output format might be (i.e., HTML, LaTeX, RTF, etc.). Look for the “How to use gt’s units notation” section in the documentation for functions that handle it (here is one instance of that in the cols_units() docs).
The from_column() helper function lets you get formatting parameters from adjacent columns
A very useful new helper function, from_column(), has been added so you can fetch values (for compatible arguments) from a column in the input table. For example, if you are using fmt_scientific(), and the number of significant figures should vary across the values to be formatted, a column containing those values for the n_sigfig argument can be referenced by from_column().
The new constants dataset contains data values that are either very small or very large, so scientific formatting is a strong requirement here. The dataset values also greatly differ in the degree of measurement precision. Two separate columns (sf_value and sf_uncert) account for this and contain the exact number of significant figures for each measurement value and the associated uncertainty value. We can use the n_sigfig argument of fmt_scientific() in conjunction with the from_column() helper to get the correct number of significant digits for each value.
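A sketch of that pattern. The filter on Planck-related constants with a non-missing uncertainty is an assumption made to keep the table short and both sf_* columns populated.

```r
library(gt)

planck_tbl <-
  constants |>
  # Assumed subset: Planck-related constants that have a stated uncertainty
  dplyr::filter(grepl("Planck", name), !is.na(uncert)) |>
  gt() |>
  fmt_scientific(
    columns = value,
    n_sigfig = from_column(column = "sf_value")   # per-row significant figures
  ) |>
  fmt_scientific(
    columns = uncert,
    n_sigfig = from_column(column = "sf_uncert")
  ) |>
  # The helper columns are only needed for formatting
  cols_hide(columns = starts_with("sf")) |>
  fmt_units(columns = units)

planck_tbl
```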
A single static value for n_sigfig in fmt_scientific() simply wouldn’t work here: it would present some values with more or less precision than was actually measured, which is misleading.
We can use from_column() in tab_style(). Well, inside the stylizing helper functions like cell_text() that are used in tab_style(). Here’s a really nice sp500-based example that shows this in conjunction with cols_add():
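A small sketch of the idea, assuming the color argument of cell_text() accepts from_column(). The date range, the color column, and the two color names are made up for illustration.

```r
library(gt)

sp500_styled <-
  sp500 |>
  dplyr::filter(date >= "2015-01-05" & date <= "2015-01-16") |>
  dplyr::arrange(date) |>
  dplyr::select(date, open, close) |>
  gt(rowname_col = "date") |>
  fmt_currency(columns = c(open, close)) |>
  cols_add(
    # One color name per row, based on whether the day closed up or down
    color = ifelse(close > open, "green", "red")
  ) |>
  tab_style(
    # Each cell takes its text color from the same row of `color`
    style = cell_text(color = from_column(column = "color")),
    locations = cells_body(columns = close)
  ) |>
  cols_hide(columns = color)

sp500_styled
```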
Most of the formatting functions (fmt_*()) work with from_column(). To find out which arguments can be used with from_column(), look for the Compatibility of arguments with the from_column() helper function section in the formatting function’s documentation (here is one instance of that in the fmt_scientific() docs).
In closing
There’s so much great new stuff in gt, and we’ll keep working to make things better and easier for you. We are always listening to what you want, and we have a few ways you can reach us. Found something strange in gt? Have a cool idea? Then file an issue! Want to ask a question or discuss improvements before filing an issue? Try out the Discussions page in the gt repository for that.
For news on gt and other table packages (like Great Tables), follow the engaging @gt_package account on X/Twitter! We also have a Discord server which has a more casual atmosphere (and there’s plenty of table talk on there); we’d love to see you there!