Color coding your data in {gt} 0.9.0
There are many improvements in the new 0.9.0 release of gt! In fact, there is so much that is new that we couldn’t fit it all in a single blog post. This blog post (number three in a larger series on gt 0.9.0) focuses on the improvements to data_color(), a function that lets you perform data cell colorization.
A basic example of how to use data_color()
Let’s introduce the data_color() function with a simple example. For the sake of simplicity, let’s use gt’s exibble dataset for this:
```r
exibble |>
  gt() |>
  data_color()
```

| num | char | fctr | date | time | datetime | currency | row | group |
|---|---|---|---|---|---|---|---|---|
| 1.111e-01 | apricot | one | 2015-01-15 | 13:35 | 2018-01-01 02:22 | 49.950 | row_1 | grp_a |
| 2.222e+00 | banana | two | 2015-02-15 | 14:40 | 2018-02-02 14:33 | 17.950 | row_2 | grp_a |
| 3.333e+01 | coconut | three | 2015-03-15 | 15:45 | 2018-03-03 03:44 | 1.390 | row_3 | grp_a |
| 4.444e+02 | durian | four | 2015-04-15 | 16:50 | 2018-04-04 15:55 | 65100.000 | row_4 | grp_a |
| 5.550e+03 | NA | five | 2015-05-15 | 17:55 | 2018-05-05 04:00 | 1325.810 | row_5 | grp_b |
| NA | fig | six | 2015-06-15 | NA | 2018-06-06 16:11 | 13.255 | row_6 | grp_b |
| 7.770e+05 | grapefruit | seven | NA | 19:10 | 2018-07-07 05:22 | NA | row_7 | grp_b |
| 8.880e+06 | honeydew | eight | 2015-08-15 | 20:20 | NA | 0.440 | row_8 | grp_b |
What’s happened is that data_color() applies background colors to all cells of every column using R’s default palette (accessed internally through the grDevices::palette() function). The new method argument controls how colors are computed, and its default is "auto". With method = "auto", gt decides on a column-by-column basis which colorization method to use: for numeric values, the method will be "numeric"; for character or factor values, the "factor" method is chosen. (We’ll get more into the various color computation methods a bit later in the post.)
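To make the "auto" rule concrete, here is a minimal sketch of the per-column decision described above. The pick_method() helper and the small data frame are hypothetical illustrations, not gt’s internal code, and the sketch only covers the numeric and character/factor cases mentioned in this post.

```r
# Hypothetical sketch of the per-column choice made by method = "auto",
# based only on the rule described in the text. NOT gt's internal code.
pick_method <- function(x) {
  if (is.numeric(x)) {
    "numeric"
  } else if (is.character(x) || is.factor(x)) {
    "factor"
  } else {
    NA_character_  # other column types are not covered by this sketch
  }
}

# A small stand-in for a mixed-type table such as exibble
df <- data.frame(
  num  = c(0.1111, 2.222, 33.33),
  char = c("apricot", "banana", "coconut"),
  fctr = factor(c("one", "two", "three")),
  stringsAsFactors = FALSE
)

# One method per column, named by column
chosen <- vapply(df, pick_method, character(1))
print(chosen)
```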
An interesting thing about data_color() in gt 0.9.0 is that it works without having to supply any argument values! Previously, you needed to provide something for columns and a color-mapping function to colors. This made the function very difficult to use without first looking at a working example. We think that the new interface that prioritizes choosing a method will be better for most users (and you can still use a color-mapping function with the new fn argument).
Choosing a palette
Virtually nobody will want to rely on the default palette, so let’s take a look at some of the color-specification possibilities available in the new palette argument. It can take any of the following types of inputs:
- a vector of color names
- the name of an RColorBrewer palette
- the name of a viridis palette (e.g., "viridis", "magma", etc.)
- a discrete palette accessible from the paletteer package using the <package>::<palette> syntax
Let’s try each of these with four separate calls of data_color() on a simple table:
```r
dplyr::tibble(red_green = 1:10, brewer = 1:10, viridis = 1:10, zissou = 1:10) |>
  gt() |>
  data_color(columns = red_green, palette = c("red", "green")) |>
  data_color(columns = brewer, palette = "Oranges") |>
  data_color(columns = viridis, palette = "viridis") |>
  data_color(columns = zissou, palette = "wesanderson::Zissou1") |>
  cols_width(everything() ~ px(100))
```

| red_green | brewer | viridis | zissou |
|---|---|---|---|
| 1 | 1 | 1 | 1 |
| 2 | 2 | 2 | 2 |
| 3 | 3 | 3 | 3 |
| 4 | 4 | 4 | 4 |
| 5 | 5 | 5 | 5 |
| 6 | 6 | 6 | 6 |
| 7 | 7 | 7 | 7 |
| 8 | 8 | 8 | 8 |
| 9 | 9 | 9 | 9 |
| 10 | 10 | 10 | 10 |
Notice how in the first column (red_green), there is interpolation between "red" (value 1) and "green" (value 10). The palette’s colors will be distributed evenly in the range of data available. This is the default behavior, and the range can be set with the domain argument. We can experiment with that using a new table:
```r
dplyr::tibble(values = 1:10) |>
  gt() |>
  data_color(
    palette = c("red", "green"),
    domain = 3:7
  )
```

Warning: Some values were outside the color scale and will be treated as NA
| values |
|---|
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
| 6 |
| 7 |
| 8 |
| 9 |
| 10 |
When constraining the domain like this, any values that are outside of it are treated as NA (we even get a warning about it) and given a gray color reserved for NA values. We can use the na_color argument to provide a custom color if "#808080" isn’t suitable.
```r
dplyr::tibble(values = 1:10) |>
  gt() |>
  data_color(
    palette = c("red", "green"),
    domain = 3:7,
    na_color = "steelblue"
  )
```

Warning: Some values were outside the color scale and will be treated as NA
| values |
|---|
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
| 6 |
| 7 |
| 8 |
| 9 |
| 10 |
Only a single color should be provided to na_color. More generally, any color you supply can be a color name (R/X11 or CSS 3.0) or a hexadecimal string in the form of "#RRGGBB" or "#RRGGBBAA".
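The domain and NA behavior can be reproduced with the scales package directly. This sketch assumes that gt’s "numeric" method behaves like scales::col_numeric() (the warning printed in the examples above is the same one scales emits); it shows where each value lands along the palette and how values outside the domain fall back to the NA color.

```r
# Sketch, assuming gt's "numeric" method behaves like scales::col_numeric().
values <- 1:10

# Position of each value along the palette: 0 = first color, 1 = last color.
# With the default domain (the data range), positions span exactly [0, 1].
pos_default <- scales::rescale(values, from = range(values))

# With domain = 3:7, the values 1, 2, 8, 9 and 10 fall outside [0, 1] ...
pos_domain <- scales::rescale(values, from = range(3:7))

# ... so a col_numeric() palette treats them as NA and uses na.color instead
pal <- scales::col_numeric(
  palette = c("red", "green"),
  domain = 3:7,
  na.color = "steelblue"
)
cell_colors <- suppressWarnings(pal(values))  # silence the out-of-range warning
print(cell_colors)
```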
Color mapping methods
The previous uses of data_color() all used the "numeric" method of color mapping. Let’s take a look at the different methods and how you would use them. It’s instructive to use examples, so here’s one that uses all four method types:
```r
dplyr::tibble(
  numeric = 1:10,
  bin = 1:10,
  quantile = 1:10,
  factor = vec_fmt_spelled_num(c(1:5, 1:5))
) |>
  gt() |>
  data_color(
    columns = numeric,
    method = "numeric",
    palette = "viridis"
  ) |>
  data_color(
    columns = bin,
    method = "bin",
    palette = "viridis",
    bins = c(1, 5, 7, 10)
  ) |>
  data_color(
    columns = quantile,
    method = "quantile",
    palette = "viridis",
    quantiles = 5
  ) |>
  data_color(
    columns = factor,
    method = "factor",
    palette = "viridis",
    levels = vec_fmt_spelled_num(1:5)
  ) |>
  cols_width(everything() ~ px(100))
```

| numeric | bin | quantile | factor |
|---|---|---|---|
| 1 | 1 | 1 | one |
| 2 | 2 | 2 | two |
| 3 | 3 | 3 | three |
| 4 | 4 | 4 | four |
| 5 | 5 | 5 | five |
| 6 | 6 | 6 | one |
| 7 | 7 | 7 | two |
| 8 | 8 | 8 | three |
| 9 | 9 | 9 | four |
| 10 | 10 | 10 | five |
The first three columns use numbers from 1 to 10, and the different methods ("numeric", "bin", and "quantile") allow us to easily generate a color-mapping function with a few supporting arguments.
In the first column, using method = "numeric" creates a smooth ramp of colors across the "viridis" palette. The second column uses the "bin" method, where the bin boundaries are set through the bins argument. The "quantile" method used in the third column subdivides the values into bins that each hold roughly the same number of values; the number of bins is set through the quantiles argument. Finally, the "factor" method is best suited to text-based values, as seen in the fourth column (though any type is valid). Factor levels are alphabetical by default, but the supporting levels argument lets you specify them directly.
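One way to build intuition for the four methods is to call the scales color-mapping functions that we assume they correspond to: col_numeric(), col_bin(), col_quantile() and col_factor(). This is a sketch under that assumption (in particular, that gt’s quantiles argument plays the role of n in col_quantile()); it counts how many distinct colors each method produces for the same inputs as the table above.

```r
# Sketch, assuming gt's methods mirror these scales color-mapping functions.
x <- 1:10
lvls <- c("one", "two", "three", "four", "five")
words <- rep(lvls, 2)

num_pal <- scales::col_numeric("viridis", domain = range(x))
bin_pal <- scales::col_bin("viridis", domain = range(x), bins = c(1, 5, 7, 10))
qnt_pal <- scales::col_quantile("viridis", domain = x, n = 5)
fct_pal <- scales::col_factor("viridis", domain = NULL, levels = lvls)

n_colors <- c(
  numeric  = length(unique(num_pal(x))),     # smooth ramp: one color per value
  bin      = length(unique(bin_pal(x))),     # bins [1, 5), [5, 7), [7, 10]
  quantile = length(unique(qnt_pal(x))),     # five equal-count groups
  factor   = length(unique(fct_pal(words)))  # one color per level
)
print(n_colors)
```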
Before gt 0.9.0, you were required to supply your own color-mapping function. This is still possible with the fn argument. Here’s an example of that using the col_numeric() function from the scales package:
```r
countrypops |>
  dplyr::filter(country_name == "Mongolia") |>
  dplyr::select(-contains("code")) |>
  tail(10) |>
  gt() |>
  fmt_integer(columns = population) |>
  data_color(
    columns = population,
    fn = scales::col_numeric(
      palette = "viridis",
      domain = c(2.5E6, 3.4E6)
    )
  )
```

Warning: Some values were outside the color scale and will be treated as NA
| country_name | year | population |
|---|---|---|
| Mongolia | 2015 | 3,026,864 |
| Mongolia | 2016 | 3,088,856 |
| Mongolia | 2017 | 3,148,917 |
| Mongolia | 2018 | 3,208,189 |
| Mongolia | 2019 | 3,267,673 |
| Mongolia | 2020 | 3,327,204 |
| Mongolia | 2021 | 3,383,741 |
| Mongolia | 2022 | 3,433,748 |
| Mongolia | 2023 | 3,481,145 |
| Mongolia | 2024 | 3,524,788 |
If you’re not familiar with the color-mapping functions available in the scales package, just know that invoking col_numeric() will return a function (which is what the fn argument actually requires) that takes a vector of numeric values and returns color values.
Using scales-based functions in fn can be very useful if you want to make use of the specialized arguments available in the col_*() functions. You could even supply your own custom function for performing more complex colorizing treatments!
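Here is a small sketch of both kinds of fn input: the function returned by scales::col_numeric(), and a hand-written alternative. The highlight_above() helper is hypothetical; the only requirement we rely on is the one stated above, that fn receives a vector of values and returns one color per value.

```r
# scales::col_numeric() returns a *function*; that function is what fn expects
pal_fn <- scales::col_numeric(palette = "viridis", domain = c(2.5e6, 3.4e6))
pal_colors <- pal_fn(c(2.6e6, 3.0e6))  # two hex color strings

# Hypothetical custom color-mapping function: values above a threshold get
# one color, everything else (including NA) gets another
highlight_above <- function(threshold, hi = "gold", lo = "white") {
  function(x) ifelse(!is.na(x) & x > threshold, hi, lo)
}

fill_fn <- highlight_above(3.3e6)
print(fill_fn(c(3.2e6, 3.4e6, NA)))

# Possible usage with gt (not run here):
# countrypops |>
#   dplyr::filter(country_name == "Mongolia") |>
#   gt() |>
#   data_color(columns = population, fn = highlight_above(3.3e6))
```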
Applying color to other columns
The data_color() function now lets you apply colorization indirectly to other columns. That is, you can apply colors to a column different from the one used to generate those specific colors. This can be done with the new target_columns argument. Let’s look at how it’s done with a countrypops-based table example.
```r
countrypops |>
  dplyr::filter(country_code_3 %in% c("FRA", "GBR")) |>
  dplyr::filter(year %% 10 == 0) |>
  dplyr::select(-contains("code")) |>
  dplyr::mutate(color = "") |>
  gt(groupname_col = "country_name") |>
  fmt_integer(columns = population) |>
  data_color(
    columns = population,
    target_columns = color,
    method = "numeric",
    palette = "viridis",
    domain = c(4E7, 7E7)
  ) |>
  cols_width(year ~ px(60), population ~ px(120), color ~ px(10)) |>
  tab_options(column_labels.hidden = TRUE) |>
  opt_vertical_padding(scale = 0.65)
```

| year | population | color |
|---|---|---|
| France | | |
| 1960 | 47,412,964 | |
| 1970 | 52,007,169 | |
| 1980 | 55,274,184 | |
| 1990 | 58,261,012 | |
| 2000 | 60,918,661 | |
| 2010 | 65,026,211 | |
| 2020 | 67,601,110 | |
| United Kingdom | | |
| 1960 | 52,400,000 | |
| 1970 | 55,663,250 | |
| 1980 | 56,314,216 | |
| 1990 | 57,247,586 | |
| 2000 | 58,892,514 | |
| 2010 | 62,766,365 | |
| 2020 | 66,744,000 | |
So, the colors are based on the data in the population column, but they are actually placed in the color column (which was intentionally made ‘blank’ by filling it entirely with empty strings).
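Conceptually, indirect coloring computes colors from one column and assigns them to the cells of another column in the same rows. The sketch below mimics that with scales::col_numeric() and three France rows from the table; the target_fill data frame is a hypothetical stand-in for however gt records cell styles internally.

```r
# Three France rows from the table above, plus the 'blank' target column
df <- data.frame(
  year = c(1960, 1990, 2020),
  population = c(47412964, 58261012, 67601110),
  color = "",
  stringsAsFactors = FALSE
)

# Colors are computed from the source column (population) ...
pal <- scales::col_numeric("viridis", domain = c(4e7, 7e7))

# ... and recorded as fills for the target column's cells, row by row.
# Hypothetical representation: one entry per (target column, row) cell.
target_fill <- data.frame(
  column = "color",
  row = seq_len(nrow(df)),
  fill = pal(df$population),
  stringsAsFactors = FALSE
)
print(target_fill)
```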
When specifying a single column in columns, we can use as many target_columns values as we want. Let’s make another table where we map the generated colors from the year column to all columns in the table. We’ll use the underrated "inferno" palette (from the "viridis" collection) for this one.
```r
countrypops |>
  dplyr::filter(country_code_3 %in% c("FRA", "GBR", "ITA")) |>
  dplyr::select(-contains("code")) |>
  dplyr::filter(year %% 5 == 0) |>
  tidyr::pivot_wider(
    names_from = "country_name",
    values_from = "population"
  ) |>
  gt() |>
  fmt_integer(columns = c(everything(), -year)) |>
  data_color(
    columns = year,
    target_columns = everything(),
    palette = "inferno"
  ) |>
  cols_width(
    year ~ px(80),
    everything() ~ px(160)
  ) |>
  opt_all_caps() |>
  opt_horizontal_padding(scale = 3) |>
  opt_vertical_padding(scale = 0.75) |>
  tab_options(
    table_body.hlines.style = "none",
    column_labels.border.top.color = "black",
    column_labels.border.bottom.color = "black",
    table_body.border.bottom.color = "black"
  )
```

| year | France | United Kingdom | Italy |
|---|---|---|---|
| 1960 | 47,412,964 | 52,400,000 | 50,199,700 |
| 1965 | 49,877,725 | 54,348,050 | 52,112,350 |
| 1970 | 52,007,169 | 55,663,250 | 53,821,850 |
| 1975 | 54,002,853 | 56,225,800 | 55,441,001 |
| 1980 | 55,274,184 | 56,314,216 | 56,433,883 |
| 1985 | 56,665,619 | 56,550,268 | 56,593,071 |
| 1990 | 58,261,012 | 57,247,586 | 56,719,240 |
| 1995 | 59,541,294 | 58,019,030 | 56,844,303 |
| 2000 | 60,918,661 | 58,892,514 | 56,942,108 |
| 2005 | 63,180,854 | 60,401,206 | 58,166,682 |
| 2010 | 65,026,211 | 62,766,365 | 59,819,407 |
| 2015 | 66,548,272 | 65,088,000 | 60,229,605 |
| 2020 | 67,601,110 | 66,744,000 | 59,438,851 |
Also new in 0.9.0 is the ability to apply color indirectly in pairs. To do this, make sure that the number of columns resolved by columns matches the number of columns resolved by target_columns.
The towny dataset has columns with population values at different census years. It also has an associated set of columns that provide the percent change (as fractional values) across census years. In this next example, we will do the following things:
- perform color mapping on those change values (in columns)
- apply the colors indirectly to the population figures (with target_columns)
- hide the columns used to generate the color mappings (with cols_hide())
```r
towny |>
  dplyr::filter(census_div %in% c("Oxford", "Essex")) |>
  dplyr::select(
    name, starts_with("population"), ends_with("pct"),
    -population_1996
  ) |>
  gt(rowname_col = "name") |>
  fmt_integer() |>
  data_color(
    columns = ends_with("pct"),
    target_columns = starts_with("population"),
    palette = c("red", "white", "green"),
    domain = c(-0.5, 0.5),
    na_color = "lightblue"
  ) |>
  cols_hide(columns = ends_with("pct")) |>
  cols_label_with(fn = function(x) gsub("population_", "", x)) |>
  opt_vertical_padding(scale = 0.6)
```

| | 2001 | 2006 | 2011 | 2016 | 2021 |
|---|---|---|---|---|---|
| Amherstburg | 20,339 | 21,748 | 21,556 | 21,936 | 23,524 |
| Blandford-Blenheim | 7,630 | 7,149 | 7,359 | 7,399 | 7,565 |
| East Zorra-Tavistock | 7,238 | 7,008 | 6,836 | 7,113 | 7,841 |
| Essex | 20,085 | 20,032 | 19,600 | 20,427 | 21,216 |
| Ingersoll | 10,977 | 11,760 | 12,146 | 12,757 | 13,693 |
| Kingsville | 19,619 | 20,908 | 21,362 | 21,552 | 22,119 |
| Lakeshore | 28,746 | 33,245 | 34,546 | 36,611 | 40,410 |
| LaSalle | 25,285 | 27,652 | 28,643 | 30,180 | 32,721 |
| Leamington | 27,138 | 28,833 | 28,403 | 27,595 | 29,680 |
| Norwich | 10,478 | 10,481 | 10,721 | 10,835 | 11,151 |
| Pelee | 256 | 287 | 171 | 235 | 230 |
| South-West Oxford | 7,782 | 7,589 | 7,544 | 7,634 | 7,583 |
| Tecumseh | 25,105 | 24,224 | 23,610 | 23,229 | 23,300 |
| Tillsonburg | 14,052 | 14,822 | 15,301 | 15,872 | 18,615 |
| Windsor | 208,402 | 216,473 | 210,891 | 217,188 | 229,660 |
| Woodstock | 33,061 | 35,822 | 37,754 | 41,098 | 46,705 |
| Zorra | 8,052 | 8,125 | 8,058 | 8,138 | 8,628 |
We used a few more gt functions to clean up the table somewhat, but the bulk of the presentation lies in the use of data_color(). Because this is a fairly complex example, we recommend running the code statement by statement to see how each function call changes the output table.
An important point here is that the order of columns in both the columns and target_columns arguments should match the intended mapping order. That is the case in the above example, but it may not be in other situations, so it’s important to keep this in mind.
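The pairing rule can be illustrated with plain string matching. In this sketch the column names are hypothetical stand-ins for towny’s population and percent-change columns, and pairing is assumed to be purely positional: the first source column colors the first target column, the second colors the second, and so on.

```r
# Hypothetical column names standing in for a towny-like table
cols <- c("name",
          "population_2001", "population_2006", "population_2011",
          "change_2001_pct", "change_2006_pct", "change_2011_pct")

source_cols <- grep("_pct$", cols, value = TRUE)        # like ends_with("pct")
target_cols <- grep("^population_", cols, value = TRUE) # like starts_with("population")

# Paired coloring needs the same number of resolved columns on each side
stopifnot(length(source_cols) == length(target_cols))

# Positional pairing: if either selection were in a different order
# (say, reversed), each change value would color the wrong year
pairs <- data.frame(source = source_cols, target = target_cols,
                    stringsAsFactors = FALSE)
print(pairs)
```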
Row-wise color mapping
Colorization can now occur in a row-wise manner. The key to making that happen is direction = "row". Let’s try this out using the sza dataset. After some very necessary dplyr and tidyr work, we’ll put that data into a gt table and apply color to values across each ‘month’ of data in that table. We won’t set a domain value and will instead use the bounds of the data in each row.
```r
sza |>
  dplyr::filter(latitude == 20 & tst <= "1200") |>
  dplyr::select(-latitude) |>
  dplyr::filter(!is.na(sza)) |>
  tidyr::pivot_wider(
    names_from = tst,
    values_from = sza,
    names_sort = TRUE
  ) |>
  gt(rowname_col = "month") |>
  sub_missing(missing_text = "") |>
  data_color(
    direction = "row",
    palette = "PuOr",
    na_color = "white"
  ) |>
  tab_options(table.font.size = px(12)) |>
  opt_vertical_padding(scale = 0.75)
```

| | 0530 | 0600 | 0630 | 0700 | 0730 | 0800 | 0830 | 0900 | 0930 | 1000 | 1030 | 1100 | 1130 | 1200 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| jan | | | | 84.9 | 78.7 | 72.7 | 66.1 | 61.5 | 56.5 | 52.1 | 48.3 | 45.5 | 43.6 | 43.0 |
| feb | | | 88.9 | 82.5 | 75.8 | 69.6 | 63.3 | 57.7 | 52.2 | 47.4 | 43.1 | 40.0 | 37.8 | 37.2 |
| mar | | | 85.7 | 78.8 | 72.0 | 65.2 | 58.6 | 52.3 | 46.2 | 40.5 | 35.5 | 31.4 | 28.6 | 27.7 |
| apr | | 88.5 | 81.5 | 74.4 | 67.4 | 60.3 | 53.4 | 46.5 | 39.7 | 33.2 | 26.9 | 21.3 | 17.2 | 15.5 |
| may | | 85.0 | 78.2 | 71.2 | 64.3 | 57.2 | 50.2 | 43.2 | 36.1 | 29.1 | 26.1 | 15.2 | 8.8 | 5.0 |
| jun | 89.2 | 82.7 | 76.0 | 69.3 | 62.5 | 55.7 | 48.8 | 41.9 | 35.0 | 28.1 | 21.1 | 14.2 | 7.3 | 2.0 |
| jul | 88.8 | 82.3 | 75.7 | 69.1 | 62.3 | 55.5 | 48.7 | 41.8 | 35.0 | 28.1 | 21.2 | 14.3 | 7.7 | 3.1 |
| aug | | 83.8 | 77.1 | 70.2 | 63.3 | 56.4 | 49.4 | 42.4 | 35.4 | 28.3 | 21.3 | 14.3 | 7.3 | 1.9 |
| sep | | 87.2 | 80.2 | 73.2 | 66.1 | 59.1 | 52.1 | 45.1 | 38.1 | 31.3 | 24.7 | 18.6 | 13.7 | 11.6 |
| oct | | | 84.1 | 77.1 | 70.2 | 63.3 | 56.5 | 49.9 | 43.5 | 37.5 | 32.0 | 27.4 | 24.3 | 23.1 |
| nov | | | 87.8 | 81.3 | 74.5 | 68.3 | 61.8 | 56.0 | 50.2 | 45.3 | 40.7 | 37.4 | 35.1 | 34.4 |
| dec | | | | 84.3 | 78.0 | 71.8 | 66.1 | 60.5 | 55.6 | 50.9 | 47.2 | 44.2 | 42.4 | 41.8 |
When using direction = "row", we can see that each row has cell coloring that is relative to the range of values in the particular row. This is useful in those situations where you might feel the colorization should be made specific to the row.
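The idea behind direction = "row" can be sketched by giving each row its own domain. This assumes scales::col_numeric() as a stand-in for the "numeric" method, and the color_rows() helper is hypothetical. Two rows with the same relative spread but very different magnitudes end up with identical colors, which is what row-relative scaling implies.

```r
# Hypothetical helper: color each row of a numeric matrix relative to that
# row's own min and max (NA cells get na_color)
color_rows <- function(m, palette, na_color = "white") {
  out <- t(apply(m, 1, function(row) {
    pal <- scales::col_numeric(palette, domain = range(row, na.rm = TRUE),
                               na.color = na_color)
    pal(row)
  }))
  dimnames(out) <- dimnames(m)
  out
}

m <- rbind(
  a = c(1, 2, 3),
  b = c(10, 20, 30)  # same relative spread, ten times larger
)
row_colors <- color_rows(m, "PuOr")
print(row_colors)
```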
One last thing, also to do with rows
The data_color() function now has a rows argument. Previously there was no such argument, so you had no choice but to color every row in the specified columns. Of course, sometimes you only want colorization in a specific region of the table. Here’s an example that demonstrates this (and we’re using the new metro dataset):
```r
metro |>
  dplyr::select(name, passengers, connect_other) |>
  dplyr::arrange(desc(passengers)) |>
  head(15) |>
  gt(locale = "fr") |>
  tab_header(
    title = "Les stations de métro les plus fréquentées et
    leurs nombre annuel de passagers",
    subtitle = "Ceux qui sont à côté des gares sont surlignés en vert"
  ) |>
  fmt_integer() |>
  tab_row_group(
    label = "a côté d'une gare",
    rows = grepl("TGV", connect_other),
    id = "gare"
  ) |>
  data_color(
    columns = passengers,
    rows = grepl("TGV", connect_other),
    method = "numeric",
    palette = c("lightgreen", "green" |> adjust_luminance(steps = -2))
  ) |>
  cols_hide(columns = connect_other) |>
  cols_label(
    name ~ "station de métro",
    passengers = "passagers"
  ) |>
  cols_width(
    name ~ px(375),
    passengers ~ px(150)
  ) |>
  tab_style(
    style = cell_text(align = "center"),
    locations = cells_row_groups(groups = "gare")
  ) |>
  opt_all_caps() |>
  opt_align_table_header(align = "left") |>
  opt_horizontal_padding(scale = 3) |>
  opt_table_font(stack = "rounded-sans")
```

Les stations de métro les plus fréquentées et leurs nombre annuel de passagers
Ceux qui sont à côté des gares sont surlignés en vert
| station de métro | passagers |
|---|---|
| a côté d'une gare | |
| Gare du Nord | 34 503 097 |
| Saint-Lazare | 33 128 384 |
| Gare de Lyon | 28 640 475 |
| Montparnasse—Bienvenüe | 20 407 224 |
| Gare de l'Est | 15 538 471 |
| Bibliothèque François Mitterrand | 11 104 474 |
| République | 11 079 708 |
| Les Halles | 10 623 876 |
| La Défense | 9 256 802 |
| Châtelet | 8 350 794 |
| Bastille | 8 069 243 |
| Belleville | 7 314 438 |
| Hôtel de Ville | 7 251 729 |
| Place d'Italie | 7 119 097 |
| Bobigny—Pablo Picasso | 6 561 327 |
This data table is so much fun!
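Restricting colorization with rows amounts to computing colors for the selected rows and leaving the others untouched. In this sketch the connect_other values are made up, "darkgreen" stands in for the luminance-adjusted green used above, and we assume (the post doesn’t say) that the color domain is computed from the selected rows only.

```r
# Small stand-in for the metro data; connect_other values are hypothetical
stations <- data.frame(
  name = c("Gare du Nord", "Bastille", "Gare de Lyon"),
  passengers = c(34503097, 8069243, 28640475),
  connect_other = c("TGV, RER", NA, "TGV"),
  stringsAsFactors = FALSE
)
selected <- grepl("TGV", stations$connect_other)  # NA -> FALSE

# NA means "leave this cell uncolored"
fill <- rep(NA_character_, nrow(stations))

# Assumption: the domain comes from the selected rows only
pal <- scales::col_numeric(c("lightgreen", "darkgreen"),
                           domain = range(stations$passengers[selected]))
fill[selected] <- pal(stations$passengers[selected])
print(fill)
```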
In conclusion
We’ve wanted to improve the data_color() function of gt for a few years now, and we are so glad to have finally done it in version 0.9.0! The new version of this function is far more powerful than before (and hopefully easier to use too).
This is blog post number three of a series on gt version 0.9.0. There’s more to come, owing to the fact that this release of gt is a big one. We always want your feedback, and there are many different ways to get in touch with us. You can:
- file an issue on GitHub
- engage in discussions through the gt Discussions page, again on GitHub
- follow us on Twitter at @gt_package
- join the new gt_package Discord server
Until next time!