
Color coding your data in {gt} 0.9.0

Written by Rich Iannone

2023-05-17


There are many improvements in the new 0.9.0 release of gt! In fact, there is so much that is new that we couldn’t fit it all in a single blog post. This blog post (number three in a larger series on gt 0.9.0) focuses on the improvements to data_color(), a function that lets you perform data cell colorization.

A basic example on how to use `data_color()`

Let’s introduce the data_color() function with a simple example. For the sake of simplicity, let’s use gt’s exibble dataset for this:

exibble |>
  gt() |>
  data_color()

num	char	fctr	date	time	datetime	currency	row	group
1.111e-01	apricot	one	2015-01-15	13:35	2018-01-01 02:22	49.950	row_1	grp_a
2.222e+00	banana	two	2015-02-15	14:40	2018-02-02 14:33	17.950	row_2	grp_a
3.333e+01	coconut	three	2015-03-15	15:45	2018-03-03 03:44	1.390	row_3	grp_a
4.444e+02	durian	four	2015-04-15	16:50	2018-04-04 15:55	65100.000	row_4	grp_a
5.550e+03	NA	five	2015-05-15	17:55	2018-05-05 04:00	1325.810	row_5	grp_b
NA	fig	six	2015-06-15	NA	2018-06-06 16:11	13.255	row_6	grp_b
7.770e+05	grapefruit	seven	NA	19:10	2018-07-07 05:22	NA	row_7	grp_b
8.880e+06	honeydew	eight	2015-08-15	20:20	NA	0.440	row_8	grp_b

What’s happened is that data_color() applied background colors to all cells of every column using R’s default palette (internally accessed through the grDevices::palette() function). How colors are computed is controlled by the new method argument, whose default is "auto". With method = "auto", gt will decide on a column-by-column basis which colorization method to use. For numeric values, the method will be "numeric"; for character or factor values, the "factor" method is chosen. (We’ll get more into the various color computation methods a bit later in the post.)
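To make the "auto" choice more concrete, here is a minimal sketch that spells out the methods by hand for one numeric and one character column. It assumes gt 0.9.0 or later and keeps the default palette; the object name tbl_explicit is only there for illustration.

```r
library(gt)

tbl_explicit <-
  exibble |>
  # keep one numeric and one character column
  dplyr::select(num, char) |>
  gt() |>
  # the method that "auto" is described as picking for numeric values
  data_color(columns = num, method = "numeric") |>
  # the method that "auto" is described as picking for character values
  data_color(columns = char, method = "factor")

tbl_explicit
```

Writing the methods out like this should only be needed when you want something other than what "auto" would pick for a column.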

An interesting thing about data_color() in gt 0.9.0 is that it works without having to supply any argument values! Previously, you needed to provide something for columns and a color-mapping function to colors. This made the function very difficult to use without first looking at a working example. We think that the new interface that prioritizes choosing a method will be better for most users (and you can still use a color-mapping function with the new fn argument).

Choosing a palette

Virtually nobody will want to rely on the default palette, so let’s take a look at some of the color-specification possibilities available in the new palette argument. It can take any of the following types of inputs:

a vector of color names
the name of an RColorBrewer palette
the name of a viridis palette (e.g., "viridis", "magma", etc.)
a discrete palette accessible from the paletteer package using the <package>::<palette> syntax

Let’s try each of these with four separate calls of data_color() on a simple table:

dplyr::tibble(red_green = 1:10, brewer = 1:10, viridis = 1:10, zissou = 1:10) |>
  gt() |>
  data_color(columns = red_green, palette = c("red", "green")) |>
  data_color(columns = brewer, palette = "Oranges") |>
  data_color(columns = viridis, palette = "viridis") |>
  data_color(columns = zissou, palette = "wesanderson::Zissou1") |>
  cols_width(everything() ~ px(100))

red_green	brewer	viridis	zissou
1	1	1	1
2	2	2	2
3	3	3	3
4	4	4	4
5	5	5	5
6	6	6	6
7	7	7	7
8	8	8	8
9	9	9	9
10	10	10	10

Notice how in the first column (red_green), there is interpolation between "red" (value 1) and "green" (value 10). The palette’s colors will be distributed evenly in the range of data available. This is the default behavior, and the range can be set with the domain argument. We can experiment with that using a new table:

dplyr::tibble(values = 1:10) |>
  gt() |>
  data_color(
    palette = c("red", "green"),
    domain = 3:7
  )

Warning: Some values were outside the color scale and will be treated as NA

values
1
2
3
4
5
6
7
8
9
10

When constraining the domain like this, any values that are outside of it are treated as NA (we even get a warning about it) and given a gray color reserved for NA values. We can use the na_color argument to provide a custom color if "#808080" isn’t suitable.

dplyr::tibble(values = 1:10) |>
  gt() |>
  data_color(
    palette = c("red", "green"),
    domain = 3:7,
    na_color = "steelblue"
  )

Warning: Some values were outside the color scale and will be treated as NA

values
1
2
3
4
5
6
7
8
9
10

We should only provide a single color to na_color, but it’s worth noting that when providing any sort of color, it can be a color name (R/X11 or CSS 3.0) or a hexadecimal string in the form of "#RRGGBB" or "#RRGGBBAA".
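The warning shown above has the same wording as the one issued by the scales package, so the domain behavior can be explored outside of a table. The sketch below uses scales::col_numeric() directly. It is an illustration under that assumption, not gt's internal code, and the names to_color and colors are made up for the example.

```r
library(scales)

# Assumption: this mirrors the behavior described above, not gt's internals
to_color <- col_numeric(
  palette = c("red", "green"),
  domain = c(3, 7),
  na.color = "steelblue"
)

# 1 and 10 fall outside the domain, so both receive the NA color;
# scales warns about that, hence suppressWarnings()
colors <- suppressWarnings(to_color(c(1, 3, 5, 7, 10)))

colors
```

The first and last elements of the result share one color (the NA color), while the three in-domain values each get their own color along the red-to-green ramp.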

Color mapping methods

The previous uses of data_color() all used the "numeric" method of color mapping. Let’s take a look at the different methods and how you would use them. It’s instructive to use examples, so here’s one that uses all four method types:

dplyr::tibble(
  numeric = 1:10,
  bin = 1:10,
  quantile = 1:10,
  factor = vec_fmt_spelled_num(c(1:5, 1:5))
) |>
  gt() |>
  data_color(
    columns = numeric,
    method = "numeric",
    palette = "viridis"
  ) |>
  data_color(
    columns = bin,
    method = "bin",
    palette = "viridis",
    bins = c(1, 5, 7, 10)
  ) |>
  data_color(
    columns = quantile,
    method = "quantile",
    palette = "viridis",
    quantiles = 5
  ) |>
  data_color(
    columns = factor,
    method = "factor",
    palette = "viridis",
    levels = vec_fmt_spelled_num(1:5)
  ) |>
  cols_width(everything() ~ px(100))

numeric	bin	quantile	factor
1	1	1	one
2	2	2	two
3	3	3	three
4	4	4	four
5	5	5	five
6	6	6	one
7	7	7	two
8	8	8	three
9	9	9	four
10	10	10	five

The first three columns use numbers from 1 to 10, and the different methods ("numeric", "bin", and "quantile") allow us to easily generate a color-mapping function with a few supporting arguments.

In the first column, using method = "numeric" creates a smooth ramp of colors across the "viridis" palette. The second column has the "bin" method applied, and this allows for the construction of bins in the bins argument. The "quantile" method used in the third column divides the values into bins that each hold roughly the same number of values; the number of bins is set through the quantiles argument. Finally, the "factor" method is best used for text-based values, as seen in the fourth column (though any type is valid). Factor levels are, by default, alphabetical, but the supporting levels argument lets you specify them directly.
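To see how binning and quantiles group the same ten values differently, here is a small sketch using the col_bin() and col_quantile() functions from scales with their default settings. It is meant as an illustration of the two ideas, not as a description of what gt does internally; the variable names are made up.

```r
library(scales)

values <- 1:10

# "bin": explicit break points give three bins: [1, 5), [5, 7), [7, 10]
bin_fn <- col_bin(
  palette = "viridis",
  domain = NULL,
  bins = c(1, 5, 7, 10)
)
bin_colors <- bin_fn(values)

# "quantile": five bins, each holding two of the ten values
quantile_fn <- col_quantile(
  palette = "viridis",
  domain = values,
  n = 5
)
quantile_colors <- quantile_fn(values)

# count how many values share each color
table(bin_colors)
table(quantile_colors)
```

The bin counts are uneven because the break points are uneven, whereas the quantile counts are balanced by construction.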

Before gt 0.9.0, you were required to supply your own color-mapping function. This is still possible with the fn argument. Here’s an example of that using the col_numeric() function from the scales package:

countrypops |>
  dplyr::filter(country_name == "Mongolia") |>
  dplyr::select(-contains("code")) |>
  tail(10) |>
  gt() |>
  fmt_integer(columns = population) |>
  data_color(
    columns = population,
    fn = scales::col_numeric(
      palette = "viridis",
      domain = c(2.5E6, 3.4E6)
    )
  )

Warning: Some values were outside the color scale and will be treated as NA

country_name	year	population
Mongolia	2015	3,026,864
Mongolia	2016	3,088,856
Mongolia	2017	3,148,917
Mongolia	2018	3,208,189
Mongolia	2019	3,267,673
Mongolia	2020	3,327,204
Mongolia	2021	3,383,741
Mongolia	2022	3,433,748
Mongolia	2023	3,481,145
Mongolia	2024	3,524,788

If you’re not familiar with the color-mapping functions available in the scales package, just know that invoking col_numeric() will return a function (which is what the fn argument actually requires) that takes a vector of numeric values and returns color values.

Using scales-based functions in fn can be very useful if you want to make use of the specialized arguments available in the col_*() functions. You could even supply your own custom function for performing more complex colorizing treatments!
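As a rough sketch of what a custom function might look like, the example below builds one that highlights values above a threshold and leaves everything else white. The helper flag_above() and the small data frame are hypothetical and exist only for this illustration; the one requirement taken from the text above is that fn receives a vector of values and returns a vector of colors.

```r
library(gt)

# Hypothetical helper: returns a function that maps values to hex colors
flag_above <- function(threshold) {
  function(x) {
    # NA values are treated as "not above" and stay white
    ifelse(!is.na(x) & x > threshold, "#FF6347", "#FFFFFF")
  }
}

tbl_flagged <-
  data.frame(population = c(3.0e6, 3.2e6, 3.4e6, 3.5e6)) |>
  gt() |>
  fmt_integer(columns = population) |>
  data_color(
    columns = population,
    fn = flag_above(3.3e6)
  )

tbl_flagged
```

Returning a function from flag_above() follows the same pattern as scales::col_numeric(): the call configures the mapping and the result is what gets passed to fn.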

Applying color to other columns

The data_color() function now lets you apply colorization indirectly to other columns. That is, you can apply colors to a column different from the one used to generate those specific colors. This can be done with the new target_columns argument. Let’s look at how it’s done with a countrypops-based table example.

countrypops |>
  dplyr::filter(country_code_3 %in% c("FRA", "GBR")) |>
  dplyr::filter(year %% 10 == 0) |>
  dplyr::select(-contains("code")) |>
  dplyr::mutate(color = "") |>
  gt(groupname_col = "country_name") |>
  fmt_integer(columns = population) |>
  data_color(
    columns = population,
    target_columns = color,
    method = "numeric",
    palette = "viridis",
    domain = c(4E7, 7E7)
  ) |>
  cols_width(year ~ px(60), population ~ px(120), color ~ px(10)) |>
  tab_options(column_labels.hidden = TRUE) |>
  opt_vertical_padding(scale = 0.65)

France
1960	47,412,964
1970	52,007,169
1980	55,274,184
1990	58,261,012
2000	60,918,661
2010	65,026,211
2020	67,601,110
United Kingdom
1960	52,400,000
1970	55,663,250
1980	56,314,216
1990	57,247,586
2000	58,892,514
2010	62,766,365
2020	66,744,000

So, the colors are based on the data in the population column, but the colors are actually placed in the color column (which was made intentionally ‘blank’ by setting it entirely with empty strings).

When specifying a single column in columns, we can use as many target_columns values as we want. Let’s make another table where we map the generated colors from the year column to all columns in the table. We’ll use the underrated "inferno" palette (from the "viridis" collection) for this one.

countrypops |>
  dplyr::filter(country_code_3 %in% c("FRA", "GBR", "ITA")) |>
  dplyr::select(-contains("code")) |>
  dplyr::filter(year %% 5 == 0) |>
  tidyr::pivot_wider(
    names_from = "country_name",
    values_from = "population"
  ) |>
  gt() |>
  fmt_integer(columns = c(everything(), -year)) |>
  data_color(
    columns = year,
    target_columns = everything(),
    palette = "inferno"
  ) |>
  cols_width(
    year ~ px(80),
    everything() ~ px(160)
  ) |>
  opt_all_caps() |>
  opt_horizontal_padding(scale = 3) |>
  opt_vertical_padding(scale = 0.75) |>
  tab_options(
    table_body.hlines.style = "none",
    column_labels.border.top.color = "black",
    column_labels.border.bottom.color = "black",
    table_body.border.bottom.color = "black"
  )

year	France	United Kingdom	Italy
1960	47,412,964	52,400,000	50,199,700
1965	49,877,725	54,348,050	52,112,350
1970	52,007,169	55,663,250	53,821,850
1975	54,002,853	56,225,800	55,441,001
1980	55,274,184	56,314,216	56,433,883
1985	56,665,619	56,550,268	56,593,071
1990	58,261,012	57,247,586	56,719,240
1995	59,541,294	58,019,030	56,844,303
2000	60,918,661	58,892,514	56,942,108
2005	63,180,854	60,401,206	58,166,682
2010	65,026,211	62,766,365	59,819,407
2015	66,548,272	65,088,000	60,229,605
2020	67,601,110	66,744,000	59,438,851

Another interesting thing that can be done now in 0.9.0 is the task of indirectly applying color in pairs. To do this, we make sure that the resolved number of columns in columns matches the number of columns in target_columns.

The towny dataset has columns with population values at different census years. It also has an associated set of columns that provide the percent change (as fractional values) across census years. In this next example, we will do the following things:

perform color mapping on those change values (in columns)
apply the colors indirectly to the population figures (with target_columns)
hide the columns used to generate the color mappings (with cols_hide())

towny |>
  dplyr::filter(census_div %in% c("Oxford", "Essex")) |>
  dplyr::select(
    name, starts_with("population"), ends_with("pct"),
    -population_1996
  ) |>
  gt(rowname_col = "name") |>
  fmt_integer() |>
  data_color(
    columns = ends_with("pct"),
    target_columns = starts_with("population"),
    palette = c("red", "white", "green"),
    domain = c(-0.5, 0.5),
    na_color = "lightblue"
  ) |>
  cols_hide(columns = ends_with("pct")) |>
  cols_label_with(fn = function(x) gsub("population_", "", x)) |>
  opt_vertical_padding(scale = 0.6)

	2001	2006	2011	2016	2021
Amherstburg	20,339	21,748	21,556	21,936	23,524
Blandford-Blenheim	7,630	7,149	7,359	7,399	7,565
East Zorra-Tavistock	7,238	7,008	6,836	7,113	7,841
Essex	20,085	20,032	19,600	20,427	21,216
Ingersoll	10,977	11,760	12,146	12,757	13,693
Kingsville	19,619	20,908	21,362	21,552	22,119
Lakeshore	28,746	33,245	34,546	36,611	40,410
LaSalle	25,285	27,652	28,643	30,180	32,721
Leamington	27,138	28,833	28,403	27,595	29,680
Norwich	10,478	10,481	10,721	10,835	11,151
Pelee	256	287	171	235	230
South-West Oxford	7,782	7,589	7,544	7,634	7,583
Tecumseh	25,105	24,224	23,610	23,229	23,300
Tillsonburg	14,052	14,822	15,301	15,872	18,615
Windsor	208,402	216,473	210,891	217,188	229,660
Woodstock	33,061	35,822	37,754	41,098	46,705
Zorra	8,052	8,125	8,058	8,138	8,628

We used a few more gt functions to clean up the table somewhat, but the bulk of the presentation lies in the use of data_color(). Because this is a fairly complex example, we recommend running the code in a statement-by-statement manner to see how each function call changes the output table.

An important note to make here is that the order of columns in both the columns and target_columns arguments should match the intended mapping order. That is the case in the above example, but other situations might vary (thus, it’s important to keep this in mind).
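One way to keep the pairing unambiguous is to list the columns explicitly on both sides rather than relying on selection helpers. The sketch below does that with a small, made-up data frame (the towns object and its column names are invented for illustration).

```r
library(gt)

# Made-up data: two population columns and their matching change columns
towns <- data.frame(
  name = c("Town A", "Town B", "Town C"),
  pop_2016 = c(1200, 5400, 800),
  pop_2021 = c(1500, 5100, 820),
  chg_2016 = c(0.10, 0.02, -0.20),
  chg_2021 = c(0.25, -0.06, 0.03)
)

tbl_pairs <-
  towns |>
  gt(rowname_col = "name") |>
  data_color(
    # the first source column colors the first target column, and so on
    columns = c(chg_2016, chg_2021),
    target_columns = c(pop_2016, pop_2021),
    palette = c("red", "white", "green"),
    domain = c(-0.5, 0.5)
  ) |>
  cols_hide(columns = c(chg_2016, chg_2021))

tbl_pairs
```

Swapping the order in either vector would color each population column with the other year's change values, which is exactly the mismatch to watch out for.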

Row-wise color mapping

Colorization can now occur in a row-wise manner. The key to making that happen is by using direction = "row". Let’s try this out using the sza dataset. After some very necessary dplyr and tidyr work, we’ll put that data into a gt table and apply color to values across each ‘month’ of data in that table. We won’t set a domain value and instead use the bounds of the data in each row.

sza |>
  dplyr::filter(latitude == 20 & tst <= "1200") |>
  dplyr::select(-latitude) |>
  dplyr::filter(!is.na(sza)) |>
  tidyr::pivot_wider(
    names_from = tst,
    values_from = sza,
    names_sort = TRUE
  ) |>
  gt(rowname_col = "month") |>
  sub_missing(missing_text = "") |>
  data_color(
    direction = "row",
    palette = "PuOr",
    na_color = "white"
  ) |>
  tab_options(table.font.size = px(12)) |>
  opt_vertical_padding(scale = 0.75)

	0530	0600	0630	0700	0730	0800	0830	0900	0930	1000	1030	1100	1130	1200
jan				84.9	78.7	72.7	66.1	61.5	56.5	52.1	48.3	45.5	43.6	43.0
feb			88.9	82.5	75.8	69.6	63.3	57.7	52.2	47.4	43.1	40.0	37.8	37.2
mar			85.7	78.8	72.0	65.2	58.6	52.3	46.2	40.5	35.5	31.4	28.6	27.7
apr		88.5	81.5	74.4	67.4	60.3	53.4	46.5	39.7	33.2	26.9	21.3	17.2	15.5
may		85.0	78.2	71.2	64.3	57.2	50.2	43.2	36.1	29.1	22.1	15.2	8.8	5.0
jun	89.2	82.7	76.0	69.3	62.5	55.7	48.8	41.9	35.0	28.1	21.1	14.2	7.3	2.0
jul	88.8	82.3	75.7	69.1	62.3	55.5	48.7	41.8	35.0	28.1	21.2	14.3	7.7	3.1
aug		83.8	77.1	70.2	63.3	56.4	49.4	42.4	35.4	28.3	21.3	14.3	7.3	1.9
sep		87.2	80.2	73.2	66.1	59.1	52.1	45.1	38.1	31.3	24.7	18.6	13.7	11.6
oct			84.1	77.1	70.2	63.3	56.5	49.9	43.5	37.5	32.0	27.4	24.3	23.1
nov			87.8	81.3	74.5	68.3	61.8	56.0	50.2	45.3	40.7	37.4	35.1	34.4
dec				84.3	78.0	71.8	66.1	60.5	55.6	50.9	47.2	44.2	42.4	41.8

When using direction = "row", we can see that each row has cell coloring that is relative to the range of values in the particular row. This is useful in those situations where you might feel the colorization should be made specific to the row.
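A small side-by-side comparison can make the difference easier to see. This sketch uses a made-up data frame in which one row has much larger values than the other, and builds the same table twice, once per direction. The object names are only for illustration.

```r
library(gt)

# Made-up data: the "high" row has values 100 times those of the "low" row
readings <- data.frame(
  site = c("low", "high"),
  x = c(1, 100),
  y = c(2, 200),
  z = c(3, 300)
)

# Column-wise (the default): each column is scaled on its own values
by_column <-
  readings |>
  gt(rowname_col = "site") |>
  data_color(
    columns = c(x, y, z),
    palette = "viridis",
    direction = "column"
  )

# Row-wise: each row is scaled over its own range of values
by_row <-
  readings |>
  gt(rowname_col = "site") |>
  data_color(
    columns = c(x, y, z),
    palette = "viridis",
    direction = "row"
  )

by_column
by_row
```

In the row-wise table, both rows should show the same low-to-high progression from x to z even though their magnitudes differ.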

One last thing, also to do with rows

The data_color() function now has a rows argument. Previously there was no such option, and you had no choice but to color each and every row in the columns specified. Of course, sometimes you just want colorization in a specific region of the table. Here’s an example that demonstrates this (and we’re using the new metro dataset):

metro |>
  dplyr::select(name, passengers, connect_other) |>
  dplyr::arrange(desc(passengers)) |>
  head(15) |>
  gt(locale = "fr") |>
  tab_header(
    title = "Les stations de métro les plus fréquentées et
    leurs nombre annuel de passagers",
    subtitle = "Ceux qui sont à côté des gares sont surlignés en vert"
  ) |>
  fmt_integer() |>
  tab_row_group(
    label = "a côté d'une gare",
    rows = grepl("TGV", connect_other),
    id = "gare"
  ) |>
  data_color(
    columns = passengers,
    rows = grepl("TGV", connect_other),
    method = "numeric",
    palette = c("lightgreen", "green" |> adjust_luminance(steps = -2))
  ) |>
  cols_hide(columns = connect_other) |>
  cols_label(
    name ~ "station de métro",
    passengers = "passagers"
  ) |>
  cols_width(
    name ~ px(375),
    passengers ~ px(150)
  ) |>
  tab_style(
    style = cell_text(align = "center"),
    locations = cells_row_groups(groups = "gare")
  ) |>
  opt_all_caps() |>
  opt_align_table_header(align = "left") |>
  opt_horizontal_padding(scale = 3) |>
  opt_table_font(stack = "rounded-sans")

Les stations de métro les plus fréquentées et leurs nombre annuel de passagers
Ceux qui sont à côté des gares sont surlignés en vert
station de métro	passagers
a côté d'une gare
Gare du Nord	34 503 097
Saint-Lazare	33 128 384
Gare de Lyon	28 640 475
Montparnasse—Bienvenüe	20 407 224
Gare de l'Est	15 538 471

Bibliothèque François Mitterrand	11 104 474
République	11 079 708
Les Halles	10 623 876
La Défense	9 256 802
Châtelet	8 350 794
Bastille	8 069 243
Belleville	7 314 438
Hôtel de Ville	7 251 729
Place d'Italie	7 119 097
Bobigny—Pablo Picasso	6 561 327

Ce tableau de données là, c’est le fun!

In conclusion

We’ve wanted to improve the data_color() function of gt for a few years now, and we are so glad it is now a thing accomplished in version 0.9.0! The new version of this function is way more powerful than before (and hopefully easier to use too).

This is blog post number three of a series on gt version 0.9.0. There’s more to come, owing to the fact that this release of gt is a big one. We always want your feedback, and there are many different ways to get in touch with us. You can:

file an issue on GitHub
engage in discussions through the gt Discussions page, again on GitHub
follow us on Twitter at @gt_package
join the new gt_package Discord server

Until next time!

Rich Iannone

Software Engineer at Posit, PBC

Richard is a software engineer and table enthusiast. He and R go way back and he's been getting better at writing code in Python too. For the most part, Rich enjoys creating open source packages in R and Python so that people can do great things in their own work.
