Open source packages - Quarto, Shiny, and more Commercial enterprise offerings

Merging columns together and fixing up column labels in {gt} `0.9.0`

Written by Rich Iannone

2023-06-14

MERGING COLUMNS AND FIXING COLUMN NAMES gt 0.9.0

The newest version of gt (0.9.0) has many refinements to what was already there. Sure, new functions were added, but we also took the time to fix up many existing functions and make them better. This (fifth) blog post about gt 0.9.0 is concerned with two major enhancements to existing functionality: (1) merging column data together and (2) modifying column labels to make them more suitable for presentation.

Column merging via `cols_merge()`

The cols_merge() function has been in the package since the first public release of gt. It’s useful! What it does is take input from two or more columns and allows the contents to be merged into a single column, using a pattern that specifies the arrangement.

Let’s try using this function in an example; only then will it all make some sense. With a portion of the metro dataset, we’ll format the numbers in the latitude and longitude columns, combine the data from both those columns into a single column (in this case, collapsing them into latitude column), and then clean up the column labels with cols_label().

library(gt)

Warning: package 'gt' was built under R version 4.4.3

metro_merge <-
  metro |>
  dplyr::filter(grepl("3rd", location)) |>
  dplyr::select(name, caption, latitude, longitude) |>
  dplyr::slice_head(n = 10) |>
  gt() |>
  fmt_number(
    columns = c(latitude, longitude),
    decimals = 4
  ) |>
  cols_merge(
    columns = c(latitude, longitude),
    pattern = "({1}, {2})"
  ) |>
  cols_label(
   name ~ "Station",
   latitude ~ "Coordinates"
  ) |>
  cols_hide(columns = caption)

metro_merge

Station	Coordinates
Arts et Métiers	(48.8655, 2.3561)
Réaumur—Sébastopol	(48.8661, 2.3525)
République	(48.8675, 2.3638)
Temple	(48.8665, 2.3606)
Strasbourg—Saint-Denis	(48.8697, 2.3539)
Chemin Vert	(48.8568, 2.3683)
Filles du Calvaire	(48.8642, 2.3661)
Saint-Sébastien—Froissart	(48.8605, 2.3674)
Rambuteau	(48.8612, 2.3534)

A table that had three columns now has two. What’s happening is that the formatted data from two columns is being combined in the latitude column (why that column? It’s the first column cited in the columns argument of cols_merge()), and the remaining column (longitude) undergoes the useful process of being hidden. The hide_columns argument of cols_merge() lets you modify this default behavior.

When merging column data, the thing gt will do is just put the values together, separated by a space character. Usually, this is not what you really want, so, a pattern specification can be used. In the above example, we used pattern = "({1}, {2})". Breaking this down, the "{1}" is the first column cited in columns, and the "{2}" is the second of the two. The rest of that string is made up of string literals. What we get in the end is the latitude and longitude values separated by a comma and a space, and all of that surrounded by parentheses.

This pattern syntax is great, so let’s look at a bit more of that in a continuation of the above example. Station names in the metro dataset sometimes have a caption value. To combine the caption with the station name in a readable format, we can use cols_merge() with the pattern "{1} ({2})" (where "{1}" corresponds to the name column, and "{2}" is the caption column). This is sensible, but as noted, not every station has an associated caption, so NAs unfortunately make their way into the table output:

metro_merge |>
  cols_unhide(columns = caption) |>
  cols_merge(
    columns = c(name, caption),
    pattern = "{1} ({2})"
  )

Station	Coordinates
Arts et Métiers (NA)	(48.8655, 2.3561)
Réaumur—Sébastopol (NA)	(48.8661, 2.3525)
République (NA)	(48.8675, 2.3638)
Temple (NA)	(48.8665, 2.3606)
Strasbourg—Saint-Denis (NA)	(48.8697, 2.3539)
Chemin Vert (NA)	(48.8568, 2.3683)
Filles du Calvaire (NA)	(48.8642, 2.3661)
Saint-Sébastien—Froissart (NA)	(48.8605, 2.3674)
Rambuteau (Centre Georges Pompidou)	(48.8612, 2.3534)

This is a problem, but gt 0.9.0 now has a solution for it. We can now use << >>⁠ to surround spans of text that will be removed if any of the contained {n} values yields a missing value. So if a particular station doesn’t have a caption value (i.e., it is NA), we will want to remove not only the missing value but also the parentheses we used in our pattern. The pattern should then change from "{1} ({2})" to "{1}<< ({2})>>". Let’s try it out!

metro_merge |>
  cols_unhide(columns = caption) |>
  cols_merge(
    columns = c(name, caption),
    pattern = "{1}<< ({2})>>"
  )

Station	Coordinates
Arts et Métiers	(48.8655, 2.3561)
Réaumur—Sébastopol	(48.8661, 2.3525)
République	(48.8675, 2.3638)
Temple	(48.8665, 2.3606)
Strasbourg—Saint-Denis	(48.8697, 2.3539)
Chemin Vert	(48.8568, 2.3683)
Filles du Calvaire	(48.8642, 2.3661)
Saint-Sébastien—Froissart	(48.8605, 2.3674)
Rambuteau (Centre Georges Pompidou)	(48.8612, 2.3534)

So much better! The cols_merge() function is now a bit more powerful thanks to the enhanced pattern syntax and functionality. Please check out the help article accessible through ?cols_merge for more details (particularly in the How the pattern works section). The syntax can even handle nested uses of << >> for more complex NA-handling situations.

Fixing up column labels with `cols_label_with()`

Column names do not often make for good column labels in tables for publication. Because of this, it’s very common to apply new labels to columns in a gt table with cols_label(). But sometimes you’re so close, you might want to replace underscores with spaces and put the separate words in title case. Using cols_label() to type out all the desired column labels can be such a drag. To make this situation better, gt 0.9.0 introduces the function called cols_label_with(). This new addition to the cols_*() family of functions lets you label columns using a function in its fn argument.

Let’s use sp500 to create a new gt table. We want all of the column labels to be entirely capitalized versions of the default labels, but instead of using cols_label() and rewriting each label manually in capital letters, we will use cols_label_with() and instruct it to apply the toupper() function to all column labels.

sp500 |>
  dplyr::filter(date >= "2015-12-01" & date <= "2015-12-11") |>
  dplyr::select(-c(adj_close, volume)) |>
  gt() |>
  cols_label_with(fn = toupper)

DATE	OPEN	HIGH	LOW	CLOSE
2015-12-11	2047.27	2047.27	2008.80	2012.37
2015-12-10	2047.93	2067.65	2045.67	2052.23
2015-12-09	2061.17	2080.33	2036.53	2047.62
2015-12-08	2073.39	2073.85	2052.32	2063.59
2015-12-07	2090.42	2090.42	2066.78	2077.07
2015-12-04	2051.24	2093.84	2051.24	2091.69
2015-12-03	2080.71	2085.00	2042.35	2049.62
2015-12-02	2101.71	2104.27	2077.11	2079.51
2015-12-01	2082.93	2103.37	2082.93	2102.63

Let’s have a look at another example, this time using a portion of the countrypops dataset. We can, if we prefer, apply multiple types of column label changes in sequence with multiple calls of cols_label_with(). Here, we use the make_clean_names() function from the janitor package and follow up with the removal of a numeral with gsub().

countrypops |>
  dplyr::filter(year == 2021) |>
  dplyr::filter(grepl("^C", country_code_3)) |>
  dplyr::select(-country_code_2, -year) |>
  head(8) |>
  gt() |>
  cols_move_to_start(columns = country_code_3) |>
  fmt_integer(columns = population) |>
  cols_label_with(
    fn = ~ janitor::make_clean_names(., case = "title")
  ) |>
  cols_label_with(
    fn = ~ gsub("[0-9]", "", .)
  )

Country Code	Country Name	Population
CAF	Central African Republic	5,112,100
CAN	Canada	38,239,864
CHE	Switzerland	8,704,546
CHL	Chile	19,456,334
CHN	China	1,412,360,000
CIV	Cote d'Ivoire	29,639,736
CMR	Cameroon	26,915,758
COD	Congo (DRC)	99,148,932

With the sza dataset, we can make a table where the column names are dates. The labels can be generated as month names in any language using cols_label_with() with vec_fmt_datetime() as the function given in fn.

sza |>
  dplyr::filter(latitude == 50) |>
  dplyr::mutate(date = paste0(
    "2000-",
    formatC(
      match(month, tolower(month.abb)),
      width = 2,
      flag = "0"
    ),
    "-01"
  )) |>
  dplyr::select(-c(latitude, month)) |>
  tidyr::pivot_wider(names_from = date, values_from = sza) |>
  gt(rowname_col = "tst") |>
  sub_missing(missing_text = "") |>
  cols_width(everything() ~ px(50)) |>
  data_color(
    palette = c("yellow", "darkblue"),
    domain = c(0, 90),
    na_color = "darkblue" |> adjust_luminance(steps = -1)
  ) |>
  tab_options(
    table_body.hlines.style = "none",
    table_body.border.bottom.color = "black",
    column_labels.border.top.color = "black",
    column_labels.border.bottom.color = "black",
    stub.border.color = "black"
  ) |>
  cols_label_with(fn = function(x) {
    vec_fmt_datetime(x, format = "MMM", locale = "de")
    }
  )

	Jan.	Feb.	März	Apr.	Mai	Juni	Juli	Aug.	Sept.	Okt.	Nov.	Dez.
0400							89.7
0430						86.7	85.7	89.7
0500					87.8	82.4	81.5	85.4
0530					83.2	78.0	77.1	80.8	88.3
0600				86.6	78.5	73.3	72.5	76.2	83.6
0630				81.8	73.7	68.6	67.8	71.4	78.8	87.6
0700			86.3	76.9	68.9	63.8	63.0	66.6	74.0	82.8
0730		89.5	81.8	72.2	64.1	59.0	58.2	61.8	69.2	78.2	87.1
0800		85.3	77.4	67.6	59.4	54.2	53.4	57.0	64.6	73.8	83.0	89.2
0830	86.5	81.5	73.3	63.2	54.7	49.5	48.7	52.4	60.1	69.6	79.0	85.5
0900	83.2	78.0	69.6	59.1	50.3	44.9	44.1	47.9	55.9	65.7	75.5	82.1
0930	80.2	74.9	66.2	55.4	46.2	40.6	39.7	43.7	52.0	62.1	72.3	79.1
1000	77.7	72.2	63.2	52.0	42.5	36.6	35.6	39.9	48.5	59.1	69.5	76.5
1030	75.7	70.0	60.9	49.3	39.4	33.1	32.1	36.6	45.7	56.5	67.3	74.5
1100	74.2	68.5	59.1	47.2	37.0	30.4	29.3	34.1	43.5	54.7	65.7	73.0
1130	73.3	67.5	58.0	45.9	35.5	28.6	27.5	32.5	42.1	53.5	64.7	72.1
1200	73.0	67.2	57.7	45.5	35.0	28.0	26.9	31.9	41.6	53.1	64.4	71.8

Here’s one more gt table example with cols_label_with(), and this time, we just have to use the venerable pizzaplace dataset. There are ways to use one instance of cols_label_with() with multiple functions called on the column labels. We’ll try that out here by providing an anonymous function (with the function(x) { ... } construction) that performs multiple mutations of x (the vector of column labels). In this example code, we’re even going to use the md() helper function to signal to gt that the column label should be interpreted as Markdown text.

pizzaplace |>
  dplyr::mutate(month = substr(date, 6, 7)) |>
  dplyr::group_by(month) |>
  dplyr::summarize(pizze_vendute = dplyr::n()) |>
  dplyr::ungroup() |>
  dplyr::mutate(frazione_della_quota = pizze_vendute / 4000) |>
  dplyr::mutate(date = paste0("2015/", month, "/01")) |>
  dplyr::select(-month) |>
  gt(rowname_col = "date") |>
  fmt_date(date, date_style = "month", locale = "it") |>
  fmt_percent(columns = frazione_della_quota) |>
  fmt_integer(columns = pizze_vendute) |>
  cols_width(everything() ~ px(100)) |>
  cols_label_with(
    fn = function(x) {
      janitor::make_clean_names(x, case = "title") |>
        toupper() |>
        stringr::str_replace_all("^|$", "**") |>
        md()
    }
  )

	PIZZE VENDUTE	FRAZIONE DELLA QUOTA
gennaio	4,232	105.80%
febbraio	3,961	99.02%
marzo	4,261	106.53%
aprile	4,151	103.77%
maggio	4,328	108.20%
giugno	4,107	102.68%
luglio	4,392	109.80%
agosto	4,168	104.20%
settembre	3,890	97.25%
ottobre	3,883	97.08%
novembre	4,266	106.65%
dicembre	3,935	98.38%

Bellissimo! We hope that the cols_label_with() is something you can use to avoid unnecessary repetition in your table-making code! Much more information can be found in the detailed help file for this function (use ?cols_label_with to access).

An improved `cols_label()` function

Aside from the new cols_label_with() function, we also took the time to make some targeted improvements to cols_label(). If you never really used it, what it does is allow for column labels to be modified. The basic model is that you provide the column names (i.e., those from the input data table) on the left and the column labels on the right.

A new thing that was added is the inclusion of a .fn argument. This gives you the opportunity to supply a function that is to be called on each label provided. For example, you can use the md() helper function (which signals to gt that the labels are given as Markdown text) in .fn once instead of calling it on each and every label.

metro |>
  dplyr::select(name, lines, passengers, connect_other) |>
  dplyr::arrange(desc(passengers)) |>
  dplyr::slice_head(n = 10) |>
  gt() |>
  cols_hide(columns = passengers) |>
  sub_missing(missing_text = "") |>
  cols_label(
    name = "Name of<br>Metro Station",
    lines = "Metro<br>Lines",
    connect_other = "Train<br>Services",
    .fn = md
  )

Name of Metro Station	Metro Lines	Train Services
Gare du Nord	4, 5	TGV, TER, Thalys, Eurostar
Saint-Lazare	3, 12, 13, 14	TGV, TER, Intercités
Gare de Lyon	1, 14	TGV, TGV Lyria, Renfe-SNCF, OUIGO, Frecciarossa
Montparnasse—Bienvenüe	4, 6, 12, 13	TGV, TER, Intercités, OUIGO
Gare de l'Est	4, 5, 7	TGV, TER, OUIGO, Nightjet
Bibliothèque François Mitterrand	14
République	3, 5, 8, 9, 11
Les Halles	4
La Défense	1
Châtelet	1, 4, 7, 11, 14

We have a new way of expressing column labels in cols_label(), and that is through the use of two-sided formulas (⁠<LHS> ~ <RHS> instead of ⁠<LHS> = <RHS>). By using tidyselect expressions on the left side of the formulas, we can easily relabel multiple columns with common label text. Let’s create an interesting gt table that exemplifies this (using the towny dataset):

towny |>
  dplyr::select(
    name, ends_with("2001"), ends_with("2006"), matches("2001_2006")
  ) |>
  dplyr::filter(population_2001 > 100000) |>
  dplyr::arrange(desc(pop_change_2001_2006_pct)) |>
  dplyr::slice_head(n = 10) |>
  gt() |>
  fmt_integer() |>
  fmt_percent(columns = matches("change"), decimals = 1) |>
  tab_spanner(label = "Population", columns = starts_with("population")) |>
  tab_spanner(label = "Density", columns = starts_with("density")) |>
  cols_label(
    ends_with("01") ~ "2001",
    ends_with("06") ~ "2006",
    matches("change") ~ md("Population Change,<br>2001 to 2006")
  ) |>
  cols_width(everything() ~ px(120))

name	Population		Density		Population Change, 2001 to 2006
name	2001	2006	2001	2006	Population Change, 2001 to 2006
Brampton	325,428	433,806	1,224	1,632	33.3%
Vaughan	182,022	238,866	668	877	31.2%
Markham	208,615	261,573	989	1,240	25.4%
Barrie	103,710	128,430	1,047	1,297	23.8%
Richmond Hill	132,030	162,704	1,310	1,614	23.2%
Oakville	144,738	165,613	1,042	1,192	14.4%
Mississauga	612,925	668,599	2,094	2,284	9.1%
Cambridge	110,372	120,371	977	1,065	9.1%
Burlington	150,836	164,415	810	883	9.0%
Guelph	106,170	114,943	1,214	1,315	8.3%

Note that we cannot use an = sign in any of the expressions within cols_label() in the above example. Because the LHS is not a single-column name, we must use formula syntax (i.e., with the ~). We are not limited to just using tidyselect expressions on the LHS. We can alternatively supply vectors of column names (and this is great for using repetitive labels across many columns).

Signing off

We want to make things better in gt (always). If you identify points of friction that impede your own gt workflows, let us know! You can file an issue on GitHub. Don’t be shy about sharing ideas or asking questions. We are there for you on the gt Discussions page.

We occasionally share things on Twitter at @gt_package, so you should follow us there. Our gt_package Discord server is active and also lots of fun (drop by and show us your table creations in there!).

Rich Iannone

Software Engineer at Posit, PBC

Richard is a software engineer and table enthusiast. He and R go way back and he's been getting better at writing code in Python too. For the most part, Rich enjoys creating open source packages in R and Python so that people can do great things in their own work.

Merging columns together and fixing up column labels in {gt} `0.9.0`

Column merging via `cols_merge()`

Fixing up column labels with `cols_label_with()`

An improved `cols_label()` function

Signing off

Rich Iannone

Related Content

Deploying boosted tree models with Orbital

Building realistic fake datasets with Pointblank

Serving the Public: See How Government Agencies Use R, Python, Shiny, and Quarto to Drive Research and Data Science Modernization

Merging columns together and fixing up column labels in {gt} `0.9.0`

Column merging via cols_merge()

Fixing up column labels with cols_label_with()

An improved cols_label() function

Signing off

Rich Iannone

Related Content

Deploying boosted tree models with Orbital

Building realistic fake datasets with Pointblank

Serving the Public: See How Government Agencies Use R, Python, Shiny, and Quarto to Drive Research and Data Science Modernization

Column merging via `cols_merge()`

Fixing up column labels with `cols_label_with()`

An improved `cols_label()` function