Merging columns together and fixing up column labels in {gt} `0.9.0`
The newest version of gt (0.9.0) has many refinements to what was already there. Sure, new functions were added, but we also took the time to fix up many existing functions and make them better. This (fifth) blog post about gt 0.9.0 is concerned with two major enhancements to existing functionality: (1) merging column data together and (2) modifying column labels to make them more suitable for presentation.
Column merging via cols_merge()
The cols_merge() function has been in the package since the first public release of gt. It’s useful! What it does is take input from two or more columns and allows the contents to be merged into a single column, using a pattern that specifies the arrangement.
Let’s try using this function in an example; only then will it all make some sense. With a portion of the metro dataset, we’ll format the numbers in the latitude and longitude columns, combine the data from both those columns into a single column (in this case, collapsing them into latitude column), and then clean up the column labels with cols_label().
library(gt)Warning: package 'gt' was built under R version 4.4.3
metro_merge <-
metro |>
dplyr::filter(grepl("3rd", location)) |>
dplyr::select(name, caption, latitude, longitude) |>
dplyr::slice_head(n = 10) |>
gt() |>
fmt_number(
columns = c(latitude, longitude),
decimals = 4
) |>
cols_merge(
columns = c(latitude, longitude),
pattern = "({1}, {2})"
) |>
cols_label(
name ~ "Station",
latitude ~ "Coordinates"
) |>
cols_hide(columns = caption)
metro_merge| Station | Coordinates |
|---|---|
| Arts et Métiers | (48.8655, 2.3561) |
| Réaumur—Sébastopol | (48.8661, 2.3525) |
| République | (48.8675, 2.3638) |
| Temple | (48.8665, 2.3606) |
| Strasbourg—Saint-Denis | (48.8697, 2.3539) |
| Chemin Vert | (48.8568, 2.3683) |
| Filles du Calvaire | (48.8642, 2.3661) |
| Saint-Sébastien—Froissart | (48.8605, 2.3674) |
| Rambuteau | (48.8612, 2.3534) |
A table that had three columns now has two. What’s happening is that the formatted data from two columns is being combined in the latitude column (why that column? It’s the first column cited in the columns argument of cols_merge()), and the remaining column (longitude) undergoes the useful process of being hidden. The hide_columns argument of cols_merge() lets you modify this default behavior.
When merging column data, the thing gt will do is just put the values together, separated by a space character. Usually, this is not what you really want, so, a pattern specification can be used. In the above example, we used pattern = "({1}, {2})". Breaking this down, the "{1}" is the first column cited in columns, and the "{2}" is the second of the two. The rest of that string is made up of string literals. What we get in the end is the latitude and longitude values separated by a comma and a space, and all of that surrounded by parentheses.
This pattern syntax is great, so let’s look at a bit more of that in a continuation of the above example. Station names in the metro dataset sometimes have a caption value. To combine the caption with the station name in a readable format, we can use cols_merge() with the pattern "{1} ({2})" (where "{1}" corresponds to the name column, and "{2}" is the caption column). This is sensible, but as noted, not every station has an associated caption, so NAs unfortunately make their way into the table output:
metro_merge |>
cols_unhide(columns = caption) |>
cols_merge(
columns = c(name, caption),
pattern = "{1} ({2})"
)| Station | Coordinates |
|---|---|
| Arts et Métiers (NA) | (48.8655, 2.3561) |
| Réaumur—Sébastopol (NA) | (48.8661, 2.3525) |
| République (NA) | (48.8675, 2.3638) |
| Temple (NA) | (48.8665, 2.3606) |
| Strasbourg—Saint-Denis (NA) | (48.8697, 2.3539) |
| Chemin Vert (NA) | (48.8568, 2.3683) |
| Filles du Calvaire (NA) | (48.8642, 2.3661) |
| Saint-Sébastien—Froissart (NA) | (48.8605, 2.3674) |
| Rambuteau (Centre Georges Pompidou) | (48.8612, 2.3534) |
This is a problem, but gt 0.9.0 now has a solution for it. We can now use << >> to surround spans of text that will be removed if any of the contained {n} values yields a missing value. So if a particular station doesn’t have a caption value (i.e., it is NA), we will want to remove not only the missing value but also the parentheses we used in our pattern. The pattern should then change from "{1} ({2})" to "{1}<< ({2})>>". Let’s try it out!
metro_merge |>
cols_unhide(columns = caption) |>
cols_merge(
columns = c(name, caption),
pattern = "{1}<< ({2})>>"
)| Station | Coordinates |
|---|---|
| Arts et Métiers | (48.8655, 2.3561) |
| Réaumur—Sébastopol | (48.8661, 2.3525) |
| République | (48.8675, 2.3638) |
| Temple | (48.8665, 2.3606) |
| Strasbourg—Saint-Denis | (48.8697, 2.3539) |
| Chemin Vert | (48.8568, 2.3683) |
| Filles du Calvaire | (48.8642, 2.3661) |
| Saint-Sébastien—Froissart | (48.8605, 2.3674) |
| Rambuteau (Centre Georges Pompidou) | (48.8612, 2.3534) |
So much better! The cols_merge() function is now a bit more powerful thanks to the enhanced pattern syntax and functionality. Please check out the help article accessible through ?cols_merge for more details (particularly in the How the pattern works section). The syntax can even handle nested uses of << >> for more complex NA-handling situations.
Fixing up column labels with cols_label_with()
Column names do not often make for good column labels in tables for publication. Because of this, it’s very common to apply new labels to columns in a gt table with cols_label(). But sometimes you’re so close, you might want to replace underscores with spaces and put the separate words in title case. Using cols_label() to type out all the desired column labels can be such a drag. To make this situation better, gt 0.9.0 introduces the function called cols_label_with(). This new addition to the cols_*() family of functions lets you label columns using a function in its fn argument.
Let’s use sp500 to create a new gt table. We want all of the column labels to be entirely capitalized versions of the default labels, but instead of using cols_label() and rewriting each label manually in capital letters, we will use cols_label_with() and instruct it to apply the toupper() function to all column labels.
sp500 |>
dplyr::filter(date >= "2015-12-01" & date <= "2015-12-11") |>
dplyr::select(-c(adj_close, volume)) |>
gt() |>
cols_label_with(fn = toupper)| DATE | OPEN | HIGH | LOW | CLOSE |
|---|---|---|---|---|
| 2015-12-11 | 2047.27 | 2047.27 | 2008.80 | 2012.37 |
| 2015-12-10 | 2047.93 | 2067.65 | 2045.67 | 2052.23 |
| 2015-12-09 | 2061.17 | 2080.33 | 2036.53 | 2047.62 |
| 2015-12-08 | 2073.39 | 2073.85 | 2052.32 | 2063.59 |
| 2015-12-07 | 2090.42 | 2090.42 | 2066.78 | 2077.07 |
| 2015-12-04 | 2051.24 | 2093.84 | 2051.24 | 2091.69 |
| 2015-12-03 | 2080.71 | 2085.00 | 2042.35 | 2049.62 |
| 2015-12-02 | 2101.71 | 2104.27 | 2077.11 | 2079.51 |
| 2015-12-01 | 2082.93 | 2103.37 | 2082.93 | 2102.63 |
Let’s have a look at another example, this time using a portion of the countrypops dataset. We can, if we prefer, apply multiple types of column label changes in sequence with multiple calls of cols_label_with(). Here, we use the make_clean_names() function from the janitor package and follow up with the removal of a numeral with gsub().
countrypops |>
dplyr::filter(year == 2021) |>
dplyr::filter(grepl("^C", country_code_3)) |>
dplyr::select(-country_code_2, -year) |>
head(8) |>
gt() |>
cols_move_to_start(columns = country_code_3) |>
fmt_integer(columns = population) |>
cols_label_with(
fn = ~ janitor::make_clean_names(., case = "title")
) |>
cols_label_with(
fn = ~ gsub("[0-9]", "", .)
)| Country Code | Country Name | Population |
|---|---|---|
| CAF | Central African Republic | 5,112,100 |
| CAN | Canada | 38,239,864 |
| CHE | Switzerland | 8,704,546 |
| CHL | Chile | 19,456,334 |
| CHN | China | 1,412,360,000 |
| CIV | Cote d'Ivoire | 29,639,736 |
| CMR | Cameroon | 26,915,758 |
| COD | Congo (DRC) | 99,148,932 |
With the sza dataset, we can make a table where the column names are dates. The labels can be generated as month names in any language using cols_label_with() with vec_fmt_datetime() as the function given in fn.
sza |>
dplyr::filter(latitude == 50) |>
dplyr::mutate(date = paste0(
"2000-",
formatC(
match(month, tolower(month.abb)),
width = 2,
flag = "0"
),
"-01"
)) |>
dplyr::select(-c(latitude, month)) |>
tidyr::pivot_wider(names_from = date, values_from = sza) |>
gt(rowname_col = "tst") |>
sub_missing(missing_text = "") |>
cols_width(everything() ~ px(50)) |>
data_color(
palette = c("yellow", "darkblue"),
domain = c(0, 90),
na_color = "darkblue" |> adjust_luminance(steps = -1)
) |>
tab_options(
table_body.hlines.style = "none",
table_body.border.bottom.color = "black",
column_labels.border.top.color = "black",
column_labels.border.bottom.color = "black",
stub.border.color = "black"
) |>
cols_label_with(fn = function(x) {
vec_fmt_datetime(x, format = "MMM", locale = "de")
}
)| Jan. | Feb. | März | Apr. | Mai | Juni | Juli | Aug. | Sept. | Okt. | Nov. | Dez. | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0400 | 89.7 | |||||||||||
| 0430 | 86.7 | 85.7 | 89.7 | |||||||||
| 0500 | 87.8 | 82.4 | 81.5 | 85.4 | ||||||||
| 0530 | 83.2 | 78.0 | 77.1 | 80.8 | 88.3 | |||||||
| 0600 | 86.6 | 78.5 | 73.3 | 72.5 | 76.2 | 83.6 | ||||||
| 0630 | 81.8 | 73.7 | 68.6 | 67.8 | 71.4 | 78.8 | 87.6 | |||||
| 0700 | 86.3 | 76.9 | 68.9 | 63.8 | 63.0 | 66.6 | 74.0 | 82.8 | ||||
| 0730 | 89.5 | 81.8 | 72.2 | 64.1 | 59.0 | 58.2 | 61.8 | 69.2 | 78.2 | 87.1 | ||
| 0800 | 85.3 | 77.4 | 67.6 | 59.4 | 54.2 | 53.4 | 57.0 | 64.6 | 73.8 | 83.0 | 89.2 | |
| 0830 | 86.5 | 81.5 | 73.3 | 63.2 | 54.7 | 49.5 | 48.7 | 52.4 | 60.1 | 69.6 | 79.0 | 85.5 |
| 0900 | 83.2 | 78.0 | 69.6 | 59.1 | 50.3 | 44.9 | 44.1 | 47.9 | 55.9 | 65.7 | 75.5 | 82.1 |
| 0930 | 80.2 | 74.9 | 66.2 | 55.4 | 46.2 | 40.6 | 39.7 | 43.7 | 52.0 | 62.1 | 72.3 | 79.1 |
| 1000 | 77.7 | 72.2 | 63.2 | 52.0 | 42.5 | 36.6 | 35.6 | 39.9 | 48.5 | 59.1 | 69.5 | 76.5 |
| 1030 | 75.7 | 70.0 | 60.9 | 49.3 | 39.4 | 33.1 | 32.1 | 36.6 | 45.7 | 56.5 | 67.3 | 74.5 |
| 1100 | 74.2 | 68.5 | 59.1 | 47.2 | 37.0 | 30.4 | 29.3 | 34.1 | 43.5 | 54.7 | 65.7 | 73.0 |
| 1130 | 73.3 | 67.5 | 58.0 | 45.9 | 35.5 | 28.6 | 27.5 | 32.5 | 42.1 | 53.5 | 64.7 | 72.1 |
| 1200 | 73.0 | 67.2 | 57.7 | 45.5 | 35.0 | 28.0 | 26.9 | 31.9 | 41.6 | 53.1 | 64.4 | 71.8 |
Here’s one more gt table example with cols_label_with(), and this time, we just have to use the venerable pizzaplace dataset. There are ways to use one instance of cols_label_with() with multiple functions called on the column labels. We’ll try that out here by providing an anonymous function (with the function(x) { ... } construction) that performs multiple mutations of x (the vector of column labels). In this example code, we’re even going to use the md() helper function to signal to gt that the column label should be interpreted as Markdown text.
pizzaplace |>
dplyr::mutate(month = substr(date, 6, 7)) |>
dplyr::group_by(month) |>
dplyr::summarize(pizze_vendute = dplyr::n()) |>
dplyr::ungroup() |>
dplyr::mutate(frazione_della_quota = pizze_vendute / 4000) |>
dplyr::mutate(date = paste0("2015/", month, "/01")) |>
dplyr::select(-month) |>
gt(rowname_col = "date") |>
fmt_date(date, date_style = "month", locale = "it") |>
fmt_percent(columns = frazione_della_quota) |>
fmt_integer(columns = pizze_vendute) |>
cols_width(everything() ~ px(100)) |>
cols_label_with(
fn = function(x) {
janitor::make_clean_names(x, case = "title") |>
toupper() |>
stringr::str_replace_all("^|$", "**") |>
md()
}
)| PIZZE VENDUTE | FRAZIONE DELLA QUOTA | |
|---|---|---|
| gennaio | 4,232 | 105.80% |
| febbraio | 3,961 | 99.02% |
| marzo | 4,261 | 106.53% |
| aprile | 4,151 | 103.77% |
| maggio | 4,328 | 108.20% |
| giugno | 4,107 | 102.68% |
| luglio | 4,392 | 109.80% |
| agosto | 4,168 | 104.20% |
| settembre | 3,890 | 97.25% |
| ottobre | 3,883 | 97.08% |
| novembre | 4,266 | 106.65% |
| dicembre | 3,935 | 98.38% |
Bellissimo! We hope that the cols_label_with() is something you can use to avoid unnecessary repetition in your table-making code! Much more information can be found in the detailed help file for this function (use ?cols_label_with to access).
An improved cols_label() function
Aside from the new cols_label_with() function, we also took the time to make some targeted improvements to cols_label(). If you never really used it, what it does is allow for column labels to be modified. The basic model is that you provide the column names (i.e., those from the input data table) on the left and the column labels on the right.
A new thing that was added is the inclusion of a .fn argument. This gives you the opportunity to supply a function that is to be called on each label provided. For example, you can use the md() helper function (which signals to gt that the labels are given as Markdown text) in .fn once instead of calling it on each and every label.
metro |>
dplyr::select(name, lines, passengers, connect_other) |>
dplyr::arrange(desc(passengers)) |>
dplyr::slice_head(n = 10) |>
gt() |>
cols_hide(columns = passengers) |>
sub_missing(missing_text = "") |>
cols_label(
name = "Name of<br>Metro Station",
lines = "Metro<br>Lines",
connect_other = "Train<br>Services",
.fn = md
)| Name of Metro Station |
Metro Lines |
Train Services |
|---|---|---|
| Gare du Nord | 4, 5 | TGV, TER, Thalys, Eurostar |
| Saint-Lazare | 3, 12, 13, 14 | TGV, TER, Intercités |
| Gare de Lyon | 1, 14 | TGV, TGV Lyria, Renfe-SNCF, OUIGO, Frecciarossa |
| Montparnasse—Bienvenüe | 4, 6, 12, 13 | TGV, TER, Intercités, OUIGO |
| Gare de l'Est | 4, 5, 7 | TGV, TER, OUIGO, Nightjet |
| Bibliothèque François Mitterrand | 14 | |
| République | 3, 5, 8, 9, 11 | |
| Les Halles | 4 | |
| La Défense | 1 | |
| Châtelet | 1, 4, 7, 11, 14 |
We have a new way of expressing column labels in cols_label(), and that is through the use of two-sided formulas (<LHS> ~ <RHS> instead of <LHS> = <RHS>). By using tidyselect expressions on the left side of the formulas, we can easily relabel multiple columns with common label text. Let’s create an interesting gt table that exemplifies this (using the towny dataset):
towny |>
dplyr::select(
name, ends_with("2001"), ends_with("2006"), matches("2001_2006")
) |>
dplyr::filter(population_2001 > 100000) |>
dplyr::arrange(desc(pop_change_2001_2006_pct)) |>
dplyr::slice_head(n = 10) |>
gt() |>
fmt_integer() |>
fmt_percent(columns = matches("change"), decimals = 1) |>
tab_spanner(label = "Population", columns = starts_with("population")) |>
tab_spanner(label = "Density", columns = starts_with("density")) |>
cols_label(
ends_with("01") ~ "2001",
ends_with("06") ~ "2006",
matches("change") ~ md("Population Change,<br>2001 to 2006")
) |>
cols_width(everything() ~ px(120))| name |
Population
|
Density
|
Population Change, 2001 to 2006 |
||
|---|---|---|---|---|---|
| 2001 | 2006 | 2001 | 2006 | ||
| Brampton | 325,428 | 433,806 | 1,224 | 1,632 | 33.3% |
| Vaughan | 182,022 | 238,866 | 668 | 877 | 31.2% |
| Markham | 208,615 | 261,573 | 989 | 1,240 | 25.4% |
| Barrie | 103,710 | 128,430 | 1,047 | 1,297 | 23.8% |
| Richmond Hill | 132,030 | 162,704 | 1,310 | 1,614 | 23.2% |
| Oakville | 144,738 | 165,613 | 1,042 | 1,192 | 14.4% |
| Mississauga | 612,925 | 668,599 | 2,094 | 2,284 | 9.1% |
| Cambridge | 110,372 | 120,371 | 977 | 1,065 | 9.1% |
| Burlington | 150,836 | 164,415 | 810 | 883 | 9.0% |
| Guelph | 106,170 | 114,943 | 1,214 | 1,315 | 8.3% |
Note that we cannot use an = sign in any of the expressions within cols_label() in the above example. Because the LHS is not a single-column name, we must use formula syntax (i.e., with the ~). We are not limited to just using tidyselect expressions on the LHS. We can alternatively supply vectors of column names (and this is great for using repetitive labels across many columns).
Signing off
We want to make things better in gt (always). If you identify points of friction that impede your own gt workflows, let us know! You can file an issue on GitHub. Don’t be shy about sharing ideas or asking questions. We are there for you on the gt Discussions page.
We occasionally share things on Twitter at @gt_package, so you should follow us there. Our gt_package Discord server is active and also lots of fun (drop by and show us your table creations in there!).