Commercial enterprise offerings Open source packages - Quarto, Shiny, and more

Get data out of excel and into R with readxl

Hadley Wickham Headshot
Written by Hadley Wickham
2015-04-15

I’m pleased to announced that the first version of readxl is now available on CRAN. Readxl makes it easy to get tabular data out of excel. It:

  • Supports both the legacy .xls format and the modern xml-based .xlsx format. .xls support is made possible the with libxls C library, which abstracts away many of the complexities of the underlying binary format. To parse .xlsx, we use the insanely fast RapidXML C++ library.

  • Has no external dependencies so it’s easy to use on all platforms.

  • Re-encodes non-ASCII characters to UTF-8.

  • Loads datetimes into POSIXct columns. Both Windows (1900) and Mac (1904) date specifications are processed correctly.

  • Blank columns are automatically dropped.

  • Returns output with class c("tbl_df", "tbl", "data.frame") so if you also use dplyr you’ll get an enhanced print method (i.e. you’ll see just the first ten rows, not the first 10,000!).

You can install it by running:

install.packages("readxl")

There’s not really much to say about how to use it:

library(readxl)
# Use a excel file included in the package
sample <- system.file("extdata", "datasets.xlsx", package = "readxl")

# Read by position
head(read_excel(sample, 2))
#>    mpg cyl disp  hp drat    wt  qsec vs am gear carb
#> 1 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
#> 2 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
#> 3 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
#> 4 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
#> 5 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
#> 6 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

# Or by name:
excel_sheets(sample)
#> [1] "iris"     "mtcars"   "chickwts" "quakes"
head(read_excel(sample, "mtcars"))
#>    mpg cyl disp  hp drat    wt  qsec vs am gear carb
#> 1 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
#> 2 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
#> 3 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
#> 4 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
#> 5 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
#> 6 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

You can see the documentation for more info on the col_names, col_types and na arguments.

Readxl is still under active development. If you have problems loading a dataset, please try the development version, and if that doesn’t work, file an issue.

Hadley Wickham Headshot

Hadley Wickham

Chief Scientist, Posit
Hadley is Chief Scientist at Posit PBC, winner of the 2019 COPSS award, and a member of the R Foundation. He builds tools (both computational and cognitive) to make data science easier, faster, and more fun. His work includes packages for data science (like the tidyverse, which includes ggplot2, dplyr, and tidyr)and principled software development (e.g. roxygen2, testthat, and pkgdown). He is also a writer, educator, and speaker promoting the use of R for data science.