Commercial enterprise offerings Open source packages - Quarto, Shiny, and more

rvest 0.3.0

Hadley Wickham Headshot
Written by Hadley Wickham
2015-09-24

Please note that the information presented in this post reflects the package as it stood when initially released, and may now be outdated. For the most up-to-date information, kindly refer to https://rvest.tidyverse.org/.

I’m pleased to announce rvest 0.3.0 is now available on CRAN. Rvest makes it easy to scrape (or harvest) data from html web pages, inspired by libraries like beautiful soup. It is designed to work with pipes so that you can express complex operations by composed simple pieces. Install it with:

install.packages("rvest")

What’s new

The biggest change in this version is that rvest now uses the xml2 package instead of XML. This makes rvest much simpler, eliminates memory leaks, and should improve performance a little.

A number of functions have changed names to improve consistency with other packages: most importantly html() is now read_html(), and html_tag() is now html_name(). The old versions still work, but are deprecated and will be removed in rvest 0.4.0.

html_node() now throws an error if there are no matches, and a warning if there’s more than one match. I think this should make it more likely to fail clearly when the structure of the page changes. If you don’t want this behaviour, use html_nodes().

There were a number of other bug fixes and minor improvements as described in the release notes.

Hadley Wickham Headshot

Hadley Wickham

Chief Scientist, Posit
Hadley is Chief Scientist at Posit PBC, winner of the 2019 COPSS award, and a member of the R Foundation. He builds tools (both computational and cognitive) to make data science easier, faster, and more fun. His work includes packages for data science (like the tidyverse, which includes ggplot2, dplyr, and tidyr)and principled software development (e.g. roxygen2, testthat, and pkgdown). He is also a writer, educator, and speaker promoting the use of R for data science.