The Apache Arrow project enables data scientists using R, Python, and other languages to work with large datasets efficiently and at interactive speeds. For some workflows, Arrow is so fast that it seems to defy reality, or at least the limits of R’s capabilities. This talk examines the characteristics of the Arrow project that enable it to redefine what is possible in R. It also highlights some of the latest developments in the arrow R package, including how to query and manipulate multi-file datasets, and presents strategies for speeding up workflows by up to 100x.
