Announcing pins 1.2.0
I’m delighted to announce that pins for R 1.2.0 is now available on CRAN, and that pins for Python 0.8.1 is available from PyPI. The pins package publishes data, models, and other R or Python objects, making it easy to share them across projects and with your colleagues. You can pin objects to a variety of pin boards, including folders (to share on a networked drive or with services like Dropbox), Posit Connect, Amazon S3, Azure blob storage, and Google Cloud Storage. Pins can be versioned, making it straightforward to track changes, re-run analyses on historical data, and undo mistakes.
You can install pins for R with:
install.packages("pins")
You can install pins for Python with:
python -m pip install pins
This post highlights several important improvements we want to make sure you know about. To see all the changes in pins, including more minor maintenance and bug fixes, check out the release notes for R and for Python.
Read and write pins with Parquet
The pins package supports writing in different file formats, such as .rds or .joblib for binary objects, JSON, and CSV. The R package has had support for Arrow for a long time, but this release adds Parquet as well. I personally have been confused at times about the differences between Parquet and Arrow, so I’ll add here that Arrow is primarily an in-memory format, whereas Parquet is a storage format. With pins, we are all about storage, so it makes sense to use Parquet!
library(pins)
board <- board_folder("parquet-demo", versioned = TRUE)
board |>
  pin_write(
    head(palmerpenguins::penguins),
    "my-favorite-penguins",
    type = "parquet"
  )
Creating new version '20230518T175308Z-ea043'
Writing to pin 'my-favorite-penguins'
The Palmer penguins dataset includes factor, integer, and numeric columns. When we store it using Parquet rather than a plain-text format like CSV, these types are all maintained for us, and the pin can even be read from Python!
from pins import board_folder
board = board_folder("parquet-demo")
board.pin_read("my-favorite-penguins")
  species     island  bill_length_mm  ...  body_mass_g     sex  year
0  Adelie  Torgersen            39.1  ...       3750.0    male  2007
1  Adelie  Torgersen            39.5  ...       3800.0  female  2007
2  Adelie  Torgersen            40.3  ...       3250.0  female  2007
3  Adelie  Torgersen             NaN  ...          NaN     NaN  2007
4  Adelie  Torgersen            36.7  ...       3450.0  female  2007
5  Adelie  Torgersen            39.3  ...       3650.0    male  2007
[6 rows x 8 columns]
Check out our advice (which also applies to Python) about choosing how to store your pins.
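To see why a plain-text format loses type information, here is a minimal sketch that uses only Python's standard library rather than pins itself. The rows and the helper name csv_round_trip are made up for illustration; the point is that a float and an integer written to CSV both come back as strings.

```python
import csv
import io

# Two illustrative rows with mixed types, loosely modeled on the penguins columns.
rows = [
    {"species": "Adelie", "bill_length_mm": 39.1, "year": 2007},
    {"species": "Adelie", "bill_length_mm": 39.5, "year": 2007},
]


def csv_round_trip(records):
    """Write records to CSV text in memory, then read them back."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0]))
    writer.writeheader()
    writer.writerows(records)
    buffer.seek(0)
    return list(csv.DictReader(buffer))


restored = csv_round_trip(rows)

# Compare the type of each value before and after the round trip.
for column in ("bill_length_mm", "year"):
    before = type(rows[0][column]).__name__
    after = type(restored[0][column]).__name__
    print(f"{column}: {before} -> {after}")
```

A reader of the CSV has to guess or be told the column types again, whereas a Parquet file carries its schema along with the data.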
Read from Connect vanity URLs
Many users of our professional publishing platform Posit Connect take advantage of pins to share data and models. One change in the new version of the R package is the addition of board_connect_url() for Connect vanity URLs.
board2 <- board_connect_url(c(
  my_vanity_url_pin = "https://colorado.posit.co/rsc/great-numbers/"
))
board2 |> pin_read("my_vanity_url_pin")
[1] 1 2 3 4 5 6 7 8 9 10
You can use any name you prefer here instead of my_vanity_url_pin. The Connect vanity URL does not need to be public; instead, this new board type uses connect_auth_headers() to pass in your Posit Connect authentication. This new board was made possible by a change to board_url() to add a headers argument, which also allows you to read from pins in a private GitHub repo or on GitHub Enterprise.
The board_url() function in Python doesn’t yet support passing headers directly, so if this is something you would like to see as a Python user, please open an issue!
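In the meantime, here is a rough sketch of what "passing headers" amounts to, written with Python's standard library rather than pins. It assumes your Connect API key lives in the CONNECT_API_KEY environment variable and that the server accepts an Authorization header of the form "Key <api key>". The helper name build_connect_request is hypothetical, and the code only builds the request without sending it.

```python
import os
import urllib.request


def build_connect_request(url, api_key):
    """Return a request for `url` that carries a Connect API key header."""
    # Assumption: the server expects "Authorization: Key <api key>".
    return urllib.request.Request(url, headers={"Authorization": f"Key {api_key}"})


# Assumption: the key is stored in CONNECT_API_KEY; the fallback is a placeholder.
api_key = os.environ.get("CONNECT_API_KEY", "not-a-real-key")
request = build_connect_request(
    "https://colorado.posit.co/rsc/great-numbers/", api_key
)

# Nothing is sent here; urllib.request.urlopen(request) would perform the download.
print(request.full_url)
```

This is not a replacement for a pins board, since it does not read pin metadata or cache anything, but it shows the kind of authenticated request that a headers argument makes possible.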
Avoid writing duplicate pins
We have heard from users that it can be frustrating to write pins, perhaps as part of a reporting or ETL pipeline, that fill up a disk with duplicate versions of the same pin. In this new version of the R package, the pin_write() function gains a new argument, force_identical_write, which defaults to FALSE:
board |>
  pin_write(
    head(palmerpenguins::penguins),
    "my-favorite-penguins",
    type = "parquet"
  )
! The hash of pin "my-favorite-penguins" has not changed.
• Your pin will not be stored.
It didn’t write! The pins package now checks the hash of the pin contents and will not write an additional version when the contents have not changed. The pin metadata is not hashed or checked, so if I want to update the metadata even when the pin contents are unchanged, I now need to do this:
board |>
  pin_write(
    head(palmerpenguins::penguins),
    "my-favorite-penguins",
    type = "parquet",
    force_identical_write = TRUE
  )
Creating new version '20230518T175311Z-ea043'
Writing to pin 'my-favorite-penguins'
This argument can be used anytime you do want to write a new version of a pin, even with identical pin contents. This is a breaking change, with new behavior compared to how pins behaved before, but we have already heard from pins users that this quality-of-life improvement is welcome! In pins for Python, there is not yet an argument to control duplicate writes, so please open an issue if this is important to your work.
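For Python users curious about the idea behind the check, here is a minimal sketch using only the standard library. It is not how pins implements the feature: the helper names content_hash and write_if_changed, the use of SHA-256, and the version naming scheme are all assumptions made for illustration. The parameter name force_identical_write is borrowed from the R argument.

```python
import hashlib
import tempfile
from pathlib import Path


def content_hash(data: bytes) -> str:
    """Hash the raw contents (SHA-256 is an illustrative choice)."""
    return hashlib.sha256(data).hexdigest()


def write_if_changed(pin_dir: Path, data: bytes, force_identical_write: bool = False) -> bool:
    """Store `data` as a new version unless it matches the latest one.

    Returns True if a new version was written, False if it was skipped.
    """
    pin_dir.mkdir(parents=True, exist_ok=True)
    versions = sorted(pin_dir.iterdir())
    new_hash = content_hash(data)
    if versions and not force_identical_write:
        latest = versions[-1]
        if content_hash(latest.read_bytes()) == new_hash:
            return False  # contents unchanged, so skip the write
    # Hypothetical naming scheme: a zero-padded counter plus a short hash.
    name = f"{len(versions):06d}-{new_hash[:5]}"
    (pin_dir / name).write_bytes(data)
    return True


with tempfile.TemporaryDirectory() as tmp:
    pin_dir = Path(tmp) / "my-favorite-penguins"
    first = write_if_changed(pin_dir, b"penguins")
    second = write_if_changed(pin_dir, b"penguins")
    forced = write_if_changed(pin_dir, b"penguins", force_identical_write=True)
    n_versions = len(list(pin_dir.iterdir()))
    print(first, second, forced, n_versions)
```

The second call is skipped because the contents hash to the same value as the latest version, while the forced call stores a second version anyway.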
Acknowledgements
We’d like to thank all the folks who have contributed to the pins R and Python packages since their last releases, whether via filing issues or contributing code or documentation:
R package: @amashadihossein, @hadley, @jennybc, @juliasilge, @MichaelSchatz, @mzorko, @nick-youngblut, @RachaelDempsey, @slodge, @tsharaf, and @wibeasley
Python package: @AnthonyTedde, @edavidaja, @henningsway, @hhp94, @isabelizimm, @juliasilge, @kellobri, @machow, @mxblsdl, @ni2scmn, @robinsones, and @SamEdwardes