Announcing pins for Python

This article was first published on Python on RStudio, and kindly contributed to python-bloggers.

We’re excited to announce the release of pins for Python!

pins removes the hassle of managing data across projects, colleagues, and teams by
providing a central place for people to store, version and retrieve data.
If you’ve ever chased a CSV through a series of email exchanges, or had to decide between
data-final.csv and data-final-final.csv, then pins is for you.

pins stores data on a board, which can be a local folder, an RStudio Connect server, or a
cloud provider like Amazon S3.
Each individual object (such as a dataframe, model, or another pickle-able Python object), together with some metadata, is called a pin.

The Python pins library works with its R counterpart,
so that teams working across R and Python have a unified strategy for sharing data.
This work emerged as part of RStudio’s investment in Python open source, in order to
support bilingual data science teams.

Getting Started

The first step to using pins is installing it from PyPI.

python -m pip install pins

In the examples below, I’ll walk through the basics of pins using a temporary directory
for a board, with board_temp(). This gets deleted after you close Python, so it is
not ideal for collaboration! You can use other boards, like board_rsconnect(), board_folder(), and board_s3(), in more realistic settings.

import pins
from pins.data import mtcars

board = pins.board_temp()

You can “pin” (save) data to a board with the .pin_write() method. It requires three
arguments: an object, a name, and a pin type:

board.pin_write(mtcars.head(), "mtcars", type="csv")
#> Meta(title='mtcars: a pinned 5 x 11 DataFrame', description=None, created='20220601T175057Z', pin_hash='120a54f7e0818041', file='mtcars.csv', file_size=249, type='csv', api_version=1, version=Version(created=datetime.datetime(2022, 6, 1, 17, 50, 57, 80318), hash='120a54f7e0818041'), name='mtcars', user={})
#> 
#> Writing to pin 'mtcars'

Above, we saved the data as a CSV, but depending on
what you’re saving and who else you want to read it, you might use the
type argument to instead save it as a feather, parquet, or joblib file.

You can later retrieve the pinned data with .pin_read():

board.pin_read("mtcars")
#>    mpg cyl  disp  hp drat    wt  qsec vs am gear carb
#> 0 21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
#> 1 21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
#> 2 22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
#> 3 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
#> 4 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2

You can search for data using .pin_search() and .pin_list().

# returns the names of all pins on the board
# board.pin_list()

# searches for pins containing "cars"
board.pin_search("cars")
#>      name type  ... file_size                                               meta
#> 0  mtcars  csv  ...       249  Meta(title='mtcars: a pinned 5 x 11 DataFrame'...
#> 
#> [1 rows x 6 columns]

Two more pieces of important functionality exist:

  • .pin_write() never overwrites existing data; it stores each write as a new version.
  • .pin_read() caches your data, so subsequent reads are much faster.

See the Getting Started guide in the
pins documentation for more information.

Interoperability with R pins

Pins stored with Python can be read with R, and vice-versa.

For example, here is R code that reads the mtcars pin we wrote to the board above.
Note that TEMP_PATH refers to the temporary directory we created in this blog post for our Python board.

library(pins)

board <- board_folder(TEMP_PATH)
board %>% pin_read("mtcars")
#>    mpg cyl disp  hp drat    wt  qsec vs am gear carb
#> 1 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
#> 2 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
#> 3 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
#> 4 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
#> 5 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2

This is especially useful when colleagues prefer one language over the other. For real collaborative work like this, you would use a board like board_rsconnect() or board_s3().

Going further

The real power of pins comes when you share a board with multiple people.
To get started, you can use board_folder() with a directory on a shared
drive or in Dropbox. If you use RStudio Connect, you can use
board_rsconnect():

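# board_rsconnect() needs your Connect server URL and an API key
board = pins.board_rsconnect()
# tidy_sales_data stands for a DataFrame you have already prepared
board.pin_write(tidy_sales_data, "michael/sales-summary", type="csv")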

Then, someone else (or an automated report) can read and use your
pin:

board = pins.board_rsconnect()
board.pin_read("michael/sales-summary")

The pins package also includes boards for sharing data on cloud services
such as Amazon S3 (board_s3()), and we plan to support other backends such as Google Cloud Storage and Azure Blob Storage.

Get in touch

We are so happy about releasing pins for Python, and we want
to make sure it supports your workflow. Join our discussion on
RStudio Community to let us know what you’re working on,
and how pins could help!
