
piggyback


Because larger (> 50 MB) data files cannot easily be committed to git, a different approach is required to manage data associated with an analysis in a GitHub repository. This package provides a simple work-around by allowing larger (up to 2 GB per file) data files to piggyback on a repository as assets attached to individual GitHub releases. These files are not handled by git in any way, but instead are uploaded, downloaded, or edited directly by calls through the GitHub API. These data files can be versioned manually by creating different releases. This approach works equally well with public or private repositories. Data can be uploaded and downloaded programmatically from scripts. No authentication is required to download data from public repositories.

Installation

Install from CRAN with:

install.packages("piggyback")

You can install the development version from GitHub with:

# install.packages("devtools")
devtools::install_github("ropensci/piggyback")

Quickstart

See the piggyback vignette for details on authentication and additional package functionality.

Piggyback can download data attached to a release on any repository:

library(piggyback)
pb_download("data/mtcars.tsv.gz", repo = "cboettig/piggyback-tests", dest = tempdir())

Downloading from private repos or uploading to any repo requires authentication, so be sure to set a GITHUB_TOKEN (or GITHUB_PAT) environment variable, or include the .token argument. Omit the file name to download all attached objects. Omit the repository name to default to the current repository. See the introductory vignette or the function documentation for details.
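A minimal sketch of checking for a token before attempting an upload or a private download. It uses only base R. The helper name find_github_token() is hypothetical and not part of piggyback, and checking GITHUB_TOKEN before GITHUB_PAT is an assumption made here, not a statement about piggyback's own lookup order.

```r
# Hypothetical helper (not part of piggyback): return the first non-empty
# value among the environment variables named in the text above.
find_github_token <- function(vars = c("GITHUB_TOKEN", "GITHUB_PAT")) {
  for (v in vars) {
    value <- Sys.getenv(v, unset = "")
    if (nzchar(value)) {
      return(value)
    }
  }
  ""  # no token set in any of the variables
}

token <- find_github_token()
if (!nzchar(token)) {
  message("No GitHub token found; only downloads from public repositories will work.")
}
```

If a token is found, it can be passed explicitly through the .token argument mentioned above, for example pb_upload("mtcars.tsv.gz", repo = "cboettig/piggyback-tests", .token = token).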

We can also upload data to any existing release (defaults to latest):

## We'll need some example data first.
## Pro tip: compress your tabular data to save space & speed upload/downloads
readr::write_tsv(mtcars, "mtcars.tsv.gz")

pb_upload("mtcars.tsv.gz", repo = "cboettig/piggyback-tests")

Tracking data files

For a Git LFS-style workflow, specify the types of files you wish to track using pb_track(). Piggyback will retain a record of these files in a hidden .pbattributes file in your repository, and add them to .gitignore so you don't accidentally commit them to GitHub. pb_track() also returns a list of such files that you can pass to pb_upload():

# track csv files, compressed data, and geotiff files:
library(magrittr)  # provides the %>% pipe
pb_track(c("*.csv", "*.gz", "*.tif")) %>%
  pb_upload()

You can download the latest version of all data attached to a given release by calling pb_download() with no file argument (analogous to a git pull for data):

pb_download()

Git LFS and other alternatives

piggyback acts like a poor soul's Git LFS. Git LFS is not only expensive, it also breaks GitHub's collaborative model: if someone wants to submit a PR with a simple edit to your docs, they cannot fork your repository, since that would otherwise count against your Git LFS storage. Unlike Git LFS, piggyback doesn't take over your standard git client; it just perches comfortably on the shoulders of your existing GitHub API. Data can be versioned by piggyback, but relative to Git LFS, versioning is less strict: uploads can be set as a new version or allowed to overwrite previously uploaded data.
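The two versioning styles described above might look like the following sketch. The wrapper publish_data() is hypothetical and not part of piggyback. It assumes that pb_new_release() and the tag and overwrite arguments of pb_upload() behave as documented in the package at the time of writing, so check ?pb_upload for the version you have installed.

```r
# Hypothetical wrapper (not part of piggyback) illustrating both styles.
# Running it requires the piggyback package and a GitHub token with
# write access to the target repository.
publish_data <- function(file, repo, tag = NULL) {
  if (is.null(tag)) {
    # Loose versioning: replace the copy attached to the latest release.
    piggyback::pb_upload(file, repo = repo, overwrite = TRUE)
  } else {
    # Stricter versioning: create a new release and attach the file to it.
    piggyback::pb_new_release(repo = repo, tag = tag)
    piggyback::pb_upload(file, repo = repo, tag = tag)
  }
}
```

For a repository you can write to, publish_data("mtcars.tsv.gz", repo, tag = "v0.0.2") would attach the file to a new release (the tag name is only an example), and pb_download("mtcars.tsv.gz", repo = repo, tag = "v0.0.2") would later retrieve that specific version.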

But what will GitHub think of this?

GitHub documentation at the time of writing endorses the use of attachments to releases as a solution for distributing large files as part of your project.

Of course, it will be up to GitHub to decide if this use of release attachments is acceptable in the long term.

Also see our vignette comparing alternatives.


Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.
