How to submit a genomics package with lots of data to CRAN
3.0 years ago
mk ▴ 230

I've got a package that I'd like to submit to CRAN, but the data is huge: about 20 MB of compressed internal data (used for unit tests, not exported) and about 7 MB of compressed exported data (used in vignettes).

I could maybe cut this down a bit by changing the compression, but not by much. The package offers a pipeline with multiple manifold learning, classification, and pathway inference steps, and the tests involve high-dimensional objects (caveat: although these are currently in the 10 MB range, they could be made much smaller and still serve their purpose).

It's my understanding that package data (both in /data and R/sysdata.rda) should not exceed 5 MB in size. What are my options?
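To see where the megabytes actually go, something like the following could help. It is only a minimal sketch: the helper name `data_file_sizes` is made up for illustration, and it assumes the data lives in `.rda`/`.RData` files under `data/` and `R/` relative to the package root.

```r
# Hypothetical helper: list the size (in MB) of every .rda/.RData file
# found directly inside the given directories.
data_file_sizes <- function(dirs = c("data", "R")) {
  files <- unlist(lapply(dirs, list.files,
                         pattern = "\\.(rda|RData)$",
                         full.names = TRUE, ignore.case = TRUE))
  data.frame(file = files,
             size_mb = file.size(files) / 1024^2,
             stringsAsFactors = FALSE)
}

# Example use from the package root:
# sizes <- data_file_sizes()
# sum(sizes$size_mb)   # total to compare against the ~5 MB guideline
```

Base R also ships `tools::checkRdaFiles()` to report the compression used by each file and `tools::resaveRdaFiles()` to re-save files with a different compression (for example `compress = "xz"`), which is probably the quickest way to find out how much a compression change would save.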

3.0 years ago

You should create a separate data package alongside your main package.
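If the data package is only listed in `Suggests`, the main package has to cope with it being absent. A rough sketch of how that guard might look is below; the package name `mypkgdata` and the helper `load_example_data` are placeholders for illustration, not anything from an existing package.

```r
# Hypothetical name of the separate data package.
data_pkg <- "mypkgdata"

# Load one dataset from the data package if it is installed;
# otherwise emit a message and return NULL so callers can skip gracefully.
load_example_data <- function(name, pkg = data_pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    message("Package '", pkg, "' is not installed; skipping.")
    return(invisible(NULL))
  }
  env <- new.env()
  utils::data(list = name, package = pkg, envir = env)
  get(name, envir = env)
}

# Example use in a vignette or example:
# big <- load_example_data("big_matrix")   # "big_matrix" is a placeholder
# if (!is.null(big)) { ... run the expensive demo ... }
```

In testthat-based unit tests, `testthat::skip_if_not_installed("mypkgdata")` at the top of a test serves the same purpose.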

Thanks @Satosh Anand. It looks like Dave Kleinschmidt has permanently hosted the data package in that example on GitHub, since it can't be hosted on CRAN.

I've done a bit of digging and found the following in the CRAN Repository Policy:

Packages on which a CRAN package depends should be available from a mainstream repository: if any mentioned in ‘Suggests’ or ‘Enhances’ fields are not from such a repository, where to obtain them at a repository should be specified in an ‘Additional_repositories’ field of the DESCRIPTION file (as a comma-separated list of repository URLs) or for other means of access, described in the ‘Description’ field. A package listed in ‘Suggests’ or ‘Enhances’ should be used conditionally in examples or tests if it cannot straightforwardly be installed on the major R platforms

According to this post on SO, it seems that GitHub would qualify as a "mainstream repository" as required by the Policy.
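For what it's worth, here is roughly what the relevant DESCRIPTION fields in the main package might look like under the policy text quoted above. All names and the URL are placeholders. One assumption to flag: as far as I understand, `Additional_repositories` expects a CRAN-style repository (one that `install.packages(repos = ...)` can read), not a plain GitHub source repo, so the example URL stands in for something like a drat repository served from GitHub Pages.

```
Package: mypkg
Suggests:
    mypkgdata
Additional_repositories: https://example.github.io/drat
```

With this setup the data package stays optional, and examples, vignettes, and tests should only touch it when it is actually installed.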


