As part of an accepted paper, I would like to share both the code and the data files so that readers can reproduce my results. The code is easy to share via a public repository (e.g. GitHub). However, I am unsure how to share the data. This is a computational project: we take published genomic sequence data, process it (mapping, filtering, etc.), and analyze it. I would like to provide users with the processed data files as they stand before the downstream analysis (the downstream analysis itself can be run using the code I will provide). So my question is:
Where can/should I deposit the processed data files for sharing?
Many public scientific data repositories state that they only accept new data that has not already been published. However, the datasets I use have already been published (at least as raw data, or processed with a different pipeline).
Just to be clear, the motivation here is to spare users from having to reprocess all of the raw data, which could require significant effort and computational resources.