Question: Best Practices For Publication Of mRNA-Seq Data
Dave Bridges wrote:

Unless I am mistaken, there does not appear to be a stable repository for archiving RNAseq data, akin to how GEO and ArrayExpress archive microarray data. We are preparing to publish some mRNA-Seq results and I would like some input as to the best ways to both present and archive the raw and processed data. Currently the thinking (aside from GSEA in the paper) is to include the Cufflinks processed data as a supplementary table. The main questions I have are:

  • How to make the raw short-reads available?
  • Is it worthwhile to make the aligned (bam) files available and if so, how?
  • Other than software, software versions, non-default parameters and hardware information, what technical data should be provided in the manuscript?
  • If I am going to archive the short reads or alignment files, what is the best way to attach the relevant metadata about the samples?
Mikael Huss wrote:

I have submitted RNA-seq data to both GEO and ArrayExpress. They have pretty detailed standards for how to submit, including how to set up the metadata, so I think you should start there.
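Before filling in a repository's submission template, it can help to keep the sample metadata in one tab-delimited sheet that lists each sample's attributes next to its raw read file and an MD5 checksum of that file. This is a minimal sketch; the column names (`sample_id`, `condition`, `replicate`, `fastq_file`, `md5`) and the input dict keys are hypothetical, and GEO and ArrayExpress each define their own required fields.

```python
import csv
import hashlib
from pathlib import Path


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks so large FASTQs fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_sample_sheet(samples, out_path):
    """Write one tab-delimited row per sample: attributes, FASTQ file name, checksum.

    `samples` is a list of dicts with the hypothetical keys
    'sample_id', 'condition', 'replicate' and 'fastq' (a local file path).
    """
    fields = ["sample_id", "condition", "replicate", "fastq_file", "md5"]
    with open(out_path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fields, delimiter="\t")
        writer.writeheader()
        for s in samples:
            fastq = Path(s["fastq"])
            writer.writerow({
                "sample_id": s["sample_id"],
                "condition": s["condition"],
                "replicate": s["replicate"],
                "fastq_file": fastq.name,  # record only the file name, not the local path
                "md5": md5sum(fastq),
            })
```

The checksums let you (and the repository) confirm that the uploaded files are identical to the ones you analyzed; the attribute columns can then be copied into whichever template the archive requires.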

Whether to provide BAM files or not is a matter of taste, I think. Obviously whether people will use them depends on how much they trust the aligner you've used. I have personally used BAM files from GEO because I didn't think it was worth the effort to remap them.

I think the same goes for providing Cufflinks results. They are certainly useful in relation to your article, since people will need them if they want to reproduce the results in your paper. If they want to re-analyze your data, though, they will probably run their own tools on it.

GEO and ArrayExpress will, as far as I recall, allow you to upload BAM files and processed data tables (like Cufflinks output) together with the FASTQ files.
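If each sample was quantified in a separate Cufflinks run, one gene-by-sample table is often easier for readers to reuse than a set of per-sample files. A minimal sketch, assuming each sample's `genes.fpkm_tracking` file is tab-delimited with `tracking_id` and `FPKM` columns (check the header your Cufflinks version actually writes); the function name is hypothetical.

```python
import csv


def merge_fpkm_tables(sample_files, out_path):
    """Combine per-sample Cufflinks gene tables into one gene-by-sample FPKM table.

    `sample_files` maps a sample name to the path of that sample's
    genes.fpkm_tracking file. Assumes 'tracking_id' and 'FPKM' columns.
    """
    fpkm = {}  # gene id -> {sample name: FPKM value as written in the file}
    for sample, path in sample_files.items():
        with open(path, newline="") as handle:
            for row in csv.DictReader(handle, delimiter="\t"):
                # If an id appears twice in one file, the last row wins.
                fpkm.setdefault(row["tracking_id"], {})[sample] = row["FPKM"]

    samples = list(sample_files)
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(["gene_id"] + samples)
        for gene in sorted(fpkm):
            # Genes absent from a sample's file are written as "NA", not 0.
            writer.writerow([gene] + [fpkm[gene].get(s, "NA") for s in samples])
```

Values are copied through as text, so the merged table reports exactly what Cufflinks wrote for each sample.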

So in short, to your questions:

  • Upload to ArrayExpress or GEO
  • It might be; if so, upload those to the same place as well
  • If you are doing differential expression analysis, explain in detail how that was done. I can't think of anything else right now.
  • See instructions from ArrayExpress or GEO
All true, except that "Alignment files (e.g. BAM, SAM) should not be supplied as processed data files."

I think they feel it's too much.

(Reply by Ido Tamir, written 6.4 years ago)

Any sequence files submitted to GEO end up in SRA with a link.

(Reply by Sean Davis, written 5.7 years ago)

Ah, I missed that, thanks. Still, I'm positive I have used alignment files from GEO before (they may have been Eland output rather than BAM/SAM), so maybe they used to allow it but now feel it's too much (which is understandable).

(Reply by Mikael Huss, written 6.4 years ago)

I would also add the quantified data (isoform and gene level); it makes it a lot easier for the rest of us to quickly test hypotheses on your data.

(Reply by kristoffer.vittingseerup, written 17 months ago)
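When quantification was done at the isoform level, gene-level numbers can be derived by summing each gene's isoform estimates, so both levels can be shared. A minimal sketch, assuming estimated read counts for one sample and a transcript-to-gene mapping (which would normally come from the annotation, e.g. the `transcript_id`/`gene_id` attributes of the GTF used for quantification); the function name and example ids are hypothetical.

```python
from collections import defaultdict


def isoform_to_gene_counts(isoform_counts, tx2gene):
    """Sum per-isoform estimated counts into per-gene totals for one sample.

    isoform_counts: dict of transcript id -> estimated count.
    tx2gene: dict of transcript id -> gene id.
    Transcripts missing from tx2gene are skipped and returned separately.
    """
    gene_counts = defaultdict(float)
    unmapped = []
    for tx, count in isoform_counts.items():
        gene = tx2gene.get(tx)
        if gene is None:
            unmapped.append(tx)
            continue
        gene_counts[gene] += count
    return dict(gene_counts), unmapped


if __name__ == "__main__":
    counts = {"tx1": 10.0, "tx2": 5.5, "tx3": 2.0, "tx9": 1.0}
    mapping = {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"}
    genes, missing = isoform_to_gene_counts(counts, mapping)
    print(genes)    # {'geneA': 15.5, 'geneB': 2.0}
    print(missing)  # ['tx9']
```

Returning the unmapped transcripts, rather than silently dropping them, makes it easy to check that the annotation used for the gene table matches the one used for quantification.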
