Best Practices For Publication Of mRNA-Seq Data
12.2 years ago
Dave Bridges ★ 1.4k

Unless I am mistaken, there does not appear to be a stable repository for archiving RNA-seq data, akin to how GEO and ArrayExpress archive microarray data. We are preparing to publish some mRNA-seq results, and I would like some input on the best ways to both present and archive the raw and processed data. Currently the thinking (aside from GSEA in the paper) is to include the Cufflinks-processed data as a supplementary table. The main questions I have are:

  • How to make the raw short-reads available?
  • Is it worthwhile to make the aligned (BAM) files available and, if so, how?
  • Other than software, software versions, non-default parameters and hardware information, what technical data should be provided in the manuscript?
  • If I am going to archive the short reads or alignment files, what is the best way to attach the relevant metadata about the samples?
publication rna-seq next-gen • 5.0k views
12.2 years ago

I have submitted RNA-seq data to GEO (http://www.ncbi.nlm.nih.gov/geo/info/seq.html) and ArrayExpress (http://www.ebi.ac.uk/microarray/doc/help/UHTS_submissions.html). They have pretty detailed standards for how to submit, including how to set up the metadata, so I think you should start there.
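
To give a feel for what "setting up the metadata" amounts to, here is a minimal sketch that writes one row per sample to a tab-separated sheet and refuses to write a row with an empty field. The column names and the two sample rows are made up for illustration; they are not the GEO or ArrayExpress template, so swap in whatever fields the archive's own instructions ask for.

```python
import csv
import io

# Hypothetical column names; replace with the fields the archive's template requires.
FIELDS = ["sample_name", "organism", "tissue", "treatment", "raw_file", "processed_file"]


def write_sample_sheet(samples, handle):
    """Write one tab-separated row per sample, failing loudly on missing values."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for sample in samples:
        missing = [field for field in FIELDS if not sample.get(field)]
        if missing:
            raise ValueError(f"{sample.get('sample_name', '?')}: missing {missing}")
        writer.writerow(sample)


# Placeholder samples, invented purely to show the shape of the table.
samples = [
    {"sample_name": "ctrl_1", "organism": "Mus musculus", "tissue": "liver",
     "treatment": "control", "raw_file": "ctrl_1.fastq.gz", "processed_file": "ctrl_1.txt"},
    {"sample_name": "trt_1", "organism": "Mus musculus", "tissue": "liver",
     "treatment": "treated", "raw_file": "trt_1.fastq.gz", "processed_file": "trt_1.txt"},
]

buffer = io.StringIO()
write_sample_sheet(samples, buffer)
print(buffer.getvalue())
```

Keeping the sheet in a plain table like this makes it easy to paste into whichever submission form you end up using.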

Whether to provide BAM files or not is a matter of taste, I think. Obviously whether people will use them depends on how much they trust the aligner you've used. I have personally used BAM files from GEO because I didn't think it was worth the effort to remap them.

I think the same goes for providing Cufflinks results. Of course it is helpful in relation to your article, because people will find it necessary if they want to reproduce the results in your paper. If they want to re-analyze your data, they will probably run their own tools on them.

GEO and ArrayExpress will, as far as I recall, allow you to upload BAM files and processed data tables (like Cufflinks output) together with the FASTQ files.
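
Whichever files you end up uploading, it can help to keep a manifest with sizes and MD5 checksums so you can confirm the transfer was not corrupted. Below is a rough sketch using only the Python standard library; the file patterns and the tiny stand-in file in the demo are assumptions for illustration, not anything the archives prescribe.

```python
import hashlib
import tempfile
from pathlib import Path


def md5sum(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, reading it in chunks to spare memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(directory, patterns=("*.fastq.gz", "*.txt")):
    """List (file name, size in bytes, MD5) for every file matching the patterns."""
    rows = []
    for pattern in patterns:
        for path in sorted(Path(directory).glob(pattern)):
            rows.append((path.name, path.stat().st_size, md5sum(path)))
    return rows


# Demo on a throwaway directory; the file is a stand-in, not real sequence data.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "sample1.fastq.gz").write_bytes(b"hello")
    manifest = build_manifest(tmp)

for name, size, checksum in manifest:
    print(name, size, checksum, sep="\t")
```

Point `build_manifest` at your real upload directory and adjust `patterns` to the file types you are actually submitting.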

So in short, to your questions:

  • Upload to ArrayExpress or GEO
  • It might be; if so, upload those to the same place
  • If you are doing differential expression analysis, explain in detail how that was done. I can't think of anything else right now.
  • See instructions from ArrayExpress or GEO
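
For the third point, one low-effort habit is to write the tools, versions and non-default parameters to a small file at analysis time, so the methods section can be filled in from it later. This is only a sketch: the tool entries, the `X.Y.Z` versions and the `--example-flag` parameter are placeholders you would fill in by hand, and the machine details are just what Python's standard library reports.

```python
import json
import os
import platform


def analysis_record(tools):
    """Bundle tool versions and parameters with basic facts about the machine."""
    return {
        "tools": tools,
        "machine": {
            "platform": platform.platform(),
            "architecture": platform.machine(),
            "cpu_count": os.cpu_count(),
        },
    }


# Placeholder entries; fill in the real names, versions and parameters you used.
tools = [
    {"name": "aligner", "version": "X.Y.Z",
     "non_default_parameters": {"--example-flag": "value"}},
    {"name": "cufflinks", "version": "X.Y.Z", "non_default_parameters": {}},
]

record = analysis_record(tools)
print(json.dumps(record, indent=2))
```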

All true, with one caveat from the GEO instructions: "Alignment files (e.g. BAM, SAM) should not be supplied as processed data files." (http://www.ncbi.nlm.nih.gov/geo/info/seq.html)

I think they consider it too much data.


Any sequence files submitted to GEO end up in SRA with a link.


Ah, I missed that. Thanks. Still, I'm positive I have used alignment files from GEO before (may have been Eland output rather than BAM/SAM) so maybe they used to allow it but feel it's too much now (which is understandable).


I would also add quantified data (isoforms/genes); it makes it a lot easier for the rest of us to quickly test hypotheses on your data.
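
If you only have an isoform-level table, a gene-level one can be derived by summing per gene. Here is a minimal sketch, assuming a tab-separated table with `gene_id` and `FPKM` columns (the column names are my assumption about the layout; check them against your actual output). The three-row example table is invented.

```python
import csv
import io
from collections import defaultdict


def gene_level_fpkm(handle, gene_column="gene_id", value_column="FPKM"):
    """Sum isoform-level values per gene from a tab-separated table."""
    totals = defaultdict(float)
    for row in csv.DictReader(handle, delimiter="\t"):
        totals[row[gene_column]] += float(row[value_column])
    return dict(totals)


# Invented example rows, only to show the expected input shape.
example = io.StringIO(
    "tracking_id\tgene_id\tFPKM\n"
    "tx1\tgeneA\t1.5\n"
    "tx2\tgeneA\t2.5\n"
    "tx3\tgeneB\t4.0\n"
)
print(gene_level_fpkm(example))  # {'geneA': 4.0, 'geneB': 4.0}
```

Publishing both the isoform table and the gene table saves readers from having to redo this step themselves.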
