7.1 years ago by
Washington University, St Louis, USA
Ironically, the answer to your question might be GEO. Other than SRA, it's the largest collection of RNA-seq data that I know of. I'm not sure what platform you are looking for or what species your cell line is from. But let's assume you want Illumina RNA-seq data for human lines. You might start by searching GEO platforms for "Illumina homo sapiens". This identifies 8 platforms, three of which have substantial numbers of samples submitted to GEO:
- GPL9115: Illumina Genome Analyzer II (Homo sapiens) = 3466 samples
- GPL10999: Illumina Genome Analyzer IIx (Homo sapiens) = 2274 samples
- GPL11154: Illumina HiSeq 2000 (Homo sapiens) = 1695 samples
You can then search for one of these platforms plus the name of your cell line of interest and hope you get lucky. An example query might look like:
http://www.ncbi.nlm.nih.gov/gds?term=(GPL9115[GEO Accession]) AND MCF7
Another option is to search for records where the Platform Technology Type = "high-throughput sequencing":
http://www.ncbi.nlm.nih.gov/gds?term=(high-throughput sequencing[Platform Technology Type]) AND MCF7
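If you want to try several cell lines or platforms, the two query patterns above can be generated rather than typed by hand. This is only a rough sketch: the function name `gds_query_url` and its parameters are made up for illustration, and it just percent-encodes the same search term shown in the example URLs.

```python
from urllib.parse import quote

# Base URL taken from the example queries above.
GDS_SEARCH = "http://www.ncbi.nlm.nih.gov/gds"


def gds_query_url(field_value, field_name, cell_line):
    """Build a GEO DataSets search URL of the form (value[field]) AND cell_line.

    Hypothetical helper: it only assembles and percent-encodes the term,
    it does not contact NCBI.
    """
    term = f"({field_value}[{field_name}]) AND {cell_line}"
    return f"{GDS_SEARCH}?term={quote(term)}"


# Search by a specific platform accession.
print(gds_query_url("GPL9115", "GEO Accession", "MCF7"))

# Search by platform technology type instead.
print(gds_query_url("high-throughput sequencing", "Platform Technology Type", "MCF7"))
```

Pasting either printed URL into a browser should give the same result as the hand-written versions, since the browser decodes the escaped brackets and spaces.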
NOTE: GEO still seems to define "platforms" in the next-gen sequencing space quite crudely, by the sequencer alone and not by the type of sequencing done. A GEO platform of GPL96 (Affymetrix U133A) would definitely indicate an RNA expression dataset with clearly defined parameters. But the platform GPL9115 might (and does) indicate any of RNA-seq, ChIP-seq, miRNA sequencing, ChIA-PET, DamIP-seq, bisulfite sequencing, etc., to say nothing of differences in read length, paired vs single-end, polyA selection method, etc. So read each record carefully before proceeding with any dataset.
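One way to do a first pass on that check is to look at the library strategy recorded for each sample. The sketch below assumes you have downloaded a series in SOFT format and that its sample records carry a `!Sample_library_strategy` line; the accessions and titles in `EXAMPLE_SOFT` are invented placeholders, not real GEO records, and the function name is hypothetical.

```python
def rnaseq_samples(soft_text):
    """Return accessions of samples whose library strategy is RNA-Seq.

    Assumes SOFT-style text where each sample starts with a '^SAMPLE = ...'
    line and may carry a '!Sample_library_strategy = ...' line.
    """
    hits = []
    current = None
    for line in soft_text.splitlines():
        line = line.strip()
        if line.startswith("^SAMPLE"):
            # Remember which sample the following attribute lines belong to.
            current = line.split("=", 1)[1].strip()
        elif line.startswith("!Sample_library_strategy") and current:
            strategy = line.split("=", 1)[1].strip()
            if strategy.lower() == "rna-seq":
                hits.append(current)
    return hits


# Invented placeholder records, for illustration only.
EXAMPLE_SOFT = """\
^SAMPLE = GSM0000001
!Sample_title = hypothetical MCF7 polyA RNA
!Sample_library_strategy = RNA-Seq
^SAMPLE = GSM0000002
!Sample_title = hypothetical MCF7 ChIP
!Sample_library_strategy = ChIP-Seq
"""

print(rnaseq_samples(EXAMPLE_SOFT))
```

This only narrows things down by assay type. Read length, paired vs single-end, and the selection method still have to be checked by reading the sample descriptions.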
Finally, if you know for a fact that your special cell line has been RNA-seq'd but can't find it in SRA or GEO you may have to contact the authors (if the study has been published). Many NGS studies are still not being made available. But, they should be...