Question: Rna-Seq Data In Public Database
6.6 years ago, camelbbs (China) wrote:

Is there a public database that stores RNA-seq data, the way GEO stores microarray data?

I know of SRA, but what it offers is not enough. Are there any better alternatives? I want to search for particular cell lines that have been sequenced by RNA-seq. Where can I find them? Thanks!

rna-seq • 19k views
modified 5.8 years ago by alaincoletta • written 6.6 years ago by camelbbs

InSilico DB has just released a beta integration with Ingenuity iReport. You can export public RNA-seq data from GEO/SRA and get a free iReport preview.

written 5.0 years ago by alaincoletta

6.6 years ago, Obi Griffith (Washington University, St Louis, USA) wrote:

Ironically, the answer to your question might be GEO. Other than SRA, it's the largest collection of RNA-seq data that I know of. I'm not sure what platform you are looking for or what species your cell line is from. But let's assume you want Illumina RNA-seq data for human lines. You might start by searching GEO platforms for "Illumina homo sapiens". This identifies 8 platforms, three of which have substantial numbers of samples submitted to GEO:

  • GPL9115: Illumina Genome Analyzer II (Homo sapiens) = 3466 samples
  • GPL10999: Illumina Genome Analyzer IIx (Homo sapiens) = 2274 samples
  • GPL11154: Illumina HiSeq 2000 (Homo sapiens) = 1695 samples

You can then search for one of these platforms plus the name of your cell line of interest and hope you get lucky. An example query might look like: http://www.ncbi.nlm.nih.gov/gds?term=(GPL9115[GEO Accession]) AND MCF7

Another option is to search for records where the Platform Technology Type = "high-throughput sequencing": http://www.ncbi.nlm.nih.gov/gds?term=(high-throughput sequencing[Platform Technology Type]) AND MCF7
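If you want to try several cell lines or platforms, the two query patterns above can be generated rather than typed by hand. Here is a minimal Python sketch; the helper names (build_gds_term, build_gds_url) are made up for illustration, and it only builds the URL, it does not contact NCBI.

```python
from urllib.parse import urlencode

# Base URL of the GEO DataSets search page, as used in the example queries above.
GDS_SEARCH_URL = "http://www.ncbi.nlm.nih.gov/gds"


def build_gds_term(cell_line, platform=None):
    """Return a GEO DataSets query string (hypothetical helper).

    With a platform accession (e.g. "GPL9115") the search is restricted to
    that platform; otherwise it is restricted to the
    "high-throughput sequencing" platform technology type.
    """
    if platform:
        restriction = f"({platform}[GEO Accession])"
    else:
        restriction = "(high-throughput sequencing[Platform Technology Type])"
    return f"{restriction} AND {cell_line}"


def build_gds_url(cell_line, platform=None):
    """Return the URL-encoded search URL for the query term."""
    query = urlencode({"term": build_gds_term(cell_line, platform)})
    return f"{GDS_SEARCH_URL}?{query}"


if __name__ == "__main__":
    for line in ("MCF7", "HeLa"):
        print(build_gds_url(line, platform="GPL9115"))
        print(build_gds_url(line))
```

The hits still need to be checked by hand, since neither restriction tells you that a record is actually RNA-seq.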

NOTE: GEO still seems to define "platforms" in the next-gen sequencing space quite crudely, simply by the sequencer and not by the type of sequencing done. A GEO platform of GPL96 (Affymetrix U133A) would definitely indicate an RNA expression dataset with clearly defined parameters. But the platform GPL9115 might (and does) indicate any of RNA-seq, ChIP-seq, miRNA sequencing, ChIA-PET, DamIP-seq, bisulfite sequencing, etc., to say nothing of differences in read length, paired vs single-end, polyA selection method, etc. So read carefully before proceeding with any dataset.

Finally, if you know for a fact that your special cell line has been RNA-seq'd but can't find it in SRA or GEO you may have to contact the authors (if the study has been published). Many NGS studies are still not being made available. But, they should be...


Thanks a lot!

written 6.6 years ago by camelbbs

BTW, another thing: does GEO include all the SRA info (except the data itself)? When I checked SRA, I found that it offers a GEO query.

written 6.6 years ago by camelbbs

I'm not sure. But, I suspect you will find a whole variety of situations where sometimes data is in SRA and linked from GEO or vice versa and other times data has just been submitted to one or the other (or neither).

written 6.6 years ago by Obi Griffith

6.6 years ago, Markus Krupp wrote:

...just a little addition to Obi's reply.

You can use the GEO advanced search interface: http://www.ncbi.nlm.nih.gov/gds/advanced/ ...here you can choose between several fields and also an option to list the corresponding indices.

E.g. choosing the field "Platform Technology Type", clicking "show index list", and selecting the "high throughput sequencing" index gives 26597 entries.

...combine those options within the advanced search interface and you will end up with a good repertoire of RNA-seq data.
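The same kind of field combination can also be sent to NCBI's E-utilities (esearch with db=gds) instead of the web form. A rough Python sketch follows; the function names are invented for illustration, and the assumption is that esearch accepts the same bracketed field names that the advanced search page generates. count_hits needs network access and is not called here.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities search endpoint; db=gds selects GEO DataSets.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def combine_fields(restrictions, free_text=None):
    """AND together (value, field) pairs, plus an optional free-text word.

    Example pair: ("Homo sapiens", "Organism").
    """
    parts = [f"({value}[{field}])" for value, field in restrictions]
    if free_text:
        parts.append(free_text)
    return " AND ".join(parts)


def esearch_url(term, retmax=20):
    """Build the esearch request URL for a GEO DataSets query."""
    params = {"db": "gds", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{ESEARCH_URL}?{urlencode(params)}"


def count_hits(term):
    """Ask NCBI how many records match (requires network access)."""
    with urlopen(esearch_url(term, retmax=0)) as response:
        return int(json.load(response)["esearchresult"]["count"])


if __name__ == "__main__":
    term = combine_fields(
        [("high-throughput sequencing", "Platform Technology Type"),
         ("Homo sapiens", "Organism")],
        free_text="MCF7",
    )
    print(term)
    print(esearch_url(term))
```

As with the web search, the count includes every kind of sequencing experiment, so the records still have to be read individually.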


Thanks, that's helpful!

written 6.6 years ago by camelbbs

6.6 years ago, Istvan Albert (University Park, USA) wrote:

Running a large-scale data distribution service is an expensive operation. Since paying for a data download is not something people would do, it shouldn't come as a surprise that there aren't that many options to choose from.

Besides SRA, the only large-scale data source that comes to mind is the ENCODE data downloads.


Thanks. Maybe SRA is the one, then.

written 6.6 years ago by camelbbs

5.8 years ago, alaincoletta (Belgium) wrote:

Check InSilico DB (https://insilicodb.org): 100,000s of manually curated profiles, pre-processed, ready to analyse, and freely available. RNA-Seq data is pre-processed with TopHat, Cufflinks, Cuffdiff, and cummeRbund. See https://insilicodb.org/differential-gene-expression-heatmap-from-rnaseq-data-using-cummerbund/ for a step-by-step example. The data comes from GEO and SRA, but it's been curated and pre-processed.

Highly accessed Genome Biology paper: http://genomebiology.com/2012/13/11/R104

modified 5.8 years ago • written 5.8 years ago by alaincoletta