Data Set Suitable For Comparing WGS, Exome, And RNA-Seq Data Generated From The Same Samples
10.1 years ago

Is anyone aware of a publication or pre-publication data set that includes the following for a single human tumor/normal sample pair:

  • Illumina whole genome sequence (WGS) data to at least 30-50X depth for both tumor and normal (e.g. blood). Preferably this would be HiSeq paired-end data generated with v3 chemistry.
  • Illumina exome sequence data for the same tumor/normal pair, but to considerably higher depth (say at least 150-200X).
  • Illumina RNA-seq data generated for the same tumor sample. Bonus points if there is also RNA-seq for a matched normal sample (same tissue as the tumor as opposed to blood DNA that would typically be used for a normal comparison at the DNA level).

Since this data would primarily be used for methods development, the tumor/normal pair could be from a tumor cell line and matched lymphoblastoid 'normal' cell line derived from the same individual. Some data along these lines is being made available here:

TCGA Mutation/Variation Calling Benchmark 4 at CGHub

However, this is whole genome data only. No exome or RNA-seq data yet. Plenty of RNA-seq data can be found in GEO and elsewhere but I'm not aware of any projects where there is corresponding WGS or Exome data. Plenty of exome data and WGS data are being generated for TCGA but again I'm not aware of any publications describing combinations of all three types.

Large scale cancer sequencing projects that might have performed such a comparison:
The Cancer Genome Atlas (TCGA)
Cancer Genome Project (CGP)
International Cancer Genome Consortium (ICGC)


I haven't explored it in detail, but I think this is what ICGC is trying to do: put together all data for the same set of patients in one place. Dataset summary


Hi Malachi,

Did you find any processed RNA-seq data from tumor/normal pairs for any cancer type?


But can I download matched tumor/normal paired exome data from ICGC? I want to work with BAM files. Is it possible to download them from ICGC? I tried TCGA, but the WES samples there are not open access. I am looking for aligned tumor/normal exome data for serous ovarian cancer. Does any other portal make these files openly available?

7.3 years ago

We eventually created such a data set ourselves and made it publicly available via FTP here

The data are from a matched tumor/'normal' pair of cell lines, HCC1395 and HCC1395/BL, and consist of whole genome (WGS), exome, and/or RNA-seq data. All data are 2x100 bp reads generated on an Illumina HiSeq 2000 instrument. The exome data were generated using a NimbleGen SeqCap EZ Human Exome Library v3.0 reagent.
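For anyone sizing a similar experiment, here is a rough sketch of how many 2x100 bp read pairs the depth targets in the original question would imply. It assumes uniform coverage and that every sequenced base aligns on target, so real runs need more. The function name and both target sizes are illustrative assumptions, not values taken from this data set; check the capture kit's design files for the actual exome target size.

```python
import math

def read_pairs_for_depth(target_bp, depth, read_len=100, paired=True):
    """Fragments needed for a mean depth, assuming uniform coverage
    and that every sequenced base aligns on target (optimistic)."""
    bases_per_fragment = read_len * (2 if paired else 1)
    return math.ceil(target_bp * depth / bases_per_fragment)

# Assumption: approximate human genome size in bp.
GENOME_BP = 3_100_000_000
# Assumption: placeholder capture target size in bp; not taken from this data set.
EXOME_BP = 64_000_000

wgs_pairs = read_pairs_for_depth(GENOME_BP, 30)     # 30X WGS
exome_pairs = read_pairs_for_depth(EXOME_BP, 150)   # 150X exome

print(f"WGS 30X:    {wgs_pairs:,} read pairs")
print(f"Exome 150X: {exome_pairs:,} read pairs")
```

Duplicates, off-target reads, and unaligned reads all reduce the effective depth, so treat these numbers as lower bounds.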

If you find this data useful, please cite:

Griffith et al. Genome Modeling System: A Knowledge Management Platform for Genomics. PLoS Comput Biol. 2015 Jul 9;11(7):e1004274. doi: 10.1371/journal.pcbi.1004274. PubMed PMID: 26158448.

Since these data correspond to cell line material, we were able to make them available without using dbGaP.

