Tool: Human NGS Cancer Data for tool development, algorithm benchmarking, teaching, pipeline evaluation, etc.
Malachi Griffith (Washington University School of Medicine, St. Louis, USA) wrote, 3.6 years ago:

We recently published a paper and made available a comprehensive human NGS cancer dataset for tool development, algorithm benchmarking, teaching, pipeline evaluation, etc.

This data is available for download directly from our FTP site.

Briefly, we sequenced a breast cancer cell line (HCC1395) and a matched normal lymphoblastoid cell line (HCC1395/BL) derived from the same individual. WGS, exome, and RNA-seq data was produced for both of these samples. The data is all 2x100 bp Illumina reads from the HiSeq 2000 platform.

A total of 10 lanes of HiSeq 2000 (v3 chemistry) sequence data consisting of ~1.8 billion 2x100bp reads were produced for HCC1395 and HCC1395/BL. Whole genome sequencing, exome sequencing and RNA-seq were performed as described previously. HCC1395 and HCC1395/BL were sequenced to average coverage levels of 56x (WGS)/155x (exome) and 31X (WGS)/124x (exome), respectively. RNA sequencing achieved 20x coverage of >50% of known junctions for 8,640 genes for HCC1395 and 9,437 genes for HCC1395/BL respectively. (source)     

We provide this data in several versions. One is the complete dataset, but we also provide versions that have been downsampled to 1/100th and 1/1000th of the reads, as well as an exome-only version.
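For anyone who wants a different sampling fraction than the ones provided, here is a minimal sketch of how paired-end reads can be downsampled while keeping mates in sync. It is not necessarily how the downsampled files above were produced; it assumes uncompressed paired FASTQ input with mates in the same order in both files, and the function names `read_fastq` and `downsample_pairs` are made up for this illustration.

```python
import random
from itertools import islice


def read_fastq(handle):
    """Yield FASTQ records as 4-line tuples from an open text handle."""
    while True:
        record = tuple(islice(handle, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def downsample_pairs(r1_handle, r2_handle, out1, out2, fraction, seed=0):
    """Keep each read pair with probability `fraction`.

    One random draw is made per pair, so read 1 and read 2 of a pair
    are always kept or dropped together. Returns the number of pairs kept.
    Assumption: both inputs list mates in the same order.
    """
    rng = random.Random(seed)  # fixed seed makes the subset reproducible
    kept = 0
    for rec1, rec2 in zip(read_fastq(r1_handle), read_fastq(r2_handle)):
        if rng.random() < fraction:
            out1.writelines(rec1)
            out2.writelines(rec2)
            kept += 1
    return kept
```

Called with `fraction=0.01` or `fraction=0.001` on open file handles, this yields roughly 1/100th or 1/1000th of the pairs. The exact count varies slightly from run to run unless the seed is fixed.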

A detailed description of all data files is provided here.

We describe a basic analysis of this data in the publication listed below. While this data represents only a single tumor/normal pair, we hope it will be useful to people who are: (a) developing alignment or variant calling algorithms/tools, (b) running educational workshops, and (c) benchmarking pipelines.

If you find this data useful, please cite:

PLoS Comput Biol. 2015 Jul 9;11(7) (full open access article). 


I noticed that you showed some examples of further integration: "for example, identify which variants at the DNA level are expressed at the RNA level and which events affect known cancer driver genes or druggable targets." Are there other methods or ideas for integrating the WGS, exome, and RNA-seq data more deeply? Thank you.

Written 3.5 years ago by Zhilong Jia