Question: Publicly Available Tumor/Normal Illumina Data For Evaluation Of Somatic Variant Callers
7.8 years ago by
Washington University School of Medicine, St. Louis, USA
Malachi Griffith (18k) wrote:

Is anyone aware of some publicly available paired-end Illumina data that is unencumbered by data use agreements so that they could be used in a teaching context?

I would need the data to correspond to a matched tumor/normal sample pair. Ideally sequenced on the Illumina HiSeq platform.

There are, of course, many papers describing analysis of these kinds of data, and my own center has many such data sets. But they are all significantly encumbered by data access restrictions that prevent them from being used publicly or disseminated as part of a course/workshop. I need to be able to use the data in classes, and ideally the students should be able to access the data later to practice their skills.

Perhaps a tumor cell line that was sequenced along with an EBV transformed lymphoblastoid 'normal' cell line from the same individual? Or perhaps a patient sample that was consented in very broad terms with the express purpose of making the data publicly available with minimal restrictions on use (including explicit approval for teaching purposes)?

Please feel free to suggest something that doesn't quite meet my criteria but that you think is likely to be the next best thing.

somatic variant • 4.5k views
modified 2.6 years ago by yhrytsenko (0) • written 7.8 years ago by Malachi Griffith (18k)

Does anyone know of publicly available data (e.g., FASTQ files) for tumor/normal pairs from mice? Thank you in advance!

written 2.6 years ago by yhrytsenko (0)
7.8 years ago by
Irsan (7.2k) wrote:

You can go to the SRA at NCBI and search for SRA entries, filtering on DNA, whole genome and public access. When I did that, I came across a study with tumor-normal pairs here. You might want to install the Aspera client for file transfer (faster downloads), but you can also download the data without it.
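
As a rough sketch of the search described above, the same filters can be written as an NCBI E-utilities `esearch` query against the SRA database. The field tags used here (`[Source]`, `[Strategy]`, `[Access]`) and the helper name `build_sra_search_url` are assumptions made for illustration, so check them against the SRA Advanced Search builder before relying on them.

```python
from urllib.parse import urlencode

# Base endpoint of the NCBI E-utilities search service.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_sra_search_url(keywords, retmax=20):
    """Build an esearch URL for public, whole-genome DNA entries in SRA.

    The field tags below are assumptions; verify them in the SRA
    Advanced Search builder.
    """
    filters = [
        '"genomic"[Source]',   # DNA rather than RNA
        '"wgs"[Strategy]',     # whole genome sequencing
        '"public"[Access]',    # exclude controlled-access data
    ]
    term = " AND ".join([keywords] + filters)
    params = {"db": "sra", "term": term, "retmax": retmax, "retmode": "json"}
    return ESEARCH + "?" + urlencode(params)


if __name__ == "__main__":
    # Only prints the URL; fetching and parsing the result is left out.
    print(build_sra_search_url("tumor AND normal"))
```

The function only builds the URL, so it can be pasted into a browser or passed to any HTTP client; the keywords are free text and will likely need tuning to surface matched pairs.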

If it is not critical to have tumor-normal pairs, you can try the data from the 1000 Genomes Project, available from its FTP site. Each directory there corresponds to one individual and provides the original reads and the (exome) alignment. There are no tumor-normal pairs, but there are also no restrictions on data use.

modified 7.8 years ago • written 7.8 years ago by Irsan (7.2k)

That link goes to cancer cell line data. Does anyone know of public matched tumor/normal sample data? Cell lines tend to acquire traits that are not in the original sample. Thanks.

written 5.3 years ago by fwuffy (100)

Were you able to find any such study?

written 4.2 years ago by always_learning (1.0k)

Here's a study that fits the original request -- freely released tumor-normal pairs from 7 patients: An open access pilot freely sharing cancer genomic data from participants in Texas

written 4.1 years ago by Eric T. (2.6k)

@Eric, did you by any chance come across data for tumor-normal pairs from mice? Thank you!

written 2.6 years ago by yhrytsenko (0)

The sequencing data for benign, primary tumor and metastatic samples from 103 mice from McCreery et al. 2015 is available through ENA:

written 2.6 years ago by Eric T. (2.6k)

This is a very nice dataset. To match the sample annotation and work out which FASTQ files belong to which class of tumor, I had to compare the Tumor ID in Supplementary Table 1 with the "Submitter's sample name" on EBI. On the EBI website, click "Select columns" and choose "Submitter's sample name".
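
The matching step described here could be sketched as a simple join. This assumes the EBI/ENA file report was exported as tab-separated text with `sample_alias` (taken here to correspond to "Submitter's sample name") and `fastq_ftp` columns, and that Supplementary Table 1 was saved with `tumor_id` and `tumor_class` columns. All column names, the sample IDs and the helper `label_fastqs` are hypothetical.

```python
import csv
import io


def label_fastqs(ena_tsv, supp_tsv):
    """Map each FASTQ path to the tumor class of its sample."""
    # Tumor ID -> class, from the (hypothetical) supplementary table export.
    classes = {
        row["tumor_id"]: row["tumor_class"]
        for row in csv.DictReader(io.StringIO(supp_tsv), delimiter="\t")
    }
    labelled = {}
    for row in csv.DictReader(io.StringIO(ena_tsv), delimiter="\t"):
        tumor_class = classes.get(row["sample_alias"])
        if tumor_class is None:
            continue  # sample not annotated in the table; skip it
        # ENA lists the FASTQ files of one run separated by ';'.
        for path in row["fastq_ftp"].split(";"):
            labelled[path] = tumor_class
    return labelled


if __name__ == "__main__":
    # Made-up rows, only to show the expected shapes.
    ena = "sample_alias\tfastq_ftp\nT1\ta_1.fastq.gz;a_2.fastq.gz\nT9\tb_1.fastq.gz\n"
    supp = "tumor_id\ttumor_class\nT1\tmetastasis\n"
    print(label_fastqs(ena, supp))
```

Samples that appear in the file report but not in the table are skipped silently in this sketch; logging them instead may be preferable when checking for ID mismatches.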

modified 20 months ago • written 20 months ago by othman.soufan (0)