Question: Variant Control Data
6.4 years ago by
richardc.gsc (150) wrote:

Hi all,

We have a few in-house genomic samples in which we have validated a couple of hundred variants. To test the new tools that are forever being published, we could really use a control data set with a large number (thousands?) of validated variants. Ideally, the variants would be validated in a publicly available cell line, so that we could run samples on our own machines and use the control data to calibrate any new tools we put into the pipeline.

Is anyone aware of a good sample with validated variants that we could use to calibrate machines and bioinformatics tools?

I realize we could use simulated data for this, but right now I'm only interested in real data that we could generate with our own sequencers.


variant-calling • 1.2k views
6.4 years ago by
Dan Gaston (7.1k) wrote:

People often use NA12878 from the 1000 Genomes Project to validate their SNP-calling algorithms. It is probably the most extensively sequenced genome on the planet, with data from multiple technologies and large-scale Sanger validation of many variant calls. Daniel MacArthur, for example, used it and validated a large number of LOF (loss-of-function) variants. You might want to start there, as it is a commonly used and publicly available sample.
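As a rough sketch of how a control sample like this could be used for calibration: compare the calls from a new tool against the validated set and report precision and recall. The function names (`load_variants`, `compare_calls`) and the toy VCF-style records below are made up for illustration. The comparison is an exact match on chromosome, position, REF and ALT, so it ignores genotypes and assumes indels are already normalized the same way in both files.

```python
import io


def load_variants(handle):
    """Collect (chrom, pos, ref, alt) keys from VCF-style text, skipping header lines."""
    variants = set()
    for line in handle:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alts = fields[0], int(fields[1]), fields[3], fields[4]
        # Split multi-allelic records so each ALT allele is compared on its own.
        for alt in alts.split(","):
            variants.add((chrom, pos, ref, alt))
    return variants


def compare_calls(truth, calls):
    """Count exact matches between a validated set and a call set."""
    tp = len(truth & calls)   # called and validated
    fp = len(calls - truth)   # called but not in the validated set
    fn = len(truth - calls)   # validated but missed by the caller
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "precision": precision, "recall": recall}


# Toy records (hypothetical, not real NA12878 calls).
truth_vcf = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\n"
    "1\t100\t.\tA\tG\n"
    "1\t200\t.\tC\tT\n"
    "2\t300\t.\tG\tA\n"
)
calls_vcf = (
    "#CHROM\tPOS\tID\tREF\tALT\n"
    "1\t100\t.\tA\tG\n"
    "2\t300\t.\tG\tA\n"
    "2\t400\t.\tT\tC\n"
)

result = compare_calls(
    load_variants(io.StringIO(truth_vcf)),
    load_variants(io.StringIO(calls_vcf)),
)
print(result)
```

Note that a call absent from the validated set is only a true false positive if that site was actually assessed, so in practice the comparison should be restricted to the regions the validation covered.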

6.4 years ago by
Jordan (1.1k) wrote:

I think you should take a look at the TCGA data. Go to the Data Matrix and select the cancer type and the platforms you need.

For example, if you need only somatic variants, select Somatic Mutations under Data Type and set Availability to Available; you can then choose Tumor Matched or Normal Matched, depending on your needs. Then just select the files you want to download. I think the Somatic Mutations are in .maf format.
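If the downloads do turn out to be MAF files, a minimal sketch for reading one might look like the following. It assumes a tab-delimited file with a header row that includes standard MAF column names such as `Hugo_Symbol` and `Variant_Classification`, possibly preceded by comment lines starting with `#`. The `read_maf` helper and the toy rows (placeholder gene names and positions) are hypothetical.

```python
import csv
import io
from collections import Counter


def read_maf(handle):
    """Yield one dict per mutation row from a tab-delimited MAF, skipping '#' comment lines."""
    rows = (line for line in handle if not line.startswith("#"))
    yield from csv.DictReader(rows, delimiter="\t")


# Toy MAF content with placeholder values (not real TCGA data).
toy_maf = (
    "#version 2.4\n"
    "Hugo_Symbol\tChromosome\tStart_Position\tReference_Allele\t"
    "Tumor_Seq_Allele2\tVariant_Classification\n"
    "GENE_A\t1\t100\tC\tT\tMissense_Mutation\n"
    "GENE_B\t2\t200\tC\tA\tMissense_Mutation\n"
    "GENE_C\t3\t300\tG\tA\tNonsense_Mutation\n"
)

mutations = list(read_maf(io.StringIO(toy_maf)))

# Tally mutations by classification as a quick sanity check on the download.
counts = Counter(row["Variant_Classification"] for row in mutations)
print(counts)
```

For a real file, replace `io.StringIO(toy_maf)` with `open(path, newline="")`, where `path` is wherever the download was saved.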

Hope this helps.
