Question: CNV Detection - Alignment Files With Reported CNVs Needed
7.9 years ago by
Pascal1.5k wrote:


I'm writing a simple CNV detector based on the depth-of-coverage (DoC) method, and I would like to verify that it works well. Could you recommend some alignment .bam files with reported CNVs that I could test my prototype with?


7.8 years ago by
Ryan D3.3k wrote:

If you want to look at established, polymorphic CNPs that are known to exist, the Database of Genomic Variants (DGV) has a number of files that can be downloaded and modified. Many of them use the most common reference samples from HapMap and have finely mapped and validated losses and gains. The website at which to download these is here:

Also the current table matching hg19 from DGV has a lot of CNVs and the method by which they were identified:

As Chris said, if you have array data for your samples, and some of them are well-characterized reference samples, you will have an easier time validating whether your detector is working. The "gold standard" of CNVs used by one paper (for array, not sequencing data) is described here:

You may also check the answers to this question and see whether a gold-standard sample BAM file from 1000G might suit your purposes:

What Are The 'Copy Number Detection' Tools Out There For Exome Capture NGS Data.

7.9 years ago by
Chris Miller21k (Washington University in St. Louis, MO) wrote:

I'm not aware of any "gold standard" CNV BAMs (though it's certainly possible that I just haven't heard about them).

The usual approach I've seen in the literature for evaluating the performance of CNV detection is twofold:

1) Simulation: generate some reads that contain a CNV. Ideally, this should take into account factors like GC bias, mappability, and random variance in the coverage of the genome. Can your method reliably detect these events at different CNV sizes and depths of coverage?

2) Concordance with arrays: Look at a sample that has both sequencing reads and high-resolution array data (Affy SNP 6.0 is common, but high-res Agilent or Illumina arrays would be fine too). Make sure that you're detecting the same events, at least at the gross scale.
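As a rough illustration of both points, here is a minimal Python sketch. It is a toy model under stated assumptions, not a recommended pipeline: it simulates binned depth directly (rather than generating reads), uses a made-up GC penalty and Gaussian noise with Poisson-like variance, ignores mappability, and scores concordance with a 50% reciprocal-overlap rule. All function names, parameters, and thresholds (`simulate_depth`, `call_gains`, `concordance`, `min_ro`, and so on) are hypothetical and chosen only for this example.

```python
import random
from statistics import median


def simulate_depth(n_bins=2000, mean_depth=30.0, cnv=(800, 1000),
                   copy_ratio=1.5, gc_strength=0.3, seed=0):
    """Simulate per-bin depth with a toy GC bias, noise, and one planted CNV.

    Assumptions: fixed-width bins, a random GC fraction per bin, and Gaussian
    noise whose variance equals the expected depth (Poisson-like).
    """
    rng = random.Random(seed)
    depth, gc = [], []
    for i in range(n_bins):
        g = rng.uniform(0.3, 0.7)                      # hypothetical GC fraction
        bias = 1.0 - gc_strength * abs(g - 0.5) / 0.2  # depth drops away from 50% GC
        ratio = copy_ratio if cnv[0] <= i < cnv[1] else 1.0
        expected = mean_depth * bias * ratio
        depth.append(max(0.0, rng.gauss(expected, expected ** 0.5)))
        gc.append(g)
    return depth, gc


def call_gains(depth, window=50, threshold=1.25):
    """Call gains in non-overlapping windows and merge adjacent calls.

    A window is called when its mean depth divided by the genome-wide median
    depth exceeds `threshold`. Returns half-open (start, end) bin intervals.
    """
    base = median(depth)
    calls = []
    for start in range(0, len(depth) - window + 1, window):
        ratio = sum(depth[start:start + window]) / window / base
        if ratio > threshold:
            if calls and calls[-1][1] == start:
                calls[-1] = (calls[-1][0], start + window)  # extend previous call
            else:
                calls.append((start, start + window))
    return calls


def reciprocal_overlap(a, b):
    """Overlap length divided by the longer of two half-open intervals."""
    inter = min(a[1], b[1]) - max(a[0], b[0])
    if inter <= 0:
        return 0.0
    return inter / max(a[1] - a[0], b[1] - b[0])


def concordance(calls, truth, min_ro=0.5):
    """Fraction of truth events (e.g. array calls) matched by at least one call."""
    if not truth:
        return 0.0
    hits = sum(any(reciprocal_overlap(c, t) >= min_ro for c in calls)
               for t in truth)
    return hits / len(truth)


if __name__ == "__main__":
    depth, gc = simulate_depth()
    calls = call_gains(depth)
    # The "array" interval below is invented for the example.
    print(calls, concordance(calls, truth=[(790, 1010)]))
```

Varying `copy_ratio`, `mean_depth`, and the planted interval size gives a quick feel for where a depth-based caller starts to miss events. The returned `gc` values are not used by `call_gains`; they are there so a GC-correction step can be tested against the same simulated data.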


You'll probably also want to look at this previous question, which gives much the same answer: Validated Copy Number Variation (CNV) Standard

written 7.9 years ago by Chris Miller21k

Thanks, Chris, for the answer and for pointing to the other thread! If you have any recommendations for simulating CNVs (point 1 of your answer), feel free to tell me.

written 7.9 years ago by Pascal1.5k