Question

CNV Detection: Alignment Files With Reported CNVs Needed


12.5 years ago

Pascal ★ 1.5k

Hi.

I'm writing a simple CNV detector (based on the depth-of-coverage, or DoC, method) and I would like to verify that it works correctly. Could you recommend some BAM alignment files with reported CNVs that I could test my prototype with?

Regards.
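For readers unfamiliar with the DoC approach, the core idea can be sketched in a few lines. This is a minimal illustration, not Pascal's prototype. It assumes read counts have already been tallied into equal-size bins along one chromosome and corrected for GC content and mappability, and it uses hypothetical fixed log2-ratio thresholds. Each bin is compared with the median bin, and runs of consecutive bins in the same state are merged into calls.

```python
import math
import statistics


def call_cnvs(bin_counts, min_bins=3, gain_thr=0.4, loss_thr=-0.6):
    """Call CNVs from per-bin read counts along one chromosome.

    Assumes counts are already GC/mappability corrected (hypothetical input).
    Returns a list of (start_bin, end_bin_exclusive, state, mean_log2_ratio).
    """
    median = statistics.median(bin_counts)
    # log2 ratio of each bin against the median bin; 0.5 pseudocount avoids log(0)
    ratios = [math.log2((c + 0.5) / (median + 0.5)) for c in bin_counts]
    states = ["gain" if r > gain_thr else "loss" if r < loss_thr else None
              for r in ratios]

    calls = []
    i = 0
    while i < len(states):
        if states[i] is None:
            i += 1
            continue
        # extend the run while consecutive bins share the same state
        j = i
        while j < len(states) and states[j] == states[i]:
            j += 1
        if j - i >= min_bins:  # ignore runs shorter than min_bins
            calls.append((i, j, states[i], sum(ratios[i:j]) / (j - i)))
        i = j
    return calls


# hypothetical counts: a 5-bin gain (1.5x) and a 4-bin loss (0.5x)
counts = [100] * 20 + [150] * 5 + [100] * 20 + [50] * 4 + [100] * 10
for start, end, state, mean_lr in call_cnvs(counts):
    print(start, end, state, round(mean_lr, 2))
```

The thresholds sit roughly between the expected log2 ratios of 0 (two copies), about 0.58 (three copies) and -1 (one copy). Real DoC callers usually replace the simple run-merging step with a proper segmentation method such as circular binary segmentation or a hidden Markov model.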


written 12.5 years ago by Pascal ★ 1.5k • updated 12.5 years ago by Ryan D ★ 3.4k

Ram · Answer 1 · 2011-11-16

If you want to look at established, polymorphic CNPs (copy number polymorphisms) that are known to exist, the Database of Genomic Variants (DGV) has a number of files that can be downloaded and modified. Many of them use the most common HapMap reference samples and contain finely mapped, validated losses and gains. They can be downloaded here:

http://projects.tcag.ca/variation/tableview.asp?table=DGV_Content_Summary.txt

In addition, the current hg19 table from DGV lists many CNVs along with the method by which each one was identified:

http://projects.tcag.ca/variation/downloads/variation.hg19.v10.nov.2010.txt

As Chris said, if you have array data for your samples, and especially if some of them are well-characterized reference samples, it will be much easier to validate whether your method is working. The "gold standard" set of CNVs used by one paper (for array data, not sequencing data) is described here:

http://www.nature.com/nbt/journal/v29/n6/full/nbt.1852.html#/supplementary-information
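Once you have a reference call set (for example, array-based calls on the same sample, or known DGV events converted to intervals), comparing it with your own calls is usually done by interval overlap. A minimal sketch, assuming both sets are lists of hypothetical (chrom, start, end, type) tuples with half-open coordinates, and using the common but arbitrary criterion of at least 50% reciprocal overlap with a matching gain/loss type:

```python
def reciprocal_overlap(a, b):
    """Smaller of the two overlap fractions for intervals a and b.

    Intervals are (chrom, start, end, ...) tuples with half-open coordinates.
    """
    if a[0] != b[0]:
        return 0.0
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    if overlap <= 0:
        return 0.0
    return min(overlap / (a[2] - a[1]), overlap / (b[2] - b[1]))


def concordance(calls, truth, min_ro=0.5):
    """Return (precision, recall) of calls against a reference set.

    A call matches a reference event if both have the same type ("gain"/"loss")
    and their reciprocal overlap is at least min_ro.
    """
    def matches(x, y):
        return x[3] == y[3] and reciprocal_overlap(x, y) >= min_ro

    matched_calls = sum(1 for c in calls if any(matches(c, t) for t in truth))
    found_truth = sum(1 for t in truth if any(matches(c, t) for c in calls))
    precision = matched_calls / len(calls) if calls else 0.0
    recall = found_truth / len(truth) if truth else 0.0
    return precision, recall


# hypothetical example: one reference event recovered, one missed, one extra call
truth = [("chr1", 1000, 5000, "loss"), ("chr2", 20000, 30000, "gain")]
calls = [("chr1", 1500, 5200, "loss"), ("chr3", 100, 900, "gain")]
print(concordance(calls, truth))  # (0.5, 0.5)
```

The all-pairs comparison is quadratic, which is fine for a single sample's calls; for genome-wide sets, `bedtools intersect` with `-f 0.5 -r` applies the same reciprocal-overlap criterion. Keep in mind that array-based reference sets have limited resolution, so disagreements on small events need careful interpretation.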

You may also want to check the answers to this question, and see whether any of the gold-standard sample BAM files from the 1000 Genomes Project (1000G) suit your purposes:

What Are the 'Copy Number Detection' Tools Out There for Exome Capture NGS Data?

Ram · Answer 2 · 2011-11-02


12.5 years ago

Chris Miller 22k

I'm not aware of any "gold standard" CNV BAMs (though it's certainly possible that I just haven't heard of them).

The usual approach to evaluating the performance of CNV detection that I've seen in the literature is twofold:

1) Simulation: generate some reads that contain a CNV. Ideally, this should take into account factors such as GC bias, mappability, and random variation in coverage across the genome. Can your method reliably detect these CNVs at different sizes and depths of coverage?

2) Concordance with arrays: Look at a sample that has both sequencing reads and high-resolution array data (Affy SNP 6.0 is common, but high-res Agilent or Illumina arrays would be fine too). Make sure that you're detecting the same events, at least at the gross scale.
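Point 1 can be prototyped without simulating individual reads by simulating per-bin read counts directly. The sketch below is one possible approach under assumptions that are all hypothetical: a made-up GC bias curve, Poisson sampling noise (normal approximation for large means), a single CNV in an otherwise diploid genome, no mappability effects, a simple GC correction by median count per GC stratum, and a threshold-based detector standing in for the method under test. It reports sensitivity over repeated trials so it can be swept over CNV sizes and depths.

```python
import math
import random
import statistics


def poisson(lam, rng):
    """Poisson draw: Knuth's method for small means, normal approximation above 30."""
    if lam < 30:
        limit, k, p = math.exp(-lam), 0, 1.0
        while True:
            p *= rng.random()
            if p <= limit:
                return k
            k += 1
    return max(0, round(rng.gauss(lam, math.sqrt(lam))))


def simulate_bins(n_bins, depth, cnv_start, cnv_len, copy_number, rng, gc_strength=0.3):
    """Per-bin counts for a diploid genome carrying one CNV (hypothetical model).

    depth: expected reads per bin at copy number 2 and 45% GC.
    Returns (counts, gc_fractions).
    """
    counts, gc = [], []
    for i in range(n_bins):
        g = rng.uniform(0.35, 0.60)  # hypothetical per-bin GC fraction
        # hypothetical GC bias curve: coverage drops away from 45% GC
        bias = math.exp(-gc_strength * ((g - 0.45) / 0.1) ** 2)
        cn = copy_number if cnv_start <= i < cnv_start + cnv_len else 2
        counts.append(poisson(depth * bias * cn / 2, rng))
        gc.append(g)
    return counts, gc


def gc_correct(counts, gc, n_strata=10):
    """Divide each bin's count by the median count of bins in the same GC stratum."""
    lo, hi = min(gc), max(gc)
    width = (hi - lo) / n_strata or 1.0
    strata = [min(int((g - lo) / width), n_strata - 1) for g in gc]
    medians = {s: statistics.median(c for c, t in zip(counts, strata) if t == s)
               for s in set(strata)}
    return [c / medians[s] if medians[s] > 0 else 1.0 for c, s in zip(counts, strata)]


def detect(ratios, start, length, sign, min_bins=3, thr=0.4):
    """Stand-in detector: True if a run of >= min_bins bins whose log2 ratio
    crosses +thr (sign=1) or -thr (sign=-1) overlaps the true CNV."""
    logs = [math.log2(r + 1e-3) for r in ratios]
    states = [1 if v > thr else -1 if v < -thr else 0 for v in logs] + [0]  # 0 = sentinel
    i = 0
    while i < len(states) - 1:
        if states[i] == 0:
            i += 1
            continue
        j = i
        while states[j] == states[i]:  # sentinel guarantees termination
            j += 1
        if states[i] == sign and j - i >= min_bins and i < start + length and j > start:
            return True
        i = j
    return False


def sensitivity(depth, cnv_len, copy_number, trials=50, n_bins=500, seed=1):
    """Fraction of simulated genomes in which the CNV is detected."""
    rng = random.Random(seed)
    start = n_bins // 2 - cnv_len // 2
    sign = 1 if copy_number > 2 else -1
    hits = 0
    for _ in range(trials):
        counts, gc = simulate_bins(n_bins, depth, start, cnv_len, copy_number, rng)
        hits += detect(gc_correct(counts, gc), start, cnv_len, sign)
    return hits / trials


if __name__ == "__main__":
    for depth in (5, 20, 100):
        for cnv_len in (2, 5, 20):
            for cn in (1, 3):
                print(depth, cnv_len, cn, sensitivity(depth, cnv_len, cn))
```

Swapping `detect` for a call into your own caller turns this into a test harness; a fuller evaluation would also count false-positive calls outside the simulated CNV and simulate actual reads so that alignment and mappability effects are included.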

written 12.5 years ago by Chris Miller 22k


You'll also probably want to look at this previous question, which gives much the same answer: Validated Copy Number Variation (CNV) Standard

written 12.5 years ago by Chris Miller 22k • updated 4.6 years ago by Ram 43k


Thanks, Chris, for the answer and for pointing me to the other thread! If you have any recommendations for simulating CNVs (point 1 of your answer), please let me know.

written 12.5 years ago by Pascal ★ 1.5k