Question: Human CNV/Structural Variant Calling Algorithms Using Next-Gen Data Cannot Reach Consensus
Bioscientist wrote, 7.5 years ago:

This is a general question about the human CNV/structural variant field (using next-gen data, NOT arrays).

As shown in the 1000 Genomes Project, different groups have developed different algorithmic approaches to identify structural variants (mainly three strategies: paired-end, read-depth and split-read).

However, results from these approaches barely overlap with each other (of course they have different strengths; split-read, for example, is powerful for small indels), and the false-positive rate seems quite high (or we simply don't know the false-positive rate, because we cannot use an orthogonal approach to validate small structural variants the way we use array CGH for large ones).
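
To make the "barely overlap" point concrete, here is a minimal sketch (in Python) of the 50% reciprocal-overlap criterion often used when comparing SV call sets from two tools; the call coordinates below are invented placeholders, not real data.

```python
# Minimal sketch: 50% reciprocal overlap between two hypothetical CNV call sets.
# Calls are (chrom, start, end) tuples; coordinates below are invented examples.

def reciprocal_overlap(a, b, min_frac=0.5):
    """True if calls a and b overlap by >= min_frac of BOTH their lengths."""
    if a[0] != b[0]:
        return False
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    if overlap <= 0:
        return False
    return (overlap / (a[2] - a[1]) >= min_frac and
            overlap / (b[2] - b[1]) >= min_frac)

# Hypothetical calls from a read-depth tool and a paired-end tool
read_depth_calls = [("chr1", 100_000, 150_000), ("chr2", 500_000, 520_000)]
paired_end_calls = [("chr1", 110_000, 145_000), ("chr7", 30_000, 60_000)]

shared = [a for a in read_depth_calls
          if any(reciprocal_overlap(a, b) for b in paired_end_calls)]
print(f"{len(shared)}/{len(read_depth_calls)} read-depth calls confirmed by paired-end")
```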

In simple words, I don't trust even the mainstream, widely used tools like BreakDancer and CNVnator (I have somewhat more confidence in Pindel, because it provides nucleotide-resolution breakpoints). Do you trust them?

If not, then what should we do? Carry out some post-processing or filtering to reduce the potential false positives? For example, adjust the read-depth threshold for read-depth-based approaches, or restrict our attention to calls supported by uniquely mapping discordant read pairs for paired-end-based approaches?
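
As an illustration of the second idea, here is a rough sketch using pysam that counts high-mapping-quality discordant read pairs over a candidate deletion and keeps the call only if enough such pairs support it; the BAM path, region and thresholds are assumptions for illustration, not a validated pipeline.

```python
# Rough sketch: keep a candidate deletion only if it is supported by enough
# uniquely mapping (high-MAPQ) discordant read pairs. File name, region and
# thresholds are hypothetical.
import pysam

def count_discordant_support(bam_path, chrom, start, end,
                             min_mapq=30, min_insert=1000):
    """Count reads with MAPQ >= min_mapq whose pair is discordant
    (not a proper pair, or with an unusually large insert size)."""
    n = 0
    with pysam.AlignmentFile(bam_path, "rb") as bam:
        for read in bam.fetch(chrom, start, end):
            if read.is_unmapped or read.mate_is_unmapped or read.is_duplicate:
                continue
            if read.mapping_quality < min_mapq:
                continue  # skip ambiguously mapped reads
            if not read.is_proper_pair or abs(read.template_length) > min_insert:
                n += 1
    return n

# Hypothetical candidate call from a paired-end caller
support = count_discordant_support("sample.bam", "chr1", 1_200_000, 1_250_000)
if support >= 4:      # arbitrary cutoff; tune on your own data
    print(f"keep call: {support} supporting discordant reads")
else:
    print(f"filter call: only {support} supporting discordant reads")
```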

Or do we need to develop our own code for our specific research? What software do you use (say, CNVnator or BreakDancer)?

Personally, I would say that once sequencing is powerful enough to accurately produce sufficiently long reads, we can say goodbye to these mapping-based methods, because we will simply be able to assemble all the reads, without the problems caused by repetitive sequences in the human genome.

Dm Church wrote, 7.5 years ago:

Calling structural variants is indeed challenging, and the software is being developed and tweaked all the time. This is why dbVar (http://www.ncbi.nlm.nih.gov/dbvar) tries to capture, as best as it can, the experimental evidence that went into the variant calls. The repository also allows for collections of studies, so you can start doing meta-analyses and comparisons of different studies and methods.
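
As a rough illustration of such a comparison, here is a sketch that checks what fraction of one's own calls overlap regions exported from dbVar; the file names, the chrom/start/end column layout and the simple any-overlap criterion are assumptions for illustration only.

```python
# Sketch: what fraction of our SV calls are already catalogued in dbVar?
# "dbvar_regions.tsv" stands in for a tab-delimited export from dbVar with
# chrom/start/end columns; both file names are hypothetical.
from collections import defaultdict
import csv

def load_regions(path):
    by_chrom = defaultdict(list)
    with open(path) as fh:
        for chrom, start, end in csv.reader(fh, delimiter="\t"):
            by_chrom[chrom].append((int(start), int(end)))
    return by_chrom

def overlaps_any(call, regions):
    chrom, start, end = call
    return any(s < end and start < e for s, e in regions.get(chrom, []))

dbvar = load_regions("dbvar_regions.tsv")
our_calls = load_regions("our_calls.tsv")

calls = [(c, s, e) for c, ivals in our_calls.items() for s, e in ivals]
known = sum(overlaps_any(c, dbvar) for c in calls)
print(f"{known}/{len(calls)} calls overlap a dbVar record")
```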

Bioscientist replied:

Thanks. But regarding dbVar: I think comparing our calls against dbVar rests on the hypothesis that human structural variants are mostly common SVs, so that the SVs we identify should already be in the database. What if most of our SVs are rare ones? Has this hypothesis been proved?
