Question: What Are The 'Copy Number Detection' Tools Out There For Exome Capture NGS Data?
16
7.6 years ago by
Prateek1.0k
Boston, MA
Prateek1.0k wrote:

Do you know of any CNV detection tools for NGS paired-end exome data, using either the coverage method (window based) or the paired-end mapping method (clustering based)? I am aware it's a tough problem to solve. I have looked at some tools for whole genome but couldn't find one for exome.

I would also welcome discussion about how existing tools could be repurposed for exome data through post-processing (like ignoring exon boundaries).

Finally, please feel free to point out tools for structural variants (inversions, translocations etc.) too.

ADD COMMENTlink modified 3.8 years ago by Biostar ♦♦ 20 • written 7.6 years ago by Prateek1.0k
11
7.6 years ago by
Ryan D3.3k
USA
Ryan D3.3k wrote:

Take a look at the supplementary information from the 1000G paper located here:

http://www.nature.com/nature/journal/v470/n7332/full/nature09708.html

They use something like 15-17 algorithms, including read-pair analysis (RP), read-depth analysis (RD), split-read analysis (SR), and sequence assembly (AS).

Those are broken down in Tables 2A and 2B of the supplement: http://www.nature.com/nature/journal/v470/n7332/extref/nature09708-s1.pdf

In brief:

Read depth: Event-wise testing, CNVnator

Read pair: Spanner, PEMer, BreakDancer

Split read: Mosaik, Pindel

Combined read pair/read depth (PD): Spanner, Genome STRiP

There is also a 1000 Genomes tutorial on structural variants by Jan Korbel:

Slides: http://www.genome.gov/Pages/Research/DER/1000GenomesProjectTutorials/StructuralVariants-JanKorbel.pdf

A bit dated, but it can get you started.

ADD COMMENTlink written 7.6 years ago by Ryan D3.3k
11
6.9 years ago by
SBinson110
SBinson110 wrote:

cn.MOPS works well for this task: http://nar.oxfordjournals.org/content/40/9/e69

ADD COMMENTlink written 6.9 years ago by SBinson110

Another vote for cn.mops. I wish I had known about it when I supplied my original answer.

ADD REPLYlink written 6.3 years ago by Daniel Swan13k

And another vote for cn.mops. A big bonus for me is that it even works with small datasets (5-7 samples).

ADD REPLYlink written 6.1 years ago by ron_veg50

cn.mops performed very well for detecting CNVs in free circulating cancer DNA.

ADD REPLYlink written 5.4 years ago by okko.clevert220

cn.mops works very well for analyzing exome sequencing data from cancer genomes.

ADD REPLYlink written 5.4 years ago by sepp.hochreiter0

I've had great luck with cn.mops, and it's relatively easy to use, even for an R newbie. Also, Günter (the software's author) is very helpful and responsive!

ADD REPLYlink written 4.6 years ago by steven_friedenberg10
7
7.6 years ago by
Chris Miller21k
Washington University in St. Louis, MO
Chris Miller21k wrote:
ADD COMMENTlink written 7.6 years ago by Chris Miller21k
5
7.6 years ago by
Vitis2.2k
New York
Vitis2.2k wrote:

There are several strategies for finding structural variants (SVs) in genomic or exome NGS data. First, with paired-end data, you can mine the distribution of insert sizes between read pairs and infer SVs from unusual insert sizes. Second, you can scan the genome/exome for regions with unusually high or low coverage. This is the only approach that lets you estimate copy number (I don't know how accurate that is). Third, you can use reads that get split during mapping, which may fall within SV regions. Finally, de novo assembly followed by traditional comparative genomics approaches can also help with SV discovery. Of course, you can combine all of these approaches and keep the candidates with the highest confidence.
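As a rough illustration of the first strategy (read-pair insert sizes), here is a minimal Python sketch. It is not how any particular tool works: the function name `flag_discordant_pairs`, the median/MAD cutoff, and the insert sizes are all assumptions made up for the example. Real callers also cluster the discordant pairs and check read orientation.

```python
from statistics import median

def flag_discordant_pairs(insert_sizes, n_mads=5.0):
    """Return indices of read pairs whose insert size is far from the library median.

    Hypothetical helper for illustration only; the n_mads cutoff is an assumption.
    """
    med = median(insert_sizes)
    # Median absolute deviation: robust to the very outliers we want to find.
    mad = median(abs(x - med) for x in insert_sizes)
    # Guard against a zero MAD in tiny or degenerate libraries.
    cutoff = n_mads * max(mad, 1.0)
    return [i for i, x in enumerate(insert_sizes) if abs(x - med) > cutoff]

# Made-up insert sizes (bp) for a library centred near 300 bp.
# A much larger value hints at a deletion between the mates,
# a much smaller one at an insertion.
sizes = [298, 305, 301, 310, 295, 2400, 300, 302, 299, 40]
discordant = flag_discordant_pairs(sizes)
print(discordant)
```

With exome data, keep in mind that most pairs sit in or near targets, so breakpoints far from exons will rarely produce discordant pairs at all.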

I heard CNVnator is a pretty good coverage-based tool for genomic data, but I'm not sure whether it will perform well with exome data. Considering the size and distribution of exons, the split-read method seems attractive. My personal experience involves a genomic data set: we assembled the genomic reads de novo, used a traditional method like MUMmer to identify the SVs, and verified them with coverage-based approaches. It worked quite well, but I don't know how de novo assembly would perform for exome (I heard the Trinity pipeline is rising as a good tool for de novo assembly of transcriptome or exome).

There is a nice review in Nature Reviews Genetics. It covers everything I mentioned and much more. http://www.nature.com/nrg/journal/v12/n5/full/nrg2958.html

ADD COMMENTlink written 7.6 years ago by Vitis2.2k
2

CNV-calling programs designed for whole genomes will almost certainly not work on exomes: the data is sparse, the depths are variable due to capture affinities, etc.

ADD REPLYlink written 7.6 years ago by Chris Miller21k
1

Split-read methods will be extremely limiting with exome data. For them to work, the breakpoint must lie within the exon or within the roughly 50 to 100 bp of "splash" on either side of the exon. It is far more likely that the breakpoint is rather far away from the exon, yet the event affects the exon as well. Normalizing and comparing depth of coverage among multiple samples is your best bet.
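To make "normalizing and comparing depth of coverage among multiple samples" concrete, here is a minimal Python sketch. It is a toy under stated assumptions, not the method of any tool mentioned in this thread: the function `log2_ratios`, the pseudocount, and the depth values are all made up for illustration. Each sample is scaled by its own median depth, then each exon is compared to the cohort median for that exon, which absorbs exon-specific capture efficiency.

```python
from math import log2
from statistics import median

def log2_ratios(depths, eps=0.01):
    """depths: dict of sample name -> list of mean per-exon depths (same exon order).

    Returns dict of sample name -> per-exon log2(sample / cohort reference).
    Hypothetical helper; eps is an assumed pseudocount to avoid log(0).
    """
    # Step 1: scale each sample by its own median depth (library-size correction).
    normalized = {}
    for sample, d in depths.items():
        m = median(d)
        normalized[sample] = [x / m for x in d]

    # Step 2: per-exon reference = median across samples, which absorbs
    # exon-specific capture affinity shared by the whole cohort.
    n_exons = len(next(iter(normalized.values())))
    reference = [
        median(normalized[s][j] for s in normalized) for j in range(n_exons)
    ]

    # Step 3: log2 ratio against the reference; about -1 suggests a
    # heterozygous loss, about +0.58 a single-copy gain in a diploid sample.
    return {
        s: [log2((v[j] + eps) / (reference[j] + eps)) for j in range(n_exons)]
        for s, v in normalized.items()
    }

# Made-up mean depths for 5 exons in 4 samples sequenced to different depths.
# Sample D has half the expected depth at the third exon.
depths = {
    "A": [100, 40, 200, 80, 120],
    "B": [50, 20, 100, 40, 60],
    "C": [200, 80, 400, 160, 240],
    "D": [100, 40, 100, 80, 120],
}
ratios = log2_ratios(depths)
print([round(r, 2) for r in ratios["D"]])
```

In practice the samples used as the reference should come from the same capture kit and, ideally, the same sequencing batch, since capture affinities differ between kits.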

ADD REPLYlink written 7.6 years ago by Aaronquinlan11k
4
7.6 years ago by
Daniel Swan13k
Aberdeen, UK
Daniel Swan13k wrote:

Another vote for ExomeCNV. There's also CNAseg and CNV-seq (although I'm not sure how appropriate they are for exome data). I've also seen CNVnator mentioned on SEQanswers in relation to this question, but I think Chris's point about variable depth means this is certainly a trickier proposition than for WGS.

EDIT:

I've also just seen an abstract for another Bioconductor package based on an HMM approach. The package is exomeCopy.

ADD COMMENTlink modified 7.6 years ago • written 7.6 years ago by Daniel Swan13k
3
4.6 years ago by
Eric T.2.5k
San Francisco, CA
Eric T.2.5k wrote:

Are you looking for CNVs in a population, or disease-causing copy number alterations in individual tumor or constitutional samples?

For the former, most of the answers already posted here, including cn.MOPS, will do.

For the latter, particularly tumor samples, CNVkit is a program I wrote recently that performs well.

There are lots of these tools tailored for slightly different purposes, and it's a good idea to look for recent papers that independently benchmark several of them at once.

ADD COMMENTlink written 4.6 years ago by Eric T.2.5k
2
6.2 years ago by
fromer20
fromer20 wrote:

We've written the XHMM software for calling CNVs from exomes: http://atgu.mgh.harvard.edu/xhmm/

Our paper describing this was published last year in AJHG: http://www.cell.com/AJHG/abstract/S0002-9297%2812%2900417-X

ADD COMMENTlink written 6.2 years ago by fromer20
1

Can you say how many BAM files would be required to have reliable calling?

ADD REPLYlink written 6.2 years ago by Ryan D3.3k
0
4.6 years ago by
chongchu.cs10
United States
chongchu.cs10 wrote:

We have a tool for calling genotypes of insertions and deletions for WGS. http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0113324

 

ADD COMMENTlink written 4.6 years ago by chongchu.cs10

I'd be interested in testing your algorithm, but will it also work for exome data? Otherwise, I'll look at the source code to see how it works.

By the way, I'm analysing wildtype/tumor samples.

ADD REPLYlink modified 3.7 years ago • written 3.7 years ago by djtilyon0