Question: Small CNV calling (exome deletions and insertions)
6.0 years ago, mafonso (Spain) wrote:

Hi all,

I have a question and hope you guys can help me. 

I'm developing software for a geneticist who wants to detect deletions and insertions in targeted exome data from humans. She currently uses a program that shows the depth of coverage of each sample, and she visually checks whether any alteration is present in the exons of the genes she is interested in.

The problem is that I don't fully understand what this problem is called. Is it CNV (copy number variation) detection, or something else?

Additionally, I asked which sample I should use as a control, and she said any other sample that went through the same experiment could be used. The thing is: if I see a difference between the two samples, how do I know which one carries the variation? I don't think that makes a lot of sense. Is there a way to build an average .bam file to use as a control?

Thank you very much. I really appreciate any help. 

Tags: target, cnv, variant calling, exome
written 6.0 years ago by mafonso (modified 6.0 years ago by DG)

Hello mafonso!

It appears that your post has been cross-posted to another site: http://seqanswers.com/forums/showthread.php?t=45033

This is typically not recommended as it runs the risk of annoying people in both communities.

written 6.0 years ago by Pierre Lindenbaum

Hi Pierre,

I'm sorry, I didn't know it was not recommended. Thank you for the advice.

written 6.0 years ago by mafonso
6.0 years ago, Kizuna (France, Paris) wrote:

Hi mafonso,

You need to define a set of reference (control) samples. These samples should be genetically solved, i.e. you need to know their causative mutations; do not include any unsolved samples among the controls. The best controls are solved samples that were sequenced at the same time as the tests, preferably in the same batch or even the same run; otherwise you will increase your false-positive rate.
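The point about same-batch controls can be made quantitative: controls whose coverage profile across exons correlates best with the test sample make a better reference. ExomeDepth provides select.reference.set for choosing a reference set; the sketch below is not its algorithm, only a minimal illustration of ranking candidate controls by Pearson correlation. It assumes per-exon read counts are already available and that unsolved samples have already been excluded; the function names, the fixed k and the toy counts are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length per-exon count vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a flat profile carries no information
    return cov / (sx * sy)


def pick_controls(test_counts, candidates, k=2):
    """Return the names of the k candidates (hypothetically already
    restricted to solved samples) whose coverage profile correlates
    best with the test sample."""
    ranked = sorted(
        candidates,
        key=lambda name: pearson(test_counts, candidates[name]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical per-exon counts: two controls from the test's run, one from
# a different batch whose capture efficiency differs exon by exon.
candidates = {
    "same_run_1": [100, 205, 148, 52],
    "same_run_2": [98, 195, 152, 49],
    "other_batch": [160, 120, 90, 140],
}
test = [101, 199, 150, 50]
print(pick_controls(test, candidates))
```

With these toy numbers the two same-run controls are selected and the other-batch profile ranks last, which mirrors the advice to sequence controls alongside the tests.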

I am using the R package ExomeDepth (library(ExomeDepth)) to detect CNVs from targeted NGS panels and WES, and it works fine. Here is a link that summarizes the process: https://www.dropbox.com/lightbox/home. ExomeDepth calculates the read counts of the test and control samples directly from the .bam files (the .bai index files should also be present).

All the information you need can be found here: http://cran.r-project.org/web/packages/ExomeDepth/vignettes/ExomeDepth-vignette.pdf

Hope this helps,

Kiz

 

 

written 6.0 years ago by Kizuna

Hi Kiz,

Thank you very much for your reply. 

I can't open the links, though. What do you mean when you say the samples should be solved? Do you mean that you already know whether they carry any variations?

I am now using CONTRA and EXCAVATOR. But I can try ExomeDepth too! 

Best,

Mariana

written 6.0 years ago by mafonso

6.0 years ago, Devon Ryan (Freiburg, Germany) wrote:

Insertions or deletions within an exon are not CNVs, they're Indels. A CNV would be a change in copy number of a whole gene/feature (or a region containing multiple features).

Her suggestion of using any other sample that underwent the identical process is correct. You likely have control samples being sequenced along with affected samples, so just use one or all of those. I should also note that what you've described sounds like normal variant calling, for which a LOT of software already exists.
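Using "all of those" usually means pooling the controls into a per-exon reference, which also addresses the question about an "average .bam". The sketch below is a minimal illustration only, assuming per-exon read counts have already been extracted from each BAM; the function names and toy counts are hypothetical, and real tools use more careful normalization and statistics.

```python
from statistics import median


def normalize(counts):
    """Scale per-exon read counts to reads per million of the sample's total,
    so samples sequenced to different depths become comparable."""
    total = sum(counts)
    return [c * 1e6 / total for c in counts]


def pooled_reference(control_counts):
    """Per-exon median of the normalized controls. The median is less
    affected than the mean by a single control that carries its own CNV."""
    normalized = [normalize(c) for c in control_counts]
    return [median(values) for values in zip(*normalized)]


def depth_ratios(test_counts, control_counts):
    """Normalized test depth divided by the pooled reference, per exon.
    About 1.0 means the same copy number as the controls; about 0.5
    suggests a heterozygous deletion."""
    ref = pooled_reference(control_counts)
    test = normalize(test_counts)
    return [t / r if r > 0 else float("nan") for t, r in zip(test, ref)]


# Hypothetical per-exon read counts (4 exons) for three controls and one test.
controls = [
    [100, 200, 150, 50],
    [110, 190, 160, 40],
    [90, 210, 140, 60],
]
test = [100, 200, 75, 50]  # exon 3 has half the expected reads
print([round(x, 2) for x in depth_ratios(test, controls)])
# [1.18, 1.18, 0.59, 1.18]
```

Exon 3 stands out, but its ratio is not exactly 0.5 and the other exons look inflated: with only four exons, the deletion itself shifts the total used for normalization. With hundreds of targets this effect shrinks, but it is one reason dedicated tools normalize more carefully.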

written 6.0 years ago by Devon Ryan

Thanks for the answer. The thing is, she told us it's a small CNV: she looks at the exons and checks whether there is a deletion.

Yes, I tested CONTRA for CNV detection, and the paper says that it detects exon-level/small-region CNVs.

Am I doing everything wrong?

written 6.0 years ago by mafonso

I suppose if the entire exon is affected, then that could still count as a CNV. I'd follow Kizuna's answer and give ExomeDepth a try as well. I'd try that and similar packages before bothering to roll my own.

written 6.0 years ago by Devon Ryan

Yes, I agree with you. That was what I was trying to do, but the geneticist keeps insisting that it is really easy and that you don't need any fancy algorithm to detect the variations. Well, I think she is deluded.

Thanks again for the response.

written 6.0 years ago by mafonso

You're unfortunately correct.

written 6.0 years ago by Devon Ryan

You can identify obvious depth-of-coverage changes visually, and sometimes you'll spot something this way that a more robust algorithm misses for various statistical reasons. But it gets tricky in problematic or noisy regions. In CNV analysis with SNP genotyping data, even though there are algorithms for calling CNVs, many geneticists also inspect the calls manually. Large CNVs can be pretty obvious.

written 6.0 years ago by DG

You could find zero reads at the site of a deletion (if it is homozygous). Or you could find a few reads there because of sequence homology and misalignment. If the deletion is heterozygous you might find about 50% of the normal mean coverage, or maybe that region is more mappable and preferentially amplified and will look like 115% coverage.

That's why you need a bunch of control samples: to see how the informatics pipeline behaves on the normal variation in that region. Fancier software also uses breakpoints and misalignments, and the best strategy depends on the size of the indel relative to the read length.
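The argument for many controls can be sketched as a per-exon test: compare the test sample's depth ratio at an exon with the spread of ratios the controls show at that same exon. For a diploid region the expected ratios are about 0 for a homozygous deletion, about 0.5 for a heterozygous deletion and about 1.5 for a single-copy duplication, but a noisy exon blurs these. The 3-standard-deviation cutoff, the function name and the toy ratios below are illustrative assumptions, not values from any particular tool.

```python
from statistics import mean, stdev


def call_exon(test_ratio, control_ratios, z_cutoff=3.0):
    """Compare one exon's test/reference depth ratio with the ratios the
    control samples show at the same exon (their normal variation).
    Assumes at least two control ratios that are not all identical."""
    mu = mean(control_ratios)
    sd = stdev(control_ratios)
    z = (test_ratio - mu) / sd
    if z <= -z_cutoff:
        return "possible deletion", z
    if z >= z_cutoff:
        return "possible duplication", z
    return "no call", z


# Hypothetical control ratios at two exons: one well-behaved, one noisy.
steady_exon = [0.98, 1.02, 1.00, 0.97, 1.03]
noisy_exon = [0.70, 1.30, 1.15, 0.85, 1.00]

# The same heterozygous-deletion-like ratio (0.5) tested at both exons.
for label, ctrl in [("steady", steady_exon), ("noisy", noisy_exon)]:
    call, z = call_exon(0.5, ctrl)
    print(label, call, round(z, 1))
```

The steady exon yields a possible deletion, while the noisy exon yields no call for the very same ratio, because its controls already vary that much. A single control sample cannot tell these two situations apart.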

written 6.0 years ago by karl.stamm

6.0 years ago, Kizuna (France, Paris) wrote:

Hi Mariana,

You can find the figure here: https://www.dropbox.com/s/3hjwrv7vf9befs0/Untitled.jpg

To be honest, I used CONTRA, but I was not very satisfied.

A genetically solved sample is a sample in which you have detected the mutations that cause the phenotype, so the cause of the disease is already known.

P.S.: to my knowledge, a deletion of a single exon is considered a CNV.

 

 

written 6.0 years ago by Kizuna (modified 6.0 years ago)

6.0 years ago, Charles Warden (Duarte, CA) wrote:

If you have a batch of at least a dozen exomes, I would recommend CoNIFER.

XHMM would be another similar, popular option.

If you only have a single tumor-normal pair, you could try the VarScan somatic copynumber caller, but I tend to prefer using that for larger indels (although some larger exons, or clusters of nearby exons, might be OK).

written 6.0 years ago by Charles Warden (modified 8 months ago by RamRS)

I've found CoNIFER pretty easy to work with.

written 6.0 years ago by DG

It was pretty easy to work with, but for me it produced bad results. At conservative settings (a higher SVD number) we didn't see any interesting CNV predictions. At a setting low enough to see an interesting gene affected, we had so many false positives, and that one call did not validate by TaqMan. Zero for one is not much evidence, and I admit I had to tune the sensitivity to see a gene of interest among ~50 samples.

Do you know whether anyone has really evaluated the accuracy of CoNIFER or similar tools?

written 6.0 years ago by karl.stamm

I also had the same problem with CoNIFER.

This article compared some of the available CNV detection tools, CoNIFER among them: de Ligt J et al., "Detection of clinically relevant copy number variants with whole exome sequencing" (PMID: 23893877).

written 6.0 years ago by Kizuna (modified 6.0 years ago)

Yeah, I don't think this problem is unique to CoNIFER. CNV calling on exome data is tough for a whole host of reasons. Admittedly, I've only done some work in the area, but enough to know that it is challenging, especially if you don't have a large enough set of samples to use as controls.

written 6.0 years ago by DG

I agree with Dan. I think the false-negative rate may be high, and coverage-based analysis is tricky, but I think the CoNIFER false-positive rate is relatively low. For what it's worth, in one case where validation data were available, it recovered all of the already validated deletions with SV=2 (albeit there were only a few of them). I don't think you need to use SV > 2 (so just look at SV=1 or 2).

You can also export your results (preferably for SV = 0, 1, or 2) to DNAcopy to increase sensitivity, but I think this also comes with a higher false-positive rate.
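The "SV" / "svd #" setting discussed above is the number of singular components CoNIFER removes from the normalized coverage matrix before calling. A simplified sketch of that step, assuming an exons x samples matrix of RPKM values and numpy; the function names and the toy matrix are hypothetical, and CoNIFER's actual implementation and calling steps differ in detail.

```python
import numpy as np


def zscore_rows(rpkm):
    """Z-score each exon (row) across samples, a simplified version of
    CoNIFER's ZRPKM step. Rows with zero variance would divide by zero,
    so such exons should be filtered out first."""
    mu = rpkm.mean(axis=1, keepdims=True)
    sd = rpkm.std(axis=1, keepdims=True)
    return (rpkm - mu) / sd


def remove_top_components(z, k):
    """Zero the k largest singular values of an exons x samples matrix and
    rebuild it. The removed components are assumed to capture systematic
    noise shared by many samples (capture/batch effects), not real CNVs."""
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    s = s.copy()
    s[:k] = 0.0  # numpy returns singular values in descending order
    return (u * s) @ vt  # same as u @ np.diag(s) @ vt


# Toy check: a strong pattern shared by all exons (weight 10) plus a weaker,
# orthogonal pattern (weight 1). Removing k=1 component leaves only the weaker one.
u1 = np.ones(6) / np.sqrt(6)
v1 = np.array([1.0, -1.0, 1.0, -1.0]) / 2
u2 = np.array([1.0, 1.0, -1.0, -1.0, 0.0, 0.0]) / 2
v2 = np.array([1.0, 1.0, -1.0, -1.0]) / 2
m = 10 * np.outer(u1, v1) + np.outer(u2, v2)
print(np.allclose(remove_top_components(m, 1), np.outer(u2, v2)))  # True
```

Removing more components (a higher k) strips more shared noise, but it can also remove real signal that is shared across several samples, which is one way to read the sensitivity trade-off described in the comments.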

written 6.0 years ago by Charles Warden (modified 6.0 years ago)

6.0 years ago, DG wrote:

Another tool option, published by some collaborators is FishingCNV: http://bioinformatics.oxfordjournals.org/content/early/2013/03/28/bioinformatics.btt151.abstract

written 6.0 years ago by DG
Powered by Biostar version 2.3.0