Question

Small CNV calling (exome deletions and insertions)

0

9.8 years ago

mafonso ▴ 10

Hi all,

I have a question and hope you guys can help me.

I'm writing software for a geneticist who wants to detect deletions and insertions in targeted exome data from humans. She uses a program that shows the depth of coverage in the samples, and she visually checks whether any alteration is present in the exons of the genes she is interested in.

The problem is that I am not sure what this problem is actually called. Is it CNV (Copy Number Variation) calling or something else?

Additionally, I asked which sample I should use as a control, and she said any other sample that went through the same experiment could be used. The thing is: if I see a difference between the two samples, which one carries the variation? That doesn't make a lot of sense to me. Is there a way to get an average .bam file to use as a control?

Thank you very much. I really appreciate any help.

Variant-calling Exome CNV Target • 5.7k views

ADD COMMENT • link updated 2.4 years ago by Ram 43k • written 9.8 years ago by mafonso ▴ 10

0

Hello mafonso!

It appears that your post has been cross-posted to another site: http://seqanswers.com/forums/showthread.php?t=45033

This is typically not recommended as it runs the risk of annoying people in both communities.

ADD REPLY • link 9.8 years ago by Pierre Lindenbaum 161k

0

Hi Pierre,

I'm sorry, I did not know it was not recommended. Thank you for the advice.

ADD REPLY • link updated 2.4 years ago by Ram 43k • written 9.8 years ago by mafonso ▴ 10

Ram · Answer 1 · 2014-07-17

4

9.8 years ago

Kizuna ▴ 870

Hi mafonso,

You need to define a set of reference (control) samples. These samples should be genetically solved, meaning you know their causative mutations; do not include any unsolved samples among the controls. The best controls are solved samples that were sequenced at the same time as the test samples, preferably in the same batch or the same run. Otherwise you will increase your false positive rate.

I am using the R package ExomeDepth, loaded with library(ExomeDepth), to detect CNVs from targeted NGS panels and WES, and it works well. I have added a link that summarizes the process. ExomeDepth computes the read counts for the test and control samples directly from the .bam files (the .bai index files must also be present).
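To illustrate why same-batch controls matter, here is a minimal Python sketch that ranks candidate controls by how well their per-exon read counts correlate with the test sample. This is not ExomeDepth's actual reference-selection procedure; the function names, sample names, and read counts are all hypothetical and only show the general idea of preferring controls whose coverage profile matches the test.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no information about similarity
    return cov / sqrt(var_x * var_y)

def rank_controls(test_counts, control_counts):
    """Return (control_name, correlation) pairs, best-matching control first."""
    scores = {name: pearson(test_counts, counts)
              for name, counts in control_counts.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical per-exon read counts (same five exons in every sample).
test = [120, 80, 200, 150, 60]
controls = {
    "ctrl_same_run": [118, 83, 195, 155, 58],
    "ctrl_other_batch": [90, 140, 110, 70, 130],
}

ranked = rank_controls(test, controls)
for name, r in ranked:
    print(f"{name}: r = {r:.3f}")
```

In this toy data the control from the same run tracks the test sample closely, while the control from another batch does not, so the former would be the better reference.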

All needed information can be found here.

Hope this helps,

Kiz

ADD COMMENT • link updated 2.4 years ago by Ram 43k • written 9.8 years ago by Kizuna ▴ 870

1

Hi Kiz,

Thank you very much for your reply.

I can't open the links though... What do you mean when you say the samples should be solved? Is it that you already know if there are any variations?

I am now using CONTRA and EXCAVATOR. But I can try ExomeDepth too!

Best,
Mariana

ADD REPLY • link updated 2.4 years ago by Ram 43k • written 9.8 years ago by mafonso ▴ 10

Ram · Answer 2 · 2014-07-17

3

9.8 years ago

Devon Ryan 104k

Insertions or deletions within an exon are not CNVs; they're indels. A CNV would be a change in copy number of a whole gene/feature (or a region containing multiple features).

Her suggestion of using any other sample that underwent the identical process is correct. You likely have control samples being sequenced along with affected samples, so just use one/all of those. I should also note that what you're describing already exists. What you've described sounds like normal variant calling, for which a LOT of software already exists.

ADD COMMENT • link 9.8 years ago by Devon Ryan 104k

0

Thanks for the answer. The thing is, she told us it is a small CNV, where she looks at the exons and sees whether there is a deletion.

Yes, I tested CONTRA for CNV detection, and the paper says that it detects exon/small-region CNVs.

Am I doing everything wrong?

ADD REPLY • link updated 2.4 years ago by Ram 43k • written 9.8 years ago by mafonso ▴ 10

0

I suppose if the entire exon is affected, then that could still count as a CNV. I'd follow the answer from Kizuna and give ExomeDepth a try as well. I'd give that and similar packages a try before bothering to roll my own.

ADD REPLY • link 9.8 years ago by Devon Ryan 104k

0

Yes, I agree with you... That was what I was trying to do, but the geneticist keeps insisting that it is really easy and you don't need any fancy algorithm to detect the variations. Well, I think she is deluded.

Thanks again for the response.

ADD REPLY • link updated 2.4 years ago by Ram 43k • written 9.8 years ago by mafonso ▴ 10

0

You're unfortunately correct.

ADD REPLY • link 9.8 years ago by Devon Ryan 104k

0

You can identify obvious depth of coverage issues visually, and sometimes you'll see something this way that a more robust algorithm won't detect for various statistical reasons. But it gets tricky for problematic or noisy regions. When doing CNV analysis with SNP genotyping data, although there are algorithms for identifying CNVs, many geneticists also inspect them manually. They can be pretty obvious when they are large.

ADD REPLY • link 9.8 years ago by DG 7.3k

0

You could find zero reads at a site of deletion (if homozygous). Or you could find a few reads by sequence homology and misalignment. If the deletion is heterozygous you might find 50% mean coverage, or maybe that region is more mappable and preferentially amplified and will look like 115% sequence coverage.

That's why you need a bunch of control samples, to see what the informatic pipeline does to the normal variation at the region. More fancy software will use breakpoints and misalignments, and the best strategy varies by the size of the indel relative to the length of a read.
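The depth-ratio reasoning above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a validated caller: the exon names, read counts, and the |z| > 3 cutoff are all hypothetical, normalizing by total reads is the simplest possible library-size correction, and four controls are far too few for a reliable spread estimate in practice.

```python
from statistics import median, stdev

def normalize(counts):
    """Scale per-exon read counts by the sample's total so samples are comparable."""
    total = sum(counts)
    return [c / total for c in counts]

def depth_ratios(exons, test_counts, control_counts_list):
    """For each exon, return (name, ratio, z), where ratio is test depth over the
    control median and z is its distance from 1.0 in units of control spread."""
    test = normalize(test_counts)
    controls = [normalize(c) for c in control_counts_list]
    results = []
    for i, name in enumerate(exons):
        control_values = [c[i] for c in controls]
        reference = median(control_values)            # pooled "average" control
        ratio = test[i] / reference                   # ~0.5 het deletion, ~0 hom deletion
        spread = stdev(v / reference for v in control_values)
        z = (ratio - 1.0) / spread if spread > 0 else float("nan")
        results.append((name, ratio, z))
    return results

# Hypothetical read counts; exon3 has about half the expected depth in the test.
exons = ["exon1", "exon2", "exon3", "exon4", "exon5"]
controls = [
    [500, 800, 60, 700, 600],
    [520, 790, 63, 690, 610],
    [490, 810, 58, 710, 590],
    [510, 805, 61, 695, 605],
]
test = [505, 798, 31, 702, 598]

results = depth_ratios(exons, test, controls)
flagged = [name for name, ratio, z in results if abs(z) > 3]  # arbitrary cutoff
for name, ratio, z in results:
    print(f"{name}: ratio = {ratio:.2f}, z = {z:.1f}")
```

With these toy numbers only exon3 stands out, at a ratio close to the 50% expected for a heterozygous deletion. The control median plays the role of the "average" control asked about in the question, and the control spread is what tells you whether a 115% or 85% region is just normal pipeline noise.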

ADD REPLY • link updated 2.4 years ago by Ram 43k • written 9.8 years ago by karl.stamm 4.1k

Ram · Answer 3 · 2014-07-17

1

9.8 years ago

Kizuna ▴ 870

Hi Mariana,

You can find the figure here

To be honest, I used CONTRA, but I was not highly satisfied.

A genetically solved sample is a sample where you have detected the mutations that cause the phenotype. Thus the cause of the disease is no longer unknown.

P.S.: to my knowledge, a deletion of one exon is considered a CNV.

ADD COMMENT • link updated 2.4 years ago by Ram 43k • written 9.8 years ago by Kizuna ▴ 870

Ram · Answer 4 · 2014-07-18

1

9.8 years ago

Charles Warden 8.2k

If you have a batch of at least a dozen exomes, I would recommend CoNIFER.

XHMM would be another similar, popular option.

If you only have a single tumor-normal pair, you could try the VarScan somatic copynumber caller, but I tend to prefer using that for larger indels (although some larger exons, or clusters of nearby exons, might be OK).

ADD COMMENT • link updated 4.5 years ago by Ram 43k • written 9.8 years ago by Charles Warden 8.2k

0

I've found CoNIFER pretty easy to work with.

ADD REPLY • link 9.8 years ago by DG 7.3k

0

It was pretty easy to work with, but for me it produced bad results. At conservative settings (higher SVD number) we didn't see any interesting CNV predictions. At a setting low enough to see an interesting gene impacted, we had many false positives, and the one call we tested did not validate by TaqMan. Zero for one is not much evidence, and I admit I had to tune the sensitivity to see a gene of interest among ~50 samples.

Do you know whether anyone has really evaluated the accuracy of CoNIFER or similar tools?

ADD REPLY • link updated 2.4 years ago by Ram 43k • written 9.8 years ago by karl.stamm 4.1k

0

I also had the same problem with CoNIFER.

This article compared some of the available CNV detection tools (CoNIFER is among them): de Ligt J et al., "Detection of clinically relevant copy number variants with whole exome sequencing" (PMID: 23893877).

ADD REPLY • link updated 2.4 years ago by Ram 43k • written 9.8 years ago by Kizuna ▴ 870

0

Yeah, I don't think this is a problem unique to CoNIFER. CNV calling on exome data is tough for a whole host of reasons. I've admittedly only done some work in the area but enough to know that it is challenging to do much with. Especially if you don't have a large enough set of samples to use as controls.

ADD REPLY • link 9.8 years ago by DG 7.3k

0

I agree with Dan. I think the false negative rate may be high, but coverage-based analysis is tricky in general, and I think the CoNIFER false positive rate is relatively low. For what it's worth, it was able to recover all already validated deletions with SV=2 (albeit with only a few already validated deletions) in one case where that information was available. I don't think you need to use SV > 2 (so, just look at SV=1 or 2).

You can also export your results (preferably for SV = 0, 1, or 2) to DNAcopy to increase sensitivity, but I think this also comes with an increase in the false positive rate.

ADD REPLY • link updated 2.4 years ago by Ram 43k • written 9.8 years ago by Charles Warden 8.2k

Ram · Answer 5 · 2014-07-18

0

9.8 years ago

DG 7.3k

Another tool option, published by some collaborators, is FishingCNV.

ADD COMMENT • link updated 2.4 years ago by Ram 43k • written 9.8 years ago by DG 7.3k