Question: Overlap of variants between CaVEMan and MuTect2 ?
badredda60 wrote, 10 months ago (France/Lille/EGID):

Hello everyone,

I want to discuss with the Biostars community and the Exome-seq specialists two well-known somatic callers: MuTect2 and CaVEMan. As far as I know, there haven't been any papers comparing the two tools, but I have heard people claim that the overlap is at least 80%. In my analysis, I found an overlap of only 20%. So, I would really appreciate any feedback on this topic or any help you could provide.

Background information on Exome-seq: Exome-seq (also called Whole Exome Sequencing, or WES), in short, allows one to discover mutations in the exons of the sequenced DNA using different statistical approaches, for example by comparing a matched normal and tumor sample.

To judge the outcome better, I believe it is necessary to briefly describe the steps of my analysis.

  1. Samples:

TCGA: 4 patients matched normal/tumor in BAM format that have been co-cleaned by the TCGA.

My samples (I will call them the Cohort): 15 patients, matched normal/tumor, of which 3 are duplicates (18 samples total), in FastQ format. After alignment, cleaning, and PCR-duplicate removal, the samples have an average of 90 million reads, with around 95% of reads properly paired against the reference genome.

  2. Methods:

Following the GATK Best Practices, I produced the BAM files for the cohort. The reference genome for the cohort is hg19, and for the TCGA patients it is GRCh38.d1.vd1 (provided by the TCGA). Naturally, I only compare MuTect2 and CaVEMan results obtained on the same reference genome: the first comparison between the two tools is for my cohort and the second is for the TCGA samples.

After that, I produced a Panel of Normals (PON). A PON is an optional input that helps filter out commonly seen sequencing noise, which can appear as low allele-fraction somatic variants shared between normal samples.
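To make the PON idea concrete, here is a minimal sketch of the filtering logic, not MuTect2's actual implementation: variants are keyed by (chrom, pos, ref, alt), and the `min_normals` threshold of 2 is an illustrative choice, not a documented default.

```python
# Minimal sketch of the PON idea (not MuTect2's actual implementation):
# a candidate somatic variant recurring in several unrelated normals is
# likely systematic sequencing noise and gets filtered out.

def build_pon(normal_callsets, min_normals=2):
    """Collect variant keys seen in at least `min_normals` normal samples."""
    counts = {}
    for callset in normal_callsets:
        for key in set(callset):  # key = (chrom, pos, ref, alt)
            counts[key] = counts.get(key, 0) + 1
    return {k for k, n in counts.items() if n >= min_normals}

def filter_with_pon(tumor_calls, pon):
    """Drop tumor candidates that recur in the panel of normals."""
    return [v for v in tumor_calls if v not in pon]
```

The design point is simply that a true somatic variant should be private to the tumor, so anything recurring across unrelated normals is treated as an artifact.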

For CaVEMan, I ran the pipeline described on their page using the default values; copy-number values were left at the defaults suggested by the authors. MuTect2 was also run with defaults: COSMIC and dbSNP were provided, and a PON was given for the cohort only, not for the TCGA samples. Additionally, the contamination fraction was set to 0.02.
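For reference, the MuTect2 invocation for one tumor/normal pair under this setup would look roughly like the sketch below. The argument names are my recollection of the GATK 3.x documentation, so treat them as assumptions and check `--help` on your exact version; all file paths are placeholders.

```python
# Sketch of a GATK 3.x MuTect2 command line for one tumor/normal pair.
# Argument names are assumed from GATK 3 docs -- verify on your version.
# All file paths are placeholders.

def mutect2_cmd(tumor_bam, normal_bam, ref, dbsnp, cosmic, out_vcf,
                pon=None, contamination=0.02):
    cmd = [
        "java", "-jar", "GenomeAnalysisTK.jar",
        "-T", "MuTect2",
        "-R", ref,
        "-I:tumor", tumor_bam,
        "-I:normal", normal_bam,
        "--dbsnp", dbsnp,
        "--cosmic", cosmic,
        "--contamination_fraction_to_filter", str(contamination),
        "-o", out_vcf,
    ]
    if pon is not None:  # PON given for the cohort only, not for TCGA
        cmd += ["--normal_panel", pon]
    return cmd
```

Building the command as a list (rather than a shell string) keeps filenames with spaces safe if it is later passed to `subprocess.run`.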

For overlapping, I used VCFtools, more precisely vcf-compare and vcf-isec. Also, CaVEMan produces a VCF of version 4.1 while that of MuTect2 is 4.2, so I converted 4.1 to 4.2 using vcf-convert.
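Independently of vcf-compare, an overlap number can be sanity-checked with a few lines of Python, assuming each call set is reduced to (CHROM, POS, REF, ALT) keys. This is a simplification: it ignores indel left-alignment and representation differences between callers, which can themselves deflate the apparent overlap.

```python
# Sanity-check an overlap percentage from two call sets reduced to
# (CHROM, POS, REF, ALT) keys. Ignores indel normalization, which can
# itself deflate the apparent overlap between callers.

def parse_vcf_keys(lines):
    """Extract (CHROM, POS, REF, ALT) keys from VCF body lines."""
    keys = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos, _id, ref, alt = line.split("\t")[:5]
        for a in alt.split(","):  # split multi-allelic records
            keys.add((chrom, int(pos), ref, a))
    return keys

def overlap_stats(a, b):
    """Return (|intersection|, |union|, Jaccard %) for two variant-key sets."""
    inter, union = a & b, a | b
    jaccard = 100.0 * len(inter) / len(union) if union else 0.0
    return len(inter), len(union), jaccard
```

Note that "overlap" itself is ambiguous: intersection over union (as here) is always lower than intersection over the smaller call set, so it is worth stating which definition a reported percentage uses.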

Since the TCGA samples have been already co-cleaned following the same protocol as above, I proceeded to analyze all the samples using MuTect2 (GATK3.7) and CaVEMan(v1.11.3).

  3. Results:

    • Overlap between the same patients from TCGA and my re-analysis

In this section, I show the overlap for the same patients between the TCGA analysis and my re-analysis:

[Image: per-sample overlap tables for Samples 1-4 (TCGA analysis vs. re-analysis)]

As you can see, in most cases there is around 90% overlap, which is fine. In my opinion, the remaining difference could be because the TCGA analysis called mutations on the whole DNA without restricting to exon intervals, unlike my analysis. Additionally, I did not give a PON for this analysis, in contrast to them.

  • Overlap between MuTect2 and CaVEMan for the TCGA patients

[Image: per-sample overlap tables for Samples 1-4 (MuTect2 vs. CaVEMan, TCGA patients)]

As you can see, there is a poor overlap between both tools for the TCGA patients.

  • Overlap between MuTect2 and CaVEMan for the cohort

[Image: overlap diagrams for the cohort (MuTect2 vs. CaVEMan)]

Again, for my cohort the overlap is around 20%.

  4. Opinion

Given the good sequencing depth of the samples, one would expect both callers to call mutations correctly, with high overlap. But in my analysis I only get 20%, which does not seem logical. I understand that the parameters, or each tool's algorithm, could affect the outcome, but such a low overlap is a "scary" thought, because most research on Exome-seq bases its interpretation on this kind of analysis. I haven't done any post-filtering of the raw mutations from CaVEMan and MuTect2, and I am not sure where this difference could be coming from. I would highly appreciate your feedback on this topic, or any data you could share that would help clear things up. If you need any other information, let me know.

Thanks in advance!

--Alaa

dariober9.9k wrote, 10 months ago (Glasgow - UK):

Hi- some random thoughts...

the overlap goes to at least 80%

I think that depends a lot on the particular sample(s) you are looking at. If you have variants at high allele frequency, then both methods should pick them up. As you go towards lower allele frequencies, the discrepancy inevitably increases. I did a quick & dirty comparison between MuTect2 and Strelka for a number of samples (all processed in the same way) and found a wide range of overlaps between the two methods, from ~80% to <10%, with the lower overlaps in samples with low tumour allele frequencies.
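A back-of-the-envelope calculation illustrates this point. Purely for illustration, assume 100x depth, a caller that needs at least 4 alt-supporting reads, and two callers behaving as independent detectors with the same sensitivity d; the expected Jaccard overlap of their call sets is then d/(2-d). None of these numbers comes from MuTect2 or CaVEMan, and real callers see the same reads so their errors correlate, but the qualitative collapse at low allele frequency holds:

```python
# Toy model: sensitivity vs. allele frequency under Binomial read sampling,
# and the Jaccard overlap of two INDEPENDENT detectors with sensitivity d.
# Depth and min_alt are illustrative, not any caller's actual settings.
from math import comb

def detect_prob(vaf, depth=100, min_alt=4):
    """P(at least `min_alt` alt reads) under a Binomial(depth, vaf) model."""
    return 1.0 - sum(comb(depth, k) * vaf**k * (1 - vaf)**(depth - k)
                     for k in range(min_alt))

def expected_jaccard(d):
    """Jaccard of two independent detectors with sensitivity d:
    P(both call) / P(at least one calls) = d^2 / (2d - d^2) = d / (2 - d)."""
    return d / (2.0 - d) if d > 0 else 0.0

for vaf in (0.40, 0.20, 0.10, 0.05):
    d = detect_prob(vaf)
    print(f"VAF {vaf:.2f}: sensitivity {d:.3f}, expected overlap {expected_jaccard(d):.3f}")
```

Even in this optimistic error-free model, overlap falls well below 100% once sensitivity dips, so a sample dominated by low-VAF variants will show poor concordance between any two callers.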

It seems to me that method papers often compare just a few samples with different methods (citation needed) so it's difficult to draw general conclusions.

In general, I would keep in mind that for practical reasons we dichotomize genomic positions into those that are variant between tumour and normal and those that are invariant. In fact, there's a continuum between the two categories. Ideally, one would attach to each genomic position a probability distribution for that position being a variant (in practice, most positions would have this distribution squashed around 0). Then, the concordance between methods would be based on the overlap between distributions.

The point is that variants close to the decision boundary may be called by one method and discarded by another, even if both methods agree in saying "this variant is close to the boundary". In these cases, changing the filtering thresholds a bit could result in a large difference in the number of calls. This in turn means that it's difficult to disentangle a method per se from the exact thresholds you decide to use (and often these are just the authors' recommendations).

In other words, high-frequency variants called by only one method may suggest a problem with one of the methods, but low-frequency variants cannot tell you much (and low-frequency variants are probably the majority). The problem with Venn diagrams, I think, is that they put all the variants in the same set, regardless of their confidence.
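One concrete alternative to a single Venn diagram is to stratify the comparison by allele frequency, so that high-confidence and borderline calls are not lumped together. A minimal sketch, assuming each call is a (key, VAF) pair and the bin edges are arbitrary; in practice one would bin on a single caller's VAF (or the mean of both), since the two callers may report slightly different frequencies for the same site.

```python
# Stratify caller concordance by allele-frequency bin instead of reporting
# a single Venn diagram. Calls are (key, vaf) pairs; bin edges are arbitrary.

def overlap_by_vaf(calls_a, calls_b, edges=(0.0, 0.05, 0.10, 0.25, 1.01)):
    """Per VAF bin, return (intersection, union, Jaccard % or None if empty)."""
    result = {}
    for lo, hi in zip(edges, edges[1:]):
        a = {k for k, vaf in calls_a if lo <= vaf < hi}
        b = {k for k, vaf in calls_b if lo <= vaf < hi}
        union = a | b
        result[(lo, hi)] = (len(a & b), len(union),
                            100.0 * len(a & b) / len(union) if union else None)
    return result
```

A table of per-bin overlaps makes it immediately visible whether the discordance is concentrated at low VAF (expected) or also present at high VAF (worrying).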

Just some thoughts... I'm also interested in other people's opinions.


Thank you for your feedback dariober. I will get back to you soon with news concerning the venn diagrams. I plan to take the overlapped variants and perform an annotation to see if the same genes are impacted.

Edit: Actually, comparing the mutated genes wouldn't mean much. Do you have any idea other than Venn diagrams to propose for the comparison?

badredda60 wrote, 10 months ago:

Sorry to bump, I am still awaiting more opinions from people on this topic. Thanks !
