MuTect is giving strikingly different results
7.8 years ago
haiying.kong ▴ 360

My colleague identified a set of somatic mutations with MuTect for a project at the end of 2013. I have now worked on exactly the same data and identified somatic mutations myself. I want to emphasize that we used the same software for preprocessing (BWA, Picard, GATK) and for somatic mutation identification (MuTect). Both of us used mostly default parameters and the same stringent criteria for MuTect. Yet the results we got are strikingly different. We did use different versions of the software, but the results differ too much to be explained by version differences alone. Here are the numbers of mutations we each identified:

Patient      N_MyColleague   N_Me   N_Intersect
B62047_N018  23              307    12
B62524_N029  234             309    164
B64433_N058  9               154    2
B68756_N045  0               347    0
B79632_N011  5               156    2
B82772_N030  63              133    45
B84442_N007  5               108    4
B88098_N009  64              557    45
B91397_N016  32              327    23

Can anyone please explain how this is possible?
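
For concreteness, here is a minimal sketch of how two such call sets can be intersected, matching calls on (CHROM, POS, REF, ALT). The file names are hypothetical, and matching on position and alleles alone ignores representation differences such as indel normalization:

    # Count the calls in each VCF and their intersection, keyed on
    # (CHROM, POS, REF, ALT). Adjust the FILTER values kept below to
    # your caller's conventions (e.g. MuTect marks kept calls KEEP).
    def load_calls(vcf_path):
        """Return the set of (chrom, pos, ref, alt) keys for passing calls."""
        calls = set()
        with open(vcf_path) as fh:
            for line in fh:
                if line.startswith("#"):
                    continue  # skip meta-information and header lines
                f = line.rstrip("\n").split("\t")
                if f[6] in ("PASS", "KEEP", "."):
                    calls.add((f[0], f[1], f[3], f[4]))
        return calls

    colleague = load_calls("B62047_N018.colleague.vcf")  # hypothetical paths
    mine = load_calls("B62047_N018.mine.vcf")
    print(len(colleague), len(mine), len(colleague & mine))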

software error
7.8 years ago
poisonAlien ★ 3.2k

Hi,

If either of you has used MuTect2, be careful: it is still in beta and known to produce false positives.

MuTect2 has not yet undergone the same degree of scrutiny and validation as the original MuTect since it is so new. Early validation results suggest that MuTect2 has a tendency to generate more false positives as compared to the original MuTect; for example, it seems to overcall somatic mutations at low allele frequencies, so for now we recommend applying post-processing filters, e.g. by hard-filtering calls with low minor allele frequencies.

The Broad still suggests using the earlier MuTect until this version becomes fully stable. More info on this here.
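
To make the recommended post-processing concrete, here is a minimal sketch of such a hard filter. It assumes the caller writes a per-sample allele fraction as FORMAT/AF (as recent MuTect2 versions do) and that the tumor is the first sample column; the threshold is illustrative, not an official recommendation:

    MIN_AF = 0.05  # illustrative allele-fraction cutoff

    with open("somatic.vcf") as src, open("somatic.filtered.vcf", "w") as dst:
        for line in src:
            if line.startswith("#"):
                dst.write(line)  # keep all header lines unchanged
                continue
            fields = line.rstrip("\n").split("\t")
            fmt_keys = fields[8].split(":")
            tumor = dict(zip(fmt_keys, fields[9].split(":")))  # first sample
            if "AF" in tumor and float(tumor["AF"].split(",")[0]) < MIN_AF:
                continue  # hard-filter: drop low-allele-fraction calls
            dst.write(line)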

7.8 years ago
H.Hasani ▴ 990

Hi,

Well, as you already mentioned, different software versions can make results hard to reproduce. I once read a practical review in which changing only the R version changed the results entirely, with no overlap at all between the old results and the new ones.

Long story short, in your place I would not rule out this factor yet. Instead, I would run a deeper analysis to find the point in the pipeline where your results start to deviate from the known ones, i.e. re-run exactly the same analysis your colleague did, but with tighter control over the software versions. Only then can you tell for sure how big an effect this factor has on your results.
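
As one way to get that tighter control, here is a minimal sketch of logging the exact tool versions before each run (the version flags follow each tool's usual convention, but check them against your installations):

    import subprocess

    # Commands that make each tool report its version; bwa prints its
    # version in the usage text on stderr, and java -version also
    # writes to stderr, so both streams are captured below.
    TOOLS = {
        "bwa": ["bwa"],
        "java": ["java", "-version"],
        "samtools": ["samtools", "--version"],
    }

    with open("versions.log", "w") as log:
        for name, cmd in TOOLS.items():
            out = subprocess.run(cmd, capture_output=True, text=True)
            log.write(f"== {name} ==\n{out.stdout or out.stderr}\n")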

hth
