Question

MuTect Seems to Miss a Lot of Mutations

Posted 8.3 years ago by haiying.kong

I have whole-exome data for paired blood and tumor samples from 23 patients. I processed them with BWA (alignment), Picard (duplicate removal, sorting) and GATK (local realignment, base quality score recalibration, etc.), and then ran MuTect in STD mode to identify somatic mutations.

I also have Sanger sequencing results for 3 genes.

I compared the whole-exome results with the Sanger results and found that some somatic mutations identified by Sanger sequencing do not appear in the whole-exome calls. I suspect I need to change some parameters when I run MuTect. Which parameters can I adjust?

I have attached some IGV screenshots of regions where MuTect missed somatic mutations. Some of them are very obvious, and I cannot understand why MuTect does not call them as mutations. The BAM files loaded into IGV are the output of GATK base quality score recalibration.

Thank you very much for any advice, suggestions, or hints.

Tag: SNP

Last edited 20 months ago by Ram.

What were your command-line parameters? The default mode, especially with a matched tumour-normal pair, is High Confidence, not STD. To run in STD mode you typically need to either turn off a bunch of filters or run with the --artifact_detection_mode flag. In STD mode you should pick up pretty much everything, plus tons of false positives.

Reply by DG, 8.3 years ago.

This is the line for MuTect in my bash script:

/usr/bin/java -Xmx4g -Djava.io.tmpdir=tmp -jar ${MuTect}mutect-1.1.7.jar \
                --artifact_detection_mode --analysis_type MuTect \
                --reference_sequence ${GATK_hg} --cosmic ${COSMIC} \
                --dbsnp ${dbSNP} --intervals ${Intervals} \
                --input_file:normal ${BQSR_dir}${Normal}.recal.bam \
                --input_file:tumor ${BQSR_dir}${Tumor}.recal.bam \
                --out ${MuTect_dir}${Tumor}_${Normal}_call_stats.out \
                --coverage_file ${MuTect_dir}${Tumor}_${Normal}_coverage.wig.txt \
                --vcf ${MuTect_dir}${Tumor}_${Normal}.vcf

Thank you so much for your quick reply.

Reply by haiying.kong, 8.3 years ago (last edited 4.3 years ago by Ram).

Hmm, I'm not sure. Try calling just with the tumour sample in artifact_detection_mode and see whether the variant is picked up at all that way. I am assuming you are looking at the raw VCF and not just at a downstream analysis, where variants with a filter flag set might not be reported?
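
To check whether a Sanger-confirmed site appears in the raw VCF at all, whatever its FILTER value, something like the sketch below could be used. It assumes an uncompressed VCF; the function name `vcf_near` and its optional window argument (default ±1 bp, to catch off-by-one coordinate mismatches) are made up for illustration. It prints CHROM, POS, REF, ALT and FILTER for every matching record.

```shell
# Hypothetical helper: print CHROM, POS, REF, ALT and FILTER for every VCF
# record within WINDOW bp of POS on CHROM, whatever the FILTER value.
# Usage: vcf_near FILE.vcf CHROM POS [WINDOW]   (WINDOW defaults to 1)
# Assumes a plain-text (not bgzipped) VCF.
vcf_near() {
  local vcf=$1 chrom=$2 pos=$3 window=${4:-1}
  awk -v c="$chrom" -v p="$pos" -v w="$window" '
    BEGIN { FS = OFS = "\t" }
    /^#/ { next }                                  # skip meta and header lines
    $1 == c && $2 >= p - w && $2 <= p + w { print $1, $2, $4, $5, $7 }
  ' "$vcf"
}
```

For example, `vcf_near sample.vcf chr1 123456 2` (file name and position are placeholders) lists every record from chr1:123454 to chr1:123458, including filtered ones.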

Reply by DG, 8.3 years ago.

Yes, I am looking at the VCF file, and only at the mutations tagged as PASS. Which file should I look at?

How do I do this "calling just with the tumour sample in artifact_detection_mode"?

Thank you so much for your help.

Reply by haiying.kong, 8.3 years ago (last edited 4.3 years ago by Ram).

Just pass a single input file with -I:tumor [tumor.bam] and leave out the parameter for the normal sample.
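
Applied to the command posted above, a tumour-only run is the same invocation with the `--input_file:normal` line dropped. The sketch below wraps that in a hypothetical shell function; it reuses the variables from that script, calls `java` from the PATH rather than `/usr/bin/java`, leaves out the coverage file for brevity, and uses invented `_tumor_only` output names so the paired-run files are not overwritten.

```shell
# Hypothetical helper: tumour-only MuTect 1.1.7 run, reusing the variables
# (MuTect, GATK_hg, COSMIC, dbSNP, Intervals, BQSR_dir, MuTect_dir) from the
# paired-run script. Only the tumour BAM is passed; no --input_file:normal.
run_mutect_tumor_only() {
  local tumor=$1   # sample name, as in ${Tumor}
  java -Xmx4g -Djava.io.tmpdir=tmp -jar "${MuTect}mutect-1.1.7.jar" \
    --analysis_type MuTect --artifact_detection_mode \
    --reference_sequence "${GATK_hg}" --cosmic "${COSMIC}" \
    --dbsnp "${dbSNP}" --intervals "${Intervals}" \
    --input_file:tumor "${BQSR_dir}${tumor}.recal.bam" \
    --out "${MuTect_dir}${tumor}_tumor_only_call_stats.out" \
    --vcf "${MuTect_dir}${tumor}_tumor_only.vcf"
}
```

It would be called as `run_mutect_tumor_only "${Tumor}"` inside the same script.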

Reply by DG, 8.3 years ago.

Answer 1 · posted 2016-01-05 by haiying.kong

I made a mistake, and this could explain part of the results.

I think MuTect did pick up the mutations in the first 3 figures, but the genomic positions differ from the Sanger positions by 1 base, which is why I thought they had been missed.

But for some of the others, MuTect did not call them as mutations even though Sanger did, and some times there are mutations in more than 2 fragments or 2 coverage.

I understand that tumor tissue is highly heterogeneous, and it is possible that a mutation is present in only a very small fraction of the tumor cells.

In any case, is there a parameter we can use to set the threshold for deciding whether a variant is somatic?
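
To make the heterogeneity point concrete, a simple binomial sampling model (not MuTect's own likelihood model) gives the chance of seeing at least k variant-supporting reads at depth d when the true variant allele fraction is f. The function name `p_at_least` is hypothetical, and it assumes 0 < f < 1.

```shell
# Hypothetical helper: P(X >= k) for X ~ Binomial(d, f), i.e. the chance of
# at least k alt reads at depth d with variant allele fraction f (0 < f < 1).
# Usage: p_at_least K DEPTH FRACTION
p_at_least() {
  awk -v k="$1" -v d="$2" -v f="$3" 'BEGIN {
    p = 0
    term = (1 - f) ^ d                             # P(X = 0)
    for (i = 0; i < k; i++) {                      # accumulate P(X < k)
      p += term
      term *= (d - i) / (i + 1) * f / (1 - f)      # P(X = i+1) from P(X = i)
    }
    q = 1 - p
    if (q < 0) q = 0                               # guard against rounding
    printf "%.4f\n", q
  }'
}
```

For example, `p_at_least 3 40 0.05` estimates how often a variant carried by 5% of the reads at 40x depth would show 3 or more supporting reads; low values mean such a variant can easily look like noise in the exome data even when Sanger picks it up.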

Yes, remember that different genomic file formats use different coordinate conventions: 0-based versus 1-based counting.
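
As an illustration, BED intervals are 0-based and half-open while VCF positions are 1-based, so an SNV at 1-based position 100 appears in BED as start 99, end 100. If one of the position lists came from a 0-based format such as BED, a minimal awk sketch like this one shifts the starts to 1-based coordinates (the function name is hypothetical, and it assumes plain tab-separated BED without track or browser header lines):

```shell
# Hypothetical helper: convert 0-based, half-open BED starts to 1-based,
# closed coordinates (the convention VCF uses). Extra columns are kept.
# Reads the named files, or stdin if none are given.
bed_to_one_based() {
  awk 'BEGIN { FS = OFS = "\t" } { $2 = $2 + 1; print }' "$@"
}
```

For example, `printf 'chr1\t99\t100\n' | bed_to_one_based` turns the BED record for that SNV into `chr1 100 100`.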

I'm not sure what you mean though by "some times there are mutations in more than 2 fragments or 2 coverage"?

Also, a suggestion: to update your question, you can edit your original post and add the new information at the bottom. You could also add it as a comment underneath, but you shouldn't submit new information as an answer. Answers should be reserved for actual answers to the question.

Reply by DG, 8.3 years ago.