Post-filtering WES for mutation calling
2.3 years ago
graeme.thorn ▴ 50

I have some whole exome sequencing data from cell-free DNA (at 500X or 1000X coverage) and have called mutations using the GATK pipeline with Mutect2. However, alongside all the exonic mutations we expect, the output also lists mutations that are not in the exons.

Examining the original BAM file from mapping the reads, we see lots of off-target reads from introns and intergenic regions. Depth of coverage in some off-target regions is comparable to coverage at some exons, so a hard depth filter would remove the true positives at those exons as well as the false positives from introns or intergenic regions.

I'm not entirely au fait with Mutect2, so I don't know whether the algorithm includes a normalisation step. If it does, its results might change if the off-target reads are removed before calling variants (the effective library size would change, although there would still be >200X coverage at the exons). That would argue for filtering the final VCF instead, but I'm not sure what best practice is in this case. As mentioned above, a hard filter on depth would remove some of the mutations we want to detect, since they are likely to be at low frequency.

The question is: at which stage in the pipeline should the off-target reads be removed? Before calling the variants or once the VCF is produced?

mutect2 wxs wes • 738 views
2.3 years ago
bruce.moran ▴ 880

Most GATK tools accept the --intervals (-L) argument, which takes a file of the regions you are interested in, i.e. your exome targets. That is the stage at which I usually restrict the analysis to the region of interest.

My rationale is that some copy number callers make use of off-target reads from exome data, so removing them from the BAM may reduce the quality of those calls. More broadly, the off-target reads may be of use to others if and when the data are shared, so keeping everything that aligns in the BAM is probably good practice.
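A minimal sketch of restricting calling with -L while leaving the BAM untouched, written as a Python helper that only builds the command line. The file names are placeholders, the helper name is hypothetical, and it assumes tumour-only mode; -R, -I, -L, -O and --interval-padding are standard GATK4 arguments, but check them against the docs for your GATK version.

```python
import shlex
from typing import List


def mutect2_command(reference: str, bam: str, intervals: str,
                    output_vcf: str, padding: int = 0) -> List[str]:
    """Build (but do not run) a tumour-only Mutect2 command limited to targets.

    Hypothetical helper: all paths are placeholders. The BAM keeps its
    off-target reads; only the calling step is restricted by -L.
    """
    cmd = [
        "gatk", "Mutect2",
        "-R", reference,      # reference FASTA
        "-I", bam,            # full BAM, including off-target reads
        "-L", intervals,      # BED or .interval_list of exome capture targets
        "-O", output_vcf,
    ]
    if padding > 0:
        # Optionally extend each target so variants at exon edges are kept
        cmd += ["--interval-padding", str(padding)]
    return cmd


if __name__ == "__main__":
    cmd = mutect2_command("ref.fasta", "sample.bam", "exome_targets.bed",
                          "sample.vcf.gz", padding=100)
    print(shlex.join(cmd))  # paste into a shell or pass to subprocess.run
```

Building the argument list rather than a single string avoids shell-quoting problems if it is later handed to subprocess.run.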


Moved to answer. I agree: you should generally keep everything that has aligned until the end, and only filter the final output you obtain (e.g. the VCF).
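If you filter at the end instead, the key point from the question is to filter the VCF by region rather than by depth, so low-frequency exonic calls survive. Below is a minimal sketch, assuming an uncompressed VCF and a BED file of capture targets (0-based, half-open coordinates). The function names are hypothetical, and only each record's POS is checked, so an indel starting just outside a target is dropped. In a real pipeline, established tools such as gatk SelectVariants with -L are probably preferable.

```python
import bisect
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Index = Dict[str, Tuple[List[int], List[int]]]


def load_bed(lines: Iterable[str]) -> Index:
    """Read BED lines into sorted, merged (starts, ends) lists per chromosome."""
    raw = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        raw[chrom].append((int(start), int(end)))
    index: Index = {}
    for chrom, intervals in raw.items():
        intervals.sort()
        merged: List[List[int]] = []
        for s, e in intervals:
            if merged and s <= merged[-1][1]:
                merged[-1][1] = max(merged[-1][1], e)  # overlapping/adjacent: extend
            else:
                merged.append([s, e])
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def on_target(index: Index, chrom: str, pos1: int) -> bool:
    """True if 1-based VCF position pos1 lies inside a 0-based half-open target."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    pos0 = pos1 - 1
    i = bisect.bisect_right(starts, pos0) - 1  # last target starting at or before pos0
    return i >= 0 and pos0 < ends[i]


def filter_vcf(lines: Iterable[str], index: Index) -> Iterator[str]:
    """Yield header lines unchanged and only the records that fall on target."""
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        chrom, pos, _ = line.split("\t", 2)
        if on_target(index, chrom, int(pos)):
            yield line


if __name__ == "__main__":
    bed = ["chr1\t100\t200\n", "chr1\t150\t300\n"]
    vcf = ["#CHROM\tPOS\tID\tREF\tALT\n",
           "chr1\t101\t.\tA\tT\n", "chr1\t301\t.\tG\tC\n"]
    print("".join(filter_vcf(vcf, load_bed(bed))), end="")
```

No depth threshold is applied here, so a 2% allele-frequency call inside an exon is kept while an intronic call with the same depth is removed.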
