Question: Post-filtering WES for mutation calling
Asked 9 days ago by graeme.thorn (London, United Kingdom):

I have some whole exome sequencing data from cell-free DNA (at 500X or 1000X coverage) and have called mutations using the GATK pipeline with Mutect2. However, alongside all the exonic mutations we expect, the results include mutations that lie outside the exons.

Examining the original BAM file from read mapping, we see many off-target reads from introns and intergenic regions. The depth of coverage in some off-target regions is comparable to the coverage at some exons, so a hard depth filter would remove the true positives at those exons along with the false positives from introns and intergenic regions.
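As a rough way to put a number on "lots of off-target reads", you can compute the fraction of reads that start inside the capture targets. This is a plain-Python sketch under stated assumptions: the targets are held in a hypothetical dict of sorted, non-overlapping, 0-based half-open (BED-style) intervals, and each read is reduced to a (chrom, 0-based start) pair. Reading a real BAM would need a library such as pysam, and Picard's CollectHsMetrics already reports on- and off-target metrics.

```python
from bisect import bisect_right


def in_targets(targets, chrom, pos0):
    """Return True if 0-based position pos0 on chrom falls inside a target.

    targets maps chrom -> sorted, non-overlapping list of (start, end)
    BED-style intervals (0-based, half-open). This layout is an assumption
    made for the sketch.
    """
    spans = targets.get(chrom, [])
    # Index of the last interval whose start is <= pos0
    i = bisect_right(spans, (pos0, float("inf"))) - 1
    return i >= 0 and pos0 < spans[i][1]


def on_target_fraction(read_starts, targets):
    """Fraction of reads whose (chrom, 0-based start) lies in a target interval."""
    if not read_starts:
        return 0.0
    hits = sum(in_targets(targets, chrom, pos) for chrom, pos in read_starts)
    return hits / len(read_starts)


if __name__ == "__main__":
    # Hypothetical toy data: two "exons" on chr1, reads inside and outside them
    targets = {"chr1": [(100, 200), (500, 650)]}
    reads = [("chr1", 150), ("chr1", 300), ("chr1", 600), ("chr2", 50)]
    print(on_target_fraction(reads, targets))  # 0.5
```

A low fraction confirms that a large share of the sequencing effort went to off-target DNA, which is common with cfDNA libraries.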

I'm not entirely au fait with Mutect2, so I don't know whether the algorithm includes a normalisation step. If it does, its results might change if the off-target reads were removed before calling the variants (the effective library size would change, although there would still be >200X coverage at the exons). That would suggest filtering the final VCF instead, but I'm not sure what best practice is here. As mentioned above, a hard depth filter would remove some of the mutations we want to detect, as they are likely to be present at low frequency.

The question is: at which stage in the pipeline should the off-target reads be removed? Before calling the variants or once the VCF is produced?

Tags: mutect2, wes, wxs
Answered 9 days ago by bruce.moran (Ireland):

Most GATK tools have the --intervals (-L) argument, which lets you supply a file of the regions you are interested in, i.e. your exome targets. That is the stage at which I usually restrict the analysis to the regions of interest.
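To make the -L step a little more concrete: GATK's --interval-padding (-ip) argument can extend each interval so that variants just outside the capture boundaries (e.g. near splice sites) are still evaluated. The Python sketch below only illustrates the interval arithmetic of padding and merging a BED-style target list (0-based, half-open); it is not GATK's implementation, and the 100 bp padding in the example is an arbitrary, hypothetical choice.

```python
from collections import defaultdict


def pad_and_merge(intervals, padding=0):
    """Pad BED-style (chrom, start, end) intervals and merge any that overlap.

    Intervals are 0-based, half-open. Starts are clamped at 0; chromosome
    ends are not clamped because contig lengths are not known here.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((max(0, start - padding), end + padding))

    merged = {}
    for chrom, spans in by_chrom.items():
        spans.sort()
        out = [spans[0]]
        for start, end in spans[1:]:
            last_start, last_end = out[-1]
            if start <= last_end:  # overlapping or book-ended: extend the last one
                out[-1] = (last_start, max(last_end, end))
            else:
                out.append((start, end))
        merged[chrom] = out
    return merged


if __name__ == "__main__":
    # Hypothetical exon targets; the first two are only 150 bp apart
    exons = [("chr1", 100, 200), ("chr1", 350, 400), ("chr1", 1000, 1100)]
    print(pad_and_merge(exons, padding=100))
    # {'chr1': [(0, 500), (900, 1200)]}
```

Padding trades a slightly larger search space for not losing calls right at the edges of the baits.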

My rationale is that copy number callers can make use of the off-target reads in exome data, so removing them from the BAM may reduce the quality of those calls. In the bigger picture, the full BAM may be useful to others if/when the data are shared, so keeping everything that aligns in the BAM is probably good practice.
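The copy number point can be illustrated with the "antitarget" idea used by tools such as CNVkit, which place large bins in the gaps between capture targets and count the sparse off-target reads that land in them. The sketch below is a simplified, hypothetical version for a single contig: targets are sorted, non-overlapping, 0-based half-open intervals, reads are reduced to 0-based start positions, and `bin_size` and `margin` are illustrative parameters rather than any tool's defaults.

```python
from bisect import bisect_right


def antitarget_bins(targets, contig_length, bin_size, margin=0):
    """Split the gaps between targets into fixed-size off-target bins.

    A margin is kept clear on each side of every target so that reads from
    the capture flanks are not counted as off-target. Only full-size bins
    are kept; a shorter remainder at the end of a gap is dropped.
    """
    bins = []
    gap_start = 0
    flanks = [(start - margin, end + margin) for start, end in targets]
    # A sentinel at the contig end closes the final gap
    for flank_start, flank_end in flanks + [(contig_length, contig_length)]:
        gap_end = min(flank_start, contig_length)
        pos = gap_start
        while pos + bin_size <= gap_end:
            bins.append((pos, pos + bin_size))
            pos += bin_size
        gap_start = max(gap_start, flank_end)
    return bins


def count_reads(bins, read_starts):
    """Count how many read start positions fall into each bin."""
    counts = [0] * len(bins)
    starts = [start for start, _ in bins]
    for pos in read_starts:
        i = bisect_right(starts, pos) - 1
        if i >= 0 and pos < bins[i][1]:
            counts[i] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical contig of 10 kb with two targets
    bins = antitarget_bins([(1000, 1200), (5000, 5300)], 10000, 1000, margin=100)
    print(count_reads(bins, [50, 1500, 1550, 3000, 5100, 9000]))
    # [2, 1, 0, 0, 0, 0, 1]
```

Because each bin is large, even a thin layer of off-target reads gives usable counts, which is why those reads are worth keeping in the BAM.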


Reply by Kevin Blighe, 4 days ago:

Moved to answer. I agree: you should generally keep everything that has aligned until the end, and then filter only the final output (e.g. the VCF).
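Filtering the final VCF to the exome targets is essentially a coordinate lookup. In practice a dedicated tool such as bcftools (view -T) or GATK's SelectVariants with -L would normally be used; the plain-Python sketch below only shows the logic, assuming uncompressed VCF lines and a BED file whose intervals are already sorted and non-overlapping. Note the coordinate shift: VCF POS is 1-based, while BED starts are 0-based. Only POS is checked, so an indel that starts just before a target would be dropped.

```python
from bisect import bisect_right


def load_bed(lines):
    """Parse BED lines into chrom -> sorted list of (start, end), 0-based half-open.

    Assumes the intervals do not overlap (e.g. the BED was merged beforehand).
    """
    targets = {}
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split("\t")[:3]
        targets.setdefault(chrom, []).append((int(start), int(end)))
    for spans in targets.values():
        spans.sort()
    return targets


def filter_vcf(vcf_lines, targets):
    """Yield header lines plus records whose POS falls inside a target interval."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line
            continue
        chrom, pos = line.split("\t", 2)[:2]
        pos0 = int(pos) - 1  # VCF POS is 1-based; BED coordinates are 0-based
        spans = targets.get(chrom, [])
        i = bisect_right(spans, (pos0, float("inf"))) - 1
        if i >= 0 and pos0 < spans[i][1]:
            yield line


if __name__ == "__main__":
    # Hypothetical toy inputs
    bed = ["chr1\t100\t200", "chr1\t500\t650"]
    vcf = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT",
        "chr1\t150\t.\tA\tT",  # inside the first target: kept
        "chr1\t300\t.\tG\tC",  # between targets: dropped
        "chr1\t101\t.\tC\tG",  # first base of BED 100-200: kept
        "chr1\t100\t.\tC\tG",  # one base before the target: dropped
    ]
    for record in filter_vcf(vcf, load_bed(bed)):
        print(record)
```

Because this runs on the VCF only, the BAM keeps every aligned read, in line with the advice above.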