Question

CollectHsMetrics per_base_coverage results quirk

0

Entering edit mode

4.5 years ago

a.j.wilson0000 ▴ 10

Hey all,

I've recently been working with GATK's Mutect2 --mitochondria-mode for calling variants on chrM. I am testing the WDL Best Practices Pipeline on terra which employs a double alignment strat, along with a reverse engineered command line version for a single alignment strat.

Recently tested on a WES exome cram, as well as gatk's provided sample WGS bam, both aligned on broad's hg38 referenced genome. Both samples' seq probes are designed to capture mito content effectively.

Coverage per base data was collected using GATK's CollectHsMetrics tool, which is separate from Mutect2. I do not believe CollectHsMetrics can account for NuMTs like Mutect2 mitomode does, so no filters have really been applied to these datasets. This is simply bam/cram file and bait/target interval input with coverage per base output.

I noticed some odd quirks in the coverage tsvs and was hoping some of you could provide insight.

The gatk sample data, run on terra, shows a sharp dip in coverage at position 3106:

terra wgs coverage

The terra exome data and the gatk command line exome data both show the same dip at the same position, along with an unusual spike immediately proceeding it.

terra exome coverage

comline exome coverage

A colleague has suggested that the spike in our exome data can probably be attributed to the 16s rRNA gene, which has copies in the nuclear genome and which our seq kit has separate probes for, which inflates coverage.

But I'm still wondering what is causing this odd dip at 3106. If it was an error with the double alignment process, why is it showing up in the comline version?

Something to do with an inaccuracy in the hg38 reference genome maybe? Or perhaps a highly conserved deletion? Maybe a CollectHsMetrics tool error?

GATK Mitochondrial Variant Calling • 1.4k views

ADD COMMENT • link updated 4.5 years ago by houkto ▴ 80 • written 4.5 years ago by a.j.wilson0000 ▴ 10

score 1 · Answer 1 · 2019-11-01

Hi there,

I came across your post on biostars as well as GATK, I managed to run the WDL script locally so if you didn't do that yet I can share the code with you. In regard to your question there is artificial insertion of N at 3106 of the Mitochondria Cambridge reference (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4149221/) this should explain the dip.

I am still exploring the mitochondria sequencing using WGS and Exome so we can learn from each other. let me know if you need more information.