Question: GATK Terra Best Practices pipeline for Mito Variant Calling
1
gravatar for a.j.wilson0000
4 weeks ago by
a.j.wilson000010 wrote:

Hey all.

Currently working on a project to do mitochondrial variant calling on whole exome data. Our probes are about 120 bp and are setup to capture the entirety of the chrM contig at extremely high depth.

We're currently looking at a few different tools, and the new GATK best practices MUTECT2 mito pipeline that incorporates a double alignment strategy looks very promising. The thing is, the --mitochondria-mode tag is brand spanking new, and there just isn't a lot of documentation or usage examples for replicating the pipeline at the command line. I've tried contacting support a few times, but GATK support is quite understaffed at the moment.

Thinking instead that we might use their fully built TERRA cloud pipeline (found here) to generate our vcfs, but the pipeline docs indicate that the workflow is configured for full WGS bam/crams, not WES bam/crams.

People who are familiar with TERRA and variant calling... do you think it is possible to do WES on this workflow? What required inputs would need to change? I'm guessing a few of the interval lists?

Also, there are quite a few optional inputs you can use. Can anyone suggest some I might want to use besides setting a vaf_filter_threshold?

ADD COMMENTlink modified 4 weeks ago by Jeremy Leipzig18k • written 4 weeks ago by a.j.wilson000010

Before suggesting anything else, did you check if your data have a solid number of alignments to chrM? Typical genomic DNA extraction kits do capture circular mtDNA, neither do typical exome kits capture it out of the DNA pool. Samtools idxstats would be a fast way to check.

ADD REPLYlink modified 4 weeks ago • written 4 weeks ago by ATpoint24k
2
gravatar for Jeremy Leipzig
4 weeks ago by
Philadelphia, PA
Jeremy Leipzig18k wrote:

Currently working on a project to do mitochondrial variant calling on whole exome data. Our probes are about 120 bp and are setup to capture the entirety of the chrM contig at extremely high depth.

As I understand it a typical exome kit doesn't include mtDNA probes because the kit would be overwhelmed by the number of mtdna compared to nuclear dna and any attempt to include probes can also end up amplifying NUMTs. Maybe you have a better solution or more specific probes.

Thinking instead that we might use their fully built TERRA cloud pipeline (found here) to generate our vcfs, but the pipeline docs indicate that the workflow is configured for full WGS bam/crams, not WES bam/crams.

WGS doesn't suffer as much from the problems I mentioned above (though NUMTs are still a thing), so they just figured you would have done it this way. This first step in this TERRA WDL workflow just filters for chrM anyway. It's about as crude as you'd imagine and would behave the same for any bam file.

  task SubsetBamToChrM {
  input {
    File input_bam
    File input_bai
    String contig_name
    String basename = basename(basename(input_bam, ".cram"), ".bam")
    File? ref_fasta
    File? ref_fasta_index
    File? ref_dict

    File? gatk_override

    # runtime
    Int? preemptible_tries
  }
  Float ref_size = if defined(ref_fasta) then size(ref_fasta, "GB") + size(ref_fasta_index, "GB") + size(ref_dict, "GB") else 0
  Int disk_size = ceil(size(input_bam, "GB") + ref_size) + 20

  meta {
    description: "Subsets a whole genome bam to just Mitochondria reads"
  }
  parameter_meta {
    ref_fasta: "Reference is only required for cram input. If it is provided ref_fasta_index and ref_dict are also required."
    input_bam: {
      localization_optional: true
    }
    input_bai: {
      localization_optional: true
    }
  }
  command <<<
    set -e
    export GATK_LOCAL_JAR=~{default="/root/gatk.jar" gatk_override}

    gatk PrintReads \
      ~{"-R " + ref_fasta} \
      -L ~{contig_name} \
      --read-filter MateOnSameContigOrNoMappedMateReadFilter \
      --read-filter MateUnmappedAndUnmappedReadFilter \
      -I ~{input_bam} \
      -O ~{basename}.bam
  >>>
  runtime {
    memory: "3 GB"
    disks: "local-disk " + disk_size + " HDD"
    docker: "us.gcr.io/broad-gatk/gatk:4.1.1.0"
    preemptible: select_first([preemptible_tries, 5])
  }
  output {
    File output_bam = "~{basename}.bam"
    File output_bai = "~{basename}.bai"
  }
}
ADD COMMENTlink modified 4 weeks ago • written 4 weeks ago by Jeremy Leipzig18k

I'm fairly certain it's all been sorted on the lab side. We have new TWIST mito probes custom tailored to the rCRF that can pull out chrM piecemeal without disrupting the standard WES process.

Auto filtering of amplified NuMTs is apparently done with FilterMutectCalls along with the default settings for the --mitochondrial-mode tag.

ADD REPLYlink written 4 weeks ago by a.j.wilson000010
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2541 users visited in the last hour