Question: Some questions about call variants for mtDNA using GATK HaplotypeCaller
1
gravatar for MatthewP
10 weeks ago by
MatthewP20
China
MatthewP20 wrote:

Hi everyone. I tried to apply GATK pipeline to amplicon-based mtDNA sequencing data analyse and found some problems want to ask/discuss with you guys.
I sat up my pipeline according this GATK Best Practice. Here is my pipeline:

  1. QC(fastp)
  2. mapping
    bwa mem -t 4 -R "@RG\tID..." hs_ref_GRCh38.p12_chrMT.fa sample.R1.fastq.gz sample.R2.fastq.gz > sample.sam
  3. SAM to BAM(samtools)
    samtools view -bF 12 sample.sam > sample.bam
  4. sort BAM(GATK SortSam)
    gatk SortSam -I sample.bam -O sample_sorted.bam -SO coordinate
  5. BAM index(samtools index)
  6. Validate BAM(GATK ValidateSamFile)
    gatk ValidateSamFile -I sample_sorted.bam -M VERBOSE
  7. Call variant(GATK HaplotypeCaller)
    gatk HaplotypeCaller -I samplt_sorted.bam -O sample.g.vcf -R hs_ref_GRCh38.p12_chrMT.fa -ploidy 1 -ERC GVCF
  8. Consolidate GVCFs(GATK GenomicsDBImport)
    gatk GenomicsDBImport --genomicsdb-workspace-path ./workspace -L MT -V sample1.g.vcf -V sample2.g.vcf ...(many samples)
  9. Joint-Call Cohort(GATK GenotypeGVCFs)
    gatk GenotypeGVCFs -O 18R111.vcf -R hs_ref_GRCh38.p12_chrMT.fa -ploidy 1 -V gendb://workspace

Here is something I want to explain first

  • No mark/remove duplicates because this is amplicon-based(multi-PCR) sequencing data, most of them are duplicates normally(fastp reported about 96%).
  • I set -ploidy 1 in step 7 and 9 because mtDNA should treated as haploid. However I can't set step 8 to haploid because this only for diploid. Second, I find description below in this GATK article: > The one quantity that changes whether the GATK considers the possibility of a heterozygous genotype at all is the ploidy, which describes how many copies of each chromosome each individual in the species carries.

It seems this ploidy setting only works for nuclear genome, means the copies of chromosome. But this not quite right for mtDNA for copy number not sure(At least not 1 copy). Question1: What will this affect my result?

When I check my finally vcf result I find 2 problems, result example:

MT      214     .       A       G       2386.46 ... GT:AD:DP:GQ:PL  1:0,112:112:99:5445,0   0:136,0:136:99:0,1080...

There problem is AD and DP too low value(Important to me, I need to calculate each variant's frequency.). The real depth value is about 5000, I find many has the same issue(check here). This seems because HaplotypeCaller not suite for amplicon data. Question2: Have anyone call such mtDNA variant using GATK? How?

Question3: Will Mutect2 work here?
I think about this because mtDNA heterogeneity are like tumor, while tumor has many other features that mtDNA don't like structure variants. Any one think this before? What should be normal tissue data (-I:normal param in Mutect2) if I apply Mutect2 to mtDNA? Reference sequence?

Thanks everyone!

mtdna gatk haplotypecaller • 190 views
ADD COMMENTlink written 10 weeks ago by MatthewP20
1

2 . mapping
bwa mem -t 4 -R "@RG\tID..." hs_ref_GRCh38.p12_chrMT.fa sample.R1.fastq.gz sample.R2.fastq.gz > sample.sam

3 . SAM to BAM(samtools)
samtools view -bF 12 sample.sam > sample.bam

4 . sort BAM(GATK SortSam)
gatk SortSam -I sample.bam -O sample_sorted.bam -SO coordinate

Note that you shorten steps 2-4, and avoid intermediate files by piping the result of the alignment directly to samtools sort, creating a sorted bam file:

bwa mem -t 4 -R "@RG\tID..." hs_ref_GRCh38.p12_chrMT.fa sample.R1.fastq.gz sample.R2.fastq.gz  | samtools sort -o sample.bam -
ADD REPLYlink modified 10 weeks ago • written 10 weeks ago by WouterDeCoster34k

Thanks, I used samtools sort before until I found gatk SortSam. I though this two tools do exactly the same thing. You mean my understanding here is wrong?

ADD REPLYlink written 10 weeks ago by MatthewP20

They probably do exactly the same thing, but I'd naively expect samtools to be quicker.

What you are doing is correct, there is just no need to split the alignment, sam to bam and sorting bam across three commands. It's messy, takes longer and makes intermediate files which you'll have to delete. A sam file takes a ton of disk space.

ADD REPLYlink written 10 weeks ago by WouterDeCoster34k

Good advice, indeed I need to delete those files by hand. Thanks.

ADD REPLYlink written 10 weeks ago by MatthewP20
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 800 users visited in the last hour