Hello,
Has anyone worked on mtDNA NGS analysis? I'm developing an in-house pipeline using the rCRS reference. The pipeline is as follows:
1) BWA map to reference; Picard/Sambamba sort, index, mark duplicates
2) GATK base recalibration
3) Samtools mpileup followed by VarScan for variant calling (--strand-filter 0 --min-var-freq 0.01 --min-avg-qual 20 --min-coverage 50)
We know that the rCRS reference has an artefact at position 3107, which should be detected once reads are aligned and should be called homozygous with a high heteroplasmy level. But with the steps above, 3107 is not detected in all samples, and when it is, it is classified as heterozygous with a low heteroplasmy level (less than 15%).
Has anyone encountered such issues, or can anyone suggest ways to improve the pipeline?
I know this artefact is usually excluded from the reference, but in this particular case we do not exclude it; we keep it and classify it as an artefact.
Thank you.
Hey Nandini, I used mpileup + VarScan as you suggested. I still have some questions/problems I want to ask you about:
1) Did you use the varscan mpileup2cns command? I used it and found that it reports only one variant frequency per position, even when there is more than one variant at that site. Check here for a detailed description.
2) I also tried HaplotypeCaller but still have some problems. Have you thought about treating mtDNA like a tumor sample, either with VarScan's tumor-normal comparison (setting rCRS as the normal tissue data) or with the Mutect tool from GATK? I am considering this because mtDNA and tumors both show heterogeneity. However, tumors have other features that mtDNA does not, such as more common structural variants.
Thanks.
Hi MatthewP, I use varscan mpileup2snp and mpileup2indel, then combine the two result files and annotate them. If you are using GATK, then I THINK you can use GATK's Mutect rather than HaplotypeCaller (though I do not have much experience with GATK and mtDNA analysis).
So can you get one frequency value per variant instead of one per site (POS)? I really need the frequency of each variant (clients need mtDNA heterogeneity information). I have already found that HaplotypeCaller is not suited to mtDNA data (because the data are amplicon-based and haploid).
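On the per-variant frequency question, the arithmetic itself is simple once per-allele read counts are available for a site: each non-reference allele gets its own count divided by total depth. The sketch below assumes you already have such counts (for example from a pileup tally); the function name and default thresholds are hypothetical, with the defaults borrowed from the coverage and frequency values in the original post. It does not reproduce VarScan's own filtering logic.

```python
def per_variant_frequencies(ref_allele, allele_counts,
                            min_coverage=50, min_var_freq=0.01):
    """Return {alt_allele: frequency} for one site, one entry per variant.

    allele_counts maps each observed allele (including the reference)
    to its read count. Sites below min_coverage yield no variants.
    """
    depth = sum(allele_counts.values())
    if depth < min_coverage:
        return {}
    frequencies = {}
    for allele, count in allele_counts.items():
        if allele == ref_allele:
            continue  # only non-reference alleles are variants
        freq = count / depth
        if freq >= min_var_freq:
            frequencies[allele] = freq
    return frequencies


# Hypothetical counts for a site with two different variants.
site_variants = per_variant_frequencies("A", {"A": 900, "G": 80, "T": 20})
```

With counts like these, the site yields two separate entries (one for G, one for T) rather than a single frequency for the position.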