Hi,
I'm confused about papers that are able to allocate variants to either the light or heavy strands of mitochondria, such as this excellent example by Ju et al. 2014: https://elifesciences.org/articles/02935#info
In this and other papers, the authors filter out variants that align exclusively to a single strand (except at the extreme 5' end). What's more, they report only "folded" mutations, e.g. C>N and T>N, rather than all of A>N, C>N, G>N and T>N, which is what I would expect if the variants were actually phased to a strand. So how do they determine whether a variant is derived from a specific strand? Am I misreading it?
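To make concrete what I mean by "folded": a minimal Python sketch of the usual pyrimidine-reference convention, in which a substitution whose reference base is a purine is reported as its complement. The function name `fold` is my own, and I am assuming (not asserting) that this is the convention the paper follows.

```python
# Watson-Crick complements for the four DNA bases.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
PYRIMIDINES = {"C", "T"}


def fold(ref, alt):
    """Report a substitution with a pyrimidine (C or T) reference base.

    Assumed convention: if the reference base is a purine (A or G),
    both bases are complemented, so A>C is reported as T>G.
    """
    ref, alt = ref.upper(), alt.upper()
    if ref in PYRIMIDINES:
        return ref, alt
    return COMPLEMENT[ref], COMPLEMENT[alt]


# A>C and T>G collapse into the same folded class.
print(fold("A", "C"), fold("T", "G"))
```

Under this convention A>C and T>G become indistinguishable, which is exactly why I don't see how strand of origin survives the folding.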
I've been handed VarScan output from a mouse strain; an example row is below. Maybe someone could explain how the inference would be made from it.
position ref alt mutRatio Reads1 Reads2 Strands1 Strands2 Qual1 Qual2 Reads1Plus Reads1Minus Reads2Plus Reads2Minus trinucleotide_context
13766 T G 0.012 5197 63 2 2 36 38 2491 2706 33 30 ATT_G
All reference bases are given on the plus strand. I'll keep trying to understand it and will post a follow-up if I figure it out on my own. For reference, the methods of the Ju et al. paper are here:
We extracted mtDNA reads using Samtools (Li and Durbin, 2009). We used VarScan2 (Koboldt et al., 2012) for initial variant calling with a few options (--strand-filter 1 (mismatches should be reported by both forward and reverse reads), --min-var-freq 0.03 (minimum VAF 3%), --min-avg-qual 20 (minimum base quality 20), --min-coverage 3 and --min-reads2 2). With respect to the --strand-filter, it generally removes variant when >90% of mismatches are reported from either of the H or the L mtDNA strand. However, where only reads with a specific orientation are could be aligned dominantly (i.e. in both extreme region of mitochondrial reference genome; only L strand reads could be aligned on the 5′ extreme of mtDNA), we compared strand bias between ‘perfect matches’ (# perfect matches from L strand reads / total # perfect matches) and mismatches (# mismatches from L strand reads / total # mismatches). If the difference between those two bias <0.1, the mutations were rescued. Of the 1907 mutations, 54 (2.8%) were rescued accordingly.
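To check my reading of the quoted methods, here is a rough Python sketch that applies the two strand tests to the VarScan row above. The function names and the handling of zero counts are my own assumptions. I also use the Plus/Minus read counts as a stand-in for the L/H strands, which depends on how the reference is oriented. This is not the authors' code.

```python
# Column names and example row copied from the VarScan output above.
HEADER = (
    "position ref alt mutRatio Reads1 Reads2 Strands1 Strands2 Qual1 Qual2 "
    "Reads1Plus Reads1Minus Reads2Plus Reads2Minus trinucleotide_context"
).split()
ROW = "13766 T G 0.012 5197 63 2 2 36 38 2491 2706 33 30 ATT_G".split()


def strand_counts(rec):
    """Return (ref_plus, ref_minus, alt_plus, alt_minus) as integers."""
    keys = ("Reads1Plus", "Reads1Minus", "Reads2Plus", "Reads2Minus")
    return tuple(int(rec[k]) for k in keys)


def passes_strand_filter(alt_plus, alt_minus, max_single_strand=0.9):
    """Fail when more than 90% of variant reads share one orientation."""
    total = alt_plus + alt_minus
    if total == 0:
        return False  # assumption: no variant reads means nothing to keep
    return max(alt_plus, alt_minus) / total <= max_single_strand


def is_rescued(ref_plus, ref_minus, alt_plus, alt_minus, max_diff=0.1):
    """Rescue when reference and variant reads show a similar strand bias."""
    ref_total = ref_plus + ref_minus
    alt_total = alt_plus + alt_minus
    if ref_total == 0 or alt_total == 0:
        return False  # assumption: the bias is undefined without reads
    ref_bias = ref_plus / ref_total  # bias among 'perfect matches'
    alt_bias = alt_plus / alt_total  # bias among mismatches
    return abs(ref_bias - alt_bias) < max_diff


rec = dict(zip(HEADER, ROW))
counts = strand_counts(rec)
keep = passes_strand_filter(counts[2], counts[3]) or is_rescued(*counts)
print(rec["position"], rec["ref"] + ">" + rec["alt"], "kept:", keep)
```

With 33 plus and 30 minus variant reads, this row passes the strand test without needing the rescue step, although its mutRatio of 0.012 is below the 3% minimum VAF quoted in the methods. As far as I can tell, both tests only look at read orientation, which is why I still don't see how they tell me which strand a variant arose on.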