Question

Mitochondrial light/heavy strand allocation of variants


3.3 years ago

caearthworms • 0

Hi,

I'm confused about papers that are able to allocate variants to either the light or heavy strands of mitochondria, such as this excellent example by Ju et al. 2014: https://elifesciences.org/articles/02935#info

In this and other instances, the authors filter out variants that align exclusively to a single strand (except at the extreme 5' end). What's more, they only report "folded" mutations, e.g. C>N and T>N, rather than all of A>N, C>N, G>N and T>N, which is what I would expect if the strands were actually phased. So how do they determine whether a variant derives from a specific strand? Am I misreading it?

I've been handed VarScan output from a mouse strain; maybe someone could explain how the inference would be made for a row like this one (column header first, then the row):

position ref alt mutRatio Reads1 Reads2 Strands1 Strands2 Qual1 Qual2 Reads1Plus Reads1Minus Reads2Plus Reads2Minus trinucleotide_context

13766 T G 0.012 5197 63 2 2 36 38 2491 2706 33 30 ATT_G

All reference bases are plus strand. I'll keep trying to understand it and will post in response if I figure it out on my own. For reference, the methods of the Ju et al. paper are here:

We extracted mtDNA reads using Samtools (Li and Durbin, 2009). We used VarScan2 (Koboldt et al., 2012) for initial variant calling with a few options (--strand-filter 1 (mismatches should be reported by both forward and reverse reads), --min-var-freq 0.03 (minimum VAF 3%), --min-avg-qual 20 (minimum base quality 20), --min-coverage 3 and --min-reads2 2). With respect to the --strand-filter, it generally removes variant when >90% of mismatches are reported from either of the H or the L mtDNA strand. However, where only reads with a specific orientation are could be aligned dominantly (i.e. in both extreme region of mitochondrial reference genome; only L strand reads could be aligned on the 5′ extreme of mtDNA), we compared strand bias between ‘perfect matches’ (# perfect matches from L strand reads / total # perfect matches) and mismatches (# mismatches from L strand reads / total # mismatches). If the difference between those two bias <0.1, the mutations were rescued. Of the 1907 mutations, 54 (2.8%) were rescued accordingly.
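The two checks described in the quoted methods can be sketched in Python. This is only an illustration of the wording above, not the authors' code: the function names and default thresholds are hypothetical, and it uses the plus/minus read counts from the example row (13766 T>G) in place of L/H strand counts. The absolute difference between the two fractions is the same whichever strand is counted, so the rescue comparison does not depend on which of plus or minus corresponds to the light strand. Note that the paper applies the rescue step only in regions where reads of a single orientation dominate, which is not the case for this row.

```python
def strand_fraction(plus: int, minus: int) -> float:
    """Fraction of reads that come from the plus strand."""
    total = plus + minus
    if total == 0:
        raise ValueError("no reads to compute a strand fraction from")
    return plus / total


def fails_strand_filter(alt_plus: int, alt_minus: int,
                        max_single_strand: float = 0.9) -> bool:
    """True if more than 90% of mismatching reads come from one strand
    (paraphrasing the description of --strand-filter in the quoted methods)."""
    frac = strand_fraction(alt_plus, alt_minus)
    return max(frac, 1.0 - frac) > max_single_strand


def is_rescued(ref_plus: int, ref_minus: int,
               alt_plus: int, alt_minus: int,
               max_diff: float = 0.1) -> bool:
    """True if the strand bias of mismatches is within max_diff of the
    strand bias of perfect matches (the rescue rule in the quoted methods)."""
    ref_bias = strand_fraction(ref_plus, ref_minus)
    alt_bias = strand_fraction(alt_plus, alt_minus)
    return abs(ref_bias - alt_bias) < max_diff


# Counts taken from the example row: Reads1Plus, Reads1Minus, Reads2Plus, Reads2Minus
ref_plus, ref_minus, alt_plus, alt_minus = 2491, 2706, 33, 30

print(fails_strand_filter(alt_plus, alt_minus))             # variant reads seen on both strands
print(is_rescued(ref_plus, ref_minus, alt_plus, alt_minus))  # biases of ref and alt reads compared
```

For this row the variant reads are split almost evenly between the two strands (33 plus, 30 minus), so the strand filter would not remove it and the rescue rule would never need to be invoked.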

variant calling mitochondria


score 1 · Accepted Answer · 2021-01-12

I emailed Young Seok Ju, first author of the work I quote above and instrumental in its recent expansion as part of PCAWG. He responded with a clear example of how heavy- and light-strand mutations were defined:

As far as I remember, the strandness is based on the "pyrimidine wildtype base". It is not related to the varscan Plus and Minus strand information. For example, for a C:G>T:A base substitution, if the "C" wildtype base is on the light strand of the mtDNA genome, we regarded it as a C>T mutation on the light strand. Instead, if "C" is on the heavy strand (equivalent to G on the light strand), it was considered as a C>T mutation on the heavy strand. Just for your information, the varscan Plus and Minus strand information was considered to remove the sequencing artifact. Once a mutation was called, its heavy or light strandness was determined by the rule mentioned above.
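The rule in the email can be written as a small Python function. This is a minimal sketch under stated assumptions, not code from the paper: `fold_substitution` and its `reference_strand` parameter are hypothetical names, and it assumes the reference sequence is written as the light strand (the usual description of the human rCRS; this should be checked for the mouse reference in use). A substitution whose reference base is a pyrimidine is reported as-is on the reference strand; one whose reference base is a purine is complemented and reported on the opposite strand.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
PYRIMIDINES = {"C", "T"}


def fold_substitution(ref: str, alt: str, reference_strand: str = "L"):
    """Return (folded substitution, strand) using the pyrimidine wildtype rule.

    reference_strand is the strand the reference sequence is written in;
    "L" is an assumption here, not something stated in the email.
    """
    ref, alt = ref.upper(), alt.upper()
    if ref not in COMPLEMENT or alt not in COMPLEMENT or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    if reference_strand not in ("L", "H"):
        raise ValueError("reference_strand must be 'L' or 'H'")
    opposite = "H" if reference_strand == "L" else "L"

    if ref in PYRIMIDINES:
        # The pyrimidine wildtype base sits on the reference strand.
        return f"{ref}>{alt}", reference_strand
    # The pyrimidine wildtype base sits on the opposite strand: complement both bases.
    return f"{COMPLEMENT[ref]}>{COMPLEMENT[alt]}", opposite


print(fold_substitution("C", "T"))  # C on the reference strand
print(fold_substitution("G", "A"))  # same C:G>T:A change, C on the other strand
print(fold_substitution("T", "G"))  # the 13766 T>G row from the question
```

Under the light-strand assumption, the 13766 T>G variant from the question has a pyrimidine reference base, so it would be counted as a T>G mutation on the light strand, independent of the Reads2Plus and Reads2Minus counts.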