Question: Mitochondrial light/heavy strand allocation of variants
14 days ago by
caearthworms0 wrote:


I'm confused about papers that are able to allocate variants to either the light or heavy strands of mitochondria, such as this excellent example by Ju et al. 2014:

In this and other instances, the authors filter out variants whose supporting reads align exclusively to a single strand (except at the extreme 5' end). What's more, they only report "folded" mutations, e.g. C>N and T>N, rather than all of A>N, C>N, G>N and T>N, which is what I would expect if the strands were actually phased. So how do they determine whether a variant is derived from a specific strand? Am I misreading it?

I've been handed VarScan output from a mouse strain; perhaps someone could use it to explain how the inference is made. Here are the column headers followed by one example row:

position ref alt mutRatio Reads1 Reads2 Strands1 Strands2 Qual1 Qual2 Reads1Plus Reads1Minus Reads2Plus Reads2Minus trinucleotide_context
13766 T G 0.012 5197 63 2 2 36 38 2491 2706 33 30 ATT_G

All reference bases are plus strand. I'll keep trying to understand it and will post in response if I figure it out on my own. For reference, the methods of the Ju et al. paper are here:

We extracted mtDNA reads using Samtools (Li and Durbin, 2009). We used VarScan2 (Koboldt et al., 2012) for initial variant calling with a few options (--strand-filter 1 (mismatches should be reported by both forward and reverse reads), --min-var-freq 0.03 (minimum VAF 3%), --min-avg-qual 20 (minimum base quality 20), --min-coverage 3 and --min-reads2 2). With respect to the --strand-filter, it generally removes variant when >90% of mismatches are reported from either of the H or the L mtDNA strand. However, where only reads with a specific orientation are could be aligned dominantly (i.e. in both extreme region of mitochondrial reference genome; only L strand reads could be aligned on the 5′ extreme of mtDNA), we compared strand bias between ‘perfect matches’ (# perfect matches from L strand reads / total # perfect matches) and mismatches (# mismatches from L strand reads / total # mismatches). If the difference between those two bias <0.1, the mutations were rescued. Of the 1907 mutations, 54 (2.8%) were rescued accordingly.
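To make the quoted filter concrete, here is a minimal Python sketch of the two checks as I read them: the 90% strand filter and the bias comparison used for rescue. The function names are my own, and I am assuming that VarScan's Reads1Plus/Reads1Minus stand in for the 'perfect matches' and Reads2Plus/Reads2Minus for the mismatches, with the plus-strand fraction playing the role of the L-strand fraction. It is not the authors' code, and in the paper the rescue step is only applied in regions where reads of one orientation dominate.

```python
def plus_fraction(plus: int, minus: int) -> float:
    """Fraction of reads that come from the plus strand."""
    total = plus + minus
    if total == 0:
        raise ValueError("no reads to compute a strand fraction from")
    return plus / total


def passes_strand_filter(var_plus: int, var_minus: int,
                         max_fraction: float = 0.9) -> bool:
    """False when more than max_fraction of variant reads sit on one strand."""
    frac = plus_fraction(var_plus, var_minus)
    return max(frac, 1.0 - frac) <= max_fraction


def is_rescued(ref_plus: int, ref_minus: int,
               var_plus: int, var_minus: int,
               max_diff: float = 0.1) -> bool:
    """Compare strand bias of matching reads against that of mismatching reads.

    Assumption: a variant is rescued when the two biases differ by < max_diff.
    """
    ref_bias = plus_fraction(ref_plus, ref_minus)
    var_bias = plus_fraction(var_plus, var_minus)
    return abs(ref_bias - var_bias) < max_diff


# Position 13766 from the VarScan row in the question.
print(passes_strand_filter(33, 30))
print(is_rescued(2491, 2706, 33, 30))

# Made-up counts for a region where nearly all reads map to one strand.
print(passes_strand_filter(19, 1))
print(is_rescued(950, 50, 19, 1))
```

For the row at position 13766 the variant reads are split 33/30, so the variant passes the strand filter without needing rescue. The made-up second case fails the filter (95% of variant reads on one strand) but would be rescued, because the matching reads show the same 95% bias.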

14 days ago by
caearthworms0 wrote:

I emailed Young Seok Ju, who was first author on the work I quote above and was instrumental in the recent expansion of this work as part of PCAWG. He responded with a clear example of how heavy-strand and light-strand mutations were defined:

As far as I remember, the strandness is based on the "pyrimidine wildtype base". It is not related to the varscan Plus and Minus strand information. For example, for a C:G>T:A base substitution, if the "C" wildtype base is on the light strand of the mtDNA genome, we regarded it as a C>T mutation on the light strand. Instead, if "C" is on the heavy strand (equivalent to G on the light strand), it was considered as a C>T mutation on the heavy strand. Just for your information, the varscan Plus and Minus strand information was considered to remove the sequencing artifact. Once a mutation was called, its heavy or light strandness was determined by the rule mentioned above.
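The rule described in the email can be sketched in a few lines of Python. This is only my reading of it: the function name `fold_substitution` is hypothetical, and the sketch assumes the reference sequence used for calling represents the light strand, which should be checked for whichever reference is in use (hence the `reference_strand` parameter).

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
PYRIMIDINES = {"C", "T"}


def fold_substitution(ref: str, alt: str, reference_strand: str = "L"):
    """Return (folded substitution, strand carrying the wildtype pyrimidine).

    Assumption: `ref` and `alt` are given relative to the reference sequence,
    and `reference_strand` says which mtDNA strand that sequence represents.
    """
    ref, alt = ref.upper(), alt.upper()
    if ref not in COMPLEMENT or alt not in COMPLEMENT or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    if reference_strand not in ("L", "H"):
        raise ValueError("reference_strand must be 'L' or 'H'")

    if ref in PYRIMIDINES:
        # The wildtype pyrimidine is on the reference strand itself.
        return f"{ref}>{alt}", reference_strand

    # The wildtype base is a purine, so the pyrimidine sits on the opposite
    # strand: complement both bases and report the other strand.
    other = "H" if reference_strand == "L" else "L"
    return f"{COMPLEMENT[ref]}>{COMPLEMENT[alt]}", other


print(fold_substitution("C", "T"))  # C on the reference strand
print(fold_substitution("G", "A"))  # G on the reference strand, C opposite
print(fold_substitution("T", "G"))  # the 13766 T>G call from the question
```

Under that light-strand assumption, a reference C>T call is a C>T mutation on the light strand, a reference G>A call is a C>T mutation on the heavy strand, and the T>G call at 13766 would be a T>G mutation on the light strand. The Plus/Minus read counts play no part in this step.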
