Question: Choosing the best alignment or aligner
adhb (United States) wrote, 12 months ago:

I'm aligning short-read (2x150) data to fairly divergent references (up to ~10% divergence). So far I have mostly used bwa-mem and NGM, and my read depths are very generous (200-300x). After each alignment, I call the consensus with samtools mpileup and bcftools call using default settings (currently without quality filters on the variant calls, though I will add them soon). My goal is a consensus sequence with ambiguity codes for downstream work.
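To make "consensus with ambiguities" concrete, here is a minimal Python sketch of how per-position base counts might be turned into a consensus base or IUPAC ambiguity code. It is not what bcftools does internally: `consensus_base` and its `min_fraction` threshold of 0.2 are hypothetical, and it assumes a diploid sample (at most two alleles per site).

```python
# Hypothetical sketch (not bcftools' algorithm): per-site read counts -> base or IUPAC code.
# Assumes a diploid sample; min_fraction=0.2 is an arbitrary illustrative threshold.

IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
}


def consensus_base(counts: dict[str, int], min_fraction: float = 0.2) -> str:
    """Return a base or IUPAC code for one position, given read counts per base."""
    depth = sum(counts.values())
    if depth == 0:
        return "N"  # no coverage
    # Keep at most the two most frequent alleles (diploid assumption).
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:2]
    kept = {base for base, n in ranked if n / depth >= min_fraction}
    # Anything not representable by a single base or a two-base code becomes N.
    return IUPAC.get(frozenset(kept), "N")


print(consensus_base({"C": 60, "T": 40}))  # Y
print(consensus_base({"C": 90, "T": 10}))  # C
```

Under this toy rule a 60:40 C/T site becomes Y, but if one aligner loses enough reads from the more divergent haplotype that the minor allele drops below the threshold, the same site collapses to a single base. That is one way aligner-specific mapping could produce transition differences between consensus sequences.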

My problem is that the consensus sequences from the different aligners don't match exactly (97-99% identical overall), and most of the mismatches are transitions (C/T or A/G). The Ts/Tv ratios reported by bcftools stats are around 1.7-2.0, which is on the low end given that all of these alignments are to protein-coding regions.
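One way to break down the disagreement between two aligners' consensus sequences is to count mismatches by type. The sketch below assumes both consensus sequences are in the same reference coordinates with no indels, so they can be compared position by position; `compare_consensus` is a hypothetical helper. Note that its Ts/Tv is computed over inter-aligner mismatches, not over variants against the reference as bcftools stats reports it.

```python
# Hypothetical helper: classify mismatches between two equal-length consensus sequences.
# Assumes shared reference coordinates and no indels.

PURINES = set("AG")
PYRIMIDINES = set("CT")


def compare_consensus(seq_a: str, seq_b: str) -> dict:
    """Percent identity plus transition/transversion counts between two sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be the same length")
    ts = tv = other = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            ts += 1  # A<->G or C<->T
        elif a in "ACGT" and b in "ACGT":
            tv += 1  # purine <-> pyrimidine
        else:
            other += 1  # ambiguity codes, N, or gap characters
    mismatches = ts + tv + other
    return {
        "identity": 1 - mismatches / len(seq_a) if seq_a else 1.0,
        "transitions": ts,
        "transversions": tv,
        "other": other,
        "ts_tv": ts / tv if tv else float("inf"),
    }


print(compare_consensus("ACGTACGT", "GCGTATGA"))
```

Keeping ambiguity-code mismatches in a separate `other` bucket makes it easy to see whether one aligner called a heterozygous site (e.g. Y) where the other called a single base.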

My questions are:

  1. Why do the consensus sequences from different aligners differ at so many transitions?
    Are these discrepancies likely heterozygous positions that are not being called as such because the aligners, owing to their internal tuning, differ in which haplotype maps better at each site? (The consensus sequences from both aligners do contain some heterozygous/ambiguous positions, so I know heterozygous calls are possible when reads from both haplotypes map well enough.)

  2. By what metric(s) should I choose between alignments?
    If the transitions do represent heterozygous positions, would it be reasonable to effectively keep both, by merging the consensus sequences from the different aligners and coding an ambiguity wherever they differ?
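The merging idea in question 2 could look like the sketch below, assuming both consensus sequences share coordinates and contain no indels. `merge_consensus` is hypothetical; at each site it emits the IUPAC code covering the union of both calls, so existing ambiguities are preserved and unrecognized characters become N.

```python
# Hypothetical sketch: merge two aligned consensus sequences into IUPAC ambiguities.
# Assumes equal length, shared coordinates, no indels.

IUPAC_CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
# Reverse lookup: set of bases -> code (covers all 15 non-empty subsets of ACGT).
CODE_FOR = {frozenset(bases): code for code, bases in IUPAC_CODES.items()}


def merge_consensus(seq_a: str, seq_b: str) -> str:
    """Where the sequences differ, emit the code for the union of both calls."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be the same length")
    merged = []
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        # Unknown characters (e.g. '-') are treated as fully ambiguous.
        bases = frozenset(IUPAC_CODES.get(a, "ACGT")) | frozenset(IUPAC_CODES.get(b, "ACGT"))
        merged.append(CODE_FOR[bases])
    return "".join(merged)


print(merge_consensus("ACGTAC", "GCGTAT"))  # RCGTAY
```

One caveat of this design: it turns every inter-aligner discrepancy into an ambiguity, including discrepancies caused by mis-mapping rather than true heterozygosity, so it trades possible false homozygous calls for possible false heterozygous ones.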

Thanks in advance.

(Question modified 12 months ago by RamRS; written 12 months ago by adhb.)

Have you tried a simulation study to see which mapper handles the high divergence better? You could use Simulome to generate a reference sequence in FASTA format and then simulate Illumina read pairs from it with ART.
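A simulation study along these lines needs a known truth. The sketch below is not Simulome's or ART's interface; it uses two hypothetical helpers: `diverge` substitutes roughly a given fraction of sites in a reference (substitutions only, with an assumed transition fraction of 0.67, i.e. Ts/Tv of about 2), and `accuracy` scores a recovered consensus against that truth. The idea would be to simulate reads (e.g. with ART) from the diverged sequence, align them to the original reference with each aligner, call the consensus, and compare each result to the truth.

```python
import random

# Hypothetical helpers for a simulation study (not Simulome/ART APIs).
TRANSITION = {"A": "G", "G": "A", "C": "T", "T": "C"}
TRANSVERSIONS = {"A": "CT", "G": "CT", "C": "AG", "T": "AG"}


def diverge(ref: str, rate: float, ts_fraction: float = 0.67, seed: int = 1) -> str:
    """Copy of ref with ~rate of A/C/G/T sites substituted (no indels).
    Each substitution is a transition with probability ts_fraction."""
    rng = random.Random(seed)  # seeded for reproducibility
    out = []
    for base in ref.upper():
        if base in TRANSITION and rng.random() < rate:
            if rng.random() < ts_fraction:
                base = TRANSITION[base]
            else:
                base = rng.choice(TRANSVERSIONS[base])
        out.append(base)
    return "".join(out)


def accuracy(truth: str, consensus: str) -> float:
    """Fraction of sites where the recovered consensus matches the known truth."""
    if len(truth) != len(consensus):
        raise ValueError("sequences must be the same length")
    if not truth:
        return 1.0
    return sum(t == c for t, c in zip(truth, consensus)) / len(truth)


ref = "ACGT" * 2500
sample = diverge(ref, rate=0.10, seed=42)  # ~10% divergent "sample" sequence
print(accuracy(sample, ref))  # roughly 0.90
```

Scoring each aligner's consensus with `accuracy` against the simulated truth gives a direct metric for choosing between them, which is harder to get from real data where the true sequence is unknown.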

(Reply written 10 months ago by mark.ziemann.)