Calling variants after using different read alignment programs
2.1 years ago
ManuelDB

After aligning the reads with several different alignment programs, I called variants with FreeBayes (default options). I concatenated POS, REF and ALT into a variant ID so that I could see which variants the pipelines have in common. Here are the results.
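A minimal sketch of the variant-ID comparison in Python, assuming uncompressed VCF text such as FreeBayes writes. Unlike the ID described above, it also puts CHROM in the key so that the same position on different chromosomes is not merged, and it splits multi-allelic ALT fields into one ID per allele. The records and pipeline names are made up for illustration.

```python
from itertools import combinations


def variant_ids(vcf_lines):
    """Build a set of CHROM:POS:REF:ALT keys from VCF text lines.

    Multi-allelic records (ALT such as "T,G") get one key per alternate allele.
    """
    ids = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip meta/header lines and blank lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alts = fields[0], fields[1], fields[3], fields[4]
        for alt in alts.split(","):
            ids.add(f"{chrom}:{pos}:{ref}:{alt}")
    return ids


def pairwise_overlap(calls):
    """Return {(name_a, name_b): number of variant IDs the two sets share}."""
    return {(a, b): len(calls[a] & calls[b]) for a, b in combinations(sorted(calls), 2)}


def only_in(calls, name):
    """Variant IDs called from one pipeline's alignments but from no other."""
    others = set().union(*(ids for n, ids in calls.items() if n != name))
    return calls[name] - others


# Made-up records standing in for two FreeBayes VCFs.
vcf_mem = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t50\t.\t.",
    "chr1\t250\t.\tC\tT,G\t40\t.\t.",
]
vcf_bt = [
    "chr1\t100\t.\tA\tG\t30\t.\t.",
    "chr1\t300\t.\tG\tA\t20\t.\t.",
]
calls = {"bwa-mem": variant_ids(vcf_mem), "bwa-backtrack": variant_ids(vcf_bt)}
print(pairwise_overlap(calls))  # {('bwa-backtrack', 'bwa-mem'): 1}
print(sorted(only_in(calls, "bwa-backtrack")))  # ['chr1:300:G:A']
```

With real files you would pass an open handle, e.g. `with open("mem.vcf") as fh: variant_ids(fh)` (file name hypothetical). Exact string matching misses variants that are written differently but are equivalent (for example indels placed at different positions), so normalising the VCFs first, for example with `bcftools norm`, may make the overlaps more meaningful.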

[Image: overlap of called variants between the four alignment pipelines]

Bowtie2, BWA-SW and BWA-MEM give quite similar results. However, BWA-backtrack shares only 13 variants with the other aligners, and it also calls many more variants.

I looked at the alignments in IGV to see what is going on and found this:

[Image: IGV view of the same region for each alignment]

The tracks are, in order: BWA-MEM, Bowtie2, BWA-SW and BWA-backtrack.

Is it normal to get this many artifacts when using BWA-backtrack on ~150 bp reads? Second, regarding BWA-SW: are the green reads tandem duplications (according to the IGV interpretation documentation)? If so, why?

Has anyone done this comparison before and found similar results?

2.1 years ago
d-cameron

That misalignment rate is astoundingly high, and I strongly suspect there is an error in your pipeline. The green reads in the second-to-last track (BWA-SW) are also highly problematic. Things to check:

  • NM tag. Does the number of differences (mismatches plus inserted and deleted bases) you count in IGV match what the NM tag says? If it doesn't, the alignment is garbage. Perhaps you aligned against a different reference genome, or used an index built by a different bwa version with a different index format?
  • What do the green reads have in common? Is the fragment size less than the read length? Less than twice the read length? Are they all read 1, or all read 2?
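Counting differences by eye in IGV only works for a handful of reads. A minimal pure-Python sketch for checking many at once: it recomputes the edit distance from CIGAR, SEQ and the reference, and compares it with the NM tag. It assumes SAM text lines (for example from `samtools view`) and the reference contig as a plain string loaded from the same FASTA that was indexed; `edit_distance_from_cigar` and `check_record` are hypothetical helpers, and the toy reference and records are made up. Following the SAM specification, NM counts mismatches plus inserted and deleted bases and excludes clipping.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def edit_distance_from_cigar(cigar, read_seq, ref_seq, ref_start):
    """Mismatches + inserted + deleted bases, i.e. what NM should contain.

    ref_start is the 0-based offset of the first aligned reference base
    in ref_seq (SAM POS - 1 when ref_seq is the whole contig).
    """
    read_i, ref_i, nm = 0, ref_start, 0
    for length, op in CIGAR_RE.findall(cigar):
        n = int(length)
        if op in "M=X":
            # Aligned bases: count positions where read and reference differ.
            for k in range(n):
                if read_seq[read_i + k].upper() != ref_seq[ref_i + k].upper():
                    nm += 1
            read_i += n
            ref_i += n
        elif op == "I":
            nm += n  # inserted bases count towards NM
            read_i += n
        elif op == "D":
            nm += n  # deleted bases count towards NM
            ref_i += n
        elif op == "N":
            ref_i += n  # skipped reference region, not an edit
        elif op == "S":
            read_i += n  # soft-clipped bases are in SEQ but not aligned
        # H and P consume neither read nor reference bases
    return nm


def check_record(sam_line, ref_seq):
    """Return (recomputed edit distance, NM tag value) for one SAM line."""
    f = sam_line.rstrip("\n").split("\t")
    pos, cigar, seq = int(f[3]), f[5], f[9]
    reported = None
    for tag in f[11:]:
        if tag.startswith("NM:i:"):
            reported = int(tag[5:])
    if cigar == "*" or seq == "*":
        return None, reported  # unmapped or no sequence: nothing to compare
    return edit_distance_from_cigar(cigar, seq, ref_seq, pos - 1), reported


ref = "ACGTACGTACGT"  # toy reference contig
records = [
    "r1\t0\tchr1\t1\t60\t8M\t*\t0\t0\tACGTTCGT\t*\tNM:i:1",     # one mismatch
    "r2\t0\tchr1\t1\t60\t4M1D4M\t*\t0\t0\tACGTCGTA\t*\tNM:i:1",  # one deletion
    "r3\t0\tchr1\t1\t60\t2S4M\t*\t0\t0\tGGACGT\t*\tNM:i:3",      # NM disagrees
]
for rec in records:
    computed, reported = check_record(rec, ref)
    status = "OK" if computed == reported else "MISMATCH"
    print(rec.split("\t")[0], computed, reported, status)
```

If many real records come back as MISMATCH, the BAM was most likely aligned to a different reference (or index) from the one you are loading, which is the situation described in the first bullet.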

are the green reads Tandem Duplication

No. Inside a tandem duplication you would expect only about half of your negative-strand reads to be green, but you appear to have all -ve strand reads green.
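The features raised in the second bullet (fragment size versus read length, read 1 versus read 2) and the strand argument can also be tallied over many reads instead of by eye. A minimal sketch, assuming SAM text lines for the reads of interest, for example from `samtools view` on the region shown in IGV. It does not reproduce IGV's colouring rules, so deciding which reads count as "green" is left to you; comparing the reverse-strand count of the suspect reads with that of all reads in the region is one way to test the "all -ve strand reads are green" observation. `summarise` and `record` are hypothetical helpers and the records are made up.

```python
from collections import Counter


def summarise(sam_lines):
    """Tally strand, read 1/2 and fragment-size features for SAM records."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        f = line.rstrip("\n").split("\t")
        flag, tlen, seq = int(f[1]), abs(int(f[8])), f[9]
        counts["total"] += 1
        if flag & 0x10:
            counts["reverse_strand"] += 1
        if flag & 0x40:
            counts["read1"] += 1
        if flag & 0x80:
            counts["read2"] += 1
        # TLEN is 0 when undefined (e.g. mate unmapped) and SEQ "*" gives no
        # read length. Read length comes from SEQ, so hard-clipped reads look
        # shorter than they were sequenced.
        if tlen == 0 or seq == "*":
            continue
        read_len = len(seq)
        if tlen < read_len:
            counts["fragment_lt_read_len"] += 1
        if tlen < 2 * read_len:
            counts["fragment_lt_2x_read_len"] += 1
    return counts


def record(name, flag, pos, cigar, pnext, tlen, read_len=150):
    """Build a made-up SAM line; only the fields summarise() reads matter."""
    return "\t".join([name, str(flag), "chr1", str(pos), "60", cigar, "=",
                      str(pnext), str(tlen), "A" * read_len, "*"])


reads = [
    record("pairA", 99, 100, "150M", 200, 250),       # ~250 bp fragment
    record("pairA", 147, 200, "150M", 100, -250),
    record("pairB", 99, 500, "100M50S", 500, 100),    # 100 bp fragment,
    record("pairB", 147, 500, "50S100M", 500, -100),  # shorter than the reads
]
print(dict(summarise(reads)))
```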

Someone has done this comparison

This is unlikely, as the bwa documentation recommends bwa mem for reads longer than 70 bp. There's not much point comparing against something that is not recommended and is known to be problematic. No serious analysis should use bwa sw for 2x150bp reads unless the analyst knows exactly how bwa works and has a very specific reason for doing so.

TL;DR: drop bwa backtrack (bt) and bwa sw from your comparison; they're not suitable for 2x150bp data.


Thanks for your answer. What do you mean by "all -ve strand reads green"? What does -ve mean?


Negative strand. These are the reads with the 0x10 SAM flag set (Section 1.4.2 of the SAM specification, https://samtools.github.io/hts-specs/SAMv1.pdf). In IGV they're the reads drawn angled (pointed) at the start of the alignment instead of at the end.
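For reference, a small Python sketch decoding the FLAG bits listed in Section 1.4.2 of the SAM specification; the short names follow the ones `samtools flags` prints. `decode_flag` and `is_reverse` are illustrative helpers, not part of any library.

```python
# SAM FLAG bits (SAM specification, Section 1.4.2), with samtools-style names.
FLAG_BITS = {
    0x1: "PAIRED",           # template has multiple segments
    0x2: "PROPER_PAIR",      # each segment properly aligned
    0x4: "UNMAP",            # segment unmapped
    0x8: "MUNMAP",           # next segment (mate) unmapped
    0x10: "REVERSE",         # SEQ reverse complemented: negative strand
    0x20: "MREVERSE",        # mate on the negative strand
    0x40: "READ1",           # first segment in the template
    0x80: "READ2",           # last segment in the template
    0x100: "SECONDARY",      # secondary alignment
    0x200: "QCFAIL",         # not passing quality controls
    0x400: "DUP",            # PCR or optical duplicate
    0x800: "SUPPLEMENTARY",  # supplementary alignment
}


def decode_flag(flag):
    """Return the names of the bits set in a SAM FLAG, lowest bit first."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


def is_reverse(flag):
    """True for negative-strand ("-ve") alignments, i.e. when 0x10 is set."""
    return bool(flag & 0x10)


print(decode_flag(147))  # ['PAIRED', 'PROPER_PAIR', 'REVERSE', 'READ2']
print(is_reverse(147), is_reverse(99))  # True False
```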
