BWA and BWA-MEM Produce Different Alignments
7.1 years ago

Hello

I have MiSeq paired-end samples that I previously aligned using the classical approach (bwa aln, bwa sampe, samtools fixmate) and that I have now realigned using bwa mem.

Once the BAM files were generated and sorted, I ran samtools flagstat to compare the two results, and here is what I found:

bwa

3685558 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
452652 + 0 mapped (12.28%:nan%)
3685558 + 0 paired in sequencing
1842779 + 0 read1
1842779 + 0 read2
158326 + 0 properly paired (4.30%:nan%)
426744 + 0 with itself and mate mapped
25908 + 0 singletons (0.70%:nan%)
139248 + 0 with mate mapped to a different chr
80182 + 0 with mate mapped to a different chr (mapQ>=5)


and bwa mem :

3937425 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
3216740 + 0 mapped (81.70%:nan%)
3937425 + 0 paired in sequencing
1968780 + 0 read1
1968645 + 0 read2
291160 + 0 properly paired (7.39%:nan%)
2523397 + 0 with itself and mate mapped
693343 + 0 singletons (17.61%:nan%)
2097921 + 0 with mate mapped to a different chr
1488875 + 0 with mate mapped to a different chr (mapQ>=5)


Counting only the mapped reads with samtools view -c -F 4, I found 3216740 for the bwa mem method and 452652 for the classical bwa method.
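As a rough illustration of what that count measures, here is a minimal Python sketch of the -F 4 filter: it keeps every SAM record whose 0x4 (unmapped) bit is clear. The flag values come from the SAM specification; the example records and the helper names (`is_mapped`, `is_primary`, `count_records`) are made up for illustration. Note that the filter counts records rather than reads. bwa mem can write extra supplementary records (0x800) for split reads, which is probably why its flagstat total is larger than the number of input reads.

```python
# SAM flag bits (values from the SAM specification).
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def is_mapped(flag):
    """True when the 0x4 bit is clear, i.e. the record passes -F 4."""
    return (flag & UNMAPPED) == 0


def is_primary(flag):
    """True when the record is neither secondary nor supplementary."""
    return (flag & (SECONDARY | SUPPLEMENTARY)) == 0


def count_records(sam_lines, primary_only=False):
    """Count mapped records in SAM text lines, skipping '@' header lines."""
    count = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])
        if is_mapped(flag) and (is_primary(flag) or not primary_only):
            count += 1
    return count


# Hypothetical records, for illustration only (not taken from the real BAM files).
example = [
    "@HD\tVN:1.6",
    "r1\t99\tchr1\t100\t60\t50M\t=\t300\t250\t*\t*",
    "r1\t147\tchr1\t300\t60\t50M\t=\t100\t-250\t*\t*",
    "r1\t2147\tchr5\t900\t20\t30M20S\t=\t300\t0\t*\t*",  # supplementary record
    "r2\t77\t*\t0\t0\t*\t*\t0\t0\t*\t*",                  # unmapped read 1
    "r2\t141\t*\t0\t0\t*\t*\t0\t0\t*\t*",                 # unmapped read 2
]

print(count_records(example))                     # all mapped records
print(count_records(example, primary_only=True))  # mapped primary records only
```

With samtools itself, the primary-only count would correspond to adding the secondary and supplementary bits to the exclusion mask (samtools view -c -F 0x904).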

Although this suggests that bwa mem performs much better, I find it strange that the number of reads with the mate mapped to a different chromosome is so much larger with bwa mem than with bwa.

Any idea what such a difference means? At some locations I see very high coverage compared with the alignment generated by the classical bwa approach.


That's actually 15% (bwa mem) mapped to a different chromosome versus 30% (bwa), which makes the bwa mem result better than the first one, am I correct? Especially given the high percentage of mapped reads.

7.1 years ago

Is there something special about your sample preparation? Both alignments seem to indicate a relatively high translocation rate, and I would typically see a >90% alignment rate with either normal BWA or BWA-MEM when working with DNA-Seq data in a standard organism (such as a human exome dataset). So the normal BWA alignment seems abnormal, and even the BWA-MEM alignment seems suboptimal.

I would also agree with the comment from aradwen - don't forget that you need to consider the proportion of alignment types to the number of aligned (not total) reads
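To make that comparison concrete, here is a small Python sketch that expresses each flagstat category as a fraction of the mapped reads rather than of the total. The counts are copied from the two flagstat reports in the question; the dictionary keys and the helper name `fraction_of_mapped` are only illustrative labels.

```python
# Counts copied from the two flagstat reports quoted in the question.
reports = {
    "bwa aln/sampe": {
        "mapped": 452652,
        "properly_paired": 158326,
        "mate_diff_chr": 139248,
        "mate_diff_chr_mapq5": 80182,
    },
    "bwa mem": {
        "mapped": 3216740,
        "properly_paired": 291160,
        "mate_diff_chr": 2097921,
        "mate_diff_chr_mapq5": 1488875,
    },
}


def fraction_of_mapped(report, key):
    """Return the count for `key` divided by the number of mapped reads."""
    return report[key] / report["mapped"]


for name, report in reports.items():
    for key in ("properly_paired", "mate_diff_chr", "mate_diff_chr_mapq5"):
        pct = 100 * fraction_of_mapped(report, key)
        print(f"{name}: {key} = {pct:.1f}% of mapped reads")
```

Normalising by mapped reads keeps the two runs comparable even though their mapping rates differ widely.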


Thanks @cwarden45. This is mouse DNA amplified with human primers, so most of the primers should fail to amplify the mouse genome. Does this explain the high level of translocation?


I don't know - I think there was one case where I initially aligned a mouse RNA-Seq dataset to the human genome by accident. I remember the alignment percentage going way down (< 50%) but I don't recall what the distribution of aligned reads looked like (and it might have been single-end data anyways). So, I think this might explain why the alignment percentage was low, but I don't think I can say much else.

If you want to only look at mouse DNA, you can first filter out reads that align to the human genome and then see what the aligned read distribution looks like among the remaining reads. I'm guessing you are targeting genes and not expecting any translocations within the gene, so perhaps you can see if this strategy increases the number of "properly paired" alignments.

Also, I apologize for not noticing that you commented on your own question ;)


Thank you cwarden45. Actually, the question is not why the alignment percentage is low (it is expected to be low). What I don't understand is why it is so high with bwa mem! The reads are not supposed to map at such a high rate, so it is strange that bwa mem reports this.


I think that is strange. I don't know for sure, but maybe it has to do with something unusual about your library preparation. For example, the alignment rates for BWA and BWA-MEM are both >90% in the data that I have worked with. I've never actually seen a sample like this. Maybe someone else can provide more specific help.


Neither bwa nor bwa-mem should align this sample at 90%: it is a control sample (mouse DNA aligned to the human genome). The bwa result makes sense; the bwa-mem result does not at all. I looked at the alignments in IGV and they are messy, even though the percentage of mapped reads is high (a mapped read is not necessarily a well-aligned read).
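One way to quantify "mapped but messy" outside IGV is to look at the MAPQ and CIGAR fields of each record. The Python sketch below computes the soft-clipped fraction of a read from its CIGAR string and applies two cutoffs. The thresholds (`min_mapq=20`, `max_clipped=0.2`) and the function names are hypothetical choices for illustration, not recommended values.

```python
import re

# One CIGAR operation: a length followed by an operation letter (SAM specification).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def clipped_fraction(cigar):
    """Fraction of the read's bases that are soft-clipped (S) in the CIGAR."""
    if cigar == "*":
        return 1.0  # no alignment available: treat the read as fully unaligned
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # M, I, S, = and X are the operations that consume bases of the read.
    read_len = sum(n for n, op in ops if op in "MIS=X")
    clipped = sum(n for n, op in ops if op == "S")
    return clipped / read_len if read_len else 1.0


def looks_reliable(mapq, cigar, min_mapq=20, max_clipped=0.2):
    """Hypothetical filter: decent MAPQ and only a small soft-clipped part."""
    return mapq >= min_mapq and clipped_fraction(cigar) <= max_clipped


print(clipped_fraction("150M"))          # fully aligned read
print(clipped_fraction("100S50M"))       # two thirds of the read is clipped
print(looks_reliable(60, "100S50M"))     # high MAPQ but mostly clipped
```

Counting how many mapped records pass such a filter in each BAM file would give a second comparison between the two aligners that does not rely on the raw mapping rate alone.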


Any update on this issue? Did you figure out what the problem was?


Yes, we had an experiment in the lab that was not supposed to give a bulletproof alignment. bwa-mem failed in the sense that it tried to align the sequences anyway, which is not supposed to happen, so the result does not make sense biologically. I ended up preferring bowtie2 over bwa for these analyses in particular.


Perhaps this has something to do with the fact that bwa-mem uses local alignment and bwa-aln uses global alignment. The way you prepared your library may produce many reads that cannot be completely aligned to the reference (meaning there are many mismatches in the middle). Reads like that are more acceptable to a local aligner than to a global one.
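The difference can be sketched with a toy dynamic-programming scorer in Python. This is not how bwa works internally (bwa uses an index-based search and its own scoring scheme); the function `align_score`, the linear gap penalty, the score values and the sequences are all assumptions chosen for illustration. With `local=True` the score may restart at zero, so only the best-matching part of the read counts; with `local=False` the whole read must be aligned end to end somewhere in the reference.

```python
def align_score(read, ref, local, match=1, mismatch=-4, gap=-6):
    """Best score of `read` against `ref` by dynamic programming.

    local=True : Smith-Waterman style, any part of the read vs any part of ref.
    local=False: end-to-end, the whole read must align, ref start/end are free.
    """
    prev = [0] * (len(ref) + 1)  # row for zero read bases: free start in ref
    best = 0
    for i, a in enumerate(read, start=1):
        # First column: read bases aligned against nothing.
        cur = [0 if local else i * gap]
        for j, b in enumerate(ref, start=1):
            s = match if a == b else mismatch
            score = max(prev[j - 1] + s, prev[j] + gap, cur[j - 1] + gap)
            if local:
                score = max(score, 0)  # a local alignment may restart anywhere
                best = max(best, score)
            cur.append(score)
        prev = cur
    return best if local else max(prev)


# Toy data: a 12-base core that matches the reference, flanked by unrelated bases.
ref = "ACGTACGTTAGCCGATAGGCTAACG"
read = "GGGGGG" + "TAGCCGATAGGC" + "TTTTTT"

print("local score:     ", align_score(read, ref, local=True))
print("end-to-end score:", align_score(read, ref, local=False))
```

In this toy case the local score stays positive because the flanks are simply ignored (soft-clipped, in SAM terms), while the end-to-end score is dragged below zero by the flanks. An aligner that reports local hits can therefore report a read as mapped where an end-to-end aligner would leave it unmapped.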