Comparing alignment stats BWAmem vs. Soap2 aligner
0
0
Entering edit mode
3.4 years ago
pinn ▴ 160

Hi,

I aligned a human genome dataset with two different aligners 1) BWAmem 2) Soap2. while comparing the alignment output for a aligned bam files I find a lot of difference.

1) no. of mapped reads 2) properly paired

,any suggestions ..

**BWAmem**
samtools flagstat -@ 10  SRR2962693_BWAmem.sam
0 + 0 secondary
63900 + 0 supplementary
0 + 0 duplicates
**118444503 + 0 mapped (99.85% : N/A)**
118564172 + 0 paired in sequencing
**117806380 + 0 properly paired (99.36% : N/A)**
118326136 + 0 with itself and mate mapped
54467 + 0 singletons (0.05% : N/A)
177530 + 0 with mate mapped to a different chr
130003 + 0 with mate mapped to a different chr (mapQ>=5)


Soap2

samtools flagstat -@ 10 SRR2962693_3.sam
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
**1948750 + 0 mapped (100.00% : N/A)**
1948750 + 0 paired in sequencing
**1948750 + 0 properly paired (100.00% : N/A)**
1948750 + 0 with itself and mate mapped
0 + 0 singletons (0.00% : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)


Here are the commands used:

Soap2

soap -D soap2.21release/index/hg38.fa.index -a benchmarkdatasets/sickle_SRR2052405_R1q20.fastq -b benchmarkdatasets/sickle_SRR2052405_R2q20.fastq -o SRR2052405.soap -2 unpaired-hits -u unmapped.fq -l 256 -p 5


BWA

bwa mem -t 10 BWA-MEM/hg38 sample1/fixed1_normal.fq sample1/fixed2_normal.fq > fixed.sam

genome next-gen alignment • 2.2k views
2
Entering edit mode

Just asking out of interest. Why are you comparing these two aligners? I am asking because sometimes one wastes a lot of time doing these kind of comparisons, and in the end, you use BWA anyway, because it is the most widely used, well-documented, maintained and accepted software out there, especially when it comes to mutation calling and structural variant detection. You might consider "simply" to use. I say that because I myself wasted a lot of time doing things like this.

1
Entering edit mode

Something about this result is not right. Why are there 10x more reads in bwa stats than soap2 if this is the same exact input file?

0
Entering edit mode

OP needs to add their exact alignment commands

0
Entering edit mode

Soap2 CMD

soap -D soap2.21release/index/hg38.fa.index -a benchmarkdatasets/sickle_SRR2052405_R1q20.fastq -b benchmarkdatasets/sickle_SRR2052405_R2q20.fastq -o SRR2052405.soap -2 unpaired-hits -u unmapped.fq -l 256 -p 5


BWA CMD

bwa mem -t 10 BWA-MEM/hg38 sample1/fixed1_normal.fq sample1/fixed2_normal.fq > fixed.sam

0
Entering edit mode

0
Entering edit mode

yes, the exact same file. I performed alignment on 4 different human genome datasets, more reads in bwa ?. I tried with other aligners like cushaw3, bowtie2, Kart and Whisper the stats doesn't differ much among these aligners when compare with soap2 stats I found lot of difference. How to sort it out ?

0
Entering edit mode

I agree with ATpoint that this probably is not worth your time to puzzle out. You have sane answers from multiple other aligners, there is something wrong with soap (maybe your genome index is corrupted?) so I'd just move on and use bowtie2 or bwa, on the grounds that they are more popular; it's easier to get help with them, and it's easier for others to replicate your work with them (Is that true of bowtie2 anymore?). Rather than finding the aligner that gives you a fraction of a percentage better alignment numbers, better to pick one, and study it and its options to know how to use it best.

0
Entering edit mode