Comparison of SOAP and BWA
3
4
9.8 years ago

Hi,

Could you explain the differences between SOAP2 and BWA (latest versions of both), given that both use the BWT algorithm?

Which one is faster?

Which one uses less memory?

Which one is more accurate and which one is more widely used for human genome sequence alignment and why?

In both cases I want to compare single-end read alignment of a human query genome against the given reference genome.

bwa next-gen sequencing human read • 10k views
16
9.8 years ago
lh3 32k

From my evaluation and an internal evaluation done by 1000g:

On specificity (well, bwa is not far off):

novoalign~stampy+bwa>bwa>soap2>bowtie


On single-end speed:

bowtie~soap2>bwa>novoalign~stampy+bwa


On paired-end speed:

soap2>bwa~bowtie>novoalign


On paired-end sensitivity, I guess:

bwa~soap2>bowtie


On single-end sensitivity, I guess:

soap2>bwa~bowtie


On memory (">" means better or less memory):

bowtie>bwa>soap2>novoalign


On citations:

bowtie~bwa>soap2


People choose bowtie and bwa more often probably because both natively support SAM output, while soap2 does not. Bowtie is often seen in RNA-seq/ChIP-seq because it is extremely fast for single-end reads and because the whole tophat/cufflinks package is very useful. BWA is often seen for SNP/indel calling because it does gapped alignment and produces fewer false alignments. BWA/stampy/novoalign estimate mapping quality, which is at times useful. Bowtie/soap2 do not, which is why they are faster.

When you really come to very rare events (e.g. somatic mutations, structural variations, RNA editing and rare splice forms), you should probably consider novoalign/stampy or even try two aligners at the same time.
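To make the mapping quality point concrete, here is a minimal sketch of how a downstream tool might use it. It relies only on the SAM convention that MAPQ is Phred-scaled, i.e. MAPQ = -10 log10 Pr(mapping position is wrong), with 255 meaning "not available". The function names, the cap of 60 and the threshold of 20 are illustrative assumptions, not something any of the aligners above prescribes.

```python
import math


def mapq_to_error_prob(mapq: int) -> float:
    """Convert a Phred-scaled MAPQ to the probability that the position is wrong."""
    if mapq == 255:
        # SAM convention: 255 means the mapping quality is not available.
        raise ValueError("MAPQ 255 means mapping quality is not available")
    return 10 ** (-mapq / 10)


def error_prob_to_mapq(p_wrong: float, cap: int = 60) -> int:
    """Inverse conversion; the cap is a hypothetical choice for this sketch."""
    if p_wrong <= 0:
        return cap
    return min(cap, round(-10 * math.log10(p_wrong)))


def filter_by_mapq(alignments, min_mapq: int = 20):
    """Keep (read_name, mapq) pairs whose MAPQ is known and at least min_mapq."""
    return [a for a in alignments if a[1] != 255 and a[1] >= min_mapq]


if __name__ == "__main__":
    hits = [("read1", 37), ("read2", 0), ("read3", 255), ("read4", 20)]
    for name, q in filter_by_mapq(hits):
        print(name, q, mapq_to_error_prob(q))
```

An aligner that reports no mapping quality leaves a variant caller with nothing to filter or weight on, which is the practical cost of the speed gain.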

0

@lh3: I profiled both your latest version of bwa (short read) and soap2 in Intel VTune for single-end reads from human chromosome 10 and chromosome X, but found soap2 to be faster. That doesn't agree with your answer. Am I missing something?

0

I am saying that for single-end reads soap2 is faster than bwa.

0

He probably just had trouble parsing that very useful and information-dense paragraph. It would be a bit more readable if you tabulated those results or at least started each '>' string on its own line...

0

I just quickly edited the post because it was quite unreadable despite the interesting information.

0

I have a question regarding specificity: when comparing specificity for aligners and getting something below 100%, does that mean that some tools actually report 'false' alignments? How is that possible at all?

0

Essentially no aligner guarantees to find the "best" alignment in the human genome. Even if an aligner could find the best one, the "best" alignment is not necessarily the correct one.
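One common way to measure this is with simulated reads whose true origin is known. The sketch below assumes that setup. The names `truth`, `calls` and `tolerance`, and the exact definitions of the two metrics, are assumptions made for illustration and are not necessarily the definitions used in the evaluations mentioned above.

```python
def evaluate(truth, calls, tolerance=5):
    """Score aligner output against simulated reads with known positions.

    truth: dict read_name -> (chrom, pos), where each read really came from.
    calls: dict read_name -> (chrom, pos), or None / missing if unmapped.
    tolerance: allowed offset in bp before a placement counts as wrong.
    """
    mapped = 0
    correct = 0
    for read, (chrom, pos) in truth.items():
        call = calls.get(read)
        if call is None:
            continue  # unmapped reads lower sensitivity but are not "false"
        mapped += 1
        if call[0] == chrom and abs(call[1] - pos) <= tolerance:
            correct += 1
    total = len(truth)
    return {
        # fraction of all reads that were placed correctly
        "sensitivity": correct / total if total else 0.0,
        # fraction of reported alignments that are wrong
        "wrong_among_mapped": (mapped - correct) / mapped if mapped else 0.0,
    }


if __name__ == "__main__":
    truth = {"r1": ("chr10", 100), "r2": ("chr10", 500),
             "r3": ("chrX", 900), "r4": ("chrX", 42)}
    calls = {"r1": ("chr10", 100),   # correct
             "r2": ("chrX", 500),    # placed in a repeat copy elsewhere
             "r3": None,             # left unmapped
             "r4": ("chrX", 45)}     # within tolerance
    print(evaluate(truth, calls))
```

A read like `r2` is how an aligner ends up below 100%: with sequencing errors and repeats, the highest-scoring placement can be a different copy from the one the read came from.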

0

You did not include novoalign in the sensitivity rankings. Do you know how it compares to soap2/bwa/bowtie?

5
9.8 years ago
Benm ▴ 710

Please check this paper: Bao S, et al. Evaluation of next-generation sequencing software in mapping and assembly. J Hum Genet. 2011 Apr 28. (PubMed)

BTW, the latest version of SOAPaligner is SOAP3: GPU-based Compressed Indexing and Ultra-fast Parallel Alignment of Short Reads.

1

FYI, that citation appears to have been retracted by the publisher.

0

This paper does not tell us about accuracy.

0

But is it available? PubMed indeed lists this as WITHDRAWN:

http://www.ncbi.nlm.nih.gov/pubmed/21677664.1#

2

Late reply because I just found this thread: the "Withdrawn" status is due to double publishing.

"The publisher is retracting this Review. The same Review was made available online on 28 April 2011 and published in the June issue of Journal of Human Genetics (doi:10.1038/jhg.2011.43)."

It's still available at the link.

0
8.2 years ago

I found Bowtie2GP to be 4 times faster than BWA on some human paired-end alignments; it took less than half the memory and gave almost the same accuracy. See http://arxiv.org/abs/1301.5187

Bill

0

The following is my reply in email. GP is quite interesting anyway.

Thank you. Genetic improvement sounds very interesting. I will look into it further. On the other hand, for short-read alignment, there is more than speed and sensitivity. We have known for a long time that bowtie is several to 10 times faster than BWA with comparable sensitivity, and I could make BWA much faster without reducing its sensitivity. However, I keep BWA as it is, and BWA is still more widely used for variant discovery and cancer projects. This is firstly because of its accuracy (your Table 2) and secondly because of its power to distinguish good and bad hits. For those applications, even a tiny fraction of wrong alignments produces many false calls. It is critical to inform the caller which alignments to trust. Bowtie cannot do that.

Bowtie2 version beta5 or later (not earlier versions) competes well with bwa for 100bp reads and is likely to surpass bwa (not bwa-sw) for 200bp reads. I would be interested in the comparison for the typical 100bp and the upcoming 250bp reads, instead of the 36bp reads in your RN. Aligner performance/accuracy is greatly affected by the read length.

Thanks,

Heng

1

Dear Heng, I replied to your email before spotting your posting. So for everyone else here are my thoughts:

Yes, it does seem that people are moving to longer read lengths. We measured 36bp because that is what the Cancer Institute used. One of the longer-term goals of the project is to make it easier to tune software as its users change their requirements. So whilst we optimised Bowtie2GP for 36bp single-end reads, it was nice to see the optimisation still held for paired-end reads. But bioinformatics, like many fields, sees pretty much continual change in data. At present people are forced to keep up by manual code changes. Whilst here we have an automated approach, it may be that people will want to run GP on the new data and inspect its suggested optimisations before allowing them to be implemented. I guess for BWA, there might be a user-controlled switch which enables the GP code tweaks. Initially it could default to off, and only later (when users have more confidence in it) might it default to on. ....

If you use Linux, the 64-bit binary is available via ftp: http://www.cs.ucl.ac.uk/staff/W.Langdon/ftp/gp-code/bowtie2gp Alternatively, I could post the three GP-optimised source files.

Thanks again

Bill