Alternative to blast, faster but less sensitive?
8.0 years ago

I am looking for an alternative to BLAST that can search the UniProt sequence database faster, at the cost of some sensitivity. I am interested in finding only relatively close homologs, say >~50% sequence identity, so sensitivity is not an issue. I thought it might be possible to do this with BLAST by tuning parameters such as word length, but I haven't gotten anywhere. I've come across USEARCH, which claims better performance, but I could not try it out since one needs a license for it.

Any ideas around?
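
To make the word-length trade-off concrete, here is a minimal Python sketch of a shared-word prefilter. It is not how BLAST or USEARCH is actually implemented (BLAST scores neighbourhood words, for instance); `kmers`, `shared_word_fraction`, and the toy sequences are hypothetical names and data chosen purely for illustration. The idea is only that closely related sequences share many exact short words, so a cheap word count can discard most of a database before any alignment is attempted, and longer words make that filter stricter.

```python
def kmers(seq, k):
    """Return the set of all overlapping words of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_word_fraction(query, target, k):
    """Fraction of the query's distinct k-mers that also occur in the target."""
    query_words = kmers(query, k)
    if not query_words:
        return 0.0
    return len(query_words & kmers(target, k)) / len(query_words)


# Toy sequences (made up for illustration, not real UniProt entries).
query = "MKTAYIAKQRQISFVKSHFSRQ"
close = "MKTAYIAKQRQISFVKAHFSRQ"    # one substitution relative to the query
distant = "MKSAYLAKQRNISFIKSHWSRQ"  # five substitutions relative to the query

# Compare how a short and a longer word size separate the two targets.
for k in (3, 5):
    print(k,
          round(shared_word_fraction(query, close, k), 2),
          round(shared_word_fraction(query, distant, k), 2))
```

In this toy case the second target still matches the query at 17 of 22 positions, yet it shares no exact 5-letter word with it. That is the sensitivity cost of longer words, and also why they are cheaper: fewer candidates survive the filter.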

sequence blast • 2.7k views

The license for the 32-bit USEARCH is free and will almost certainly cover your needs. As for other BLAST alternatives, you could check out HMMER (and pfamscan). You wouldn't be searching against UniProt, though, but against HMM profiles. Depending on your research question, that can easily be the better option.


I actually had a go with the 32-bit USEARCH, but unfortunately 32-bit is an issue these days due to the size of the sequence databases. You need more than 4GB of memory (i.e. a 64-bit executable) to go through the whole UniProt database, or even just a fraction of it.


UniProt is actually a rather small database; I think UniRef100 has just 35M entries. I'm almost certain that the 32-bit USEARCH binaries should handle it just fine. The 64-bit licensees listed on the USEARCH site deal with databases that are orders of magnitude larger. Did you read the manual?


I did this test some time ago, using a UniRef100 from 2009 (i.e. a lot smaller than current ones):

$ ./usearch7.0.959_i86linux32 -makeudb_usearch uniprot_trembl.fasta -output uniprot_trembl.udb
usearch v7.0.959_i86linux32, 4.0Gb RAM (49.4Gb total), 12 cores
(C) Copyright 2013 Robert C. Edgar, all rights reserved.
http://drive5.com
00:00 19Mb Reading input
00:18 942Mb 100.0% Masking
00:43 955Mb 100.0% Word stats
00:44 2.9Gb 57.2% Building slots
Out of memory mymalloc(1032), curr 4.14e+09 bytes

myutils.cpp(2136):

./usearch7.0.959_i86linux32 -makeudb_usearch uniprot_trembl.fasta -output uniprot_trembl.udb

---Fatal error---
Out of memory, mymalloc(1032), curr 4.14e+09 bytes

I see you didn't read the manual page I linked to in the previous post.

Well, I'm sorry, but I still don't see where I'm going wrong (I did go through the manual, by the way). Do I not need to run makeudb_usearch first? The manual says: "A database file must be specified using the -db option. FASTA and .udb formats are supported. For large databases, .udb format is recommended (see makeudb_usearch command)." For completeness, I've just run this command, with similar memory issues:

$ ./usearch7.0.959_i86linux32 -usearch_global O00560.fa -db uniprot_trembl.fasta -id 0.8 -alnout results.aln
usearch v7.0.959_i86linux32, 4.0Gb RAM (49.4Gb total), 12 cores
http://drive5.com
02:43 3.6Gb  100.0% Word stats
02:43 3.6Gb    0.0% Building slots
Out of memory mymalloc(6632), curr 4.16e+09 bytes

myutils.cpp(2136):

./usearch7.0.959_i86linux32 -usearch_global O00560.fa -db uniprot_trembl.fasta -id 0.8 -alnout results.aln

---Fatal error---
Out of memory, mymalloc(6632), curr 4.16e+09 bytes
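
As a rough sanity check on the 32-bit limit, the arithmetic can be sketched in Python. The `curr` values are the ones reported in the two error messages above; the 4 GiB figure is the theoretical address space of a 32-bit process, and how much of it a program can really use is an assumption that varies by operating system (often less). Names such as `ADDRESS_SPACE_32BIT` and `fraction_of_limit` are illustrative only.

```python
# Theoretical address space of a 32-bit process: 2**32 bytes (4 GiB).
ADDRESS_SPACE_32BIT = 2 ** 32

# "curr" values copied from the two USEARCH out-of-memory messages.
reported_bytes = {
    "makeudb_usearch": 4.14e9,
    "usearch_global": 4.16e9,
}


def fraction_of_limit(n_bytes, limit=ADDRESS_SPACE_32BIT):
    """Return the share of the address space that n_bytes represents."""
    return n_bytes / limit


for command, n_bytes in reported_bytes.items():
    print(f"{command}: {n_bytes / 2**30:.2f} GiB, "
          f"{fraction_of_limit(n_bytes):.0%} of the 32-bit address space")
```

Both failures occur at roughly 96 to 97% of the 4 GiB ceiling. That is consistent with the process running out of addressable memory rather than the machine running out of RAM (the banner reports 49.4Gb total).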

Now that I'm reading other threads on Biostars (which I didn't do before), I see that other users are having similar memory issues. See this reply: C: Looking For Faster Blastp-Like Program?

So maybe I should change my question: are there any positive experiences with USEARCH? Or could someone provide feedback on how USEARCH works in real life? I would definitely get a license for it, but first I have to check that it does what I need.


See also this comment regarding memory problems and 32-bit USEARCH: A: How To Solve An Out-Of-Memory Error When Using Usearch Chimera Detection

8.0 years ago
cdsouthan ★ 1.9k

BLAT is the obvious choice for blistering speed, but I don't know how feasible it would be to implement.


I agree with you.

6.6 years ago

Try PAUDA.