I am looking for an alternative to BLAST that can search the UniProt sequence database faster, at the cost of some sensitivity. I am interested in finding only relatively close homologs, say >~50% sequence identity, so sensitivity is not an issue. I thought it might be possible to do this with BLAST by tuning parameters such as word length, but I haven't gotten anywhere. I've come across USEARCH, which claims better performance, but I could not try it out since it requires a license.
Any ideas around?
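Whichever search tool ends up being used, the ~50% identity cutoff mentioned above can be applied as a post-filter on the hit table. The following is a minimal sketch, assuming the hits are in the 12-column tab-separated layout written by BLAST+ with `-outfmt 6`, where percent identity is the third column. The function name `filter_by_identity` and the demo hit lines are hypothetical, and this does nothing to speed up the search itself.

```python
def filter_by_identity(lines, min_identity=50.0):
    """Yield (query, subject, identity) for hits at or above min_identity.

    Assumes tab-separated hits with percent identity in column 3,
    as in BLAST+ tabular output (-outfmt 6).
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        identity = float(fields[2])
        if identity >= min_identity:
            yield fields[0], fields[1], identity


if __name__ == "__main__":
    # Made-up hit lines, for illustration only.
    demo_hits = [
        "q1\ts1\t87.5\t120\t15\t0\t1\t120\t1\t120\t1e-50\t200",
        "q1\ts2\t32.0\t110\t70\t5\t3\t112\t8\t117\t1e-05\t45",
    ]
    for hit in filter_by_identity(demo_hits):
        print(hit)
```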
The license for the 32-bit USEARCH is free and will almost certainly cover your needs. As for other BLAST alternatives, you could check out HMMER (and pfamscan). You wouldn't be searching against UniProt, though, but against HMM profiles. Depending on your research question, it can easily be the better option.
I actually had a go with the 32-bit USEARCH, but unfortunately 32 bit is an issue these days because of the size of the sequence databases. You need more than 4 GB of memory (i.e. a 64-bit executable) to go through the whole UniProt database, or even just a fraction of it.
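As a rough sanity check on the 4 GB point, a sketch like the one below counts sequences and residues in a FASTA file and compares the raw residue total with the 2^32 bytes a 32-bit process can address. It assumes one byte per residue and ignores any index or working memory, so the real requirement of a given search tool will differ. The names `fasta_stats` and `fits_in_32bit` are hypothetical.

```python
import io

ADDRESS_SPACE_32BIT = 2 ** 32  # bytes addressable by a 32-bit process (4 GiB)


def fasta_stats(handle):
    """Return (number_of_sequences, total_residues) for a FASTA stream."""
    n_seqs = 0
    n_residues = 0
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            n_seqs += 1  # each header line starts a new record
        else:
            n_residues += len(line)
    return n_seqs, n_residues


def fits_in_32bit(n_residues, bytes_per_residue=1):
    """Crude check: do the raw residues alone fit in a 32-bit address space?"""
    return n_residues * bytes_per_residue < ADDRESS_SPACE_32BIT


if __name__ == "__main__":
    # Tiny in-memory example; a real run would pass open("your_db.fasta").
    demo = io.StringIO(">a\nMKV\nLLA\n>b\nMST\n")
    seqs, residues = fasta_stats(demo)
    print(seqs, residues, fits_in_32bit(residues))
```

Passing this check does not mean a search will run, since the tool's own data structures come on top of the raw sequence data.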
UniProt is actually a rather small database; I think UniRef100 has just 35M entries. I'm almost certain that the 32-bit USEARCH binaries should handle it just fine. The 64-bit licensees listed on the USEARCH site deal with databases that are orders of magnitude larger. Did you read the manual?
I did this test some time ago, using a UniRef100 from 2009 (i.e. a lot smaller than current ones):
I see you didn't read the manual page I linked to in the previous post.
Well, I'm sorry, but I still don't see where I'm going wrong (I did go through the manual, by the way). Do I not need to run makeudb_usearch first? The manual says "A database file must be specified using the -db option. FASTA and .udb formats are supported. For large databases, .udb format is recommended (see makeudb_usearch command)." For completeness, I've just run this command, with similar memory issues:
Now that I'm reading other threads on Biostars (which I hadn't done before), I see that other users are having similar memory issues; see this reply: C: Looking For Faster Blastp-Like Program?
So maybe I should change my question: are there any positive experiences with usearch? Or could someone provide feedback on how usearch works in real life? I would definitely get a license for it, but first I have to check that it does what I need.
See also this comment, A: How To Solve An Out-Of-Memory Error When Using Usearch Chimera Detection, with regard to memory problems and 32-bit usearch.
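One workaround sometimes tried for memory limits like those discussed in this thread is to split the database into smaller FASTA chunks, search each chunk separately, and merge the hits afterwards. The sketch below is only an illustration under that assumption. The names `read_records` and `split_records` and the residue budget are hypothetical, and the thread does not confirm that this resolves the usearch issue.

```python
def read_records(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, seq = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line, []
        elif header is not None:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def split_records(records, max_residues):
    """Group records into chunks of at most max_residues residues each.

    A single record longer than the budget still gets a chunk of its own.
    """
    chunk, size = [], 0
    for header, seq in records:
        if chunk and size + len(seq) > max_residues:
            yield chunk
            chunk, size = [], 0
        chunk.append((header, seq))
        size += len(seq)
    if chunk:
        yield chunk


if __name__ == "__main__":
    demo = [">a", "MKV", "LLA", ">b", "MST", ">c", "AAAA"]
    for i, chunk in enumerate(split_records(read_records(demo), max_residues=9)):
        print(i, [header for header, _ in chunk])
```

Each chunk could then be written to its own file and searched on its own. Note that statistics depending on database size (such as E-values) would not be directly comparable across chunks.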