Are there any faster alternatives to BLAST (specifically nucleotide BLAST)? I like that I can search through all GenBank+EMBL+DDBJ+PDB+RefSeq sequences (the nt collection), but I feel like there must be a faster way. If I wanted to identify thousands or millions of sequences, BLAST would be somewhat inefficient.
I guess what I was really asking for is a metagenomic classifier. There are a few of those out there:
- Kraken: https://ccb.jhu.edu/software/kraken/
- GOTTCHA: http://lanl-bioinformatics.github.io/GOTTCHA/
- CLARK: http://clark.cs.ucr.edu/
- MetaPhlAn: http://huttenhower.sph.harvard.edu/metaphlan
Sure, these are not exactly the same as BLAST, but if you need to quickly classify a lot of reads, these tools will do that.
If you have millions of query sequences, it's not a bad idea to cluster them first and BLAST only the representative sequences. Furthermore, with millions of query sequences and no compute cluster at hand, it might be a good idea to select a smaller reference database such as UniRef90 (a protein database, so nucleotide queries would need a translated search), but this depends on your research question. I think DIAMOND is one of the more recent BLAST alternatives; it aligns protein or translated nucleotide queries against a protein database. As far as I recall, the DIAMOND article gives an overview of some other alternatives (I don't have access from home). If you only need nucleotide-to-nucleotide searches, another option would be BLAT.
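To make the "BLAST only the representatives" idea concrete, here is a minimal Python sketch. It covers only the simplest case, collapsing exact duplicate sequences (dereplication), which is not the same as real similarity clustering with a dedicated tool. The function names `parse_fasta` and `dereplicate` and the example reads are made up for illustration.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def dereplicate(records):
    """Group records that share an identical sequence (case-insensitive).

    Returns a dict mapping each representative ID (the first ID seen for a
    sequence) to the list of all IDs with that sequence. This is an
    assumption-laden stand-in for clustering: near-identical sequences
    are NOT merged.
    """
    ids_by_sequence = {}
    for header, seq in records:
        fields = header.split()
        seq_id = fields[0] if fields else header  # ID = first word of header
        ids_by_sequence.setdefault(seq.upper(), []).append(seq_id)
    return {ids[0]: ids for ids in ids_by_sequence.values()}


# Hypothetical example input: read2 duplicates read1 apart from letter case.
example = """>read1 sample A
ACGTACGT
>read2 sample B
acgtacgt
>read3 sample A
TTTTGGGG
"""

groups = dereplicate(parse_fasta(example))
print(groups)
# {'read1': ['read1', 'read2'], 'read3': ['read3']}
```

You would then search only the representatives (`read1` and `read3` here) and copy each hit to the other members of its group. With real data, where reads differ by a few bases, a proper clustering tool at an identity threshold will shrink the query set much more than exact matching does.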