Faster BLAST alternative
7.3 years ago
igor 13k

Are there some faster alternatives to BLAST (specifically nucleotide BLAST)? I like that I can search through all GenBank+EMBL+DDBJ+PDB+RefSeq sequences (nt collection), but I feel like there must be a faster way. If I wanted to identify thousands or millions of sequences, it's somewhat inefficient.

blast alignment sequencing • 9.6k views
6.4 years ago
igor 13k

I guess what I was really asking for is a metagenomic classifier. There are a few of those out there:

Sure, these are not exactly the same as BLAST, but if you need to quickly classify a lot of reads, these tools will do that.

7.3 years ago
5heikki 10k

If you have millions of query sequences, it's not a bad idea to cluster them and BLAST only the representative sequences. Furthermore, with millions of query sequences and no compute cluster at hand, it might be a good idea to select a smaller reference database such as UniRef90, but this depends on your research questions. I think DIAMOND is one of the most recent BLAST alternatives. As far as I recall, the DIAMOND article reviews some other alternatives (I don't have access from home). If you want to do just nucleotide-nucleotide searches, another option would be BLAT.
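To make the "cluster first, then BLAST the representatives" idea concrete, here is a minimal sketch of its simplest special case: collapsing exact duplicates. This is dereplication, not true similarity clustering (a dedicated clustering tool would also group near-identical sequences), and the function names and toy reads are made up for illustration.

```python
def dereplicate(records):
    """Group read IDs by identical (case-insensitive) sequence.

    `records` is an iterable of (read_id, sequence) pairs.
    Returns a dict mapping each unique sequence to the list of IDs that carry it.
    """
    groups = {}
    for read_id, seq in records:
        groups.setdefault(seq.upper(), []).append(read_id)
    return groups


def representatives(groups):
    """Pick the first-seen read of each group as its representative.

    Returns (representative_id, sequence, group_size) tuples in first-seen order.
    """
    return [(ids[0], seq, len(ids)) for seq, ids in groups.items()]


# Toy input (hypothetical reads, not real data).
reads = [("r1", "ACGT"), ("r2", "acgt"), ("r3", "TTGA"), ("r4", "ACGT")]
groups = dereplicate(reads)
reps = representatives(groups)

for rep_id, seq, size in reps:
    print(rep_id, seq, size)
```

Only the representatives would then be searched, and each hit can be mapped back to every read in its group through `groups`.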


Unfortunately, DIAMOND is for protein (not nucleotide) alignment.

BLAT is a good suggestion, though I'm not sure how easy it would be to summarize the results.


Well, DIAMOND is blastx-like, so nucleotide-vs-protein, which is almost always better than nucleotide-nucleotide if you want to detect putative homologs. You haven't really told us anything about your research questions or the type of your query sequences (length, source, etc.), so it's hard to say. Also, BLAT has an output option that is similar to tabular BLAST output, which is the way to go IMO.
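On the question of summarizing results: tabular output is easy to reduce to "best hit per query" and then count subjects. Below is a rough sketch, assuming the 12 standard columns of BLAST tabular output (`-outfmt 6`) and assuming the BLAT tabular option produces the same column layout. The rows in `example` are invented for illustration, not real hits.

```python
import csv
import io
from collections import Counter

# Standard column order of BLAST tabular output (-outfmt 6).
COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def best_hits(handle):
    """Return {query_id: (subject_id, bitscore)} keeping the top-scoring hit per query."""
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(COLUMNS, row))
        score = float(hit["bitscore"])
        query = hit["qseqid"]
        if query not in best or score > best[query][1]:
            best[query] = (hit["sseqid"], score)
    return best


def subject_counts(best):
    """Count how many queries have each subject as their best hit."""
    return Counter(subject for subject, _ in best.values())


# Invented rows for illustration only.
example = (
    "read1\tsubjA\t99.0\t100\t1\t0\t1\t100\t5\t104\t1e-50\t180\n"
    "read1\tsubjB\t90.0\t100\t10\t0\t1\t100\t7\t106\t1e-20\t90\n"
    "read2\tsubjA\t98.0\t100\t2\t0\t1\t100\t5\t104\t1e-45\t150\n"
    "read3\tsubjC\t100.0\t100\t0\t0\t1\t100\t1\t100\t1e-55\t200\n"
)
best = best_hits(io.StringIO(example))
counts = subject_counts(best)
print(counts.most_common())
```

In practice the subject IDs would still need to be mapped to organisms, which this sketch does not attempt.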


Sorry if I was being too vague. I am trying to identify contaminants in raw sequencing data. For example, the reads should be human, but only 50% align to human. What are the other reads? I can check some likely contaminants, but I'd like to check against all known sequences.


If I were you, I would take a small subsample of the reads that do not map to human and BLAST them against nt to see what is going on.
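One way to draw that subsample without loading everything into memory is reservoir sampling. This is only a sketch, assuming the unmapped reads have already been written to a plain 4-line-per-record FASTQ file; the function names and the toy records are hypothetical.

```python
import io
import random


def read_fastq(handle):
    """Yield (header, sequence, quality) from a strict 4-line-per-record FASTQ."""
    while True:
        header = handle.readline()
        if not header.strip():
            return  # end of file (or trailing blank line)
        seq = handle.readline()
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline()
        yield header.rstrip("\n"), seq.rstrip("\n"), qual.rstrip("\n")


def reservoir_sample(records, k, seed=0):
    """Return k records chosen uniformly at random in a single pass."""
    rng = random.Random(seed)
    sample = []
    for i, rec in enumerate(records):
        if i < k:
            sample.append(rec)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = rec
    return sample


def to_fasta(records):
    """Convert FASTQ records to FASTA text, dropping the leading '@' and the qualities."""
    return "".join(f">{header[1:]}\n{seq}\n" for header, seq, _ in records)


# Toy FASTQ with five made-up reads.
example = "".join(
    f"@read{i}\nACGT{'A' * i}\n+\n{'I' * (4 + i)}\n" for i in range(5)
)
records = list(read_fastq(io.StringIO(example)))
sample = reservoir_sample(records, 2, seed=1)
print(to_fasta(sample))
```

The resulting FASTA can be used as the query for a BLAST search against nt; a few hundred to a few thousand reads is usually enough to see the dominant contaminants.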


See this thread: http://seqanswers.com/forums/showthread.php?t=60696

Hopefully you are not the same person as the originator of that thread.
