Question

Faster BLAST alternative

1

10.1 years ago

igor 13k

Are there any faster alternatives to BLAST (specifically nucleotide BLAST)? I like that I can search through all GenBank+EMBL+DDBJ+PDB+RefSeq sequences (nt collection), but I feel like there must be a faster way. If I want to identify thousands or millions of sequences, BLAST is somewhat inefficient.

blast sequencing alignment • 14k views

updated 2.6 years ago by Ram 45k • written 10.1 years ago by igor 13k

1

10.1 years ago

5heikki 11k

If you have millions of query sequences, it's not a bad idea to cluster them and BLAST only the representative sequences. Furthermore, with millions of query sequences and no compute cluster at hand, it might be a good idea to select a smaller reference database such as UniRef90, but this depends on your research questions. I think DIAMOND is one of the most recent BLAST alternatives. As far as I recall, they review some other alternatives in the article (I don't have access from home). If you only want to do nucleotide-nucleotide searches, another option would be blat.
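To make the "only search the representatives" idea concrete, here is a minimal sketch of the crudest possible version: collapsing exact duplicate reads. This is dereplication, not real clustering (a proper clustering tool also groups similar, non-identical sequences). The function name `dereplicate` and the toy reads are made up for illustration.

```python
def dereplicate(records):
    """Collapse exact duplicate sequences (case-insensitive).

    records: iterable of (read_id, sequence) pairs.
    Returns a list of (representative_id, sequence, member_ids),
    where the representative is the first read seen with that sequence.
    """
    groups = {}
    for read_id, seq in records:
        groups.setdefault(seq.upper(), []).append(read_id)
    return [(ids[0], seq, ids) for seq, ids in groups.items()]


# Toy input, invented for this example.
reads = [
    ("r1", "ACGTACGT"),
    ("r2", "acgtacgt"),  # same sequence as r1, different case
    ("r3", "TTGACCA"),
]

for rep_id, seq, members in dereplicate(reads):
    print(rep_id, seq, len(members))
```

Only the representatives would then be searched, and each hit can be propagated back to every read ID in `member_ids`.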

updated 2.6 years ago by Ram 45k • written 10.1 years ago by 5heikki 11k

1

Unfortunately, DIAMOND is for protein (not nucleotide) alignment.

Blat is a good suggestion. Not sure how easy it would be to summarize the results.

10.1 years ago by igor 13k

1

Well, DIAMOND is blastx-like, so nucleotide-vs-protein, which is almost always better than nucleotide-nucleotide if you want to detect putative homologs. You haven't really told us anything about your research questions or the type of your query sequences (length, source, etc.), so it's hard to say. Also, blat has an output option that is similar to tabular BLAST output, which is the way to go IMO.
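For summarizing tabular output, something along these lines might be a starting point. It is only a sketch and assumes the standard 12-column tabular layout (what BLAST+ writes with `-outfmt 6`; I believe blat's equivalent is `-out=blast8`, but check its help text). The function name `best_hits` and the example rows are invented for illustration.

```python
import csv
import io

# Column names of the default 12-column tabular BLAST format (assumed layout).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(handle):
    """Return {query_id: row_dict}, keeping the highest-bitscore hit per query."""
    best = {}
    for fields in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and comment lines (e.g. from commented tabular output).
        if not fields or fields[0].startswith("#"):
            continue
        row = dict(zip(COLUMNS, fields))
        row["bitscore"] = float(row["bitscore"])
        current = best.get(row["qseqid"])
        if current is None or row["bitscore"] > current["bitscore"]:
            best[row["qseqid"]] = row
    return best


# Made-up rows standing in for a real results file.
example = io.StringIO(
    "read1\tsubjA\t98.0\t100\t2\t0\t1\t100\t5\t104\t1e-40\t180\n"
    "read1\tsubjB\t90.0\t100\t10\t0\t1\t100\t9\t108\t1e-20\t120\n"
    "read2\tsubjC\t99.0\t80\t1\t0\t1\t80\t1\t80\t1e-35\t150\n"
)

hits = best_hits(example)
for query, row in hits.items():
    print(query, row["sseqid"], row["bitscore"])
```

With a real file you would pass `open("results.tsv")` (a hypothetical filename) instead of the `StringIO` object.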

10.1 years ago by 5heikki 11k

0

Sorry if I was being too vague. I am trying to identify contaminants in raw sequencing data. For example, the reads should be human, but only 50% align to human. What are the other reads? I can check some likely contaminants, but I'd like to check against all known sequences.

10.1 years ago by igor 13k

0

If I were you, I would take a small subsample of the reads that don't map to human and blast them against nt to see what is going on.
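A rough sketch of the subsampling step, assuming the unmapped reads are already in a plain FASTQ file with exactly four lines per record. It uses reservoir sampling, so the file is read once and never held in memory. The function names and the toy reads are invented for illustration.

```python
import io
import random


def read_fastq(handle):
    """Yield (header, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def reservoir_sample(records, k, seed=0):
    """Return k records chosen uniformly at random in a single pass."""
    rng = random.Random(seed)
    sample = []
    for i, rec in enumerate(records):
        if i < k:
            sample.append(rec)
        else:
            # Keep the new record with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                sample[j] = rec
    return sample


def to_fasta(records):
    """Format records as FASTA text (drops the leading '@' and the qualities)."""
    return "".join(f">{header[1:]}\n{seq}\n" for header, seq, _ in records)


# Toy FASTQ standing in for the real file of non-human reads.
toy = io.StringIO("".join(f"@read{i}\nACGTACGT\n+\nIIIIIIII\n" for i in range(10)))
subset = reservoir_sample(read_fastq(toy), k=3)
print(to_fasta(subset), end="")
```

The resulting FASTA can then be used as the query for a BLAST search against nt.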

10.1 years ago by 5heikki 11k

0

See this thread: http://seqanswers.com/forums/showthread.php?t=60696

Hopefully you are not the same person as the originator of that thread.

10.1 years ago by GenoMax 152k

score 4 · Accepted Answer · 2016-05-12

I guess what I was really asking for is a metagenomic classifier. There are a few of those out there:

Kraken: https://ccb.jhu.edu/software/kraken/
GOTTCHA: http://lanl-bioinformatics.github.io/GOTTCHA/
CLARK: http://clark.cs.ucr.edu/
MetaPhlAn: http://huttenhower.sph.harvard.edu/metaphlan

Sure, these are not exactly the same as BLAST, but if you need to quickly classify a lot of reads, these tools will do that.