Question: Output blastx top hits as fasta of the original sequences
3.2 years ago, bonsalldavid wrote:

I ran a blastx search of 1,000,000,000 short sequences against a small database, which took a while:

blastall -p blastx -i infile.fa -d prot_blastdb -o outfile.txt -m 8 -S 3 -b 1 -e 0.001

I then filtered the original FASTA file to pull out the sequences that had a top hit. I can share my awk script for this if asked, but I'm wondering whether there is a way to get BLAST to write the matching FASTA sequences directly to a file as it finds them. It seems wasteful to make two passes over the same file.


Use the blastdbcmd utility (part of BLAST+) to retrieve the sequences you need. Put the accession numbers you are interested in into a file, one per line, and pass that file to blastdbcmd with the -entry_batch option. This is still a two-step process, but it is reasonably fast.
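
As a rough sketch of that two-step approach: the function names hit_ids and fetch_hits, the database name reads_db and the file names hit_ids.txt and hits.fa are placeholders invented for illustration, not taken from the thread. It assumes that the sequences wanted are the query reads, that they have been formatted into a nucleotide BLAST database with -parse_seqids so they can be looked up by ID, and that BLAST+ is on the PATH. Building such a database for a very large read set is itself a substantial step.

```shell
#!/bin/sh
# Sketch only: all names below are illustrative placeholders.

# Column 1 of tabular (-m 8) BLAST output is the query ID. A query can
# appear on several lines (one per HSP), so the list is de-duplicated.
hit_ids() {
    # $1 = tabular BLAST output
    cut -f1 "$1" | sort -u
}

# Retrieve the listed entries as FASTA from a BLAST database.
# Assumes the database was built with -parse_seqids.
fetch_hits() {
    # $1 = BLAST database name, $2 = ID list (one per line), $3 = output FASTA
    blastdbcmd -db "$1" -entry_batch "$2" -out "$3"
}

# Typical use (commented out so that sourcing this file has no side effects):
#   makeblastdb -in infile.fa -dbtype nucl -parse_seqids -out reads_db
#   hit_ids outfile.txt > hit_ids.txt
#   fetch_hits reads_db hit_ids.txt hits.fa
```

If the subject (protein) sequences are wanted instead, the same idea should apply with column 2 of the tabular output and the protein database.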

written 3.2 years ago by genomax

...just had a thought. There might be faster options for filtering (e.g. hashing?) if BLAST output a numerical index for each read. I could add one artificially by prefixing the read name, but I think that would be a last resort.
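
For what it is worth, a hash-based filter does not need a numerical index: awk arrays are hash tables keyed by string, so the read names themselves can serve as keys. The sketch below is one possible version, not the script mentioned in the question. The function name filter_hits is made up for illustration, and it assumes the hit IDs fit in memory and that each FASTA ID is the first word after ">".

```shell
#!/bin/sh
# filter_hits HITS READS.fa
# Print the FASTA records whose ID appears in column 1 of the tabular hits file.
filter_hits() {
    awk '
        # First file (hits): store each query ID as a hash key.
        FNR == NR { hit[$1]; next }
        # Second file (FASTA), header line: look up the ID in the hash.
        /^>/ { id = substr($1, 2); keep = (id in hit) }
        # Print header and sequence lines of records that had a hit.
        keep
    ' "$1" "$2"
}
```

With the file names from the question this would be called as filter_hits outfile.txt infile.fa > hits.fa. It still reads the FASTA file a second time, but each lookup is a constant-time hash probe rather than a search.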

written 3.2 years ago by bonsalldavid