Speed up BLASTp vs NCBI nr database
6.2 years ago
biotech ▴ 560

I'm trying to add functional information to a fungus genome annotated with AUGUSTUS. Here is my command. It is taking forever, even though 'example_1.faa' contains just one protein.

blastp -db /home/bernardo/Databases/BLAST/nr -query example_1.faa -out AWNI01.tab -evalue 0.000001 -outfmt "6 qseqid stitle pident length mismatch gapopen qstart qend sstart send evalue bitscore qcovs qcovhsp" -max_target_seqs 1 -num_threads 16 # note outfmt and max_target_seqs

I thought about switching to another database, maybe swissprot or refseq_protein, but then I might miss annotations.

Thanks

blast • 3.3k views

If this is a standalone machine, you may have done all you can. Have you checked whether all of your CPU cores are 100% busy (top/htop etc.)? If they are not, that could indicate that your system is I/O bound, and there is not much else you can do but be patient.
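
As a rough illustration of the check above, here is a minimal sketch that compares two samples of the aggregate `cpu` line from `/proc/stat` and reports how much of the interval was spent busy versus waiting on I/O. It assumes a Linux system with the usual field order (user, nice, system, idle, iowait, irq, softirq, steal). The function name `cpu_usage` is made up for this example, and `top`/`htop` show the same information interactively.

```shell
#!/usr/bin/env bash
# Report busy% and iowait% between two aggregate "cpu" lines from /proc/stat.
# Assumption: Linux field order is user nice system idle iowait irq softirq steal.
cpu_usage() {
    # $1 = first sample line, $2 = second sample line
    printf '%s\n%s\n' "$1" "$2" | awk '
        NR == 1 { for (i = 2; i <= 9; i++) first[i] = $i }
        NR == 2 {
            total = 0
            for (i = 2; i <= 9; i++) { delta[i] = $i - first[i]; total += delta[i] }
            if (total == 0) { print "busy=0 iowait=0"; exit }
            idle = delta[5]; iowait = delta[6]
            printf "busy=%d iowait=%d\n", 100 * (total - idle - iowait) / total, 100 * iowait / total
        }'
}

# Usage on a live system: take two samples one second apart.
#   a=$(head -n 1 /proc/stat); sleep 1; b=$(head -n 1 /proc/stat)
#   cpu_usage "$a" "$b"
```

A low busy value together with a high iowait value while blastp is running would be consistent with the I/O-bound situation described above.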


Do you have any benchmarking data for BLASTp? I have 6K proteins in my input file. Will I have to wait maybe two or three days?


You said you have only one protein in the file. So this is just a test?

If you have 6K proteins to process, you would want to do this somewhere else: ideally on a cluster, with your input file split into 100-sequence chunks that are run in parallel.
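
One possible way to do the chunking, as a sketch only: an awk-based helper that writes every N sequences to a new file. The function name `split_fasta`, the output naming scheme and the file names in the usage comment are assumptions for illustration, not something from this thread.

```shell
#!/usr/bin/env bash
# Split a FASTA file into chunks of at most N sequences each.
# Output files are named <prefix>_0001.faa, <prefix>_0002.faa, ...
split_fasta() {
    local input=$1 per_chunk=$2 prefix=$3
    awk -v n="$per_chunk" -v prefix="$prefix" '
        /^>/ {
            # Start a new chunk file every n header lines.
            if (count % n == 0) {
                if (out) close(out)
                out = sprintf("%s_%04d.faa", prefix, ++chunk)
            }
            count++
        }
        # Write header and sequence lines to the current chunk file.
        out { print > out }
    ' "$input"
}

# Hypothetical usage (proteins.faa and the chunk prefix are placeholders):
#   split_fasta proteins.faa 100 chunk
#   for f in chunk_*.faa; do
#       blastp -db /home/bernardo/Databases/BLAST/nr -query "$f" \
#              -out "${f%.faa}.tab" -evalue 0.000001 -outfmt 6 -num_threads 16
#   done
```

The loop in the comment runs the chunks one after another. On a cluster, each chunk would instead be submitted as its own job, and how to do that depends on the scheduler, so it is not shown here.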


Yes, it's just a test. Thanks for the chunks idea.


I'm trying to annotate a fungus. Maybe extracting just the fungal sequences from the nr database would be an alternative.


Did you download the complete nr database from the FTP site (the most recent has 49 parts)? Did you run makeblastdb on all parts of the nr db (you should have gotten an error if you didn't)? This post could also help: Blast Help On Nucleotide Collection Nr/Nt


It's running, and I get correct output. I downloaded the preformatted BLAST database for nr.


I did this:

nohup wget ftp://anonymous@ftp.ncbi.nih.gov/blast/db/nr.* > foo_wget3.out 2> foo_wget3.err < /dev/null &


I will go with a custom database for now, using all proteins from sequenced genomes of the same fungal genus. It seems some full genomes are available. Just 26K, what a tiny thing!!
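
For anyone wanting to try the same approach, a minimal sketch follows. It assumes the candidate proteins are already in one FASTA file whose header lines mention the organism name, and that filtering on that text is good enough. The function name `filter_fasta_by_header`, the file names and the genus placeholder are made up for illustration. `makeblastdb -in ... -dbtype prot -out ...` is the standard BLAST+ call for building a protein database.

```shell
#!/usr/bin/env bash
# Keep only FASTA records whose header line contains the given literal text.
filter_fasta_by_header() {
    local input=$1 text=$2
    awk -v text="$text" '
        # On each header line, decide whether to keep the record that follows.
        /^>/ { keep = (index($0, text) > 0) }
        keep
    ' "$input"
}

# Hypothetical usage (all_proteins.faa and "Genusname" are placeholders):
#   filter_fasta_by_header all_proteins.faa "Genusname" > genus.faa
#   makeblastdb -in genus.faa -dbtype prot -out genus_db
#   blastp -db genus_db -query example_1.faa -out AWNI01.tab -evalue 0.000001 -outfmt 6
```

A database this small is searched much faster than nr, at the cost of missing hits to proteins outside the chosen genus.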


Great. It took just 20 min, exactly the time for a coffee.
