Question: Speed up BLASTp vs NCBI nr database
3.0 years ago by
biotech520
United States
biotech520 wrote:

I'm trying to add functional information to a fungus genome annotated with AUGUSTUS. Here is my command. It is taking forever, even though there is just one protein in 'example_1.faa'.

blastp -db /home/bernardo/Databases/BLAST/nr -query example_1.faa -out AWNI01.tab -evalue 0.000001 -outfmt "6 qseqid stitle pident length mismatch gapopen qstart qend sstart send evalue bitscore qcovs qcovhsp" -max_target_seqs 1 -num_threads 16 # note outfmt and max_target_seqs

I thought about switching to another database, maybe swissprot or refseq_protein, but then I might miss annotations.

Thanks

blast • 1.4k views
modified 3.0 years ago • written 3.0 years ago by biotech520

If this is a standalone machine, you have probably done all you can. Have you checked whether all of your CPU cores are 100% busy (top/htop etc.)? If they are not, that could indicate your system is I/O bound, and there is not much else you can do but be patient.

written 3.0 years ago by genomax67k

Do you have any benchmarking data for BLASTp? I have 6K proteins in my input file. Will I have to wait maybe two or three days?

written 3.0 years ago by biotech520

You said you have only one protein in the file. So this is just a test?

If you have 6K proteins to do, then you would want to run this somewhere else. Ideally on a cluster, with your input file split into 100-sequence chunks that are searched in parallel.

written 3.0 years ago by genomax67k
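The chunking suggested above could be sketched roughly as follows. This is only a minimal illustration, not what either poster actually ran: the function name `split_fasta`, the `chunk_NNNN.faa` naming, and the `chunks` directory are assumptions made for the example. It writes at most N sequences per output file, so each file can be searched as a separate job.

```shell
#!/usr/bin/env bash
# Split a protein FASTA file into chunks of at most N sequences each.
# split_fasta and the chunk_NNNN.faa naming are illustrative, not from the thread.
split_fasta() {
    local input="$1" per_chunk="$2" outdir="$3"
    mkdir -p "$outdir"
    awk -v n="$per_chunk" -v dir="$outdir" '
        /^>/ {
            # Open a new chunk file every n header lines.
            if (count % n == 0) {
                if (out != "") close(out)
                out = sprintf("%s/chunk_%04d.faa", dir, int(count / n) + 1)
            }
            count++
        }
        # Copy header and sequence lines into the current chunk.
        out != "" { print > out }
    ' "$input"
}

# Possible usage (needs BLAST+ and a local database, so it is left commented out):
# split_fasta proteins.faa 100 chunks
# for f in chunks/chunk_*.faa; do
#     blastp -db /home/bernardo/Databases/BLAST/nr -query "$f" \
#            -out "${f%.faa}.tab" -evalue 0.000001 -outfmt 6 \
#            -max_target_seqs 1 -num_threads 16
# done
```

On a cluster, each iteration of the loop would normally be submitted as its own job rather than run one after another; how to do that depends on the scheduler, which the thread does not name.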

Yes it's just a test. Thanks for the chunks idea.

modified 3.0 years ago • written 3.0 years ago by biotech520

I'm trying to annotate a fungus. Maybe extracting just the fungal sequences from the nr database would be an alternative.

written 3.0 years ago by biotech520

Did you download the complete nr database from the ftp site (most recent has 49 parts)? Did you do makeblastdb on all parts of the nr db (you should have gotten an error if you didn't)? This post could also help: Blast Help On Nucleotide Collection Nr/Nt

modified 3.0 years ago • written 3.0 years ago by st.ph.n2.4k

It's running and I get correct output. I downloaded the preformatted BLAST database for nr.

written 3.0 years ago by biotech520

I did this: nohup wget ftp://anonymous@ftp.ncbi.nih.gov/blast/db/nr.* > foo_wget3.out 2> foo_wget3.err < /dev/null &

written 3.0 years ago by biotech520

I will go with a custom database for now, using all proteins from sequenced genomes of the same fungal genus. It seems some full genomes are available. Just 26K, what a tiny thing!!

modified 3.0 years ago • written 3.0 years ago by biotech520
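For readers who want to try the custom-database route, one way it might look is sketched below. The thread does not show the commands that were actually used, so everything here is an assumption: the helper name `build_and_search`, the file names, and the `DRY_RUN` switch are hypothetical, and the genus-level protein FASTA is assumed to have been downloaded already. Only standard BLAST+ options (`makeblastdb -in/-dbtype/-out` and the `blastp` options from the original command) are used.

```shell
#!/usr/bin/env bash
# Build a small protein database from a FASTA file and search it with blastp.
# build_and_search and DRY_RUN are hypothetical names used only for this sketch.
build_and_search() {
    local genus_faa="$1" db_name="$2" query="$3" out="$4"
    local run=()
    # With DRY_RUN=1 the commands are printed instead of executed.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        run=(echo)
    fi
    # Format the genus-level protein FASTA as a BLAST protein database.
    "${run[@]}" makeblastdb -in "$genus_faa" -dbtype prot -out "$db_name"
    # Search the query proteins against the much smaller custom database.
    "${run[@]}" blastp -db "$db_name" -query "$query" -out "$out" \
        -evalue 0.000001 -outfmt 6 -max_target_seqs 1 -num_threads 16
}

# Possible usage (file names are placeholders):
# DRY_RUN=1 build_and_search genus_proteins.faa genus_db example_1.faa AWNI01.tab
```

The plain `-outfmt 6` is used here for brevity; the longer column list from the original command can be substituted unchanged.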

Great. It took just 20 min, exactly the time for a coffee.

written 3.0 years ago by biotech520