Hi,
I am trying to blast a FASTA file of protein sequences against the non-redundant database on an HPC. I run the following command:
cat prot/split_fasta/master.dataframe.tide-tandem.protein.part_001.fa | parallel --GNU --block 100k --recstart '>' --pipe '/home/users/nus/e0470749/ncbi-blast-2.8.1+/bin/blastp -query - -db nr -outfmt "6 std slen qlen stitle staxids sscinames" -max_target_seqs 500 -num_threads 12 -evalue 0.001' > seps_nr_out_001.txt
However, the job gets terminated with Exit status: 1. Based on previous posts with the same error, I thought this was a memory issue, so I split my original FASTA file (10,000 sequences) into smaller parts. The current file contains ~100 sequences. I also ran the job with 1 TB of memory, which seems to be sufficient based on the usage report:
Resource Usage on 2021-11-08 11:52:18.892810:
JobId: 6845745.wlm01
Project: personal
Exit Status: 1
NCPUs Requested: 12 NCPUs Used: 12
CPU Time Used: 11:50:04
Memory Requested: 1tb Memory Used: 159785036kb
Vmem Used: 266577592kb
Walltime requested: 12:00:00 Walltime Used: 01:39:10
Execution Nodes Used: (lmn2609:mem=1073741824kb:ncpus=12)
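To make the memory numbers in the report easier to compare, here is a small sketch that converts the kb figures to GiB. It assumes the scheduler's "kb" means KiB, which is consistent with 1tb being reported as 1073741824kb above; the function name is just for illustration.

```python
# Convert the kb values from the usage report to GiB.
# Assumption: the scheduler's "kb" is KiB (1tb == 1073741824kb in the report).
KIB_PER_GIB = 1024 ** 2


def kb_to_gib(kb):
    """Return a size given in KiB as GiB."""
    return kb / KIB_PER_GIB


# Values copied from the usage report.
usage_kb = {
    "Memory Requested": 1073741824,
    "Memory Used": 159785036,
    "Vmem Used": 266577592,
}

for label, kb in usage_kb.items():
    print(f"{label}: {kb_to_gib(kb):.1f} GiB")

fraction_used = usage_kb["Memory Used"] / usage_kb["Memory Requested"]
print(f"Fraction of requested memory used: {fraction_used:.1%}")
```

Peak memory use is roughly 152 GiB of the 1024 GiB requested (about 15%), so the report does not point to the job running out of memory.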
The BLAST database also seems to be normal. Running
~/ncbi-blast-2.8.1+/bin/blastdbcmd -info -db blastdb/nr
gives:
Database: All non-redundant GenBank CDS translations+PDB+SwissProt+PIR+PRF excluding environmental samples from WGS projects 436,338,278 sequences; 161,860,501,762 total residues
Is there anything else I can try to solve this? The only other thing I can think of is to downgrade BLAST.
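An exit status of 1 on its own says little, so one thing that might help is running a single small query outside of parallel and keeping whatever the program writes to stderr. The following is only a sketch: `run_and_capture` is a hypothetical helper, and the demo uses a stand-in command that fails on purpose rather than blastp itself.

```python
import subprocess
import sys


def run_and_capture(cmd):
    """Run cmd (a list of arguments) and return (exit_status, stderr_text)."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.returncode, proc.stderr.strip()


# Stand-in command that writes to stderr and exits with status 1.
# For a real check, replace it with the blastp call from the post, e.g.
# ["/home/users/nus/e0470749/ncbi-blast-2.8.1+/bin/blastp",
#  "-query", "small.fa", "-db", "nr", "-evalue", "0.001"]
# where "small.fa" is a hypothetical file holding a few sequences.
demo_cmd = [
    sys.executable,
    "-c",
    "import sys; sys.stderr.write('boom\\n'); sys.exit(1)",
]

status, message = run_and_capture(demo_cmd)
print(f"exit status: {status}")
print(f"stderr: {message}")
```

If blastp itself prints an error for one small input, that message is usually more informative than the scheduler's exit status.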
You are using an older version of blast+ which may be incompatible with the current nr (I assume you downloaded the pre-formatted indexes, which are now v.5). You can update your blast package to the latest version and see if that helps.
Can you show us what your fasta headers look like?
Thanks for your help. Yes, I am using the pre-formatted nr database (v. 5). I am using a slightly older BLAST+ (v. 2.8.1) because the HPC server I am working on has an outdated GLIBC. When I use the latest BLAST by running ./ncbi-blast-2.12.0+/bin/blast+, I get this error:
I will try using an older nr database (v. 4) in this case.
These are the first few lines of my FASTA file:
Unless you create a new version of the v.4 indexes yourself, the old nr database version that you can download from NCBI is frozen as of Feb 2020. Keep that in mind.
You could try a small subset of the fasta you have and see if you get that error. If you do, then you may want to remove (+1) etc. from the fasta headers and see if that eliminates the error.
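A rough sketch of that test in Python, in case it is useful: take the first few records and strip tags like (+1) from the headers. The actual headers were not shown, so the example records and the pattern (a sign plus optional digits in parentheses) are assumptions, and the function names are made up for illustration.

```python
import re
from itertools import islice

# Assumption: the tags to remove look like "(+1)", "(-2)" or "(-)".
STRAND_TAG = re.compile(r"\([+-]\d*\)")


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, seq = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        elif header is not None and line:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def clean_header(header):
    """Remove strand/frame tags and collapse leftover whitespace."""
    return " ".join(STRAND_TAG.sub("", header).split())


def subset_and_clean(lines, n=10):
    """Return the first n records as FASTA text with cleaned headers."""
    records = islice(read_fasta(lines), n)
    return "".join(f">{clean_header(h)}\n{s}\n" for h, s in records)


# Hypothetical records, only to show the behaviour.
example = [
    ">seq1(+1) sample\n", "MKV\n", "LLA\n",
    ">seq2(-2)\n", "MTT\n",
    ">seq3\n", "MAA\n",
]
print(subset_and_clean(example, n=2), end="")
```

With a real file, pass an open file handle instead of the list, write the result to a new small FASTA, and run blastp on that to see whether the error goes away.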