BLAST nr database

I am running the following BLAST command:

blastp -query /Users/shalini/Desktop/shalini/project/unmodelled_fasta/AAA18895.fasta -outfmt "7 sacc qcovs pident ppos evalue" -db=/Users/shalini/Downloads/nr -out=/Users/shalini/Desktop/shalini/project/blast_resultnr

After running for about an hour and a half, it fails with:

Error memory mapping:/Users/shalini/Downloads/nr.79.phr openedFilesCount=251 threadID=0
BLAST Database error: Cannot memory map /Users/shalini/Downloads/nr.79.phr. Number of files opened: 251

I am working on macOS. I built the nr database myself, which produced 114 .phr, .pin, .psq and .pog files.

Please help if anyone knows what is going wrong.


I am not sure, but this looks like a memory error. Please mention how much RAM is available on your system.


The macOS machine I am using has 16 GB of RAM.


The nr database currently contains 142 files. I am not sure why you built your own database instead of downloading the pre-built indexes from ftp://ftp.ncbi.nih.gov/blast/db/. Are you using the latest BLAST+ software?

You also should not be using = in program options; BLAST+ option values go after a space. I am surprised it seems to be working at all, e.g. -db=/Users/shalini/Downloads/nr -out=
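For reference, the same command written with space-separated option values (paths taken from your post) would look like this:

blastp -query /Users/shalini/Desktop/shalini/project/unmodelled_fasta/AAA18895.fasta \
    -db /Users/shalini/Downloads/nr \
    -outfmt "7 sacc qcovs pident ppos evalue" \
    -out /Users/shalini/Desktop/shalini/project/blast_resultnr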

Instead of using bold, please use the code button to present your code/errors so they are readable. I've done it for you this time.

Thank you!


Thank you for your reply and the suggestion; I will use the code button next time. Yes, the command also runs with the = sign. I didn't use the pre-built indexes because I could not find the complete nr database: it is divided into parts (nr.00.tar.gz, nr.01.tar.gz, etc.). So I found a link (ftp://ftp.ncbi.nih.gov/blast/db/FASTA/nr.gz) from which I only got the nr FASTA file, not its alias or index files. That's why I built the database with makeblastdb.

Yes, I am using the latest BLAST+ software.
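For context, the build step was roughly the following (a sketch; the exact makeblastdb options I used may have differed):

# decompress the FASTA downloaded from ftp://ftp.ncbi.nih.gov/blast/db/FASTA/nr.gz
gunzip nr.gz

# build a protein BLAST database from it; this is the step that produced the 114 volume files mentioned above
makeblastdb -in nr -dbtype prot -out /Users/shalini/Downloads/nr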


I posted a link to the pre-built nr database in my last comment. You need to download all of the nr files from there (no need to get the md5 checksum files) and then uncompress them in one directory. You may have an incomplete index (did you check that there were no errors when you built it?). 16 GB is probably not enough RAM for nr searches, but if you have a large swap space defined it may work.
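A sketch of one way to do that, assuming wget is available (macOS ships curl by default, so curl -O works the same way) and using a hypothetical /Users/shalini/Downloads/nr_db target directory; adjust 142 to the current highest volume number:

mkdir -p /Users/shalini/Downloads/nr_db && cd /Users/shalini/Downloads/nr_db

# fetch every pre-built nr volume (skip the .md5 files)
for i in $(seq 0 142); do
    n=$(printf "%02d" "$i")
    wget "ftp://ftp.ncbi.nih.gov/blast/db/nr.${n}.tar.gz"
done

# unpack all volumes into this one directory
for f in nr.*.tar.gz; do
    tar -xzf "$f"
done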

Even if = works, please don't use that syntax.


That means I will need an external disk for the nr database, since the results will also need space later on; I am running BLAST for ~7000 proteins. Yes, I am sure there were no errors when I built the index. Can you please tell me which of the nr database files (from the link you sent) contains the FASTA file? I have downloaded 5 or 6 of them, and untarring them gives only index files. We need the FASTA file to pass to -db in the blastp command.


"We need the FASTA file to pass to -db in the blastp command."

No. -db has to point to the basename of the BLAST index being used; in this case that is nr.
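For example, once the pre-built volumes are extracted into one directory (the paths below are illustrative), the command just references that shared basename:

# the directory contains nr.00.phr, nr.00.pin, ..., plus the top-level nr alias (.pal) file;
# -db points at the shared basename "nr", not at a FASTA file
blastp -query AAA18895.fasta \
    -db /Users/shalini/Downloads/nr_db/nr \
    -outfmt "7 sacc qcovs pident ppos evalue" \
    -out blast_resultnr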

"That means I will need an external disk for the nr database, since the results will also need space later on; I am running BLAST for ~7000 proteins."

Using an external spinning disk will slow everything down. With 7000 proteins you will want to use the BLAST options smartly so you only get the data you need. Have you considered using the -remote option to run the searches remotely at NCBI? You could batch the proteins in groups of 10 or 15.
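A rough sketch of that remote approach, assuming the ~7000 queries have already been split into small FASTA batches (the batches/ directory and file names here are made up for illustration):

# run each small batch against nr at NCBI rather than a local copy;
# -remote sends the search to NCBI's servers, so no local indexes or large RAM are needed
for batch in batches/batch_*.fasta; do
    blastp -query "$batch" \
        -db nr \
        -remote \
        -outfmt "7 sacc qcovs pident ppos evalue" \
        -out "${batch%.fasta}.blast.txt"
    sleep 30   # pause between submissions to be polite to NCBI
done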


You will have to download ftp://ftp.ncbi.nih.gov/blast/db/nr.00.tar.gz, ftp://ftp.ncbi.nih.gov/blast/db/nr.01.tar.gz, ftp://ftp.ncbi.nih.gov/blast/db/nr.02.tar.gz ... ftp://ftp.ncbi.nih.gov/blast/db/nr.142.tar.gz, i.e. all of these files. Extract all of them and then run your command, providing the path of the directory where you extracted the pre-built BLAST database volumes along with the basename. For instance, if you extract the gz files into the /Users/shalini/Downloads/ directory, then on the command line write -db /Users/shalini/Downloads/nr.

Coming to your main issue, however: I don't think you will be able to run BLAST with your current system configuration, as it has only 16 GB of RAM and a BLAST search against nr needs at least 32-64 GB. If you really want to do it on your current system, you can download the FASTA file for the nr database from the link genomax provided, make small subsets of that FASTA file, build a BLAST database for each subset, and then run BLAST against each subset in turn. The small issue you will face with this approach is that your E-values will be affected, because the search space is reduced when you search a subset of the FASTA file.
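If you do try the subset route, here is a rough sketch of splitting the decompressed FASTA (named nr here) into chunks and building one small database per chunk; the chunk size and file names are arbitrary, and remember the E-value caveat above:

# write a new chunk file every 500000 sequences (chunk size is arbitrary)
awk -v n=500000 '/^>/ { if (c % n == 0) { if (out != "") close(out); out = sprintf("nr_chunk_%03d.fasta", ++f) } c++ } { print > out }' nr

# build a small protein BLAST database for each chunk
for chunk in nr_chunk_*.fasta; do
    makeblastdb -in "$chunk" -dbtype prot -out "${chunk%.fasta}"
done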


Thank you @prince26121991 for the help.

dxh0222

That means your open-files limit is 256. You can raise this limit in the terminal with ulimit -n 9999, and then check the new limit with ulimit -n.
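For example (a minimal sketch; the new limit only applies to the current shell session, so run blastp from that same session):

# check the current per-process open-file limit (macOS defaults to 256)
ulimit -n

# raise it for this shell session, then re-run blastp from the same session
ulimit -n 9999
ulimit -n        # should now report 9999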


This was the cause of the error we were seeing:

Error memory mapping:/combined_blast_indexes/nu/lb.g.1.nsi openedFilesCount=1019 threadID=0
BLAST Database error: Error pre-fetching sequence data

After changing ulimit -n 9999, the BLAST parser works fine. I spent a long time trying to trace this one down. Thank you for pointing it out!
