Question: BLAST nr database
3 months ago, shalinikaushik1293 wrote:

I am running the following BLAST command:

blastp -query /Users/shalini/Desktop/shalini/project/unmodelled_fasta/AAA18895.fasta -outfmt "7 sacc qcovs pident ppos evalue" -db=/Users/shalini/Downloads/nr -out=/Users/shalini/Desktop/shalini/project/blast_resultnr

After running for an hour and a half, it fails with:

Error memory mapping:/Users/shalini/Downloads/nr.79.phr openedFilesCount=251 threadID=0
BLAST Database error: Cannot memory map /Users/shalini/Downloads/nr.79.phr. Number of files opened: 251

I am working on macOS. I also built the nr database myself, which produced 114 .phr, .pin, .psq and .pog files.

Please help if anyone knows what is going wrong.

modified 3 months ago • written 3 months ago by shalinikaushik1293

I am not sure, but this looks like a memory error. How much RAM is available on your system?

written 3 months ago by prince26121991

The Mac I am using has 16 GB of RAM.

written 3 months ago by shalinikaushik1293

The nr database currently contains 142 files. I am not sure why you built your own database instead of downloading the pre-built indexes. Are you using the latest BLAST+ software?

You also should not be using = in program options, e.g. -db=/Users/shalini/Downloads/nr -out=. I am surprised it seems to be working. Separate each option from its value with a space instead.
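As a minimal sketch of the space-separated form, the search from the question could be assembled as an argument list in Python. The helper name build_blastp_args is hypothetical; the paths and the output format string are the ones given in the question.

```python
import shlex


def build_blastp_args(query, db, out):
    """Return a blastp argument list that uses 'flag value' pairs, never 'flag=value'."""
    return [
        "blastp",
        "-query", query,
        "-db", db,    # basename of the BLAST database, e.g. /path/to/nr
        "-out", out,
        "-outfmt", "7 sacc qcovs pident ppos evalue",
    ]


if __name__ == "__main__":
    args = build_blastp_args(
        "/Users/shalini/Desktop/shalini/project/unmodelled_fasta/AAA18895.fasta",
        "/Users/shalini/Downloads/nr",
        "/Users/shalini/Desktop/shalini/project/blast_resultnr",
    )
    # Print the command as it would be typed in a shell, with quoting preserved.
    print(shlex.join(args))
```

To actually run the search, the list can be passed to subprocess.run(args, check=True), which requires BLAST+ to be on the PATH.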

Instead of using bold please use the code button to present your code/errors so they are readable. I've done it for you this time.

Thank you!

modified 3 months ago • written 3 months ago by genomax

Thank you for your reply and suggestion; I will use the code button from now on. Yes, the command line runs with the = sign as well. I didn't use the pre-built indexes because I could not find the complete nr database as a single download: it is divided into parts (nr.00.tar.gz, nr.01.tar.gz, etc.). So I found a link (ftp://ftp.ncbi.nih.gov/blast/db/FASTA/nr.gz) from which I got only the nr file, not its alias or index files. That's why I built the database using makeblastdb.

Yes, I am using the latest BLAST+ software.

written 3 months ago by shalinikaushik1293

I posted a link to the pre-built nr database in my last comment. You need to download all the nr files from there (no need to get the md5 checksum files) and then uncompress them into one directory. You may have an incomplete index (did you check that there were no errors when you built it?). 16 GB is probably not enough RAM for nr searches, but it may work if you have a large swap space defined.
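The "uncompress them into one directory" step, plus a quick check for an incomplete set of volumes, might be sketched as follows. The function names are hypothetical, and the file naming (nr.NN.tar.gz archives containing nr.NN.phr and related files) is assumed from the names mentioned in this thread.

```python
import re
import tarfile
from pathlib import Path


def extract_volumes(archive_dir, dest_dir, prefix="nr"):
    """Extract every <prefix>.NN.tar.gz found in archive_dir into dest_dir."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    archives = sorted(Path(archive_dir).glob(f"{prefix}.*.tar.gz"))
    for archive in archives:
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(dest)
    return [a.name for a in archives]


def missing_volumes(db_dir, prefix="nr", ext="phr"):
    """Return volume numbers absent between 0 and the highest volume present.

    This only detects gaps; it cannot tell whether volumes above the
    highest one found are missing.
    """
    pattern = re.compile(rf"^{re.escape(prefix)}\.(\d+)\.{re.escape(ext)}$")
    found = set()
    for path in Path(db_dir).iterdir():
        match = pattern.match(path.name)
        if match:
            found.add(int(match.group(1)))
    if not found:
        return []
    return [n for n in range(max(found) + 1) if n not in found]
```

For example, extract_volumes("/Users/shalini/Downloads", "/Users/shalini/Downloads") followed by missing_volumes("/Users/shalini/Downloads") would list any volume numbers that were skipped during download.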

Even if = is working please don't use that method.

modified 3 months ago • written 3 months ago by genomax

That means I need to use an external disc for nr database because in future, results will also need space. As I am using BLAST for ~7000 proteins. Yes, I am sure there were no errors when I built the index. Can you please tell me which of the nr database files (from the link you sent) contains the FASTA sequences? I have downloaded 5 or 6 of them, and untarring them gives only index files. We require FASTA file to put in the -db in the command of blastp.

written 3 months ago by shalinikaushik1293

"We require FASTA file to put in the -db in the command of blastp."

No. -db has to point to the basename of the BLAST index being used, not to a FASTA file. In this case the basename is nr.
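To illustrate what "basename" means here, the sketch below looks for the index files that sit next to a given basename. The function name describe_db is hypothetical; it relies on the usual protein database layout of an alias file (nr.pal) plus numbered volumes (nr.00.phr, nr.01.phr, ...), or a single nr.phr for a one-volume database.

```python
from pathlib import Path


def describe_db(basename):
    """Report which index files exist for a protein BLAST database basename.

    `basename` is the value passed to -db, e.g. /Users/shalini/Downloads/nr.
    It is neither a FASTA file nor an individual volume such as nr.00.phr.
    """
    base = Path(basename)
    volumes = sorted(base.parent.glob(f"{base.name}.*.phr"))
    return {
        "alias_file": base.with_name(base.name + ".pal").exists(),
        "single_volume": base.with_name(base.name + ".phr").exists(),
        "volume_count": len(volumes),
    }
```

If describe_db("/Users/shalini/Downloads/nr") reports no alias file, no single volume and zero volumes, the path given to -db is not a usable basename.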

"That means I need to use an external disc for nr database because in future, results will also need space. As I am using BLAST for ~7000 proteins."

Using an external spinning disk will slow everything down. With 7000 proteins you will want to use BLAST options smartly to get only the data you need. Have you considered using the -remote option to run the search remotely at NCBI? You could batch the proteins in groups of 10 or 15.
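The batching idea could look roughly like this: split a multi-FASTA file into groups of 10 records and build one -remote command per group. All function names and the batch file names are hypothetical; the output format string is the one from the question.

```python
def read_fasta(text):
    """Split FASTA text into a list of records, each starting with a '>' header."""
    records, current = [], []
    for line in text.splitlines():
        if line.startswith(">") and current:
            records.append("\n".join(current))
            current = []
        if line.strip():  # skip blank lines
            current.append(line)
    if current:
        records.append("\n".join(current))
    return records


def batch_records(records, size=10):
    """Group records into consecutive batches of at most `size` records."""
    return [records[i:i + size] for i in range(0, len(records), size)]


def remote_blastp_args(query, out, db="nr"):
    """Argument list for a search run at NCBI; no local copy of nr is needed."""
    return [
        "blastp", "-remote",
        "-db", db,
        "-query", query,
        "-out", out,
        "-outfmt", "7 sacc qcovs pident ppos evalue",
    ]
```

Each batch would be written to its own file (for example batch_000.fasta) and submitted with subprocess.run(remote_blastp_args("batch_000.fasta", "batch_000.tsv"), check=True), one batch at a time.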

modified 3 months ago • written 3 months ago by genomax

You will have to download ftp://ftp.ncbi.nih.gov/blast/db/nr.00.tar.gz, ftp://ftp.ncbi.nih.gov/blast/db/nr.01.tar.gz, ftp://ftp.ncbi.nih.gov/blast/db/nr.02.tar.gz, ... ftp://ftp.ncbi.nih.gov/blast/db/nr.142.tar.gz, i.e. all of these files. Extract all of them and then run your command, giving -db the path of the directory where you extracted these 143 pre-built BLAST database volumes, followed by the basename. For instance, if you extract the archives into the /Users/shalini/Downloads/ directory, then write "-db /Users/shalini/Downloads/nr" on the command line.

Coming to your main issue, however: I don't think you will be able to run BLAST with your current system configuration, as it has only 16 GB of RAM and BLAST requires at least 32-64 GB.

If you really want to do it on your current system, you can download the FASTA file for the nr database from the link genomax provided, split it into small subsets, build a BLAST database for each subset, and then run BLAST against each subset. One issue you will face with this approach is that your E-values will be strongly affected, because the search space is reduced when you search only a subset of the FASTA file.
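One way to reason about the E-value issue, sketched under the assumption that E-values scale roughly linearly with database length (this ignores BLAST's length corrections, so treat it as an approximation). The function names are hypothetical. blastp also has a -dbsize option that sets the effective database length used for the statistics, which the last helper passes so that hits from different subsets are reported on the same scale.

```python
def count_residues(fasta_text):
    """Total sequence length in FASTA text, ignoring '>' header lines."""
    return sum(
        len(line.strip())
        for line in fasta_text.splitlines()
        if not line.startswith(">")
    )


def rescale_evalue(evalue, subset_residues, full_residues):
    """Approximate the E-value the same hit would have against the full database.

    Assumption: E-value is proportional to database length.
    """
    return evalue * full_residues / subset_residues


def subset_blastp_args(query, subset_db, out, full_residues):
    """Search one subset while reporting E-values as if the database had full_residues residues."""
    return [
        "blastp",
        "-query", query,
        "-db", subset_db,
        "-out", out,
        "-dbsize", str(full_residues),
        "-outfmt", "7 sacc qcovs pident ppos evalue",
    ]
```

Here full_residues would be the residue count of the whole nr FASTA file, computed once with count_residues or taken from the database statistics.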

written 3 months ago by prince26121991

Thank you @prince26121991 for the help.

written 12 weeks ago by shalinikaushik1293