BLAST Database creation error
9.3 years ago

Hey, for the past two days I have been trying to install and run standalone BLAST (ncbi-blast-2.2.30+) on a CentOS system. I downloaded the nr reference sequences from the NCBI FTP site with the command wget ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/nr.gz and extracted the file with the command tar -xvpf nr.gz. I got an nr file of 33 GB, but when I try to format the file using the command

./makeblastdb -in /home/Desktop/ncbi-blast-2.2.30+/db/nr -dbtype 'nucl' -input_type 'fasta' -out /home/Desktop/ncbi-blast-2.2.30+/output

I get the error:

BLAST Database creation error: FASTA-Reader: No residues given

Can anyone give any suggestions on the nature of the problem and how I can solve it?

alignment software-error blast • 13k views

For gzip-compressed tar archives you need a z in your tar command (-xvzf). Note, though, that nr.gz from the NCBI FTP site is a plain gzip-compressed FASTA file rather than a tar archive, so gunzip nr.gz is the appropriate way to decompress it. The file is probably not unpacked correctly. Also, the nr database is protein ("Non-Redundant peptides"), so you will be creating a protein database. Think about whether that is what you want and why. I believe that if makeblastdb had read the file correctly, it would have told you that you are mixing up protein and nucleotides. If you want the whole nucleotide database, it is called nt ("Non-Translated nucleotides"). I can't remember whether these are the official expansions or just what my brain uses to keep them the right way around.
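A minimal sketch of the decompression step described above, assuming the download is a plain gzip file. The function name decompress_fasta and the file names are hypothetical, chosen only for illustration. It checks that the input really is gzip data and that the result starts with a FASTA header before anything is handed to makeblastdb.

```shell
# Hypothetical helper: decompress a gzip-compressed FASTA file and sanity-check it.
decompress_fasta() {
    gz_file=$1
    out_file=$2

    # gzip -t tests the archive and fails if the input is not valid gzip data.
    gzip -t "$gz_file" 2>/dev/null || {
        echo "not a valid gzip file: $gz_file" >&2
        return 1
    }

    # Decompress to the requested output path, leaving the original in place.
    gzip -dc "$gz_file" > "$out_file" || return 1

    # A FASTA file should begin with a '>' header line.
    first_char=$(head -c 1 "$out_file")
    if [ "$first_char" != ">" ]; then
        echo "output does not look like FASTA: $out_file" >&2
        return 1
    fi
}

# Example use (paths are assumptions; nr is protein, hence -dbtype prot):
#   decompress_fasta nr.gz nr && makeblastdb -in nr -dbtype prot -out nr_db
```

If the check fails, the problem lies in the download or decompression step rather than in makeblastdb itself.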

9.1 years ago
Galo ▴ 70

OK, I think I found the problem. When making a BLAST database, or when using a masking tool such as DUST or WindowMasker, the program raises an error if it finds an empty record. That is, a record of the form:

>gi|xxxx|

>gi|yyyyy|
AGACCGATGACT

I'm sure there are many ways to remove empty records from a FASTA file. A simple and fast way is to use awk. You can copy and paste this command into the terminal (substituting the name of your file):

awk -v RS=">" -v FS="\n" -v ORS="" ' { if ($2) print ">"$0 } ' your_fasta_file.fna > output.fna

For an explanation of the code, see this thread:

Removing All Empty Fasta Sequences From A File (Was: Editing The Headers Of The Fasta Format Sequence)
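Before deleting anything, it can help to see which records are empty. The following is a rough sketch, not part of the original answer: list_empty_records is a hypothetical function name, and it assumes a record counts as empty when no non-whitespace characters follow its header.

```shell
# Hypothetical helper: print the header of every FASTA record that has no residues.
list_empty_records() {
    awk '
        /^>/ {
            # A new header closes the previous record; report it if it was empty.
            if (header != "" && residues == 0) print header
            header = $0
            residues = 0
            next
        }
        {
            # Ignore spaces, tabs and carriage returns when counting residues.
            gsub(/[ \t\r]/, "")
            residues += length($0)
        }
        END {
            # The last record has no following header, so check it here.
            if (header != "" && residues == 0) print header
        }
    ' "$1"
}

# Example use (file name is an assumption):
#   list_empty_records your_fasta_file.fna
```

If this prints nothing, empty records are probably not the cause of the error.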


Worked for me, thanks!!


This occurs not only with empty records but also with completely masked ones BTW.

9.3 years ago
5heikki 11k
  1. nr is a protein db, so -dbtype would be prot (there's no need for the apostrophes)
  2. Is the filename really just "nr" after gunzip?
  3. Why are you downloading the huge fasta file instead of the prebuilt db?
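Regarding point 1, a quick way to double-check which -dbtype fits a file is to look at its residues. This is only a heuristic sketch: guess_dbtype is a hypothetical name, and both the 1000-line sample and the 90% threshold are arbitrary assumptions rather than anything BLAST itself uses.

```shell
# Hypothetical helper: guess the -dbtype for a FASTA file from its first lines.
guess_dbtype() {
    # Sample only the start of the file so this stays fast on very large inputs.
    head -n 1000 "$1" | awk '
        /^>/ { next }                           # skip header lines
        {
            seq = toupper($0)
            total += length(seq)
            # gsub returns how many nucleotide-like characters it matched.
            nucl += gsub(/[ACGTUN]/, "", seq)
        }
        END {
            if (total == 0) { print "unknown"; exit }
            # Mostly A/C/G/T/U/N suggests nucleotides; otherwise assume protein.
            if (nucl / total > 0.9) print "nucl"; else print "prot"
        }
    '
}

# Example use (paths are assumptions):
#   makeblastdb -in nr -dbtype "$(guess_dbtype nr)" -out nr_db
```

For nr this should report prot, which matches the point that nr is a protein database.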

2. Yes, it is a 33.6 GB file.

3. I don't know what the difference is with a prebuilt db.

9.1 years ago
Galo ▴ 70

EDITED: I now realize it wasn't the same error, so this may not work for you. Let us know whether it works, or how you solved the problem =)

Hi, I got the same error both when making a BLAST database from FASTA files and when using a masking algorithm like DUST.

For me, the problem was that some of my sequences had a blank line between the last line of nucleotides of one sequence and the header of the next one, like this:

>gi|xxxx|
ATGACCGT...
[[:BLANK:]]
>gi|yyyy|
ACGATCGG...

An easy way to remove those blank lines on UNIX is with grep:

grep -v '^$' fasta_with_blanks.fna > fasta_without_blanks.fna
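The pattern '^$' only matches lines that are completely empty. A slightly broader variant, sketched here under the assumption that lines containing only spaces, tabs, or a Windows carriage return should also be dropped, could look like this (strip_blank_lines is a hypothetical name):

```shell
# Hypothetical helper: drop empty and whitespace-only lines from a FASTA file.
strip_blank_lines() {
    # [[:space:]] covers spaces, tabs and carriage returns, so lines that are
    # blank apart from such characters are removed as well.
    grep -v '^[[:space:]]*$' "$1"
}

# Example use (file names are assumptions):
#   strip_blank_lines fasta_with_blanks.fna > fasta_without_blanks.fna
```

Note that grep exits with a non-zero status when no lines are left to print, for example on an empty input file.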

Saludos!
