Question: BLAST Database creation error
vigneshprbh37 (India) wrote 4.2 years ago:

Hi, for the past two days I have been trying to install and run a standalone BLAST, "ncbi-blast-2.2.30+", on a CentOS system.

I downloaded the nr reference sequences from the NCBI FTP site using the command

wget ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/nr.gz

and extracted the file using the command

 tar -xvpf nr.gz

which gave me an nr file of 33 GB.

But when I try to format the file using the command

./makeblastdb -in /home/Desktop/ncbi-blast-2.2.30+/db/nr -dbtype 'nucl' -input_type 'fasta' -out /home/Desktop/ncbi-blast-2.2.30+/output

I get the error

BLAST Database creation error: FASTA-Reader: No residues given

Can anyone suggest what is causing the problem and how I can solve it?

blast alignment software error • 7.6k views
ADD COMMENTlink modified 4.0 years ago by Galo60 • written 4.2 years ago by vigneshprbh3720

You need a 'z' in your tar command for gzip files: use -xvzf. The file is probably not unzipped correctly. (If nr.gz is a plain gzipped FASTA file rather than a tar archive, which is how NCBI distributes the files in the FASTA directory, gunzip nr.gz is the command to use.) Also, the nr database is protein ("Non-Redundant peptides"), so you will be creating a protein database. Think about whether that is what you want and why. I believe that if you ran makeblastdb correctly it would tell you that you are mixing up protein and nucleotides. If you want the whole nucleotide database, it is called 'nt' ("Non-Translated nucleotides"). I can't remember if those are the official acronyms or just what my brain uses to keep them the right way around.

ADD REPLYlink written 4.2 years ago by Daniel3.7k
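A minimal sketch of the decompression step, assuming nr.gz is a plain gzip-compressed FASTA file and not a tar archive. The file demo_nr.gz is a hypothetical stand-in so the commands run without the 33 GB download. It checks the archive, decompresses it, and confirms the result looks like FASTA:

```shell
# Build a tiny stand-in for nr.gz (hypothetical demo data, not real nr records)
printf '>seq1 demo protein\nMKVLAAGIVG\n>seq2 demo protein\nMSTNPKPQRK\n' > demo_nr
gzip -f demo_nr                       # produces demo_nr.gz

# Verify that the file is an intact gzip archive before decompressing
gzip -t demo_nr.gz && echo "gzip archive OK"

# Decompress; -c writes to stdout so the .gz file is kept
gunzip -c demo_nr.gz > demo_nr.fasta

# A FASTA file should start with '>' and contain sequence lines
head -n 2 demo_nr.fasta
grep -c '^>' demo_nr.fasta            # number of records
```

For the real download, the equivalent would be gzip -t nr.gz followed by gunzip nr.gz.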
Galo (Spain) wrote 4.0 years ago:

OK, I think I found the problem. When making a BLAST db or using any masking algorithm such as DUST, WindowMasker, etc., the program raises an error if it finds an empty record, that is, a record of the form:

>gi|xxxx|

>gi|yyyyy|
AGACCGATGACT

I'm sure there are many ways to remove empty records from a FASTA file. A simple and fast way is to use awk. You can copy and paste this command into the terminal (substituting the name of your file):

awk -v RS=">" -v FS="\n" -v ORS="" ' { if ($2) print ">"$0 } ' your_fasta_file.fna > output.fna

If you want an explanation of the code, follow this thread:

Removing All Empty Fasta Sequences From A File (Was: Editing The Headers Of The Fasta Format Sequence)

ADD COMMENTlink modified 4.0 years ago • written 4.0 years ago by Galo60
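To see what the awk filter does, here is a small sketch on a made-up file (demo_empty.fna and its records are hypothetical). It counts the records before and after filtering:

```shell
# Toy FASTA with two empty records, gi|1| and gi|3| (hypothetical data)
printf '>gi|1|\n\n>gi|2|\nAGACCGATGACT\n>gi|3|\n>gi|4|\nTTGACC\n' > demo_empty.fna

# Same filter as in the answer: records are split on '>', fields on newlines;
# a record is kept only if its first sequence line ($2) is non-empty
awk -v RS=">" -v FS="\n" -v ORS="" '{ if ($2) print ">"$0 }' demo_empty.fna > demo_clean.fna

echo "records before: $(grep -c '^>' demo_empty.fna)"
echo "records after:  $(grep -c '^>' demo_clean.fna)"
```

Note that the filter only tests the first line after each header, so a record whose sequence starts after a blank line would also be dropped.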

Worked for me, thanks!!

ADD REPLYlink written 2.9 years ago by biotech510

This occurs not only with empty records but also with completely masked ones BTW.

ADD REPLYlink written 7 months ago by gtrwst90
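A sketch for finding such records, under the assumption that "completely masked" means a sequence made up entirely of the hard-mask character (N for nucleotides, X for proteins); the comment does not spell this out. The file names are placeholders:

```shell
# Toy FASTA: gi|1| is fully masked, gi|2| is partly masked, gi|3| is empty
printf '>gi|1|\nNNNNNNNN\nNNNN\n>gi|2|\nACGTNNAC\n>gi|3|\n' > demo_masked.fna

# Join each record's sequence lines and report the headers whose
# sequence is empty or consists only of N/X (either case)
awk -v RS=">" -v FS="\n" 'NR > 1 {
    seq = ""
    for (i = 2; i <= NF; i++) seq = seq $i
    if (seq ~ /^[NnXx]*$/) print $1
}' demo_masked.fna > demo_masked_ids.txt

cat demo_masked_ids.txt
```

The resulting list of headers can then be used to decide which records to drop before building the database.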
5heikki (Finland) wrote 4.2 years ago:

1. nr is a protein db, so -dbtype would be prot (there's no need for the apostrophes)

2. Is the filename really just "nr" after gunzip?

3. Why are you downloading the huge fasta file instead of the prebuilt db?

ADD COMMENTlink written 4.2 years ago by 5heikki8.3k
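A rough way to double-check point 1 before running makeblastdb is to look at which letters the sequence lines contain. The helper below, guess_dbtype, is a hypothetical name and only a heuristic: it inspects the first 100 sequence lines and reports prot if it finds any character other than A, C, G, T, U or N, so nucleotide files that use other IUPAC ambiguity codes (or Windows line endings) would be misreported.

```shell
guess_dbtype() {
    # $1: path to a FASTA file; prints "prot" or "nucl"
    if grep -v '^>' "$1" | head -n 100 | grep -q '[^ACGTUNacgtun]'; then
        echo prot
    else
        echo nucl
    fi
}

# Hypothetical toy inputs
printf '>p1 demo\nMKVLAAGIVGLLQ\n' > demo_prot.faa
printf '>n1 demo\nACGTTGCANNAC\n' > demo_nucl.fna

guess_dbtype demo_prot.faa > demo_dbtype.txt
guess_dbtype demo_nucl.fna >> demo_dbtype.txt
cat demo_dbtype.txt
```

For the nr file this should point to a command along the lines of makeblastdb -in nr -dbtype prot -out nr_db, where nr_db is an illustrative output name. As for point 3, the prebuilt databases can be fetched with the update_blastdb.pl script that ships with BLAST+, for example update_blastdb.pl --decompress nr.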

2. Yes, it is a 33.6 GB file.

3. What is the difference with a prebuilt db? I don't know.
ADD REPLYlink written 4.2 years ago by vigneshprbh3720
Galo (Spain) wrote 4.0 years ago:

EDITED: I now realize this wasn't the same error, so it may not work for you. Tell us whether this works for you, or how you solved the problem =)

Hi, I got the same error both when making a BLAST db from FASTA files and when using a masking algorithm like DUST.

For me the problem was that some of my sequences had a blank line between the last line of nucleotides of one sequence and the header of the next, like:

>gi|xxxx|
ATGACCGT...
[[:BLANK:]]
>gi|yyyy|
ACGATCGG...

An easy way to remove those blank lines in UNIX is with grep:

grep -v '^$' fasta_with_blanks.fna > fasta_without_blanks.fna

Saludos!

ADD COMMENTlink modified 4.0 years ago • written 4.0 years ago by Galo60
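A slightly more tolerant variant of the grep command, offered as a sketch: it also drops lines that contain only spaces, tabs, or a carriage return (as in files saved with Windows line endings), which '^$' would not match. The file names are placeholders.

```shell
# Toy FASTA with one empty line and one whitespace-only line (hypothetical data)
printf '>gi|1|\nATGACCGT\n\n>gi|2|\nACGATCGG\n  \t\n>gi|3|\nTTGA\n' > demo_blanks.fna

# [[:space:]] covers spaces, tabs and carriage returns
grep -v '^[[:space:]]*$' demo_blanks.fna > demo_noblanks.fna

# Compare line counts before and after
wc -l demo_blanks.fna demo_noblanks.fna
```

This only removes lines that are entirely whitespace; it does not strip trailing carriage returns from header or sequence lines.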