Question

makeblastdb options error

7.5 years ago

StullenBrot • 0

Hello,

I wanted to generate a blast db from my fasta file and got an error using the makeblastdb tool. When executing the tool with the following command:

C:\WINDOWS\system32>makeblastdb -in C:\\Users\\Dennis\\Desktop\\db\\db.fasta -dbtype prot -out C:\\Users\\Dennis\\db\\output

I receive:

Building a new DB, current time: 11/03/2016 13:36:06
New DB name:   C:\\Users\\Dennis\\db\\output
New DB title:  C:\\Users\\Dennis\\Desktop\\db\\db.fasta
Sequence type: Protein
Keep Linkouts: T
Keep MBits: T
Maximum file size: 1000000000B
BLAST options error: C:\\Users\\Dennis\\Desktop\\db\\db.fasta does not match input format type, default input type is FASTA

The input file is in FASTA format, though, and looks like this:

>sp|P62258|1433E_HUMAN 14-3-3 protein epsilon OS=Homo sapiens GN=YWHAE PE=1 SV=1
MDDREDLVYQAKLAEQAERYDEMVESMKKVAGMDVELTV.....
>sp|P62258_REVERSED|1433E_HUMAN 14-3-3 protein epsilon OS=Homo sapiens GN=YWHAE PE=1 SV=1-REVERSED
QNEDEVDQLAEKNQEEGDGQMDSTWLTLNDRLLQMILTSD....

(so one line with protein info after ">" and the next line with the protein sequence)

I don't really know what the problem is, so I am asking for your help.

Thank you very much.

Dennis

blast • 2.8k views

ADD COMMENT • link updated 7.5 years ago by Devon Ryan 104k • written 7.5 years ago by StullenBrot • 0

After @Devon reformatted your post the FASTA sequence looks OK, but are you sure the entries in your file look exactly like they do now (each ">" starting on a new line)? What happens if you take just a couple of sequences and try to make a test database?

ADD REPLY • link 7.5 years ago by GenoMax 141k
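For the small test suggested above, here is a minimal sketch that copies only the first few records out of a FASTA file. The function name `take_first_records` and the sample data are made up for illustration; it assumes every record starts with ">" at the beginning of a line.

```python
import io

def take_first_records(lines, n):
    """Yield the lines that belong to the first n FASTA records."""
    seen = 0
    for line in lines:
        if line.startswith(">"):
            seen += 1
            if seen > n:
                break
        if seen:  # ignore anything that appears before the first header
            yield line

# Tiny in-memory example; a real run would iterate over open("db.fasta").
sample = io.StringIO(">a\nMDD\nREV\n>b\nQNE\n>c\nAAA\n")
subset = "".join(take_first_records(sample, 2))
print(subset)
```

Writing `subset` to a small file and running makeblastdb on it with the same `-in`, `-dbtype prot` and `-out` options as in the question should show whether the tool itself works on well-formed input.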

When I open the file with Notepad++ they are separated onto different lines.

Surprisingly, a shorter .fasta file of about 24 sequences worked. So I guess the file size is the reason for the failure?

Thank you for your help!

Edit: File size is ~50 MB

ADD REPLY • link 7.5 years ago by StullenBrot • 0

I don't think file size is the problem. I have a feeling that you have one or more sequences whose header does not start with ">" on a new line. Since you are using Notepad++, can you find and count "^>" (tick the box for regular expression) to see if the number matches the number of sequences you expect to be in there? Also make sure there is no extra whitespace at the end of the file.

ADD REPLY • link 7.5 years ago by GenoMax 141k
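The same "^>" versus ">" comparison can be done outside Notepad++. The following is only a sketch: `count_markers` is a hypothetical helper, and the sample text stands in for the real file. It counts lines that start with ">", counts every ">" character, and lists the line numbers where a ">" appears anywhere other than the start of the line.

```python
import io

def count_markers(lines):
    """Return (header_lines, total_gt, suspect_line_numbers)."""
    header_lines = 0
    total_gt = 0
    suspects = []
    for number, line in enumerate(lines, start=1):
        total_gt += line.count(">")
        if line.startswith(">"):
            header_lines += 1
            # A second ">" inside a header may be legitimate text
            # or a sign that two records were merged; flag it for review.
            if ">" in line[1:]:
                suspects.append(number)
        elif ">" in line:
            # ">" in the middle of a sequence line is never a valid residue.
            suspects.append(number)
    return header_lines, total_gt, suspects

# In-memory example; a real run would pass open("db.fasta") instead.
sample = io.StringIO(">a desc\nMDDRED\nLVYQ>b desc\nQNEDEV\n>c desc\nAAAA\n")
headers, total, suspects = count_markers(sample)
print(headers, total, suspects)
```

If the first two numbers differ, the listed line numbers are the places to inspect by hand.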

Nice hint, thank you. There are actually fewer "^>" than ">" in the text. This probably comes from deleting some proteins manually in the wrong way. Is there a correct way to do it?

ADD REPLY • link 7.5 years ago by StullenBrot • 0
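One way to avoid this kind of damage is to remove whole records by identifier instead of deleting text by hand. The sketch below is an assumption-laden illustration, not an established tool: `drop_records` is a made-up name, and it treats the header text up to the first whitespace (without the ">") as the record identifier.

```python
def drop_records(lines, unwanted_ids):
    """Yield FASTA lines, skipping every record whose ID is in unwanted_ids."""
    keep = True
    for line in lines:
        if line.startswith(">"):
            # ID = header text up to the first whitespace, without the ">".
            fields = line[1:].split()
            record_id = fields[0] if fields else ""
            keep = record_id not in unwanted_ids
        if keep:
            yield line

# Hypothetical records, shortened for illustration.
records = [
    ">sp|A|X one\n", "MDD\n",
    ">sp|B|Y two\n", "QNE\n", "DEV\n",
    ">sp|C|Z three\n", "AAA\n",
]
kept = list(drop_records(records, {"sp|B|Y"}))
print("".join(kept))
```

Because a record is dropped from its header line up to the line before the next header, the remaining headers always stay at the start of their own lines.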

Just make sure all ">" start on a new line. Should be easy enough to find in Notepad++ and fix.

ADD REPLY • link 7.5 years ago by GenoMax 141k
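For a large file, the manual fix can be approximated with a short script. This is a sketch under one assumption: a ">" found in the middle of a sequence line marks the start of a header that lost its line break. The name `split_merged_headers` is hypothetical. Header lines that contain a second ">" are left alone, since that character may be part of the description.

```python
def split_merged_headers(lines):
    """Yield lines with any header glued onto a sequence line moved to its own line."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.startswith(">") and ">" in line:
            # Split at the first ">": left part is sequence, right part is a header.
            sequence, _, header = line.partition(">")
            if sequence.strip():
                yield sequence
            yield ">" + header
        else:
            yield line

broken = [">a desc\n", "MDDRED\n", "LVYQ>b desc\n", "QNEDEV\n"]
fixed = list(split_merged_headers(broken))
print("\n".join(fixed))
```

This cannot restore a header or sequence that was partly deleted, so it is worth re-counting the "^>" and ">" matches afterwards and checking any remaining mismatches by hand.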

Solved!

Thank you for your help!

ADD REPLY • link 7.5 years ago by StullenBrot • 0