makeblastdb options error
7.5 years ago

Hello,

I want to build a BLAST database from my FASTA file, but makeblastdb fails with an error. I run the tool with the following command:

C:\WINDOWS\system32>makeblastdb -in C:\\Users\\Dennis\\Desktop\\db\\db.fasta -dbtype prot -out C:\\Users\\Dennis\\db\\output

I receive:

Building a new DB, current time: 11/03/2016 13:36:06
New DB name:   C:\\Users\\Dennis\\db\\output
New DB title:  C:\\Users\\Dennis\\Desktop\\db\\db.fasta
Sequence type: Protein
Keep Linkouts: T
Keep MBits: T
Maximum file size: 1000000000B
BLAST options error: C:\\Users\\Dennis\\Desktop\\db\\db.fasta does not match input format type, default input type is FASTA

The input file is in FASTA format, though, and looks like this:

>sp|P62258|1433E_HUMAN 14-3-3 protein epsilon OS=Homo sapiens GN=YWHAE PE=1 SV=1
MDDREDLVYQAKLAEQAERYDEMVESMKKVAGMDVELTV.....
>sp|P62258_REVERSED|1433E_HUMAN 14-3-3 protein epsilon OS=Homo sapiens GN=YWHAE PE=1 SV=1-REVERSED
QNEDEVDQLAEKNQEEGDGQMDSTWLTLNDRLLQMILTSD....

(so one header line with the protein info after ">", followed by one line with the protein sequence)

I really don't know what the problem is, so I'm asking for your help.

Thank you very much.

Dennis


After @Devon reformatted your post, the FASTA sequence looks OK, but are you sure the entries in your file look exactly like they do here (each ">" at the start of a new line)? What happens if you take just a couple of sequences and try to build a test database?
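One way to cut out a small test set is sketched below in Python. The function name `first_records` and the file names in the usage note are made up for illustration; the only assumption about the data is that every record starts with a ">" header line.

```python
def first_records(lines, n):
    """Return the lines belonging to the first n FASTA records.

    A record is taken to start at every line that begins with '>'.
    Lines before the first header are skipped.
    """
    out = []
    seen = 0
    for line in lines:
        if line.startswith(">"):
            seen += 1
            if seen > n:
                break
        if seen:
            out.append(line)
    return out
```

For example, `open("test.fasta", "w").writelines(first_records(open("db.fasta"), 20))` would write the first 20 records to a hypothetical test.fasta, which can then be passed to makeblastdb.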


When I open the file with Notepad++, the entries are separated onto different lines.

A shorter .fasta file of about 24 sequences surprisingly worked, though. So I guess the file size is the reason for the failure?

Thank you for your help!

Edit: File size is ~50 MB


I don't think file size is the problem. I have a feeling that you may have a sequence (or more) whose header does not start with ">" on a new line. Since you are using Notepad++, can you find and count "^>" (tick the box for regular expression) to see if the number matches the number of sequences you expect to be in there? Also make sure there is no extra whitespace at the end of the file.
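The same check can be done outside Notepad++. Here is a minimal Python sketch; the function names are made up for illustration, and it assumes that ">" is only ever supposed to appear at the start of a header line (a ">" inside a sequence line is reported as suspicious).

```python
def count_markers(text):
    """Return (number of lines starting with '>', total number of '>' characters)."""
    at_line_start = sum(1 for line in text.splitlines() if line.startswith(">"))
    total = text.count(">")
    return at_line_start, total


def misplaced_lines(text):
    """Return 1-based numbers of non-header lines that contain a '>'."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if ">" in line and not line.startswith(">")
    ]


if __name__ == "__main__":
    # Tiny made-up example: the third header is glued to a sequence line.
    sample = ">a\nMDDRED\n>b\nQNEDEV>c\nAAAA\n"
    print(count_markers(sample))
    print(misplaced_lines(sample))
```

If the two counts differ, the line numbers from `misplaced_lines` show where to look. Note that a ">" inside a header description also raises the total count, so a mismatch alone is not proof of a broken record.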


Nice hint, thank you. There are actually fewer "^>" matches than ">" characters in the text. That probably comes from deleting some proteins manually in the wrong way. Is there a correct way to do it?


Just make sure all ">" start on a new line. Should be easy enough to find in Notepad++ and fix.
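For a large file, the same fix could be scripted. The following is a rough Python sketch, not a tested tool: `repair_fasta` is a made-up name, and it assumes that any ">" found inside a non-header line marks the start of a header that lost its line break. A ">" inside an existing header line is left alone.

```python
def repair_fasta(text):
    """Move any '>' found inside a sequence line onto its own new line."""
    fixed = []
    for line in text.splitlines():
        if line.startswith(">") or ">" not in line:
            # Proper header line or plain sequence line: keep unchanged.
            fixed.append(line)
            continue
        # Sequence residue followed by a glued-on header: split at the first '>'.
        sequence, _, header = line.partition(">")
        if sequence.strip():
            fixed.append(sequence)
        fixed.append(">" + header)
    return "\n".join(fixed) + "\n"
```

It is worth keeping a backup of the original file and rechecking the header count afterwards, since the script cannot tell whether sequence residues were also lost during the manual deletion.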


Solved!

Thank you for your help!
