Question: makeblastdb options error
StullenBrot (Germany; Leipzig) wrote 2.5 years ago:

Hello,

I wanted to generate a BLAST database from my FASTA file and got an error from the makeblastdb tool. When I run the tool with the following command:

C:\WINDOWS\system32>makeblastdb -in C:\\Users\\Dennis\\Desktop\\db\\db.fasta -dbtype prot -out C:\\Users\\Dennis\\db\\output

I receive:

Building a new DB, current time: 11/03/2016 13:36:06
New DB name:   C:\\Users\\Dennis\\db\\output
New DB title:  C:\\Users\\Dennis\\Desktop\\db\\db.fasta
Sequence type: Protein
Keep Linkouts: T
Keep MBits: T
Maximum file size: 1000000000B
BLAST options error: C:\\Users\\Dennis\\Desktop\\db\\db.fasta does not match input format type, default input type is FASTA

However, the database is in FASTA format and looks like this:

>sp|P62258|1433E_HUMAN 14-3-3 protein epsilon OS=Homo sapiens GN=YWHAE PE=1 SV=1
MDDREDLVYQAKLAEQAERYDEMVESMKKVAGMDVELTV.....
>sp|P62258_REVERSED|1433E_HUMAN 14-3-3 protein epsilon OS=Homo sapiens GN=YWHAE PE=1 SV=1-REVERSED
QNEDEVDQLAEKNQEEGDGQMDSTWLTLNDRLLQMILTSD....

(so one line with protein info after ">" and the next line with the protein sequence)

I don't really know what the problem is, so I am asking for your help.

Thank you very much.

Dennis

modified 2.5 years ago by Devon Ryan • written 2.5 years ago by StullenBrot

After @Devon reformatted your post the FASTA sequence looks OK, but are you sure the entries in your file look exactly like they do now (each ">" starting on a new line)? What happens if you take just a couple of sequences and try to make a test index?

written 2.5 years ago by genomax
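
One way to build such a test file is to copy only the first few records into a new FASTA file and run makeblastdb on that. The following is a minimal Python sketch; the function name first_records and the sample data are made up for illustration and are not part of BLAST.

```python
def first_records(lines, n):
    """Yield the lines that belong to the first n FASTA records."""
    seen = 0
    for line in lines:
        if line.startswith(">"):
            seen += 1
            if seen > n:
                break
        if seen:  # ignore anything that precedes the first header
            yield line


# Small in-memory example standing in for a real file.
sample = [">a\n", "MDD\n", ">b\n", "QNE\n", ">c\n", "AAA\n"]
subset = list(first_records(sample, 2))
print("".join(subset), end="")
```

For a real file, the same function can be fed an open file handle and its output written with writelines() to a new file, which is then passed to makeblastdb via -in.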

When I open the file with Notepad++, they are separated onto different lines.

A shorter .fasta file of about 24 sequences surprisingly worked, though. So I guess the file size is the reason for the failure?

Thank you for your help!

Edit: File size is ~50 MB

modified 2.5 years ago • written 2.5 years ago by StullenBrot

I don't think file size is the problem. I have a feeling that you may have a sequence (or more) that is not starting with > on a new line. Since you are using Notepad++, can you find and count "^>" (tick the box for regular expression) to see if the number matches the number of sequences you expect to be in there? Also make sure there is no extra space at the end of the file.

modified 2.5 years ago • written 2.5 years ago by genomax
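
The same check can be scripted instead of done in Notepad++. The sketch below counts lines that start with ">" and compares that with the total number of ">" characters; any line with a ">" that is not at the start is reported by line number. The function name check_headers and the sample lines are hypothetical, and a ">" inside a legitimate header description would also be flagged, so the reported lines still need a manual look.

```python
def check_headers(lines):
    """Return (header_lines, total_gt_chars, suspect_line_numbers)."""
    headers = 0
    total = 0
    suspects = []
    for number, line in enumerate(lines, start=1):
        count = line.count(">")
        total += count
        starts = line.startswith(">")
        if starts:
            headers += 1
        # Any ">" other than a single leading one is suspicious.
        if count > (1 if starts else 0):
            suspects.append(number)
    return headers, total, suspects


# Line 2 holds a header that lost its line break.
sample = [
    ">sp|A|ONE first protein\n",
    "MDDREDLVYQ>sp|B|TWO second protein\n",
    "QNEDEVDQLA\n",
]
headers, total, suspects = check_headers(sample)
print(headers, total, suspects)
```

If the first two numbers differ, the file contains at least one ">" that does not start a line, and the third value says where to look.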

Nice hint, thank you. There are actually fewer "^>" than ">" in the text. That probably comes from deleting some proteins manually in the wrong way. Is there a correct way to do it?

written 2.5 years ago by StullenBrot

Just make sure all ">" start on a new line. Should be easy enough to find in Notepad++ and fix.

written 2.5 years ago by genomax
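
For a large file, the repair could also be scripted. This is only a rough Python sketch under one assumption: a ">" found in the middle of a sequence line marks a header that lost its line break. It does not handle two headers merged onto the same header line, and the name split_merged_headers is made up for illustration.

```python
def split_merged_headers(lines):
    """Put a header back on its own line when it was glued to sequence data."""
    fixed = []
    for line in lines:
        pos = line.find(">")
        if pos > 0:
            # Sequence residues followed directly by a header: break before ">".
            fixed.append(line[:pos] + "\n")
            fixed.append(line[pos:])
        else:
            # Normal header line (pos == 0) or plain sequence line (pos == -1).
            fixed.append(line)
    return fixed


sample = [">a one\n", "MDDRE>b two\n", "QNEDE\n"]
repaired = split_merged_headers(sample)
print("".join(repaired), end="")
```

It is safer to write the result to a new file and keep the original, then re-count the headers before running makeblastdb again.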

Solved!

Thank you for your help!

written 2.5 years ago by StullenBrot