Question: extracting sequences from a blast database
max_19 wrote (4 months ago):

Hi all!

I'm creating a blast database using:

makeblastdb -in proteins.fasta -dbtype prot -parse_seqids -out my_protein_db

I tried to extract some sequences from it using blastdbcmd, but I kept getting "Entry not found" errors.

My entries look like this (there is one pipe in each entry):

ABC|DEF60375.1
EHL|XP_003887.1
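Each of these entries contains a pipe, and the pipe is the field separator in NCBI-style FASTA identifiers. One possible workaround (an assumption on my part, not something suggested in this thread) is to replace the pipe in header lines before running makeblastdb. The helper name sanitize_headers and the file name proteins.clean.fasta are hypothetical, and how BLAST+ 2.6.0 then labels the rewritten IDs should be re-checked with blastdbcmd's %i output field.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper, not from the thread): replace "|" with "_" in
# FASTA header lines, so that makeblastdb -parse_seqids has no pipe to split
# the ID on. Sequence lines are passed through unchanged.
sanitize_headers() {
  # Only lines starting with ">" are edited; every "|" on such a line
  # (including any in the description) becomes "_".
  sed '/^>/ s/|/_/g'
}

# Example usage (file names are placeholders):
#   sanitize_headers < proteins.fasta > proteins.clean.fasta
#   makeblastdb -in proteins.clean.fasta -dbtype prot -parse_seqids -out my_protein_db
#   blastdbcmd -entry all -db my_protein_db -outfmt "%i"   # verify the stored IDs
# Lookups must then use the rewritten ID, e.g. ABC_DEF60375.1.
```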

However, if I check the identifiers in my database using:

blastdbcmd -entry all -db my_protein_db -outfmt "OID: %o GI: %g ACC: %a IDENTIFIER: %i"

I get lines like this:

OID: 0 GI: N/A ACC: ABC|DEF60375.1 IDENTIFIER: gnl|ABC|DEF60375.1
OID: 0 GI: N/A ACC: EHL|XP_003887.1 IDENTIFIER: lcl|EHL|XP_003887.1

So it seems NCBI has added some text and a pipe in front of my identifiers. I could just prepend these letters to my entries when I use blastdbcmd, but I noticed that they are not always the same: in some cases the prefix is "gnl|" and in others it is "lcl|". Does anyone know how NCBI decides on this naming convention, and what is the best way to get around it?
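One way to avoid guessing the prefix is to let blastdbcmd report it: dump each accession together with its full identifier once, then translate the list of wanted accessions into full identifiers before extracting. This sketch assumes that blastdbcmd accepts the identifier exactly as printed by %i, which is worth confirming on a single entry first. The names lookup_ids, id_map.txt, wanted_accessions.txt and wanted_full_ids.txt are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): lookup_ids MAPFILE
# MAPFILE holds one "ACCESSION IDENTIFIER" pair per line, e.g. as produced by
#   blastdbcmd -entry all -db my_protein_db -outfmt "%a %i" > id_map.txt
# Wanted accessions are read one per line on stdin. For each one found in
# MAPFILE, the full identifier (with its gnl|/lcl| prefix) is printed;
# accessions that are missing are reported on stderr.
# Note: MAPFILE must be non-empty for the NR == FNR test to work.
lookup_ids() {
  awk 'NR == FNR { full[$1] = $2; next }
       ($1 in full) { print full[$1]; next }
       { print "not found: " $1 > "/dev/stderr" }' "$1" -
}

# Example usage (file names are placeholders):
#   lookup_ids id_map.txt < wanted_accessions.txt > wanted_full_ids.txt
#   blastdbcmd -db my_protein_db -entry_batch wanted_full_ids.txt > subset.fasta
```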

Thanks very much for any input

Tags: sequencing, blast, protein, genome

What do the fasta headers in your proteins.fasta look like (grep "^>" proteins.fasta | head -3)?

Written 4 months ago by genomax.

like this:

>
MKFSTLLKSNKLQGWEDFYIQYDNLIKYLKTDPLKFKNLLIKENTKITTFFNEIEEQANQQKNELLMLVKNNLIYDSSTK
YKNFKDKLYQNELID
Written 4 months ago by max_19 (modified 3 months ago).

Which version of blast are you using?

See this page for additional detail.

Those are NCBI standard fasta identifiers.
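In NCBI's FASTA identifier syntax, the text before the first pipe names the identifier type: lcl|... is a local identifier and gnl|database|identifier is a general database reference. To see how many entries in the database ended up with each type, a small pipeline like the one below can tally that leading field. The function name count_id_types is hypothetical, and the sketch only counts prefixes; it does not explain why makeblastdb chose them.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): tally the identifier type prefixes of the
# full IDs that blastdbcmd prints for the %i field, one ID per line on stdin.
count_id_types() {
  # Keep the text before the first "|" (e.g. "gnl" or "lcl"), count each
  # distinct value, and print "PREFIX COUNT" pairs sorted by prefix.
  cut -d'|' -f1 | sort | uniq -c | awk '{ print $2, $1 }'
}

# Example usage against the database from the question:
#   blastdbcmd -entry all -db my_protein_db -outfmt "%i" | count_id_types
```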

Written 4 months ago by genomax.

blast+/2.6.0

I will check them out, thanks!

Written 4 months ago by max_19.