Option Negative_Gilist With Blastp Version 2.2.25. No Isam File?
11.3 years ago
Angele ▴ 40

Hi everyone,

I want to use the -negative_gilist option of blastp (BLAST 2.2.25).

I need to search a query against the Swiss-Prot database while excluding a set of sequences from that database. I saw that -negative_gilist restricts the search to everything except the listed GIs. Here is the command line I used:

/blast-2.2.25/bin/blastp -num_alignments 100 -evalue 1 -db /db/uniprot/uniprot/uniprot_sprot.fasta -query seq.fasta -out seq.out -negative_gilist sequence.gi.txt

*sequence.gi.txt is the list of GIs of the sequences that blastp should ignore during the search.

I get this error message: BLAST Database error: GI list specified but no ISAM file found for GI

I looked up what an ISAM file is and found that it is an additional index file that accompanies the main database files (db.phr, db.pin, db.psq, ...). My Swiss-Prot database was formatted with formatdb using the option -o T, so it should have all the ISAM files. However, I noticed that I don't have the files with the extensions .pni and .pnd, which seem to contain the GI information. I can't figure out why (I tried reformatting the database with both formatdb and makeblastdb, and neither produced these files), and I don't know whether this is the reason -negative_gilist does not work.
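A quick way to confirm which index files sit next to the database is a small script like the one below. It is only a sketch: it assumes a single-volume protein database whose files are named `<db>.<ext>`, and the helper names (`missing_files`, `has_gi_index`) are made up for illustration.

```python
from pathlib import Path

# Extensions for a protein database:
# .phr/.pin/.psq are the core files, .pni/.pnd the numeric (GI) ISAM index.
CORE = ("phr", "pin", "psq")
GI_ISAM = ("pni", "pnd")


def missing_files(db_prefix, extensions):
    """Return the extensions for which '<db_prefix>.<ext>' does not exist."""
    return [ext for ext in extensions if not Path(f"{db_prefix}.{ext}").is_file()]


def has_gi_index(db_prefix):
    """True only if both GI ISAM files are present next to the database."""
    return not missing_files(db_prefix, GI_ISAM)
```

For the database in the command above, `has_gi_index("/db/uniprot/uniprot/uniprot_sprot.fasta")` would be expected to return False as long as the .pni and .pnd files are absent.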

Thanks for any help!


On this note, how come we can't search with Entrez queries when we do local searches? For instance, I would be more interested in excluding a whole phylum from my search rather than just a few GIs.

11.3 years ago
Michael 54k

Using formatdb without the "-o T" indexing option results in three BLAST database files (.nhr, .nin, .nsq). Using the "-o T" option will result in additional files. If gi's are present in the FASTA definition lines of the source file, there will be four additional files created (.nsd, .nsi, .nni, .nnd). These are ISAM indices for mapping a sequence identifier to a particular sequence in the BLAST database. If gi's are not used there will be only two additional files created (.nsd, .nsi).

formatdb.html (emphasis is mine)

The extensions quoted above are for a nucleotide database; for a protein database such as yours, the corresponding files are .psd, .psi, .pni and .pnd. So I assume that either there were no GIs in the FASTA header lines, or the headers were formatted in a way that formatdb did not recognize.

The same documentation also provides a formatting example:

>gi|5819095|ref|NC_001321.1| Balaenoptera physalus mitochondrion, complete genome
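To check whether the headers of a FASTA file follow this pattern, something like the following could be used. It is a rough sketch, not part of BLAST: the function name `count_gi_headers` and the regular expression are my own, and the pattern only tests that a definition line starts with `gi|` followed by digits.

```python
import re

# Matches a definition line beginning with a GI identifier,
# e.g. ">gi|5819095|ref|NC_001321.1| ..."
GI_HEADER = re.compile(r"^>gi\|\d+(\||\s|$)")


def count_gi_headers(lines):
    """Return (headers_with_gi, headers_without_gi) for an iterable of FASTA lines."""
    with_gi = 0
    without_gi = 0
    for line in lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        if GI_HEADER.match(line):
            with_gi += 1
        else:
            without_gi += 1
    return with_gi, without_gi
```

Called as `count_gi_headers(open("uniprot_sprot.fasta"))`, a first value of 0 would suggest that formatdb had no GIs to index.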

If your FASTA file does appear to be formatted correctly, running formatdb again should yield the missing files.

If you don't have access to the original FASTA file, you can get a FASTA dump of the database via fastacmd: fastacmd -d mydb -D 1. Using -D 2 instead will dump all the GIs, if present.


Indeed, the FASTA header lines in my database did not contain GIs. Once I added the GIs, formatdb produced the missing ISAM files, and negative_gilist worked!
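The thread does not say how the GIs were added, so the following is only one possible sketch. It assumes Swiss-Prot style headers starting with `>sp|ACCESSION|` and an accession-to-GI mapping that you supply yourself; the function name `add_gi_to_headers` and the mapping are hypothetical.

```python
def add_gi_to_headers(lines, gi_by_accession):
    """Yield FASTA lines, prefixing 'gi|<GI>|' to '>sp|ACC|...' headers found in the mapping."""
    for line in lines:
        if line.startswith(">sp|"):
            parts = line[1:].split("|", 2)  # ['sp', accession, rest of header]
            gi = gi_by_accession.get(parts[1]) if len(parts) > 1 else None
            if gi is not None:
                # e.g. ">sp|ACC|NAME ..." becomes ">gi|<GI>|sp|ACC|NAME ..."
                line = f">gi|{gi}|{line[1:]}"
        yield line
```

Headers whose accession is not in the mapping are passed through unchanged, so those sequences would still lack a GI after reformatting.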
