Question: blastdb_aliastool: mismatch between converted GIs and final sequence count
7 months ago, roger.huerlimann wrote:

Hi all,

I followed the advice in the thread "Vertebrate Subset Nr Database? Build My Own?" to subset the nr database to a specific taxonomic group.

However, even though the list contained about 5 million GIs, the resulting database ended up with only about 2 million sequences. Is this working as intended? The nr database and the GI list were downloaded within a day of each other, so I don't think 3 million sequences were added in that time.

>blastdb_aliastool -gilist virus.gi_list180712.txt -db nr -out nr_virus -title nr_virus
Converted 4764026 GIs from virus.gi_list180712.txt to binary format in nr_virus.p.gil
Created protein BLAST (alias) database nr_virus with 2239853 sequences
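
One cheap check before looking at the database itself is whether the GI list is as large as it seems. The sketch below counts total, distinct, and malformed entries in a GI list. It assumes the file is plain text with one numeric GI per line; the function name `summarize_gi_list` is hypothetical. Duplicates are only one possible contributor to the gap: nr is non-redundant, so several GIs can point at the same sequence entry, and this check says nothing about that.

```python
def summarize_gi_list(lines):
    """Return (total_gis, distinct_gis, malformed_lines) for an iterable of lines.

    Assumes one numeric GI per line; blank lines are ignored and
    anything that is not purely digits is counted as malformed.
    """
    gis = []
    malformed = 0
    for line in lines:
        token = line.strip()
        if not token:
            continue  # ignore blank lines
        if token.isdigit():
            gis.append(int(token))
        else:
            malformed += 1
    return len(gis), len(set(gis)), malformed
```

To run it on the list from the command above, open the file and pass the handle, for example `summarize_gi_list(open("virus.gi_list180712.txt"))`. If the distinct count is close to 4764026, duplicates are not the explanation.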




@Roger: NCBI deprecated GIs for outside use a couple of years ago. You could use all the viral genomes here, or get the sequences for the taxID you want using NCBI eUtils.
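
As a rough illustration of the eUtils route, the sketch below only builds the request URLs for an ESearch by taxID followed by an EFetch of FASTA records through the history server. The helper names `esearch_url` and `efetch_url` are hypothetical, and taxID 10239 (Viruses) is an assumption based on the file name in the question. Sending the requests, parsing `WebEnv` and `QueryKey` out of the ESearch response, and paging politely are left out.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(taxid, db="protein"):
    """Build an ESearch URL for every record under a taxID (subtree included)."""
    params = {
        "db": db,
        "term": f"txid{taxid}[Organism:exp]",  # :exp expands to child taxa
        "usehistory": "y",  # keep the result set on the history server
        "retmax": 0,        # only the count and history keys are needed here
    }
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def efetch_url(webenv, query_key, db="protein", retstart=0, retmax=500):
    """Build an EFetch URL for one batch of FASTA records from a stored search."""
    params = {
        "db": db,
        "WebEnv": webenv,
        "query_key": query_key,
        "rettype": "fasta",
        "retmode": "text",
        "retstart": retstart,  # offset of the first record in this batch
        "retmax": retmax,      # batch size
    }
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}"


# Assumed example: taxID 10239 is Viruses.
print(esearch_url(10239))
```

The downloaded FASTA could then be formatted with makeblastdb as a standalone database, which avoids the GI list altogether.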

written 7 months ago by genomax