Question: blastdb_aliastool: mismatch between the number of converted GIs and the number of sequences in the final database
roger.huerlimann wrote (10 weeks ago):

Hi all,

I followed the advice in the thread "Vertebrate Subset Nr Database? Build My Own?" to subset the nr database to a specific taxonomic group.

However, although the GI list contained about 4.8 million GIs, the resulting database has only about 2.2 million sequences. Is this working as intended? I downloaded the nr database and the GI list only a day apart, so I doubt that a change of roughly 2.5 million sequences happened in that time.

>blastdb_aliastool -gilist virus.gi_list180712.txt -db nr -out nr_virus -title nr_virus
Converted 4764026 GIs from virus.gi_list180712.txt to binary format in nr_virus.p.gil
Created protein BLAST (alias) database nr_virus with 2239853 sequences
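One quick check before suspecting the database is whether the GI list itself contains repeated or malformed lines. The Python sketch below is a hypothetical helper (`summarise_gi_list` is not part of BLAST+); it assumes one GI per line, as in the file passed to `-gilist`, and it only inspects the list, not what is actually present in nr.

```python
# Hypothetical helper, not part of BLAST+: summarise a GI list
# (one GI per line) before passing it to blastdb_aliastool -gilist.


def summarise_gi_list(lines):
    """Return (non-blank entries, unique numeric GIs, non-numeric entries)."""
    # Strip whitespace and skip blank lines.
    entries = [line.strip() for line in lines if line.strip()]
    # A bare GI is a string of digits; anything else is counted as malformed.
    numeric = [e for e in entries if e.isdigit()]
    return len(entries), len(set(numeric)), len(entries) - len(numeric)
```

For example, `with open("virus.gi_list180712.txt") as fh: print(summarise_gi_list(fh))` prints the three counts. If the unique count is noticeably lower than the total, the list contains repeats; it says nothing about how many of the GIs map to separate entries in nr.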




genomax replied (10 weeks ago):

@Roger: NCBI deprecated GIs for external use a couple of years ago. You could use all the viral genomes here, or fetch the sequences for the taxID you want using NCBI E-utilities.
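The E-utilities route suggested in the reply can be sketched in Python with only the standard library: `esearch` finds protein records under a taxID and stores the result on the NCBI history server, and `efetch` then downloads them as FASTA in batches. This is a minimal sketch, not an official client. The function names, the batch size of 500 and the output path are hypothetical choices, and it assumes the `protein` database and the `txidNNNN[Organism:exp]` search syntax. NCBI limits clients without an API key to about 3 requests per second, hence the pause between batches.

```python
# Sketch: download protein FASTA for a taxID via NCBI E-utilities
# (esearch with usehistory=y, then batched efetch). Function names are hypothetical.
import time
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import Tuple

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(taxid: int, db: str = "protein") -> str:
    params = {
        "db": db,
        # [Organism:exp] also matches all descendant taxa of the taxID.
        "term": f"txid{taxid}[Organism:exp]",
        "usehistory": "y",  # keep the hit list on the NCBI history server
    }
    return f"{EUTILS}/esearch.fcgi?{urllib.parse.urlencode(params)}"


def efetch_url(webenv: str, query_key: str, retstart: int, retmax: int,
               db: str = "protein") -> str:
    params = {
        "db": db,
        "WebEnv": webenv,
        "query_key": query_key,
        "retstart": retstart,  # offset of the first record in this batch
        "retmax": retmax,      # number of records in this batch
        "rettype": "fasta",
        "retmode": "text",
    }
    return f"{EUTILS}/efetch.fcgi?{urllib.parse.urlencode(params)}"


def parse_esearch(xml_data) -> Tuple[int, str, str]:
    """Extract (Count, WebEnv, QueryKey) from an esearch XML response."""
    root = ET.fromstring(xml_data)
    return (int(root.findtext("Count")),
            root.findtext("WebEnv"),
            root.findtext("QueryKey"))


def fetch_fasta(taxid: int, out_path: str, batch: int = 500) -> int:
    """Write all protein FASTA records under taxid to out_path; return the hit count."""
    with urllib.request.urlopen(esearch_url(taxid)) as resp:
        count, webenv, query_key = parse_esearch(resp.read())
    with open(out_path, "w") as out:
        for start in range(0, count, batch):
            time.sleep(0.34)  # stay under ~3 requests/second without an API key
            with urllib.request.urlopen(efetch_url(webenv, query_key, start, batch)) as resp:
                out.write(resp.read().decode())
    return count
```

For example, `fetch_fasta(10239, "virus_proteins.fasta")` would fetch proteins under taxID 10239 (Viruses). That is a large download, so a narrower taxID may be more practical. The resulting FASTA file can then be turned into a protein database with `makeblastdb -in virus_proteins.fasta -dbtype prot`.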

