Question: blastdb_aliastool: mismatch between converted GIs and final sequence count
11 days ago, roger.huerlimann wrote:

Hi all,

I followed the advice in the thread "Vertebrate Subset Nr Database? Build My Own?" to subset the nr database to a specific taxonomic group.

However, although the GI list contained about 4.8 million GIs, the resulting database ended up with only about 2.2 million sequences. Is this working as intended? The nr database and the GI list were downloaded only a day apart, so I don't think some 2.5 million sequences were added within that time.

>blastdb_aliastool -gilist virus.gi_list180712.txt -db nr -out nr_virus -title nr_virus
Converted 4764026 GIs from virus.gi_list180712.txt to binary format in nr_virus.p.gil
Created protein BLAST (alias) database nr_virus with 2239853 sequences
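
To put the gap in numbers and rule out one simple cause (repeated lines in the GI list), a minimal sketch along these lines may help. The helper name count_unique_gis is hypothetical, and the sketch assumes the list holds one numeric GI per line; whether duplicates account for any of the difference is not established here.

```shell
# Hypothetical helper: count distinct numeric GIs in a one-per-line list file.
count_unique_gis() {
    # Keep only lines made up purely of digits, then de-duplicate numerically.
    grep -E '^[0-9]+$' "$1" | sort -un | wc -l | tr -d ' '
}

# Counts copied from the blastdb_aliastool output above.
converted=4764026
in_alias_db=2239853
echo "GIs not reflected in the sequence count: $((converted - in_alias_db))"

# Usage on the real list (file name taken from the command above):
# count_unique_gis virus.gi_list180712.txt
```

If the unique count matches the 4764026 reported by blastdb_aliastool, duplicate lines can be excluded as an explanation.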



blast database • 80 views

@Roger: NCBI deprecated GIs for outside use a couple of years back. You could use all the viral genomes here, or get the sequences for the taxID you want using NCBI eUtils.
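
As a rough illustration of the eUtils route, the sketch below only builds an esearch URL for protein records under a given taxID. The helper name esearch_url is hypothetical, and taxID 10239 (Viruses) is an assumption based on the "virus" file name in the question, not something stated in the thread.

```shell
# Hypothetical helper: print an E-utilities esearch URL for all protein
# records under a taxonomy ID ([Organism:exp] includes the whole subtree).
esearch_url() {
    taxid="$1"
    base="https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
    # usehistory=y stores the hits on the NCBI history server, so a later
    # efetch call can page through them; retmax=0 avoids listing IDs here.
    echo "${base}?db=protein&term=txid${taxid}%5BOrganism:exp%5D&usehistory=y&retmax=0"
}

# Assumed taxID for Viruses; replace with the taxID you actually want.
esearch_url 10239

# To run the search (needs network access):
# curl -s "$(esearch_url 10239)"
```

The XML returned by esearch includes a Count, a WebEnv and a QueryKey; the latter two can be passed to efetch (for example with rettype=fasta) to download the records in batches.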

modified 11 days ago • written 11 days ago by genomax