Question: blastdb_aliastool: mismatch between converted GIs and final sequence count
roger.huerlimann wrote (4 months ago):

Hi all,

I followed the advice in the thread "Vertebrate Subset Nr Database? Build My Own?" to subset the nr database to a specific taxonomic group.

However, although the GI list contained about 4.8 million GIs, the resulting database contains only about 2.2 million sequences. Is this working as intended? The nr database and the GI list were downloaded only a day apart, so I don't think a difference of roughly 2.5 million sequences can be explained by changes made in that time.

```
>blastdb_aliastool -gilist virus.gi_list180712.txt -db nr -out nr_virus -title nr_virus
Converted 4764026 GIs from virus.gi_list180712.txt to binary format in nr_virus.p.gil
Created protein BLAST (alias) database nr_virus with 2239853 sequences
```

Thanks!

Roger
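One possible reason for the gap (an assumption, not something confirmed in this thread): nr is non-redundant, so identical proteins from different submissions are merged into a single database entry whose defline carries several GIs. The alias database reports sequences (entries), not GIs, so several GIs can collapse onto one entry, and GIs that are not in the downloaded nr contribute nothing. The Python sketch below illustrates that counting on a small, made-up example; `summarize_gi_subset`, the `gi_to_oid` mapping and all GI values are hypothetical and are not produced by the BLAST tools.

```python
from typing import Dict, Iterable, Tuple


def summarize_gi_subset(gis: Iterable[int],
                        gi_to_oid: Dict[int, int]) -> Tuple[int, int, int]:
    """Return (unique GIs, GIs absent from the database, distinct entries hit).

    gi_to_oid is a hypothetical mapping from each GI to the ordinal ID (OID)
    of the non-redundant database entry that contains it.
    """
    unique_gis = set(gis)  # duplicate lines in the GI list count only once
    missing = {gi for gi in unique_gis if gi not in gi_to_oid}
    # Several GIs may point at the same merged entry, so collect distinct OIDs.
    oids = {gi_to_oid[gi] for gi in unique_gis - missing}
    return len(unique_gis), len(missing), len(oids)


# Made-up example: entry 0 merges three identical proteins, entry 1 holds one,
# and GI 999 is not present in this copy of the database at all.
mapping = {101: 0, 102: 0, 103: 0, 201: 1}
gi_list = [101, 102, 103, 201, 999, 101]

n_gis, n_missing, n_seqs = summarize_gi_subset(gi_list, mapping)
print(n_gis, n_missing, n_seqs)  # → 5 1 2
```

Six GI lines here yield only two sequences, which is the same pattern as 4,764,026 converted GIs yielding 2,239,853 sequences. The sequence count of the alias itself can be rechecked with `blastdbcmd -db nr_virus -info`.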


genomax wrote (4 months ago):

@Roger: NCBI deprecated GIs for external use a couple of years ago. You could use all the viral genomes here, or get the sequences for the taxID you want using NCBI E-utilities.
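Fetching sequences by taxID through E-utilities can be sketched as two kinds of request: `esearch.fcgi` with `usehistory=y` stores the matching records on NCBI's history server, and `efetch.fcgi` then downloads them as FASTA in batches. The Python below only builds the request URLs (no network access). The protein database, the Viruses taxID 10239, the batch size of 500 and the helper names `esearch_url`/`efetch_urls` are illustrative assumptions; the WebEnv and query_key values are placeholders for what the esearch response actually returns.

```python
from typing import List
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(taxid: int, db: str = "protein") -> str:
    """Search URL for all records under a taxID, including descendant taxa."""
    params = {
        "db": db,
        "term": f"txid{taxid}[Organism:exp]",  # ':exp' also includes child taxa
        "usehistory": "y",  # keep the result set on the history server
        "retmax": 0,        # no ID list needed; Count/WebEnv/QueryKey suffice
    }
    return f"{EUTILS}/esearch.fcgi?{urlencode(params)}"


def efetch_urls(count: int, webenv: str, query_key: str,
                db: str = "protein", batch: int = 500) -> List[str]:
    """One FASTA download URL per batch of `batch` records from the stored search."""
    urls = []
    for start in range(0, count, batch):
        params = {
            "db": db,
            "WebEnv": webenv,
            "query_key": query_key,
            "rettype": "fasta",
            "retmode": "text",
            "retstart": start,
            "retmax": batch,
        }
        urls.append(f"{EUTILS}/efetch.fcgi?{urlencode(params)}")
    return urls


print(esearch_url(10239))
# Placeholder WebEnv/query_key; real values come from the esearch XML response.
for url in efetch_urls(1200, "WEBENV_PLACEHOLDER", "1"):
    print(url)
```

Batching through the history server avoids passing millions of IDs in a URL, and the resulting FASTA is keyed by accession rather than GI. NCBI limits clients without an API key to 3 requests per second, so a real downloader should pause between batches.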