Ncbi Tax Id For Draft Bacteria Genomes
7.5 years ago
c.v.oflynn

Hi everyone,

I want to take advantage of the numerous draft bacterial genomes on NCBI's FTP site. For a while I have thought it a shame not to use them in my pipeline. There are only ~3000 genomes in the Bacteria folder, but there are an additional ~9000 in Bacteria_Draft, and most of those are of pretty good quality. However, the protein GIs from the drafts do not seem to be represented in the NCBI taxonomy mapping file (gi_taxid_prot.dmp), which means that I cannot place them in my tree. Am I correct in assuming that NCBI does not place draft genomes in its tree? Is there an alternative file with draft genomes included, or does anybody know a method of making one? I also use idmapping.dat from UniProt as a conversion file. It has the protein GIs, including drafts, so that part of my pipeline should be OK. What I do not have is a file to convert from GI, accession or similar identifiers to additional third-party databases such as eggNOG, KEGG, ...
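A minimal Python sketch of the lookup described above. It assumes gi_taxid_prot.dmp holds one tab-separated `gi<TAB>taxid` pair per line. It keeps only the GIs you ask for, because the full file is large, and it reports the GIs that have no entry, such as draft-genome proteins. The sample rows are made up.

```python
import io
from typing import Dict, List, Set, TextIO, Tuple


def load_gi_taxid(handle: TextIO, wanted: Set[int]) -> Dict[int, int]:
    """Read 'gi<TAB>taxid' lines, keeping only the GIs in `wanted` to save memory."""
    mapping: Dict[int, int] = {}
    for line in handle:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        gi = int(fields[0])
        if gi in wanted:
            mapping[gi] = int(fields[1])
    return mapping


def split_found_missing(gis: List[int], mapping: Dict[int, int]) -> Tuple[Dict[int, int], List[int]]:
    """Separate GIs that have a taxid from those absent in the dump."""
    found = {gi: mapping[gi] for gi in gis if gi in mapping}
    missing = [gi for gi in gis if gi not in mapping]
    return found, missing


if __name__ == "__main__":
    # Made-up rows standing in for gi_taxid_prot.dmp (use open(...) on the real file).
    dump = io.StringIO("1001\t562\n1002\t1280\n")
    query = [1001, 1002, 9999]
    found, missing = split_found_missing(query, load_gi_taxid(dump, set(query)))
    print(found)    # {1001: 562, 1002: 1280}
    print(missing)  # [9999]
```

The `missing` list gives you the set of proteins that need a taxonomy ID from some other source.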

So I guess my questions are:

Can I get a mapping from draft GIs to NCBI taxonomy IDs?

Does anybody have regularly updated conversion files for GI > GO, SEED, KEGG, eggNOG, etc.?
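One way to approach both questions with a file already in the pipeline is to join UniProt's idmapping.dat on its accession column. The sketch below is a minimal version of that idea. It assumes the three-column `accession<TAB>ID_type<TAB>ID` layout and assumes the ID_type labels `GI`, `NCBI_TaxID`, `KEGG` and `eggNOG`. Check both assumptions against the release you download.

```python
import io
from collections import defaultdict
from typing import Dict, Set, TextIO

# ID_type labels assumed for column 2 of idmapping.dat; check the spelling in
# the release you download (GO terms, for instance, may live in a different file).
WANTED_TYPES = {"GI", "NCBI_TaxID", "KEGG", "eggNOG"}


def gi_cross_refs(handle: TextIO, wanted_types: Set[str] = WANTED_TYPES) -> Dict[str, Dict[str, Set[str]]]:
    """Map each GI to {ID_type: ids} by joining idmapping rows on the UniProtKB accession."""
    by_acc: Dict[str, Dict[str, Set[str]]] = defaultdict(lambda: defaultdict(set))
    for line in handle:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 3:
            continue  # expected columns: accession, ID_type, ID
        acc, id_type, value = parts
        if id_type in wanted_types:
            by_acc[acc][id_type].add(value)

    result: Dict[str, Dict[str, Set[str]]] = {}
    for refs in by_acc.values():
        for gi in refs.get("GI", set()):
            merged = result.setdefault(gi, {})
            for id_type, values in refs.items():
                if id_type != "GI":
                    merged.setdefault(id_type, set()).update(values)
    return result


if __name__ == "__main__":
    # Made-up rows in the accession<TAB>ID_type<TAB>ID layout.
    sample = io.StringIO(
        "P00001\tGI\t1001\n"
        "P00001\tNCBI_TaxID\t562\n"
        "P00001\tKEGG\teco:b0001\n"
        "P00002\tGI\t1002\n"
        "P00002\teggNOG\tCOG0001\n"
    )
    for gi, refs in sorted(gi_cross_refs(sample).items()):
        print(gi, refs)
```

This version holds every matching row in memory, so it suits a pre-filtered subset. For the full file, a streaming pass would be the safer design.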

Thanks in advance, Ciaran

ncbi taxonomy conversion bacteria
7.5 years ago


You can find the line


The GI of that sequence is 484233460. It also works with NCBI EFetch:

$ curl """>
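The following is a sketch of an EFetch request for a single GI. It is not necessarily the command used in this answer. It assumes the protein database and the GenPept text format (`rettype=gp`), and it reads the taxonomy ID from the source feature's `taxon` cross-reference. Only the URL-building and parsing parts run offline. `fetch_taxid` needs network access, and NCBI asks clients to stay within its request-rate limits.

```python
import re
import urllib.parse
import urllib.request
from typing import Optional

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"
TAXON_RE = re.compile(r'/db_xref="taxon:(\d+)"')


def efetch_url(gi: int, db: str = "protein") -> str:
    """Build an EFetch URL asking for the flat-file (GenPept) record of one GI."""
    params = {"db": db, "id": str(gi), "rettype": "gp", "retmode": "text"}
    return EFETCH + "?" + urllib.parse.urlencode(params)


def taxid_from_record(text: str) -> Optional[int]:
    """Return the first taxon db_xref in a flat-file record, or None."""
    m = TAXON_RE.search(text)
    return int(m.group(1)) if m else None


def fetch_taxid(gi: int) -> Optional[int]:
    """Network call: download the record and extract its taxon ID, if present."""
    with urllib.request.urlopen(efetch_url(gi), timeout=30) as resp:
        return taxid_from_record(resp.read().decode("utf-8"))


if __name__ == "__main__":
    print(efetch_url(484233460))
```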


Furthermore, notice that the tax-id is in the filename.
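If you go the GenBank-file route, here is a minimal sketch that scans a GenBank flat file and maps each CDS GI to the taxon of its record. It assumes the older record layout, in which the source feature carries `/db_xref="taxon:..."` and each CDS carries `/db_xref="GI:..."`. Many records no longer include GI cross-references, so newer downloads may yield nothing. This is a plain line scan rather than a full parser; Biopython's `SeqIO.parse(handle, "genbank")` would be the more robust choice.

```python
import io
import re
from typing import Dict, Optional, TextIO

# Qualifier forms assumed from older GenBank flat files.
TAXON_RE = re.compile(r'/db_xref="taxon:(\d+)"')
GI_RE = re.compile(r'/db_xref="GI:(\d+)"')


def gi_to_taxid(handle: TextIO) -> Dict[int, int]:
    """Map each CDS GI in a (multi-record) GenBank file to its record's taxon ID."""
    mapping: Dict[int, int] = {}
    taxid: Optional[int] = None
    in_cds = False
    for line in handle:
        if line.startswith("LOCUS"):
            taxid, in_cds = None, False  # new record: forget the previous organism
            continue
        # Feature keys start after exactly 5 spaces; qualifiers are indented further.
        if line.startswith("     ") and not line.startswith("      "):
            in_cds = line[5:].startswith("CDS")
        m = TAXON_RE.search(line)
        if m and taxid is None:
            taxid = int(m.group(1))
        m = GI_RE.search(line)
        if m and in_cds and taxid is not None:
            mapping[int(m.group(1))] = taxid
    return mapping


if __name__ == "__main__":
    # A made-up, heavily trimmed record.
    record = (
        "LOCUS       TOY1\n"
        "     source          1..30\n"
        '                     /db_xref="taxon:562"\n'
        "     CDS             1..30\n"
        '                     /db_xref="GI:1001"\n'
        "//\n"
    )
    print(gi_to_taxid(io.StringIO(record)))  # {1001: 562}
```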


Thanks Pierre, I completely overlooked that there were GenBank files. Brilliant.


