Question: NCBI Tax ID for Draft Bacterial Genomes
5.3 years ago
United Kingdom
c.v.oflynn90 wrote:

Hi everyone,

I want to take advantage of the numerous draft bacterial genomes on NCBI's FTP site. I have thought for a while that it is a shame not to use them in my pipeline: there are only ~3000 genomes in the Bacteria folder but an additional ~9000 in Bacteria_Draft, most of them of pretty good quality.

However, the protein GIs from the drafts do not seem to be represented in the NCBI taxonomy mapping (gi_taxid_prot.dmp), which means that I cannot place them in my tree. Am I correct in assuming that NCBI does not place draft genomes in its taxonomy tree? Is there an alternative file with draft genomes included, or does anybody know a method of making one?

Another conversion file I use is idmapping.dat from UniProt, which has the protein GIs including the drafts, so that part of my pipeline should be OK. What I do not have is a file to convert from GI, accession or whatever to additional third-party databases such as eggNOG, KEGG ...
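A minimal Python sketch of the lookup described above, assuming gi_taxid_prot.dmp is the two-column, whitespace-separated `GI taxid` dump. It streams the file and keeps only the GIs you ask for, so the whole dump never has to sit in memory, and it reports which GIs have no row at all (which is how the missing draft GIs would show up). The function name `lookup_taxids` is hypothetical.

```python
import io
from typing import Dict, Iterable, List, TextIO, Tuple


def lookup_taxids(handle: TextIO, wanted_gis: Iterable[int]) -> Tuple[Dict[int, int], List[int]]:
    """Stream a 'GI<TAB>taxid' dump and return (found, missing) for the wanted GIs."""
    wanted = set(wanted_gis)
    found: Dict[int, int] = {}
    for line in handle:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        gi = int(fields[0])
        if gi in wanted:
            found[gi] = int(fields[1])
    # GIs with no row in the dump, e.g. draft proteins absent from the mapping.
    missing = sorted(wanted.difference(found))
    return found, missing


if __name__ == "__main__":
    # Toy data only; in practice open gi_taxid_prot.dmp (or a gzip handle to it).
    demo = io.StringIO("1\t562\n2\t9606\n")
    found, missing = lookup_taxids(demo, [1, 3])
    print(found, missing)
```

Collecting the misses explicitly makes it easy to see how many of the draft GIs would need another route to a tax ID.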

So I guess my questions are:

Can I get from draft GIs to NCBI tax IDs?

And does anybody have regularly updated conversion files for GI > GO, SEED, KEGG, eggNOG, etc.?
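Since idmapping.dat is keyed by UniProtKB accession (three tab-separated columns: accession, ID type, ID), one possible sketch is to join through the accession: collect the GI rows and the rows of the requested ID types for each accession, then pair them up. The ID-type labels (`GI`, `KEGG`, `eggNOG`) are assumptions; check the exact labels in your copy of the file. The function `gi_to_targets` is hypothetical, and it filters by type while reading because the full file is very large.

```python
import io
from collections import defaultdict
from typing import Dict, Iterable, Set, TextIO


def gi_to_targets(handle: TextIO, target_types: Iterable[str]) -> Dict[str, Dict[str, Set[str]]]:
    """Map GI -> {ID type -> IDs} by joining idmapping.dat rows on the UniProt accession."""
    targets = set(target_types)
    gis_by_acc: Dict[str, Set[str]] = defaultdict(set)
    ids_by_acc: Dict[str, Dict[str, Set[str]]] = defaultdict(lambda: defaultdict(set))
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 3:
            continue  # skip malformed lines
        acc, id_type, value = fields
        if id_type == "GI":  # assumed label for GI rows
            gis_by_acc[acc].add(value)
        elif id_type in targets:
            ids_by_acc[acc][id_type].add(value)

    result: Dict[str, Dict[str, Set[str]]] = {}
    for acc, gis in gis_by_acc.items():
        other_ids = ids_by_acc.get(acc)
        if not other_ids:
            continue  # accession has a GI but none of the requested ID types
        for gi in gis:
            per_gi = result.setdefault(gi, {})
            for id_type, values in other_ids.items():
                per_gi.setdefault(id_type, set()).update(values)
    return result


if __name__ == "__main__":
    # Toy rows in the accession<TAB>type<TAB>id layout; not real data.
    demo = io.StringIO(
        "P00001\tGI\t111\n"
        "P00001\tKEGG\teco:b0001\n"
        "P00002\tGI\t222\n"
    )
    print(gi_to_targets(demo, ["KEGG", "eggNOG"]))
```

GIs whose accession carries none of the requested types are simply left out, so the keys of the result double as a coverage check.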

Thanks in advance, Ciaran

5.3 years ago
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum wrote:


You can find the line


The GI of that sequence is 484233460; it also works with NCBI EFetch:

$ curl """>
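A sketch of the same idea in Python, assuming the line in question is the `/db_xref="taxon:..."` qualifier, which is where GenBank flat files record the NCBI taxonomy ID (usually on the `source` feature). The EFetch base URL and the `db`, `id`, `rettype=gb`, `retmode=text` parameters are standard E-utilities parameters; whether GI 484233460 lives in `nuccore` or `protein` is an assumption here, so `db` is left as a parameter. The helper names are hypothetical.

```python
import re
import urllib.request
from typing import Optional
from urllib.parse import urlencode

# Standard NCBI E-utilities EFetch endpoint.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

# GenBank flat files record the taxonomy ID as /db_xref="taxon:NNNN".
TAXON_RE = re.compile(r'/db_xref="taxon:(\d+)"')


def efetch_url(uid: str, db: str = "nuccore") -> str:
    """Build an EFetch URL returning the record as a GenBank flat file.

    The default db="nuccore" is an assumption; use db="protein" for protein GIs.
    """
    params = {"db": db, "id": uid, "rettype": "gb", "retmode": "text"}
    return f"{EFETCH_URL}?{urlencode(params)}"


def taxid_from_genbank(text: str) -> Optional[int]:
    """Return the first taxon ID found in a GenBank record, or None."""
    match = TAXON_RE.search(text)
    return int(match.group(1)) if match else None


def fetch_taxid(uid: str, db: str = "nuccore") -> Optional[int]:
    """Download one record via EFetch (needs network) and extract its taxon ID."""
    with urllib.request.urlopen(efetch_url(uid, db)) as response:
        return taxid_from_genbank(response.read().decode("utf-8"))
```

For example, `fetch_taxid("484233460")` would download the record and return its taxon ID; this needs network access, and for many records the same `taxid_from_genbank` parser can be run over local GenBank files instead.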


Furthermore, notice that the tax ID is in the filename.

Thanks Pierre, I completely overlooked that there were GenBank files. Brilliant.

