Hi everyone,
I want to take advantage of the numerous draft bacterial genomes on NCBI's FTP site. I have thought for a while that it is a shame not to use them in my pipeline, when there are only ~3000 genomes in the Bacteria folder but an additional ~9000 in Bacteria_Draft, most of pretty good quality. However, the protein GIs from the drafts do not seem to be represented in the NCBI taxonomy file (gi_taxid_prot.dmp), which means that I cannot place them in my tree. Am I correct in assuming that NCBI does not place draft genomes in their tree? Is there an alternative file with draft genomes included, or does anybody know a method of making one? Another conversion file I use is idmapping.dat from UniProt, which has the protein GIs including drafts, so that part of my pipeline should be OK. What I do not have is a file to convert from GI, accession or whatever to additional third-party databases such as eggNOG, KEGG ...
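To make the missing-GI problem concrete, a check like the one below could be used. It is only a minimal sketch, assuming gi_taxid_prot.dmp is plain text with one GI and one tax ID per line separated by whitespace. The function names are made up for illustration and are not part of any NCBI tool.

```python
def load_gi_taxid(lines, wanted):
    """Return {gi: taxid} for the GIs in `wanted`.

    `lines` is any iterable of 'gi<whitespace>taxid' strings, for example
    an open handle on gi_taxid_prot.dmp (assumed two-column format).
    """
    wanted = set(wanted)
    found = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        gi, taxid = parts
        if gi in wanted:
            found[gi] = taxid
    return found


def missing_gis(lines, wanted):
    """Return the GIs in `wanted` that have no entry in the dump, sorted numerically."""
    found = load_gi_taxid(lines, wanted)
    return sorted(set(wanted) - set(found), key=int)
```

With a real dump this would be called as `missing_gis(open("gi_taxid_prot.dmp"), my_draft_gis)`, where `my_draft_gis` is a hypothetical list of GI strings collected from the draft protein files. The dump is streamed line by line, so only the wanted GIs are held in memory.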
So I guess my questions are:
Can I get a mapping from draft GIs to NCBI tax IDs?
And does anybody have regularly updated conversion files for GI to GO, SEED, KEGG, eggNOG, etc.?
Thanks in advance, Ciaran
Thanks Pierre, I completely overlooked that there were GenBank files, brilliant.
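For anyone landing here later, this is roughly how the GenBank route could look. It is a sketch under assumptions, not a tested pipeline: it assumes each record's source feature carries a `/db_xref="taxon:..."` qualifier and that the CDS features carry `/db_xref="GI:..."` qualifiers, which may not hold for every file. The function name is hypothetical.

```python
import re

# Qualifier patterns as they appear in GenBank flat-file feature tables.
TAXON_RE = re.compile(r'/db_xref="taxon:(\d+)"')
GI_RE = re.compile(r'/db_xref="GI:(\d+)"')


def gi_to_taxid(lines):
    """Map each protein GI to the tax ID of the record it appears in.

    `lines` is any iterable of GenBank flat-file lines. A line starting
    with '//' ends a record, so the current tax ID is reset there.
    """
    mapping = {}
    taxid = None
    for line in lines:
        if line.startswith("//"):
            taxid = None  # end of record
            continue
        match = TAXON_RE.search(line)
        if match:
            taxid = match.group(1)
            continue
        match = GI_RE.search(line)
        if match and taxid is not None:
            mapping[match.group(1)] = taxid
    return mapping
```

Writing the result out as `gi<TAB>taxid` lines would give a file in the same shape as gi_taxid_prot.dmp, so the drafts could be fed into the same downstream step.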