NCBI Tax ID For Draft Bacteria Genomes
1
1
7.5 years ago
c.v.oflynn ▴ 100

Hi everyone,

I want to take advantage of the numerous draft bacterial genomes on NCBI's FTP site. I have thought for a while that it is a shame not to use them in my pipeline, when there are only ~3000 genomes in the Bacteria folder but an additional ~9000 in Bacteria_Draft, most of pretty good quality. However, the protein GIs from the drafts do not seem to be represented in the NCBI taxonomy (gi_taxid_prot.dmp), which means that I cannot place them in my tree. Am I correct in assuming that NCBI does not place draft genomes in their tree? Is there an alternative file with draft genomes included, or does anybody know a method of making one? Another conversion file I use is idmapping.dat from UniProt, which has the protein GIs including drafts, so that part of my pipeline should be OK. What I do not have is a file to convert from GI, accession, or whatever to additional third-party databases such as eggNOG, KEGG ...

So I guess my questions are:

Can I get a mapping from draft GIs > NCBI tax ID?

And does anybody have regularly updated conversion files for GI > GO, SEED, KEGG, eggNOG, etc.?

Thanks in advance, Ciaran

ncbi taxonomy conversion bacteria • 4.2k views
3
7.5 years ago

In the GenBank files for the draft genomes you can find the line

/db_xref="taxon:1120917"


The GI of that sequence is 484233460. It also works with the NCBI E-utilities (ELink):

$ curl "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=nucleotide&db=taxonomy&id=484233460&cmd=neighbor_score"

<DbFrom>nuccore</DbFrom>
<IdList>
<Id>484233460</Id>
</IdList>
<DbTo>taxonomy</DbTo>
<Id>1120917</Id>
<Score>0</Score>
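If you want to do the same lookup from a script rather than with curl, here is a minimal Python sketch using only the standard library. The function names are my own, the `https` endpoint is an assumption (the command above uses `http`), and the parser assumes the tax ID sits under `LinkSetDb/Link/Id` in the full response, which matches the excerpt above but is not guaranteed for every query.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Assumed endpoint: same host and path as the curl call, but over https.
EUTILS_ELINK = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi"


def build_elink_url(gi, dbfrom="nucleotide"):
    """Build the ELink URL that links one GI to the taxonomy database."""
    params = {
        "dbfrom": dbfrom,
        "db": "taxonomy",
        "id": str(gi),
        "cmd": "neighbor_score",
    }
    return EUTILS_ELINK + "?" + urllib.parse.urlencode(params)


def parse_taxids(elink_xml):
    """Extract taxonomy IDs from an ELink XML response (str or bytes).

    Only <Id> elements inside <LinkSetDb>/<Link> are returned, so the
    query GI listed under <IdList> is not mistaken for a tax ID.
    """
    root = ET.fromstring(elink_xml)
    return [int(node.text) for node in root.findall(".//LinkSetDb/Link/Id")]


def gi_to_taxids(gi, dbfrom="nucleotide"):
    """Query ELink over the network and return the linked tax IDs."""
    with urllib.request.urlopen(build_elink_url(gi, dbfrom), timeout=30) as resp:
        return parse_taxids(resp.read())
```

Calling `gi_to_taxids(484233460)` should give the same tax ID as the curl output above, provided the service still resolves that GI. For protein GIs you would presumably pass `dbfrom="protein"`. For thousands of GIs, parsing the GenBank files locally is kinder to NCBI's servers than one request per ID.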


Furthermore, notice that the tax ID is in the filename.
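To build your own GI > tax ID table from the draft GenBank files, a rough Python sketch could look like the one below. It assumes each record's `source` feature (with the `/db_xref="taxon:..."` line) comes before the CDS features, and that the CDS features carry `/db_xref="GI:..."` qualifiers. Both are assumptions about the files, and the function name is made up for illustration.

```python
import re

# Qualifier patterns as they appear in GenBank flat-file feature tables.
TAXON_RE = re.compile(r'/db_xref="taxon:(\d+)"')
GI_RE = re.compile(r'/db_xref="GI:(\d+)"')


def gi_to_taxid_from_genbank(lines):
    """Map every GI found in feature qualifiers to its record's tax ID.

    `lines` is any iterable of text lines, for example an open file.
    GIs seen before a taxon line within a record are skipped, because
    the sketch assumes the source feature comes first.
    """
    mapping = {}
    taxid = None
    for line in lines:
        if line.startswith("//"):
            # End of a record: forget the tax ID before the next one.
            taxid = None
            continue
        taxon_match = TAXON_RE.search(line)
        if taxon_match:
            taxid = int(taxon_match.group(1))
            continue
        gi_match = GI_RE.search(line)
        if gi_match and taxid is not None:
            mapping[int(gi_match.group(1))] = taxid
    return mapping
```

You would use it by opening one of the GenBank files and passing the file handle in, then writing the resulting dictionary out in the same two-column layout as gi_taxid_prot.dmp so the rest of the pipeline does not need to change.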

0

Thanks Pierre, completely overlooked that there were GenBank files, brilliant.