Question

NCBI Tax ID for Draft Bacterial Genomes

11.1 years ago

c.v.oflynn ▴ 100

Hi everyone,

I want to take advantage of the numerous draft bacterial genomes on NCBI's FTP site. I have thought for a while that it is a shame not to use them in my pipeline: there are only ~3000 genomes in the Bacteria folder but an additional ~9000 in Bacteria_DRAFT, most of pretty good quality. However, the protein GIs from the drafts do not seem to be represented in the NCBI taxonomy mapping (gi_taxid_prot.dmp), which means that I cannot place them in my tree.

Am I correct in assuming that NCBI does not place draft genomes in its tree? Is there an alternative file with draft genomes included, or does anybody know a method of making one?

Another conversion file I use is idmapping.dat from UniProt, which has the protein GIs including those from drafts, so that part of my pipeline should be OK. What I do not have is a file to convert from GI, accession, or whatever to additional third-party databases such as eggNOG, KEGG ...

So I guess my questions are:

Can I get a mapping from draft GIs to NCBI tax IDs?

And does anybody have regularly updated conversion files for GI > GO, SEED, KEGG, eggNOG, etc.?

Thanks in advance, Ciaran

ncbi taxonomy conversion bacteria • 5.2k views

updated 11.1 years ago by Pierre Lindenbaum 164k • written 11.1 years ago by c.v.oflynn ▴ 100

score 3 · Answer 1 · 2013-10-15

In ftp://ftp.ncbi.nih.gov/genomes/Bacteria_DRAFT/Acaricomes_phytoseiuli_DSM_14247_uid199097/NZ_AQXM00000000.gbk

you can find the line

/db_xref="taxon:1120917"
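If you want to pull that qualifier out of many draft .gbk files, a regular expression over the flat-file text is probably enough. The snippet below is only a sketch: the function name `taxon_ids_from_genbank` is made up for illustration, and the sample record is a hand-written excerpt (only the taxon line is taken from the file above), not the real file contents.

```python
import re

# Matches the source-feature qualifier, e.g. /db_xref="taxon:1120917"
TAXON_RE = re.compile(r'/db_xref="taxon:(\d+)"')


def taxon_ids_from_genbank(text):
    """Return the distinct taxon IDs found in GenBank flat-file text,
    in order of first appearance."""
    seen = []
    for match in TAXON_RE.finditer(text):
        tax_id = int(match.group(1))
        if tax_id not in seen:
            seen.append(tax_id)
    return seen


# Hypothetical excerpt: the coordinates and layout are invented for the
# example; only the taxon qualifier comes from the answer above.
sample = '''
FEATURES             Location/Qualifiers
     source          1..100
                     /organism="Acaricomes phytoseiuli DSM 14247"
                     /db_xref="taxon:1120917"
'''

print(taxon_ids_from_genbank(sample))  # prints [1120917]
```

For real files you would read each .gbk with `open(path).read()` and pass the text to the function; a proper GenBank parser is the safer choice if you need more than this one qualifier.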

The GI of that sequence is 484233460. The same tax id can also be retrieved with NCBI ELink (E-utilities):

$ curl "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=nucleotide&db=taxonomy&id=484233460&cmd=neighbor_score"

<eLinkResult>

    <LinkSet>
        <DbFrom>nuccore</DbFrom>
        <IdList>
            <Id>484233460</Id>
        </IdList>
        <LinkSetDb>
            <DbTo>taxonomy</DbTo>
            <LinkName>nuccore_taxonomy</LinkName>
            <Link>
                <Id>1120917</Id>
                <Score>0</Score>
            </Link>
        </LinkSetDb>
    </LinkSet>
</eLinkResult>
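To turn a response like the one above into a GI to tax id table, the XML can be read with Python's standard library. This is a minimal sketch, assuming the response has the same shape as shown here; the function name `taxids_from_elink` is hypothetical, and the HTTP request itself is left out so the example runs offline on the XML pasted above.

```python
import xml.etree.ElementTree as ET


def taxids_from_elink(xml_text):
    """Map each query GI in an eLinkResult document to the list of
    taxonomy IDs linked to it."""
    root = ET.fromstring(xml_text)
    mapping = {}
    for linkset in root.iter("LinkSet"):
        gis = [int(e.text) for e in linkset.findall("IdList/Id")]
        tax_ids = []
        for link_db in linkset.findall("LinkSetDb"):
            # Keep only the links that point into the taxonomy database.
            if link_db.findtext("DbTo") == "taxonomy":
                tax_ids.extend(int(e.text) for e in link_db.findall("Link/Id"))
        for gi in gis:
            mapping[gi] = list(tax_ids)
    return mapping


# The response shown above, pasted in as a string.
ELINK_XML = """
<eLinkResult>
    <LinkSet>
        <DbFrom>nuccore</DbFrom>
        <IdList>
            <Id>484233460</Id>
        </IdList>
        <LinkSetDb>
            <DbTo>taxonomy</DbTo>
            <LinkName>nuccore_taxonomy</LinkName>
            <Link>
                <Id>1120917</Id>
                <Score>0</Score>
            </Link>
        </LinkSetDb>
    </LinkSet>
</eLinkResult>
"""

print(taxids_from_elink(ELINK_XML))  # prints {484233460: [1120917]}
```

One caveat if you batch requests: when several GIs are sent in a single comma-separated `id=` value, ELink groups them into one LinkSet, so this function would assign the same tax id list to all of them. Passing each GI as its own `id=` parameter should keep the mapping one-to-one.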

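Furthermore, notice that the directory name also carries an identifier (uid199097). That number is the BioProject uid, not the tax id (1120917), so take the tax id from the db_xref line or from ELink.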