Question: NCBI Tax ID For Draft Bacterial Genomes
5.2 years ago by
c.v.oflynn90
United Kingdom
c.v.oflynn90 wrote:

Hi everyone,

I want to take advantage of the numerous draft bacterial genomes on NCBI's FTP site. I have thought for a while that it is a shame not to use them in my pipeline: there are only ~3000 genomes in the Bacteria folder, but an additional ~9000 in Bacteria_DRAFT, most of pretty good quality. However, the protein GIs from the drafts do not seem to be represented in the NCBI taxonomy file (gi_taxid_prot.dmp), which means that I cannot place them in my tree. Am I correct in assuming that NCBI does not place draft genomes in its tree? Is there an alternative file with draft genomes included, or does anybody know a method of making one?

Another conversion file I use is idmapping.dat from UniProt, which has the protein GIs including those from the drafts, so that part of my pipeline should be OK. What I do not have is a file to convert from GI, accession or whatever to additional third-party databases such as eggNOG, KEGG ...

So I guess my questions are:

Can I get a mapping from draft GIs to NCBI tax IDs?

And does anybody have regularly updated conversion files for GI > GO, SEED, KEGG, eggNOG etc.?

Thanks in advance, Ciaran

ncbi taxonomy bacteria conversion • 3.3k views
5.2 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum115k wrote:

In the GenBank file ftp://ftp.ncbi.nih.gov/genomes/Bacteria_DRAFT/Acaricomes_phytoseiuli_DSM_14247_uid199097/NZ_AQXM00000000.gbk

you can find the line

/db_xref="taxon:1120917"
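
If you want to turn that into a GI-to-taxid table, here is a minimal Python sketch. It assumes that the CDS features in the draft GenBank files carry /db_xref="GI:..." qualifiers next to the taxon qualifier, which I have not checked for every file. The excerpt in the code is illustrative only: the taxon value is the one quoted above, the protein GIs 111 and 222 are made-up placeholders, and the function name gi_to_taxid is hypothetical.

```python
import re

# Illustrative GenBank excerpt. The taxon value is the one quoted above;
# the protein GIs (111, 222) are made-up placeholders, not real identifiers.
SAMPLE_GBK = '''\
FEATURES             Location/Qualifiers
     source          1..1000
                     /organism="Acaricomes phytoseiuli DSM 14247"
                     /db_xref="taxon:1120917"
     CDS             10..300
                     /db_xref="GI:111"
     CDS             400..900
                     /db_xref="GI:222"
//
'''

TAXON_RE = re.compile(r'/db_xref="taxon:(\d+)"')
GI_RE = re.compile(r'/db_xref="GI:(\d+)"')


def gi_to_taxid(gbk_text):
    """Map every GI found in a GenBank record to that record's taxon ID."""
    mapping = {}
    # Records in a multi-record GenBank file end with a line containing "//".
    for record in re.split(r'^//\s*$', gbk_text, flags=re.MULTILINE):
        taxon = TAXON_RE.search(record)
        if taxon is None:
            continue  # no source/taxon qualifier in this chunk
        for gi in GI_RE.findall(record):
            mapping[int(gi)] = int(taxon.group(1))
    return mapping


if __name__ == "__main__":
    for gi, taxid in sorted(gi_to_taxid(SAMPLE_GBK).items()):
        # Two tab-separated columns (GI, taxid), like gi_taxid_prot.dmp.
        print(f"{gi}\t{taxid}")
```

Running this over every .gbk file under Bacteria_DRAFT and concatenating the output should give a file you can append to gi_taxid_prot.dmp, provided the qualifier assumption holds.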

The GI of that sequence is 484233460. The lookup also works with NCBI ELink:

$ curl "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=nucleotide&db=taxonomy&id=484233460&cmd=neighbor_score"

<eLinkResult>

    <LinkSet>
        <DbFrom>nuccore</DbFrom>
        <IdList>
            <Id>484233460</Id>
        </IdList>
        <LinkSetDb>
            <DbTo>taxonomy</DbTo>
            <LinkName>nuccore_taxonomy</LinkName>
            <Link>
                <Id>1120917</Id>
                <Score>0</Score>
            </Link>
        </LinkSetDb>
    </LinkSet>
</eLinkResult>

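To do the same lookup from a script, something like the following Python sketch could work. It builds the same ELink URL as the curl call and parses a response shaped like the one shown. The function names build_elink_url and parse_elink_taxids are hypothetical, and the sketch parses a saved copy of the response rather than calling the server.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

ELINK_URL = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi"

# Copy of the ELink response shown above, so the sketch runs offline.
SAMPLE_RESPONSE = """<eLinkResult>
    <LinkSet>
        <DbFrom>nuccore</DbFrom>
        <IdList>
            <Id>484233460</Id>
        </IdList>
        <LinkSetDb>
            <DbTo>taxonomy</DbTo>
            <LinkName>nuccore_taxonomy</LinkName>
            <Link>
                <Id>1120917</Id>
                <Score>0</Score>
            </Link>
        </LinkSetDb>
    </LinkSet>
</eLinkResult>"""


def build_elink_url(gi):
    """Build the nucleotide -> taxonomy ELink URL used in the curl call."""
    params = {
        "dbfrom": "nucleotide",
        "db": "taxonomy",
        "id": str(gi),
        "cmd": "neighbor_score",
    }
    return ELINK_URL + "?" + urlencode(params)


def parse_elink_taxids(xml_text):
    """Return {source GI: [taxonomy IDs]} from an eLinkResult document."""
    root = ET.fromstring(xml_text)
    result = {}
    for linkset in root.iter("LinkSet"):
        source_ids = [e.text for e in linkset.findall("./IdList/Id")]
        tax_ids = [
            link_id.text
            for link_db in linkset.findall("LinkSetDb")
            if link_db.findtext("DbTo") == "taxonomy"
            for link_id in link_db.findall("./Link/Id")
        ]
        for source_id in source_ids:
            result[source_id] = tax_ids
    return result


if __name__ == "__main__":
    print(build_elink_url(484233460))
    print(parse_elink_taxids(SAMPLE_RESPONSE))
```

For a live query you would fetch build_elink_url(gi) with urllib.request.urlopen and pass the decoded body to parse_elink_taxids. For thousands of GIs the GenBank files are probably the kinder option for NCBI's servers.
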
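Note that the number in the directory name (uid199097) is not the tax ID (1120917); it appears to be the BioProject UID, so take the tax ID from the db_xref line or from ELink.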


Thanks Pierre, completely overlooked that there were GenBank files, brilliant.

written 5.1 years ago by c.v.oflynn90