NCBI Tax ID for Draft Bacterial Genomes
11.1 years ago
c.v.oflynn ▴ 100

Hi everyone,

I want to take advantage of the numerous draft bacterial genomes on NCBI's FTP site. I have thought for a while that it is a shame not to use them in my pipeline: there are only ~3000 genomes in the Bacteria folder but an additional ~9000 in Bacteria_DRAFT, most of pretty good quality. However, the protein GIs from the drafts do not seem to be represented in the NCBI taxonomy (gi_taxid_prot.dmp), which means that I cannot place them in my tree. Am I correct in assuming that NCBI does not place draft genomes in its tree? Is there an alternative file with draft genomes included, or does anybody know a method of making one? Another conversion file I use is idmapping.dat from UniProt, which has the protein GIs including drafts, so that part of my pipeline should be OK. What I do not have is a file to convert from GI, accession, or whatever to additional third-party databases such as eggNOG, KEGG ...

So I guess my questions are:

Can I get a mapping from draft GIs to NCBI tax IDs?

And does anybody have regularly updated conversion files for GI > GO, SEED, KEGG, eggNOG, etc.?

Thanks in advance, Ciaran

ncbi taxonomy conversion bacteria • 5.2k views
11.1 years ago

from ftp://ftp.ncbi.nih.gov/genomes/Bacteria_DRAFT/Acaricomes_phytoseiuli_DSM_14247_uid199097/NZ_AQXM00000000.gbk

you can find the line

/db_xref="taxon:1120917"
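As a minimal sketch of pulling that qualifier out programmatically, the snippet below scans GenBank flat-file text for `/db_xref="taxon:..."` with a regular expression. The function name `taxon_ids` and the sample excerpt are illustrative assumptions, not part of the NCBI file or any library.

```python
import re

# Matches source-feature qualifiers such as /db_xref="taxon:1120917".
TAXON_RE = re.compile(r'/db_xref="taxon:(\d+)"')


def taxon_ids(genbank_text):
    """Return the distinct taxon IDs found in GenBank text, in first-seen order."""
    seen = []
    for match in TAXON_RE.finditer(genbank_text):
        tax_id = int(match.group(1))
        if tax_id not in seen:
            seen.append(tax_id)
    return seen


# Hypothetical excerpt of a source feature, for illustration only.
sample = '''     source          1..1000
                     /organism="Acaricomes phytoseiuli DSM 14247"
                     /db_xref="taxon:1120917"
'''
print(taxon_ids(sample))
```

For a downloaded file, something like `taxon_ids(open("NZ_AQXM00000000.gbk").read())` should work the same way.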

The GI of that sequence is 484233460. The lookup also works with NCBI ELink (E-utilities):

$ curl "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=nucleotide&db=taxonomy&id=484233460&cmd=neighbor_score"

<eLinkResult>

    <LinkSet>
        <DbFrom>nuccore</DbFrom>
        <IdList>
            <Id>484233460</Id>
        </IdList>
        <LinkSetDb>
            <DbTo>taxonomy</DbTo>
            <LinkName>nuccore_taxonomy</LinkName>
            <Link>
                <Id>1120917</Id>
                <Score>0</Score>
            </Link>
        </LinkSetDb>
    </LinkSet>
</eLinkResult>
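A response like the one above can be parsed with the Python standard library. This is only a sketch: `elink_url` and `linked_tax_ids` are hypothetical helper names, the URL simply mirrors the curl call, and the XML literal is a copy of the response shown here rather than a live query.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Same endpoint as in the curl command above.
ELINK = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi"


def elink_url(gi):
    """Build the nucleotide -> taxonomy ELink URL for one GI."""
    params = {
        "dbfrom": "nucleotide",
        "db": "taxonomy",
        "id": str(gi),
        "cmd": "neighbor_score",
    }
    return ELINK + "?" + urlencode(params)


def linked_tax_ids(xml_text):
    """Return the taxonomy IDs listed in an eLinkResult document."""
    root = ET.fromstring(xml_text)
    path = "./LinkSet/LinkSetDb[DbTo='taxonomy']/Link/Id"
    return [int(node.text) for node in root.findall(path)]


# Copy of the response shown above, used instead of a network call.
response = """<eLinkResult>
    <LinkSet>
        <DbFrom>nuccore</DbFrom>
        <IdList><Id>484233460</Id></IdList>
        <LinkSetDb>
            <DbTo>taxonomy</DbTo>
            <LinkName>nuccore_taxonomy</LinkName>
            <Link><Id>1120917</Id><Score>0</Score></Link>
        </LinkSetDb>
    </LinkSet>
</eLinkResult>"""

print(elink_url(484233460))
print(linked_tax_ids(response))
```

To query NCBI directly, the URL from `elink_url` could be fetched with `urllib.request.urlopen` and the decoded body passed to `linked_tax_ids`.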

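Note that the number in the directory name (uid199097) is the project uid, not the tax-id, so take the tax-id from the db_xref line or from the query above.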
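To get something shaped like gi_taxid_prot.dmp (two tab-separated columns, GI then tax ID) for the drafts, one option is to pair every GI qualifier in a GenBank file with the taxon from its source feature. The sketch below assumes that the text covers a single organism and that CDS features carry `/db_xref="GI:..."` qualifiers; `gi_taxid_rows` and the GI values in the sample are made up for illustration.

```python
import re

TAXON_RE = re.compile(r'/db_xref="taxon:(\d+)"')
GI_RE = re.compile(r'/db_xref="GI:(\d+)"')


def gi_taxid_rows(genbank_text):
    """Return 'gi<TAB>taxid' lines for every GI qualifier in the text.

    Assumes all records in the text belong to one organism, so the first
    taxon qualifier applies to every GI.
    """
    taxon = TAXON_RE.search(genbank_text)
    if taxon is None:
        return []
    tax_id = taxon.group(1)
    return ["{}\t{}".format(gi, tax_id) for gi in GI_RE.findall(genbank_text)]


# Hypothetical record fragment; the GI numbers 111 and 222 are placeholders.
sample = '''     source          1..5000
                     /db_xref="taxon:1120917"
     CDS             10..400
                     /db_xref="GI:111"
     CDS             500..900
                     /db_xref="GI:222"
'''
for row in gi_taxid_rows(sample):
    print(row)
```

Concatenating the rows from all draft directories would give a supplementary mapping file to load alongside gi_taxid_prot.dmp.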

Thanks Pierre, completely overlooked that there were GenBank files, brilliant.
