Question: (Closed) Retrieve Full Lineage From Taxon Id
6.3 years ago, Sophie30 wrote:

Hi,

I have a list of NCBI taxon ids for which I would like to retrieve both the full lineage and the common name. For example, for 9606 the desired output would look as follows:

Lineage: root; cellular organisms; Eukaryota; Fungi/Metazoa group; Metazoa; Eumetazoa; Bilateria; Coelomata; Deuterostomia; Chordata; Craniata; Vertebrata; Gnathostomata; Teleostomi; Euteleostomi; Sarcopterygii; Tetrapoda; Amniota; Mammalia; Theria; Eutheria; Euarchontoglires; Primates; Haplorrhini; Simiiformes; Catarrhini; Hominoidea; Hominidae; Homininae; Homo

Common name: human

I checked the NCBI taxonomy FTP to see whether there is a mapping database but I could not find any.

Any help with this one?

PS. I recall that I came across a solution to this problem at BioStar (using taxdump), but I could not find that thread.

Thank you
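For reference, the taxdump route mentioned in the PS can be sketched roughly as follows. This is only a minimal sketch: it assumes the `nodes.dmp` and `names.dmp` files from NCBI's taxdump archive, whose rows use tab-pipe-tab as the field separator, and the helper names (`split_dmp`, `load_parents`, `load_names`, `lineage`) are made up for illustration. The rows in the example are hand-written and heavily abbreviated, not real taxdump content.

```python
def split_dmp(line):
    """Split one taxdump row: fields are separated by tab-pipe-tab, rows end with tab-pipe."""
    line = line.rstrip("\n")
    if line.endswith("\t|"):
        line = line[:-2]
    return line.split("\t|\t")


def load_parents(node_lines):
    """Map each tax id to its parent tax id (first two columns of nodes.dmp)."""
    parents = {}
    for line in node_lines:
        fields = split_dmp(line)
        parents[fields[0]] = fields[1]
    return parents


def load_names(name_lines, name_class):
    """Map tax id to name, keeping only rows whose name class matches."""
    names = {}
    for line in name_lines:
        tax_id, name, _unique, cls = split_dmp(line)[:4]
        if cls == name_class:
            names.setdefault(tax_id, name)
    return names


def lineage(tax_id, parents, sci_names):
    """Walk parent links up to the root; the taxon itself is left out of the result."""
    chain = []
    current = parents[tax_id]
    while True:
        chain.append(sci_names.get(current, current))
        if parents[current] == current:  # the root is its own parent
            break
        current = parents[current]
    return "; ".join(reversed(chain))


# Hand-written, abbreviated rows for illustration only (the real tree has many more levels).
NODES = ["1\t|\t1\t|\tno rank\t|\n",
         "9605\t|\t1\t|\tgenus\t|\n",
         "9606\t|\t9605\t|\tspecies\t|\n"]
NAMES = ["1\t|\troot\t|\t\t|\tscientific name\t|\n",
         "9605\t|\tHomo\t|\t\t|\tscientific name\t|\n",
         "9606\t|\tHomo sapiens\t|\t\t|\tscientific name\t|\n",
         "9606\t|\thuman\t|\t\t|\tgenbank common name\t|\n"]

parents = load_parents(NODES)
sci_names = load_names(NAMES, "scientific name")
common_names = load_names(NAMES, "genbank common name")
print("Lineage:", lineage("9606", parents, sci_names))
print("Common name:", common_names.get("9606", ""))
```

With the real files, the same functions could be fed `open("nodes.dmp")` and `open("names.dmp")` instead of the in-memory lists.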

taxonomy identifiers • 4.2k views
modified 6.0 years ago • written 6.3 years ago by Sophie30

The question was http://biostar.stackexchange.com/questions/7348. Please close your question if it answers your needs.

written 6.3 years ago by Pierre Lindenbaum96k

Thanks for the reply, Pierre. But #7348 is not the thread that I was referring to.

written 6.3 years ago by Sophie30

Since you answered your own question, I'll close as "no longer relevant."

written 6.3 years ago by Neilfws47k

6.3 years ago, Pierre Lindenbaum96k (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote:

You can use the following stylesheet and NCBI E-Fetch for taxonomy:


<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform'
        version='1.0'
        >
<xsl:output method="text"/>

<xsl:template match="/">
<xsl:for-each select="/TaxaSet/Taxon/Lineage"><xsl:value-of select="concat('Lineage:root; ',text())"/><xsl:text>
</xsl:text></xsl:for-each>
</xsl:template>

</xsl:stylesheet>

Example:

xsltproc --novalid stylesheet.xsl "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=taxonomy&id=9606&retmode=xml"

Result:

Lineage:root; cellular organisms; Eukaryota; Fungi/Metazoa group; Metazoa; Eumetazoa; Bilateria; Coelomata; Deuterostomia; Chordata; Craniata; Vertebrata; Gnathostomata; Teleostomi; Euteleostomi; Sarcopterygii; Tetrapoda; Amniota; Mammalia; Theria; Eutheria; Euarchontoglires; Primates; Haplorrhini; Simiiformes; Catarrhini; Hominoidea; Hominidae; Homininae; Homo
written 6.3 years ago by Pierre Lindenbaum96k
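The stylesheet above prints only the lineage, while the question also asks for the common name. As a rough sketch (not part of the original answer), the same E-Fetch XML could be parsed in Python to pull out both. The function name `parse_taxa` is made up for illustration, the sample XML is a hand-written, heavily abbreviated stand-in for a real response, and the sketch assumes the common name sits in `OtherNames/GenbankCommonName`.

```python
import xml.etree.ElementTree as ET

# Hand-written, abbreviated sample; a real efetch response carries many more elements.
SAMPLE_XML = """<TaxaSet>
  <Taxon>
    <TaxId>9606</TaxId>
    <ScientificName>Homo sapiens</ScientificName>
    <OtherNames>
      <GenbankCommonName>human</GenbankCommonName>
    </OtherNames>
    <Lineage>cellular organisms; Eukaryota</Lineage>
    <LineageEx>
      <Taxon><TaxId>131567</TaxId><ScientificName>cellular organisms</ScientificName></Taxon>
    </LineageEx>
  </Taxon>
</TaxaSet>"""


def parse_taxa(xml_text):
    """Return (taxid, lineage, common_name) for each top-level Taxon in the XML."""
    root = ET.fromstring(xml_text)
    records = []
    # findall("Taxon") matches direct children only, so the Taxon
    # elements nested inside LineageEx are skipped.
    for taxon in root.findall("Taxon"):
        taxid = taxon.findtext("TaxId")
        lineage = taxon.findtext("Lineage", default="")
        common = taxon.findtext("OtherNames/GenbankCommonName", default="")
        # Prepend "root; " in the same way the stylesheet does.
        records.append((taxid, "root; " + lineage, common))
    return records


for taxid, lineage, common in parse_taxa(SAMPLE_XML):
    print("Lineage:", lineage)
    print("Common name:", common)
```

Taxa without a GenBank common name simply get an empty string here.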

Hi Pierre,

What would the script look like to run a file (test_taxids.txt) containing a list of taxids through this XSL stylesheet and obtain the full taxonomic information?

modified 9 weeks ago • written 9 weeks ago by Promi10

Run a loop: https://stackoverflow.com/questions/1521462
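If a shell loop is not convenient, something along these lines might work in Python instead. It is only a sketch under a few assumptions: the file holds one taxid per line, E-Fetch is reached over HTTPS and accepts several comma-separated ids per request, and the helper names (`read_taxids`, `efetch_urls`, `main`) are invented for this example.

```python
import time
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def read_taxids(lines):
    """Keep the non-empty, whitespace-stripped entries from an iterable of lines."""
    return [line.strip() for line in lines if line.strip()]


def efetch_urls(taxids, batch_size=100):
    """Yield one efetch URL per batch of ids; ids within a batch are comma-separated."""
    for start in range(0, len(taxids), batch_size):
        batch = taxids[start:start + batch_size]
        query = urllib.parse.urlencode(
            {"db": "taxonomy", "id": ",".join(batch), "retmode": "xml"},
            safe=",",
        )
        yield EFETCH + "?" + query


def main(path="test_taxids.txt"):
    """Fetch every batch and print taxid and lineage, tab-separated."""
    with open(path) as handle:
        taxids = read_taxids(handle)
    for url in efetch_urls(taxids):
        with urllib.request.urlopen(url) as response:
            root = ET.parse(response).getroot()
        for taxon in root.findall("Taxon"):
            lineage = taxon.findtext("Lineage", default="")
            print(taxon.findtext("TaxId"), "root; " + lineage, sep="\t")
        time.sleep(0.4)  # pause between requests to stay under NCBI's rate limit
```

Calling `main("test_taxids.txt")` would then write one line per taxon to standard output. Batching keeps the number of requests small, which matters for long id lists.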

written 9 weeks ago by Pierre Lindenbaum96k

6.3 years ago, Sophie30 wrote:

Okay, I figured out a pretty decent solution - Bioconductor.

1) Install Bioconductor

2) Install the package called 'genomes', i.e., biocLite("genomes")

3) The function "taxid2names" (e.g., taxid2names('9606')) returns the species name and the lineage given a taxon id.

4) I wrote a short script that reads through all the taxon ids, gets the species name and lineage for each one using taxid2names, and writes them out to a text file.

library(genomes)  # provides taxid2names

infile <- read.table("input.txt")  # input.txt holds the list of ids, one per line
# Loop over every row instead of hard-coding the count (506 = number of taxa + 1 in my case)
for (i in seq_len(nrow(infile))) {
  v1 <- infile[i, 1]      # the posted version read a[i,1]; 'a' was never defined and stands for infile
  v2 <- taxid2names(v1)   # species name and lineage for this id
  write.table(v2, "output.txt", append = TRUE)
}
written 6.3 years ago by Sophie30

It needed a little tweak, but I'm only a beginner:

infile <- read.table("input_taxid.txt", header = FALSE, sep = "\n")
i <- 1
while (i < 1202) {
  tmp <- infile$V1[i]
  fulltaxa <- taxid2names(tmp)
  write.table(fulltaxa, "output_taxid.txt", append = TRUE)
  i <- i + 1
}

written 6.0 years ago by Dhruv Sakalley0

Hi Sophie,

Can you tell me what 'a' is in v1 <- a[i,1] here?

written 9 weeks ago by Promi10
The thread is closed. No new answers may be added.