Easiest Way To Get Mrna Refseq Acc Related To An Entrez Gene Id Using Ncbi Eutility Programs
I would like to know the best way to get all the mRNA RefSeq accession numbers related to a given Entrez GeneID using the NCBI E-utilities.
For instance, I would like to get all the mRNA RefSeq accession numbers (NM_001014431, NM_005163, NM_001014432) related to the gene AKT1 (Entrez GeneID = 207).
I know that the URL below returns an XML file in which the RefSeq accession numbers are embedded, but I think they are very difficult to extract using XSLT.
So if someone has a better URL to suggest, it would be very helpful.
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=gene&id=207&retmode=xml
eutils
refseq
entrez
ncbi
Are you committed to using XSLT? I would think about an XPath query - most languages provide this functionality in their XML libraries.
For example, using R:
library(RCurl)
library(XML)
# Fetch and parse the Entrez Gene record for GeneID 207
ef <- xmlTreeParse(getURL("http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=gene&id=207&retmode=xml"), useInternalNodes = TRUE)
# Select every accession node with an XPath query
ns <- getNodeSet(ef, "//Gene-commentary_accession")
accn <- sapply(ns, xmlValue)
# Deduplicate first, then keep only the mRNA RefSeq accessions
sort(grep("^NM_", unique(accn), value = TRUE))
[1] "NM_001014431" "NM_001014432" "NM_005163"
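The same idea carries over to other languages with an XML library. Below is a minimal Python sketch using only the standard library. To stay runnable offline it parses a small hand-written XML fragment (`SAMPLE_XML`, an assumption that is far simpler than the real efetch response) rather than calling NCBI. The function name `refseq_mrna_accessions` is made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Hand-written, heavily trimmed stand-in for the efetch XML (illustrative only;
# the real record is much larger and more deeply nested).
SAMPLE_XML = """
<Entrezgene-Set>
  <Entrezgene>
    <Gene-commentary>
      <Gene-commentary_accession>NM_005163</Gene-commentary_accession>
    </Gene-commentary>
    <Gene-commentary>
      <Gene-commentary_accession>NP_005154</Gene-commentary_accession>
    </Gene-commentary>
    <Gene-commentary>
      <Gene-commentary_accession>NM_001014431</Gene-commentary_accession>
    </Gene-commentary>
    <Gene-commentary>
      <Gene-commentary_accession>NM_005163</Gene-commentary_accession>
    </Gene-commentary>
    <Gene-commentary>
      <Gene-commentary_accession>NC_000014</Gene-commentary_accession>
    </Gene-commentary>
  </Entrezgene>
</Entrezgene-Set>
"""


def refseq_mrna_accessions(xml_text):
    """Return the sorted, de-duplicated NM_ accessions found in xml_text."""
    root = ET.fromstring(xml_text.strip())
    # ".//tag" is the ElementTree spelling of the XPath query "//tag".
    nodes = root.findall(".//Gene-commentary_accession")
    unique = {node.text for node in nodes if node.text}
    return sorted(acc for acc in unique if acc.startswith("NM_"))


print(refseq_mrna_accessions(SAMPLE_XML))
```

To run it against the live record, the text returned by the efetch URL above could be passed in place of `SAMPLE_XML`.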
written 14.8 years ago by Neilfws, updated 5.8 years ago by Ram
The following XSLT stylesheet does it, as far as I have tested it:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0"
>
<xsl:output method="text"/>
<xsl:template match="/">
<xsl:apply-templates select="Entrezgene-Set"/>
</xsl:template>
<xsl:template match="Entrezgene-Set">
<xsl:apply-templates select="Entrezgene"/>
</xsl:template>
<xsl:template match="Entrezgene">
id: <xsl:value-of select="Entrezgene_track-info/Gene-track/Gene-track_geneid"/>
locus: <xsl:value-of select="Entrezgene_gene/Gene-ref/Gene-ref_locus"/>
[
<xsl:apply-templates select=".//Gene-commentary[Gene-commentary_heading='NCBI Reference Sequences (RefSeq)']//Gene-commentary_products/Gene-commentary" mode="product"/>
]
</xsl:template>
<xsl:template match="Gene-commentary" mode="product">
type: <xsl:value-of select="Gene-commentary_type/@value"/>
acn:<xsl:value-of select="Gene-commentary_accession"/>
</xsl:template>
</xsl:stylesheet>
Usage:
xsltproc --novalid jeter.xsl "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=gene&id=207&retmode=xml"
id: 207
locus: AKT1
[
type: mRNA
acn:NM_001014431
type: peptide
acn:NP_001014431
type: mRNA
acn:NM_001014432
type: peptide
acn:NP_001014432
type: mRNA
acn:NM_005163
type: peptide
acn:NP_005154
type: genomic
acn:NC_000014
type: genomic
acn:NT_026437
type: genomic
acn:AC_000057
type: genomic
acn:NW_925561
type: genomic
acn:AC_000146
type: genomic
acn:NW_001838116
]
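Since the stylesheet prints every product type, a follow-up step is to keep only the mRNA entries. Here is a rough Python sketch of that selection logic, mirroring the stylesheet's path (the RefSeq heading, then `Gene-commentary_products/Gene-commentary`, then the `value` attribute of `Gene-commentary_type`). The XML fragment `SAMPLE_XML` and the function name `products_by_type` are assumptions made for illustration; the fragment is a simplified stand-in for the real record.

```python
import xml.etree.ElementTree as ET

REFSEQ_HEADING = "NCBI Reference Sequences (RefSeq)"

# Simplified, hand-written fragment (illustrative only, not the real record).
SAMPLE_XML = """
<Entrezgene>
  <Entrezgene_comments>
    <Gene-commentary>
      <Gene-commentary_heading>NCBI Reference Sequences (RefSeq)</Gene-commentary_heading>
      <Gene-commentary_products>
        <Gene-commentary>
          <Gene-commentary_type value="mRNA"/>
          <Gene-commentary_accession>NM_005163</Gene-commentary_accession>
          <Gene-commentary_products>
            <Gene-commentary>
              <Gene-commentary_type value="peptide"/>
              <Gene-commentary_accession>NP_005154</Gene-commentary_accession>
            </Gene-commentary>
          </Gene-commentary_products>
        </Gene-commentary>
        <Gene-commentary>
          <Gene-commentary_type value="genomic"/>
          <Gene-commentary_accession>NC_000014</Gene-commentary_accession>
        </Gene-commentary>
      </Gene-commentary_products>
    </Gene-commentary>
  </Entrezgene_comments>
</Entrezgene>
"""


def products_by_type(xml_text, wanted_type="mRNA"):
    """Return accessions of RefSeq products whose type attribute matches."""
    root = ET.fromstring(xml_text.strip())
    found = []
    for section in root.iter("Gene-commentary"):
        # Only descend into the commentary block carrying the RefSeq heading.
        if section.findtext("Gene-commentary_heading") != REFSEQ_HEADING:
            continue
        path = ".//Gene-commentary_products/Gene-commentary"
        for product in section.findall(path):
            type_node = product.find("Gene-commentary_type")
            accession = product.findtext("Gene-commentary_accession")
            if type_node is None or not accession:
                continue
            if type_node.get("value") == wanted_type:
                found.append(accession)
    return found


print(products_by_type(SAMPLE_XML))
print(products_by_type(SAMPLE_XML, "peptide"))
```

Filtering on the type attribute rather than on the accession prefix keeps the check independent of naming conventions, at the cost of relying on the record's nesting.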
You don't need a regex. Just use the XPath function starts-with.
Actually, yes, I am using XSLT, so Pierre's solution fits my needs better. But since I want only the NM_ or XM_ accession numbers, I should maybe introduce a regular expression into Pierre's solution.
@Pierre: thanks for the tip, but when I try it I get an error message. Does it work if you include it in the code of your response?
@Pierre: I fixed it by replacing the double quotes with single quotes: 'NM_' instead of "NM_".
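The prefix test discussed in these comments needs no regular expression in most languages. As a small Python sketch of the same idea, `str.startswith` accepts a tuple of prefixes, which covers the NM_ and XM_ cases in one call. The accession list is copied from the stylesheet output above; the function name `keep_transcripts` is made up for this illustration.

```python
# Accessions as printed by the stylesheet run for GeneID 207.
ACCESSIONS = [
    "NM_001014431", "NP_001014431",
    "NM_001014432", "NP_001014432",
    "NM_005163", "NP_005154",
    "NC_000014", "NT_026437", "AC_000057",
    "NW_925561", "AC_000146", "NW_001838116",
]


def keep_transcripts(accessions, prefixes=("NM_", "XM_")):
    """Keep accessions starting with any of the given prefixes."""
    # startswith() with a tuple returns True if any prefix matches.
    return [acc for acc in accessions if acc.startswith(prefixes)]


print(keep_transcripts(ACCESSIONS))
# → ['NM_001014431', 'NM_001014432', 'NM_005163']
```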