Question: Resources To Batch Map Long Gene Names To Entrez Ids?
Walter Jessen wrote, 8.0 years ago:

I frequently have a list of long gene names (not symbols) that I need to map to Entrez IDs. For example, instead of the gene symbol PTEN, I have the long gene name "phosphatase and tensin homolog". I can't find a way to map long gene names in BioMart (using database: Ensembl genes 64, Sanger UK).

I've tried using MatchMiner. However, my lists often use something other than the "official" gene names, which MatchMiner has trouble mapping. It's also quite slow.

What other resources are people using to batch map large lists of long gene names? I'd appreciate any tips.

gene list mapping identifiers • 2.2k views
Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote, 8.0 years ago:

Did you try NCBI eSearch? I got only one hit for your example (don't forget the quotes):

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=gene&term=9606[TID]+"phosphatase+and+tensin+homolog"[GFN]

<eSearchResult>
  <Count>1</Count>
  <RetMax>1</RetMax>
  <RetStart>0</RetStart>
  <IdList>
    <Id>5728</Id>
  </IdList>
  <TranslationSet/>
  <TranslationStack>
    <TermSet>
      <Term>9606[TID]</Term>
      <Field>TID</Field>
      <Count>191183</Count>
      <Explode>Y</Explode>
    </TermSet>
    <TermSet>
      <Term>"phosphatase+and+tensin+homolog"[GFN]</Term>
      <Field>GFN</Field>
      <Count>17</Count>
      <Explode>Y</Explode>
    </TermSet>
    <OP>AND</OP>
  </TranslationStack>
  <QueryTranslation>9606[TID] AND "phosphatase+and+tensin+homolog"[GFN]</QueryTranslation>
</eSearchResult>
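
If you only want the IDs out of a response like this one, a couple of small filters are probably enough. This is a minimal sketch, not a real XML parser: the function names extract_ids and extract_count are made up for illustration, and it assumes every element sits on its own line, as in the response above.

```bash
#!/usr/bin/env bash
# Helper names are hypothetical; they only assume the eSearch XML layout shown above
# (one element per line).

# Print every Entrez Gene ID found in <Id> elements, one per line.
extract_ids() {
    sed -n 's:.*<Id>\([0-9][0-9]*\)</Id>.*:\1:p'
}

# Print the total hit count, i.e. the first <Count> element in the response.
# Later <Count> elements belong to the individual search terms.
extract_count() {
    sed -n 's:.*<Count>\([0-9][0-9]*\)</Count>.*:\1:p' | head -n 1
}
```

Piping the curl output for the query above into extract_ids would then print the bare ID instead of the whole XML document.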

Edit:

Walter, you can run a loop with a shell script and call this query for each long name.

$ cat list.txt 
phosphatase and tensin homolog
notch 2

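The script, which prints each name followed by the matching <Id> element:

# Query eSearch once per long name read from list.txt.
while IFS= read -r G
do
    # Spaces become '+' in the URL; %5B, %5D and %22 encode [, ] and the double quote.
    for I in $(curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=gene&term=9606%5BTID%5D+%22${G// /+}%22%5BGFN%5D" | grep "<Id>")
    do
        echo "$G $I"
    done
    sleep 0.4   # NCBI asks for at most 3 requests per second without an API key
done < list.txt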

phosphatase and tensin homolog <Id>5728</Id>
notch 2 <Id>4853</Id>
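
For a list of several hundred names, it may help to get a tab-separated table and to see which names found nothing. The following is a rough sketch along the same lines, under some assumptions: the function names fetch_esearch and map_names, the NA placeholder, and the ESEARCH_DELAY variable are all invented for this example. It lets curl do the URL encoding (-G with --data-urlencode), so names containing commas or parentheses should not need special handling.

```bash
#!/usr/bin/env bash
# Sketch: map long gene names (one per line on stdin) to Entrez Gene IDs.
# Function names, the NA placeholder and ESEARCH_DELAY are illustrative assumptions.

ESEARCH_URL="https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Fetch the eSearch XML for one long name, restricted to human (taxid 9606).
# curl -G sends the URL-encoded fields as a GET query string.
fetch_esearch() {
    curl -s -G "$ESEARCH_URL" \
        --data-urlencode "db=gene" \
        --data-urlencode "term=9606[TID] \"$1\"[GFN]"
}

# Print "name<TAB>id" for every hit, or "name<TAB>NA" when nothing matches.
map_names() {
    local name ids id
    while IFS= read -r name; do
        [ -z "$name" ] && continue          # skip blank lines
        ids=$(fetch_esearch "$name" | sed -n 's:.*<Id>\([0-9][0-9]*\)</Id>.*:\1:p')
        if [ -z "$ids" ]; then
            printf '%s\tNA\n' "$name"
        else
            for id in $ids; do
                printf '%s\t%s\n' "$name" "$id"
            done
        fi
        # Pause between requests; NCBI asks for at most 3 per second without an API key.
        sleep "${ESEARCH_DELAY:-0.4}"
    done
}
```

It could be run as map_names < list.txt > mapped.tsv. Rows ending in NA are the names that need a second look, for example because the list uses a non-official name.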

Thanks Pierre. However, I don't see how this supports batch submissions -- I typically have several hundred names to map.

Comment written 8.0 years ago by Walter Jessen

Ah, I see. Excellent! Thanks Pierre!

Comment written 8.0 years ago by Walter Jessen