Question: Retrieve all genes under a mammalian phenotype ontology term
4 months ago by
United States
eric.kern1390 wrote:

I want to retrieve all genes corresponding to a given mammalian phenotype ontology term (for example, MP:0005375), preferably within R. Are there tools to do this? Can I do it within BioMart? Or is the best bet to build something around the APIs here or here?

Related: similar question for GO terms

modified 4 months ago by Mike Smith250 • written 4 months ago by eric.kern1390

You may want to add biomart to the tags.

written 4 months ago by Santosh Anand2.9k

I tried to; it didn't work. I'll try again.

written 4 months ago by eric.kern1390
4 months ago by
Mike Smith250
EMBL Heidelberg / de.NBI
Mike Smith250 wrote:

I'm not sure you can do this using Ensembl's BioMart. There you can filter by specific phenotype ontology terms, but only leaf terms, not something as high-level as MP:0005375, which is "adipose tissue phenotype" and has many sub-terms. I don't think you can query using the phenotype ID itself. I don't know whether any other data store that has this data provides a BioMart interface, but I can't see one for the two you linked to.

One suggestion is to use the httr package and query MouseMine directly. Here's a fairly crude example, where we query for your phenotype ID, and return the primary ID, gene symbol, and the NCBI Entrez Gene ID.

Load the libraries we'll need, and then create a search query XML string


library(httr)      # POST(), content()
library(jsonlite)  # fromJSON()
library(tibble)    # as_tibble()

phenotypeID <- "MP:0005375"

query <- paste0('<query model="genomic" view="Gene.primaryIdentifier Gene.symbol Gene.ncbiGeneNumber" >
                  <constraint path="Gene.ontologyAnnotations.ontologyTerm.identifier" op="=" code="A" value="',
                  phenotypeID, '" />
                </query>')
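If you want to run this for several terms, one option is to wrap the query string in a small function. This is only a sketch: `build_mp_query` and `example_query` are made-up names, and the check that an ID looks like "MP:" followed by seven digits is my own addition, not something the service requires.

```r
# Hypothetical helper: builds the same query XML as above for any MP term ID.
build_mp_query <- function(phenotype_id) {
  # MP identifiers have the form "MP:" followed by seven digits.
  if (!grepl("^MP:[0-9]{7}$", phenotype_id)) {
    stop("Not a valid MP identifier: ", phenotype_id)
  }
  paste0(
    '<query model="genomic" ',
    'view="Gene.primaryIdentifier Gene.symbol Gene.ncbiGeneNumber">',
    '<constraint path="Gene.ontologyAnnotations.ontologyTerm.identifier" ',
    'op="=" code="A" value="', phenotype_id, '"/>',
    '</query>'
  )
}

example_query <- build_mp_query("MP:0005375")
cat(example_query, "\n")
```

Failing early on a malformed ID is just a convenience, so that a typo shows up as an R error rather than as an empty result.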

Then we can submit the query:

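# NOTE: the endpoint URL is missing from the post as captured here; put the MouseMine query service URL in the empty string.
postRes <- POST('',
         body = list(query = query, format = 'json'),
         encode = 'form')  # assumed: this line was cut off in the post; sends the body form-encoded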
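It can also be worth checking that the request succeeded before parsing anything. A rough sketch, assuming you pass the service URL in yourself (it isn't shown above); `run_query` and `service_url` are hypothetical names, while `POST()`, `stop_for_status()` and `content()` are regular httr functions:

```r
library(httr)
library(jsonlite)

# Hypothetical wrapper; `service_url` must be supplied by the caller because
# the endpoint URL is not given in the post.
run_query <- function(service_url, query) {
  res <- POST(service_url,
              body = list(query = query, format = "json"),
              encode = "form")
  # Turn HTTP errors (4xx/5xx) into R errors instead of parsing an error page
  stop_for_status(res)
  # Return the parsed JSON as an R list
  fromJSON(content(res, as = "text", encoding = "UTF-8"))
}
```

The returned list can then be processed in the same way as the parsed response in the next step.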

Now do some processing on the result to give us a tibble with one row per gene:

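# Parse the JSON body; `results` holds one row per gene
jsonToTxt <- fromJSON(content(postRes, as = "text"))
genes <- as_tibble(jsonToTxt$results)
# Label the columns with the headers returned by the service
colnames(genes) <- jsonToTxt$columnHeaders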

Here's the output:

> genes
# A tibble: 69 × 3
   `Gene > Primary Identifier` `Gene > Symbol` `Gene > NCBI Gene Number`
                         <chr>           <chr>                     <chr>
1                   MGI:101884           Ppard                     19015
2                   MGI:101900           Mmp14                     17387
3                   MGI:102797           Acsl1                     14081
4                   MGI:102858           Fosl2                     14284
5                   MGI:103014            Il15                     16168
6                   MGI:104993            Lepr                     16847
7                   MGI:105304           Il6ra                     16194
8                   MGI:105374           Npy4r                     19065
9                   MGI:106387         Arfgef3                    215821
10                  MGI:107571            Cav2                     12390
# ... with 59 more rows
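If you want to try the post-processing step without hitting the server, here is a small offline sketch. The JSON string is hand-written to mimic the two fields used above (`columnHeaders` and `results`), with its two rows copied from the output; the real response may well contain other fields, so treat the shape as an assumption.

```r
library(jsonlite)
library(tibble)

# Hand-written stand-in for the service response (assumed shape)
mock_json <- '{
  "columnHeaders": ["Gene > Primary Identifier", "Gene > Symbol", "Gene > NCBI Gene Number"],
  "results": [["MGI:101884", "Ppard", "19015"],
              ["MGI:101900", "Mmp14", "17387"]]
}'

parsed <- fromJSON(mock_json)

# fromJSON simplifies an array of equal-length string arrays to a character matrix
results <- parsed$results
# Name the columns first so as_tibble() does not have to invent names
colnames(results) <- parsed$columnHeaders
genes_mock <- as_tibble(results)
print(genes_mock)
```

Setting the column names on the matrix before calling `as_tibble()` is a small variation on the code above; it avoids relying on automatically generated column names.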
written 4 months ago by Mike Smith250

Works like a charm. Thank you very much!

written 4 months ago by eric.kern1390