Question: Get The Most Specific / Leaf GO Terms With BioMart
3.6 years ago by
European Union
joj wrote:

I have a list of about 80 HGNC symbols.

I want to find their descriptions and function(s) using biomaRt.

My first approach is to retrieve every GO annotation through biomaRt. Here is an example for three genes:

library(biomaRt)

# HGNC symbols to look up
gL <- c("LTC4S", "ALOX5", "NAT2")
mart <- useMart(biomart = "ensembl", dataset = "hsapiens_gene_ensembl")
# One row per (GO id, gene) pair
results <- getBM(attributes = c("go_id", "hgnc_symbol"), filters = "hgnc_symbol", values = gL, mart = mart)

For 3 HGNC symbols this returns 46 GO IDs. With 80 symbols, as you can imagine, it is over 1500 GO categories.

However I'm only interested in the leaf terms, i.e. the most specific terms, for each branch of the GO with which the gene is annotated.

Is there any easy way to get this using biomaRt or other R tools like GO.db?

3.6 years ago by
Emily_Ensembl wrote:

Only the most specific GO terms will be returned via BioMart. The reason you get a lot is that there are a lot.
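Direct annotations from different sources may still sit at different depths of the same branch, so it can be worth checking a result set for ancestor/descendant pairs. The following is a minimal sketch, not an official biomaRt or GO.db recipe. The function name `leaf_terms`, the toy term ids, and the toy `ancestors` list are all hypothetical; with GO.db installed, a real ancestor list could plausibly be built from `as.list(GOBPANCESTOR)`, `as.list(GOMFANCESTOR)` and `as.list(GOCCANCESTOR)`.

```r
# Keep only leaf terms per gene: drop any term that is an ancestor of
# another term annotated to the same gene.
# `annot` needs columns hgnc_symbol and go_id (as returned by getBM above).
# `ancestors` is a named list: term id -> character vector of ancestor ids.
leaf_terms <- function(annot, ancestors) {
  per_gene <- split(as.character(annot$go_id), as.character(annot$hgnc_symbol))
  kept <- lapply(names(per_gene), function(gene) {
    terms <- unique(per_gene[[gene]])
    # Skip missing or empty ids (genes without any annotation)
    terms <- terms[!is.na(terms) & nzchar(terms)]
    # Every ancestor of any term annotated to this gene
    anc <- unique(unlist(ancestors[terms], use.names = FALSE))
    leaves <- setdiff(terms, anc)
    data.frame(hgnc_symbol = rep(gene, length(leaves)),
               go_id = leaves,
               stringsAsFactors = FALSE)
  })
  do.call(rbind, kept)
}

# Toy data (hypothetical ids, not real GO terms):
# T:leaf -> T:mid -> T:root, and T:other -> T:root
ancestors <- list("T:leaf"  = c("T:mid", "T:root"),
                  "T:mid"   = "T:root",
                  "T:other" = "T:root")
annot <- data.frame(hgnc_symbol = c("G1", "G1", "G1", "G2", "G2"),
                    go_id = c("T:root", "T:mid", "T:leaf", "T:mid", "T:other"),
                    stringsAsFactors = FALSE)

leaves <- leaf_terms(annot, ancestors)
print(leaves)
```

If the filtered table has the same number of rows as the input, the result set already contained only the most specific terms for each gene.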


Is there any way to get the most "interesting" terms? I guess using semantic similarity and information content, something like that, although of course "interesting" is going to be rather subjective. It must be a very common task though: given a list of genes, annotate them with "their function" in a digestible way.

Reply written 3.6 years ago by joj

What you find the "most interesting" is not necessarily what someone else finds the "most interesting". GO terms are designed to give you the function in a digestible way; it just turns out that the most effective way to do this is to assign a number of GO terms to something.

Reply written 3.6 years ago by Emily_Ensembl

I agree, interesting is subjective, and I certainly don't dispute that GO/using a DAG is an effective way to provide annotation. But looking at >1500 terms is not really a feasible option in this case. I will keep looking to see if there are ways to whittle it down to the most informative terms, e.g. using information content. Thanks!
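As a rough illustration of the information-content idea, here is a minimal base R sketch. It scores each term as IC = -log(fraction of background genes annotated to it) and keeps the top-scoring terms per gene. The function names `term_ic` and `top_terms`, the cutoff `k`, and the toy data are hypothetical. The sketch also assumes the background table already includes propagated annotations (a gene annotated to a term counts for that term's ancestors too), which a raw BioMart export would not give you.

```r
# Information content per term from a background annotation table
# with columns hgnc_symbol and go_id.
term_ic <- function(background) {
  background <- unique(background[, c("hgnc_symbol", "go_id")])
  n_genes <- length(unique(background$hgnc_symbol))
  counts <- table(as.character(background$go_id))
  ic <- -log(as.numeric(counts) / n_genes)
  names(ic) <- names(counts)
  ic
}

# Keep the k highest-IC terms for each gene.
top_terms <- function(annot, ic, k = 3) {
  annot$ic <- unname(ic[as.character(annot$go_id)])
  annot <- annot[order(annot$hgnc_symbol, -annot$ic), ]
  # Rank of each row within its gene, after sorting by descending IC
  rank <- ave(seq_len(nrow(annot)), annot$hgnc_symbol, FUN = seq_along)
  out <- annot[rank <= k, ]
  rownames(out) <- NULL
  out
}

# Toy background (hypothetical ids): T:broad on 4 genes, T:mid on 2, T:rare on 1
background <- data.frame(
  hgnc_symbol = c("G1", "G2", "G3", "G4", "G1", "G2", "G1"),
  go_id = c(rep("T:broad", 4), rep("T:mid", 2), "T:rare"),
  stringsAsFactors = FALSE
)

ic <- term_ic(background)
top <- top_terms(background[background$hgnc_symbol == "G1", ], ic, k = 1)
print(ic)
print(top)
```

A term annotated to every background gene scores 0, and rarer terms score higher, so the ranking favors specific terms. The choice of background gene set changes the scores, so it should match the question being asked.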

Reply written 3.6 years ago by joj (modified 3.6 years ago)