Question

Annotating genes with a function

8.2 years ago

nash.claire ▴ 490

Hi all,

I have a question regarding gene ontology. I have ChIP-seq data which I have compared to RNA-seq expression data which has given me a list of candidate genes that I am interested in pursuing. What I'd like to do next is to annotate each of these genes in the list with a molecular function. For example, I'd like to achieve something like this:-

Gene                      Top Function
---
GeneA                     Transcription factor
GeneB                     Secreted protein

I've done a lot of searching, and there are so many gene ontology tools out there that it's hard to know which one to choose. They all seem to take a list of genes and group them by the most represented functions, but this is not really what I'm after. I want an annotation for each gene in the list if possible. I also find that a lot of the tools give terms that are not useful, such as "binding" (I mean what does that even mean!!!). I don't know if DAVID can do this sort of thing, but I find it not very user friendly, and from what I can gather it is also quite out of date now.

Anyone have any ideas?

RNA-Seq ChIP-Seq • 2.1k views

updated 21 months ago by Ram 43k • written 8.2 years ago by nash.claire ▴ 490

Very helpful description, thank you Sukhdeep. Just to update though, I found a way to sort of get what I'm after with Biomart. Using Biomart I can select fields such as Gene Type, GO term name/accession etc. for each gene in my list. I'm going to start with this and, as you say, try other tools to get a feel for the data. Essentially what I want to do is pull out all the transcription factors and all the secreted proteins from my list for further analysis. If there is a simple way of doing that I'm all ears!

8.2 years ago by nash.claire ▴ 490
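The filtering step described in the comment above (pulling transcription factors and secreted proteins out of a per-gene GO table) could be sketched roughly as below. This is only an illustration under assumptions: the gene names and rows are made up, the column headers are assumed to match a tab-separated Biomart export, and the choice of marker terms (GO:0003700 for transcription factor activity, GO:0005576 and GO:0005615 as a rough proxy for secreted proteins) is mine, not something from the thread.

```python
import csv
import io
from collections import defaultdict

# Hypothetical excerpt of a Biomart export: one row per (gene, GO term) pair.
# Gene names and rows are invented for illustration only.
BIOMART_TSV = """\
Gene name\tGO term accession\tGO term name
GeneA\tGO:0003700\tDNA-binding transcription factor activity
GeneA\tGO:0005488\tbinding
GeneB\tGO:0005576\textracellular region
GeneB\tGO:0005488\tbinding
GeneC\tGO:0005488\tbinding
"""

# Assumed marker terms for each category of interest. An extracellular
# annotation is only a rough proxy for "secreted", so treat hits as candidates.
CATEGORY_TERMS = {
    "Transcription factor": {"GO:0003700"},
    "Secreted protein (candidate)": {"GO:0005576", "GO:0005615"},
}


def classify_genes(tsv_text):
    """Return {gene: [category labels]} based on the GO accessions per gene."""
    gene_terms = defaultdict(set)
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        gene_terms[row["Gene name"]].add(row["GO term accession"])

    calls = {}
    for gene, terms in gene_terms.items():
        # A gene gets a label if it carries at least one marker term.
        calls[gene] = [
            label for label, markers in CATEGORY_TERMS.items() if terms & markers
        ]
    return calls


calls = classify_genes(BIOMART_TSV)
for gene in sorted(calls):
    print(gene, ", ".join(calls[gene]) or "unclassified")
```

Matching on accessions rather than term names avoids picking up uninformative terms such as "binding", which here leaves GeneC unclassified instead of giving it a vague label.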

Ram · Answer 1 · 2016-01-28

GO analysis is a rapidly evolving field and it is not self-sufficient: it depends on input from many different experiments and on how each of them annotates a gene and its attributes. Early on there were no proper rules for annotating the function of a gene, which is why you see genes associated with multiple terms, some of which are very ambiguous. Many tools have tried to come up with different solutions over time; you can in fact check the questions about GO analysis on Biostars itself to get a flavour of that.

So, to answer your question, there is no single straightforward method for GO analysis. Results vary between tools and can often lead to different interpretations. I would therefore recommend using multiple engines to get a flavour of what you are after, and manually removing or hiding child or parent terms that are redundant. You could also use a tool like DAVID or PANTHER and export the list to REVIGO, which is really nice for summarizing, exploring and hiding child terms under their parent categories. Another good way of sorting the GO terms is by LOD score, which prioritizes specific terms over very common ones.

In R, you can also look at clusterProfiler's enrichGO function (the result is often stored in a variable called ego), which works out the over-represented terms and may suit you as well.
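To give a feel for what "over-represented" means here: enrichment tools of this kind typically ask whether a GO term shows up in your gene list more often than expected by chance, commonly with a hypergeometric (one-sided Fisher) test. The sketch below is not clusterProfiler's code, just a minimal standard-library illustration; the function name and the example counts are made up.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """P(X >= k): chance of seeing at least k term-annotated genes in a list
    of n genes, drawn from a background of N genes of which K carry the term.
    """
    upper = min(n, K)
    total = comb(N, n)
    # Sum the hypergeometric probabilities for k, k+1, ..., upper hits.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


# Made-up counts: 20000 background genes, 200 annotated with the term,
# 100 genes in the candidate list, 8 of which carry the term.
p_value = hypergeom_enrichment_p(k=8, n=100, K=200, N=20000)
print(f"enrichment p-value: {p_value:.3g}")
```

In practice you would run this for every term and correct for multiple testing, which the dedicated tools already do for you.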

Other pointers: