Question: Functional enrichment on orthologous groups
sst20 (United Kingdom) wrote, 3.8 years ago:

Hi all,

I am currently looking at the evolution of the genome content of a particular species. I have more than 15 newly sequenced and annotated species, as well as a species tree and clusters of orthologs/paralogs ("orthogroups") across all species. In addition, for each node in the species tree, I have a list of the orthogroups that are gained or lost at that point in the tree. I have also drawn annotated trees with charts at the nodes, which give me an overview of how much happened where. So far, so good.
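The post takes the per-node gain/loss lists as given. If such lists still need to be derived, one common assumption is Dollo parsimony: each orthogroup is gained exactly once, at the most recent common ancestor (MRCA) of the species that contain it, and lost on every maximal subtree below that MRCA in which it is absent. The sketch below is illustrative only; the child-to-parent dictionary encoding of the tree and all function names are hypothetical, and real data would usually go through a dedicated tool or tree library.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical tree encoding: child -> parent (the root never appears as a key).
Parent = Dict[str, str]


def children_of(parent: Parent) -> Dict[str, List[str]]:
    kids: Dict[str, List[str]] = {}
    for child, par in parent.items():
        kids.setdefault(par, []).append(child)
    return kids


def leaves_below(node: str, kids: Dict[str, List[str]]) -> Set[str]:
    if node not in kids:  # no children: the node is a species (leaf)
        return {node}
    leaves: Set[str] = set()
    for child in kids[node]:
        leaves |= leaves_below(child, kids)
    return leaves


def path_to_root(node: str, parent: Parent) -> List[str]:
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def mrca(species: Set[str], parent: Parent) -> str:
    """Most recent common ancestor of a non-empty set of species."""
    paths = [path_to_root(s, parent) for s in sorted(species)]
    shared = set(paths[0]).intersection(*paths[1:])
    # walking upwards from one leaf, the first node shared by all paths is the MRCA
    return next(node for node in paths[0] if node in shared)


def dollo_gain_loss(present: Set[str], parent: Parent) -> Tuple[str, List[str]]:
    """Gain node, plus loss nodes (a loss is placed on the branch leading to the node)."""
    kids = children_of(parent)
    gain = mrca(present, parent)
    losses: List[str] = []

    def walk(node: str) -> None:
        if not leaves_below(node, kids) & present:
            losses.append(node)  # maximal subtree without any member species
            return
        for child in kids.get(node, []):
            walk(child)

    walk(gain)
    return gain, losses


def per_node_lists(occurrence: Dict[str, Set[str]], parent: Parent):
    """occurrence: orthogroup -> set of species that contain it."""
    gained: Dict[str, Set[str]] = {}
    lost: Dict[str, Set[str]] = {}
    for og, species in occurrence.items():
        gain, losses = dollo_gain_loss(species, parent)
        gained.setdefault(gain, set()).add(og)
        for node in losses:
            lost.setdefault(node, set()).add(og)
    return gained, lost


# Tree ((A,B)n2,C)n1 written as child -> parent
tree = {"A": "n2", "B": "n2", "n2": "n1", "C": "n1"}
print(dollo_gain_loss({"A", "C"}, tree))  # gained at n1, lost on the branch to B
```

Dollo parsimony forbids regaining an orthogroup, so horizontal transfer or orthogroup-clustering errors will show up as spurious early gains followed by many losses; that is worth keeping in mind when interpreting the per-node lists.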

What I would like to find out is whether this can tell me which functions are gained or lost, or which pathways are introduced or disabled, at each node in the tree, based on the appearance or disappearance of orthogroups at specific nodes. Note that the annotations of these organisms are not in public databases, but I have inferred functional annotations (GO, KEGG, Reactome, ...) for them from InterPro hits and transfers from reference species. I have also mapped these functional terms to each orthogroup by taking the union of the terms annotated to its member genes.
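A minimal sketch of the union mapping described in the question, assuming gene-level annotations are available as a gene-to-term-set dictionary and orthogroup membership as an orthogroup-to-gene-list dictionary (both layouts, and all gene IDs below, are hypothetical). The optional `min_fraction` parameter is an addition for illustration; with its default of 0.0 the function returns exactly the plain union.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def orthogroup_terms(
    og_members: Dict[str, Iterable[str]],
    gene_terms: Dict[str, Set[str]],
    min_fraction: float = 0.0,
) -> Dict[str, Set[str]]:
    """Annotate each orthogroup with terms found among its member genes.

    min_fraction=0.0 keeps every term carried by at least one member (the
    plain union); a larger value keeps only terms annotated to at least that
    fraction of the members.
    """
    og_terms: Dict[str, Set[str]] = {}
    for og, genes in og_members.items():
        genes = list(genes)
        # count each term at most once per gene
        counts = Counter(t for g in genes for t in set(gene_terms.get(g, ())))
        og_terms[og] = {t for t, c in counts.items() if c / len(genes) >= min_fraction}
    return og_terms


members = {"OG1": ["spA_g1", "spB_g7"], "OG2": ["spC_g3"]}  # hypothetical IDs
terms = {"spA_g1": {"GO:0006096"}, "spB_g7": {"GO:0006096", "GO:0005737"}}
print(orthogroup_terms(members, terms))
print(orthogroup_terms(members, terms, min_fraction=1.0))
```

The union is permissive: a term annotated to a single member (possibly through a questionable transfer) is propagated to the whole orthogroup, and large orthogroups tend to accumulate more terms than small ones. A stricter threshold is one way to check how sensitive downstream enrichment results are to this choice.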

My idea was to take the lists of orthogroups that are lost or gained at each node and use them as lists of 'interesting items' in functional gene set enrichment tools like topGO, with the full set of orthogroups and their terms as the background. I have done this with topGO, but I am not sure that this is the right approach or that I can trust the results, especially given that I am looking at many species.
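To make the statistics behind this approach explicit, here is a sketch of a plain over-representation test: for each term, a one-sided hypergeometric (Fisher's exact) test of whether the term is more frequent among the gained or lost orthogroups than in the background, followed by Benjamini-Hochberg correction. This is not topGO's implementation; in particular it ignores the GO graph (no propagation to parent terms, no elim/weight decorrelation) and treats orthogroups as independent. The input layout (orthogroup to term set) and all names are assumptions.

```python
from math import comb
from typing import Dict, List, Set, Tuple


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N items, K annotated, n drawn)."""
    top = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, top + 1)) / comb(N, n)


def benjamini_hochberg(pvals: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def enrich(study: Set[str], og_terms: Dict[str, Set[str]]) -> List[Tuple[str, int, float, float]]:
    """Rows of (term, hits in study set, raw p, BH-adjusted p), sorted by raw p."""
    background = set(og_terms)
    study = study & background  # ignore groups that have no background entry
    N, n = len(background), len(study)
    term_ogs: Dict[str, Set[str]] = {}
    for og, terms in og_terms.items():
        for t in terms:
            term_ogs.setdefault(t, set()).add(og)
    names = sorted(term_ogs)
    hits = [len(term_ogs[t] & study) for t in names]
    raw = [hypergeom_sf(k, N, len(term_ogs[t]), n) for t, k in zip(names, hits)]
    adj = benjamini_hochberg(raw)
    return sorted(zip(names, hits, raw, adj), key=lambda row: row[2])


# Toy background of four orthogroups; OG1 and OG2 are "lost at this node"
og_terms = {"OG1": {"T1"}, "OG2": {"T1"}, "OG3": {"T2"}, "OG4": {"T2"}}
for row in enrich({"OG1", "OG2"}, og_terms):
    print(row)
```

One assumption worth questioning with many species is the background: using all orthogroups at every node includes groups that could never have been lost there (for example, groups gained elsewhere in the tree). A narrower background per node, such as the orthogroups present at the parent node when testing losses, may be easier to defend.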

Also, can anyone recommend other functional enrichment tools, in particular for metabolic pathways, which:

  • are well documented,
  • are not restricted to human/mouse/... but can work with custom data sets, and
  • do not require expression data (which I don't have; all I have are lists of 'interesting' groups that are lost or gained).
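For the metabolic-pathway side, one tool-independent option (an assumption, not something proposed in this thread) is to compare pathway completeness between a node and its parent instead of testing for enrichment. If each pathway is represented as a set of ortholog identifiers (for example KEGG KOs) and the set of identifiers present at each node is derived from the orthogroup annotations, a pathway can be flagged as introduced or disabled when its completeness crosses a threshold along a branch. The 0.8 threshold and the pathway and KO identifiers below are hypothetical placeholders.

```python
from typing import Dict, List, Set, Tuple


def completeness(pathway_kos: Set[str], present_kos: Set[str]) -> float:
    """Fraction of a pathway's orthologs present at a node."""
    return len(pathway_kos & present_kos) / len(pathway_kos) if pathway_kos else 0.0


def pathway_changes(
    pathways: Dict[str, Set[str]],
    parent_kos: Set[str],
    child_kos: Set[str],
    threshold: float = 0.8,  # hypothetical cut-off for "pathway present"
) -> List[Tuple[str, str, float, float]]:
    """Pathways whose completeness crosses the threshold between parent and child."""
    changes = []
    for pw, kos in sorted(pathways.items()):
        before = completeness(kos, parent_kos)
        after = completeness(kos, child_kos)
        if before < threshold <= after:
            changes.append((pw, "introduced", before, after))
        elif after < threshold <= before:
            changes.append((pw, "disabled", before, after))
    return changes


pathways = {"PW_A": {"K1", "K2", "K3", "K4", "K5"}, "PW_B": {"K6", "K7"}}  # placeholders
print(pathway_changes(pathways, {"K1", "K2", "K6", "K7"}, {"K1", "K2", "K3", "K4", "K6"}))
```

A fixed threshold is crude: pathways with alternative routes or shared enzymes can look incomplete while still functioning, so pathway definitions that encode alternatives would give more reliable calls.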

I have looked at the usual suspects like GAGE and clusterProfiler, but these tools seem to be designed for differential expression analysis in the usual popular organisms; if you need more generic functionality, or don't have the right kind of input, the documentation is very sparse.

I am absolutely not afraid of writing code (as I'm mainly a developer) or munging the data in any form; I'm just quite new to data analysis ;)


modified 3.7 years ago by jhc
jhc wrote, 3.7 years ago:

The eggNOG database now provides HMM profiles for all orthologous groups at different taxonomic levels. It also provides functional annotations per group (COG functional categories, domain annotations, and GO terms). You could use HMMER to map your proteomes to eggNOG and then derive functional profiles and enrichment from those assignments.
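As a sketch of the mapping step, assume the proteome was searched against the (hmmpress-ed) eggNOG profiles with something like `hmmscan --tblout hits.tbl profiles.hmm proteome.fasta`, where the file names are hypothetical. In hmmscan's per-sequence tabular output, column 1 is the profile (target) name, column 3 the protein (query) name, and columns 5 and 6 the full-sequence E-value and bit score. The function below keeps the best-scoring profile per protein; the E-value cut-off is an assumption to tune, and with `hmmsearch` the target and query columns are swapped.

```python
from typing import Dict, Iterable, Tuple


def best_hits(
    tblout_lines: Iterable[str], max_evalue: float = 1e-5
) -> Dict[str, Tuple[str, float, float]]:
    """Map each query protein to (profile, E-value, bit score) of its best hit."""
    best: Dict[str, Tuple[str, float, float]] = {}
    for line in tblout_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the commented header/footer
        fields = line.split()  # the free-text description (column 19+) is ignored
        profile, query = fields[0], fields[2]
        evalue, score = float(fields[4]), float(fields[5])
        if evalue > max_evalue:
            continue
        if query not in best or score > best[query][2]:
            best[query] = (profile, evalue, score)
    return best


# Usage with a real file:
#   with open("hits.tbl") as fh:
#       protein_to_og = best_hits(fh)
```

Taking the single best hit is the simplest assignment rule; multi-domain proteins that legitimately match several groups would need a more careful rule.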

The eggNOG web interface allows single-sequence queries and provides trees, alignments, annotations, and GO term frequencies per group. The data can also be queried through a RESTful API.

[Figure: HMM search output]
