Well, gene enrichment (or 'gene-set enrichment analysis'; GSEA) is one of those things on which everyone has their own take, i.e., opinion. I've met everyone from people who don't want to hear anything about it to those who apparently idolise it. The way that you've carefully written your question tells me that you're somewhere between these two extremes.
The first thing to consider is that gene enrichment is an in silico analysis, but many of the enrichment terms are based on curated datasets. For the Gene Ontology, for example, each and every gene-to-term annotation has an assigned evidence code, which can be taken into account when interpreting a particular enrichment. Take a look at my answer here: Go annotation reliability?
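To make the evidence-code point concrete, here is a minimal Python sketch. It reads GO annotations in the tab-separated GAF format and keeps only annotations backed by experimental evidence. It assumes GAF 2.x column order: symbol in column 3, qualifier in column 4, GO ID in column 5, evidence code in column 7. Which codes count as 'experimental' (`EXPERIMENTAL_CODES`) is an assumption you may want to widen, for example to include IEA electronic annotations. `gene_to_go` is a hypothetical helper name.

```python
from collections import defaultdict
from typing import AbstractSet, Dict, Iterable, Set

# Assumed set of experimental evidence codes; IEA (electronic) and others are excluded.
EXPERIMENTAL_CODES = frozenset({"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"})


def gene_to_go(
    gaf_lines: Iterable[str], keep_codes: AbstractSet[str] = EXPERIMENTAL_CODES
) -> Dict[str, Set[str]]:
    """Map gene symbols to GO IDs, keeping only annotations whose evidence code is in keep_codes."""
    mapping: Dict[str, Set[str]] = defaultdict(set)
    for line in gaf_lines:
        if line.startswith("!") or not line.strip():
            continue  # header/comment lines start with '!'
        cols = line.rstrip("\n").split("\t")
        symbol, qualifier, go_id, evidence = cols[2], cols[3], cols[4], cols[6]
        if "NOT" in qualifier.split("|"):
            continue  # 'NOT' annotations state that the gene is NOT associated with the term
        if evidence in keep_codes:
            mapping[symbol].add(go_id)
    return dict(mapping)
```

For example, `with open("annotations.gaf") as fh: gene_to_go(fh)` uses a hypothetical file name and returns a dictionary you could use as your term-to-gene source.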
Should I only select significant genes for my enrichment analyses, pathway analyses? Why, why not?
The general idea of gene enrichment is that you have identified a group of genes as being statistically significantly associated with a particular condition, and you want to learn more about the functions, processes, pathways, et cetera, that may be altered as a result. Thus, it does not make much sense to perform the enrichment on non-significant genes.
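For the 'significant genes only' approach, the classic test is over-representation analysis. Given a background of all tested genes, it asks whether your significant genes overlap a term's genes more than chance would predict. The test uses the upper tail of the hypergeometric distribution, which is equivalent to a one-sided Fisher's exact test.

This minimal Python sketch assumes the background is the set of genes you actually tested. It is not DAVID's exact statistic: DAVID reports a modified Fisher test that it calls the EASE score. The function names are illustrative.

```python
from math import comb
from typing import Iterable


def hypergeom_upper_tail(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K 'successes', n draws)."""
    upper = min(K, n)
    # math.comb returns 0 when asked to choose more items than exist, so no extra bounds checks are needed.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def enrichment_p(sig_genes: Iterable[str], term_genes: Iterable[str], background: Iterable[str]) -> float:
    """Over-representation P value of one term among the significant genes."""
    bg = set(background)
    sig = set(sig_genes) & bg  # only count genes that were actually tested
    term = set(term_genes) & bg
    overlap = len(sig & term)
    return hypergeom_upper_tail(overlap, len(bg), len(term), len(sig))
```

Running `enrichment_p` once per term gives one P value per term, and these then need multiple-testing correction.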
Edit (11th January 2019): some programs can take all genes in your dataset, perform enrichment, and then determine the degree of enrichment by using the p-values and fold-changes. I feel that these methods are more powerful.
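As an illustration of the 'all genes' approach, here is a minimal Python sketch of a GSEA-style running-sum enrichment score. It assumes every gene is ranked by a signed metric. `sign(log2FC) * -log10(p)` is one common choice, but it is an assumption here, as are the function names.

Genes in the set push the running sum up, weighted by |metric|^weight. Other genes pull it down. The score is the maximum deviation from zero. Real tools also assess significance, e.g. by permutation, and this sketch omits that step.

```python
import math
from typing import Iterable, List, Tuple


def rank_metric(log2fc: float, pvalue: float) -> float:
    """Signed ranking metric: -log10(p) carrying the sign of the fold-change (p must be > 0)."""
    return math.copysign(-math.log10(pvalue), log2fc)


def enrichment_score(
    ranked: List[Tuple[str, float]], gene_set: Iterable[str], weight: float = 1.0
) -> float:
    """Running-sum enrichment score.

    `ranked` holds (gene, metric) pairs sorted from most positive to most negative metric.
    weight=0 gives an unweighted, Kolmogorov-Smirnov-like statistic.
    """
    members = set(gene_set)
    is_hit = [gene in members for gene, _ in ranked]
    n_miss = len(ranked) - sum(is_hit)
    hit_norm = sum(abs(m) ** weight for (_, m), hit in zip(ranked, is_hit) if hit)
    if hit_norm == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranked list without covering all of it")
    running, best = 0.0, 0.0
    for (_, metric), hit in zip(ranked, is_hit):
        # Step up for set members, step down for every other gene.
        running += abs(metric) ** weight / hit_norm if hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best
```

To build `ranked`, compute `rank_metric` for each gene and sort with `sorted(pairs, key=lambda x: x[1], reverse=True)`. A strongly positive score means the set clusters among up-regulated genes, and a strongly negative one means it clusters among down-regulated genes.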
I have found several tutorials on DESeq/2, but I am not finding any one that gives a clean and comprehensive view on how to further process the data for downstream enrichment and visualization?
You will never find a 'clean and comprehensive' tutorial; everyone has their own take on it. DESeq2 is excellent at conducting analyses of [primarily] RNA-seq data, but it's not a gene enrichment program.
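DESeq2 stops at the results table, so the usual hand-off to an enrichment tool is a filtered gene list. This minimal Python sketch assumes the table was exported from R with `write.csv(as.data.frame(res), ...)`. That puts gene IDs in the first (unnamed) column, alongside DESeq2's `log2FoldChange` and `padj` columns.

The cutoffs (adjusted P < 0.05, |log2FC| >= 1) are common choices rather than rules, and `significant_genes` is a hypothetical helper.

```python
import csv
import io
from typing import List


def significant_genes(csv_text: str, padj_cutoff: float = 0.05, min_abs_lfc: float = 1.0) -> List[str]:
    """Return IDs of genes with padj < padj_cutoff and |log2FoldChange| >= min_abs_lfc."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    lfc_col = header.index("log2FoldChange")
    padj_col = header.index("padj")
    hits: List[str] = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        lfc, padj = row[lfc_col], row[padj_col]
        if lfc == "NA" or padj == "NA":
            continue  # DESeq2 reports NA for genes excluded by its filtering/outlier steps
        if float(padj) < padj_cutoff and abs(float(lfc)) >= min_abs_lfc:
            hits.append(row[0])  # first column holds the gene ID (R row names)
    return hits
```

For example, `significant_genes(open("deseq2_results.csv").read())` uses a hypothetical file name and returns a list you could paste into an enrichment tool.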
What is the difference between doing GO enrichment by CC vs. BP vs MF?
- CC, cellular component
- BP, biological process
- MF, molecular function
Think of these as sub-classifications. Each of these contains thousands of terms organised in a hierarchical fashion (strictly, a directed acyclic graph, in which a term can have more than one parent). Most people will be interested in just BP and MF.
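Because of this hierarchy, GO annotations are generally interpreted under the 'true path rule': a gene annotated to a specific term is implicitly annotated to all of that term's ancestors. This minimal Python sketch propagates annotations upwards. It assumes you have already parsed a `parents` mapping (child term to parent terms, e.g. from is_a/part_of relations). The toy term IDs and function names are hypothetical.

```python
from typing import Dict, Iterable, Mapping, Set


def ancestors(term: str, parents: Mapping[str, Iterable[str]], cache: Dict[str, Set[str]]) -> Set[str]:
    """All terms reachable from `term` by following parent links (memoised in `cache`)."""
    if term not in cache:
        found: Set[str] = set()
        for parent in parents.get(term, ()):
            found.add(parent)
            found |= ancestors(parent, parents, cache)
        cache[term] = found
    return cache[term]


def propagate(gene_terms: Mapping[str, Set[str]], parents: Mapping[str, Iterable[str]]) -> Dict[str, Set[str]]:
    """Add every ancestor term to each gene's direct annotations (the true path rule)."""
    cache: Dict[str, Set[str]] = {}
    result: Dict[str, Set[str]] = {}
    for gene, terms in gene_terms.items():
        full = set(terms)
        for term in terms:
            full |= ancestors(term, parents, cache)
        result[gene] = full
    return result
```

One side effect of propagation is that general terms near the root collect many genes. This is one reason why very broad terms often appear 'enriched' without being very informative.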
What is the difference between GO vs KEGG?
These are different organisations/groups.
- The Gene Ontology (GO) Consortium is based in the USA and is funded by the NHGRI. The consortium has been in existence for almost 20 years, and its aim is to define natural/healthy biological processes, molecular functions, and cellular components (as per the sub-classifications mentioned above). The annotations behind their categories and terms are based on either in silico or confirmed laboratory evidence (as per the evidence codes that I mentioned above).
- The Kyoto Encyclopaedia of Genes and Genomes (KEGG) is a database resource based in Japan. It has been in existence slightly longer than GO and is best known for its curation of pathways in human and other species. KEGG covers a lot of things other than pathways, though, and it includes both normal/healthy and disease-related pathways.
NB - it's important to remember that some GO terms relate to pathways too.
I am working with a non-model organism: in that case, is it best to do these analyses by matching the gene ID/name of my organism to the orthologous gene ID/name of a model organism? This may or may not be a good idea because certain pathways might differ between organisms, but what is the proposed solution?
If you use an enrichment tool like DAVID, your species of interest is most likely included in it. DAVID also lets you run enrichment against both GO and KEGG (and other databases) at the same time. On DAVID's main page, go to Functional Annotation, where you'll see a text box into which you can enter your genes.
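If your organism is not covered, the approach suggested in the question (mapping your genes to model-organism orthologues) can be sketched in Python as below. The sketch assumes you already have an orthology table, for example exported from Ensembl BioMart, loaded as a dictionary from your gene IDs to one or more model-organism IDs. `map_to_orthologs` is a hypothetical name.

Map your background gene list in the same way. Also keep track of the unmapped genes, because genes without a clear orthologue simply drop out of the analysis.

```python
from typing import Dict, Iterable, List, Sequence, Set, Tuple


def map_to_orthologs(
    genes: Iterable[str], ortholog_table: Dict[str, Sequence[str]]
) -> Tuple[List[str], List[str]]:
    """Translate gene IDs via an orthology table.

    Returns (sorted, de-duplicated model-organism IDs, genes with no orthologue).
    One-to-many orthologues contribute every mapped ID.
    """
    mapped: Set[str] = set()
    unmapped: List[str] = []
    for gene in genes:
        targets = ortholog_table.get(gene)
        if targets:
            mapped.update(targets)
        else:
            unmapped.append(gene)
    return sorted(mapped), unmapped
```

A large unmapped fraction is itself a warning sign: in that case, any enrichment result describes the model organism's annotation more than your own organism's biology.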
My advice to you is to do the enrichment but to be cautious about interpreting the results. It is quite easy to 'cherry pick' the enrichment terms that you want to see, i.e., those that fit your hypothesis(es). Even if you get lucky and everything you had hoped for comes up, I would still exercise caution. Don't get too excited by gene enrichment.
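Testing thousands of terms at once is one reason cherry picking is so easy, and it is why enrichment P values should be corrected for multiple testing. DAVID, for instance, reports a Benjamini P value for each term. Assuming this corresponds to the Benjamini-Hochberg false discovery rate procedure, the adjustment can be sketched in Python as follows (the function name is illustrative).

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest P first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity with a running minimum capped at 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

This should agree with R's `p.adjust(p, method = "BH")` on the same input. You would keep the terms whose adjusted value falls below your chosen false discovery rate (0.05 is a common choice).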
If you use DAVID, you can filter the enriched terms based on their Benjamini P value. For displaying gene enrichments, I would recommend simple displays like these: