Question: A good pipeline for GO term analysis on RNA-seq Clusters in R?
2.1 years ago, dtatarak wrote:

Hi all,

I'm a relative newcomer to RNA-seq analysis, and I am now at the point where I want to do GO term analysis on my dataset.

I have done hierarchical clustering of my dataset consisting of 800 differentially expressed genes from Zebrafish samples. I have identified clusters that are interesting based on their expression patterns, and I now want to look at the gene ontology within these clusters.
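For readers following along, the clustering step described above might look roughly like this in base R. This is only a sketch under assumptions: the expression matrix is simulated, the object names (`expr`, `genes_by_cluster`) are hypothetical, and the choice of correlation distance, average linkage, and k = 4 is arbitrary rather than anything the poster stated.

```r
# Simulated stand-in for a matrix of normalized expression values
# (rows = genes, columns = samples); all names here are hypothetical.
set.seed(1)
expr <- matrix(rnorm(800 * 6), nrow = 800,
               dimnames = list(paste0("gene", 1:800), paste0("sample", 1:6)))

# Scale each gene across samples so clustering reflects pattern, not magnitude
expr_scaled <- t(scale(t(expr)))

# Cluster genes on correlation distance and cut the tree into k groups
gene_dist  <- as.dist(1 - cor(t(expr_scaled)))
gene_tree  <- hclust(gene_dist, method = "average")
cluster_id <- cutree(gene_tree, k = 4)  # k = 4 is an arbitrary choice

# One character vector of gene IDs per cluster, ready for enrichment tools
genes_by_cluster <- split(names(cluster_id), cluster_id)
lengths(genes_by_cluster)
```

Each element of `genes_by_cluster` can then be pasted into a web tool or passed to an enrichment function, one cluster at a time.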

I have looked at several R packages for GO term analysis online including clusterProfiler and GOexpress. But the documentation leaves something to be desired for an R newbie like myself. Does anyone have a suggestion for a GO term analysis pipeline they have used in R? Thank you very much!

Best, David Tatarakis

rna-seq go terms R • 3.1k views
modified 2.1 years ago by caggtaagtat • written 2.1 years ago by dtatarak

Could you please share your RNA-Seq pipeline, with commands (DESeq2 and hierarchical clustering), with me? I am also a newcomer and am trying to analyze RNA-Seq data from zebrafish. Thank you in advance.

written 2.1 years ago by rminhas
2.1 years ago, Kevin Blighe (University College London) wrote:

Coincidence, but my recommendation for you, David, is to use DAVID. That's an acronym for Database for Annotation, Visualization and Integrated Discovery. It is quite possibly the easiest tool to use for someone just starting out with gene enrichment. To help, I've shown how one can do enrichment in my tutorial here: Clustering of DAVID gene enrichment results from gene expression studies

There are many other tools out there, but their implementation can be tricky due to annotation issues. With DAVID, you can supply your genes in various annotation formats, as you'll see, and it will even attempt to identify the annotation format automatically, if you wish.


modified 4 months ago • written 2.1 years ago by Kevin Blighe

DAVID is definitely good, but is there a way to present the results graphically instead of the standard tables?

written 4 months ago by Arindam Ghosh

I had a tutorial on Biostars previously, specifically for how to plot the results of DAVID as a heatmap; however, as new package versions were released, the tutorial fell into disrepair.

Essentially, you could create a gene X GO term [or KEGG pathway, etc] binary matrix, and shade cells in the heatmap white for 0, and green or any other colour for 1.
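A minimal base R sketch of that idea follows. The enrichment table is made up for illustration, and the column names `term` and `genes` (with genes as a comma-separated string) are assumptions about how an exported result might look, not the actual DAVID output format.

```r
# Hypothetical enrichment output: one row per term, with member genes
# stored as a comma-separated string (illustrative data only)
enrich <- data.frame(
  term  = c("immune response", "cell cycle", "translation"),
  genes = c("geneA, geneB, geneC", "geneB, geneD", "geneA, geneD, geneE"),
  stringsAsFactors = FALSE
)

# Split the gene strings into one character vector per term
genes_per_term <- strsplit(enrich$genes, ",\\s*")
names(genes_per_term) <- enrich$term
all_genes <- sort(unique(unlist(genes_per_term)))

# Build the gene x term matrix: 1 if the gene belongs to the term, else 0
membership <- sapply(genes_per_term, function(g) as.integer(all_genes %in% g))
rownames(membership) <- all_genes

# White for 0, green for 1; no reordering or scaling of the matrix
heatmap(membership, Rowv = NA, Colv = NA, scale = "none",
        col = c("white", "forestgreen"), margins = c(10, 6))
```

With hundreds of terms the matrix becomes hard to read, so it probably makes sense to filter to the most significant terms first.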

written 4 months ago by Kevin Blighe

I used PANTHER to identify enriched biological processes in ~4000 genes and obtained ~500 "GO biological process complete" terms after filtering at FDR < 0.05. I guess that many terms would not work well in a heatmap. REVIGO treemaps helped reduce the redundancy, though, and made a good figure for publication.

written 4 months ago by Arindam Ghosh

You could just plot the top 20 terms as a barplot of -log10(FDR)?

Here, I use base R: A: DAVID functional Analysis and its visualization of GO terms using Bar plot

Using ggplot2 would be nicer, though
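As a rough illustration of the top-20 barplot in base R (not the code from the linked answer), something like the following could work. The table is simulated, and the column names `term` and `fdr` are assumptions; a real DAVID or PANTHER export would need its columns renamed accordingly.

```r
# Hypothetical enrichment table; in practice this would be read from an
# exported results file (column names "term" and "fdr" are assumptions)
set.seed(42)
enrich <- data.frame(
  term = paste("GO term", 1:50),
  fdr  = 10^(-runif(50, min = 0, max = 8)),
  stringsAsFactors = FALSE
)

# Keep the 20 terms with the smallest FDR and convert to the -log10 scale
top <- head(enrich[order(enrich$fdr), ], 20)
top$score <- -log10(top$fdr)

# Reverse the order so the most significant term is drawn at the top
top <- top[rev(seq_len(nrow(top))), ]

op <- par(mar = c(5, 10, 2, 1))  # widen the left margin for term labels
barplot(top$score, names.arg = top$term, horiz = TRUE, las = 1,
        xlab = "-log10(FDR)", col = "steelblue")
abline(v = -log10(0.05), lty = 2)  # dashed reference line at FDR = 0.05
par(op)
```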

modified 4 months ago • written 4 months ago by Kevin Blighe
2.1 years ago, Zhilong Jia wrote:
  1. The ToppCluster web server, although it has not been working recently.
  2. cogena (co-expressed gene set enrichment analysis), although it needs the zebrafish GO gene sets as a GMT file.
  3. clusterProfiler. Its GO analyses (groupGO(), enrichGO() and gseGO()) support any organism that has an OrgDb object available, so it supports zebrafish. ref:

In summary, clusterProfiler is probably the easiest option if you are comfortable programming. Otherwise, use the DAVID web server as recommended by @Kevin, analyzing one cluster at a time.
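A hedged sketch of the per-cluster clusterProfiler approach is below. It assumes the Bioconductor packages clusterProfiler and org.Dr.eg.db are installed and that gene IDs are Ensembl IDs; the helper name `go_per_cluster` and its arguments are hypothetical, and the cutoffs are illustrative rather than recommended values.

```r
# Run GO over-representation analysis separately for each cluster.
# genes_by_cluster: named list of character vectors of gene IDs
# universe: all genes that were tested (the background set)
# The function name and its defaults are illustrative assumptions.
go_per_cluster <- function(genes_by_cluster, universe,
                           key_type = "ENSEMBL", ontology = "BP") {
  for (pkg in c("clusterProfiler", "org.Dr.eg.db")) {
    if (!requireNamespace(pkg, quietly = TRUE)) {
      stop("Package '", pkg, "' is required; install it from Bioconductor.")
    }
  }
  lapply(genes_by_cluster, function(genes) {
    clusterProfiler::enrichGO(
      gene          = genes,
      OrgDb         = org.Dr.eg.db::org.Dr.eg.db,  # zebrafish annotation
      keyType       = key_type,
      ont           = ontology,
      universe      = universe,
      pAdjustMethod = "BH",
      pvalueCutoff  = 0.05,
      qvalueCutoff  = 0.05
    )
  })
}

# Usage (not run here; `genes_by_cluster` and `expr` are hypothetical objects):
# results <- go_per_cluster(genes_by_cluster, universe = rownames(expr))
# head(as.data.frame(results[[1]]))
```

Passing the full set of tested genes as `universe` keeps the background consistent across clusters, which is usually preferable to the default of all annotated genes.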

Another related post: C: Compare sets of GO enrichments

written 2.1 years ago by Zhilong Jia
2.1 years ago, caggtaagtat wrote:

For gene set enrichment analysis (GSEA), I use the R package "EGSEA". It combines 12 prominent GSEA algorithms available for R and produces a consensus ranking of biologically relevant results.

The results can then be passed to REVIGO, for example, to visualize changes in GO families.

written 2.1 years ago by caggtaagtat