Hello Biostars, can anyone tell me how to prepare an input data set for GSEA after differential gene expression analysis with DESeq2? How should I rank the genes: by log2FC or by adjusted p-value? Is there any way to generate GSEA-ready data directly from DESeq2? I was using topGO for gene ontology enrichment analysis before and recently came across GSEA. Which one is better, GO enrichment analysis or GSEA? Even after going through the papers I couldn't find a significant difference between the two.
Thank you
I like DESeq2. It would be great to have in the future something like ROAST/CAMERA/GSEA in DESeq2 too!
Hi Sreeraj,
I don't know what your model organism is. For human, mouse, Drosophila and similar organisms I guess it's easy, because you can use databases available online and Ensembl annotations. I took an online course on RNA-seq data analysis with HUMAN data, so I can share what I learned if that's helpful. I haven't tried this on my own data yet, but here's what I know.
For GSEA, first install these packages in R:
install.packages("BiocManager")
BiocManager::install(version = "3.16")
BiocManager::install("DESeq2")
BiocManager::install("clusterProfiler")
BiocManager::install("org.Hs.eg.db")
org.Hs.eg.db is an organism-specific annotation package; this one is for human. You may be able to find annotations for other organisms here: http://geneontology.org/ or you can build your own data set if you work with a non-model organism. I'm not an expert and I'm still learning, but it's DOABLE. I asked a similar question myself, which might help you: GSEA on nonmodel organisms
Run DESeq() on your DESeqDataSet (dds), and once you have the results you can drop the rows that contain NA like this:
dds_results_filtered <- dds_results[complete.cases(dds_results), ]
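To see what that filter actually throws away, here is a minimal sketch on a toy table. The data frame below is a made-up stand-in for a DESeq2 results table (the gene names and values are invented, and a real results object has more columns). As far as I understand, DESeq2 can report NA in padj for genes removed by its independent filtering, which is why rows like geneC show up.

```r
# Toy stand-in for a DESeq2 results table (values are made up for illustration).
dds_results <- data.frame(
  log2FoldChange = c(2.1, -1.4, 0.3, NA),
  pvalue         = c(0.001, 0.020, 0.600, NA),
  padj           = c(0.004, 0.045, NA, NA),
  row.names      = c("geneA", "geneB", "geneC", "geneD")
)

# Keep only rows where every column has a value.
dds_results_filtered <- dds_results[complete.cases(dds_results), ]

# Genes dropped by the filter: anything with an NA in any column.
dropped <- setdiff(rownames(dds_results), rownames(dds_results_filtered))
print(dropped)
```

Note that the filtered table also becomes the background (universe) for the enrichment test further down, so it is worth checking how many genes were dropped.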
I think you should filter on adjusted p-values, because those are what indicate SIGNIFICANT differences after multiple-testing correction.
Then you can make a list of just the significantly upregulated genes like this:
upreg <- rownames(dds_results_filtered)[dds_results_filtered$padj < 0.05 & dds_results_filtered$log2FoldChange > 0]
Then load your libraries:
library(clusterProfiler)
library(org.Hs.eg.db)
Then you do GSEA:
gsea <- enrichGO(upreg, OrgDb = org.Hs.eg.db, keyType = "ENSEMBL", ont = "BP", universe = rownames(dds_results_filtered))
Then you can simplify the result (this removes redundant GO terms):
gsea <- simplify(gsea)
Extract the results from gsea into a nice table; the terms listed first are the most significant:
gsea_df <- as.data.frame(gsea)
Additionally, to open the table in Excel, you can write it to a tab-separated file:
write.table(gsea_df, file = "gsea.tsv", sep = "\t")
And finally, to see a nice dot plot of, for example, the top 13 categories:
dotplot(gsea, showCategory = 13)
Then you can repeat this for the downregulated genes.
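A minimal sketch of that repeat step, splitting the significant genes by the sign of the fold change. The toy table and the `alpha` cutoff of 0.05 on adjusted p-values are assumptions for illustration, not values from real data.

```r
# Toy stand-in for the NA-filtered results table (made-up values).
dds_results_filtered <- data.frame(
  log2FoldChange = c(2.1, -1.4, 0.3, -2.7),
  padj           = c(0.004, 0.045, 0.700, 0.001),
  row.names      = c("geneA", "geneB", "geneC", "geneD")
)

alpha <- 0.05  # assumed significance cutoff on the adjusted p-value

# Flag significant genes once, then split by direction of change.
sig     <- dds_results_filtered$padj < alpha
upreg   <- rownames(dds_results_filtered)[sig & dds_results_filtered$log2FoldChange > 0]
downreg <- rownames(dds_results_filtered)[sig & dds_results_filtered$log2FoldChange < 0]

print(upreg)
print(downreg)
```

The `downreg` vector would then go into the same enrichGO() call in place of `upreg`, with the same universe.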
Hope this helps.
Lada
Just a comment: this is not really a gene set enrichment analysis, but rather an over-representation test.
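For an actual GSEA you need a ranked list of all genes rather than a thresholded subset. Below is a minimal sketch of building one from a results table. Ranking by the `stat` column (the Wald statistic, which DESeq2 reports when the Wald test is used) is one common choice and an assumption here; log2FC or a signed p-value score are alternatives, and this thread does not settle which is best. The toy values and the output file name are made up.

```r
# Toy stand-in for a DESeq2 results table (made-up values).
res <- data.frame(
  log2FoldChange = c(2.1, -1.4, 0.3, -2.7),
  stat           = c(4.2, -2.3, 0.5, -5.1),
  row.names      = c("geneA", "geneB", "geneC", "geneD")
)

# Drop genes without a usable ranking value.
res <- res[!is.na(res$stat), ]

# Named numeric vector, sorted from most up- to most down-regulated.
ranks <- sort(setNames(res$stat, rownames(res)), decreasing = TRUE)
print(ranks)

# Two-column, tab-separated file without a header, in the style of a .rnk file.
rnk_file <- file.path(tempdir(), "deseq2_ranked.rnk")  # hypothetical file name
write.table(data.frame(gene = names(ranks), score = ranks),
            file = rnk_file, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
```

As far as I know, a decreasingly sorted named vector like `ranks` is the shape that clusterProfiler's gseGO() takes as its geneList argument, and a two-column file like this is what the GSEA Preranked tool reads, but check the documentation of whichever tool you use.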