GSEA on nonmodel organisms
14 months ago
Lada ▴ 30

Hi, I want to run GSEA in R on significantly differentially expressed genes from non-model species (five in total).

My research is based on cross-species comparative transcriptomics. This is what I have done so far:

  • I already have species-specific: de novo assemblies, annotations (across 7 different databases), quantification (read counts), CDS predictions...
  • Next, I ran TransDecoder, selected the longest ORFs, and used these peptide sequences to detect single-copy orthologues across species with OrthoFinder.
  • I assigned gene lengths and gene counts to create a gene expression matrix.
  • Now I am planning to do differential expression analysis on my orthologues (still learning what is the best approach since I have to do some kind of normalization to account for different species/transcriptomes). I guess this is another topic...

  • Let's say I have my DEG list and I want to do GSEA. I learned how to do that in R for human RNA-seq data, and one step is loading the human database (https://www.gsea-msigdb.org/gsea/msigdb). My question is: what should I do when I have non-model species? How do I make my own database?

Can I make it from my annotation list? If yes, how do I do that?

p.s. I am working on isopods with no reference genomes. :)

Thanks!

Lada

GSEA RNAseq transcriptomics multispecies R • 2.1k views
14 months ago
jv ★ 1.8k

Yes, you can create user-defined gene sets. How you do that, i.e., what format the gene sets need to be in, depends on the tool you are planning to use for your gene set enrichment analysis.

A few examples include:

For GSEA from the Broad/UCSD, you need a GMT or GMX file: https://www.gsea-msigdb.org/gsea/doc/GSEAUserGuideTEXT.htm#_Creating_Gene_Sets
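
To make the GMT option concrete, here is a minimal sketch of turning a gene-to-term annotation list into GMT text, where each line holds a set name, a description, and then the member genes, all tab-separated. It is written in Python only for brevity; the same reshaping can be done in R. The orthogroup IDs, GO terms, and the helper names `build_gene_sets` and `format_gmt` are made up for illustration and are not part of any GSEA tool.

```python
from collections import defaultdict


def build_gene_sets(pairs):
    """Group (gene_id, term_id) pairs into {term_id: sorted list of gene_ids}."""
    sets = defaultdict(set)
    for gene_id, term_id in pairs:
        sets[term_id].add(gene_id)
    return {term: sorted(genes) for term, genes in sets.items()}


def format_gmt(gene_sets, descriptions=None):
    """Return GMT text: one line per set = name, description, genes (tab-separated)."""
    descriptions = descriptions or {}
    lines = []
    for term in sorted(gene_sets):
        # "na" is a placeholder used when no description is available
        desc = descriptions.get(term, "na")
        lines.append("\t".join([term, desc] + gene_sets[term]))
    return "\n".join(lines) + "\n"


# Hypothetical annotations: orthogroup ID -> GO term (illustrative values only)
pairs = [
    ("OG0001", "GO:0006950"),
    ("OG0002", "GO:0006950"),
    ("OG0002", "GO:0008152"),
]
gmt_text = format_gmt(build_gene_sets(pairs), {"GO:0006950": "response to stress"})
print(gmt_text, end="")
```

Saving `gmt_text` to a file with a `.gmt` extension should give something the GSEA application can load as a custom gene set database, provided the gene IDs match those in your expression data or ranked list.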

For clusterProfiler:

clusterProfiler provides enricher function for hypergeometric test and GSEA function for gene set enrichment analysis that are designed to accept user defined annotation. They accept two additional parameters TERM2GENE and TERM2NAME. As indicated in the parameter names, TERM2GENE is a data.frame with first column of term ID and second column of corresponding mapped gene and TERM2NAME is a data.frame with first column of term ID and second column of corresponding term name. TERM2NAME is optional.

https://guangchuangyu.github.io/2015/05/use-clusterprofiler-as-an-universal-enrichment-analysis-tool/
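
As a rough sketch of how a TERM2GENE table could be built from an annotation list, the snippet below expands rows of the form "gene ID, delimited GO terms" into the two-column layout described above (term ID first, gene ID second). It assumes your annotation table stores several terms per gene separated by `;`, which may not match your files; the IDs and the helper names `term2gene_rows` and `to_tsv` are hypothetical. Python is used here just for the reshaping step.

```python
import csv
import io


def term2gene_rows(annotation_rows, sep=";"):
    """Expand (gene_id, 'GO:1;GO:2') rows into unique (term_id, gene_id) rows."""
    seen = set()
    rows = []
    for gene_id, terms in annotation_rows:
        for term in terms.split(sep):
            term = term.strip()
            # Skip empty fields (unannotated genes) and duplicate pairs
            if term and (term, gene_id) not in seen:
                seen.add((term, gene_id))
                rows.append((term, gene_id))
    return rows


def to_tsv(rows, header=("term", "gene")):
    """Serialise rows as tab-separated text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical annotation list: one row per gene, GO terms separated by ';'
annotations = [
    ("OG0001", "GO:0006950; GO:0008152"),
    ("OG0002", "GO:0006950"),
    ("OG0003", ""),  # no annotation, so it contributes no rows
]
table = to_tsv(term2gene_rows(annotations))
print(table, end="")
```

Written to a file, this table can be read into R as a data.frame (for example with `read.delim`) and passed as `TERM2GENE` to `enricher` or `GSEA`. A `TERM2NAME` table can be built the same way from term IDs and term names.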


OK, great! Thank you so much for your quick response! I was thinking of clusterProfiler since I have already tried it, but it's good to know there are other options too.

14 months ago

If this is of any use... I wrote a simple command-line script to make an AnnotationDbi package starting from a GFF file and a table assigning gene IDs to GO terms. In fact, the GFF file is not strictly needed. The script is makeBioconductorAnnotationDbi

I've been using it to make annotation packages suitable for GO analysis with clusterProfiler for non-model organisms.


Yes, it is useful for sure! Thank you so much! It's important for me to know that I can a) do that for non-model species and b) have some starting steps to create my own database. I am still learning to work in Linux and R environments, so I might ask for help again if I get stuck making the annotation package.
