Question

Gene set enrichment (GSEA) - Arabidopsis - Public RNA-seq data

3.7 years ago

sumanth • 0

Hi all,

I am working on an RNA-seq dataset from Arabidopsis, and we have generated a few lists of differentially expressed genes (DEGs; wild type vs. mutant and mock vs. treatment). The data come from a new Arabidopsis mutant that has not yet been studied. We therefore decided to run a gene set enrichment analysis to find out whether the majority (or any) of the DEGs from the mutant match DEGs from publicly available Arabidopsis RNA-seq datasets. For this I used the gene set enrichment tool from Expression Atlas, but it is limited to 100 genes. As I have a few hundred DEGs from each comparison, I would like to run the enrichment on the whole set rather than on 100 genes. Is that possible with Expression Atlas, either on the web server or standalone in R?

If not, could someone please recommend other resources that can do this with public RNA-seq data from Arabidopsis (e.g. GEO)? I have tried GEO Profiles and GEO2R, but they are limited to one gene and/or one GEO series.

Some people have performed a meta-analysis (raw data download, mapping, counting and DEG analysis) on a few experiments/samples that are of interest to them. But as we first want an unbiased overview, I am looking for a general database/algorithm/script to do this. Moreover, a meta-analysis is practically impossible if the approach is to remain unbiased. Any suggestions are highly appreciated.

Thanks in advance.

RNA-Seq GSEA Arabidopsis • 2.7k views

updated 3.7 years ago by ashish ▴ 680 • written 3.7 years ago by sumanth • 0

score 1 · Answer 1 · 2020-08-14

3.7 years ago

ashish ▴ 680

You can do all of this in R.

1. Use the ExpressionAtlas package to retrieve gene counts and metadata for the experiments of interest.
2. Use DESeq2 to get DEGs.
3. Create a named vector of DEGs with their log fold changes and sort it.
4. Use clusterProfiler to run gene set enrichment for GO terms and KEGG pathways and to make nice plots.

Here is a tutorial for steps 3 and 4. In the tutorial, use org.At.tair.db as the annotation package instead of the Drosophila one, and use TAIR as the keytype.

Thank you @ashish. I am mainly looking to compare my DEG set with DEG sets from public resources. Given that there are ~600 Arabidopsis datasets in Expression Atlas, do I have to retrieve raw counts and calculate the DEGs for all of them, as you describe in steps 1 and 2? Or does the ExpressionAtlas R package store differential expression results that could be used to compute the enrichment directly against my DEG list with Fisher's exact test? It seems that the web/API version of Expression Atlas only runs Fisher's exact test against the pre-calculated list of DEGs for each experiment.
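
To make the Fisher's exact test idea concrete, here is a minimal, language-agnostic sketch in plain Python (standard library only). It is not how Expression Atlas implements its enrichment; it only illustrates a one-sided test on the overlap between two DEG lists. The function name, the gene identifiers and the 20-gene background are all hypothetical, and the choice of background (for example, all genes tested in both experiments) is an assumption you would need to make for real data. In R, the built-in `fisher.test` covers the same ground.

```python
from math import comb


def overlap_enrichment_p(my_degs, public_degs, background):
    """One-sided Fisher's exact test for the overlap of two gene sets.

    Returns (overlap_size, p_value), where p_value is the hypergeometric
    probability of seeing at least this many shared genes by chance.
    Genes outside the background are ignored.
    """
    universe = set(background)
    mine = set(my_degs) & universe
    public = set(public_degs) & universe
    n_universe, n_mine, n_public = len(universe), len(mine), len(public)
    k = len(mine & public)

    # Sum the upper tail: P(X >= k) when drawing n_mine genes from the
    # universe, n_public of which are "successes".
    total = comb(n_universe, n_mine)
    tail = sum(
        comb(n_public, i) * comb(n_universe - n_public, n_mine - i)
        for i in range(k, min(n_mine, n_public) + 1)
    )
    return k, tail / total


# Hypothetical toy data: 20 background genes, 5 of my DEGs, 6 public DEGs.
background = [f"GENE{i:02d}" for i in range(1, 21)]
my_degs = background[0:5]       # GENE01..GENE05
public_degs = background[2:8]   # GENE03..GENE08

overlap, p_value = overlap_enrichment_p(my_degs, public_degs, background)
print(overlap, round(p_value, 4))
```

When this is repeated across hundreds of public experiments, the resulting p-values would need a multiple-testing correction before interpretation.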

3.7 years ago by sumanth • 0

As far as I know, ExpressionAtlas cannot be used to get DEGs directly; you will have to run DESeq2 on the counts yourself. It only takes 3-4 lines of R code, and you can always use a for loop to automate the process. Why do you want to compare your DEG list with all available Arabidopsis experiments? Shouldn't you be comparing it to experiments with a design similar to yours?

3.7 years ago by ashish ▴ 680

Sure. In that case, I will run DESeq2 to get the DEGs.

Regarding the other question: we just want to check whether any of the genes that are misregulated in our mutants are also differentially expressed in other mutants/conditions. We believe this may provide another layer of information, apart from GO and pathway enrichment.
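
As a rough illustration of that check, the sketch below (plain Python, standard library only) lists which of my DEGs reappear in each public DEG set and ranks the experiments by the number of shared genes. It assumes the public DEG lists have already been computed, for example with DESeq2; the function name, experiment labels and gene identifiers are hypothetical placeholders, not real accessions.

```python
def shared_genes_by_experiment(my_degs, public_deg_sets):
    """Rank public experiments by how many of my DEGs they share.

    public_deg_sets maps an experiment label to an iterable of gene IDs.
    Returns a list of (experiment, n_shared, shared_genes) tuples, sorted
    by n_shared (descending) and then by experiment label.
    """
    mine = set(my_degs)
    rows = []
    for experiment, degs in public_deg_sets.items():
        shared = sorted(mine & set(degs))
        rows.append((experiment, len(shared), shared))
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows


# Hypothetical toy input: gene IDs and experiment labels are made up.
my_degs = ["GENE01", "GENE02", "GENE03", "GENE04"]
public_deg_sets = {
    "EXP_A": ["GENE02", "GENE09"],
    "EXP_B": ["GENE01", "GENE03", "GENE04", "GENE10"],
    "EXP_C": ["GENE11", "GENE12"],
}

ranking = shared_genes_by_experiment(my_degs, public_deg_sets)
for experiment, n_shared, shared in ranking:
    print(experiment, n_shared, shared)
```

Raw overlap counts favour experiments with long DEG lists, so a ranking like this is only a first look and would normally be paired with a statistical test on each overlap.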

3.7 years ago by sumanth • 0