Question: GSEA with gprofiler2 ("annotated" vs "known" for background control?)
drodavis0 wrote:

Dear BioStars Community,

I'm hoping someone can offer advice on gprofiler2, specifically on the background control used when running a gene set enrichment analysis (by background I mean the gene list the query is compared against, I believe). Two of the options in this package are "annotated", which restricts the background to "genes with at least one annotation", and "known", which uses all the known genes in the organism (see https://rdrr.io/cran/gprofiler2/f/vignettes/gprofiler2.Rmd).

When I compare my experimental gene list against the "known" background I get many more results from gprofiler2 (which covers gene ontology, TRANSFAC analysis, etc.), but with "annotated" I get very few or no results. My inclination is to use "known", but I'm not sure whether that is reliable or whether there are other considerations I should weigh before going with it.

Does anyone have any advice on or experience of this?

I really appreciate any advice or help people can offer - thanks!

gsea rna-seq ggprofiler2 • 66 views
modified 11 days ago by ATpoint • written 11 days ago by drodavis0
ATpoint wrote:

When I use gprofiler2 for term enrichment analysis (differential genes tested against terms from KEGG/REACTOME), I use all genes that were analysed in the DE analysis as the background. As I usually use edgeR or DESeq2, this would be (for edgeR) the genes that survive the filterByExpr filter and (for DESeq2) the genes that are not NA after running results, i.e. those surviving the independent/outlier filtering. That said, the option you describe defines which genes the tool considers as background. With "annotated", only genes that have some annotation are considered. Imagine your gene is a poorly annotated non-coding RNA without any known function: "annotated" would probably ignore that gene while "known" would include it. That changes the total number of genes in the background and therefore the p-value calculations. To be honest, I have never changed the defaults, so I use "annotated", meaning that only the genes from my custom background that have some kind of annotation are actually considered. That is probably reasonable, as unannotated genes carry no information for this kind of analysis and therefore should probably be ignored.
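
To make the point about background size and p-values concrete, here is a minimal Python sketch of a plain hypergeometric over-representation test. It is not gprofiler2's own implementation and it applies no multiple-testing correction. All numbers (background sizes, term size, query size, overlap) are hypothetical, and the query size and overlap are held fixed for simplicity, even though in practice switching the background can also change which query genes are counted.

```python
from math import comb


def hypergeom_enrichment_p(N, K, n, k):
    """Upper-tail hypergeometric probability P(X >= k).

    N: number of genes in the background
    K: number of background genes annotated to the term
    n: number of genes in the query list
    k: number of query genes annotated to the term
    """
    upper = min(K, n)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Hypothetical numbers, chosen only for illustration.
term_size = 100    # genes annotated to the term
query_size = 200   # genes in the query list
overlap = 8        # query genes annotated to the term

# Smaller background (stand-in for "annotated") vs larger one (stand-in for "known").
p_annotated = hypergeom_enrichment_p(N=12000, K=term_size, n=query_size, k=overlap)
p_known = hypergeom_enrichment_p(N=20000, K=term_size, n=query_size, k=overlap)

print(f"smaller background: p = {p_annotated:.3g}")
print(f"larger background:  p = {p_known:.3g}")
```

With the same overlap, the larger background makes the term look rarer, so the p-value comes out smaller. That is one plausible reason why "known" returns more significant terms than "annotated".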

Note that what gprofiler2 (the gost function) does is not GSEA (Gene Set Enrichment Analysis). It tests a list of genes (e.g. differential genes) for enrichment of functional terms. GSEA, in contrast, checks whether your entire transcriptome shows a tendency to be up- or downregulated as a whole for functional terms. For this you rank your genes (all genes) by a metric, e.g. fold change or p-value, and then examine the distribution of ranks of the genes that overlap each term you check against. The question in GSEA is whether a gene set as a whole shows evidence of over- or underexpression; it does not ask whether a user-defined set of genes (e.g. differential ones) is enriched for certain terms. That means a GSEA result can be significant even if not a single gene in your analysis is differential in a pairwise analysis.
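
The difference between the two approaches can be sketched with a toy example in Python. The function below is a simplified, unweighted running-sum score in the spirit of GSEA. It is not the full GSEA algorithm (no rank weighting, no permutation-based p-value). The gene names, log fold changes, gene set and cutoff are all made up for illustration.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum score over a ranked gene list.

    Walk down the ranking, stepping up for genes in the set and down for
    genes outside it, and return the largest deviation from zero.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(ranked_genes) - n_hit
    running, best = 0.0, 0.0
    for is_hit in hits:
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical log fold changes for 20 genes, from +0.60 down to -0.54.
log_fc = {f"g{i}": 0.6 - 0.06 * i for i in range(20)}

# Hypothetical gene set whose members all show a modest upward shift.
gene_set = {"g0", "g1", "g2", "g4", "g5"}

# List-based approach: only genes passing a cutoff enter the query list.
cutoff = 1.0  # assumed |log fold change| threshold
differential = [g for g, lfc in log_fc.items() if abs(lfc) >= cutoff]

# GSEA-style approach: rank all genes by the metric and score the set.
ranked = sorted(log_fc, key=log_fc.get, reverse=True)
es = enrichment_score(ranked, gene_set)

print(f"genes passing the cutoff: {len(differential)}")
print(f"running-sum enrichment score: {es:.3f}")
```

In this toy data no gene passes the cutoff, so a list-based test has nothing to work with, yet the set members cluster at the top of the ranking and the running-sum score is high.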

written 11 days ago by ATpoint

Thanks very much - makes sense!

written 9 days ago by drodavis0