Question

GSEA produces too few enriched sets

6.9 years ago

crimsontabaq ▴ 70

We have a transcriptome of a non-model organism. Comparing two conditions, our kallisto-based analysis yielded ~6000 differentially expressed genes (DEGs). We used the KEGG metabolic pathways of a related organism to classify the DEGs and check for pathway enrichment. These categories are relatively small (3-100 genes/set). Although the number of DEGs is so high, the number of sets with a significant adjusted p-value is quite small: only 10-15 sets are enriched at a padj cutoff of 0.05. The same thing happens when we apply Fisher's exact test.

We're newbies in the field and it feels like we've missed something. What are we doing wrong? Sorry if I've left out any details.

gsea kallisto transcriptome • 2.0k views

ADD COMMENT • link updated 6.9 years ago by Kristoffer Vitting-Seerup ★ 4.1k • written 6.9 years ago by crimsontabaq ▴ 70

Why is that wrong? What reasons do you have to expect more enriched gene sets in your experiment? What I don't understand is how you used KEGG metabolic pathways of a related organism "to classify DEGs and check pathways enrichment": did you first classify the DEGs based on your knowledge and then run a GSEA for each group of DEGs? Usually one tests for enrichment across all genes, regardless of whether they are DEGs or not.
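To illustrate what testing "across all genes" can look like, here is a minimal sketch of an unweighted running-sum enrichment score of the kind GSEA is built on. It is only an illustration, not the GSEA software itself: the function name `running_sum_es`, the gene IDs and the ranking are all made up, and real GSEA additionally weights hits by the ranking metric and estimates significance by permutation.

```python
def running_sum_es(ranked_genes, gene_set):
    """Unweighted (Kolmogorov-Smirnov-like) enrichment score.

    ranked_genes: every detected gene, ordered from most up- to most
    down-regulated (DEGs and non-DEGs alike).
    gene_set: the set of genes annotated to one pathway.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        return 0.0  # the score is undefined without both hits and misses

    running = 0.0
    best = 0.0
    for is_hit in hits:
        # Step up on a pathway gene, step down on any other gene.
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        # Keep the largest deviation from zero, with its sign.
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical ranked list of ten genes, g1 (most up) to g10 (most down).
ranked = [f"g{i}" for i in range(1, 11)]
es_top = running_sum_es(ranked, {"g1", "g2", "g3"})      # set at the top
es_bottom = running_sum_es(ranked, {"g8", "g9", "g10"})  # set at the bottom
```

A set concentrated at the top of the ranking gets a strongly positive score and one concentrated at the bottom a strongly negative score, without any DEG cutoff being applied beforehand.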

ADD REPLY • link 6.9 years ago by Lluís R. ★ 1.2k

We blastx'ed our transcripts against the proteins of a related organism whose genes are classified into KEGG pathways, so we can now assign matching transcripts to these pathways. Based on previous experiments, there's strong evidence that some groups should be enriched with DEGs, but they aren't; moreover, the states being compared are radically different on a physiological level.
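Roughly, the grouping step looks like the sketch below. All identifiers are invented for illustration (the transcript IDs, protein IDs and pathway IDs are placeholders, and `group_transcripts_by_pathway` is just a name chosen here); it assumes one best blastx hit per transcript.

```python
from collections import defaultdict


def group_transcripts_by_pathway(best_hits, protein_to_pathways):
    """Assign each transcript to the pathways of its best-hit protein.

    best_hits: dict mapping transcript ID -> best blastx hit (protein ID).
    protein_to_pathways: dict mapping protein ID -> iterable of pathway IDs.
    Transcripts whose hit has no pathway annotation are left out.
    """
    pathway_to_transcripts = defaultdict(set)
    for transcript, protein in best_hits.items():
        for pathway in protein_to_pathways.get(protein, ()):
            pathway_to_transcripts[pathway].add(transcript)
    return dict(pathway_to_transcripts)


# Hypothetical inputs: three transcripts, two of which hit annotated proteins.
best_hits = {"tx1": "protA", "tx2": "protB", "tx3": "protC"}
protein_to_pathways = {
    "protA": ["path00010"],
    "protB": ["path00010", "path00020"],
}
gene_sets = group_transcripts_by_pathway(best_hits, protein_to_pathways)
```

Note that transcripts without an annotated hit (like `tx3` here) drop out of every set, so the effective background for the enrichment test is smaller than the full transcriptome.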

ADD REPLY • link 6.7 years ago by crimsontabaq ▴ 70

score 0 · Answer 1 · 2017-06-19

6.9 years ago

Kristoffer Vitting-Seerup ★ 4.1k

One possible explanation is that you have "too many" differentially expressed genes, in the sense that if 30-50% of your detected genes are differentially expressed, it is very hard to get large enrichments. I would try a stricter DE cutoff, for example by filtering on the log2FCs.
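A small sketch of why this happens, using only the Python standard library. The numbers are assumptions for illustration: 15,000 detected genes (so that ~6000 DEGs is 40%), a pathway of 10 genes with 7 DEGs in it, and a |log2FC| cutoff of 1. The one-sided hypergeometric tail computed here is the same quantity a one-sided Fisher's exact test reports.

```python
from math import comb


def hypergeom_upper_tail(total, n_de, set_size, de_in_set):
    """P(X >= de_in_set) when set_size genes are drawn at random from
    `total` detected genes, of which n_de are differentially expressed."""
    denom = comb(total, set_size)
    top = min(set_size, n_de)
    numer = sum(
        comb(n_de, k) * comb(total - n_de, set_size - k)
        for k in range(de_in_set, top + 1)
    )
    return numer / denom


def max_fold_enrichment(total, n_de):
    """Best possible fold enrichment: a set made up entirely of DEGs."""
    return total / n_de


def filter_by_log2fc(results, min_abs_log2fc=1.0, max_padj=0.05):
    """Keep genes passing both a padj and an absolute log2FC cutoff.

    results: iterable of (gene_id, log2fc, padj) tuples (hypothetical layout).
    """
    return [
        gene for gene, log2fc, padj in results
        if padj <= max_padj and abs(log2fc) >= min_abs_log2fc
    ]


# Same pathway (10 genes, 7 of them DE), two different DEG fractions.
p_many = hypergeom_upper_tail(15000, 6000, 10, 7)  # 40% of genes are DE
p_few = hypergeom_upper_tail(15000, 1500, 10, 7)   # 10% of genes are DE

# Hypothetical DE table: only geneA passes both cutoffs.
table = [("geneA", 2.3, 0.001), ("geneB", 0.4, 0.001), ("geneC", -1.8, 0.2)]
strict_degs = filter_by_log2fc(table)
```

With 40% of genes called DE, even a pathway consisting purely of DEGs is only 2.5-fold enriched, and a 7-of-10 pathway is not far from what chance would give; with 10% called DE, the same pathway stands out much more clearly. Small sets (3-100 genes) are hit hardest by this.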