Question: Input GSEA Pre-ranked list
stevenlang123 (United States) wrote, 4.4 years ago:

Hey y'all

 I'm currently trying to run GSEA using a pre-ranked gene list but I'm not sure if my input file is correctly formatted, because my results seem to be mostly insignificant. 

My input looks something like this (roughly 16,000 genes), where the ranking statistic is the negative log of the p-value obtained through an association test:

GENE neg_log_Chi_permutation
ARHGAP4 0.928986
C16orf3 1.496821
HOPX 0.975562
FAM3D 1.132781
HTR2C 1.276158
UGCG 0.064802
VPS13D 0.123508
VWF

My results show over 300 gene sets as enriched, but many of them have an FDR value close to 1 despite a high NES. What could I be doing wrong?

-Best, 

Steven

 

modified 4.4 years ago by geek_y • written 4.4 years ago by stevenlang123

The original paper describing GSEA actually uses an FDR cutoff of less than 0.25. Additionally, pay attention to p=0 when you take -log(p).
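A minimal sketch of the p=0 problem: the log of zero is undefined, so the transform fails (or yields an infinite score) for those genes. One possible workaround, which is an assumption here and not something prescribed in this thread, is to clamp p-values to a small floor before taking the log. The function name `neg_log10_p` and the floor of 1e-300 are hypothetical choices for illustration.

```python
import math

def neg_log10_p(p, floor=1e-300):
    """Return -log10(p), clamping p to `floor` so that p == 0 does not fail.

    The floor is an illustrative assumption; pick one that suits your test.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError(f"p-value out of range: {p}")
    # Adding 0.0 turns the -0.0 produced by p == 1 into a plain 0.0.
    return -math.log10(max(p, floor)) + 0.0

for p in (0.05, 0.0, 1.0):
    print(p, neg_log10_p(p))
```

Note that p=1 maps to a score of 0, so every gene with p=1 ends up tied at the same rank.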

written 4.4 years ago by Zhilong Jia

I do have several instances of p=1, and consequently instances where genes are ranked the same. How should that be reconciled?

written 4.4 years ago by stevenlang123

I just noticed an error in your method. You should combine the sign of the logFC with the -log of the p-value (as it stands, you have ranked both up- and down-regulated DE genes at the top of the rnk file). Alternatively, rank by another metric, such as logFC or the t statistic. Additionally, GSEA uses all the genes, not just the DE genes.
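The signed metric suggested above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `signed_rank_metric`, the p-value floor, and the example genes with their logFC and p-values are all made up, not taken from the original data.

```python
import math

def signed_rank_metric(log_fc, p_value, floor=1e-300):
    """sign(logFC) * -log10(p), with p clamped to `floor` to avoid log(0)."""
    magnitude = -math.log10(max(p_value, floor))
    if log_fc > 0:
        return magnitude
    if log_fc < 0:
        return -magnitude
    return 0.0  # no direction of change, so no signed evidence

# Hypothetical (logFC, p-value) pairs, for illustration only.
example = {
    "GENE_UP": (2.1, 0.001),
    "GENE_DOWN": (-1.8, 0.001),
    "GENE_FLAT": (0.1, 0.9),
}

scores = {gene: signed_rank_metric(fc, p) for gene, (fc, p) in example.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

With this metric, strongly up-regulated genes land at the top of the list and strongly down-regulated genes at the bottom, instead of both piling up at the top as they do with an unsigned -log(p).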

written 4.4 years ago by Zhilong Jia

Hi Zhilong,

Regarding "pay attention to p=0 when you -log(p)", should those genes with p=1 be filtered out prior to the GSEA analysis?

ADD REPLYlink modified 22 days ago by RamRS25k • written 4.3 years ago by sebastiangeorge010819830

I suggest including all the genes/probes in the GSEA input.

written 4.3 years ago by Zhilong Jia
geek_y (Barcelona) wrote, 4.4 years ago:

You should use all the genes from your dataset and rank them. Here is a nice post on ranking the DE genes for GSEA analysis.

http://genomespot.blogspot.com.es/2014/09/data-analysis-step-8-pathway-analysis.html
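As a rough sketch of producing the ranked file itself, assuming the two-column, tab-delimited layout (gene identifier, then score) that .rnk files are generally described as using: the helper `write_rnk` and the example scores below are hypothetical, and the six-decimal formatting is just a choice for illustration.

```python
import csv
import io

def write_rnk(scores, handle):
    """Write gene/score pairs as tab-delimited lines, highest score first."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for gene, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        writer.writerow([gene, f"{score:.6f}"])

# Hypothetical scores for illustration; a real list would hold every gene tested.
scores = {"GENE_A": 2.5, "GENE_B": -1.2, "GENE_C": 0.3}

buffer = io.StringIO()
write_rnk(scores, buffer)
rnk_text = buffer.getvalue()
print(rnk_text)
```

To write to disk instead, pass a file opened with `open("ranked.rnk", "w", newline="")` in place of the in-memory buffer (the file name is also just an example).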

modified 4.3 years ago • written 4.4 years ago by geek_y
andrew (United States) wrote, 4.4 years ago:

I'm not an expert on GSEA, but there are at least two possibilities that I can see worth considering.

  1. Your data is not significant. I know it's hard to believe, but it happens.
  2. If your input list contains roughly 16k genes, you are covering almost all of the protein-coding genes, so there is very little room for "enrichment". The "rank" of your gene list is supposed to affect this, but it remains one of the classic limitations of any enrichment analysis.

Others may have alternative points to consider.

modified 29 days ago by RamRS • written 4.4 years ago by andrew

Thanks for your feedback. In the case of the latter, can the pre-ranked gene list be truncated? Or does post hoc modification bias the experiment?

written 4.4 years ago by stevenlang123