Help in choosing the right metric for ranking genes in fgsea
4 months ago
Ngrin • 0

Hi,

I have around 500 RNA-Seq datasets, all analyzed using DESeq. From similar questions and posts here, I see that specialists recommend using the stat column values as the ranking metric for genes. However, the stat column is not included in the DE analysis results I have. I have Log2FC, p-value, adjusted p-value, and the standard deviation for each group. I am not sure which metric I should pick now. I have tried once with fold change and another time with absolute fold change. Which one is valid? I am getting two different sets of significant pathways using each.

FGSEA GSEA • 721 views

The spambot marked your post as spam (no idea why). I've restored it now.

4 months ago
alserg ▴ 930

I have tried once with fold change and another time with absolute fold change. Which one is valid? I am getting two different sets of significant pathways using each.

It depends on what changes you are looking for. Ranking genes by logFC will prioritize pathways that are predominantly up- or down-regulated, while ranking by absolute logFC will prioritize pathways that have many changing genes regardless of direction (e.g. a pathway in which 50% of the genes go up and 50% go down will be highly significant). Normally, people are looking for directed change, so signed ranking is preferred. But there can be situations where absolute ranking is also meaningful. Just make sure to use the scoreType="pos" option to run a one-tailed test in that case.
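To make the difference concrete, here is a minimal sketch of a simplified GSEA-style weighted running sum. It is written in Python purely for illustration (fgsea itself is an R package), it is not fgsea's implementation, and it computes no p-values. The gene names, the logFC values, and the helper `running_sum_extremes` are all made up for this example.

```python
def running_sum_extremes(stats, gene_set):
    """Return (highest, lowest) point of a GSEA-style weighted running sum.

    Simplified sketch: weight exponent 1, no permutation test.
    Assumes gene_set is a proper, non-empty subset of the keys of stats
    and that at least one gene in it has a non-zero statistic.
    """
    # Rank genes from the largest statistic to the smallest.
    ranked = sorted(stats.items(), key=lambda kv: kv[1], reverse=True)
    hit_total = sum(abs(value) for gene, value in ranked if gene in gene_set)
    miss_step = 1.0 / (len(ranked) - len(gene_set))

    running, highest, lowest = 0.0, 0.0, 0.0
    for gene, value in ranked:
        if gene in gene_set:
            running += abs(value) / hit_total  # step up, weighted by |statistic|
        else:
            running -= miss_step               # step down by a constant amount
        highest = max(highest, running)
        lowest = min(lowest, running)
    return highest, lowest


# Hypothetical logFC values: the pathway has two strongly up and two strongly down genes.
log_fc = {"g1": 3.0, "g2": 2.5, "g3": 0.2, "g4": 0.1,
          "g5": -0.1, "g6": -0.2, "g7": -2.5, "g8": -3.0}
mixed_pathway = {"g1", "g2", "g7", "g8"}

signed_high, signed_low = running_sum_extremes(log_fc, mixed_pathway)

abs_fc = {gene: abs(value) for gene, value in log_fc.items()}
abs_high, abs_low = running_sum_extremes(abs_fc, mixed_pathway)

print("signed:  ", round(signed_high, 3), round(signed_low, 3))
print("absolute:", round(abs_high, 3), round(abs_low, 3))
```

With the signed ranking, the running sum peaks at about +0.5 and dips to about -0.5, because the up and down halves of the pathway sit at opposite ends of the list. With the absolute ranking, all four pathway genes land at the top and the running sum climbs to about 1.0. That one-sided enrichment is the case scoreType="pos" is meant to test.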

As for selecting the best signed metric, there is currently no proper consensus and people use different ones. In general, the results should be pretty similar. That said, if the logFC values are DESeq2 shrunken logFC estimates, they should be a pretty good metric. You can also try signed logPval, that is log(pvalue) * sign(logFC).


The signed test statistic can also be a valid metric.


@jared.andrews07 unfortunately, I do not have access to the stat values.


Great point @alserg. I forgot to use a one-tailed test when I used absolute fold change. I looked into other posts such as https://pnnl-comp-mass-spec.github.io/proteomics-data-analysis-tutorial/gsea.html. They used a metric similar to yours but with an additional minus sign, which I do not think makes any difference. Since I do not have access to the raw results from DESeq, I am not sure whether the people who analyzed these datasets applied the shrinkage function or not. So between Log2FC and -log(pvalue) * sign(logFC), which one do you think is more trustworthy?


I would go with -log(pvalue) * sign(logFC), as it accounts for gene-wise variance.
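For reference, a small sketch of how this metric could be computed from a table that only has Log2FC and p-value columns. It is in Python for illustration only; the same arithmetic carries over to R. The helper `signed_log_p`, the gene names, the numbers, the base-10 logarithm, and the 1e-300 floor for p-values reported as 0 are all assumptions of this example, not something taken from DESeq or fgsea. Note that the leading minus matters: log(p) is negative for p < 1, so without it up-regulated genes would get negative scores and the ranking would be reversed.

```python
import math


def signed_log_p(log2fc, pvalue, floor=1e-300):
    """Return -log10(p) carrying the sign of the fold change.

    floor is an assumed lower bound that avoids log10(0) when a
    p-value has underflowed to exactly 0.
    """
    p = max(pvalue, floor)
    sign = (log2fc > 0) - (log2fc < 0)  # +1, 0 or -1
    return -math.log10(p) * sign


# Hypothetical rows: (gene, Log2FC, p-value).
de_table = [
    ("GeneA", 2.0, 1e-10),
    ("GeneB", -1.5, 1e-4),
    ("GeneC", 0.3, 0.5),
    ("GeneD", -3.0, 0.0),  # p-value reported as 0
]

scores = {gene: signed_log_p(lfc, p) for gene, lfc, p in de_table}

# Most significantly up-regulated first, most significantly down-regulated last.
ranked_genes = sorted(scores, key=scores.get, reverse=True)
print(ranked_genes)
```

The resulting named scores are what would be passed as the ranking statistic. Genes with a missing p-value would need to be dropped beforehand, which this sketch does not handle.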


Great. Thanks
