Question: GSEA preranked with DESeq2 in a RNA-seq
3 months ago by
rafaelsolersanblas20 wrote:

Hi!

I am trying to perform a GSEA preranked analysis from a paired RNAseq analysis, and I have 2 questions:

  • When the DESeq2 analysis is performed, many genes have NA values in columns such as log2FC, padj, etc. To perform GSEA we have to use ALL the genes, and I think it is obvious that I have to remove genes with NA in log2FC (if that is the ranking value), but what happens with the genes that have NA in padj? They have their own log2FC value (although padj is NA). Should I remove them, or keep them all in the analysis?

  • Which value of the DESeq2 results should I use to prerank the genes? The Log2FC? The stat?

Thanks a lot!! :D

gsea preranked rna-seq deseq2 • 198 views
3 months ago by
ATpoint44k wrote:

I personally rank by -log10(p) * logFC, where p is the nominal ("raw") p-value and logFC is the log2 fold change. That gives positive values for logFC > 0 and negative values for logFC < 0. The advantage of raw p-values over padj is that they contain fewer ties (e.g. the many NAs or 1s in padj) after independent filtering or when power is low. You do not want ties, since the GSEA methodology relies on a continuous ranking of genes.

You can also rank by logFC alone, e.g. after using lfcShrink. I do not rank by logFC alone since I use edgeR for DE analysis and it does not explicitly offer fold change shrinkage to correct the logFCs, so you get large FCs when counts are low. I therefore use the p-value to somewhat correct for this (p-values for large FCs at low counts are often high), which penalizes these large / unreliable FCs.

I have also seen others use the stat column of DESeq2::results(), the F statistic column etc., depending on what your tool outputs. Technically it must be something that ranks every gene by fold change direction and "significance", where "significance" can be defined by the user: logFC, p-values, or anything else you find reasonable in the context of your experiment.
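As a rough illustration of the -log10(p) * logFC metric described above, here is a minimal Python sketch of the arithmetic only. The field names `log2FoldChange` and `pvalue` follow the DESeq2 results columns; the gene names, the numbers, and the helper functions `rank_metric` and `build_ranked_list` are made up for this example. Dropping genes whose p-value or logFC is NA is an assumption of the sketch (such genes cannot be scored by this metric), not something the answer prescribes.

```python
import math
import sys

# Hypothetical rows mimicking DESeq2 results columns; None stands in for NA.
# All values are invented for illustration.
results = [
    {"gene": "geneA", "log2FoldChange": 2.0, "pvalue": 0.001},
    {"gene": "geneB", "log2FoldChange": -1.5, "pvalue": 0.01},
    {"gene": "geneC", "log2FoldChange": 0.3, "pvalue": 0.5},
    {"gene": "geneD", "log2FoldChange": 4.0, "pvalue": None},   # p-value is NA
    {"gene": "geneE", "log2FoldChange": None, "pvalue": None},  # everything NA
]


def rank_metric(log2fc, pvalue):
    """Return -log10(p) * log2FC, or None if either input is missing."""
    if log2fc is None or pvalue is None:
        return None
    # Assumption: clamp p-values of exactly 0 to the smallest positive
    # normal float so that log10 stays defined.
    p = max(pvalue, sys.float_info.min)
    return -math.log10(p) * log2fc


def build_ranked_list(rows):
    """Score every gene, drop unscorable ones, sort from high to low."""
    scored = [
        (row["gene"], rank_metric(row["log2FoldChange"], row["pvalue"]))
        for row in rows
    ]
    scored = [(gene, score) for gene, score in scored if score is not None]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ranked = build_ranked_list(results)
for gene, score in ranked:
    # Tab-separated gene/score pairs, one per line.
    print(f"{gene}\t{score:.4f}")
```

Note that genes with NA only in padj stay in the list here, because the metric uses the raw p-value and never looks at padj.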
