Question

DESeq2 and Gene Set Enrichment Analysis

0

Entering edit mode

3 months ago

noerm123 • 0

Hello I have two questions about DESeq2. One is about the interpretation of the output when more covariates are included in the model. Second is about the ranking metric for GSEA.

I used the normal DESeq workflow:

# Build dds
dds <- DESeqDataSetFromTximport(txi = tx.salmon, colData = met, design = ~ SEX +  RECUR + threshold)

# Pre-filter and remove genes that have no or little information on gene expression
keep <- rowSums( counts(dds) >= 10 ) >= 15 #Filtering for threshold 
dds <- dds[keep,]

# Run DeSeq2
dds <- DESeq(dds, test = 'Wald')

#Wald test
res = results(dds, contrast = c("threshold", "TRUE", "FALSE"), alpha = 0.05)

## Save the unshrunken results
res_noshrink <- res
# # Apply fold change shrinkage
res_lfcShrink_apeglm <- lfcShrink(dds, coef= gsub(' ', '_', gsub('^.+statistic: ', '', res@elementMetadata$description[4])), type="apeglm") 

##Save it as a tibble for further GSEA analysis
res_tb <- res_lfcShrink_apeglm %>%
  data.frame() %>%
  rownames_to_column(var="gene") %>%
  as_tibble()

## Remove any NA values (reduces the data by quite a bit) because KEGG pathway is better with EntreZid 
res_tb <- res_tb %>% 
  arrange(padj)
res_entrez <- dplyr::filter(res_tb, entrezid != "NA")

## Remove any Entrez duplicates
res_entrez <- res_entrez[which(duplicated(res_entrez$entrezid) == F), ]
## Extract the foldchanges
foldchanges <- res_entrez$log2FoldChange
## Name each fold change with the corresponding Entrez ID
names(foldchanges) <- res_entrez$entrezid
## Sort fold changes in decreasing order
foldchanges <- sort(foldchanges, decreasing = TRUE)
foldchanges

First Questions: The logFoldChanges I get in the res_tb dataframe are the LogFoldChanges for my variable threshold and they are adjusted for SEX and RECUR in this case? Just to be sure.

Second Question: which metrics to use exactly to rank my genes for further GSEA? I read different literature and ask some colleagues that are bioinformaticians, but I am confused now. I used the shrunken log2FoldChanges. But then I got the feedback that it does not account for the significant p values. One colleague proposed to use the stat column instead. But I think that this is not right because it does not take the shrunken logFoldChanges into account (as long as I understood).

Input would be highly appreciated!

Thanks!

DESeq GSEA • 262 views

ADD COMMENT • link updated 3 months ago by ATpoint 82k • written 3 months ago by noerm123 • 0

score 1 · Answer 1 · 2024-01-30

1

Entering edit mode

3 months ago

ATpoint 82k

First Question: Yes, because the covariates are in the design, so it's accounted for.

Second Question: shrunken log2FoldChanges is fine. There is no right or wrong answer to this. Alternatively stat column would be fine or signed -log10(pvalue) is often used. But yours is ok.

ADD COMMENT • link 3 months ago by ATpoint 82k