DESeq2 and Gene Set Enrichment Analysis
1
0
Entering edit mode
3 months ago
noerm123 • 0

Hello I have two questions about DESeq2. One is about the interpretation of the output when more covariates are included in the model. Second is about the ranking metric for GSEA.

I used the normal DESeq workflow:

# Build dds
dds <- DESeqDataSetFromTximport(txi = tx.salmon, colData = met, design = ~ SEX +  RECUR + threshold)

# Pre-filter and remove genes that have no or little information on gene expression
keep <- rowSums( counts(dds) >= 10 ) >= 15 #Filtering for threshold 
dds <- dds[keep,]

# Run DeSeq2
dds <- DESeq(dds, test = 'Wald')

#Wald test
res = results(dds, contrast = c("threshold", "TRUE", "FALSE"), alpha = 0.05)

## Save the unshrunken results
res_noshrink <- res
# # Apply fold change shrinkage
res_lfcShrink_apeglm <- lfcShrink(dds, coef= gsub(' ', '_', gsub('^.+statistic: ', '', res@elementMetadata$description[4])), type="apeglm") 

##Save it as a tibble for further GSEA analysis
res_tb <- res_lfcShrink_apeglm %>%
  data.frame() %>%
  rownames_to_column(var="gene") %>%
  as_tibble()

## Remove any NA values (reduces the data by quite a bit) because KEGG pathway is better with EntreZid 
res_tb <- res_tb %>% 
  arrange(padj)
res_entrez <- dplyr::filter(res_tb, entrezid != "NA")

## Remove any Entrez duplicates
res_entrez <- res_entrez[which(duplicated(res_entrez$entrezid) == F), ]
## Extract the foldchanges
foldchanges <- res_entrez$log2FoldChange
## Name each fold change with the corresponding Entrez ID
names(foldchanges) <- res_entrez$entrezid
## Sort fold changes in decreasing order
foldchanges <- sort(foldchanges, decreasing = TRUE)
foldchanges

First Questions: The logFoldChanges I get in the res_tb dataframe are the LogFoldChanges for my variable threshold and they are adjusted for SEX and RECUR in this case? Just to be sure.

Second Question: which metrics to use exactly to rank my genes for further GSEA? I read different literature and ask some colleagues that are bioinformaticians, but I am confused now. I used the shrunken log2FoldChanges. But then I got the feedback that it does not account for the significant p values. One colleague proposed to use the stat column instead. But I think that this is not right because it does not take the shrunken logFoldChanges into account (as long as I understood).

Input would be highly appreciated!

Thanks!

DESeq GSEA • 262 views
ADD COMMENT
1
Entering edit mode
3 months ago
ATpoint 82k

First Question: Yes, because the covariates are in the design, so it's accounted for.

Second Question: shrunken log2FoldChanges is fine. There is no right or wrong answer to this. Alternatively stat column would be fine or signed -log10(pvalue) is often used. But yours is ok.

ADD COMMENT

Login before adding your answer.

Traffic: 2003 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6