Question

Comparing multiple conditions and understanding rlog/resLFC

3.8 years ago

ccha97 ▴ 60

Hello, I'm new to R and I'm having some trouble understanding parts of the DESeq2 package (I'm an undergrad student who had never used R before this project, so any help would be appreciated).

For context, I have three conditions, A, B and C (A = acute model, B = chronic model, C = a deletion in the chronic model B). I want to compare A vs B as well as B vs C, but I wasn't sure how to go about it. I was originally using the contrast argument of results():

AB <- results(dds, contrast = c("condition", "A", "B"), alpha = 0.05)

BC <- results(dds, contrast = c("condition", "B", "C"), alpha = 0.05)

My current end goal is to use k-means clustering and form a heatmap. Based on the tutorials, I understand that the rlog function is used when visualising data.

pheatmap(assay(rld)[sigGenesAB,], cluster_rows=FALSE, show_rownames=FALSE,
         cluster_cols=FALSE, annotation_col = as.data.frame(cdata), row.names=rownames(cdata))

In this case [sigGenesAB,] refers to the differentially expressed genes with padj < 0.05. However, the heatmap I generate also includes the samples from condition 'C', and I don't know what to make of that. I'm also unable to use the rlog function on AB, as it fails with this error:

rldAB <- rlog(AB)
Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function ‘sizeFactors’ for signature ‘"DESeqResults"’
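For what it's worth, the error itself says that rlog was handed a DESeqResults object (AB); rlog works on the DESeqDataSet (dds), so there is one transformed matrix for all samples rather than one per contrast. Restricting the heatmap to A and B then comes down to subsetting columns of that matrix. A minimal base-R sketch of the subsetting step, assuming a random stand-in matrix in place of assay(rld) and made-up sample names, gene names and sigGenesAB:

```r
set.seed(1)

# Stand-in for assay(rld): 20 genes x 6 samples (hypothetical names and values).
samples <- paste0(rep(c("A", "B", "C"), each = 2), 1:2)
mat <- matrix(rnorm(20 * 6), nrow = 20,
              dimnames = list(paste0("gene", 1:20), samples))

# Stand-in for the sample annotation table (cdata in the post).
cdata <- data.frame(condition = factor(rep(c("A", "B", "C"), each = 2)),
                    row.names = samples)

# Hypothetical set of significant genes for the A vs B contrast.
sigGenesAB <- c("gene3", "gene7", "gene12")

# Keep only the samples that belong to conditions A and B.
keep <- rownames(cdata)[cdata$condition %in% c("A", "B")]
matAB <- mat[sigGenesAB, keep, drop = FALSE]

# Drop the now-unused level "C" so it does not show up in the annotation legend.
cdataAB <- droplevels(cdata[keep, , drop = FALSE])

print(dim(matAB))
print(levels(cdataAB$condition))
```

With the real objects, matAB and cdataAB would be what gets passed to pheatmap (as the matrix and as annotation_col respectively).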

My supervisor suggested using factor levels. His code is similar to the following, which produces a matrix of zeroes and ones that includes an intercept column:

dds$condition <- factor(dds$condition, levels = c("A","B", "C"))    
condition <- factor(rep(c("A","B","C"))) 
model.matrix(~ condition)
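To make the model matrix idea concrete, here is a small base-R sketch of what that matrix looks like. It assumes two replicates per condition, which is invented for illustration and is not the real sample layout:

```r
# Hypothetical layout: two replicates per condition (an assumption).
condition <- factor(rep(c("A", "B", "C"), each = 2),
                    levels = c("A", "B", "C"))

# With the default treatment coding, the first level ("A") is the reference:
# the intercept column is all ones, and each remaining column is a 0/1
# indicator for one of the other levels.
mm <- model.matrix(~ condition)
print(mm)

# Changing the reference level changes which comparisons the columns encode:
# with "B" as reference, the non-intercept columns are A vs B and C vs B.
condition_refB <- relevel(condition, ref = "B")
mm_refB <- model.matrix(~ condition_refB)
print(colnames(mm_refB))
```

The order of the factor levels therefore decides which condition acts as the baseline for the coefficients.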

I am aware of the Analyzing RNA-seq data with DESeq2 tutorial and have read through the relevant sections (his code seems to be related to log fold change shrinkage/lfcShrink), but I'm still having trouble understanding things. Should I be using rlog or lfcShrink to generate a heatmap? Ultimately, I want to do k-means clustering and generate a heatmap, as well as investigate the resulting clusters using GO-term analysis.
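For the clustering step, k-means needs one value per gene per sample, which is the kind of matrix rlog produces; lfcShrink instead returns one shrunken fold change per gene for a single comparison. A rough base-R sketch of the clustering itself, assuming a random stand-in matrix in place of the rlog values and an arbitrary choice of three clusters:

```r
set.seed(42)

# Stand-in for the rlog matrix restricted to significant genes:
# 30 genes x 6 samples with hypothetical names and random values.
mat <- matrix(rnorm(30 * 6), nrow = 30,
              dimnames = list(paste0("gene", 1:30),
                              paste0(rep(c("A", "B", "C"), each = 2), 1:2)))

# Z-score each gene (row) so clusters reflect expression pattern
# across samples rather than overall expression level.
z <- t(scale(t(mat)))

# k-means on the genes; centers = 3 is arbitrary here, and nstart = 25
# repeats the random initialisation to get a more stable solution.
km <- kmeans(z, centers = 3, nstart = 25)

# Order genes by cluster so each cluster forms a contiguous block of rows.
ord <- order(km$cluster)
z_ordered <- z[ord, , drop = FALSE]

# Gene names per cluster, e.g. as input lists for GO-term analysis.
genes_by_cluster <- split(names(km$cluster), km$cluster)
print(lengths(genes_by_cluster))
```

The ordered matrix could then be drawn with pheatmap using cluster_rows = FALSE so the k-means blocks stay together.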

I've thought about making two different data sets (e.g. one with just the counts of A+B, and the other with just B+C) and doing a separate DESeq analysis for each, but it also means I'll have a lot of different variables which will probably get confusing downstream. I'd appreciate any help in understanding some of these concepts, as well as any recommendations regarding how I should approach my data.

R RNA-Seq rna-seq DESeq2 heatmap • 1.3k views

EDIT: I've added the first line: dds$condition <- factor( c("A","B", "C")). I'm not sure whether that will change my contrast results. Could someone explain the idea of a model matrix to me? I also have the code for lfcShrink:

ABresLFC <- lfcShrink(dds, coef="A_vs_B", type="apeglm")
BCresLFC <- lfcShrink(dds, coef="B_vs_C", type="apeglm")

I also still want to make a heatmap for the differentially expressed genes - I've already stored the genes into variables (sigGenesAB, sigGenesBC), but just need help with coding the heatmap.

3.8 years ago by ccha97 ▴ 60