Hello, I'm new to R and I'm having some trouble understanding some elements of the DESeq2 package (I'm an undergrad student who's never used the R prior to this project, so any help would be appreciated).
For context, I have three different conditions e.g. A, B, C (A = acute model, B = chronic model, C = a deletion in that chronic model B). I'm wanting to compare A vs B, as well as B vs C but wasn't sure which way to go about it. I was originally using the contrast function:
AB <- results(dds, contrast = c("condition", "A", "B"), alpha = 0.05) BC <- results(dds, contrast = c("condition", "B", "C"), alpha = 0.05)
My current end goal is to use k-means clustering and form a heatmap. Based on the tutorials, I understand that the rlog function is used when visualising data.
pheatmap(assay(rld)[sigGenesAB,], cluster_rows=FALSE, show_rownames=FALSE, cluster_cols=FALSE, annotation_col = as.data.frame(cdata), row.names=rownames(cdata))
In this case [sigGenesAB,] refers to the deferentially expressed genes where the padj value < 0.05. However, when I generate this heatmap, it also includes the condition 'C' and I don't know what to make of it. I'm also unable to use the rlog function on AB as it comes up with this error:
rldAB <- rlog(AB) Error in (function (classes, fdef, mtable) : unable to find an inherited method for function ‘sizeFactors’ for signature ‘"DESeqResults"’
My supervisor suggested using factor levels, his code is something similar to this where he's obtained a matrix including the intercept, with zeroes and ones:
dds$condition <- factor(dds$condition, levels = c("A","B", "C")) condition <- factor(rep(c("A","B","C"))) model.matrix(~ condition)
I am aware of the Analyzing RNA-seq data with DESeq2 tutorial and have read through the sections (his code seems to be related to log fold shrinking/lfcshrink), but I'm still having trouble understanding things - should I be using rlog or lfcshrink to generate a heatmap? Ultimately, I want to do kmeans clustering and generate a heatmap, as well as investigate those specific clusters using GO-term analysis.
I've thought about making two different data sets (e.g. one with just the counts of A+B, and the other with just B+C) and doing a separate DESeq analysis for each, but it also means I'll have a lot of different variables which will probably get confusing downstream. I'd appreciate any help in understanding some of these concepts, as well as any recommendations regarding how I should approach my data.