I've got a set of samples, treated and control across multiple time points. I did DESeq with Day as design to get the genes which change nicely across the time span. I wanted to make a graph of some of those genes. The Log2 of the normalized counts, versus the time, colored by treatment, with the regression lines for each data set added.
Here's my code. I know it's not doing exactly what DESeq is doing, but it's close enough for a picture
treated_expression <- log(counts(dds_treated, normalized = TRUE)[gene_of_interest,],2)
control_expression <- log(counts(dds_Control, normalized = TRUE)[gene_of_interest,],2)
lm_treated <- lm(treated_expression ~ dds_treated@colData$Day)
lm_control <- lm(control_expression ~ dds_Control@colData$Day)
d<- plotCounts(dds, gene=gene_of_interest, intgroup=c("Treatment", "Day"), returnData = TRUE)
p <- ggplot(d, aes(x=Day, y=log(count,2), color = Condition)) +
geom_point(position=position_jitter(w=0.1,h=0)) +
labs(title = "gene_of_interest") +
theme(plot.title = element_text(hjust = 0.5))
p + geom_abline(intercept = lm_treated$coefficients[1], slope = lm_treated$coefficients[2], color = "turquoise") +
geom_abline(intercept = lm_control$coefficients[1], slope = lm_control$coefficients[2], color = "red")
And it's working more or less as I want. Unless a sample has a read count of 0. I know I could add a pseudocount before logging, or just omit the zero counts, but I was wondering if there was a a way of fixing this closer to what DESeq does. DESeq results will return me the right slope for the regression line, but I don't see how to find the intercept, which is why I recalculate the regression line.