Question: Linear regression of the RNAseq results to get the genes whose expression change belong to the same trends
gravatar for Kai_Qi
10 weeks ago by
Chicago, IL
Kai_Qi100 wrote:

I have samples across 4 different developmental stages, each stage has 2 replicates. Using DEseq2, I can get the trending line of the counts of each gene, below is the code I have run:

sampleTable <-
countdata <- read.csv("CSHLRNASeq_counts_Rsubread.csv", stringsAsFactors = FALSE, row.names = 1)
colnames(countdata) <- gsub(".bam" , "", colnames(countdata))
coldata <- sampleTable
coldata$timepoint <- factor(coldata$timepoint, levels = c("E11", "E14", "E18", "Adult"))
ddsMat <- DESeqDataSetFromMatrix(countData = countdata,
                                 colData = coldata,
                                 design = ~ timepoint)
dds <- ddsMat
dds <- dds[rowSums(counts(dds)) > 1,] # remove the sample that reads from all group are 0
ddsTC <- DESeq(dds, test = "LRT", reduced = ~1)
resTC <- results(ddsTC)
resTC$symbol <- mapIds(, keys = row.names(resTC), column = "SYMBOL", keytype = "ENSEMBL", multiVals = "first")
resTC$entrez <- mapIds(, keys = row.names(resTC), column = "ENTREZID", keytype = "ENSEMBL", multiVals = "first")
head(resTC[order(resTC$padj),], 4)
data_Flna <- plotCounts(ddsTC, which(resTC$symbol == "Flna"), intgroup = c("timepoint"), returnData = TRUE)
ggplot(data_Flna, aes(x=timepoint,y=count)) +
  geom_point() + stat_smooth(se=FALSE, method = "loess") +scale_y_log10()

Following the manual I typed:

res30 <- results(ddsTC, name = "timepoint", test = "Wald")

and got an error:

Error: subscript contains invalid names

One of the purpose of using timepoint is that I would like to see the comparisons between E11vsE14, E14vsE18, E18vsAdult, so that I can get the genes whose expression level is higher at later stages than earlier stages. I see in some papers linear regression is one solution. I am wondering how can I achieve the goal from what I have done so far? Thank you very much

sequencing rna-seq R • 278 views
ADD COMMENTlink modified 10 weeks ago by thyleal150 • written 10 weeks ago by Kai_Qi100
gravatar for thyleal
10 weeks ago by
thyleal150 wrote:


About your error: you probably mistyped the name argument in your second line of code here. Use resultsNames(ddsTC) to see available names.

About the analysis, you seem to want to test this in two distinc ways. One testing for any differences over multiple stages, exactly what you did with the LRT with full and reduced models. See more here.

If you want to test your groups as levels of a categorical level (timepoint), you can set your timepoint variable to a factor with levels ordered as: E11, E14, E18, adult (assuming this is the order from earliest to latest stages). Then, I suggest using fold-change shrinkage and extract coefficients according to your contrasts. By default, this will test all levels against the first level (baseline). See more of this here.

You may need to relevel your factor variable if you're using apeglm shrinkage. However, ashr and normal algorithms can deal with contrasts, so you can specify them as needed. Please see the aforementioned link.

Finally, If you want to see the linear effect of your stages (assuming linearity) you can change your timepoint to a numeric variable, like 0, 1, 2, 3 and use it in DESeq2 as usual. Thus, the reported log2 fold change is per unit of change of your stage variable. If your stages do not match the one unit transformation, this will make interpretation more difficult.

Hope it helps.

ADD COMMENTlink modified 10 weeks ago • written 10 weeks ago by thyleal150
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1640 users visited in the last hour