Hi,
I am analyzing a bulk-RNAseq and I want to analyse the dataset using Deseq2. I am very confused so apologies if it's a stupid question. My dataset has 12 samples (3 per condition). the conditions are: treatment and control and 2 time points (0hr, 12hrs). So I wanted to identify A) DEGs across all condition and B) DEGs per condition and across time point (so treat vs control at time point1 and treat vs control at time point2). So I created a file with my metadata that look like with mix column dividing the 12 samples per condition and time:
SampleID Condition Time  mix     
  <chr>    <fct>     <fct> <fct>   
1 C1D20    Control   time1 control1
2 C2D20    Control   time1 control1
3 C3D20    Control   time1 control1
4 P1D20    treat   time1 treat1
5 P2D20    treat   time1 treat1
6 P3D20    treat  time1 treat1
and this is my code:
library( "DESeq2" )
library(ggplot2)
library(readxl)
metaData <- read_excel("Desktop/forR.xls", sheet = "GroupAssignmentsTable")
metaData$Time = factor(metaData$Time)
metaData$Condition = factor(metaData$Condition)
metaData$mix = factor(metaData$mix)#convert into fctors
head(metaData)
countData2 <-  read_excel("Desktop/forR.xls")
#remove columb
countData <- countData2[ -c(1) ] #remove geneid columb
countData[is.na(countData)] <- 0 #na into 0
dds <- DESeqDataSetFromMatrix(countData=countData,
                              colData=metaData, 
                              design= ~ mix)
rownames(dds) <- countData2$Geneid
# run deseq2
dds <- DESeq(dds)
res <- results(dds)
head(results(dds, tidy=TRUE))  #let's look at the results table
summary(res) #summary of results
res <- res[order(res$padj),]
head(res)
and this is what I get:
log2 fold change (MLE): mix treat2 vs control1 
Wald test p-value: mix treat2 vs control1 
DataFrame with 6 rows and 6 columns
                    baseMean log2FoldChange     lfcSE      stat       pvalue         padj
                   <numeric>      <numeric> <numeric> <numeric>    <numeric>    <numeric>
ENSG00000177469.13   6373.26        5.28825  0.118647   44.5713  0.00000e+00  0.00000e+00
ENSG00000125730.18   4440.13        5.40412  0.159754   33.8278 7.69185e-251 1.32496e-246
ENSG00000163359.17  26483.80        3.36964  0.103306   32.6180 2.28045e-233 2.61879e-229
ENSG00000174498.14   6609.97       -5.43614  0.168546  -32.2531 3.18097e-228 2.73969e-224
ENSG00000171867.18   4485.77        3.38421  0.106702   31.7163 9.25174e-221 6.37463e-217
ENSG00000124749.18   2830.87        6.37862  0.203390   31.3616 6.76592e-216 3.88488e-212
Is this mean that he calculate the log2FC and p-value between treat2 and control1 only? and if so, where are all the other comparison?
I checked also
mcols(res)$description
and I got basically the same results:
[1] "mean of normalized counts for all samples"        "log2 fold change (MLE): mix treat2 vs control1"
[3] "standard error: mix treat2 vs control1"         "Wald statistic: mix treat2 vs control1"        
[5] "Wald test p-value: mix treat2 vs control1"      "BH adjusted p-values"
if I want DEGs for time1 control vs treatment and for time2 control vs treatment do I have to create another file separating those samples? What I am doing wrong?
Thank you
Camilla
@camillab, Please look into Kevin Blighe's response in this post hope that helps.