I am using DESeq2 to analyze differential expressed genes (DEGs) from counts generated by Illumina sequencing.
I have a 3-factor experiment:
- genotype (with 3 levels): A, B, and C, each with 6 biological replicates (3 for each treatment)
- treatment (with 2 level): mDr (the treatment) and WW (the control)
- timepoint (with 3 levels): T1, T2 and T3.
In order to analyze genes that are influenced by an individual component and those influenced by two or all variables, I'm interested in the following results/questions:
How many DEGs are influenced by an individual component (genotype, treatment, timepoint)?
How many DEGs are influenced by the combination of two or all components (genotype:treatment + genotype:timepoint + treatment:timepoint + genotype:treatment:timepoint)?.
So, I tried as follows:
library(DESeq2) # Load row counts countData<-read.delim("all_counts_matrix_T1_T2_T3", sep="\t", header=TRUE, row.names = 1) col_ordering = c(1,3,5,2,4,6,7,9,11,8,10,12,13,15,17,14,16,18, 19,21,23,20,22,24,25,27,29,26,28,30,31,33,35,32,34,36,37,39,41,38,40,42,43,45,47,44,46,48,49,51,53,50,52,54) countData_ord = countData[,col_ordering] # Create countData as matrix countData_m <- as.matrix(countData_ord) # Interger numbers in matrix storage.mode(countData_m) = "integer" #Create sample information colData<-read.delim("colData.txt", sep="\t", header=TRUE, row.names = 1) head(countData_m) colData rownames(colData) == colnames(countData_m) # Count matrix and sample information input and design formula dds<-DESeqDataSetFromMatrix(countData = countData_m, colData = colData, design = ~ genotype + treatment + timepoint + genotype:treatment + genotype:timepoint + treatment:timepoint + genotype:treatment:timepoint) dds$treatment<- relevel(dds$treatment, "WW") dds$treatment dds$genotype dds$timepoint ## Filtering to remove rows with 0 reads or only a single count across all samples dds <- dds[ rowSums(counts(dds)) > 1, ] nrow(dds) dds # Run DESeq2 dds = DESeq(dds) # Calling results res <- results(dds) mcols(res, use.names=TRUE) # Print the coefficients resultsNames(dds) summary(res) write.table(res, file='genotype_treatment_timepoint/res_04082016.txt', col.names=T, row.names=T, quote=F, sep="\t")
In resultsNames(dds) I get:
 "Intercept" "genotype_B_vs_A" "genotype_C_vs_A" "treatment_mDr_vs_WW"
 "timepoint_T2_vs_T1" "timepoint_T3_vs_T1" "genotypeB.treatmentmDr" "genotypeC.treatmentmDr"
 "genotypeB.timepointT2" "genotypeC.timepointT2" "genotypeB.timepointT3" "genotypeC.timepointT3"
 "treatmentmDr.timepointT2" "treatmentmDr.timepointT3" "genotypeB.treatmentmDr.timepointT2" "genotypeC.treatmentmDr.timepointT2"  "genotypeB.treatmentmDr.timepointT3" "genotypeC.treatmentmDr.timepointT3"
First of all, is it correct?
And then, how can I extract the specific results I want (question 1 and 2)? I've tried to read up on this, but haven't been able to find how to code this.
Any guidance is much appreciated.
Thanks in advance