Dear all,
I got a matrix from the last step(htseq count), and I need to do differential expression using that. My species is a fish, and the aim is to find the genes related to the air-breathing organ. I have 7 time-point for development stages with two replicates, here is my code for DESeq2, but it is just for 2 time-point (two replicates), I need to do differential expression analysis with there 7 time-point, I have read the manual and it still pretty confused. I'd be really appreciated if you can give me some suggestions on how to correct the script in order to do the differential analysis for the 7-timepoint.
**
library("DESeq2")
count_tab <- read.table("deseq2_matrix",header = T,row.names = 1)
colData <- read.table("sample_info.txt",header = T)
colData$condition = factor(colData$condition, c("A","B"))
dds <- DESeqDataSetFromMatrix(countData = count_tab,
colData=colData,
design = ~ condition)
dds <- DESeq(dds)
resultsNames(dds)
res <- results(dds, name = "condition_B_vs_A")
res <- res[order(res$padj),]
resDF = as.data.frame(res)
resDF$gene_id = row.names(resDF)
resDF <- resDF[,c(7,1,2,3,4,5,6)]
write.table(resDF, file = "Chan21_chan51_DESeq2_DEG", sep = "\t", quote = FALSE, row.names = FALSE)
**
And the sample_info.txt is like this:
sample_id condition
chan2_1 A
chan2_2 A
chan5_1 B
chan5_2 B
so, if I use the 7 time-point, the sample_info2.txt will be like this: sample_id condition
chan2_1 A
chan2_2 A
chan5_1 B
chan5_2 B
chan6_1 C
chan6_2 C
chan7_1 D
chan7_2 D
chan8_1 E
chan8_2 E
chan9_1 F
chan9_2 F
chan10_1 G
chan10_2 G
Hi, Thanks for your kind reply! so it means I need to use "time" as another "condition"? (I read some manual but still a little confused, this will help us find the genes waved with times or..? if compared this way with normal differential expression analysis without time course.) And I also have another question is that if I can use the first day and its replicates (chan2-1.chan2-2) as the reference, then compare the left days to it? this only needs 6 compare..(I am not sure if this will work...)
Thanks again for your generous help! :)
If you really want to use time as the condition, yes it would be a bit like your table above - you'd have one table with time points A and B only, one table with time points A and C only, one with A and D, etc., B and C, B and D, etc. pp.,
Sure you can compare the first day with the other 6 days, you'll probably get significant DEGs, but there's so much variety in days 2-6 that any conclusions you draw will be questionable. Best to plot TPMs/FPKMs per gene over the 6 days and have a look?