Differential expression analysis 5 conditions 3 replicates each matrix counts
3.9 years ago
Pin.Bioinf ▴ 290

Hello, I have to do my first differential expression analysis. A company already did it, and I'm repeating it just to practice and compare results. They used DESeq2 and I want to use it too. I got the count tables using the STAR mapper and quantifier with the option --quantMode GeneCounts. The experiment is:

5 conditions (3 of them are naive or mock):
- plant not infected (NAIVE), control
- plant infected (with fly)
- plant infected (with bacteria)
- plant not infected but exposed to fly (mock), control
- plant not infected but exposed to bacteria (mock), control

All these samples were taken at 2, 7, 14 and 21 dpi, so I'm thinking about building a count matrix for each dpi and doing 4 differential expression analyses. (After that I will do a temporal analysis, so I just want to compare results.)

Is it reasonable to build a matrix for each dpi like this?:

GENE_ID COUNT NAIVE21 INFECTED_FLY21 INFECTED_B21 NOTINFECTED_FLY NOTINFECTED_BACT

Also, I have 3 replicates for each. What do I do with them? (I'm new to this kind of analysis and have no idea what to do with the 3 replicates of each sample; it would be a huge table if I add NAIVE21a, NAIVE21b, NAIVE21c, and so on...)

Thank you so much

Pilar

RNA-Seq • 1.7k views

Could you clarify your experimental design?

My guess would be that dpi is days post infection, and that you have 3 biological replicates for each condition (infected with bacteria, naive...) at different times (2 days, 7 days...). If that is right, the experimental design is crucial for creating a DESeq2 object. Could you update the question with a clearer explanation of the design so that we can help you?


Hello, yes, I'll explain better:

Samples were taken at 2, 7, 14 and 21 days post infection. There are 5 conditions, and 3 plants (biological replicates) for each condition.

I decided to do a differential expression analysis for each dpi independently with DESeq2, and I did the following:

sampleTable <- data.frame(row.names=c("Bm14a","Bm14b","Bm14c","BTY14a","BTY14b","BTY14c","Mm14a","Mm14b","Mm14c","MTY14a","MTY14b","MTY14c","N14a","N14b","N14c"), condition=as.factor(c(rep("Bm14",3), rep("BTY14", 3), rep("Mm14", 3), rep("MTY14", 3),rep("N14", 3))))

dds <- DESeqDataSetFromMatrix(countData = cts,colData = sampleTable,design = ~ condition)

Then, for every comparison (there are 7) I did this:

Comp_1<-results(dds, contrast=c("condition","N14","Bm14"))

...

And to count the total differentially expressed genes for each comparison I did:

Comp_1_resSig <- Comp_1[which(Comp_1$padj < 0.1), ]

head(Comp_1_resSig[order(Comp_1_resSig$log2FoldChange, decreasing = TRUE), ])

nrow(Comp_1_resSig)

Does this make sense? Or did I do something wrong?

3.9 years ago

There's no reason to manually build a matrix yourself. Rather, create a sample table listing the samples, their group associations and the files with the counts, and give that to DESeq2, likely via the DESeqDataSetFromHTSeqCount() function or something along those lines. You will then include all of the biological replicates, which DESeq2 already knows how to handle (and is worthless without).
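A sample table of the kind described above might look like the following minimal sketch in R. The condition labels, the 21 dpi suffix and the file names are hypothetical placeholders, not the real experiment files. The column order (sample name, file name, then metadata) is what DESeqDataSetFromHTSeqCount() expects. That function reads two-column HTSeq-style files, so STAR's ReadsPerGene.out.tab files (four columns plus summary rows) would probably need reducing to that shape first.

```r
# Hypothetical labels for one time point (21 dpi); substitute the real ones.
conditions <- c("NAIVE", "INFECTED_FLY", "INFECTED_BACT", "MOCK_FLY", "MOCK_BACT")
replicates <- c("a", "b", "c")

# One row per biological replicate: each replicate is its own sample,
# and the shared 'condition' label is what groups them.
sample_ids <- paste0(rep(conditions, each = length(replicates)), "21", replicates)

sampleTable <- data.frame(
  sampleName = sample_ids,
  fileName   = paste0(sample_ids, "_counts.txt"),  # hypothetical file names
  condition  = factor(rep(conditions, each = length(replicates)),
                      levels = conditions),
  stringsAsFactors = FALSE
)

# With DESeq2 installed and the files in a (hypothetical) "counts" directory:
# dds <- DESeq2::DESeqDataSetFromHTSeqCount(sampleTable,
#                                           directory = "counts",
#                                           design = ~ condition)
```

The table has 15 rows for one time point, which is small; the replicates are never collapsed by hand.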


Hello, thank you so much, I was a little lost. Here is what I did once I got my count matrix as cts. I don't know if the results I got are correct. I wanted to do many comparisons; I have 3 samples for each of 5 conditions:

sampleTable<-data.frame(row.names=c("Bm14a","Bm14b","Bm14c","BTY14a","BTY14b","BTY14c","Mm14a","Mm14b","Mm14c","MTY14a","MTY14b","MTY14c","N14a","N14b","N14c"), condition=as.factor(c(rep("Bm14",3), rep("BTY14", 3), rep("Mm14", 3), rep("MTY14", 3),rep("N14", 3))))
dds <- DESeqDataSetFromMatrix(countData = cts,colData = sampleTable,design = ~ condition)
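For reference, a count matrix like cts could be assembled from the STAR output roughly as follows. This is a sketch under assumptions: read_star_counts is a hypothetical helper, not part of STAR or DESeq2, and it assumes the usual ReadsPerGene.out.tab layout (gene ID in column 1, unstranded counts in column 2, the two stranded variants in columns 3 and 4, and four N_ summary rows). Which count column is appropriate depends on the library strandedness.

```r
# Hypothetical helper: combine STAR ReadsPerGene.out.tab files into one matrix.
# 'files' is a named character vector (names become the sample/column names).
read_star_counts <- function(files, column = 2) {
  summary_rows <- c("N_unmapped", "N_multimapping", "N_noFeature", "N_ambiguous")
  per_sample <- lapply(files, function(f) {
    tab <- read.delim(f, header = FALSE, stringsAsFactors = FALSE)
    tab <- tab[!(tab[[1]] %in% summary_rows), ]  # drop STAR's summary lines
    setNames(tab[[column]], tab[[1]])            # counts named by gene ID
  })
  # All files must list the same genes in the same order.
  gene_ids <- names(per_sample[[1]])
  same_genes <- vapply(per_sample,
                       function(x) identical(names(x), gene_ids),
                       logical(1))
  stopifnot(all(same_genes))
  cts <- do.call(cbind, per_sample)
  colnames(cts) <- names(files)
  cts
}

# Usage (hypothetical file names):
# files <- c(N14a = "N14a_ReadsPerGene.out.tab", N14b = "N14b_ReadsPerGene.out.tab")
# cts <- read_star_counts(files, column = 2)
```

The column names of the resulting matrix should match the row names of sampleTable, in the same order, before both are passed to DESeqDataSetFromMatrix().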


Pre-filtering:

dds <- dds[ rowSums(counts(dds)) > 1, ]
dds <- DESeq(dds)


All my comparisons:

Comp_1<-results(dds, contrast=c("condition","N14","Bm14"))
Comp_2<-results(dds, contrast=c("condition","N14","BTY14"))
Comp_3<-results(dds, contrast=c("condition","N14","Mm14"))


and so on ...

Comp_7<-results(dds, contrast=c("condition","MTY14","Mm14"))


And to check the total differentially expressed genes for each comparison I did the following for each Comp_n:

Comp_1_resSig <- Comp_1[which(Comp_1$padj < 0.1), ]
head(Comp_1_resSig[order(Comp_1_resSig$log2FoldChange, decreasing = TRUE), ])
nrow(Comp_1_resSig)
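On the filtering step: the which() wrapper matters because DESeq2 sets padj to NA for genes removed by independent filtering, and plain logical indexing would carry those NA rows along. A minimal sketch on a toy table (the values are invented purely for illustration, and a plain data.frame stands in for the DESeq2 results object):

```r
# Toy stand-in for a results table; numbers are made up for illustration.
res <- data.frame(
  row.names      = paste0("gene", 1:5),
  log2FoldChange = c(2.1, -1.4, 0.3, 3.0, -0.2),
  padj           = c(0.01, 0.04, 0.50, NA, 0.09)
)

# which() drops the NA comparison, so only genuinely significant rows remain.
sig <- res[which(res$padj < 0.1), ]
sig <- sig[order(sig$log2FoldChange, decreasing = TRUE), ]

# Without which(), the NA in padj produces an extra all-NA row.
sig_with_na <- res[res$padj < 0.1, ]

n_sig  <- nrow(sig)                     # significant genes
n_up   <- sum(sig$log2FoldChange > 0)   # higher in the numerator condition
n_down <- sum(sig$log2FoldChange < 0)   # lower in the numerator condition
```

Splitting the count into up and down is optional, but it often makes the comparison with someone else's results easier.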


Is what I did correct? Are my results reliable? Are the p-values adjusted for each comparison? (I did not do a relevel because I read that for so many comparisons it won't make a difference.)

1. That looks correct, as long as you're truly just interested in pairwise comparisons.
2. The padj column has the adjusted p-values; results() adjusts them separately for each comparison (Benjamini-Hochberg by default).
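On the relevel question: when contrast is given explicitly as c("condition", numerator, denominator), the reference level of the factor does not change that comparison; it mainly affects which coefficients are fitted by default. A small base-R sketch, using the condition labels from the question, of what relevel does and how the contrast order reads:

```r
# Condition factor as in the question (3 replicates per condition).
condition <- factor(rep(c("Bm14", "BTY14", "Mm14", "MTY14", "N14"), each = 3))

# By default the reference is simply the first level in sorted order.
# relevel() moves a chosen level (here the naive control) to the front.
condition_ref <- relevel(condition, ref = "N14")
reference_level <- levels(condition_ref)[1]

# Contrast order: the second element is the numerator, the third the denominator.
# c("condition", "Bm14", "N14") reports log2(Bm14 / N14);
# c("condition", "N14", "Bm14") reports log2(N14 / Bm14), i.e. the opposite sign.
contrast_vs_naive <- c("condition", "Bm14", "N14")
```

So with the contrasts written as in the question, a positive log2FoldChange means higher expression in N14 than in the other condition.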

Thanks Devon. I don't know what else I should be interested in or what else I could do... (maybe something regarding time series?)

Again, thank you
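One possible direction for the time-series idea, sketched under assumptions: put all four time points into a single sample table with condition and time as factors, then compare a full design that has a condition:time interaction against a reduced one without it (DESeq2 supports this through a likelihood ratio test). The snippet below only builds the sample table and the two model matrices in base R. The condition abbreviations follow the sample names used above, cts_all is a hypothetical combined count matrix, and the DESeq2 calls are shown as comments.

```r
conditions <- c("N", "Bm", "BTY", "Mm", "MTY")  # abbreviations from the sample names
days       <- c(2, 7, 14, 21)                   # dpi
reps       <- c("a", "b", "c")

# Every combination of replicate, condition and time point: 3 x 5 x 4 = 60 samples.
grid <- expand.grid(rep = reps, condition = conditions, day = days,
                    stringsAsFactors = FALSE)

colData <- data.frame(
  row.names = paste0(grid$condition, grid$day, grid$rep),  # e.g. "Bm14a"
  condition = factor(grid$condition, levels = conditions), # N is the reference
  time      = factor(grid$day, levels = days)
)

# Full model lets each condition have its own time profile; reduced model does not.
full_mm    <- model.matrix(~ condition + time + condition:time, colData)
reduced_mm <- model.matrix(~ condition + time, colData)

# With DESeq2 and a combined count matrix (hypothetical 'cts_all'):
# dds <- DESeqDataSetFromMatrix(countData = cts_all, colData = colData,
#                               design = ~ condition + time + condition:time)
# dds <- DESeq(dds, test = "LRT", reduced = ~ condition + time)
```

Genes with a small padj from such a test would be those whose change over time differs between conditions, which is a different question from the per-dpi pairwise comparisons.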