Question

Model design in RNA-seq - DESeq2

5.1 years ago

acebolladaso.iacs ▴ 10

I have 8 samples from 4 persons, each measured at two time points, 0h and 20h.

  names_chip   person time sample
1 IonCode_0109 A1        0 Donor 1- Day 0
2 IonCode_0110 A1       20 Donor 1- Day 20
3 IonCode_0111 A2        0 Donor 2- Day 0
4 IonCode_0112 A2       20 Donor 2- Day 20
5 IonCode_0113 A3        0 Donor 3- Day 0
6 IonCode_0114 A3       20 Donor 3- Day 20
7 IonCode_0115 A4        0 Donor 4- Day 0
8 IonCode_0116 A4       20 Donor 4- Day 20

The researchers would like to see which genes are DE between the two time points. They expect many changes.

The genomics service sent me the raw counts for 20812 genes. I am following the DESeq2 workflow.

dds <- DESeqDataSetFromMatrix(countData = counts, colData = annotation, design = ~ time+person)
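
A minimal sketch of how the `annotation` table could be set up so that `person` and `time` are factors, since a three-element character contrast such as `c("time","0","20")` is meant for factor variables. The data frame below is reconstructed from the table above and is only an assumption about the real `annotation` object. The `model.matrix` call uses base R to show what the paired design estimates.

```r
# Hypothetical reconstruction of the sample table (not the poster's real object)
annotation <- data.frame(
  names_chip = sprintf("IonCode_%04d", 109:116),
  person     = factor(rep(paste0("A", 1:4), each = 2)),
  time       = factor(rep(c(0, 20), times = 4), levels = c(0, 20))
)
rownames(annotation) <- annotation$names_chip

# Design matrix for the paired design: one intercept, three person
# coefficients and one time coefficient
mm <- model.matrix(~ person + time, data = annotation)
colnames(mm)

# Residual degrees of freedom left for estimating dispersion
nrow(mm) - qr(mm)$rank
```

With 8 samples and 5 coefficients, only 3 residual degrees of freedom remain, which limits power regardless of how the contrast is specified.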

I have made PCA plots and clustered the normalized counts, and I can see that samples from the same person lie close to each other, while different persons are well separated. I expected this. So far I have not filtered by number of counts. I run

dds.parametric.wald <- DESeq(dds)
# contrast = c(factor, numerator, denominator): log2 fold changes are 0 vs 20
contrast_oe <- c("time", "0", "20")
res.parametric.wald <- results(dds.parametric.wald, contrast = contrast_oe, independentFiltering = TRUE)
summary(res.parametric.wald)

and get the following result:

out of 17633 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up) : 6, 0.034%
LFC < 0 (down) : 14, 0.079%
outliers [1] : 0, 0%
low counts [2] : 2706, 15%
(mean count < 1)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

Oh! Only 20 DEGs!

If I look at a contrast between persons, e.g.

# the design variable is called "person", so the contrast must use that name
res.parametric.wald.a1.a2 <- results(dds.parametric.wald, contrast = c("person", "A1", "A2"), independentFiltering = TRUE)
summary(res.parametric.wald.a1.a2)

I get

out of 17633 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up) : 4194, 24%
LFC < 0 (down) : 3317, 19%
outliers [1] : 0, 0%
low counts [2] : 4064, 23%
(mean count < 2)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

Is it possible that, when I contrast the time points, the variability between persons adds background noise and so reduces the number of DEGs? To increase the number of DE genes between time points, would it be correct to select only the genes that are not DE between persons and compare the time points using just those genes? Is that methodologically and statistically sound? Any suggestions for an approach or design that would increase the number of DEGs between time points?

RNA-Seq deseq2 model-design number of DEG • 1.4k views

updated 5.1 years ago by tw617 ▴ 40 • written 5.1 years ago by acebolladaso.iacs ▴ 10

Cross posted to Bioconductor

https://support.bioconductor.org/p/119352/

5.1 years ago by Michael Love ★ 2.6k

score 1 · Answer 1 · 2019-03-25

Your method of adjusting the p-values could be too stringent. Look into the default settings of DESeq2 and see which method is used. By the nature of the test we should expect many false positives, so 20 DE genes is way too low. While you are at it, check the quality of your data too, e.g. plot a histogram of the p-values and filter out genes with low read counts. And did you normalize your data before constructing your design matrix?
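
The checks mentioned above could look roughly like this. It is only a sketch on simulated counts (the matrix `cts`, the sample names `s1`..`s8` and all `_sim` objects are made up for illustration, with no true time effect), so the numbers say nothing about the real data. It writes out `pAdjustMethod = "BH"`, which is the default of `results()`, applies a simple total-count pre-filter, and draws the raw p-value histogram.

```r
library(DESeq2)

set.seed(1)
# Simulated stand-in for the real data (assumption): 2000 genes,
# 4 persons x 2 time points, gene-specific means, no true time effect
n_genes <- 2000
coldata <- data.frame(
  person = factor(rep(paste0("A", 1:4), each = 2)),
  time   = factor(rep(c("0", "20"), times = 4), levels = c("0", "20")),
  row.names = paste0("s", 1:8)
)
mu_gene <- 2^runif(n_genes, min = 0, max = 10)
cts <- matrix(
  rnbinom(n_genes * 8, mu = rep(mu_gene, times = 8), size = 10),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)), rownames(coldata))
)

# DESeq2 takes raw counts; DESeq() estimates the size factors itself
dds_sim <- DESeqDataSetFromMatrix(countData = cts, colData = coldata,
                                  design = ~ person + time)

# Optional pre-filter: drop genes with very few reads in total
dds_sim <- dds_sim[rowSums(counts(dds_sim)) >= 10, ]
dds_sim <- DESeq(dds_sim)

# "BH" is the default adjustment; written out here to make it visible
res_sim <- results(dds_sim, contrast = c("time", "20", "0"),
                   pAdjustMethod = "BH")

# Histogram of raw p-values, skipping genes where the p-value is NA
hist(res_sim$pvalue[!is.na(res_sim$pvalue)], breaks = 40,
     main = "Raw p-values", xlab = "p-value")
```

A roughly flat histogram suggests little signal, while a peak near zero suggests real differences that the adjusted p-values may or may not retain.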

Also, contrasting between subjects is not going to tell you anything unless the subjects are at different time points. There is biological variability between people, and the treatment effect over time is not measured unless you compare, for instance, a day 20 individual with a day 0 individual. Even then, you will have much more noise doing this. I would continue with approach #1: measure within your biological replicates, i.e. compare the time points within each person.
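
To illustrate why comparing within persons is less noisy, here is a toy base-R example for a single gene on the log scale. It uses a paired t-test as an analogy for blocking on `person`; it is not the negative binomial model DESeq2 fits, and all numbers are simulated assumptions.

```r
set.seed(42)
# Simulated log-expression of one gene: large differences between the
# 4 persons, a small true shift between the two time points
person_effect <- rnorm(4, mean = 0, sd = 2)
time_effect   <- 1
t0  <- 8 + person_effect + rnorm(4, sd = 0.1)
t20 <- 8 + person_effect + time_effect + rnorm(4, sd = 0.1)

# Ignoring the pairing: person-to-person variation counts as noise
unpaired <- t.test(t20, t0, paired = FALSE)

# Using the pairing: each person serves as their own control
paired <- t.test(t20, t0, paired = TRUE)

c(unpaired = unpaired$p.value, paired = paired$p.value)
```

The paired test works on the within-person differences, so the between-person spread drops out. Including `person` in the DESeq2 design plays the same role.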