Question

DESeq2 analysis with 2 treatments, 8 timepoints and no replicates


9.3 years ago

natsterbug ▴ 10

I am well aware that expression analysis without replicates is not advisable. However, the data were generated before I joined the program, so I had no control over the lack of replicates, and I intend to use them for exploratory investigation only. RNA samples were taken from either control or thaxtomin-treated tissue at 8 timepoints. I would like to see whether any genes are differentially expressed between the control and thaxtomin-treated samples at each timepoint.

As input I am using a matrix of HTSeq counts, assembled with a custom Python script, in the object Kalkaskareadcount:

row.names             K2C  K2T  K4C  K4T  K6C  K6T  K8C  K8T K10C K10T K18C K18T K24C K24T K20C K20T
PGSC0003DMG400000001  106  122   58  170   73   58   85  215  173   84   83  160   53   91   66   68
PGSC0003DMG400000002  347  311  183  169  321  242  213  376  270  267  310  241  214  156  206  192
PGSC0003DMG400000003   27   26    8   21   40    6   26   18   34   21   32   39   21    9   14    7
PGSC0003DMG400000004  851  985 1004  834  920 1067  990 1173  947 1011 1026  644  868  994 1179 1026

The sample conditions are supplied in the object KalkaskaInfo:

row.names  condition  type
K2C        Control    single-read
K2T        Thaxtomin  single-read
K4C        Control    single-read
K4T        Thaxtomin  single-read

The code I have so far is below and is guided by the DESeq2 vignette: https://www.bioconductor.org/packages/3.3/bioc/vignettes/DESeq2/inst/doc/DESeq2.pdf

library(DESeq2)

# Count matrix (genes x samples) and per-sample annotation
Kalkaskareadcount <- read.table("Kalkaska_counts_merged.csv", header = TRUE, sep = ",", row.names = 1)
KalkaskaInfo <- read.table("Kalkaska_design.csv", header = TRUE, sep = ",", row.names = 1)

dds <- DESeqDataSetFromMatrix(countData = Kalkaskareadcount,
                              colData = KalkaskaInfo,
                              design = ~ condition)
dds
dds <- DESeq(dds)

Based on my reading, it seems advisable to use rlogTransformation next, since p-values would be misleading without replicates (http://seqanswers.com/forums/showthread.php?t=31036). Would it then be best to make pairwise comparisons of control and treatment at each timepoint? I apologize for asking a variation of a question that has come up before, but I was unable to find an answer that fits this experimental design.
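For illustration, a purely descriptive version of the per-timepoint comparison could be sketched in base R as below. This is not DESeq2's own code: the size factors follow the median-of-ratios idea, the pseudocount of 1 is an assumption, and only the four example genes shown above are used, so the numbers are illustrative only. It produces fold changes, not p-values.

```r
# Counts for the four example genes, columns in the order given in the post.
samples <- c("K2C", "K2T", "K4C", "K4T", "K6C", "K6T", "K8C", "K8T",
             "K10C", "K10T", "K18C", "K18T", "K24C", "K24T", "K20C", "K20T")
counts <- matrix(c(
  106, 122,   58, 170,  73,   58,  85,  215, 173,   84,   83, 160,  53,  91,   66,   68,
  347, 311,  183, 169, 321,  242, 213,  376, 270,  267,  310, 241, 214, 156,  206,  192,
   27,  26,    8,  21,  40,    6,  26,   18,  34,   21,   32,  39,  21,   9,   14,    7,
  851, 985, 1004, 834, 920, 1067, 990, 1173, 947, 1011, 1026, 644, 868, 994, 1179, 1026),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("PGSC0003DMG40000000", 1:4), samples))

# Median-of-ratios size factors: per-sample median of the ratio of each
# gene's count to that gene's geometric mean across samples.
# (All counts here are > 0; genes with zeros would need to be excluded.)
log_geo_mean <- rowMeans(log(counts))
size_factors <- apply(counts, 2, function(x) exp(median(log(x) - log_geo_mean)))
norm_counts  <- sweep(counts, 2, size_factors, "/")

# Pair each treated sample with the control from the same timepoint.
is_treated <- grepl("T$", samples)
ctrl <- norm_counts[, !is_treated]
trt  <- norm_counts[, is_treated]

# log2 fold change (thaxtomin / control) per gene and timepoint,
# with a pseudocount of 1 (an assumption) to damp low-count noise.
lfc <- log2((trt + 1) / (ctrl + 1))
colnames(lfc) <- sub("^K(\\d+)T$", "\\1h", colnames(trt))
round(lfc, 2)
```

With a single sample per cell, large fold changes at one timepoint cannot be separated from sample-to-sample noise, so a table like this is best read as a ranking aid rather than a test.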

DEseq no replicates RNA-Seq R • 3.6k views

ADD COMMENT • link updated 9.3 years ago by russhh 5.8k • written 9.3 years ago by natsterbug ▴ 10


This is just my opinion, so I'm not going to add it as an answer, but with the above experimental design being out of your control, you might consider using descriptive geometric analyses instead of trying to force the design into a statistical framework. DESeq2 requires 3 replicates per group to do its normalization properly, so I wouldn't trust anything coming out the other end without that minimum sample size.

Perhaps looking at distributions for the following will be informative:

Canonical correlation between timepoints (see how much the geometric subspaces overlap)

Hellinger transform + NMDS or PCA, see how the data separate

Reformat your matrix to: time by design (8 x gene#) for treatment and control (2 matrices) and do generalized SVD on each.
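As a rough sketch of the Hellinger + PCA suggestion above, the helper below uses only base R. The function name `hellinger_pca` is hypothetical, and the choice to centre but not scale the transformed values is an assumption, not something stated in this thread.

```r
# Hellinger transform followed by PCA on the samples.
# `counts` is a genes x samples matrix or data frame of raw counts.
hellinger_pca <- function(counts) {
  counts <- as.matrix(counts)
  counts <- counts[rowSums(counts) > 0, , drop = FALSE]  # drop all-zero genes
  rel <- sweep(counts, 2, colSums(counts), "/")          # per-sample proportions
  hel <- sqrt(t(rel))                                    # samples in rows
  prcomp(hel, center = TRUE, scale. = FALSE)
}

# Usage, assuming the objects from the question are already loaded:
# pca <- hellinger_pca(Kalkaskareadcount)
# plot(pca$x[, 1:2], pch = 19,
#      col = ifelse(KalkaskaInfo$condition == "Control", "blue", "red"))
# text(pca$x[, 1:2], labels = rownames(pca$x), pos = 3)
```

If the control and thaxtomin samples separate along one of the first components, the genes with the largest loadings on that component are natural candidates for a closer look.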

You could also collapse your timepoints and treat them as replicates if nothing else works. Hopefully others have recommendations that are better.

ADD REPLY • link 9.3 years ago by Steven Lakin ★ 1.8k


Thank you for your suggestions. I had also considered collapsing the time points as a last resort.

ADD REPLY • link 9.3 years ago by natsterbug ▴ 10

score 1 · Answer 1 · 2016-03-30

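IMO you can't do C vs T at each time point. However, you could turn it into a pretty basic regression problem:

# Sample order matches the count matrix columns: K2C, K2T, K4C, K4T, ...
hours <- rep(c(2, 4, 6, 8, 10, 18, 24, 20), each = 2)
drug <- rep(c(0, 1), 8)
design <- cbind(const = 1, hours = (1 - drug) * hours, drughours = drug * hours)

Then test the contrast (drughours - hours)/2, i.e. the difference between the per-hour slope under thaxtomin and the per-hour slope under control.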
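To make the contrast concrete, here is a minimal sketch for a single gene. It uses ordinary least squares on log2 counts as a stand-in for a count-based model; that substitution, and the pseudocount of 1, are assumptions made for illustration. The counts are those of PGSC0003DMG400000001 from the question.

```r
# Design from the answer: shared intercept, separate per-hour slopes.
hours  <- rep(c(2, 4, 6, 8, 10, 18, 24, 20), each = 2)
drug   <- rep(c(0, 1), 8)
design <- cbind(const = 1, hours = (1 - drug) * hours, drughours = drug * hours)

# log2 counts for one gene, in the same sample order as the design rows.
y <- log2(c(106, 122, 58, 170, 73, 58, 85, 215,
            173, 84, 83, 160, 53, 91, 66, 68) + 1)

# Ordinary least squares fit with the design matrix supplied directly.
fit  <- lm(y ~ 0 + design)
beta <- setNames(coef(fit), colnames(design))

# Contrast (drughours - hours) / 2: half the difference in slopes.
contrast <- c(const = 0, hours = -1, drughours = 1) / 2
estimate <- sum(contrast * beta)
se       <- sqrt(drop(t(contrast) %*% vcov(fit) %*% contrast))
t_stat   <- estimate / se
p_value  <- 2 * pt(-abs(t_stat), df = df.residual(fit))

c(estimate = estimate, se = se, t = t_stat, p = p_value)
```

The model spends 3 coefficients on 16 samples, which leaves 13 residual degrees of freedom. That is why this formulation can be tested at all, whereas a separate control-versus-treatment comparison at each timepoint cannot.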