Question

Advice for RNAseq timepoint analysis

0

Entering edit mode

6.8 years ago

macmath ▴ 170

I have three different time points

12 weeks WT vs KO 4 biological replicates and 4 technical replicates

4 weeks WT vs KO 3 biological replicates and 3 technical replicates

0 weeks WT vs KO 3 biological and 3 technical replicates

Need suggestions for the design, to find the kinetics and also identify differential genes in different conditions. I have all counts derived from Htseq-count. How to proceed with the analysis. I would also like to see comparison in all combinations.

rna-seq • 1.9k views

ADD COMMENT • link updated 6.8 years ago by mforde84 ★ 1.4k • written 6.8 years ago by macmath ▴ 170

score 3 · Answer 1 · 2017-07-05

3

Entering edit mode

6.8 years ago

mforde84 ★ 1.4k

Hi macmath,

I'd suggest looking into limma and edgeR for analysis.

You have to set up the coefficients and contrasts for the linear model. It looks like you have 2 experimental factors and one error term, one being KO status, and the second being time, and the last being known batch groups. You want to correct for batch as it could potentially effect your result. If you don't know the batch status, you can use something like SVA to do a generalized batch correction.

So starting off let's generate the model. I assume you probably have an excel file or csv file with each sample in a row and each column being either KO status, time, and sequencing batch. So read those in and generate a design for a linear model:

factor.table <- read.table("omics.csv", header=TRUE, row.names=1, sep="\t")
ko_status <- as.factor(factor.table$ko_status)
time_status <- factor.table$time_status 
batch <- as.factor(factor.table$rnaseq_batch)
design <- model.matrix(~0 + ko_status + time_status + batch + ko_status:time_status)

So that's the basic design, and what it means is that the linear model will generate an intercept for each factor plus any interactions between KO and time status. You then generate contrasts to test for significant differences between defined groups, eg:

cont.matrix <- makeContrasts(KO_true = ko_status1 - ko_status2,
    ..., levels = design)

You'll have to play around with this a bit to generate the proper contrasts. There are working examples in the limma documentation which will help you figure it out, so do your reading.

Next you filter and normalized the data, then run the linear model:

y <- DGEList(counts = raw_counts)
y <- y[!rowSums(y$counts == 0) == ncol(raw_counts),] 
A <- aveLogCPM(y)
y2 <- y[A>1,]
y3 <- calcNormFactors(y2, method = "TMM")
dge <- voomWithQualityWeights(y3, design=design, normalization="quantile", plot=FALSE)
fit <- lmFit(dge, design)
fit2 <- contrasts.fit(fit, cont.matrix)
fit3 <- eBayes(fit2, trend = TRUE, robust = TRUE)

I believe fit2 gives you the main effects, and fit3 will give you the DEG for whichever contrast. I also believe you can use one of the fits to do a repeated measures anova, but I think you can flush out the effect using the interaction term in the design.

ADD COMMENT • link 6.8 years ago by mforde84 ★ 1.4k

0

Entering edit mode

Thank you for the needful suggestions I have a doubt. I forgot to mention in the description of the question.

12 weeks WT vs KO Condition A
4 weeks WT vs KO Condition B
0 week WT vs KO Condition C

These KO are specific for each condition they are not common among the three

ADD REPLY • link 6.8 years ago by macmath ▴ 170

0

Entering edit mode

Not really sure what your objective is. Could you explain your groups in biological terminology.

ADD REPLY • link 6.8 years ago by mforde84 ★ 1.4k

0

Entering edit mode

WT = Wild-type, KO = knockout/mutant . The KO are not identical across the three conditions so these are separate experiments. Unclear how the time component fits in the picture.

ADD REPLY • link 6.8 years ago by GenoMax 141k

0

Entering edit mode

To clarify, biologically the KO are same and I am a bit confused with the KO status, time, and sequencing batch. My files compose of count files generated by HTseq count for each biological replicate and its respective technical replicates.

12 weeks WT vs KO 4 biological replicates and 4 technical replicates i.e. 32 count files

4 weeks WT vs KO 3 biological replicates and 3 technical replicates i.e. 27 count files

0 weeks WT vs KO 3 biological and 3 technical replicates i.e. 27 count files

I would also like to know if the technical replicates should be collapsed?

ADD REPLY • link 6.8 years ago by macmath ▴ 170

0

Entering edit mode

These KO are specific for each condition they are not common among the three

and

To clarify, biologically the KO are same

Those two statements (made in two separate posts) are conflicting. Which of those statements is correct? I would think the KO must be identical otherwise the time component has no meaning since the three sets would be separate non-comparable experiments.

Technical replicates for sequencing are not needed since sequencing has become pretty standard (unless you used different sequencer models that use different chemistry for sequencing e.g. NextSeq,HiSeq). You could collapse them.