General question about shRNA-Seq analysis
2.4 years ago
scheitelt ▴ 10

Hi all, I would appreciate your help in understanding how to analyse our shRNA-Seq data. We have an shRNA library from mouse. Similar to this paper (Fig. 1), we have two conditions (WT, KO) and two stages (before [b], after [a]), each with multiple replicates.

before samples - WT_b and KO_b
after samples - WT_a and KO_a


We are also doing a loss-of-function analysis, so we would like to see whether there is a dropout in the KO compared to the WT.

I have searched for tools and found the edgeR tutorial, which gives several examples. But all these examples compare only two conditions against each other. I have a read count table, produced by mapping the samples with segemehl and quantifying them with HTSeq-count.

Do I need to take all four groups into account, or should I only compare the after samples (KO_a vs. WT_a) to see whether there are dropouts? In other words, should my experimental design include all the samples or only the last two? Should it be something like this:

(KO_b / KO_a) - (WT_b / WT_a)


assuming the columns of my sampleData are condition and stage, like this:

sample  condition   stage
WT_1    WT  Input
...
WT_6    WT  after
...
WT_10   WT  after
KO_1    KO  Input
...
KO_10   KO  after


I would appreciate an idea of how to create the design matrix.

Would model.matrix(~stage) be sufficient, or would a more complex design such as model.matrix(~condition + condition:stage) be necessary here?

May I ask whether you want to compare the conditions with each other, or whether you have a control to compare against?

In terms of the design matrix: yes, if you are working with multiple conditions, then the second command will come in handy.
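To make the difference between the two formulas concrete, here is a minimal base-R sketch. The sample sheet is hypothetical (two replicates per group, stage labelled "before"/"after"); it is an illustration of how the design columns come out, not the poster's actual data or a recommended final model.

```r
# Hypothetical sample sheet: 2 conditions x 2 stages, 2 replicates each
sampleData <- data.frame(
  sample    = paste0("S", 1:8),
  condition = factor(rep(c("WT", "KO"), each = 4), levels = c("WT", "KO")),
  stage     = factor(rep(rep(c("before", "after"), each = 2), times = 2),
                     levels = c("before", "after"))
)

# Stage only: this design ignores the WT/KO grouping entirely
design_stage <- model.matrix(~ stage, data = sampleData)

# Nested form from the question: one after-vs-before effect per condition
design_nested <- model.matrix(~ condition + condition:stage, data = sampleData)

# Full factorial form with an explicit interaction term
design_full <- model.matrix(~ condition * stage, data = sampleData)

print(colnames(design_stage))
print(colnames(design_nested))
print(colnames(design_full))
```

In the factorial form, the coefficient `conditionKO:stageafter` is the difference between the KO and the WT after-vs-before changes. In the nested form, the same quantity is the difference between the `conditionKO:stageafter` and `conditionWT:stageafter` coefficients. Both designs fit the same model; they only parameterise it differently.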

This is the point: I'm not sure which samples to include. The end goal is to identify genes that show a dropout in the comparison KO vs. WT. But do I need to include all four sample groups, or only compare the two experimental sample groups?
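One way to see what each choice measures is a toy calculation for a single shRNA. The log2 abundances below are made-up numbers chosen only for illustration; the group names follow the question (WT_b, WT_a, KO_b, KO_a).

```r
# Made-up log2 abundances of one shRNA in each group (illustration only)
log2_abund <- c(WT_b = 10, WT_a = 9, KO_b = 8, KO_a = 5)

# After-only comparison: KO_a vs. WT_a.
# This also contains any difference already present before selection.
after_only <- log2_abund[["KO_a"]] - log2_abund[["WT_a"]]

# After-vs-before change within each condition
wt_change <- log2_abund[["WT_a"]] - log2_abund[["WT_b"]]
ko_change <- log2_abund[["KO_a"]] - log2_abund[["KO_b"]]

# Difference of differences: extra dropout in KO beyond what WT shows
ko_specific <- ko_change - wt_change

print(c(after_only = after_only, ko_specific = ko_specific))
```

With these numbers the after-only comparison gives -4, while the difference of differences gives -2, because the shRNA already started 2 log2 units lower in the KO before samples. Whether that starting difference matters in a real screen depends on how similar the before samples are.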

DESeq2 is very similar to edgeR, and its tutorial has a very nice walkthrough of multi-factor designs, including common errors and how to address them. It's worth a read, and if you aren't married to edgeR, I found DESeq2 to be more user-friendly when I was starting out:

http://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html

But does it also work well with shRNA-Seq? Only a small number of genes is analysed, and the proportion of changed (differentially expressed) genes might be higher than what DESeq2 assumes to begin with.