RNA-Seq: Biological and Technical Replicates in Expression Analysis (DESeq)
5.6 years ago
I have to analyze several RNA-Seq samples. The samples come from several runs, both unstranded and stranded, and some samples were sequenced multiple times (using different library kits). I used htseq-count to get the read counts and now want to use DESeq to test for differential expression. So I have biological replicates and technical replicates (the same sample sequenced several times using a different library kit; is that the correct term?).

So I made a design table. In my example, A.1 means sample A, sequencing 1; A.2 means sample A, sequencing 2; and so on. So A was sequenced twice (one unstranded, one stranded), B three times (one unstranded, two stranded), C once (unstranded) and D once (stranded). ReplicateGroup is used to group technical replicates together.

designTable:
Sample    Condition    Stranded    ReplicateGroup
A.1       Ctrl         No          A
B.1       Treated      No          B
C.1       Treated      No          C
A.2       Ctrl         Yes         A
B.2       Treated      Yes         B
B.3       Treated      Yes         B
D.1       Treated      Yes         D

After that I use DESeq. countTable is the read count matrix.

cdsFull = newCountDataSet( countTable, designTable )
cdsFull = estimateSizeFactors( cdsFull )
cdsFull = estimateDispersions( cdsFull )

But now I don't know how to fit a model on Condition, Stranded and ReplicateGroup.

Like this?

fit1 = fitNbinomGLMs( cdsFull, count ~ Condition + Stranded + ReplicateGroup )
fit0 = fitNbinomGLMs( cdsFull, count ~ Condition )
pvalsGLM = nbinomGLMTest( fit1, fit0 )
padjGLM = p.adjust( pvalsGLM, method="BH" )
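
Before fitting, it can help to look at the design matrices that the two formulas above imply. The following is only a minimal sketch in base R, with no DESeq calls; the data frame name `design` is a hypothetical stand-in that rebuilds the design table from this post.

```r
# Hypothetical reconstruction of the design table shown above.
design <- data.frame(
  Condition      = c("Ctrl", "Treated", "Treated", "Ctrl", "Treated", "Treated", "Treated"),
  Stranded       = c("No", "No", "No", "Yes", "Yes", "Yes", "Yes"),
  ReplicateGroup = c("A", "B", "C", "A", "B", "B", "D"),
  row.names      = c("A.1", "B.1", "C.1", "A.2", "B.2", "B.3", "D.1"),
  stringsAsFactors = TRUE
)

# Design matrices for the full and the reduced formula.
full    <- model.matrix(~ Condition + Stranded + ReplicateGroup, data = design)
reduced <- model.matrix(~ Condition, data = design)

# Compare the number of coefficients with the rank of each matrix.
full_rank    <- qr(full)$rank
reduced_rank <- qr(reduced)$rank
cat("full:   ", ncol(full), "columns, rank", full_rank, "\n")
cat("reduced:", ncol(reduced), "columns, rank", reduced_rank, "\n")

# Cross-tabulate the two factors to see how they overlap.
print(table(design$ReplicateGroup, design$Condition))
```

If the rank is smaller than the number of columns, some coefficients cannot be estimated separately. The cross-tabulation shows where that can come from: in this table each ReplicateGroup level occurs under exactly one Condition.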

Is this a good way to analyze technical replicates? I read that I should merge them together, but I don't think that is a good idea because I used different library kits. So I'm stuck...

Thanks a lot in advance


5.6 years ago
Looks about right.

When I did a roughly similar analysis, I specified the sample IDs using rownames(), so the design table contained only the variables. However, this probably doesn't matter.
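
As a rough illustration of keeping the sample IDs in the row names, here is a minimal base-R sketch. The values are copied from the design table in the question; the toy `countTable` (three made-up genes with placeholder counts) is purely hypothetical and only there to show the alignment check.

```r
# Design table with sample IDs as row names, so every column is a model variable.
designTable <- data.frame(
  Condition      = c("Ctrl", "Treated", "Treated", "Ctrl", "Treated", "Treated", "Treated"),
  Stranded       = c("No", "No", "No", "Yes", "Yes", "Yes", "Yes"),
  ReplicateGroup = c("A", "B", "C", "A", "B", "B", "D"),
  row.names      = c("A.1", "B.1", "C.1", "A.2", "B.2", "B.3", "D.1"),
  stringsAsFactors = TRUE
)

# Hypothetical toy count matrix (3 genes x 7 samples) with placeholder values.
countTable <- matrix(
  seq_len(3 * 7), nrow = 3,
  dimnames = list(c("gene1", "gene2", "gene3"), rownames(designTable))
)

# The columns of the count matrix should line up with the rows of the design table.
stopifnot(identical(colnames(countTable), rownames(designTable)))
```

With the IDs stored this way, the same check can be run on the real htseq-count matrix before it is passed to newCountDataSet().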

Also, I had to call estimateDispersions() with method="blind". Otherwise it didn't seem to work, even though there were replicates (although one of my variables specified paired samples, so this might not be a problem for you).

The model specification and comparison look right.

Good luck!

