I am trying to compare gene expression using RNA Seq data from 50 and 50 human tissue samples with two phenotypically similar diseases that may exhibit distinct pathophysiological differences on the transcriptomic level (= my hypothesis).
Several, but not all subjects in the study provided more than one sample, in one case even one from each disease type. Thus, I would normally want to correct for the subject as a random effect.
Additional covariates include gender and a measure of disease severity.
In order to be able to incorporate both my fixed and random factors (as a blocking factor), I first used limma/voom
design <- model.matrix(~1+ targets$disease + targets$gender + targets$severity) y <- voom(x,design,plot=F, normalize="quantile") corfit <- duplicateCorrelation(y, design, block=targets$Patient) y <- voom(x,design,plot=FALSE, block=targets$Patient, correlation=corfit$consensus, normalize="quantile") fit=lmFit(y, design, block=targets$Patient, correlation=corfit$consensus) fit <- eBayes(fit)
But ran into problems:
1) weird results with many ribosomal proteins differentially expressed (does not make sense biologically)
In both cases, I was recommended to use DESeq2 instead.
As per the recommendation here by Devon, I have first incorporated the subject factor as a fixed factor (and not random), even though I have my doubts about doing so. Because of overlaps, I could not correct for gender and subject at the same time ("not full rank"), which is not optimal either.
dds <- DESeqDataSetFromMatrix(countData = countData, colData = targets, design = ~ subject + severity + disease) # + gender does not work
Do you have any idea how to incorporate our random effect "subject" properly in DESeq2, and, if so, how to circumvent the "not full rank" error with gender/subject ?