Question

Incorporating factors of unwanted variation from RUVr into EdgeR cell means model for DE

1

Entering edit mode

4.9 years ago

garbuzov ▴ 70

Hi there, I'm new to working with EdgeR's generalized linear models, but I have a data set where I'm trying to get complex comparisons between groups, times, and treatments, so here I am...

I've figured out that the best approach for me is to use the cell means model. Right now I have three time points, two groups and two treatments at each time point.

Here is my code:

#countsDGE_d3 is my count matrix
>key_3d
                       sample    group condition
F.3dayInjury.IP1 F.3dayInjury forelimb    injury
F.3dayInjury.IP2 F.3dayInjury forelimb    injury 
F.3dayInjury.IP3 F.3dayInjury forelimb    injury
F.Intact.IP1         F.Intact forelimb no_lesion 
F.Intact.IP2         F.Intact forelimb no_lesion 
F.Intact.IP3         F.Intact forelimb no_lesion 
H.3dayInjury.IP1 H.3dayInjury hindlimb    injury 
H.3dayInjury.IP2 H.3dayInjury hindlimb    injury 
H.3dayInjury.IP3 H.3dayInjury hindlimb    injury 
H.Intact.IP1         H.Intact hindlimb no_lesion 
H.Intact.IP2         H.Intact hindlimb no_lesion 
H.Intact.IP3         H.Intact hindlimb no_lesion 
H.Intact.IP4         H.Intact hindlimb no_lesion

design3d <- model.matrix(~0+key_3d$sample)
colnames(design3d) <- levels(key_3d$sample)

countsDGE_d3 <- calcNormFactors(countsDGE_d3)
countsDGE_d3 <- estimateDisp(countsDGE_d3, design3d, robust = TRUE)
countsDGE_d3$common.dispersion

fit <- glmQLFit(countsDGE_d3, design3d)

cont.matrix <- makeContrasts(NvsINJinFL=F.3dayInjury-F.Intact, 
                             NvsINJinHL=H.3dayInjury-H.Intact,
                             Diff=(F.3dayInjury-F.Intact)-(H.3dayInjury-H.Intact),
                             levels=design3d)

qlf <- glmQLFTest(fit, contrast=cont.matrix[,"NvsINJinFL"])
topTags(qlf, n = 20)
table <- topTags(qlf, n = Inf)
write.csv( as.data.frame(table), file="All_FLDay3_InjuredvsIntact_results.csv")

I then repeat running glmQLFTest on all three contrasts and export the topTags table.

My question is, due to the data being pretty noisy, I used the RUVseq package to calculate the factors of unwanted variation for my samples. I would like to incorporate these into my design matrix and into my contrasts. How do I construct this model matrix and contrast matrix?

Does this make sense? But then what should my contrasts be?

design3d <- model.matrix(~0+key_3d$sample+W_1, data=pData_3d)

Where pData_3d is a matrix of the factors of unwanted variation from:

seq_set <- newSeqExpressionSet(as.matrix(dge_counts, phenoData = key$sample))

design <- model.matrix(~0 + key$sample)
colnames(design) <- levels(key$sample)
dge <- DGEList(counts = dge_counts, group = key$sample)
dge <- calcNormFactors(dge, method="upperquartile", rubust = TRUE)
dge <- estimateGLMCommonDisp(dge, design)
dge <- estimateGLMTagwiseDisp(dge, design)

fit <- glmFit(dge, design)
res <- residuals(fit, type = "deviance")

#use all the genes to estimate the factors of unwanted variation.
seqUQ <- betweenLaneNormalization(seq_set, which="upper")
controls <- rownames(seq_set)
seqRUVr <- RUVr(seqUQ, controls, k=1, res)

Thank you!

R EdgeR RUV RUVseq GLM • 1.2k views

ADD COMMENT • link 4.9 years ago by garbuzov ▴ 70