Question

limma voom - mean variance trend

4.8 years ago

furrdinand • 0

I am running a differential expression (DE) pipeline on a bulk blood dataset. Each patient has two RNA-seq samples, one pre-exposure and one post-exposure. I am using limma voom for the analysis, and my mean-variance plot looks strange. I ran standard pre-processing (STAR + featureCounts), so I'm not sure why this would happen unless bulk blood samples have properties that cause these peculiarities. Are there particular situations in which a plot like this would arise? Bulk blood data are generally pretty messy, and after re-running the analysis several times looking for errors in my code (and not finding any), I suspect that might be the issue. Below are the mean-variance plot and my code.

design <- model.matrix(~ 0 + exposure + within_subject_cov_1 + within_subject_cov_2 + within_subject_cov_3, data = info)
counts <- voom(dge, design)
corfit <- duplicateCorrelation(counts, design, block = info$subject)
corfit$consensus
counts_voom <- voom(dge, design, block = info$subject, correlation = corfit$consensus)

Thanks.

[Image: voom mean-variance trend plot]


Updated 4.8 years ago by Gordon Smyth ★ 7.0k • written 4.8 years ago by furrdinand • 0

You are right, that looks strange! It seems like you mainly have very (!) large counts? Also, did you use edgeR::calcNormFactors?

4.8 years ago by Kristoffer Vitting-Seerup ★ 4.0k

Yes, I used calcNormFactors with the TMM method prior to voom normalization on the dge object.

4.8 years ago by furrdinand • 0

Nice! Have you filtered your expression matrix a lot? It looks like it is missing all the low counts...

4.8 years ago by Kristoffer Vitting-Seerup ★ 4.0k

score 2 · Answer 1 · 2019-07-04

No, I've never seen a voom plot like that (concave decreasing) and there are no situations I can think of in which it should arise. The standard deviations look very large indeed, so there may be systematic lack of fit (a missing covariate). If I were you I'd go back and look at the raw counts to check that they look sensible.
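For anyone wanting to act on the "look at the raw counts" suggestion, here is a minimal base R sketch of a few sanity checks. The matrix `raw_counts` is a simulated, hypothetical stand-in for the real featureCounts output (genes in rows, samples in columns), and the threshold of 10 is an arbitrary illustration, not a recommendation from this thread.

```r
# Hypothetical toy count matrix standing in for the real featureCounts output
# (genes in rows, samples in columns); replace with your own raw counts.
set.seed(1)
raw_counts <- matrix(
  rnbinom(6000, mu = 50, size = 2),
  nrow = 1000, ncol = 6,
  dimnames = list(paste0("gene", 1:1000), paste0("sample", 1:6))
)

# Library size per sample: very uneven totals can point to a failed library.
lib_size <- colSums(raw_counts)
print(lib_size)

# Overall distribution of the raw counts across all genes and samples.
print(summary(as.vector(raw_counts)))

# Raw counts should be non-negative integers, not normalised or logged values.
is_integer_like <- all(raw_counts >= 0 & raw_counts == round(raw_counts))
print(is_integer_like)

# Fraction of genes with a low total count (threshold of 10 is arbitrary here).
low_frac <- mean(rowSums(raw_counts) < 10)
print(low_frac)

# Share of each library taken up by its single most abundant gene.
top_share <- apply(raw_counts, 2, max) / lib_size
print(round(top_share, 3))
```

If `is_integer_like` is FALSE, or one gene takes up a very large share of every library, that would be worth investigating before trusting the voom plot.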

Since you have paired data, it is usual to include Subject in the design matrix. If there are substantial baseline differences between the Subjects, then that might be the cause of the lack of fit symptoms you see in the plot.
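A minimal sketch of what including Subject in the design matrix could look like. The `info` data frame below is a hypothetical three-subject example, not the poster's data, and the column names simply mirror those in the question. Only base R is run; the limma calls are shown as comments because they need the poster's `dge` object.

```r
# Hypothetical sample sheet: 3 subjects, each sampled pre- and post-exposure.
info <- data.frame(
  subject  = factor(rep(c("S1", "S2", "S3"), each = 2)),
  exposure = factor(rep(c("pre", "post"), times = 3), levels = c("pre", "post"))
)

# Subject enters as a fixed blocking factor, so baseline differences between
# subjects are absorbed by the subject coefficients.
design <- model.matrix(~ subject + exposure, data = info)
print(colnames(design))

# The paired pre/post effect is the last coefficient, "exposurepost".
# With a DGEList `dge` as in the question, the fit would then be roughly:
#   v   <- voom(dge, design, plot = TRUE)
#   fit <- eBayes(lmFit(v, design))
#   topTable(fit, coef = "exposurepost")
```

With Subject in the design like this, the duplicateCorrelation step on subject should no longer be needed, since the two approaches are alternative ways of handling the pairing. Within-subject covariates could be appended to the formula in the same way as in the original code.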

PS. limma and edgeR questions are regularly answered if posted to the Bioconductor Support forum.