voom mean-variance plot has set of genes where variance increases with expression level
4 weeks ago
Jautis ▴ 330

I'm using limma + voom to model an expression dataset, but I'm observing a weird subset of genes where the standard deviation increases with the expression level rather than decreasing as is the case for most genes.

Any ideas for why this is occurring and/or what to do about it? I have found lots of advice regarding oddities at low-expression levels, but not this pattern. Thanks!

voom expression limma R • 509 views
4 weeks ago
Gordon Smyth ★ 3.5k

The high-variance genes probably have almost all counts equal to zero with just one or two very large non-zero counts. Normally such genes would be filtered out by filterByExpr.

The sort of pattern you see can also be caused by a hidden batch effect that affects a minority of genes and which is not accounted for by your design matrix.

I would start by identifying the weird genes and examining their expression pattern, which may tell you something about quality or annotation issues with your data. Then you can either revise your filtering strategy to remove those genes, or you can use eBayes() with robust=TRUE so that the high-variance genes will be isolated and their influence will be minimized.
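A minimal sketch of the default filterByExpr filtering mentioned above, assuming edgeR is installed. The count matrix, group labels and object names are made up for illustration; gene 1 is constructed to have the "all zeros plus one very large count" pattern.

```r
library(edgeR)

# Toy data (made up for illustration): 200 genes x 6 samples in two groups.
set.seed(1)
counts <- matrix(rnbinom(200 * 6, mu = 50, size = 5), nrow = 200)
# Gene 1 mimics the problem pattern: zeros everywhere except one large count.
counts[1, ] <- c(0, 0, 0, 0, 0, 5000)
group <- factor(rep(c("A", "B"), each = 3))
design <- model.matrix(~ group)

y <- DGEList(counts = counts)
# Keep genes with a worthwhile count in enough samples, given the design.
keep <- filterByExpr(y, design)
y <- y[keep, , keep.lib.sizes = FALSE]
y <- calcNormFactors(y)

keep[1]   # whether the one-large-count gene survives the filter
```

With the defaults, a gene has to reach the count threshold in roughly as many samples as the smallest group, so a single large count is not enough to keep it. A filter based on ">0 counts" or on the mean does not give that guarantee.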

0

Thanks for the suggestion, but unfortunately this doesn't seem to be driven by the pattern you describe. I had already been filtering for sites with >0 counts in at least 75% of the dataset and added an additional filter to remove sites where one sample had an expression level that was 5x higher than the mean (using cpm), but the pattern remained.

Do you have any suggestions for figuring out what these genes may be? I've been unable to recreate the mean-variance plot from the raw data, but I suspect that's a limitation of my understanding of what exactly voom is doing.
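For reference, a rough sketch of how the plot coordinates can be rebuilt by hand, based on my reading of the voom source and ?voom rather than on anything stated in this thread. The x-axis is the average log2 count and the y-axis is the square root of the residual standard deviation from a linear model fitted to log2-cpm values. The toy counts and object names are made up; for a DGEList, voom uses lib.size * norm.factors from y$samples instead of the column sums.

```r
library(limma)

# Toy counts (made up for illustration): 500 genes x 6 samples, two groups.
set.seed(1)
mu <- 2^runif(500, 2, 10)
counts <- matrix(rnbinom(500 * 6, mu = rep(mu, 6), size = 5), nrow = 500)
group <- factor(rep(c("A", "B"), each = 3))
design <- model.matrix(~ group)

# Coordinates as stored by voom itself.
v <- voom(counts, design, save.plot = TRUE)

# The same coordinates computed step by step.
lib.size <- colSums(counts)                                  # default for a plain matrix
logcpm <- t(log2(t(counts + 0.5) / (lib.size + 1) * 1e6))    # log2-cpm with 0.5 offset
fit <- lmFit(logcpm, design)                                 # residuals are relative to the design
sx <- fit$Amean + mean(log2(lib.size + 1)) - log2(1e6)       # average log2 count
sy <- sqrt(fit$sigma)                                        # sqrt of residual standard deviation

# Compare the hand-computed values with voom's.
all.equal(unname(sx), unname(v$voom.xy$x))
all.equal(unname(sy), unname(v$voom.xy$y))
```

The key point is that the standard deviation is a residual one, computed after fitting the design matrix, so a plot made from raw per-gene variances will not match.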

0

My bet is still that it is caused by one very large count outlier for these genes.

To identify the weird genes:

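# save.plot=TRUE stores the plotted coordinates in v$voom.xy
v <- voom(y, design, save.plot=TRUE)
# Cutoffs read off the plot; adjust them to isolate the odd cluster in your own data
is.weird.gene <- (v$voom.xy$x > 5 & v$voom.xy$y > 1.25)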


Alternatively you could run

fit <- lmFit(v, design)
fit <- eBayes(fit, robust=TRUE)


and examine genes with small values of fit$df.prior, i.e., those identified as hypervariable outliers.
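A sketch of how that inspection might look end to end, assuming limma and edgeR are installed. The toy counts, the planted outlier gene and the rule "df.prior below the maximum" are my own illustrative choices, not something prescribed above, and whether the planted gene is actually flagged depends on the data.

```r
library(limma)
library(edgeR)

# Toy data (made up for illustration): 300 genes x 6 samples in two groups.
set.seed(1)
counts <- matrix(rnbinom(300 * 6, mu = 100, size = 10), nrow = 300,
                 dimnames = list(paste0("gene", 1:300), paste0("s", 1:6)))
# Hypothetical problem gene: ordinary counts plus one extreme value.
counts["gene1", ] <- c(100, 90, 110, 95, 105, 20000)
group <- factor(rep(c("A", "B"), each = 3))
design <- model.matrix(~ group)

y <- calcNormFactors(DGEList(counts = counts))
v <- voom(y, design, save.plot = TRUE)
fit <- eBayes(lmFit(v, design), robust = TRUE)

# Genes whose prior df was reduced relative to the bulk of genes.
hyper <- which(fit$df.prior < max(fit$df.prior))

# Look at the values sample by sample to spot single outliers or batch-like splits.
round(cpm(y, log = TRUE)[hyper, , drop = FALSE], 1)
```

Printing the per-sample log-cpm values next to the sample annotation is usually enough to tell a single outlying count from a group of samples that behave differently.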

0

Thanks! I didn't realize there was an option to save the plot axes from voom. Looking at the genes, they seem to have fine data, but just very large differences between two experimental conditions which explains the greatly increased variance.

1

> I didn't realize there was an option to save the plot axes from voom

See ?voom for all the options.

> Looking at the genes, they seem to have fine data

The data are definitely not fine, as the voom plot shows. Look more closely.

> just very large differences between two experimental conditions which explains the greatly increased variance.

Differences between experimental conditions should have no influence on the variance at all. If this is causing a problem then the design matrix isn't specified correctly.

It would appear that there is a batch effect affecting the "weird" genes that has not been accounted for in the design matrix.
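A small base-R illustration of both points, using a single simulated gene. All numbers and the names condition and batch are hypothetical. The large condition effect is part of both models and does not inflate the residual standard deviation; the batch effect does inflate it until batch is added to the design.

```r
set.seed(42)
# Balanced toy layout (made up): 2 conditions x 2 batches x 3 replicates.
condition <- factor(rep(c("control", "treated"), each = 6))
batch     <- factor(rep(rep(c("b1", "b2"), each = 3), times = 2))

# One hypothetical gene on the log2 scale: a 6-unit condition effect,
# a 3-unit batch effect, and small noise (sd = 0.3).
expr <- 2 + 6 * (condition == "treated") + 3 * (batch == "b2") +
  rnorm(12, sd = 0.3)

# Residual standard deviation of a linear model for this gene.
resid_sd <- function(formula) summary(lm(formula))$sigma

sd_condition_only <- resid_sd(expr ~ condition)          # batch left out of the design
sd_with_batch     <- resid_sd(expr ~ batch + condition)  # batch included in the design

print(c(condition_only = sd_condition_only, with_batch = sd_with_batch))
```

In a limma analysis the equivalent change would be a design such as model.matrix(~ batch + condition), provided the batch variable is known or can be inferred from the samples.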