I'm using limma + voom to model an expression dataset, but I'm observing a weird subset of genes where the standard deviation increases with the expression level rather than decreasing as is the case for most genes.
Any ideas for why this is occurring and/or what to do about it? I have found lots of advice regarding oddities at low-expression levels, but not this pattern. Thanks!
Thanks for the suggestion, but unfortunately this doesn't seem to be driven by the pattern you describe. I had already been filtering for sites with >0 counts in at least 75% of the dataset and added an additional filter to remove sites where one sample had an expression level that was 5x higher than the mean (using cpm), but the pattern remained.
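For concreteness, the filtering described above could look like the following sketch in R (the object names `counts` and the exact thresholds are illustrative, not the poster's actual code):

```r
library(edgeR)  # for cpm()

# 'counts' is assumed to be a raw count matrix, genes x samples
keep <- rowSums(counts > 0) >= 0.75 * ncol(counts)  # >0 counts in >=75% of samples
counts <- counts[keep, ]

# Drop genes where any single sample's CPM exceeds 5x that gene's mean CPM
cpms <- cpm(counts)
no.outlier <- apply(cpms, 1, function(x) max(x) <= 5 * mean(x))
counts <- counts[no.outlier, ]
```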
Do you have any suggestions for figuring out what these genes may be? I've been unable to create the mean/variance plot from the raw data, but I suspect that's a limitation on my understanding of what exactly voom is doing.
My bet is still that it is caused by one very large count outlier for these genes.
To identify the weird genes:
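A sketch of one way to do this, assuming `counts` and `design` already exist: `voom(..., save.plot = TRUE)` stores the plotted coordinates in `v$voom.xy` (x = average log2 count size, y = sqrt of residual standard deviation), so genes sitting high on the right-hand side of the trend can be pulled out directly. The quantile cutoffs here are arbitrary examples:

```r
library(limma)

v <- voom(counts, design, save.plot = TRUE)

# Flag genes that are both highly expressed and highly variable
suspect <- v$voom.xy$x > median(v$voom.xy$x) &
           v$voom.xy$y > quantile(v$voom.xy$y, 0.99)
rownames(counts)[suspect]
```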
Alternatively you could run eBayes() with robust = TRUE and examine genes with small values of fit$df.prior, i.e., those identified as hypervariable outliers.
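A minimal sketch of that alternative, continuing from a voom object `v` and design matrix `design`: with `robust = TRUE`, eBayes() returns a per-gene `df.prior` vector, and variance outliers receive small values.

```r
library(limma)

fit <- lmFit(v, design)
fit <- eBayes(fit, robust = TRUE)

# Genes flagged as hypervariable outliers have the smallest df.prior
o <- order(fit$df.prior)
head(rownames(fit)[o], 20)
```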
Thanks! I didn't realize there was an option to save the plot axes from voom. Looking at the genes, they seem to have fine data, just very large differences between two experimental conditions, which explains the greatly increased variance.
See ?voom for all the options.
The data are definitely not fine, as the voom plot shows. Look more closely.
Differences between experimental conditions should have no influence on the variance at all. If this is causing a problem then the design matrix isn't specified correctly.
It would appear that there is a batch effect affecting the "weird" genes that has not been accounted for in the design matrix.
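A sketch of the fix, assuming the samples' batch membership is recorded in a hypothetical annotation data frame `targets`: adding the batch factor to the design matrix lets the linear model absorb the batch differences, so they no longer inflate the residual variance.

```r
library(limma)

batch <- factor(targets$batch)          # hypothetical sample annotation
condition <- factor(targets$condition)
design <- model.matrix(~ batch + condition)

v <- voom(counts, design, plot = TRUE)  # re-check the mean-variance trend
```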