Question

Interpreting voom mean-variance plots with a large BCV and a faint kink


14 months ago

wiscoyogi

I'm working with data that have a large biological coefficient of variation (BCV; the values are given below).

I built my design matrix to account for every covariate available in my metadata. I also applied conservative expression filtering (e.g., log(CPM + 1) >= 0.75 in at least k% of samples), as described in the voom guides and in various Biostars and Bioconductor posts.
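Below is a minimal sketch of the filtering rule just described, in Python and for illustration only. It makes several assumptions:

- the counts are a genes-by-samples matrix given as a list of rows;
- library sizes are raw column sums, with no normalization factors;
- the log is base 2, since the post does not say which base was used.

The names `cpm`, `keep_expressed` and `min_fraction` (which stands in for k%) are hypothetical. They are not edgeR or limma functions.

```python
import math
from typing import List, Sequence


def cpm(counts: Sequence[Sequence[float]]) -> List[List[float]]:
    """Counts per million for a genes-by-samples matrix (list of rows).

    Library sizes are raw column sums; no TMM or other normalization factors.
    """
    n_samples = len(counts[0])
    lib_sizes = [sum(row[j] for row in counts) for j in range(n_samples)]
    return [[row[j] / lib_sizes[j] * 1e6 for j in range(n_samples)] for row in counts]


def keep_expressed(counts: Sequence[Sequence[float]],
                   threshold: float = 0.75,
                   min_fraction: float = 0.5) -> List[int]:
    """Indices of genes whose log2(CPM + 1) >= threshold in at least
    min_fraction of the samples (the 'k%' of the post)."""
    n_samples = len(counts[0])
    min_samples = math.ceil(min_fraction * n_samples)
    kept = []
    for i, row in enumerate(cpm(counts)):
        n_pass = sum(1 for v in row if math.log2(v + 1) >= threshold)
        if n_pass >= min_samples:
            kept.append(i)
    return kept
```

Raising `threshold` or `min_fraction` removes more low-count genes. Those genes are the ones that populate the left end of a voom plot.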

I compared my mean-variance trends with Fig. 1 of the voom paper (Law et al. 2014). Compared with that figure, both of my curves seem to me to have a faint upward kink around (3, 1). Opinions in other posts about such kinks are mixed, so I wanted to ask here. The kink does not look like the ones people describe as a symptom of insufficient filtering.
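The points on a voom mean-variance plot can be sketched as follows. This is a minimal Python illustration based on a reading of the voom paper:

- each gene's log-CPM values (computed with a 0.5 pseudo-count) get a residual standard deviation;
- the plot shows the square root of that SD against the gene's average log2 count.

To stay short and dependency-free, the sketch makes a few simplifications:

- it assumes a one-way group design (one mean per group) instead of a full design matrix with covariates;
- it uses raw column sums as library sizes;
- it omits the lowess curve that voom then fits through these points.

The function name `voom_trend_points` is hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def voom_trend_points(counts: Sequence[Sequence[float]],
                      groups: Sequence[str]) -> List[Tuple[float, float]]:
    """(x, y) points of a voom-style mean-variance plot, one-way group design.

    x: average log2 count (mean log-CPM shifted back by the mean log2 library size)
    y: square root of the residual SD of the gene's log-CPM values
    """
    n = len(groups)
    lib_sizes = [sum(row[j] for row in counts) for j in range(n)]
    levels = sorted(set(groups))
    df_resid = n - len(levels)
    if df_resid < 1:
        raise ValueError("need more samples than groups to estimate a residual SD")
    # Shift that converts mean log-CPM back to the log2-count scale.
    offset = sum(math.log2(s + 1) for s in lib_sizes) / n - math.log2(1e6)
    points = []
    for row in counts:
        # log-CPM with a 0.5 pseudo-count and library size + 1.
        logcpm = [math.log2((row[j] + 0.5) / (lib_sizes[j] + 1) * 1e6) for j in range(n)]
        amean = sum(logcpm) / n
        # Residual sum of squares after fitting one mean per group.
        rss = 0.0
        for g in levels:
            vals = [logcpm[j] for j in range(n) if groups[j] == g]
            m = sum(vals) / len(vals)
            rss += sum((v - m) ** 2 for v in vals)
        sigma = math.sqrt(rss / df_resid)
        points.append((amean + offset, math.sqrt(sigma)))
    return points
```

When library sizes are equal, x reduces to the mean of log2(count + 0.5) for the gene. This is why the x-axis can be read as an average log2 count.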

Question: What causes these kinks? Should I filter out more genes, or is the kink around (3, 1) unrelated to dropout? Is it acceptable, or does it point to another issue that should be addressed?

Comparison 1: whole blood vs. PBMC from sick and healthy patients (BCV = 0.612). I felt this plot was closest to Fig. 1C of Law et al.

Comparison 2: saliva (healthy only) vs. whole blood (sick/healthy) vs. PBMC (sick/healthy) (BCV = 0.636). I felt this plot was closest to Fig. 1E of Law et al.
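Here is a rough, back-of-the-envelope check on how large these BCVs are on the voom scale. It starts from the negative binomial variance Var(y) = mu + phi*mu^2, with phi = BCV^2, and applies a delta-method approximation on the log scale.

This is an assumed approximation for illustration, not something voom computes:

- it ignores the pseudo-count and normalization;
- voom's y-axis is a residual SD from the fitted model, which also contains any unmodelled variation.

The helper `approx_voom_sqrt_sd` is hypothetical.

```python
import math


def approx_voom_sqrt_sd(bcv: float, mean_count: float = math.inf) -> float:
    """Approximate sqrt(SD of log2 counts) for a negative binomial gene.

    Var(y) = mu + phi * mu^2 with phi = BCV^2. By the delta method,
    Var(ln y) ~ 1/mu + phi, and SD(log2 y) = SD(ln y) / ln 2.
    """
    phi = bcv ** 2
    var_ln = 1.0 / mean_count + phi  # 1/inf == 0.0 gives the high-count limit
    sd_log2 = math.sqrt(var_ln) / math.log(2)
    return math.sqrt(sd_log2)
```

With the BCVs in the post (0.612 and 0.636), the high-count limit comes out at roughly 0.94 and 0.96. At a mean count of 8 (about 3 on the log2 x-axis), the value is roughly 1.0. For comparison, a BCV of 0.2 would give a limit of about 0.54.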

I decided to run the two DE analyses separately for two reasons. First, I'm not comparing the genes from comparison 1 with those from comparison 2. Second, some covariates do not apply to the third sample type (saliva), and the design matrix can't contain NaN values.
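The design-matrix constraint can be illustrated with a minimal Python sketch. It uses treatment coding with the alphabetically first level as baseline, similar in spirit to R's model.matrix() with default contrasts. The covariate names `age` and `severity` and the helper `build_design` are hypothetical.

The sketch shows two cases:

- in a joint design, a covariate that is missing for the saliva samples has to be dropped;
- a design for the blood/PBMC subset alone can keep it.

```python
import math
from typing import Any, Dict, List, Sequence, Tuple


def build_design(samples: Sequence[Dict[str, Any]], factor: str,
                 covariates: Sequence[str]) -> Tuple[List[str], List[List[float]]]:
    """Intercept + treatment-coded factor + numeric covariates for one subset.

    A covariate is kept only if it is present and not NaN for every sample
    in the subset, so the returned matrix never contains missing values.
    """
    def defined(value: Any) -> bool:
        return value is not None and not (isinstance(value, float) and math.isnan(value))

    levels = sorted({str(s[factor]) for s in samples})
    usable = [c for c in covariates if all(defined(s.get(c)) for s in samples)]
    # The first level is the baseline and gets no indicator column.
    columns = ["(Intercept)"] + [factor + lv for lv in levels[1:]] + list(usable)
    rows = []
    for s in samples:
        indicators = [1.0 if str(s[factor]) == lv else 0.0 for lv in levels[1:]]
        rows.append([1.0] + indicators + [float(s[c]) for c in usable])
    return columns, rows
```

Running it on the blood/PBMC samples keeps `severity`. Adding a saliva sample whose `severity` is None silently drops that column from the joint design.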

filtering differential-expression voom gene

Updated 14 months ago by Gordon Smyth • written 14 months ago by wiscoyogi

Answer 1 · 2023-02-13

I don't see any problems with the voom plots that you show, apart from the fact that the standard deviations and BCV are very large. There is no major significance to the faint kinks that you refer to, and they won't cause problems for your analysis.

Saliva and whole blood are very different tissues, so you should analyse them separately. It would be very unusual to try to combine such different samples.

There are some outlier genes in the voom plots, so I would recommend that you use eBayes() with robust=TRUE if you are not already doing so.
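For context, limma's empirical Bayes step squeezes each gene's residual variance s^2 towards a prior value s0^2. If s^2 has d residual degrees of freedom and the prior has d0 degrees of freedom, the posterior variance is (d0*s0^2 + d*s^2)/(d0 + d).

With robust=TRUE, the prior is estimated in a way that resists outlier variances, and outlier genes receive a smaller d0, so they are shrunk less. The sketch below only applies the formula with fixed, hypothetical prior values. It does not reproduce how limma estimates s0^2 and d0 from the data, and `squeeze_variances` is a hypothetical name.

```python
from typing import List, Sequence, Union


def squeeze_variances(s2: Sequence[float], df_resid: float, s0_sq: float,
                      df_prior: Union[float, Sequence[float]]) -> List[float]:
    """Empirical Bayes posterior variances:
    s2_post = (d0 * s0^2 + d * s^2) / (d0 + d).

    df_prior may be a single value for all genes or one value per gene
    (per-gene values mimic the effect of a robust fit on outlier genes).
    """
    if isinstance(df_prior, (int, float)):
        d0s = [float(df_prior)] * len(s2)
    else:
        d0s = [float(d) for d in df_prior]
    return [(d0 * s0_sq + df_resid * v) / (d0 + df_resid) for v, d0 in zip(s2, d0s)]
```

Take s0^2 = 1, d = 4 and d0 = 4. A gene with s^2 = 9 is then pulled down to 5. Lowering that gene's d0 to 0.5 leaves it near 8.1.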

PS. You don't show any code, so I have answered your question assuming that your coding is perfect. I assume you are using either voomWithQualityWeights() or voomLmFit(). The BCVs you report are really large, though, and would be unusual for a bulk RNA-seq study with good-quality RNA and standard protocols.