How to know if I should use voomWithQualityWeights() or not?
4.2 years ago
pedrodcb ▴ 90

I'm running an RNA-Seq analysis with 72 samples (6 replicates for each observed group). Using voomWithQualityWeights() instead of voom() returns many more differentially expressed genes. How can I know whether I should use voomWithQualityWeights() or not?

I understand it has to do with outliers. Only 2 samples seem to be outliers on the MDS plots. But as I have 72 samples, I think it would be OK to simply remove them and run the analysis with regular voom(). Is there any way to test which voom method I should use?

RNA-Seq limma voom DEG • 3.7k views

Now asked on Bioconductor: https://support.bioconductor.org/p/129039/

Hello Kevin,

Thank you for your message. However I'm not sure it answers my question.

Matthew Ritchie's answer on the first link only states that voomWithQualityWeights should be used if there is more heterogeneity in the data, but how do I know if there's still more heterogeneity in my data, even after adding covariates to my model?

"If there is further sample heterogeneity, then running voomWithQualityWeights using the design matrix you've arrived at can often help get you more differential expression, as you have observed here."

Gordon Smyth's answer states that voomWithQualityWeights should be used to handle outlier samples, but how can I tell if my data has too many outlier samples, or samples that are extreme enough outliers to justify its use?

"designed to handle outlier samples, and outlier samples may not cluster nicely in the heatmap. If all the samples separated beautifully according to simple clustering algorithm, then you probably wouldn't need to downweight outlier samples, would you?"

Thank you,

Pedro


Posting your question on Bioconductor was a good idea, as that is where Gordon Smyth is more active. I posted my other comments (on Bioconductor) to help future users find as much information as possible by 'linking up' (connecting) different questions relating to the same topic.

In relation to Ritchie's answer, there is no set threshold to define what is / is not a heterogeneous dataset, just as there is no set threshold to define what is a statistically significantly differentially expressed gene / variable. If, by looking at your MDS bi-plots, you see a heterogeneous mixture of samples across your metadata variables of interest (e.g., tissue, treatment, etc.), then you can use voomWithQualityWeights. This also assumes that the first few MDS dimensions account for an appreciable amount of variation.

Again, for Gordon's answer, there is no set threshold to define what is an outlier. You could run PCA and look at the bi-plot for PC1 versus PC2. If a particular sample's coordinate on a PC, expressed as a Z-score, is greater than 1.96 in absolute value, then you could classify it as a statistically significant outlier.
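
The Z-score check described above can be sketched in a few lines. This is only an illustration, written in Python rather than R, and it assumes the per-sample PC1 coordinates have already been computed elsewhere (for example, the `x` component returned by R's `prcomp()`). The sample names, the coordinate values, and the `flag_outliers` helper are all hypothetical.

```python
from statistics import mean, stdev


def flag_outliers(scores, threshold=1.96):
    """Z-score one PC's sample coordinates and flag samples beyond the threshold.

    `scores` maps sample name -> coordinate on a single principal component.
    Returns (z_scores, flagged_sample_names).
    """
    values = list(scores.values())
    mu = mean(values)
    sd = stdev(values)  # sample standard deviation (n - 1 denominator)
    z_scores = {name: (value - mu) / sd for name, value in scores.items()}
    flagged = [name for name, z in z_scores.items() if abs(z) > threshold]
    return z_scores, flagged


# Hypothetical PC1 coordinates for 8 samples (made-up numbers, not from this thread)
pc1 = {
    "s1": -1.2, "s2": 0.4, "s3": 0.9, "s4": -0.3,
    "s5": 0.1, "s6": -0.6, "s7": 0.7, "s8": 9.5,
}

z_scores, outliers = flag_outliers(pc1)
print(outliers)  # ['s8']
```

Treat the cutoff as a heuristic rather than a formal test: an extreme sample inflates the standard deviation it is measured against, and under normality roughly 5% of samples exceed 1.96 by chance, so with 72 samples about 3 or 4 would be flagged even without any real outliers.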
