Question: What problem does the MA plot diagnose? And how do you solve it?
hbw wrote (2.1 years ago):

I have an RNA-seq experiment that I am analyzing with DESeq2. After getting the results, I drew the MA plot. This is the output of plotMA:

[Image: DESeq2 MA plot]

And this is my attempt at the MA plot:

library(ggplot2)

# res is the DESeqResults object returned by DESeq2::results()
df <- as.data.frame(res)
# padj can be NA (for example for genes removed by independent filtering);
# treat those genes as not significant
df$significant <- !is.na(df$padj) & df$padj < 0.05
ggplot(df, aes(x = log2(baseMean), y = log2FoldChange, color = significant)) +
    geom_point() +
    geom_hline(color = "blue3", yintercept = 0) +
    stat_smooth(se = FALSE, method = "loess", color = "red3") +
    scale_color_manual(values = c("FALSE" = "black", "TRUE" = "red"))

[Image: my MA plot]

  1. There is a slight bias at the high-expression end: genes with a high A tend to have a high M, and we are detecting more up-regulation than down-regulation. Is this a problem? What might be causing it and, more importantly, is there something we can do to fix it?
  2. Even if this effect is too small to matter, what causes problems like this? Imbalanced sampling depth between the two conditions? Why doesn't normalization (the per-sample size factors) fix it?

Also, is there a reason why DESeq2::plotMA doesn't plot the best fit line?

Disclaimer: Cross-posted to Bioconductor questions.

Carlo Yague wrote (2.1 years ago):

"There is a slight bias at the end, so genes with a high A, tend to have a high M, and we are detecting more up-regulation than down. Is this a problem?"

I don't think you should expect the numbers of up- and down-regulated genes to be about the same. For instance, I once had a dataset where a mutation down-regulated a substantial subset of genes, while only a few genes were up-regulated. We found a molecular mechanism explaining the down-regulation, and we think the few up-regulations are likely indirect effects. So the real question is: "do your results make sense biologically?"

However, there could be a problem when a lot of the genes are up- or down-regulated, because DESeq2 assumes that most of the genes are unaffected. EDIT: see Michael Love's comment below for a more correct and detailed explanation.

The median ratio or TMM assumption for normalization is not that the majority of genes have LFC=0, but that the median or trimmed mean of ratios captures the technical shifts in counts. So it's more about the reliability of the center of the distribution of LFCs to capture technical shifts rather than requiring so many genes strictly to belong to the null. If the entire distribution is shifted, e.g. global increase in expression, then obviously computational normalization cannot be relied upon. But you could have, e.g. 40% upregulated and 20% downregulated and still have the median or trimmed mean roughly capturing the center of the distribution (the technical shift in counts due to sequencing depth).

Comment written 2.1 years ago by Michael Love
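
To make the comment above concrete, here is a minimal simulated sketch in base R. It is not from the thread and does not call DESeq2: `size_factors()` is a hand-written median-of-ratios helper, and the gene counts, fold changes, and the 2x depth difference are all made-up assumptions. It compares an asymmetric case (40% of genes up, 20% down) with a global shift (every gene up 2-fold).

```r
# Simulated illustration only: every number here is an assumption.
set.seed(1)

n_genes   <- 2000
base_expr <- rexp(n_genes, rate = 1 / 1000) + 50  # hypothetical expression in condition A

# Hand-written median-of-ratios size factors (hypothetical helper, base R only).
size_factors <- function(counts) {
  log_counts    <- log(counts)
  log_geo_means <- rowMeans(log_counts)     # per-gene log geometric mean
  keep          <- is.finite(log_geo_means) # drop genes with a zero count
  apply(log_counts, 2, function(s) exp(median((s - log_geo_means)[keep])))
}

# Two samples: A at depth 1, B at `depth_b` times the depth, with per-gene LFCs.
simulate <- function(lfc, depth_b = 2) {
  cbind(A = rpois(n_genes, base_expr),
        B = rpois(n_genes, base_expr * 2^lfc * depth_b))
}

# Estimated technical shift of B relative to A.
ratio_b_over_a <- function(counts) {
  sf <- size_factors(counts)
  unname(sf["B"] / sf["A"])
}

lfc_asym   <- rep(c(1, -1, 0), times = c(800, 400, 800))  # 40% up, 20% down, 40% unchanged
lfc_global <- rep(1, n_genes)                             # every gene up 2-fold

ratio_asym   <- ratio_b_over_a(simulate(lfc_asym))
ratio_global <- ratio_b_over_a(simulate(lfc_global))

print(c(asymmetric = ratio_asym, global_shift = ratio_global))
```

In the asymmetric case the median ratio still falls among the unchanged genes, so the estimated B/A ratio should land close to the true depth ratio of 2. In the global-shift case the estimate should land near 4: the 2-fold biological increase is absorbed into the size factor, and the normalized log fold changes would then center on 0 even though every gene changed.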

Thanks for the clarification! (I need to read your DESeq2 paper again.)

Reply written 2.1 years ago by Carlo Yague