Question

Horizontal lines in MA plot (DESeq2)

1

5.9 years ago

DVA ▴ 630

Could someone help me understand why my MA plot from DESeq2 has a line of points close to zero, and a second line at about 1.2?

I checked the genes in my count table that correspond to the points on these lines, and they have very low read counts in both samples, so I would really like to know what is going on mathematically. Could someone explain exactly how DESeq2 does normalization and obtains the log fold change? Thank you all for your help.

MA plot

deseq2 • 3.4k views

score 1 · Answer 1 · 2018-05-15

1

5.9 years ago

Kevin Blighe 87k

I would expect to see things like this in one or more of the following situations:

lack of filtering of low-count transcripts
many transcripts of constant or near-constant variance
many transcript isoforms with the same or similar expression profiles going into the analysis

Can you elaborate on what the data represent, your sample size (n), and the steps you took prior to and during normalisation?

0

Hi Kevin,

Thank you for the reply. I think I kinda figured it out. Basically, it is described here: https://support.bioconductor.org/p/62927/

For every gene (row), DESeq2 adds 1 to avoid 0 counts. It then normalizes all samples based on the total count of each sample, which gives a size factor for each sample (sizeFactors()). So, as you said, if two samples contain 0 counts for many genes, I get the same log2(1*sizeFactor) value (y-axis) for all of these genes, while the base mean differs depending on all samples in the row. This gives me a horizontal line.
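
The reasoning above can be sketched with a toy calculation. This is only an illustration of the pseudocount argument, not DESeq2's actual estimation: as far as I know, DESeq2's default size factors come from a median-of-ratios method rather than total counts, and its log fold changes come from a negative binomial GLM fit. The function name, the size factors, and the counts below are all hypothetical. The point is only that low counts allow very few distinct ratios, so many genes share the same y value.

```python
import math

def toy_log2_fold_change(count_a, count_b, sf_a, sf_b, pseudocount=1.0):
    """Log2 ratio (B over A) of pseudocount-shifted, size-factor-scaled counts.

    Toy model of the argument in this thread, not DESeq2's GLM-based estimate.
    """
    norm_a = (count_a + pseudocount) / sf_a
    norm_b = (count_b + pseudocount) / sf_b
    return math.log2(norm_b / norm_a)

# Hypothetical size factors for two samples (assumed values).
SF_A, SF_B = 0.8, 1.25

# Hypothetical low-count genes: (count in sample A, count in sample B).
low_count_genes = [(0, 0), (0, 0), (0, 1), (0, 1), (1, 0), (1, 1), (0, 0)]

lfcs = [round(toy_log2_fold_change(a, b, SF_A, SF_B), 6)
        for a, b in low_count_genes]
distinct_lfcs = sorted(set(lfcs))

print(lfcs)           # one value per gene
print(distinct_lfcs)  # only a handful of distinct y values
```

Every gene with 0 counts in both samples lands on the same value, log2(SF_A / SF_B) in this toy setup, and each other low-count combination lands on one of a few further values. Plotted against a base mean that varies from gene to gene, each of those values shows up as a horizontal line.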

ADD REPLY • link 5.9 years ago by DVA ▴ 630

0

Yes, that's true. It is also true that, for DESeq2's MA plot, the x-axis is the log of the mean expression + 1. What you could do prior to normalisation is remove all genes that have a mean raw count (across all samples) <10 (a bit on the stringent side), or those that are 0 in a large proportion of samples (e.g. 0 in >50%). There's no real standard cut-off.
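
A minimal sketch of that kind of pre-filter, written in plain Python rather than R so that it is self-contained. The gene names, counts, and the `keep_gene` helper are hypothetical, and the two thresholds are just the example values mentioned above, not a standard.

```python
from statistics import mean

def keep_gene(counts, min_mean=10, max_zero_fraction=0.5):
    """Return True if a gene passes both low-count filters.

    counts: raw counts for one gene across all samples.
    min_mean: drop genes whose mean raw count is below this value.
    max_zero_fraction: drop genes that are 0 in more than this share of samples.
    """
    zero_fraction = sum(1 for c in counts if c == 0) / len(counts)
    return mean(counts) >= min_mean and zero_fraction <= max_zero_fraction

# Hypothetical raw count table: gene -> counts across four samples.
raw_counts = {
    "geneA": [0, 0, 1, 0],      # very low mean, mostly zeros
    "geneB": [25, 30, 18, 40],  # well expressed
    "geneC": [0, 0, 0, 120],    # decent mean, but zero in 75% of samples
    "geneD": [12, 9, 15, 11],   # just above the mean cut-off
}

filtered = {gene: c for gene, c in raw_counts.items() if keep_gene(c)}
print(sorted(filtered))
```

Here both criteria are applied together; using only one of them is equally reasonable, since the cut-offs are a judgment call.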

ADD REPLY • link 5.9 years ago by Kevin Blighe 87k