Question: Strange MA plots with DESeq
VHahaut (Belgium) wrote, 4.0 years ago:

Hello!

I am currently using DESeq2 to analyze a group of 32 ovine tumors against 3 healthy controls. 80% of the tumor samples are technical replicates. I ran the DESeq analysis both with and without the function collapseReplicates(dds, groupby = colData(dds)$Sample, renameCols = TRUE).
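
For readers unfamiliar with the collapsing step: collapseReplicates sums the counts of technical replicates that share a grouping label. The base R sketch below reproduces that idea on a toy matrix. The matrix values, the run and sample labels, and the variable names are all invented for illustration; this is not the poster's data or code.

```r
# Toy count matrix: 3 genes x 4 sequencing runs (values are invented).
counts <- matrix(
  c( 10,  12,   0,   1,
      5,   7,   3,   2,
    100,  90,  40,  45),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"),
                  c("run1", "run2", "run3", "run4"))
)

# Hypothetical grouping: run1/run2 come from tumor1, run3/run4 from tumor2.
sample_id <- c("tumor1", "tumor1", "tumor2", "tumor2")

# rowsum() sums rows by group, so transpose to put runs in rows,
# sum per sample, then transpose back to genes x samples.
collapsed <- t(rowsum(t(counts), group = sample_id))
print(collapsed)
```

After collapsing there is one column per biological sample, so the technical replicates no longer count as independent observations.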

In both cases the MA plot had a weird shape looking like this (+PCA and dispersion plots):

[Figures: MA plot, dispersion plot, PCA plot]

Moreover, 272 genes (without collapsing) and 695 genes (with collapsing) were flagged as outliers.

I have already run other DESeq analyses with the same code on smaller datasets and have never seen such a distribution. Could someone tell me whether this is a normal distribution for tumor samples, or for this number of samples? If not, what could the reasons be, and what can I do?

Thanks in advance!

Vincent

Tags: rna-seq, deseq, maplot

I think the reason you have such an asymmetry in your MA plot is that your design is rather unbalanced: you are comparing many heterogeneous tumor samples against a few very similar control samples. I guess there could also be a difference in depth between the samples. Is that true?
What I think is happening is that, for some reason, it is much easier to call downregulation than upregulation in your comparison. Is it right that the downregulated genes in your MA plot are more highly expressed in the tumor samples than in the controls, or is it the opposite? If the former is true, I guess that's because there are a lot of genes that are not expressed in the controls but are expressed in some of the tumor samples at different levels. That would explain the lines you see on the left part of the plot.

modified 4.0 years ago • written 4.0 years ago by Martombo

The sizeFactors range from 0.4094 to 2.5580, with an average of 1.14, so I don't think that depth is really an issue.

As for downregulation vs upregulation: indeed, my controls have a lot of genes that are expressed at a low level (1-10 reads) compared with the tumors (+10-1000). You can even see it by eye when scrolling through the normalized count table.

What do you think I could do to obtain a better-shaped MA plot? Is this something common when analyzing tumors with DESeq?

modified 4.0 years ago • written 4.0 years ago by VHahaut

What you could do to improve the analysis is to remove genes whose mean expression is very low; I can see from the MA plot that many have a mean below 10. Another possibility would be to compare single tumor samples to the controls, or to add a tumor variable to the design. That would reduce the dispersion that you get, which is quite high. A final suggestion would be to downsample the BAM files of your deepest samples, in order to reduce the difference in depth.

modified 4.0 years ago • written 4.0 years ago by Martombo
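
The low-expression filter suggested above can be sketched in base R as follows. This is only an illustration on an invented matrix: the values, the gene and sample names, and the variable names are assumptions, and the cutoff of 10 is simply the figure mentioned in the comment, not a recommended default. With a real DESeq2 object, the same kind of row subsetting would typically be applied to the count matrix extracted from it.

```r
# Toy normalized-count matrix: 4 genes x 5 samples (values are invented).
norm_counts <- matrix(
  c(  0,   2,   1,   0,   3,
     50,  80,  20, 120,  60,
      4,   9,  15,   6,   8,
    900, 700, 850, 400, 650),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("s", 1:5))
)

# Threshold taken from the comment above ("below 10"); tune as needed.
min_mean <- 10

# Keep only genes whose mean count across samples reaches the threshold.
keep <- rowMeans(norm_counts) >= min_mean
filtered <- norm_counts[keep, , drop = FALSE]
print(rownames(filtered))
```

Using drop = FALSE keeps the result a matrix even if only one gene passes the filter.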

Thanks for your suggestions. I will try to modify the design as suggested.

Concerning the size factors of the controls: they are all around 1.2, so not really different from the majority of the tumors.

written 4.0 years ago by VHahaut

A difference between 0.4 and 2.5 in size factors is quite high. It means that there is roughly a 6-fold difference in sequencing depth between two samples. So am I right in assuming that the control samples have a lower depth compared to the tumor samples? Is this something that the size factors reflect correctly?

modified 4.0 years ago • written 4.0 years ago by Martombo
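
The "roughly 6-fold" figure comes from dividing the largest size factor reported in the thread by the smallest. A quick check in base R (the variable names are made up for illustration), bearing in mind that size factors are normalization factors and only an approximate proxy for sequencing depth:

```r
# Extreme size factors reported earlier in the thread.
sf_min <- 0.4094
sf_max <- 2.5580

# Ratio between the deepest and the shallowest sample.
fold <- sf_max / sf_min
print(round(fold, 2))
```

The ratio comes out a little above 6, consistent with the estimate in the comment.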