How does DESeq2 formula correction work?
20 months ago
Rafael Soler ★ 1.2k

Hello,

I have some questions after reading the DESeq2 vignette and several threads on this topic. Suppose I have this data:

[Image: sample table]

If my objective is to compare samples by Status, the formula would in principle be ~Status. However, there are large differences between Zones, as this PCA shows:

[Image: PCA plot of the samples]

Should the formula be ~Zone+Status in order to control for the effect of Zone when comparing Status? Or is it better to use only ~Status? If a new variable such as Sex is added and we want to control for its effect as well, would the formula become ~Zone+Sex+Status? Where can I find an explanation of what this "control" is based on?

Thank you

Batch Comparison DESeq2 GLM Formula • 1.2k views
It depends on how much the Zones are affecting your Status. Did you try doing Status ~ Zones and checking the results? If you see that the Zones are really influencing Status, then use ~ Status + Zones. The same applies to a new variable such as Sex.

Thanks. In the DESeq2 vignette the design is written as design = ~ batch + condition. Shouldn't the order here be the same, i.e. design = ~ Zones + Status?

Best

20 months ago

Use ~ Zone + Status.
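
As a minimal sketch of what that design encodes, base R's `model.matrix` can show the columns the model will fit. The sample table below is hypothetical (the real one in the question is only available as an image), and the level names `Z1`, `Z2`, `Control` and `Case` are assumptions made for illustration. The commented-out DESeq2 calls assume a raw count matrix named `counts`, which is also hypothetical.

```r
# Hypothetical sample table; the real one in the question is only shown as an image.
coldata <- data.frame(
  Zone   = factor(c("Z1", "Z1", "Z1", "Z1", "Z2", "Z2", "Z2", "Z2")),
  Status = factor(c("Control", "Control", "Case", "Case",
                    "Control", "Control", "Case", "Case"),
                  levels = c("Control", "Case"))
)

# Additive design: an intercept, an offset for Zone, and the Status effect.
design_full <- model.matrix(~ Zone + Status, data = coldata)
print(colnames(design_full))

# Design that ignores Zone: only an intercept and the Status effect.
design_reduced <- model.matrix(~ Status, data = coldata)
print(colnames(design_reduced))

# With DESeq2 (not run here; `counts` stands for your own raw count matrix):
# library(DESeq2)
# dds <- DESeqDataSetFromMatrix(countData = counts, colData = coldata,
#                               design = ~ Zone + Status)
# dds <- DESeq(dds)
# res <- results(dds, contrast = c("Status", "Case", "Control"))
```

The variable of interest is conventionally placed last because `results()` reports the last variable in the design by default. Passing `contrast` explicitly, as in the commented lines, makes the comparison independent of that ordering.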

Why is it better to use this formula rather than only ~Status? If a new variable such as Sex is added, would the formula become ~Zone+Sex+Status?

Thanks for the reply, although I would appreciate a bit of explanation.

Best

You have variability within each group caused by zone. If you omit zone, the software just thinks there is a lot of variability. If you include zone, then it understands that there is an underlying reason for some of the variability and it can be included in the mathematical model, so you get better results.

And yes, you can add sex or batch to the design in the same way.
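
The effect described above can be illustrated with a toy example. This is only an analogy: it uses an ordinary linear model (`lm`) on invented log-scale values for a single gene, whereas DESeq2 fits a negative binomial GLM to counts. All values and level names here are hypothetical.

```r
# Hypothetical log-scale expression values for one gene (invented for illustration).
d <- data.frame(
  Zone   = factor(rep(c("Z1", "Z2"), each = 4)),
  Status = factor(rep(c("Control", "Control", "Case", "Case"), times = 2),
                  levels = c("Control", "Case")),
  expr   = c(5.0, 5.2, 6.0, 6.1,   # Zone Z1
             8.0, 8.1, 9.1, 9.0)   # Zone Z2: shifted up by about 3 units
)

# Model that ignores Zone: the Zone shift ends up in the residuals.
fit_status <- lm(expr ~ Status, data = d)

# Model that includes Zone: the shift is explained, so residuals shrink.
fit_full <- lm(expr ~ Zone + Status, data = d)

# Residual standard deviation of each model.
sigma_status <- summary(fit_status)$sigma
sigma_full   <- summary(fit_full)$sigma

# Estimated Status effect and its p-value in each model.
effect_status <- coef(fit_status)[["StatusCase"]]
effect_full   <- coef(fit_full)[["StatusCase"]]
p_status <- summary(fit_status)$coefficients["StatusCase", "Pr(>|t|)"]
p_full   <- summary(fit_full)$coefficients["StatusCase", "Pr(>|t|)"]

print(c(sigma_status = sigma_status, sigma_full = sigma_full))
print(c(effect_status = effect_status, effect_full = effect_full))
print(c(p_status = p_status, p_full = p_full))
```

In this balanced toy design the estimated Status effect is the same in both fits. What changes is the residual variability it is judged against, so the Status effect is only clearly detected once Zone is in the model.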

Thanks, now I understand it better.

Since Zone is such a large contributor to the variance in your data, I'd run the analysis separately for each zone.
