DE analysis with multiple factors
Asked 11 weeks ago by Abhishek

Hi all,

I'm using the DEP R package to analyse proteins (including DE analysis) across different conditions: https://bioconductor.org/packages/devel/bioc/vignettes/DEP/inst/doc/DEP.html

The package uses limma for the DE analysis.

My experiment is structured as follows:

sample - disease_state (disease / healthy) - environment (env1 / env2)

I want to perform DE analysis for:

  1. env1 vs env2 (all samples)
  2. env1 vs env2 within the diseased samples
  3. env1 vs env2 within the healthy samples

For 1, I'm simply ignoring the disease_state factor and performing the DE analysis of env1 vs env2. Is this the correct approach?

For 2, I filter to keep only the diseased samples and then perform the DE analysis of env1 vs env2. Is this correct, or do the healthy samples also need to be included somehow so as not to lose information?

Please share any articles that explain the fundamentals of why one of these approaches is incorrect.

Tags: DE, DEP, limma
Comment:

I'm not familiar with protein data or DEP, but if the package uses limma, take a look at the article "A guide to creating design matrices for gene expression experiments". If your study is a 2-by-2 design, simply merge the two factors into a single factor; that's what the article suggests. Hope it helps.
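To make the merged-factor idea concrete, here is a minimal sketch in plain Python (not limma itself) for a single protein. The intensities and the helper names `cell_means` and `contrast` are hypothetical. With one merged factor and no intercept (a cell-means model), the least-squares coefficients are simply the four group means, and each of the three comparisons becomes a weighted combination of them. In limma the analogous steps would be a design built with `model.matrix(~0 + group)`, contrasts defined with `makeContrasts`, then `contrasts.fit` and `eBayes`.

```python
from statistics import mean

# Hypothetical log2 intensities for ONE protein: (disease_state, environment, value)
samples = [
    ("disease", "env1", 10.2), ("disease", "env1", 10.6), ("disease", "env1", 10.4),
    ("disease", "env2", 11.5), ("disease", "env2", 11.1), ("disease", "env2", 11.3),
    ("healthy", "env1", 9.8), ("healthy", "env1", 10.0), ("healthy", "env1", 9.9),
    ("healthy", "env2", 10.1), ("healthy", "env2", 10.3), ("healthy", "env2", 10.2),
]

def cell_means(samples):
    """Merge the two factors into one (e.g. 'disease_env1') and return each group's mean.
    For a no-intercept one-factor linear model these are the least-squares coefficients."""
    groups = {}
    for state, env, value in samples:
        groups.setdefault(f"{state}_{env}", []).append(value)
    return {g: mean(v) for g, v in groups.items()}

def contrast(coefs, weights):
    """Evaluate a contrast as a weighted sum of group coefficients."""
    return sum(w * coefs[g] for g, w in weights.items())

coefs = cell_means(samples)

contrasts = {
    # 1. env1 vs env2, averaging over disease state
    "env1_vs_env2": {"disease_env1": 0.5, "healthy_env1": 0.5,
                     "disease_env2": -0.5, "healthy_env2": -0.5},
    # 2. env1 vs env2 within the diseased samples
    "env1_vs_env2_disease": {"disease_env1": 1, "disease_env2": -1},
    # 3. env1 vs env2 within the healthy samples
    "env1_vs_env2_healthy": {"healthy_env1": 1, "healthy_env2": -1},
}

for name, weights in contrasts.items():
    print(name, round(contrast(coefs, weights), 3))
```

All three contrasts come from the same fit on all twelve samples, so none of the comparisons requires subsetting the data.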

Reply:

Thank you for the suggestion. Merging the two factors seems to be a good approach.

However, for comparison 1 (env1 vs env2), is it okay to fit a model with just one factor? Or should the factors be combined and the subgroups averaged within env1 and env2, i.e. the equivalent of the contrast (disease_env1 + healthy_env1)/2 vs (disease_env2 + healthy_env2)/2 described in section 7.2 of the article you linked?
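One way to see why the choice can matter is to compare the two point estimates directly. The sketch below, in plain Python with hypothetical noise-free data and hypothetical helper names, computes (a) the one-factor estimate, pooling all env1 samples against all env2 samples while ignoring disease_state, and (b) the averaged contrast (disease_env1 + healthy_env1)/2 - (disease_env2 + healthy_env2)/2. When every group has the same number of samples the two estimates coincide. When the groups are unbalanced, the one-factor comparison weights each disease state by its sample count, so a disease effect can leak into the environment comparison. Separately from the point estimate, a one-factor model also leaves any disease-related variation in the residuals, which inflates the variance estimate.

```python
from statistics import mean

def group(samples, state=None, env=None):
    """Values matching the requested disease_state and/or environment."""
    return [v for s, e, v in samples
            if (state is None or s == state) and (env is None or e == env)]

def one_factor_diff(samples):
    """env1 - env2 ignoring disease_state (difference of pooled means)."""
    return mean(group(samples, env="env1")) - mean(group(samples, env="env2"))

def averaged_contrast(samples):
    """(disease_env1 + healthy_env1)/2 - (disease_env2 + healthy_env2)/2."""
    m = {(s, e): mean(group(samples, s, e))
         for s in ("disease", "healthy") for e in ("env1", "env2")}
    return ((m["disease", "env1"] + m["healthy", "env1"]) / 2
            - (m["disease", "env2"] + m["healthy", "env2"]) / 2)

# Hypothetical data: disease adds 4 units; env1 adds 1 unit; two samples per group
balanced = ([("disease", "env1", 13.0)] * 2 + [("healthy", "env1", 9.0)] * 2
            + [("disease", "env2", 12.0)] * 2 + [("healthy", "env2", 8.0)] * 2)

# Hypothetical data: disease adds 4 units, NO environment effect, unbalanced groups
unbalanced = ([("disease", "env1", 12.0)] * 4 + [("healthy", "env1", 8.0)]
              + [("disease", "env2", 12.0)] + [("healthy", "env2", 8.0)] * 4)

for name, data in (("balanced", balanced), ("unbalanced", unbalanced)):
    print(name, round(one_factor_diff(data), 3), round(averaged_contrast(data), 3))
```

In the balanced case both approaches give 1. In the unbalanced case, where there is no environment effect at all, the one-factor comparison reports about 2.4 while the averaged contrast gives 0.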

Answer (11 weeks ago):

As a rule of thumb, never exclude samples: their information is still used when estimating the residual variance. This article by Ji & Lui is a good primer on the problem. Instead, use a GLM approach and include both factors as covariates in your model. If you would like to test for the most appropriate model and a possible interaction, you can use the glht function from the multcomp package.
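The variance argument can be illustrated with a small sketch. Before empirical Bayes moderation, limma estimates each protein's residual variance from all samples in the fit, so subsetting to the diseased samples discards residual degrees of freedom. Below is a minimal plain-Python version of the pooled residual variance for a cell-means (merged-factor) model; `pooled_variance` is a hypothetical helper, not a limma function, and the data are hypothetical. Pooling in this way assumes the variability is comparable across the four groups.

```python
from statistics import mean

def pooled_variance(groups):
    """Residual variance of a one-factor (cell-means) linear model.
    groups: dict mapping group name -> list of values.
    Returns (variance, residual degrees of freedom)."""
    # Residual sum of squares: deviations from each sample's own group mean
    rss = sum((v - mean(vals)) ** 2 for vals in groups.values() for v in vals)
    # Residual df = number of samples - number of estimated group means
    df = sum(len(vals) for vals in groups.values()) - len(groups)
    return rss / df, df

# Hypothetical log2 intensities for one protein, three replicates per group
all_groups = {
    "disease_env1": [10.2, 10.6, 10.4],
    "disease_env2": [11.5, 11.1, 11.3],
    "healthy_env1": [9.8, 10.0, 9.9],
    "healthy_env2": [10.1, 10.3, 10.2],
}
# The alternative from the question: keep only the diseased samples
disease_only = {g: v for g, v in all_groups.items() if g.startswith("disease")}

print(pooled_variance(all_groups))    # 12 samples - 4 groups = 8 residual df
print(pooled_variance(disease_only))  # 6 samples - 2 groups = 4 residual df
```

The within-disease contrast disease_env1 - disease_env2 has the same point estimate in both fits; what changes is the variance estimate and the degrees of freedom behind it, which feed into the test statistic.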

Reply:

Thank you for the response. So as not to exclude samples (as you suggested), can't I use the same limma-based DEP package but combine the two factors into one (based on jkim's suggestion)?

