Question

EdgeR compare multiple groups in a 1 vs other comparison?

0

Entering edit mode

2.9 years ago

shanasabri ▴ 40

I have sample metadata set up as:

> metadata
       Protocol Donor Condition
A       TruSeq  1629  A
B       TruSeq  1629  B
C       TruSeq  1629  C
D       TruSeq  1630  A
E       TruSeq  1630  B
F       TruSeq  1630  C
G       Tn5  1629     A
H       Tn5  1629     B
I       Tn5  1629     C
J       Tn5  1630     A
K       Tn5  1630     B
L       Tn5  1630     C
M       UII  1629     A
N       UII  1629     B
O       UII  1629     C
P       UII  1630     A
Q       UII  1630     B
R       UII  1630     C

I'd like to test for differences in Protocols while correcting for baseline Donor and Condition effects. How exactly can EdgeR do this? Does this need to be computed for each pairwise comparison (Truseq vs Tn5, Truseq vs UII, etc) or can EdgeR compute a sort of 1 versus all other comparison (Truseq vs {TN5, UII})?

My current workflow looks something like this:

y <- DGEList(counts = raw_counts_df)
keep <- filterByExpr(y)
y <- y[keep, , keep.lib.sizes=FALSE]
y <- calcNormFactors(y)
design <- model.matrix(~Protocol + Donor + Condition, data = metadata)
y <- estimateDisp(y, design)
fit <- glmQLFit(y, design)
# qlf <- glmQLFTest(fit, coef=????) or qlf <- glmQLFTest(fit, contrast=????) 

Any help would be greatly appreciated!

Expression R EdgeR Differential • 2.3k views

ADD COMMENT • link 2.9 years ago by shanasabri ▴ 40

0

Entering edit mode

In your case, the two variable donor and condition are factors (categorical data) and not covariates (continious), so u can use both

design = model.matrix(~Protocol + Donor + Condition, data = metadata)
design = model.matrix(~0 + Protocol + Donor + Condition, data = metadata)

Both are same for multifactor analysis like urs. So u can go for the second option.

After the design, create a contrast of the pairs u want to compare.

contr.matrix <- makeContrasts(
   truevstn5 = TruSeq - Tn5, 
truevsUII = TruSeq - UII, 
UIIvstn5 = UII - Tn5
   levels = colnames(design))


  # Dispersion  
y <- estimateDisp(y, design)
# Fit
    fit <- glmQLFit(y, design)
# contrast fit
    vfit <- contrasts.fit(fit, contrasts=contr.matrix)
# Empirical Bayes Statistics for Differential Expression
    efit <- eBayes(vfit)
# summary
summary(decideTests(efit))

# log fc cutoff
tfit <- treat(vfit, lfc=1)
dt <- decideTests(tfit)
summary(dt)

Hope this helps.

ADD REPLY • link 2.9 years ago by pbpanigrahi ▴ 420

0

Entering edit mode

You can refer this to know more.

To answer to your query 1 vs rest, read section 5.4

ADD REPLY • link 2.9 years ago by pbpanigrahi ▴ 420

0

Entering edit mode

If I specify results <- glmQLFTest(fit, contrast = contr_mtx) then I'll get a results table with the following columns: "logFC.truevstn5" "logFC.truevsUII" "logFC.UIIvstn5" "logCPM" "F" "PValue" So I assume this is the correct way to get pairwise combos. I don't seem to have a P-value associated with each comparison. What does the PValue column correspond to in this case? Some average?

I'm looking through the documentation and don't really understand how to do a 1 vs all other comparison, for example True vs Tn5+UII. Any additional insights?

ADD REPLY • link 2.9 years ago by shanasabri ▴ 40

0

Entering edit mode

Hmm the code above errors out:

y <- DGEList(counts = data)
keep <- filterByExpr(y)
y <- y[keep, , keep.lib.sizes=FALSE]
y <- calcNormFactors(y)

design <- model.matrix(~0 + Protocol + Donor + Condition, data = metadata)
contr_mtx <- limma::makeContrasts(
  truevstn5 = ProtocolTruSeq-ProtocolTn5, 
  truevsUII = ProtocolTruSeq-ProtocolUII, 
  UIIvstn5 = ProtocolUII-ProtocolTn5,
  levels = colnames(design)
)

y <- estimateDisp(y, design)
fit <- glmQLFit(y, design)
vfit <- limma::contrasts.fit(fit, contrasts=contr_mtx)

Error in array(x, c(length(x), 1L), if (!is.null(names(x))) list(names(x),  : 
  'data' must be of a vector type, was 'NULL'

ADD REPLY • link 2.9 years ago by shanasabri ▴ 40

0

Entering edit mode

Got it.

U are using glmQLFit and not lmFit The contrasts.fit expects Linear Model Fit

So what you can do is pass contrast while fitting the glm, then followed by Quasi-likelihood Tests

fit <- glmQLFit(y, design, contrast = contr_mtx)
results <- glmQLFTest(fit)  # statistical test
topTags(results)  # top results

Hope this helps.

ADD REPLY • link 2.9 years ago by pbpanigrahi ▴ 420

0

Entering edit mode

Got it. That works but I'm still unclear on how to extrapolate protocol-specific genes. The output of results will give me a list of differential genes but none seem to be enriched in a given protocol, rather they seem enriched by condition.. Here's a volcano plot of the results. It seems a top diffexp gene is CPLV. When I look at this gene I see it's enriched in a given condition, not protocol.

enter image description here

                       Row.names       CPVL        DUSP4 Protocol Donor Condition
1       IL22339_TruSeq_1629_ALPG  3.8021184  10.67046139   TruSeq  1629      ALPG
2  IL22340_TruSeq_1629_ALPG_MSLN  0.3261985 210.85469808   TruSeq  1629 ALPG_MSLN
3    IL22341_TruSeq_1629_neg_neg 25.4262200   3.08842044   TruSeq  1629   NEG_NEG
4       IL22342_TruSeq_1630_ALPG  3.0126745  20.94526107   TruSeq  1630      ALPG
5  IL22343_TruSeq_1630_ALPG_MSLN  0.5024283 183.18972367   TruSeq  1630 ALPG_MSLN
6    IL22344_TruSeq_1630_neg_neg 20.3249079   2.98078056   TruSeq  1630   NEG_NEG
7          IL22345_Tn5_1629_ALPG  2.3489160   3.32393767      Tn5  1629      ALPG
8     IL22346_Tn5_1629_ALPG_MSLN  0.2744550  23.58352737      Tn5  1629 ALPG_MSLN
9       IL22347_Tn5_1629_neg_neg 29.0192647   0.62770456      Tn5  1629   NEG_NEG
10         IL22348_Tn5_1630_ALPG  2.1085505   3.65318639      Tn5  1630      ALPG
11    IL22349_Tn5_1630_ALPG_MSLN  0.0227982  26.58269756      Tn5  1630 ALPG_MSLN
12      IL22350_Tn5_1630_neg_neg 22.1375507   0.03083224      Tn5  1630   NEG_NEG
13         IL22353_UII_1629_ALPG  2.3931062  14.40747595      UII  1629      ALPG
14    IL22354_UII_1629_ALPG_MSLN  0.5103596 255.43495712      UII  1629 ALPG_MSLN
15      IL22355_UII_1629_neg_neg 16.1154652   3.82900086      UII  1629   NEG_NEG
16         IL22356_UII_1630_ALPG  1.8830946  26.97077389      UII  1630      ALPG
17    IL22357_UII_1630_ALPG_MSLN  0.4894062 223.97323983      UII  1630 ALPG_MSLN
18      IL22358_UII_1630_neg_neg 15.8386480   4.27980488      UII  1630   NEG_NEG

ADD REPLY • link 2.9 years ago by shanasabri ▴ 40