Question: Building a biologically meaningful design matrix for limma/voom in a large RNASeq experiment
0
gravatar for stefanos.bamopoulos
10 months ago by
stefanos.bamopoulos30 wrote:

Hello guys,

I have a question regarding differential expression (DE) with limma/voom in an RNA-Seq experiment of ~250 samples. I have 4 different point mutations that are mutually exclusive and I wish to identify DE isoforms specific for each mutation. Additionally, I have reason to believe that the mutations cause similar differential expression for some isoforms and I also wish to identify those.

My questions are:

  • What is the best approach to identify the differences and the similarities of those mutations?
  • How do I adress the issue of confounding?

Specifically, I have 3, 4, 6 and 17 samples for each mutation respectively, as well as ~200 samples as a control group.

The approach I have come up with so far is to:

1) perform a DE analysis for each mutation comparing it to the control group (which does not include the other mutations), like this:

design.matrix <- model.matrix(~ factor(mut1), data)

design.matrix <- model.matrix(~ factor(mut2), data)... and so on

2) after doing this for each of the 4 mutations, combine them into a single binary variable and perform a DE analysis comparing all of them against the control group:

design.matrix <- model.matrix(~ factor(all.mutations), data

My way of thinking is that since the mutations are mutually exclusive, I can compare each mutation against the control group (which does not include the other mutations) in order to identify DE isoforms specific to each mutation. Afterwards by combing them, I hope to highlight isoforms that are similarly DE for all mutations. If I check the overlap of the last analysis with the preceding 4 I should be able to identify at least some isoforms that are affected in a common way. Is it maybe sufficient to just compare the overlap of the first 4 analyses without the 2nd step?

As to the 2nd question. Is there a rule of thumb as to how many confounders I can add in a limma analysis while avoiding overfitting? Since in a differential expression analysis we don't really have "events" I am unsure how to determine the number of confounders I can adjust for. Especially for the mutation with the smallest subset (only 3 mutations) I am unsure if the relatively large control group of ~200 samples permitts me to adjust for multiple confounders.

Any input is welcome, thanks in advance!

Stefan

voom rna-seq limma confounding • 513 views
ADD COMMENTlink modified 10 days ago by Gordon Smyth750 • written 10 months ago by stefanos.bamopoulos30
1

Since the mutations are mutually-exclusive I'd create a vector Muts, in which the mutation status is specified for each subject, i.e.:

Muts <- c("Mut1", "Mut1", "Mut2", .... , "Control", ...)

I would create a design matrix as below:

Mutations <- factor(Muts, levels = c("Mut1", "Mut2", ... , "Control"))
design <- model.matrix(~0+Mutations)
colnames(design) <- levels(Mutations)

Then fit the linear model:

fit <- lmFit(eset, design)

Then extract the contrasts:

cont_mat <- makeContrasts(
"WT-mut1",
"WT-mut2", ..., 
levels=design)
fit2 <- contrasts.fit(fit, cont_mat)
fit2 <- eBayes(fit2)
topTableF(fit2)
ADD REPLYlink written 10 months ago by egeulgen710
1

You might also ask this question directly at the BioC. Gordon Smyth (as well as other authors and maintainers of popular packages) are outstandingly responsive, especially to well-written questions like yours.

ADD REPLYlink written 9 months ago by ATpoint15k

Thank you! I will try this as well

ADD REPLYlink written 9 months ago by stefanos.bamopoulos30
0
gravatar for Gordon Smyth
10 days ago by
Gordon Smyth750
Australia
Gordon Smyth750 wrote:

Cross-posted https://support.bioconductor.org/p/110309/

ADD COMMENTlink modified 10 days ago by ATpoint15k • written 10 days ago by Gordon Smyth750
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1589 users visited in the last hour