Variance decomposition analysis (solved)
0
1
Entering edit mode
9.5 years ago
juncheng ▴ 220

I'm doing biological gene expression data. For the genes I analysed, each gene can produce two isoforms, one dominant (Gene isoform 1) one minus can be treated as a by-product (Gene isoform 2).

I'm actually interested in is the expression change of the by-product following time series (Gene isoform 2). However, the problem is, the expression of "Gene isoform 1" is a confounder here. i.e. I want the variance of "isoform 2" expression is driven by itself, not driven by the variance of the dominant "isoform 1".

Two graph to explain my problem:

Situation one:

"Gene isoform 1" and "Gene isoform 2" both increase their expression over time, however they vary synchronously, it is likely the variation of Gene isoform 2 is due to the caused by the variation of "Gene isoform 1". This is not what I want!

Situation 2:

"Gene isoform 1" and "Gene isoform 2" both increase their expression over time, the increase of "Gene isoform 2" is much faster than "Gene isoform 1", this means at least partially "Gene isoform 2" vary independent of "Gene isoform 1", and this independent part of variance is exactly what I'm interested in.

The question is, how do I test the change of "Gene isoform 2" without the influence of "Gene isoform 1".

Thanks a tons for any answer!

RNA-Seq R hypothesis-test • 3.2k views
ADD COMMENT
0
Entering edit mode

Another way to formulate this problem is, how to test the ratio changes across time points, with each time points have replicates.

For example, here is a example expression of the two isoforms of one gene. The data has 4 time points and each time points have 4 replicates.

> cat(as.numeric(Isoform1['1642',]))
80 108  81  62 137 122 110 115 101 127 126  82 173 155 190 177

> cat(as.numeric(Isoform2['1642',]))
​31  32  32  26  32  31  28  32  33  32  28  25  34  31  36  38

How to test the ratio changes across time. A chi square test seams like do not support replicates.

ADD REPLY
0
Entering edit mode

Are these actually integer counts or have you rounded things? You might be able to use a beta-binomial test.

ADD REPLY
0
Entering edit mode

They are discrete data, but may have .5 for some reason. Could you give a little detail how to fit a beta-binomial model? I actually tried but did not figure out.

ADD REPLY
1
Entering edit mode

Sure, have a look at the "VGAM" package. You end up doing something like:

library(VGAM)
m <- matrix(t(read.table(text="80 108  81  62 137 122 110 115 101 127 126  82 173 155 190 177
31  32  32  26  32  31  28  32  33  32  28  25  34  31  36  38")), ncol=2)
m[,1] = rowSums(m)
groups=data.frame(time=factor(rep(c(1:4), each=4)))
fit <- vglm(m~time, data=groups, betabinomial) #You'll get warnings

Betabinomial models are kind of a pain to deal with, but they're powerful. You might have to play around with some of the defaults to get better fits.

ADD REPLY
0
Entering edit mode

I just found this package seems like doing some straightforward test. I' m in the MPI of aging. And you? B-it I guess.

ADD REPLY
0
Entering edit mode

That package is new to me, but given the increased utility of betabinomial models in NGS I'm not surprised that there are packages being added (I'm slowly working on one myself).

Always nice to know some of the other bioinfo folks in the area. I'm at the DZNE (molecular and cellular cognition group) in Bonn.

ADD REPLY
0
Entering edit mode

Nice. Thanks a lot, you actually helped me a lot. :).

ADD REPLY
0
Entering edit mode

Hi Devon, how can I do actual significance test after fitting with the model? For example, in this case, test the significance across different time points.

ADD REPLY
0
Entering edit mode

If you just want to see if there's a time dependent effect in general, then you can also fit to a null model (~1) and then use a likelihood ratio test (it's convenient to just use the anova() function).

ADD REPLY
0
Entering edit mode

You mean fit with a anova model, or fit with betabinomial model and test the parameters by anova? Sorry, I'm not familiar with betabinomial, as you said, some playing around is definitely necessary.

ADD REPLY
1
Entering edit mode

The latter, fit with betabinomial models and then test for a difference in them with the anova function. The anova function is a convenient way to perform a likelihood ratio test between two model fits.

ADD REPLY
0
Entering edit mode

Hi Devon, I tried with different packages.

With ibb:

bb.test( Isoform1, Isoform1 + Isoform1 ), group = rep(seq(1,4),each=4))

This seams dose not reach enough power, for example, something like this cannot be detected:

I also tried package VGAM and aod (easy to use). The later can easily use anova() to compare models.

What I do like:

m <- data.frame(t(read.table(text="80 108  81  62 137 122 110 115 101 127 126  82 173 155 190 177 31  32  32  26  32  31  28  32  33  32  28  25  34  31  36  38?)), ncol=2)

t1=m[1:4,] # First time points
t2=m[5:8,] # First time points

fm1 <- betabin(cbind(X1, X1 - X2) ~ 1, ~ 1,data=t1)
fm2 <- betabin(cbind(X1, X1 - X2) ~ 1, ~ 1,data=t2)

anova(fm1,fm2)
Analysis of Deviance Table (beta-binomial models)

fm1: fixed = cbind(X1, X1 - X2) ~ 1; random = ~1 
fm2: fixed = cbind(X1, X1 - X2) ~ 1; random = ~1 

      logL k   AIC  AICc   BIC Resid. dev. Resid. Df    Test Deviance Df P(> Chi2)
fm1 -11.51 2 27.01 39.01 25.79     0.25263         2                              
fm2 -12.11 2 28.23 40.23 27.00     0.04411         2 fm1-fm2   -1.216  0         0

Does not seams like correct. Do I need to fit the BetaB model for each time point, and do likelihood ratio test with anova? Please point out if I did wrong, thank you very much again.

ADD REPLY
1
Entering edit mode

Since aod doesn't allow contrasts (that I know of), you can use the subsetting method as follows:

library(aod)
m <- as.data.frame(matrix(unlist(read.table(text="80 108  81  62 137 122 110 115 101 127 126  82 173 155 190 177 31  32  32  26  32  31  28  32  33  32  28  25  34  31  36  38", header=F)), ncol=2, byrow=F))
m$timepoint = factor(c(rep(1:4,each=4)))#keep things together
m$V2 = m$V1-m$V2
fitFull = betabin(cbind(V1, V2) ~ timepoint, ~ 1, data=m[c(1:8),])
fitNull = betabin(cbind(V1, V2) ~ 1, ~ 1, data=m[c(1:8),])
anova(fitFull, fitNull)

Note that fitFull has fitting issues. Beta binomial models are a pain to deal with, so you'll have to play around with the settings fed to optim().

You might also just fit the full model (not subset) and then contrast the wald statistics as needed.

ADD REPLY
0
Entering edit mode

BTW, out of curiosity, whose lab are you in in Cologne?

ADD REPLY

Login before adding your answer.

Traffic: 2706 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6