Question: How can I find the correct transformation for a continuous covariate in a gene expression linear model?
Ryan Thompson (TSRI, La Jolla, CA) wrote, 3.9 years ago:

Sometimes I want to model RNA expression as a function of a continuous variable such as age, dose, or time after treatment, using a linear model. Fitting the model is easy enough, but, as the name suggests, it treats log expression as a linear function of the covariate. How do I know that a linear relationship is the correct one? What if the covariate needs to be log-transformed or square-root-transformed, and how would I figure that out? I could try a set of common transformations and see which one works "best", but that constitutes data snooping. Plotting expression against the covariate might work when there is only one covariate, but it becomes less effective when there are several.

So, is there a statistically principled way to determine the appropriate transformation for a continuous covariate in a linear model?


Are we ruling out a pilot experiment, or splitting the data so that the snooping and the testing are done on different subsets? I strongly suspect that those are the only really reliable ways to avoid data snooping (assuming no a priori knowledge of what the covariate relationship might reasonably look like).

Reply written 3.9 years ago by Devon Ryan
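
The subset idea above can be sketched as follows: choose the transformation on one random half of the samples, then fit the final model on the other half, so the held-out fit never influenced the choice. This is a minimal illustration in plain Python on simulated data. The candidate list, the pooled residual-sum-of-squares criterion, the 50/50 split, and all names (`fit_line`, `choose_transform`, `take`, `dose`) are assumptions made for the example, not something prescribed in this thread.

```python
import math
import random

# Candidate transformations of the covariate (an assumed, illustrative list).
TRANSFORMS = {"identity": lambda v: v, "log": math.log, "sqrt": math.sqrt}


def fit_line(x, y):
    """Least-squares fit of y = a + b*x; returns (a, b, residual sum of squares)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    rss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return a, b, rss


def choose_transform(covariate, expr):
    """Return the candidate name with the smallest RSS summed over all genes."""
    def pooled_rss(name):
        tx = [TRANSFORMS[name](v) for v in covariate]
        return sum(fit_line(tx, gene)[2] for gene in expr)
    return min(TRANSFORMS, key=pooled_rss)


def take(values, idx):
    """Select the entries of values at the given sample indices."""
    return [values[i] for i in idx]


# Simulated data (hypothetical): log expression is linear in log(dose).
rng = random.Random(0)
n_samples, n_genes = 40, 200
dose = [10 ** rng.uniform(0, 3) for _ in range(n_samples)]
expr = []  # one list of per-sample values for each gene
for _ in range(n_genes):
    a, b = rng.gauss(8, 1), rng.gauss(0, 1)
    expr.append([a + b * math.log(d) + rng.gauss(0, 0.3) for d in dose])

# Random split: one half picks the transformation, the other half is held out.
idx = list(range(n_samples))
rng.shuffle(idx)
select_idx, test_idx = idx[:n_samples // 2], idx[n_samples // 2:]

chosen = choose_transform(take(dose, select_idx),
                          [take(g, select_idx) for g in expr])

# Final per-gene fits use only the held-out samples.
tx = [TRANSFORMS[chosen](d) for d in take(dose, test_idx)]
slopes = [fit_line(tx, take(g, test_idx))[1] for g in expr]
print(chosen, len(slopes))
```

The price of this design is that the final fit sees only half of the samples, which is why a separate pilot experiment can be the more attractive variant when it is affordable.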

Yes, I'm asking if there's a way to determine the appropriate transformation from the data itself. Perhaps by discovering the globally optimal transformation across all genes, so that any one gene only contributes a tiny fraction and data snooping is minimized?

Reply written 3.9 years ago by Ryan Thompson
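
The "one transformation shared by all genes" idea can be sketched as a search over a one-parameter family. The example below assumes a Box-Cox-style power family applied to the covariate (lambda = 0 is the log, lambda = 0.5 is equivalent to the square root, lambda = 1 leaves the covariate linear). It picks the lambda that minimizes the sum over genes of log residual sum of squares, which corresponds to a Gaussian profile likelihood with a separate variance per gene if genes are treated as independent. The family, the grid, the criterion, the independence assumption, and the simulated data are all assumptions for illustration. The sketch does not by itself show that the subsequent per-gene tests remain valid.

```python
import math
import random


def power_transform(x, lam):
    """Box-Cox-style power transform of a positive covariate; lam = 0 gives log."""
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam


def residual_ss(x, y):
    """Residual sum of squares of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


def criterion(lam, covariate, expr):
    """Sum over genes of log(RSS); smaller means a better shared fit."""
    tx = [power_transform(v, lam) for v in covariate]
    return sum(math.log(residual_ss(tx, gene)) for gene in expr)


# Simulated data (hypothetical): expression is linear in sqrt(age),
# so the true lambda is 0.5.
rng = random.Random(1)
n_samples, n_genes = 40, 300
age = [rng.uniform(1.0, 80.0) for _ in range(n_samples)]
expr = []  # one list of per-sample values for each gene
for _ in range(n_genes):
    a, b = rng.gauss(8, 1), rng.gauss(0, 1)
    expr.append([a + b * math.sqrt(v) + rng.gauss(0, 0.2) for v in age])

# Grid of candidate exponents: -1.0, -0.75, ..., 2.0
grid = [i * 0.25 for i in range(-4, 9)]
best_lam = min(grid, key=lambda lam: criterion(lam, age, expr))
print("best lambda on the grid:", best_lam)
```

Because a single parameter is estimated from thousands of genes, any one gene has little influence on the choice, which is the intuition in the comment above.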

I suspect the answer will be that the closest you can get is to try to interpret a PCA plot; the data snooping there is about as low as you're going to get. You might want to post this to Cross Validated and see what the statistics folks think. Hopefully they know of a better option.

Reply written 3.9 years ago by Devon Ryan
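
As a rough numerical stand-in for reading a PCA plot, one can compute each sample's score on the first principal component and check which version of the covariate it tracks most closely. The sketch below does this in plain Python with power iteration on simulated data. The candidate list, the use of absolute Pearson correlation, and every name in it are assumptions for illustration. It is only informative when the covariate dominates the first component, and it is still a look at the data.

```python
import math
import random


def pearson(u, v):
    """Pearson correlation between two equal-length lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


def first_pc_scores(expr, rng, n_iter=100):
    """PC1 sample scores (up to sign and scale) via power iteration.

    expr is a list of genes, each a list of per-sample values.
    """
    n = len(expr[0])
    centered = []
    for gene in expr:
        m = sum(gene) / n
        centered.append([v - m for v in gene])
    # Sample-by-sample Gram matrix of the gene-centered data.
    gram = [[sum(g[i] * g[j] for g in centered) for j in range(n)]
            for i in range(n)]
    # Random start: a constant vector would be mapped to zero by the centering.
    vec = [rng.gauss(0, 1) for _ in range(n)]
    for _ in range(n_iter):
        vec = [sum(row[j] * vec[j] for j in range(n)) for row in gram]
        norm = math.sqrt(sum(v * v for v in vec))
        vec = [v / norm for v in vec]
    return vec


# Simulated data (hypothetical): log expression is linear in log(dose).
rng = random.Random(2)
n_samples, n_genes = 30, 200
dose = [10 ** rng.uniform(0, 3) for _ in range(n_samples)]
expr = []
for _ in range(n_genes):
    a, b = rng.gauss(8, 1), rng.gauss(0, 1)
    expr.append([a + b * math.log(d) + rng.gauss(0, 0.3) for d in dose])

scores = first_pc_scores(expr, rng)
candidates = {"identity": lambda v: v, "log": math.log, "sqrt": math.sqrt}
abs_corr = {name: abs(pearson(scores, [f(d) for d in dose]))
            for name, f in candidates.items()}
best = max(abs_corr, key=abs_corr.get)
print(best, abs_corr)
```

The absolute value is used because the sign of a principal component is arbitrary.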