Sometimes, in an experiment, I want to model RNA expression as a function of a continuous variable such as age, dose, or time after treatment, using a linear model. Doing this is easy enough, but the problem is that, as the name suggests, the log expression is modelled as a linear function of the covariate in question. But how do I know that a linear relationship is the correct one? What if the covariate needs to be log-transformed or square-root-transformed? How would I figure that out? Obviously I could try a bunch of common transformations and see which one fits "best", but that constitutes data snooping, since the same data would be used both to choose the transformation and to test the resulting model. Also, simply plotting expression against the covariate of interest might work if there is only one covariate, but it will be less effective if there are multiple such covariates.
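For concreteness, the ad hoc approach I mean looks roughly like the sketch below (Python with NumPy). The simulated data, the `fit_rss` helper, and the set of candidate transformations are all made up for illustration. It fits the same one-covariate linear model with each transformed covariate and compares residual sums of squares, which is exactly the kind of selection on the same data that I would like to avoid.

```python
import numpy as np

def fit_rss(x, y):
    """Fit y = b0 + b1 * x by ordinary least squares; return the residual sum of squares."""
    design = np.column_stack([np.ones_like(x), x])
    coef, _, _, _ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return float(resid @ resid)

# Simulated data (an assumption, for illustration only): log expression is
# truly linear in log(dose), plus Gaussian noise.
rng = np.random.default_rng(0)
dose = rng.uniform(1.0, 100.0, size=60)
log_expr = 2.0 + 1.5 * np.log(dose) + rng.normal(scale=0.3, size=dose.size)

# Candidate covariate transformations (hypothetical choices).
candidates = {
    "identity": lambda d: d,
    "sqrt": np.sqrt,
    "log": np.log,
}

# Fit one model per transformation and pick the smallest RSS.
rss = {name: fit_rss(f(dose), log_expr) for name, f in candidates.items()}
best = min(rss, key=rss.get)

for name, value in rss.items():
    print(f"{name:8s} RSS = {value:.2f}")
print("smallest RSS:", best)
```

Picking the transformation this way and then reporting p-values from the winning model ignores the selection step, and it gets unwieldy once there are several covariates that each need their own transformation.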
So, is there a statistically principled way to determine the appropriate transformation for a continuous covariate in a linear model?