I have tumor age (an ordinal variable, divided into age brackets), gene expression (in RPKM), and sex (binary) for many tumors from people of different ages. The samples are not paired, i.e. this is not a time series of the same tumors but different tumors at each age. I want to find the genes that are most differentially expressed with age, controlling for sex. What would be the best way to do this?
1) Do I need to transform the RPKM values to ranks (as per this thread: How to normalize RPKM values to use in regression models?)? The difference is that they use regularized regression.
2) Is linear regression appropriate? This paper (http://www.biomedcentral.com/1471-2164/10/S3/S16) seems to suggest that quantile regression is better than the linear regression used in many microarray-based studies of gene expression in aging (however, their algorithm has other features as well, and their age is not ordinal).
3) Because I am repurposing data and do not know the batches, would it be appropriate to run sva (or something similar) on it?
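
To make question 2 concrete, here is a minimal Python sketch of the plain per-gene linear model I have in mind. It rests on several assumptions that are mine and may well be wrong: expression is modeled as log2(RPKM + 1), the age brackets are coded as equally spaced integer scores (0, 1, 2, ...), and sex is coded 0/1. The function name `age_effect_per_gene` and the simulated data are purely illustrative; it does not cover the rank transform, quantile regression, or sva.

```python
import numpy as np

def age_effect_per_gene(rpkm, age_bracket, sex):
    """Fit log2(RPKM + 1) ~ intercept + age_bracket + sex for every gene.

    rpkm        : (n_genes, n_samples) array of RPKM values
    age_bracket : (n_samples,) integer scores for the ordered age brackets
    sex         : (n_samples,) 0/1 indicator
    Returns the age slope and its t-statistic for each gene.
    """
    # Assumption: log2(RPKM + 1) as the response, samples in rows.
    y = np.log2(np.asarray(rpkm, dtype=float) + 1.0).T
    # Design matrix: intercept, age score, sex.
    X = np.column_stack([np.ones(len(sex)), age_bracket, sex]).astype(float)
    # Ordinary least squares for all genes at once; beta has shape (3, n_genes).
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = X.shape[0] - X.shape[1]
    sigma2 = (resid ** 2).sum(axis=0) / dof
    # Standard error of the age coefficient (row/column 1 of (X'X)^-1).
    xtx_inv = np.linalg.inv(X.T @ X)
    se_age = np.sqrt(sigma2 * xtx_inv[1, 1])
    return beta[1], beta[1] / se_age

# Simulated toy data (not real measurements): gene 0 is built to rise with age.
rng = np.random.default_rng(0)
n_samples, n_genes = 60, 5
age_bracket = rng.integers(0, 4, size=n_samples)
sex = rng.integers(0, 2, size=n_samples)
log_expr = rng.normal(5.0, 0.5, size=(n_genes, n_samples))
log_expr[0] += 0.8 * age_bracket
rpkm = 2.0 ** log_expr - 1.0

slope, t_stat = age_effect_per_gene(rpkm, age_bracket, sex)
ranking = np.argsort(-np.abs(t_stat))  # genes ordered by strength of age effect
print(ranking, slope.round(2))
```

Ranking by the absolute t-statistic of the age term is just one way to order the genes; p-values and multiple-testing correction would still be needed on real data.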