Covariate Selection For Microarray Data
10.7 years ago
aditi.qamra

Hi,

Can anybody suggest good R packages for variable selection when building linear models to analyse microarray data? The literature suggests Bayesian variable selection, random forests, etc., but I still wanted an opinion from experienced folks.

Essentially, I wish to identify differentially expressed genes (DEGs) between two groups of diseased samples. I also have expression values from paired normal samples. But even after thorough preprocessing of the data, there seems to be a lot of noise, and I am certain it is due to associated covariates such as age, disease stage, disease class, presence of infection, etc.

My problem is thus one of fitting a multiple regression model for each gene in each group of diseased samples, rather than just a 2×2 factorial design (disease group × diseased/normal).

I want to be able to select the variables that have the most impact on gene expression in the diseased samples and then include them as covariates in my final model, rather than use backward/forward elimination.

Thanks!

10.7 years ago

The problem, of course, is that you are looking for effects in each gene independently, and a model that fits one gene well may not fit another. In practice, a thorough unsupervised analysis plus supervised analyses on subsets of the samples may give you a sense of which covariates are important (in terms of their effects on gene expression). A more structured approach is to use something like SVA (surrogate variable analysis) to estimate the latent variables apparent in the data.
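One common form of the unsupervised check mentioned above is to compute the principal components of the expression matrix and see how strongly each known covariate tracks the top components. The Python/NumPy sketch below does this on simulated data; the function name `pc_covariate_r2`, the simulated `age` and `infection` values, and the use of squared Pearson correlation are all illustrative assumptions. It is not SVA itself (the Bioconductor `sva` package estimates surrogate variables with a more involved procedure), just a quick way to see which covariates line up with the dominant sources of variation.

```python
import numpy as np


def pc_covariate_r2(expr, covariates, n_pcs=3):
    """Squared correlation of each covariate with each of the top principal components.

    expr: genes x samples array of (log-scale) expression values.
    covariates: dict of covariate name -> numeric array, one value per sample.
    Returns a dict of name -> array of n_pcs r^2 values.
    """
    # Centre each gene across samples so the PCs capture between-sample variation.
    centred = expr - expr.mean(axis=1, keepdims=True)
    # Rows of vt are the sample-space principal components (PC scores up to scale).
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    pcs = vt[:n_pcs]
    result = {}
    for name, values in covariates.items():
        v = np.asarray(values, dtype=float)
        result[name] = np.array([np.corrcoef(v, pc)[0, 1] ** 2 for pc in pcs])
    return result


# Simulated (hypothetical) data: age shifts many genes, "infection" has no effect.
rng = np.random.default_rng(0)
n_genes, n_samples = 500, 40
age = rng.uniform(30, 80, n_samples)
infection = rng.integers(0, 2, n_samples)
age_z = (age - age.mean()) / age.std()
expr = rng.normal(0, 1, (n_genes, n_samples)) + np.outer(rng.normal(0, 1, n_genes), age_z)

r2 = pc_covariate_r2(expr, {"age": age, "infection": infection})
for name, vals in r2.items():
    print(name, np.round(vals, 2))
```

For a categorical covariate with more than two levels (e.g. disease stage), a one-way ANOVA R² of each PC on that factor is a more sensible summary than a correlation.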

Thank you, that was a very helpful lead. So am I right in understanding that SVA would then help find the DEGs between group1 and group2 irrespective of all the other biological factors, such as disease stage, disease type, patient age, etc.?

What I ultimately want is a list of DEGs for the contrast (Group1.Diseased - Group1.Normal) - (Group2.Diseased - Group2.Normal), after making sure that none of the other factors (disease stage, disease type, and so on) are masking the true difference. Right now I get barely a handful of genes out of 25K+ if I use only the above contrast matrix.
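The contrast above is a group-by-tissue interaction. With a cell-means design (one coefficient per group/tissue combination), it is just a weight vector over the coefficients, and any covariate columns added to the design simply get weight 0. A small Python sketch, with hypothetical coefficient names and made-up estimates for a single gene (in limma the equivalent is usually written as an expression passed to `makeContrasts()` with `levels = design`):

```python
# Coefficient names for a cell-means design: one column per group x tissue cell.
# Covariate columns (e.g. a hypothetical "age") can be appended after these.
CELLS = ["Group1.Diseased", "Group1.Normal", "Group2.Diseased", "Group2.Normal"]

# (Group1.Diseased - Group1.Normal) - (Group2.Diseased - Group2.Normal)
INTERACTION = {"Group1.Diseased": 1, "Group1.Normal": -1,
               "Group2.Diseased": -1, "Group2.Normal": 1}


def contrast_vector(weights, coef_names):
    """Weight for every coefficient; columns not named in the contrast get 0."""
    return [weights.get(name, 0) for name in coef_names]


def apply_contrast(cvec, estimates):
    """Linear combination of one gene's coefficient estimates."""
    return sum(c * b for c, b in zip(cvec, estimates))


cvec = contrast_vector(INTERACTION, CELLS + ["age"])
print(cvec)  # [1, -1, -1, 1, 0]
# Made-up log2 estimates for one gene, plus an age slope that the contrast ignores.
print(apply_contrast(cvec, [8.0, 6.0, 7.0, 6.5, 0.02]))  # 1.5
```

The weights sum to zero, and adding covariates does not change the contrast itself; it changes the coefficient estimates it is applied to, which are then adjusted for those covariates.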

If your sample size is large enough, you can simply include your covariates in a linear model (or GLM). limma (or DESeq2 or edgeR for RNA-seq) supports such models, and you can then use your contrast of interest to find DEGs while "controlling" for the covariates.
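To make the "controlling for covariates" idea concrete, here is a minimal NumPy sketch of what such a model does for every gene: an ordinary least-squares fit of a cell-means design plus one covariate column, followed by the interaction contrast from the question. Everything here (the function name `fit_gene_contrasts`, the simulated data, a single centred `age` covariate) is a hypothetical illustration, and it reports plain OLS t-statistics; limma's `lmFit()`, `contrasts.fit()` and `eBayes()` additionally moderate the per-gene variances, which matters when sample sizes are small.

```python
import numpy as np


def fit_gene_contrasts(expr, design, contrast):
    """OLS fit for all genes at once, then one contrast per gene.

    expr: genes x samples; design: samples x coefficients; contrast: one weight per coefficient.
    Returns (estimates, t_statistics), one value per gene (plain OLS, not moderated).
    """
    n, p = design.shape
    # Solve design @ beta = expr.T for every gene simultaneously (beta is p x genes).
    beta, _, rank, _ = np.linalg.lstsq(design, expr.T, rcond=None)
    if rank < p:
        raise ValueError("design matrix is not full rank")
    resid = expr.T - design @ beta
    sigma2 = (resid ** 2).sum(axis=0) / (n - p)  # per-gene residual variance
    xtx_inv = np.linalg.inv(design.T @ design)
    c = np.asarray(contrast, dtype=float)
    est = c @ beta
    se = np.sqrt(sigma2 * (c @ xtx_inv @ c))
    return est, est / se


# Simulated (hypothetical) data: 4 cells x 10 samples, age affects every gene.
rng = np.random.default_rng(1)
cells = np.repeat(np.arange(4), 10)  # 0..3 = G1.Diseased, G1.Normal, G2.Diseased, G2.Normal
design = np.zeros((40, 5))
design[np.arange(40), cells] = 1.0  # cell-means indicator columns
age = rng.uniform(30, 80, 40)
design[:, 4] = age - age.mean()  # centred age covariate
n_genes = 200
expr = rng.normal(0, 0.3, (n_genes, 40)) + 0.05 * design[:, 4]
expr[0, cells == 0] += 2.0  # true interaction effect for gene 0 only

est, t = fit_gene_contrasts(expr, design, [1, -1, -1, 1, 0])
print(est[0], t[0])
```

Dropping the age column from `design` (and the trailing 0 from the contrast) leaves the age effect in the residuals, which inflates the per-gene variances and shrinks the t-statistics; that is the kind of masking described in the question.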
