Hi,
Can anybody suggest good R packages for variable selection when building linear models to analyse microarray data? The literature suggests Bayesian variable selection, random forests, etc., but I still wanted an opinion from experienced folks.
Essentially, I wish to determine differentially expressed genes (DEGs) between 2 groups in diseased samples. I also have expression values from paired normal samples. But even after thorough preprocessing of the data, there seems to be a lot of noise, and I am fairly certain it is because of the associated covariates such as age, disease stage, disease class, presence of infection, etc.
My problem is thus a multiple regression model for each gene, for each group in the diseased samples, rather than just a 2x2 factorial design (disease group x diseased/normal).
I want to be able to select the variables that have the most impact on gene expression in the diseased samples and then include them as covariates in my final model, rather than use backward/forward elimination.
Thanks!
Thank you, that was a very helpful lead. So am I right to understand that SVA would then help find the DEGs between group 1 and group 2 irrespective of all the other biological factors such as disease stage, disease type, age of patient, etc.?
What I finally want is a list of DEGs for the following contrast: (Group1.Diseased - Group1.Normal) - (Group2.Diseased - Group2.Normal), after making sure that none of the factors such as disease stage, disease type, and so on are masking the true difference. Right now I get barely a handful of genes out of 25K+ if I only use the above contrast matrix.
If your sample size is large enough, you can just include your covariates in a linear model (or GLM). Limma (or DESeq2 or edgeR for RNA-seq) allows such models, and you can then use your contrast of interest to find DEGs while "controlling" for the covariates.
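To make the idea concrete, here is a minimal sketch of what "contrast of interest while controlling for a covariate" means for a single gene. It is plain ordinary least squares in Python with NumPy, not limma itself, so there is no empirical Bayes moderation and no handling of the pairing between diseased and normal samples. All data, cell means, the age slope, and the variable names are invented for illustration. The design uses one indicator column per cell of the 2x2 layout plus age, which is roughly what a no-intercept design with an added covariate would look like in limma.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data for ONE gene: 10 samples in each cell of the 2x2 design.
cells = ["G1.D", "G1.N", "G2.D", "G2.N"]  # column order of the design matrix
n_per_cell = 10
cell_of_sample = np.repeat(np.arange(4), n_per_cell)
age = rng.uniform(30, 70, size=cell_of_sample.size)

# Assumed "true" cell means and age effect (made up for this example).
true_means = np.array([8.0, 6.0, 7.0, 6.5])
true_age_slope = 0.05
noise = rng.normal(0, 0.3, size=age.size)
expr = true_means[cell_of_sample] + true_age_slope * age + noise

# Design matrix: one indicator column per cell (no intercept) plus the covariate.
X = np.column_stack([np.eye(4)[cell_of_sample], age])

# Ordinary least squares fit of expression on the design.
beta, *_ = np.linalg.lstsq(X, expr, rcond=None)

# Contrast (G1.D - G1.N) - (G2.D - G2.N); the age coefficient gets weight 0,
# so the estimate is adjusted for age but does not test age itself.
contrast = np.array([1.0, -1.0, -1.0, 1.0, 0.0])
estimate = contrast @ beta

# Standard error and ordinary (unmoderated) t-statistic for the contrast.
resid = expr - X @ beta
df = X.shape[0] - X.shape[1]
sigma2 = resid @ resid / df
se = np.sqrt(sigma2 * (contrast @ np.linalg.inv(X.T @ X) @ contrast))
t_stat = estimate / se

print(f"contrast estimate = {estimate:.3f}, se = {se:.3f}, t = {t_stat:.2f}, df = {df}")
```

Each extra covariate costs one residual degree of freedom per gene, which is why the sample size caveat matters: with many covariates and few samples, the contrast becomes poorly estimated.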