My project works with a large dataset of RPKM values for patients with and without schizophrenia.
After some preprocessing steps, including dropping genes with many zero RPKM values and log2-transforming, I applied Non-negative Matrix Factorization (NNMF) as a dimensionality reduction technique. I am looking for statistically significant associations between the resulting groups of genes ('metagenes') and schizophrenia.
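For concreteness, the preprocessing and factorization steps look roughly like the sketch below. This is a simplified illustration rather than my exact code: the 50% zero-fraction cutoff, the pseudocount of 1 before the log2 transform, the number of metagenes, and the simulated input are all placeholder assumptions, and it uses scikit-learn's `NMF`.

```python
import numpy as np
from sklearn.decomposition import NMF


def filter_and_log(rpkm, max_zero_frac=0.5, pseudocount=1.0):
    """Drop genes with too many zero RPKM values, then log2-transform.

    rpkm: array of shape (n_samples, n_genes), non-negative.
    max_zero_frac and pseudocount are illustrative assumptions.
    """
    zero_frac = (rpkm == 0).mean(axis=0)   # fraction of zeros per gene
    keep = zero_frac <= max_zero_frac      # boolean mask of retained genes
    # log2(x + 1) stays non-negative for x >= 0, which NMF requires
    return np.log2(rpkm[:, keep] + pseudocount), keep


def metagene_scores(log_expr, n_metagenes=5, seed=0):
    """Factor log_expr ~ W @ H with non-negative W and H."""
    model = NMF(n_components=n_metagenes, init="nndsvda",
                max_iter=500, random_state=seed)
    W = model.fit_transform(log_expr)      # (n_samples, n_metagenes)
    H = model.components_                  # (n_metagenes, n_genes)
    return W, H


# Simulated stand-in for the real RPKM matrix (samples x genes)
rng = np.random.default_rng(0)
rpkm = rng.gamma(shape=2.0, scale=5.0, size=(40, 30))
rpkm[:, :5] = 0.0                          # five genes that are all zero

log_expr, keep = filter_and_log(rpkm)
W, H = metagene_scores(log_expr, n_metagenes=3)
```

Each column of `W` is the per-sample expression of one metagene, and these columns are what I test against diagnosis.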
Until now, I have been using a simple t-test, with Bonferroni correction, to test whether each metagene's expression values differ between the schizophrenia and control groups. I think the normality condition is fine because there are about 150 cases and 170 controls, so the CLT should apply to the group means. Some of the results have such low adjusted p-values that I am fairly confident I have found something interesting.
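The current test amounts to something like the following sketch. The data here are simulated, the function name is made up for illustration, and the use of Welch's unequal-variance form (`equal_var=False`) is an assumption about which t-test variant to show.

```python
import numpy as np
from scipy import stats


def bonferroni_ttests(scores, is_case):
    """Two-sample t-test per metagene, with Bonferroni adjustment.

    scores:  array of shape (n_samples, n_metagenes)
    is_case: boolean array of length n_samples (True = schizophrenia)
    """
    is_case = np.asarray(is_case, dtype=bool)
    # Welch's t-test for each column (an assumption; the pooled-variance
    # test would use equal_var=True)
    t, p = stats.ttest_ind(scores[is_case], scores[~is_case],
                           axis=0, equal_var=False)
    # Bonferroni: multiply by the number of tests and cap at 1
    p_adj = np.minimum(p * scores.shape[1], 1.0)
    return t, p, p_adj


# Simulated metagene scores: 150 cases, 170 controls, 4 metagenes
rng = np.random.default_rng(1)
is_case = np.array([True] * 150 + [False] * 170)
scores = rng.normal(size=(320, 4))
scores[is_case, 0] += 1.0                  # build in one real group difference

t, p, p_adj = bonferroni_ttests(scores, is_case)
```

Note that nothing in this test uses any covariate: it only compares the two group means for each metagene.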
However, I need to be absolutely sure that this is not down to confounding factors. There are slight demographic imbalances between the schizophrenia and non-schizophrenia groups, so I need to correct for several variables, both discrete and continuous. The full list I want to correct for is: age, sex, race, smoking status, postmortem interval, sample pH, and RNA integrity number.
Is there a statistical test, more advanced than the t-test, that will account for the impact of these confounding covariates and confirm that I really have found statistically significant associations with schizophrenia? If not, can you recommend how I could change my procedure to best guard against the confounding factors?
Want to make sure I'm reporting solid results! Thanks for any help you can give.