How to check if there is an association between clustering output (groups) and phenotypic variables (gender, disease stage)
I would like to know whether there is any association between my clustering groups and phenotypic features (e.g. gender and Braak stage). I applied spectral clustering, with the number of clusters fixed at 2, to gene expression data from 200 samples (AD and controls). Ideally, I expected all AD samples to cluster separately from the control samples. Instead, the results are mixed: clusters 1 and 2 both contain AD and control samples.

Is there an analysis that would tell me whether cluster membership (1 or 2) is associated with phenotypic features such as gender and Braak stage of AD, i.e. whether these variables have any influence on the clustering? Here too, I expected high Braak stages (advanced disease) to fall in one cluster and low Braak stages (early disease) in the other, but when I plot them the results are again mixed. I would like to know whether there is a statistical test that shows the association between these variables and the cluster groups.

If anybody knows of such a statistical analysis in R, please point me towards it. Thank you!

If I understand correctly, you have clustered your samples into 2 clusters and want to know whether some categorical variables (such as gender) are over-represented in one cluster compared with the other? Why not assign each sample to its cluster and run a Chi-squared test between cluster membership and each of those variables to check for over-representation?
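A minimal sketch of that suggestion, written in Python with only the standard library. In R the equivalent is roughly `chisq.test(table(cluster, gender))`, with `fisher.test` as the usual alternative when expected counts are small. The sketch cross-tabulates per-sample cluster labels against one phenotype and runs a Pearson chi-squared test of independence. The function names and the toy labels (including the Braak groupings "0-II", "III-IV", "V-VI") are hypothetical, not real data. It applies no Yates continuity correction, so for 2x2 tables it should correspond to `chisq.test(..., correct = FALSE)` rather than R's default.

```python
"""Chi-squared test of independence between cluster labels and a phenotype.

Stdlib-only sketch; all labels used below are hypothetical toy data.
"""
import math
from collections import Counter


def contingency_table(clusters, phenotype):
    """Cross-tabulate two equal-length label lists into (rows, cols, counts)."""
    if len(clusters) != len(phenotype):
        raise ValueError("label lists must have the same length")
    rows = sorted(set(clusters))
    cols = sorted(set(phenotype))
    pairs = Counter(zip(clusters, phenotype))
    counts = [[pairs[(r, c)] for c in cols] for r in rows]
    return rows, cols, counts


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) of a chi-squared distribution.

    Sums the series for the regularized lower incomplete gamma P(a, z)
    with a = df / 2 and z = x / 2, then returns 1 - P.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > total * 1e-15 and n < 100000:
        term *= z / (a + n)
        total += term
        n += 1
    log_p = a * math.log(z) - z - math.lgamma(a) + math.log(total)
    return max(0.0, 1.0 - math.exp(log_p))


def chi2_independence(counts):
    """Pearson chi-squared test (no continuity correction).

    Returns (statistic, degrees_of_freedom, p_value, expected_counts).
    """
    row_sums = [sum(row) for row in counts]
    col_sums = [sum(col) for col in zip(*counts)]
    n = sum(row_sums)
    # Expected count under independence: row total * column total / grand total.
    expected = [[r * c / n for c in col_sums] for r in row_sums]
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(counts, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(counts) - 1) * (len(counts[0]) - 1)
    return stat, df, chi2_sf(stat, df), expected


if __name__ == "__main__":
    # Hypothetical toy labels for illustration only (not real samples).
    clusters = [1] * 10 + [2] * 10
    gender = ["F"] * 7 + ["M"] * 3 + ["F"] * 3 + ["M"] * 7
    braak = (["0-II"] * 5 + ["III-IV"] * 3 + ["V-VI"] * 2
             + ["0-II"] * 2 + ["III-IV"] * 3 + ["V-VI"] * 5)
    for name, labels in [("gender", gender), ("Braak stage", braak)]:
        rows, cols, counts = contingency_table(clusters, labels)
        stat, df, p, expected = chi2_independence(counts)
        print(f"{name}: cols={cols} table={counts} "
              f"chi2={stat:.3f} df={df} p={p:.4f}")
        if min(min(row) for row in expected) < 5:
            print("  some expected counts < 5; Fisher's exact test may be safer")
```

Each phenotype gets its own table and test, so with 200 samples you would pass the real cluster assignments and the matching phenotype column in the same sample order. Braak stage is ordinal and the chi-squared test ignores that ordering, so a rank-based comparison of stages between the two clusters (for example R's `wilcox.test`) may also be worth considering.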

