Question: How do I build a centroid-based predictor for clustering gene expression data (microarray)?
I'm trying to assign breast cancer intrinsic subtypes to a cohort of tumor samples using a microarray expression dataset. I've already classified the samples with the PAM50 centroid-based predictor, but I'd also like to assign the claudin-low subtype, which PAM50 does not include. I have a list of ~800 genes whose expression can predict this subtype, but I'm not sure how to build a centroid-based predictor from this gene list.
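For context, here is a minimal sketch of what a centroid-based predictor generally looks like, in case it helps frame answers. It rests on assumptions that are not part of my actual setup: it assumes a training set of samples with known labels is available (a gene list alone is not enough to compute centroids), that expression values are on a comparable scale across training and test data (for example, median-centered per gene), and that Euclidean distance is an acceptable similarity measure (correlation to the centroid is another common choice). The dictionary layout, the function names `build_centroids` and `predict`, and the gene and sample IDs are all hypothetical placeholders.

```python
from math import dist
from statistics import mean


def build_centroids(train_expr, train_labels, genes):
    """Average each predictor gene over the training samples of each class.

    train_expr:   {sample_id: {gene: expression value}}  (hypothetical layout)
    train_labels: {sample_id: class label}, e.g. "claudin-low" or "other"
    genes:        predictor gene list; genes missing from any training
                  sample are dropped so every centroid has the same length
    """
    shared = [g for g in genes if all(g in train_expr[s] for s in train_labels)]
    centroids = {}
    for label in set(train_labels.values()):
        members = [s for s, lab in train_labels.items() if lab == label]
        centroids[label] = [mean(train_expr[s][g] for s in members) for g in shared]
    return shared, centroids


def predict(sample, shared, centroids):
    """Return the label whose centroid is nearest (Euclidean) to the sample."""
    profile = [sample[g] for g in shared]
    return min(centroids, key=lambda label: dist(profile, centroids[label]))


# Toy data with placeholder gene names and made-up values, for illustration only.
train_expr = {
    "t1": {"GENE_A": 2.0, "GENE_B": -1.0},
    "t2": {"GENE_A": 1.0, "GENE_B": -2.0},
    "t3": {"GENE_A": -1.5, "GENE_B": 1.0},
    "t4": {"GENE_A": -0.5, "GENE_B": 2.0},
}
train_labels = {"t1": "claudin-low", "t2": "claudin-low", "t3": "other", "t4": "other"}

# GENE_C is in the gene list but absent from the platform, so it is dropped.
shared, centroids = build_centroids(
    train_expr, train_labels, ["GENE_A", "GENE_B", "GENE_C"]
)
call = predict({"GENE_A": 1.2, "GENE_B": -0.8}, shared, centroids)
print(call)
```

With real data, the same two steps would apply: restrict to the predictor genes present on the array, average them per class in the labeled training set, then assign each tumor to the closest centroid. What I'm missing is where the labeled training samples (or precomputed centroids) for claudin-low should come from.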

