I read a paper on cancer subtype classification of glioblastoma. After performing unsupervised clustering on the gene expression data, the authors select only the samples (patients) with a positive silhouette score for the supervised classification. In my opinion, this is not a good approach, and may even be wrong: the classifier might be heavily overfitted to the samples (patients) that were selected.
What is your opinion? Is it correct to choose training samples based on how well they fit the unsupervised clustering?
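For reference, here is a minimal sketch of the selection step as I understand it. It assumes Euclidean distance and cluster labels already produced by some clustering method. The function names are my own and hypothetical, not from the paper. For each sample it computes the silhouette s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is the mean distance to the other members of its own cluster and b(i) is the smallest mean distance to any other cluster. It then keeps only the samples with s(i) > 0.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def silhouette_samples(X, labels):
    """Per-sample silhouette scores (assumes Euclidean distance)."""
    n = len(X)
    clusters = sorted(set(labels))
    scores = []
    for i in range(n):
        own = labels[i]
        same = [j for j in range(n) if labels[j] == own and j != i]
        if not same:
            # Singleton cluster: score defined as 0 by convention.
            scores.append(0.0)
            continue
        # a(i): mean distance to the other members of the sample's own cluster.
        a = sum(euclidean(X[i], X[j]) for j in same) / len(same)
        # b(i): smallest mean distance to the members of any other cluster.
        b = min(
            sum(euclidean(X[i], X[j]) for j in range(n) if labels[j] == c)
            / labels.count(c)
            for c in clusters
            if c != own
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


def select_core_samples(X, labels):
    """Hypothetical filter: indices of samples with a positive silhouette."""
    return [i for i, s in enumerate(silhouette_samples(X, labels)) if s > 0]


# Toy data: sample 4 lies next to cluster 1 but was assigned to cluster 0.
X = [[0, 0], [0, 1], [10, 10], [10, 11], [9, 10]]
labels = [0, 0, 1, 1, 0]
print(select_core_samples(X, labels))
```

In this toy case sample 4 gets a negative silhouette and is dropped. A classifier trained and evaluated only on the retained samples never sees such ambiguous patients, which is the source of my concern that its accuracy will look better than it would on new, unfiltered patients.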