I want to cluster genes based on an expression matrix and annotation data. With unsupervised learning algorithms (hierarchical clustering, k-means, ...), the clusters are based only on the correlation of gene expression. My idea is to supervise the learning with annotation data, which could lead to more meaningful clusters (depending on the annotation input). Random Forest has been used on expression data, but I have only found examples with two classes, both of which were included in the training set. I also want to find new clusters that did not exist in the training set. So the algorithm would be trained with an expression matrix (features), some annotations (features), and many clusters of different sizes (class labels). On the test data, the algorithm should decide which genes belong together in a cluster. It should find clusters that already existed in the training set, but also new ones. Is Random Forest the right algorithm for that?
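To make the question concrete, here is a minimal sketch of one way the idea might be set up. It rests on an assumption of mine: instead of predicting a cluster label per gene, the Random Forest is trained on gene pairs to predict "same cluster or not", so that unseen clusters can still appear on the test data. The data is synthetic, the helper names (`make_genes`, `pair_table`) are hypothetical, and the pair features and the 0.5 cut-off are arbitrary choices, not a recommendation.

```python
from itertools import combinations

import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)


def make_genes(n_clusters, genes_per_cluster, n_samples=12):
    """Synthetic stand-in for real data (assumption): noisy copies of one profile per cluster."""
    centers = rng.normal(scale=3.0, size=(n_clusters, n_samples))
    labels = np.repeat(np.arange(n_clusters), genes_per_cluster)
    expr = centers[labels] + rng.normal(scale=0.5, size=(labels.size, n_samples))
    # Toy annotation: a category loosely tied to the cluster, with 20% of entries redrawn.
    annot = labels % 3
    noisy = rng.random(labels.size) < 0.2
    annot = np.where(noisy, rng.integers(0, 3, labels.size), annot)
    return expr, annot, labels


def pair_table(expr, annot):
    """One row per gene pair: correlation, mean absolute difference, shared annotation."""
    pairs = list(combinations(range(len(expr)), 2))
    corr = np.corrcoef(expr)
    X = np.array([
        [corr[i, j], np.mean(np.abs(expr[i] - expr[j])), float(annot[i] == annot[j])]
        for i, j in pairs
    ])
    return pairs, X


# Training set: genes with known cluster membership.
expr_tr, annot_tr, labels_tr = make_genes(5, 8)
pairs_tr, X_tr = pair_table(expr_tr, annot_tr)
y_tr = np.array([int(labels_tr[i] == labels_tr[j]) for i, j in pairs_tr])

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

# Test set: different genes, drawn from clusters the model has never seen.
expr_te, annot_te, _ = make_genes(4, 6)
_, X_te = pair_table(expr_te, annot_te)
p_same = clf.predict_proba(X_te)[:, 1]  # probability that a pair shares a cluster

# Treat 1 - p_same as a condensed distance vector and cut the tree at 0.5.
Z = linkage(1.0 - p_same, method="average")
predicted = fcluster(Z, t=0.5, criterion="distance")
print(predicted)
```

In this setup the forest never sees cluster identities, only whether two genes belong together, so the number of clusters on the test data is decided by the tree cut rather than fixed during training.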