Hello, I have identified two sets of genes in a previous analysis. Let's call these sets "A" and "B". A contains 10 genes ("g1", "g2", ..., "g10") and B contains 14 genes ("g11", "g12", ..., "g24"). In each sample (transcriptome), I want to compare the expression distributions of these two sets, A vs. B. These gene sets are predictive of survival: I know that when A is strongly expressed relative to B, the patient has poor survival. I thought of using the Kolmogorov-Smirnov test (ks.test) to compare the distributions of A vs. B. It works very well: all the patients whose p-value is significant show different survival. Do you think the KS test is statistically appropriate here? Would you recommend other methods to classify each individual patient based on these two gene sets? Any other suggestions are more than welcome. Thank you.
What you're doing is not entirely clear. Are you using the KS test to assess whether the 10 values of set A and the 14 values of set B come from the same distribution? If so, I think the KS test is inappropriate here because the values in sets A and B are not mutually independent (they are genes measured in the same sample). In this case, a permutation test would seem more appropriate.
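To make the permutation idea concrete, here is a minimal sketch of one possible reading of it: within a single sample, shuffle the A/B labels across the 24 expression values and ask how often a random split gives a mean difference at least as large as the observed one. The function name `permutation_test`, the choice of mean difference as the statistic, and the expression values are all assumptions for illustration, not something from the question. Shuffling labels also assumes the 24 values are exchangeable under the null hypothesis, so treat this as a starting point rather than a definitive answer.

```python
import random

def permutation_test(a_values, b_values, n_perm=10000, seed=0):
    """Two-sided permutation test on mean(A) - mean(B) within one sample."""
    rng = random.Random(seed)
    n_a = len(a_values)
    n_b = len(b_values)
    observed = sum(a_values) / n_a - sum(b_values) / n_b
    pooled = list(a_values) + list(b_values)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random reassignment of the A/B labels
        diff = sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b
        if abs(diff) >= abs(observed):
            hits += 1
    # Add-one correction so the estimated p-value is never exactly 0.
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical log-expression values for one patient (made up for this sketch).
sample_a = [8.1, 7.9, 8.4, 8.0, 7.7, 8.3, 8.2, 7.8, 8.5, 8.0]   # g1..g10
sample_b = [5.1, 4.8, 5.3, 5.0, 4.9, 5.2, 4.7, 5.4, 5.0, 5.1,
            4.6, 5.2, 4.9, 5.0]                                  # g11..g24

diff, p_value = permutation_test(sample_a, sample_b)
print(f"mean(A) - mean(B) = {diff:.2f}, permutation p-value = {p_value:.4f}")
```

You would run this once per patient. A different statistic (for example the KS statistic itself) can be swapped in for the mean difference without changing the shuffling logic.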
However, if the goal is to classify the samples/patients, you could try various machine learning approaches using the vectors of 24 gene values as input data. If you have training data (i.e. vectors with ground-truth labels), then build a classifier. If you don't have training data, or do not want to use it, then try clustering. Which particular method/algorithm to use is up to you, but the choice could depend on details you haven't given.
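As an illustration of the "build a classifier" route, here is a minimal sketch of a nearest-centroid classifier on the 24-value vectors. It is just one simple option among many, and everything in it is hypothetical: the function names `fit_centroids` and `predict`, the "poor"/"good" labels, and the toy expression values are assumptions for the example, not data from the question.

```python
from math import dist

def fit_centroids(vectors, labels):
    """Return {label: centroid}, where a centroid is the per-gene mean vector."""
    groups = {}
    for vec, lab in zip(vectors, labels):
        groups.setdefault(lab, []).append(vec)
    return {
        lab: [sum(col) / len(col) for col in zip(*vecs)]
        for lab, vecs in groups.items()
    }

def predict(centroids, vec):
    """Assign the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda lab: dist(centroids[lab], vec))

# Toy training set: each vector holds 10 values for set A, then 14 for set B.
train_vectors = [
    [8.0] * 10 + [3.0] * 14,   # A high relative to B
    [7.5] * 10 + [3.5] * 14,
    [4.0] * 10 + [6.0] * 14,   # A low relative to B
    [4.5] * 10 + [6.5] * 14,
]
train_labels = ["poor", "poor", "good", "good"]

centroids = fit_centroids(train_vectors, train_labels)
new_patient = [7.8] * 10 + [3.2] * 14
print(predict(centroids, new_patient))
```

With real data you would want to evaluate this on held-out patients (for example by cross-validation) before trusting the labels it assigns.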