Question: What is the best test to compare the expression of two different sets of genes in the same transcriptome?
Asked 10 months ago by Pas

Hello, I have two different sets of genes that I identified in a previous analysis. Let's call these sets "A" and "B". A contains 10 genes ("g1", "g2", ..., "g10") and B contains 14 genes ("g11", "g12", ..., "g24"). In each sample (transcriptome), I want to compare the expression distributions of these two sets, A vs. B. These gene sets are predictive of survival: I know that when A is strongly expressed relative to B, the patient has poor survival. I thought of using the Kolmogorov-Smirnov test (ks.test) to compare the distributions of A and B. It works very well: all the patients whose p-value is significant show a different survival. Do you think using ks.test is statistically correct? Would you recommend other methods to classify each individual patient based on these two sets of genes? Any other suggestions are more than welcome. Thank you

Tags: rna-seq, R, gene
Answer written 9 months ago by Jean-Karim Heriche (EMBL Heidelberg, Germany):

What you're doing is not entirely clear. Do you use the KS test to assess whether the 10 values of set A and the 14 values of set B come from the same distribution? If so, I think the KS test is inappropriate here because set A and set B are not mutually independent (they are genes measured in the same sample). In this case, a permutation test would seem more appropriate.
However, if the goal is to classify the samples/patients, you could try various machine learning approaches using the vectors of 24 gene values as input data. If you have training data (i.e. vectors with ground truth label), then build a classifier. If you don't have or do not want to use training data then try clustering. Which particular method/algorithm to use is up to you but could depend on details you haven't given.
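
As a rough illustration of the "build a classifier" route when labelled samples are available, here is a minimal sketch in plain Python. Several things in it are assumptions made for the example and not part of the answer above: logistic regression is picked as one simple classifier among many, each sample is reduced to a single score (mean of set A minus mean of set B) to keep the model small, and the scores and labels are made-up toy values. The names `set_score`, `fit_logistic` and `predict_proba` are hypothetical helpers.

```python
import math

# Assumed layout: each sample is a list of 24 expression values,
# positions 0-9 are set A (g1..g10) and positions 10-23 are set B (g11..g24).
IDX_A = range(0, 10)
IDX_B = range(10, 24)

def set_score(sample, idx_a=IDX_A, idx_b=IDX_B):
    """Mean expression of set A minus mean expression of set B (an assumed summary)."""
    mean_a = sum(sample[i] for i in idx_a) / len(idx_a)
    mean_b = sum(sample[i] for i in idx_b) / len(idx_b)
    return mean_a - mean_b

def predict_proba(x, w, b):
    """Probability of the positive class (here: poor prognosis) for score x."""
    return 1.0 / (1.0 + math.exp(-(w * x + b)))

def fit_logistic(xs, ys, lr=0.1, n_iter=2000):
    """Fit a one-feature logistic regression by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(n_iter):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = predict_proba(x, w, b) - y  # gradient of the log-loss
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

# Toy, made-up training data: one A-minus-B score per patient and a
# ground-truth label (1 = poor prognosis, 0 = good prognosis).
scores = [-2.0, -1.5, -1.0, -0.5, 0.3, -0.2, 0.5, 1.0, 1.5, 2.0]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

w, b = fit_logistic(scores, labels)
print("P(poor prognosis | score = 1.2) =", round(predict_proba(1.2, w, b), 3))
```

With real data one would compute `set_score` for every labelled sample and evaluate the fitted model with cross-validation; a library implementation would normally replace the hand-written fit.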


Thank you Jean-Karim. I am aware of the assumptions underlying the KS test, which is why I posted here. The data set is small, so I can't use any ML approach. Do you know an alternative to the KS test that doesn't assume independence? In other words: if you have only one sample in which you want to compare two sets of genes, which test do you suggest?

Reply written 9 months ago by Pas

You can still use ML approaches when the data set is small. It depends on how small is small. For example, if you want to associate a probability with each of two classes (e.g. good/bad prognosis), you could try logistic regression. If you still want to do a statistical test for some difference between set A and set B, go for a permutation test. You could keep the KS statistic if it works well for you, but compute the p-value using permutations.
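
The last suggestion (keep the KS statistic, but get the p-value from permutations) could look roughly like the sketch below for a single sample. It is written in plain Python rather than R to stay self-contained. The function names `ks_statistic` and `permutation_ks` and the expression values are made up for illustration, and the procedure assumes that the set labels are exchangeable among the 24 genes under the null hypothesis.

```python
import random

def ks_statistic(a, b):
    """Two-sample KS statistic: largest absolute gap between the two empirical CDFs."""
    n_a, n_b = len(a), len(b)
    d = 0.0
    for x in list(a) + list(b):
        cdf_a = sum(v <= x for v in a) / n_a
        cdf_b = sum(v <= x for v in b) / n_b
        d = max(d, abs(cdf_a - cdf_b))
    return d

def permutation_ks(a, b, n_perm=2000, seed=0):
    """Permutation p-value for the KS statistic.

    The set labels are shuffled across the pooled values; the p-value is the
    fraction of shuffles whose statistic is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = ks_statistic(a, b)
    pooled = list(a) + list(b)
    n_a = len(a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        # Small tolerance guards against floating-point ties.
        if ks_statistic(pooled[:n_a], pooled[n_a:]) >= observed - 1e-12:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)

# Made-up expression values for one sample: 10 genes in set A, 14 genes in set B.
set_a = [7.9, 8.4, 6.8, 9.1, 7.2, 8.8, 7.5, 9.4, 8.1, 6.9]
set_b = [5.1, 6.3, 4.8, 7.0, 5.9, 6.6, 5.4, 4.2, 6.1, 7.3, 5.7, 4.9, 6.8, 5.2]

d_obs, p_val = permutation_ks(set_a, set_b)
print("KS statistic:", round(d_obs, 3), "permutation p-value:", round(p_val, 4))
```

The number of permutations bounds the smallest p-value that can be reported (1 / (n_perm + 1) here), so `n_perm` should be raised if small p-values need to be resolved, for instance before correcting for testing many patients.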

Reply written 9 months ago by Jean-Karim Heriche

Thank you Jean Karim!

Reply written 9 months ago by Pas