What is the best test to compare the expression of two different sets of genes in the same transcriptome?
6.6 years ago
Pas ▴ 30

Hello, I have identified two different sets of genes in a previous analysis. Let's call these sets "A" and "B". A contains 10 genes ("g1", "g2", ..., "g10") and B contains 14 genes ("g11", "g12", ..., "g24"). In each sample (transcriptome), I want to compare the expression distributions of these two sets, A vs B. These sets of genes are predictive of survival: I know that when A is strongly expressed with respect to B, the patient has poor survival. I thought of using the Kolmogorov-Smirnov test (ks.test) to compare the distributions of A vs B. It works very well: all the patients whose p-value is significant show a different survival. Do you think that ks.test is statistically correct here? Do you recommend other methods to classify each single patient based on these two sets of genes? Any other suggestion is more than welcome. Thank you

RNA-Seq R gene • 2.1k views
6.6 years ago

What you're doing is not entirely clear. Do you use the KS test to assess whether the 10 values of set A and the 14 values of set B come from the same distribution? If so, I think the KS test is inappropriate here because set A and set B are not mutually independent (they are genes measured in the same sample). In this case, a permutation test would seem more appropriate.
However, if the goal is to classify the samples/patients, you could try various machine learning approaches using the vectors of 24 gene values as input data. If you have training data (i.e. vectors with ground truth labels), then build a classifier. If you don't have training data or do not want to use it, then try clustering. Which particular method/algorithm to use is up to you, but it could depend on details you haven't given.


Thank you Jean Karim. I am aware of the assumptions underlying the KS test; this is why I posted here. The data set is small, so I can't use any ML approach. Do you know an alternative to the KS test that doesn't assume independence? In other words: if you have only 1 sample in which you want to compare 2 sets of genes, which test do you suggest?


You can still use ML approaches when the data set is small. It depends on how small is small. For example, if you want to associate a probability with two classes (e.g. good/bad prognosis), you could try logistic regression. If you still want to do a statistical test for some difference between set A and set B, go for a permutation test. You could keep the KS statistic if it works well for you, but compute the p-value using permutations.
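
For what it's worth, here is a minimal sketch of that last suggestion (KS statistic, p-value from permutations) for a single sample. It is written in plain Python rather than R, and the function names `ks_statistic` and `permutation_pvalue` as well as the example values are hypothetical. It also assumes that, under the null hypothesis, the 24 values can be treated as exchangeable when the gene labels are shuffled, which may or may not hold for your data.

```python
import random
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample KS statistic: largest absolute gap between the two empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    n_a, n_b = len(a_sorted), len(b_sorted)
    d = 0.0
    # The gap between the empirical CDFs can only change at observed values.
    for x in sorted(set(a_sorted) | set(b_sorted)):
        cdf_a = bisect_right(a_sorted, x) / n_a
        cdf_b = bisect_right(b_sorted, x) / n_b
        d = max(d, abs(cdf_a - cdf_b))
    return d


def permutation_pvalue(a, b, n_perm=10000, seed=0):
    """Permutation p-value for the KS statistic, shuffling the A/B labels."""
    rng = random.Random(seed)
    observed = ks_statistic(a, b)
    pooled = list(a) + list(b)
    n_a = len(a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        # Small tolerance so float rounding does not hide ties with the observed value.
        if ks_statistic(pooled[:n_a], pooled[n_a:]) >= observed - 1e-12:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical expression values for one sample (set A: 10 genes, set B: 14 genes).
set_a = [8.1, 7.9, 9.3, 8.8, 7.2, 9.0, 8.5, 7.7, 9.6, 8.2]
set_b = [5.4, 6.1, 7.0, 4.9, 6.6, 5.8, 7.3, 6.0, 5.1, 6.9, 7.5, 5.6, 6.3, 4.7]

d, p = permutation_pvalue(set_a, set_b)
print(f"KS statistic = {d:.3f}, permutation p-value = {p:.4f}")
```

With 10 and 14 genes, the smallest p-value you can report is limited by `n_perm`, so the number of permutations is worth setting deliberately.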


Thank you Jean Karim!

4.5 years ago
ritarebollo ▴ 70

Hello, I am hijacking this post (sorry!). I am also comparing two gene sets in the same transcriptome. But in my case, I have 150 genes in set A and all the other genes in set B (10000 genes or so...). I would like to see if genes from set A are highly expressed compared to the rest of the genes. I'm not sure I can actually do this... I was thinking of drawing random lists from set B with the same size as set A (so roughly 150 genes). I also thought of removing all genes that have 0 counts... Does anyone have an idea on this? Is there any tool that I somehow missed? Thank you very much! R
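
The random-list idea above could be sketched roughly as follows, in plain Python. Everything here is an assumption for illustration: the function name `random_set_pvalue`, the choice of the median as the summary statistic, and the toy data. It draws gene sets of the same size as set A from the remaining genes and asks how often a random set has a median at least as high as set A's.

```python
import random
import statistics


def random_set_pvalue(expr, set_a, n_draws=1000, seed=0):
    """Empirical one-sided p-value: is the median of set A high versus random sets?

    expr  : dict mapping gene name -> expression value in one sample
    set_a : iterable of gene names (all must be keys of expr)
    """
    rng = random.Random(seed)
    genes_a = set(set_a)
    # Set B is every gene that is not in set A.
    background = [g for g in expr if g not in genes_a]
    observed = statistics.median([expr[g] for g in genes_a])
    hits = 0
    for _ in range(n_draws):
        # Random gene list of the same size as set A, drawn without replacement.
        draw = rng.sample(background, len(genes_a))
        if statistics.median([expr[g] for g in draw]) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_draws + 1)


# Toy data (hypothetical): 2000 genes, the first 150 shifted upwards.
toy_rng = random.Random(42)
expr = {f"gene{i}": toy_rng.gauss(5.0, 2.0) for i in range(2000)}
set_a = [f"gene{i}" for i in range(150)]
for g in set_a:
    expr[g] += 1.0

median_a, p = random_set_pvalue(expr, set_a)
print(f"median of set A = {median_a:.2f}, empirical p-value = {p:.4f}")
```

If genes with 0 counts are to be removed, filtering `expr` before calling the function applies the same rule to set A and to the background.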


If the answer to the original question doesn't apply to your case, you should create another question. Anyway, you need to give more information on the data you have. If you have expression values, one of the things you can start with is to look at the histograms of values for A and B. An obvious difference will be more convincing than a statistical test.
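
A quick way to eyeball the two histograms without any plotting library, as a rough sketch in plain Python: the helper names `shared_bin_counts` and `print_histograms`, the log2(x + 1) transform and the example values are all assumptions. Both sets are binned on the same edges and shown as fractions, since the sets can differ a lot in size.

```python
import math


def shared_bin_counts(values_a, values_b, n_bins=10):
    """Count values of A and B in the same equal-width bins."""
    lo = min(min(values_a), min(values_b))
    hi = max(max(values_a), max(values_b))
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 if all values are equal

    def counts(values):
        out = [0] * n_bins
        for v in values:
            # The maximum value falls into the last bin instead of a new one.
            idx = min(int((v - lo) / width), n_bins - 1)
            out[idx] += 1
        return out

    edges = [lo + i * width for i in range(n_bins + 1)]
    return edges, counts(values_a), counts(values_b)


def print_histograms(values_a, values_b, n_bins=10, bar_width=30):
    """Print side-by-side text histograms, scaled to the fraction of each set."""
    edges, counts_a, counts_b = shared_bin_counts(values_a, values_b, n_bins)
    for i in range(n_bins):
        bar_a = "#" * round(counts_a[i] / len(values_a) * bar_width)
        bar_b = "#" * round(counts_b[i] / len(values_b) * bar_width)
        print(f"{edges[i]:7.2f} to {edges[i + 1]:7.2f} | A {bar_a.ljust(bar_width)} | B {bar_b}")


# Hypothetical raw counts, log2(x + 1) transformed before binning (an assumption).
counts_set_a = [120, 340, 560, 800, 95, 410, 1500, 730]
counts_set_b = [0, 3, 12, 45, 7, 150, 22, 9, 60, 1, 300, 18]
print_histograms(
    [math.log2(c + 1) for c in counts_set_a],
    [math.log2(c + 1) for c in counts_set_b],
    n_bins=6,
)
```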
