Question: Working With Catmap
Assa Yeroslaviz wrote:

Hi everybody,

Has anyone worked with the Catmap algorithm? It is an algorithm for significance analysis of microarray data at the category level.

I am working with it and would like some help understanding the results. According to the paper, you get two output files. As far as I understand, what I am looking for is in the main result file. The p-value says how significant the accumulation of genes from a specific category is.

Q1. What about the ROC area? What does this value tell me? Does it have anything to do with ROC curves? If so, what?

Q2. What is the meaning of the percentiles? What do I need them for?

Q3. What information do I get from the companion file? All I have there is a list of the genes found in each category. What does it tell me?

I would appreciate any help I can get, as I find the paper insufficient for understanding the results.

Thanks A.

gene enrichment • written 7.4 years ago by Assa Yeroslaviz • modified 7.4 years ago by Neilfws
Neilfws wrote:

I haven't used Catmap, but I can offer some suggestions.

First, make sure you have a good understanding of p-values and of any corrections for multiple testing that the algorithm performs. A p-value isn't really a measure of "how significant" something is. It is the probability of seeing a result at least as extreme as the one observed if there were no real effect (that is, if the null hypothesis were true).
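To make the "no real effect" idea concrete, here is a minimal Python sketch of a permutation-style p-value for one category. It assumes genes are ranked 1..N with 1 at the top of the list, and that a small rank sum means the category is concentrated near the top. The function `permutation_p_value` and its parameters are hypothetical, and this is not Catmap's own code: it just counts how often a randomly drawn gene set of the same size looks at least as extreme as the real category.

```python
import random


def permutation_p_value(category_ranks, n_genes, n_draws=10_000, seed=0):
    """Estimate how often a random gene set looks as good as the category.

    Hypothetical illustration, not Catmap's implementation: genes are ranked
    1..n_genes (1 = top of the list), and the null hypothesis is that the
    category's genes are a random subset of the ranked list.
    """
    rng = random.Random(seed)
    k = len(category_ranks)
    observed = sum(category_ranks)  # small rank sum = genes near the top
    all_ranks = range(1, n_genes + 1)

    # Count random gene sets of the same size whose rank sum is at least
    # as extreme (as small) as the observed one.
    hits = sum(
        1 for _ in range(n_draws)
        if sum(rng.sample(all_ranks, k)) <= observed
    )
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_draws + 1)


if __name__ == "__main__":
    # A category whose genes sit near the top of a 1000-gene list...
    print(permutation_p_value([1, 2, 3, 5, 8], 1000))
    # ...versus one whose genes sit in the middle.
    print(permutation_p_value([400, 500, 600], 1000))
```

The first category gets a very small p-value because random gene sets almost never reach such a low rank sum; the second gets a value around 0.5 because its rank sum is about what chance alone would produce.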

Q1. Yes, ROC area has something to do with ROC curves: specifically, it is the area under the ROC curve (AUC). It tells you how well a score separates the two classes of observations. The AUC is mathematically related to the Wilcoxon rank-sum statistic, as the authors mention in their paper. Think of it like this: if you randomly selected one instance from each of the two classes, the AUC is the probability that the classifier ranks the positive instance above the negative one. So an AUC of 0.5 is no better than random, values closer to 1 are better, and values below 0.5 mean the ordering is reversed.
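The link between the AUC and the Wilcoxon rank sum can be checked numerically. The sketch below assumes the two classes are the genes inside and outside one category, ranks run 1..N with 1 at the top, and there are no ties; whether Catmap orients its ROC area exactly this way is an assumption. Both functions are hypothetical helpers that compute the probability that a random category gene is ranked above a random non-category gene: one by counting pairs, the other from the rank sum alone.

```python
def auc_pairwise(category_ranks, n_genes):
    """P(random category gene is ranked above a random non-category gene).

    Assumes ranks 1..n_genes with 1 = top of the list and no ties.
    """
    in_category = set(category_ranks)
    others = [r for r in range(1, n_genes + 1) if r not in in_category]
    # A "win" is a (category, other) pair where the category gene ranks higher.
    wins = sum(1 for c in category_ranks for o in others if c < o)
    return wins / (len(category_ranks) * len(others))


def auc_from_rank_sum(category_ranks, n_genes):
    """Same value computed from the Wilcoxon rank sum, without pairing."""
    k = len(category_ranks)
    rank_sum = sum(category_ranks)
    # Mann-Whitney U: the number of (category, other) pairs in which the
    # other gene is ranked above the category gene ("losses").
    losses = rank_sum - k * (k + 1) / 2
    return 1 - losses / (k * (n_genes - k))


if __name__ == "__main__":
    ranks = [1, 2, 3, 5, 8]
    print(auc_pairwise(ranks, 10), auc_from_rank_sum(ranks, 10))
```

For category ranks [1, 2, 3, 5, 8] among 10 genes, both give 0.84 (21 of the 25 pairs are wins), up to floating-point rounding. The rank-sum version avoids enumerating every pair, which matters for genome-scale lists.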

Q2. The paper mentions the 25th, 50th, and 75th percentiles of the ranks. Percentiles just tell you something about the distribution of values: 25% of observations have a value less than or equal to the 25th percentile. Likewise, 75% of observations have a value less than or equal to the 75th percentile. You might be more familiar with the terms first quartile, median and third quartile for the 25th, 50th and 75th percentiles.

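Basically, percentiles tell you whether a value is particularly high, low or somewhere in the middle of the distribution. For a category, the percentiles of its genes' ranks show whether those genes tend to sit near the top, the bottom or the middle of the ranked list.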
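As an illustration, the quartiles of a category's ranks can be computed with Python's standard `statistics` module. The helper name `rank_quartiles` is hypothetical, and the interpolation method ("inclusive") is an assumption; the paper may use a different percentile convention, which would shift the values slightly for small categories.

```python
import statistics


def rank_quartiles(category_ranks):
    """25th, 50th and 75th percentiles of one category's gene ranks.

    Uses linear interpolation between ranks (method="inclusive"); this
    convention is an assumption and may differ from Catmap's. Older Python
    versions require at least two ranks.
    """
    q1, median, q3 = statistics.quantiles(category_ranks, n=4, method="inclusive")
    return q1, median, q3


if __name__ == "__main__":
    print(rank_quartiles([1, 2, 3, 5, 8]))
```

With these example ranks it returns (2.0, 3.0, 5.0). In a list of thousands of genes, quartiles that small would mean the category's genes cluster at the top; for a category with no preference, the median rank would be expected near the middle of the list.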

Q3. According to the paper, there's more in the companion file than a "list of the genes found in each category": it's supposed to contain all categories, genes and their ranks. It may not be useful to you at all, depending on what you intend to do with the results.

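In general, I think you'll understand the paper better if you read up on (or refresh your knowledge of) the statistical methods mentioned here. Note that the rank-based ones (the Wilcoxon rank-sum statistic, the ROC area and the rank percentiles) are non-parametric and do not assume normally distributed data, so a refresher on non-parametric statistics is a good starting point.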