Question: Information Content (IC) of GO terms
3.8 years ago, int11ap1 wrote:

Dear all,

I have downloaded all GO terms of UniProt genes (file) and would like to calculate the Information Content (IC) of each term. As far as I know, it is calculated as -log(frequency of the GO term / total number of GO term occurrences). However, I think I must also take all children of a term into account, right?

So, if I have a list of 1,000,000 GO term occurrences in total and want to calculate the IC of term X (say it is seen 3 times among the 1,000,000) which has a single child Y (seen once), is it correct to calculate the IC as -log(3/1000000 + 1/1000000)?

Is there a publicly available file with the ICs of all GO terms for UniProt?

Thank you very much.

go uniprot ic • 1.7k views
3.8 years ago, Jean-Karim Heriche (EMBL Heidelberg, Germany) wrote:

The information content of a term is -log(p(term)), where p(term) is the probability of seeing the term in the data. For GO, p(term) is usually taken as the number of genes annotated with the term or any of its descendants (children, grandchildren, and so on), divided by the number of genes in the data set under consideration. A much less common alternative is to derive the information content from the number of children of the term. It seems to me you're referring to this second definition of information content. See this paper for how to compute it (I don't think you're doing it correctly).
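To make the first, annotation-based definition concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the function name `information_content`, the toy term names (`ROOT`, `X`, `Y`) and the toy genes are all made up, the ontology is given as a plain child-to-parents mapping rather than parsed from a real GO file, and the natural logarithm is used (another base only rescales the values). Each gene is counted once per term, after propagating its direct annotations to all ancestor terms.

```python
import math

def information_content(annotations, parents):
    """Annotation-based IC: log(N / n_term), i.e. -log(p(term)).

    annotations: dict mapping gene -> set of directly annotated terms
    parents:     dict mapping term -> set of its direct parent terms
    (Both structures are hypothetical inputs for this sketch.)
    """
    def ancestors(term):
        # Collect every term reachable by following parent links.
        seen, stack = set(), [term]
        while stack:
            for parent in parents.get(stack.pop(), ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

    # For each term, the set of genes annotated with it or any descendant.
    genes_per_term = {}
    for gene, terms in annotations.items():
        propagated = set(terms)
        for term in terms:
            propagated |= ancestors(term)
        for term in propagated:
            genes_per_term.setdefault(term, set()).add(gene)

    n_genes = len(annotations)
    return {term: math.log(n_genes / len(genes))
            for term, genes in genes_per_term.items()}

# Toy data: Y is a child of X, X is a child of ROOT.
parents = {"Y": {"X"}, "X": {"ROOT"}}
annotations = {"g1": {"X"}, "g2": {"X"}, "g3": {"Y"}, "g4": {"ROOT"}}
ic = information_content(annotations, parents)
```

In this toy set, X is counted for three of the four genes (g1, g2 and, through its child Y, g3), so its IC is log(4/3), while the root covers every gene and gets an IC of 0. Note that the denominator is the number of genes, not the number of term occurrences.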
