Information Content (IC) of GO terms
8.3 years ago
int11ap1 ▴ 470

Dear all,

I have downloaded all GO terms annotated to UniProt genes (file) and would like to calculate their Information Content (IC). As far as I know, it is calculated as -log(frequency of GO term / all GO terms). However, I think I must also take all children into account, right?

So, if I have a total list of 1,000,000 GO terms and want to calculate the IC of term X (seen, say, 3 times among those 1,000,000 terms), which has a single child Y (seen once), is it correct to calculate the IC as -log(3/1000000 + 1/1000000)?

Is there a publicly available file with the ICs of all GO terms for UniProt?

Thank you very much.

go uniprot ic • 3.2k views
8.3 years ago

The information content is -log(p(term)). In general, p(term) is the probability of seeing the term in the data. For GO it is usually taken as the number of genes annotated with the GO term (or any of its children) over all the genes in the data set under consideration. A much less common alternative is to derive the information content from the number of children of the term. It seems to me you're referring to this second definition of information content. See this paper for how to compute it (I don't think you're doing it correctly).
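To make the first (annotation-based) definition concrete, here is a minimal Python sketch. It assumes you already have two plain dictionaries: `annotations` (gene to directly annotated GO terms) and `parents` (term to its direct parents). Those names, the helper functions, and the toy terms `root`, `X`, `Y` are all made up for illustration; they are not from any GO library. Each gene is counted once for a term if it is annotated with that term or with any of its descendants.

```python
import math
from collections import defaultdict


def ancestors(term, parents):
    """Return the term itself plus all of its ancestors in the GO DAG."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def information_content(annotations, parents):
    """IC(term) = -log(p(term)), with p(term) = genes annotated with the
    term or any of its descendants, divided by all genes in the data set."""
    genes_per_term = defaultdict(set)
    for gene, terms in annotations.items():
        for term in terms:
            # Propagate the annotation upwards: a gene annotated with a
            # child also counts for every ancestor of that child.
            for t in ancestors(term, parents):
                genes_per_term[t].add(gene)
    n_genes = len(annotations)
    # log(n / count) is the same value as -log(count / n).
    return {t: math.log(n_genes / len(genes))
            for t, genes in genes_per_term.items()}


# Hypothetical toy data: Y is a child of X, X is a child of root.
parents = {"Y": ["X"], "X": ["root"]}
annotations = {
    "g1": {"X"},
    "g2": {"Y"},
    "g3": {"root"},
    "g4": {"Y"},
}

ic = information_content(annotations, parents)
for term in ("root", "X", "Y"):
    print(term, round(ic[term], 3))
```

Note that genes are collected in sets, so a gene annotated with both X and Y is counted only once for X. That is the difference from simply adding up the raw counts of a term and its children.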
