I have downloaded all GO terms of UniProt genes (file). I would like to calculate the Information Content (IC). As far as I know, it is calculated as -log(frequency of GO term / total number of GO terms). However, I think I must also take all child terms into account, right?
So, if I have a total list of 1,000,000 GO terms and I want to calculate the IC of term X (say it is seen 3 times among the 1,000,000 terms), which has a single child Y (seen 1 time), is it correct to calculate the IC as -log(3/1000000 + 1/1000000)?
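To make the calculation concrete, here is a minimal Python sketch of what I have in mind. It assumes the usual annotation-based definition, where a term's count includes the counts of all its descendants, and it uses the natural logarithm (the base is my assumption). The `counts` and `children` dictionaries and the function names are hypothetical toy stand-ins, not taken from any UniProt file or library.

```python
import math

# Hypothetical toy data: raw occurrence counts per GO term and a child map.
counts = {"X": 3, "Y": 1}
children = {"X": ["Y"], "Y": []}
total = 1_000_000  # total number of GO term occurrences in the list


def descendants(term, children):
    """Collect all descendants of a term (children, grandchildren, ...)."""
    seen = set()
    stack = list(children.get(term, []))
    while stack:
        node = stack.pop()
        if node not in seen:  # a set avoids double-counting in a DAG
            seen.add(node)
            stack.extend(children.get(node, []))
    return seen


def information_content(term, counts, children, total):
    """IC = -log(p), where p covers the term plus all of its descendants."""
    n = counts.get(term, 0)
    n += sum(counts.get(d, 0) for d in descendants(term, children))
    if n == 0:
        return math.inf  # a term that is never observed has undefined probability
    return -math.log(n / total)


ic_x = information_content("X", counts, children, total)  # -log(4 / 1000000)
ic_y = information_content("Y", counts, children, total)  # -log(1 / 1000000)
```

With these toy numbers, X gets -log((3 + 1)/1000000), the same value as the expression above, and the child Y gets a higher IC than its parent.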
Is there a publicly available file containing all the ICs for UniProt?
Thank you very much.