Observed to Expected CpG
4.8 years ago

The observed-to-expected CpG ratio is calculated as follows:

Obs/Exp CpG = (Number of CpG * N) / (Number of C * Number of G), where N = length of sequence.


I also don't understand the expected CpG count, which is:

Expected = (Number of C * Number of G) / N
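A minimal Python sketch of both formulas, assuming the input is a plain DNA string (whitespace and letter case are ignored; any other character, such as N, still counts towards the length). The function name `cpg_obs_exp` is hypothetical. Note that the Obs/Exp ratio is just the observed CpG count divided by the expected count.

```python
def cpg_obs_exp(seq: str) -> tuple[int, float, float]:
    """Return (observed CpG, expected CpG, Obs/Exp ratio) for a DNA sequence."""
    s = "".join(seq.split()).upper()   # drop spaces/newlines, ignore case
    n = len(s)                         # N = length of sequence
    c, g = s.count("C"), s.count("G")
    observed = s.count("CG")           # "CG" cannot overlap itself, so count() is exact
    if c == 0 or g == 0:
        # Expected count is 0, so the ratio is undefined
        return observed, 0.0, float("nan")
    expected = c * g / n               # Expected = (#C * #G) / N
    ratio = observed * n / (c * g)     # Obs/Exp = #CpG * N / (#C * #G) = observed / expected
    return observed, expected, ratio


# Toy sequence: 2 CpG, 2 C, 2 G, N = 6
print(cpg_obs_exp("ACGTCG"))  # (2, 0.6666666666666666, 3.0)
```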


Can someone give an example or some intuition for the above formulas?

4.3 years ago

Example: take the following sequence:

ccattcgactcatcacgctccccccccccccccccccccttatccgttccgttcgacgtatttcgttgtctaatttctgacgtaacttgttccctgttaagtaccgtttatggcctatactccggtatttaaaacgacgacgattccaccgtaaagccgtcaaccagatgaacgacctcgctcgttatatttttccggca

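Here N = 200, the number of C = 70, the number of G = 31, and the number of CpG = 19.

GC content = (70 + 31) / 200 = 0.505 = 50.5%

Obs/Exp CpG = 19 * 200 / (70 * 31) = 3800 / 2170 ≈ 1.75

Expected CpG = (70 * 31) / 200 = 10.85, and observed / expected = 19 / 10.85 ≈ 1.75, the same ratio.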
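The arithmetic in this example can be reproduced in a few lines of Python. The counts are the ones stated in the answer (taken as given, not recounted from the sequence); the point is that computing Obs/Exp directly and dividing observed by expected give the same number.

```python
# Counts as stated in the answer for the 200-nt example sequence
n_len, n_c, n_g, n_cpg = 200, 70, 31, 19

expected = n_c * n_g / n_len           # (70 * 31) / 200
obs_exp = n_cpg * n_len / (n_c * n_g)  # 19 * 200 / (70 * 31)

print(f"expected CpG       = {expected:.2f}")
print(f"Obs/Exp CpG        = {obs_exp:.3f}")
print(f"observed/expected  = {n_cpg / expected:.3f}")  # same ratio as Obs/Exp
```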

3.4 years ago
lukelahood • 0

This is how I think about the formula for "expected":

Let's say you have just a single C in a nucleotide chain 200 bases long. If the bases are arranged at random, the probability that the next nucleotide is a G (and thus that you have one CpG dinucleotide) is #G/200. In this case that probability is also the expected number of CpG dinucleotides. So if 50 of the nucleotides are G, the chance of getting one CpG is 50/200 = 0.25, and on average you'd expect 0.25 CpG dinucleotides.

However, there is usually more than one C in a nucleotide chain. Every C is another chance of forming a CpG, so you multiply the probability calculated above by the number of C's. This gives #G/200 * #C, i.e. (#C * #G) / N, which is the formula for the expected CpG count.
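This intuition can be checked empirically. The sketch below shuffles a toy sequence many times (keeping its base composition fixed, i.e. the "random arrangement" the argument assumes) and averages the CpG count over the shuffles. The toy sequence, trial count, seed, and helper names are hypothetical choices for illustration.

```python
import random


def cpg_count(seq: str) -> int:
    # "CG" cannot overlap itself, so str.count gives the exact number of CpG sites
    return seq.upper().count("CG")


def mean_shuffled_cpg(seq: str, trials: int = 20000, seed: int = 0) -> float:
    """Average CpG count over random shuffles of seq (same base composition)."""
    rng = random.Random(seed)
    bases = list(seq.upper())
    total = 0
    for _ in range(trials):
        rng.shuffle(bases)  # random arrangement, composition unchanged
        total += cpg_count("".join(bases))
    return total / trials


seq = "CCCCGGGAAATT"  # hypothetical toy sequence: 4 C, 3 G, N = 12
s = seq.upper()
expected = s.count("C") * s.count("G") / len(s)  # 4 * 3 / 12 = 1.0
simulated = mean_shuffled_cpg(seq)
print(f"expected = {expected:.3f}, mean over shuffles = {simulated:.3f}")
```

Under shuffling the expectation is in fact exactly #C * #G / N: each of the N - 1 adjacent pairs is a C followed by a G with probability (#C / N) * (#G / (N - 1)), and (N - 1) * #C * #G / (N * (N - 1)) = #C * #G / N.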