Sequence logo calculation and sketch question
4.3 years ago
jbrody11 • 0

I am doing a practice exam question shown here:

I understand how to build the frequency matrix, but I am unsure about the sequence logo formula calculations. Should the maximum information content be 2 bits, or could it be up to 4 or even 8?

sequence • 1.3k views

Yes, the maximum is 2 bits.


Okay thanks, but I still see an issue with the logo formula they give. I understand that the first column of the logo will have an A with a height of 2 bits, but my calculations for the other columns come out above the maximum; for the second column, for instance, I get 3.295 bits. Would you kindly look at how the second column's bit value is calculated? The frequencies are A=0.125, C=0.25, G=0.625, T=0.


Maximum information content (MIC) in logo representations is log2(N), where N is the number of unique residue types. That means MIC is 2 for nucleic acids, 4.321928095 for proteins.
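As a quick check of those two numbers, here is a minimal Python sketch. The function name `max_information_content` is made up for illustration and is not taken from any logo package.

```python
import math

def max_information_content(n_residue_types: int) -> float:
    """Upper bound on per-column information content, in bits: log2(N)."""
    return math.log2(n_residue_types)

# 4 residue types for nucleic acids, 20 for proteins
print(max_information_content(4))
print(max_information_content(20))
```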


Okay thanks, I'm still confused about the formula they provided for the sequence logo. For instance, in the second column, G has a frequency of 0.625, so I calculate its information content to be 2 - (0.625 x log2(0.625)), but this gives 2.42, which is above the maximum IC?


The formula you have is incorrect. The way it was written for you, it should be 2 + ....

Specifically, the formula is

IC = log2(N) - (H + e)


where N is the number of unique residue types, H is Shannon's uncertainty, and e is a small-sample correction. Since H is itself the negative of a sum (H = -sum over b of Fbi * log2(Fbi), where Fbi is the frequency of residue b at position i), the expression essentially becomes 2 + ... if we ignore e, as was done in your formula above.

See here and here for details.
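To make the sign issue concrete, here is a rough Python sketch for the second column, using the frequencies quoted earlier in the thread (A=0.125, C=0.25, G=0.625, T=0). It rests on several assumptions: e is set to 0, a frequency of 0 contributes nothing to H, and each letter's height is its frequency times the column's IC, which is the usual logo convention but may not match the exam's formula. The function name `column_information` is hypothetical.

```python
import math

def column_information(freqs, n_types=4, e=0.0):
    """Return (H, IC, per-letter heights) for one logo column, in bits.

    Assumptions: e defaults to 0 (correction ignored), and letter
    height = frequency * column IC.
    """
    # Shannon uncertainty H = -sum(f * log2(f)); terms with f == 0 are skipped
    h = -sum(f * math.log2(f) for f in freqs.values() if f > 0)
    # IC = log2(N) - (H + e)
    ic = math.log2(n_types) - (h + e)
    heights = {base: f * ic for base, f in freqs.items()}
    return h, ic, heights

col2 = {"A": 0.125, "C": 0.25, "G": 0.625, "T": 0.0}
h, ic, heights = column_information(col2)
print(round(h, 3), round(ic, 3))
print({base: round(x, 3) for base, x in heights.items()})
```

Under these assumptions the column total comes out at about 0.70 bits, below the 2-bit maximum, and the letter heights add up to that total.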


Okay thanks, that's interesting, because the question is from a previous year's university exam, so I'm surprised they gave the wrong formula. Just one last question, please. In that column, if I use the correct formula, I get G=1.58, C=1.5, A=1.625, but now I have to multiply each of these by its frequency to get the actual information content heights. So G will be 0.99, C will be 0.375 and A will be 0.2. Therefore, in that column, my logo will have a small A at the bottom with a height of 0.2 bits, then C reaching 0.575 bits (0.375 + 0.2), and G reaching 1.565 bits (0.99 + 0.375 + 0.2). Am I correct with this?