
4.3 years ago

jbrody11

I am doing a practice exam question shown here:

I understand how to do the frequency matrix, but I am unsure about the sequence logo formula calculations. Should the maximum information content be 2 bits, or could it be up to 4 or even 8?

Yes, it is a maximum of 2 bits.

Okay thanks, but I still see an issue with the logo formula they give. I understand the first column of the logo will have an A with a height of 2 bits, but my calculations for other columns, for instance the second column, are ending up with bit values like 3.295, which is above the maximum. Would you kindly look at how the second column's bit value is calculated? The frequencies are A=0.125, C=0.25, G=0.625, T=0.

Maximum information content (MIC) in logo representations is log2(N), where `N` is the number of unique residue types. That means MIC is 2 bits for nucleic acids and 4.321928095 bits for proteins.

Okay thanks, I'm still confused by the formula they provided for the sequence logo. For instance, in the second column, G has a frequency of 0.625, so I calculate its information content to be 2 - (0.625 x log2(0.625)), but this answer is 2.42, which is above the maximum IC ... ?

The formula you have is incorrect. The way it was written for you, it should be `2 + ...`. Specifically, the formula is `IC = log2(N) - (H + e)`, where `N` is the number of unique residue types, `H` is Shannon's uncertainty and `e` is a small-number correction. Since `H` itself is the negative of the sum (`H = - sigma ( Fbi * log2(Fbi))`), it essentially becomes `2 + ...` if we ignore `e`, as was done in your formula above. See here and here for details.

Okay thanks, that's interesting, because that question is from a previous year's university exam, so I'm surprised they gave the wrong formula. Just one last question, please. In that column, if I use the correct formula, I get G=1.58, C=1.5, A=1.625, but now I have to multiply each of these by their frequencies to get the actual information content heights. So G will now be 0.99, C will be 0.375 and A will be 0.2. Therefore, at the bottom of that column, my logo will have a small A with a height of 0.2 bits, then C up to a height of 0.575 (0.375 + 0.2) and G up to a height of 1.565 bits (0.99 + 0.375 + 0.2). Am I correct with this?
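
For anyone checking the arithmetic, here is a minimal Python sketch of the column calculation discussed above. It assumes the usual sequence-logo convention: the uncertainty `H` is summed over all residues in the column, the information content is computed once per column as log2(N) - (H + e), and each letter's height is its frequency times that column value. The function names `column_information` and `letter_heights` are made up for illustration, and `e` is set to 0, as in the simplified formula from the question. With the second-column frequencies, the whole column comes out at roughly 0.70 bits, so the stack stays under the 2-bit maximum.

```python
import math

def column_information(freqs, n_types=4, e=0.0):
    """Information content (bits) of one logo column: log2(N) - (H + e)."""
    # Shannon uncertainty H, summed over ALL residues in the column.
    # Residues with frequency 0 contribute nothing (0 * log2(0) is taken as 0).
    h = -sum(f * math.log2(f) for f in freqs.values() if f > 0)
    return math.log2(n_types) - (h + e)

def letter_heights(freqs, n_types=4, e=0.0):
    """Height of each letter: its frequency times the column's information content."""
    ic = column_information(freqs, n_types, e)
    return {base: f * ic for base, f in freqs.items()}

# Second column from the question (e assumed to be 0).
col2 = {"A": 0.125, "C": 0.25, "G": 0.625, "T": 0.0}
ic = column_information(col2)
heights = letter_heights(col2)

print(f"column information content: {ic:.3f} bits")
# Letters are stacked from smallest to largest, so print them in that order.
for base, height in sorted(heights.items(), key=lambda kv: kv[1]):
    print(f"{base}: {height:.3f} bits")
```

The letter heights always add up to the column's information content, which is a quick sanity check for hand calculations. A fully conserved column (one residue at frequency 1) gives `H = 0` and therefore the full 2 bits, matching the first column of the logo.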