Ignore the bit units for a moment. It's a measurement of information or certainty. Roughly speaking, the relative height is how certain you are to observe a particular residue or nucleotide at a particular position.
Randomness is maximum uncertainty: all events can happen with equal probability. The simplest case is flipping a fair coin, where you have an equal chance of getting heads or tails.
The opposite of randomness is certainty — you expect some event to happen to the exclusion of most or all other possibilities, like when you roll a weighted die in a crooked casino run by Ricky Jay, and one face comes up more than all the others.
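To put rough numbers on the coin and the weighted die, here is a minimal sketch that computes Shannon entropy (uncertainty in bits) for each. The 75% weighting on the crooked die is a made-up value for illustration, not something measured.

```python
import math

def shannon_entropy(probs):
    """Uncertainty in bits: H = -sum(p * log2(p)), skipping zero-probability events."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

fair_coin = [0.5, 0.5]
fair_die = [1 / 6] * 6
# Hypothetical crooked die: one face comes up 75% of the time (assumed value).
weighted_die = [0.75] + [0.05] * 5

print(shannon_entropy(fair_coin))     # 1.0 bit
print(shannon_entropy(fair_die))      # about 2.585 bits, i.e. log2(6)
print(shannon_entropy(weighted_die))  # about 1.39 bits, lower than the fair die
```

The more lopsided the probabilities, the lower the entropy. A die that always lands on the same face would score 0 bits.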
In the original paper by Crooks et al., the measure at each position is called conservation. It is defined as the difference between the uncertainty you'd expect if the biology of where TFs bind were completely random (like a factor that binds without caring about the DNA sequence: you have a pure 1-in-4 chance of seeing one of A, T, C, or G at any position) and the uncertainty of what you observe in reality (which is low when one or two residues show up more frequently than all the others, as in a transcription factor binding site).
Tall letters indicate high conservation, which means low uncertainty.
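As a rough sketch of that definition, the snippet below takes one alignment column of DNA and subtracts its observed entropy from the 2 bits you would get with a pure 1-in-4 chance. The function name `conservation_bits` and the example columns are made up for illustration, and the sketch leaves out any small-sample correction that real logo tools may apply.

```python
import math
from collections import Counter

def conservation_bits(column, alphabet="ACGT"):
    """Conservation of one alignment column: log2(alphabet size) minus observed entropy."""
    counts = Counter(column)
    total = len(column)
    observed = 0.0
    for letter in alphabet:
        p = counts[letter] / total
        if p > 0:
            observed -= p * math.log2(p)
    return math.log2(len(alphabet)) - observed

print(conservation_bits("AAAAAAAA"))  # 2.0, fully conserved
print(conservation_bits("ACGTACGT"))  # 0.0, indistinguishable from random
print(conservation_bits("AAAAAAGT"))  # about 0.94, somewhere in between
```

For DNA the ceiling is 2 bits because log2(4) = 2. For proteins, with 20 residues, it would be log2(20), about 4.32 bits.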
Transcription factor binding sites are highly conserved, biologically or evolutionarily speaking, because they control how segments of DNA get turned on, and different parts of the DNA need to get controlled at different times and in specific ways, in order for the concert of proteins to do their thing and keep the organism alive.
Like tossing a spanner into the engine of a moving car, mutations to TF binding sites will more often than not break the biological machinery of organisms and so weaken or kill them before they can make copies of themselves.
Over time, therefore, organisms have evolved genomes that conserve these regulatory, functional sites, so as to stay alive long enough to reproduce.
That's why sequence logos are good visual representations of TF sites. Logos show you where and which nucleotides are conserved to the exclusion of others. Logos show how different transcription factors have evolved a preference (a higher "certainty") for binding to different sequences of DNA. Further, logos offer quantitative or informational measures of that certainty — "bit scores" — which are based on mathematics in information theory.
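To see how those bit scores turn into the letter stacks in a logo, here is a small sketch. It assumes the usual convention that each letter's height is its frequency at that position times the position's conservation. The aligned `sites` are invented for illustration and are not real binding-site data, and `logo_heights` is a hypothetical helper, not part of any logo package.

```python
import math
from collections import Counter

# Hypothetical aligned binding sites, made up for illustration only.
sites = ["TATAAT", "TATAAT", "TACAAT", "TATGAT",
         "GATAAT", "TATACT", "TATAAT", "TTTAAT"]

def logo_heights(sequences, alphabet="ACGT"):
    """Return, per position, a dict of letter -> height in bits
    (letter frequency times the column's conservation)."""
    max_bits = math.log2(len(alphabet))
    heights = []
    for column in zip(*sequences):
        counts = Counter(column)
        freqs = {b: counts[b] / len(column) for b in alphabet}
        entropy = -sum(p * math.log2(p) for p in freqs.values() if p > 0)
        conservation = max_bits - entropy
        heights.append({b: p * conservation for b, p in freqs.items() if p > 0})
    return heights

for pos, stack in enumerate(logo_heights(sites), start=1):
    # Print letters from tallest to shortest, as they would stack in a logo.
    ranked = sorted(stack.items(), key=lambda kv: kv[1], reverse=True)
    print(pos, " ".join(f"{b}:{h:.2f}" for b, h in ranked))
```

In this toy alignment the last position is always T, so it gets the full 2 bits. The other positions each have one mismatch out of eight sites, so their stacks are shorter and show a small second letter.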