So this question came up in a lecture on "Introduction to transcription". Our professor asked: what is the probability of having the same starting base on the forward and the reverse strand? In other words, if I have

```
5'_______3'
ATTGCCATAT
TAACGGTATA
3'_______5'
```

What are the odds of that happening (here for A, and likewise for the other bases T, G, C)? My answer is as follows:
P(A) = P(T) = P(G) = P(C) = 1/4, so P(A on 5' and A on 3') = P(A) · P(A) = 1/16,
and P(G on 5' and G on 3') = P(G) · P(G) = 1/16, and so on. Adding these up over all four bases gives 4 × 1/16 = 1/4. Am I correct?
What puzzles me is this: the probability of having the same base on the reverse strand = 1/4 = the probability of having any one base.
Is there any significance to this?

This model assumes that the probability of each base is 1/4. From a biological standpoint, transcription start sites are highly clustered around binding motifs, so the nucleotides there are far from randomly distributed. To get an accurate figure, one would probably need to correct for factors like GC content. So I would say a naive probability like the one you propose will not be accurate. Maybe have a look at papers about motif enrichment and how they model nucleotide occurrence.
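To make the GC-content point concrete, here is a minimal sketch of how the naive figure shifts once base frequencies are unequal. It assumes independent positions and a single GC fraction split evenly, so P(G) = P(C) = gc/2 and P(A) = P(T) = (1 - gc)/2. The function name `same_start_probability` is made up for illustration, and this is a toy model, not how motif-enrichment tools model nucleotide occurrence.

```python
# Toy model: positions are independent and base frequencies depend only on GC content.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def same_start_probability(gc_content):
    """Probability that both strands start (5' end) with the same base.

    The reverse strand starts with base b exactly when the forward strand
    ends with complement(b), so we sum P(b) * P(complement(b)) over all b.
    """
    if not 0.0 <= gc_content <= 1.0:
        raise ValueError("gc_content must be between 0 and 1")
    freq = {
        "A": (1.0 - gc_content) / 2.0,
        "T": (1.0 - gc_content) / 2.0,
        "G": gc_content / 2.0,
        "C": gc_content / 2.0,
    }
    return sum(freq[b] * freq[COMPLEMENT[b]] for b in freq)


if __name__ == "__main__":
    for gc in (0.3, 0.5, 0.7):
        print(f"GC = {gc:.1f}: P(same start) = {same_start_probability(gc):.4f}")
```

Under these assumptions the result is exactly 1/4 only at 50% GC, and it rises above 1/4 as the GC content moves away from 50% in either direction.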

Thanks for replying. Will look into factors like GC content.

Yes, assuming equiprobable bases, the probability of having a base and its complement at opposite ends of a given interval is 25%. And yes, this is because the probability of any one base is 25%; it would be different otherwise. Of course, this is unlikely to match what happens in any organism, for the reasons ATPoint mentioned.
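If you want to convince yourself of the 25% figure, a quick simulation can check it. This is only a sketch under the same naive assumption (independent, equiprobable bases); the helper names `reverse_complement` and `simulate_same_start`, the sequence length, and the number of trials are arbitrary choices for illustration.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse strand read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def simulate_same_start(n_trials=100_000, length=10, seed=0):
    """Estimate how often both strands begin with the same base at their 5' ends."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        # Naive model: every position is drawn uniformly from A, C, G, T.
        forward = "".join(rng.choices("ACGT", k=length))
        reverse = reverse_complement(forward)
        if forward[0] == reverse[0]:
            hits += 1
    return hits / n_trials


if __name__ == "__main__":
    print(f"Estimated P(same start): {simulate_same_start():.3f}")
```

For the sequence in the question, `reverse_complement("ATTGCCATAT")` gives `"ATATGGCAAT"`, so both strands start with A. The simulated estimate should land close to 0.25, with some sampling noise.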

Thanks for clearing that up.

Out of curiosity, what other biases should be considered for this kind of problem?

Is there any influence of the genetic code? (I was thinking about the proportion of each base at each codon position.)