How often to expect this particular consensus sequence?
7.3 years ago
gms2005gms • 0

How often should I expect to see the consensus sequence GGNGC, where N is any base, with fewer than 120 nucleotides separating it from the start of the next occurrence of the same sequence? I really have no clue where to start. Should I take into account all four possible sequences obtained by replacing N with each nucleotide?

1. Is this a homework question?
2. What do you know? That is, do you know the actual 5mer frequency or do you have to assume that they're all equally distributed? This question alone should give you a hint on how to get started.
7.3 years ago
Cytosine ▴ 460

Sounds like a statistics homework. I'll try to give it a shot, but I'm no statistics expert...

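Assuming you have a multinomial distribution with equal probabilities for every nucleotide (and independent positions), you can calculate the chance for this consensus sequence to occur at a given position with the info you provided. Each of the four fixed bases matches with probability 1/4 and N always matches, so the chance is (1/4)^4 = 1/256. That is the same as adding up the four concrete 5-mers, 4 × (1/4)^5.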
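To make the per-position calculation concrete, here is a minimal sketch under the equal-frequency assumption above. The function name `motif_probability` and its `base_prob` parameter are made up for illustration; if you know the real base frequencies, you would need per-base probabilities instead of a single value.

```python
from fractions import Fraction

def motif_probability(motif: str, base_prob: Fraction = Fraction(1, 4)) -> Fraction:
    """Probability that a given position starts a match to `motif`.

    Assumes independent positions and the same probability `base_prob`
    for every specific base. 'N' matches any base, so it contributes 1.
    """
    p = Fraction(1)
    for symbol in motif:
        if symbol != "N":
            p *= base_prob  # a fixed base must match exactly
    return p

p = motif_probability("GGNGC")
print(p, float(p))  # 1/256 0.00390625
```

So under these assumptions you would expect a match to start at roughly one position in every 256.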

After you calculate the probability for this sequence, you can use that occurrence frequency to calculate the chance of seeing the sequence within any given stretch of 120 nucleotides, assuming again that you have a multinomial distribution with equal probabilities.

That percentage will tell you the probability of the given sequence appearing exactly once in windows of 120 nucleotides. If the percentage you observe in your test data is higher, then I suppose that means the sequence is enriched.
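As a rough sketch of the window calculation, the snippet below treats every possible start position in a 120-nucleotide window as an independent trial with success probability p = (1/4)^4. That independence is an approximation, since overlapping start positions are correlated. The function name `window_probabilities` and the choice to count only motifs lying fully inside the window are my assumptions, not something given in the question.

```python
def window_probabilities(p: float, window: int = 120, motif_len: int = 5):
    """Return (P(exactly one match), P(at least one match)) in a window.

    Binomial approximation: each start position is treated as an
    independent trial with success probability `p`.
    """
    n = window - motif_len + 1  # start positions that fit fully inside the window
    exactly_one = n * p * (1 - p) ** (n - 1)
    at_least_one = 1 - (1 - p) ** n
    return exactly_one, at_least_one

p = (1 / 4) ** 4  # per-position probability of GGNGC with equal base frequencies
exactly_one, at_least_one = window_probabilities(p)
print(f"exactly one: {exactly_one:.4f}, at least one: {at_least_one:.4f}")
```

If the question is really "is there another occurrence within 120 nucleotides", the at-least-one value is probably the more relevant of the two. Either number is only the expectation for random sequence, so you would compare it against the fraction you actually observe in your data.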