Exact pattern matching: why is it more specific to look for a non-consecutive subsequence?
5 days ago
Bethan

I'm taking an online bioinformatics course, and the following example was given without much explanation.

If you're searching for an exact pattern P in text T, e.g.:

T = CGTGCGTGCTT...(etc.)

It is apparently more specific to look for a non-consecutive subsequence of P in T than for a consecutive substring of P in T.

In other words, searching for this exact subsequence of P within T...

P = GC_T_C_

is supposed to give more specific results than searching for this one:

P = GCGT___
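To make the two searches concrete, here is a minimal Python sketch of exact matching with don't-care positions. It assumes each `_` stands for exactly one position that matches any base, and that the full 7-character window must fit inside T. The function name `matches` and the constant `T_PREFIX` are hypothetical, and only the visible prefix of T is used because the rest ("...(etc.)") isn't given.

```python
def matches(pattern: str, text: str, wildcard: str = "_") -> list[int]:
    """Return every start index i where pattern matches text[i:i+len(pattern)].

    Positions holding `wildcard` match any base; every other position
    must match the text exactly.
    """
    m = len(pattern)
    hits = []
    for i in range(len(text) - m + 1):
        window = text[i:i + m]
        if all(p == wildcard or p == t for p, t in zip(pattern, window)):
            hits.append(i)
    return hits


# Only the visible prefix of T from the question; the "...(etc.)" part is unknown.
T_PREFIX = "CGTGCGTGCTT"

print(matches("GC_T_C_", T_PREFIX))  # [3]
print(matches("GCGT___", T_PREFIX))  # [3]
```

On this short prefix, both patterns happen to hit at the same 0-based position, so the snippet on its own can't tell them apart.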

The question is: is that really true, and if so, why? Either way, the program is looking for 4 bases, and I'm assuming each of those bases has a roughly equal chance of being A, C, G or T (ignoring GC skew). So shouldn't searching for either pattern in T be equally specific?
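The arithmetic behind the "4 bases, equal chance" reasoning can be written out as a short Python sketch. It assumes, as the question does, that every base is A, C, G or T with probability 1/4, independently of its neighbours. The names `match_probability` and `expected_hits` are hypothetical helpers for illustration.

```python
from fractions import Fraction

# Assumption (from the question): each base is A, C, G or T with equal
# probability, independently of its neighbours (no GC skew, no context effects).
BASE_PROB = Fraction(1, 4)


def match_probability(pattern: str, wildcard: str = "_") -> Fraction:
    """Chance that a single random window of len(pattern) matches the pattern."""
    fixed = sum(1 for p in pattern if p != wildcard)  # don't-cares always match
    return BASE_PROB ** fixed


def expected_hits(pattern: str, text_length: int) -> Fraction:
    """Expected number of matching windows in a random text of text_length bases."""
    windows = max(text_length - len(pattern) + 1, 0)
    return windows * match_probability(pattern)


for pat in ("GC_T_C_", "GCGT___"):
    p = match_probability(pat)
    print(pat, p, float(p))  # both print 1/256 0.00390625
```

Under this model, both patterns fix 4 positions inside a 7-base window, so a single window matches either one with probability (1/4)^4 = 1/256, and the expected hit counts in random text are identical. The sketch only looks at one window at a time. It does not model how matches at neighbouring or overlapping positions are correlated.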

