I'm doing an online Bioinformatics course and the following example was given but not really explained.
If you're searching for an exact pattern P in text T, e.g.:
T = CGTGCGTGCTT...(etc.)
P = GCGTACT
Apparently, looking for a non-consecutive subsequence of P in T is more specific than looking for a consecutive substring of P in T.
In other words, searching for this exact subsequence of P within T...
P = GC_T_C_
is supposed to give more specific results than searching for this one:
P = GCGT___
The question is: is that really true, and if so, why? Either way the program is looking for 4 bases, and I'm assuming each of those bases has a roughly equal chance of being A, C, T or G (ignoring GC skew). Therefore, shouldn't searching for either pattern in T be equally specific?
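To make my assumption concrete, here is a minimal Python sketch of the comparison I have in mind. It is only an illustration under the assumptions stated above: the text is uniformly random with independent bases (not real genomic data, and not the T from the course), `_` is treated as a "match anything" position, and `count_hits` and `random_dna` are helper names I made up for this post. Under those assumptions each pattern fixes 4 bases, so I would expect a match at any given position with probability (1/4)^4 = 1/256 for both.

```python
import random


def count_hits(text, pattern, wildcard="_"):
    """Count start positions where every non-wildcard base of pattern matches text."""
    # Offsets and bases that must match; wildcard positions are ignored.
    fixed = [(i, base) for i, base in enumerate(pattern) if base != wildcard]
    hits = 0
    # Slide a window of len(pattern) over the text (trailing wildcards still
    # need a base underneath them, so both patterns use the same window size).
    for start in range(len(text) - len(pattern) + 1):
        if all(text[start + i] == base for i, base in fixed):
            hits += 1
    return hits


def random_dna(length, seed=0):
    """Assumption: bases are independent and uniformly distributed over A, C, G, T."""
    rng = random.Random(seed)
    return "".join(rng.choice("ACGT") for _ in range(length))


if __name__ == "__main__":
    text = random_dna(200_000)
    windows = len(text) - 7 + 1
    for pattern in ("GC_T_C_", "GCGT___"):
        print(pattern, count_hits(text, pattern))
    # Expected number of hits per pattern if every window matches with p = 1/256.
    print("expected per pattern:", windows / 4 ** 4)
```

This only compares the raw number of matching positions for the two patterns in random text; it does not look at how the matches are distributed along T, which may be where the difference the course is hinting at comes from.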