Confused about sequence motif regular expressions
3.3 years ago

Consider the regular expression for this motif:

A, followed by anything but C, followed by either A or T, followed by G, i.e. A{C}[AT]G (in standard regex syntax, A[^C][AT]G).

Does this regular expression describe variation in a motif across individuals, or within an individual? For example, if I have this motif in my own DNA, can I have multiple sequences that match the regular expression, or only one of them? Could I have both AGAG and AGTG, since they both satisfy the regex, or would those occur in separate individuals?

genome genetics RNA-Seq • 713 views

Forget about regular expressions when you're discussing the biological concept of motifs. Say you're looking at a motif that represents a transcription factor binding site (TFBS). A single individual could have multiple actual sequences conforming to the motif, each upstream of a different gene, and all of them would bind the TF.
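To make the "multiple conforming sequences in one individual" point concrete, here is a minimal sketch that scans one sequence for every hit of the motif from the question. The function name `find_motif_hits` and the toy sequence are made up for illustration, and writing "anything but C" as `[AGT]` assumes the sequence alphabet is restricted to A, C, G, and T.

```python
import re

# PROSITE-style A{C}[AT]G as a standard regex. "Anything but C" is written
# [AGT] here (assumption: only A/C/G/T count as bases, so N does not match).
# The lookahead (?=...) lets overlapping hits be reported as well.
MOTIF = re.compile(r"(?=(A[AGT][AT]G))")

def find_motif_hits(seq):
    """Return (0-based position, matched tetramer) for every hit in seq."""
    return [(m.start(), m.group(1)) for m in MOTIF.finditer(seq.upper())]

# Hypothetical toy sequence, not real genomic data.
toy = "CCAGAGTTAGTGCCATTGC"
hits = find_motif_hits(toy)
for pos, tetramer in hits:
    print(pos, tetramer)
```

On this toy input the scan reports three different tetramers (AGAG, AGTG, and ATTG) in the same sequence, which is the situation described above: one individual, several distinct sequences that all conform to the motif.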

3.3 years ago
Mensur Dlakic ★ 27k

Regular expressions in this case describe sequences, not individuals. If the sequences used to build the expression came from a single individual, it describes that individual. For practical purposes, an expression as short as yours is all but guaranteed to occur many times, in all of its combinations, in every individual. I can say that with confidence because the number of possible tetranucleotides (4^4 = 256, counting all of them and not just the ones conforming to your expression) is small, and all of them occur many times in essentially all genomes. Assuming a uniform distribution of nucleotides, each tetramer is expected to appear about (length-of-genome / 256) times, which for the human genome is a huge number.
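As a rough illustration of that estimate, the sketch below enumerates the tetramers that satisfy A{C}[AT]G and computes the expected counts under the same uniform-nucleotide assumption. The helper names are hypothetical, the genome length of about 3.1 billion bases is an assumed round figure for the haploid human genome, and only one strand is counted.

```python
from itertools import product

def motif_tetramers():
    """Enumerate every DNA tetramer matching A{C}[AT]G."""
    # Position 1: A; position 2: any base except C; position 3: A or T; position 4: G.
    return ["".join(p) for p in product("A", "AGT", "AT", "G")]

def expected_count(genome_length, k=4, n_sequences=1):
    """Expected hits on one strand, assuming uniform and independent bases."""
    positions = genome_length - k + 1  # number of k-mer start positions
    return positions * n_sequences / 4 ** k

tetramers = motif_tetramers()
GENOME_LENGTH = 3_100_000_000  # assumption: approximate haploid human genome size

per_tetramer = expected_count(GENOME_LENGTH)
per_motif = expected_count(GENOME_LENGTH, n_sequences=len(tetramers))

print(tetramers)
print(f"expected copies of any one tetramer: {per_tetramer:.3g}")
print(f"expected hits for the whole motif:   {per_motif:.3g}")
```

Six of the 256 tetramers match the motif, so under these assumptions each one is expected on the order of ten million times. Real genomes are not uniform in base composition, so actual counts will differ, but not by enough to change the conclusion.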
