Question: Why does my regular expression fail to match all the valid sequences?
jinkuozhang wrote, 2.2 years ago:

I am trying to find every possible "N20NGG" site in a target sequence like this one:

example_seq ATTAATACTTTTAACAATTGTAGTATATAAAAAAGGGAGTAACCGAAAACGGTCGGGACCGAAAACGG

I used a Python regular expression:

import re
example_seq = "ATTAATACTTTTAACAATTGTAGTATATAAAAAAGGGAGTAACCGAAAACGGTCGGGACCGAAAACGG"
pattern = re.compile(r'(.{20}).GG')
all_matched_seq = pattern.finditer(example_seq)

for record in all_matched_seq:
    print(record.group(1), end="\t")
    print(record.span())

I only got two matches:

  1. ACAATTGTAGTATATAAAAA (13, 36)
  2. AAAACGGTCGGGACCGAAAA (45, 68)

My script failed to retrieve the other four matching sequences:

CAATTGTAGTATATAAAAAA; AAAAAGGGAGTAACCGAAAA; AGGGAGTAACCGAAAACGGT; GGGAGTAACCGAAAACGGTC;

How can I modify my script so that it returns all of them?

John wrote, 2.2 years ago:

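By default, regex matches in Python are non-overlapping: after each match, the search resumes at the end of that match, so any site that starts inside a previous match is skipped. To find overlapping matches, wrap the pattern in a lookahead assertion, (?=...), which matches without consuming any characters: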

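import re

example_seq = 'ATTAATACTTTTAACAATTGTAGTATATAAAAAAGGGAGTAACCGAAAACGGTCGGGACCGAAAACGG'
# (?=...) is a zero-width lookahead: it captures the 23-mer without consuming it,
# so the next search starts one position later and overlapping sites are found.
pattern = re.compile(r'(?=(.{21}GG))')
all_matched_seq = pattern.finditer(example_seq)
for record in all_matched_seq:
    print(record.group(1), end="\t")
    print(record.span())

Output (each span is zero-width, so only the start position is meaningful):

ACAATTGTAGTATATAAAAAAGG (13, 13)
CAATTGTAGTATATAAAAAAGGG (14, 14)
AAAAAGGGAGTAACCGAAAACGG (29, 29)
AGGGAGTAACCGAAAACGGTCGG (33, 33)
GGGAGTAACCGAAAACGGTCGGG (34, 34)
AAAACGGTCGGGACCGAAAACGG (45, 45)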
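
If you would rather get the 20-nt part and the NGG part separately, as in the original question, one option is to put two capture groups inside the lookahead. This is only a sketch: the helper name find_n20_ngg and the variable sites are made up for illustration, and it scans the given strand only.

```python
import re

example_seq = "ATTAATACTTTTAACAATTGTAGTATATAAAAAAGGGAGTAACCGAAAACGGTCGGGACCGAAAACGG"

# Zero-width lookahead with two groups: group 1 is the N20, group 2 is the NGG.
pattern = re.compile(r"(?=(.{20})(.GG))")


def find_n20_ngg(seq):
    """Return (start, n20, ngg) for every N20NGG site, overlapping ones included.

    Hypothetical helper, not part of any library.
    """
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]


sites = find_n20_ngg(example_seq)
for start, n20, ngg in sites:
    print(start, n20, ngg, sep="\t")
```

Since the lookahead match itself is empty, m.start() is the 0-based position where the site begins; the site ends at start + 23.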
  

This precisely answered my question. Thanks, John!

(reply written 2.2 years ago by jinkuozhang)

If this answered your question, it's appropriate to mark the answer as "accepted".

(reply written 2.2 years ago by WouterDeCoster)