Question

Rosalind exercise: Finding a Motif in DNA

3.8 years ago

caro-ca ▴ 20

Hi, community!

I am trying to find all occurrences of a motif in a DNA sequence. This is my code:

#!/usr/bin/env python3

from sys import argv
import re 

#Functions
def find_motif(seq_dna, motif):
    results = re.finditer(motif, seq_dna)
    r = []
    for result in results:
        r.append(result.span()[0] + 1)
    print(" ".join(map(str, r)))

if __name__=='__main__':
    seq_dna = argv[1]
    motif = argv[2]    
    find_motif(seq_dna, motif)

When I run my code as python finding_motif.py "GATATATGCATATACTT" "ATAT", this is the output on stdout:

2 10

However, there is another occurrence of the motif starting at index 3 (0-based, i.e. position 4 in the 1-based numbering Rosalind uses) that is not reported. Could somebody help me with how to tackle this? The expected output is:

2 4 10

Thank you in advance for your help!

python rosalind • 5.9k views

updated 13 months ago by Ram 43k • written 3.8 years ago by caro-ca ▴ 20

btw., you can use print(*r) instead of the print/join/map combination. It unpacks the list, so it is equivalent to print(r[0], r[1], ...), and print() separates its arguments with single spaces by default.

3.8 years ago by user_without_id ▴ 150
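A small sketch comparing the two ways of printing. The list r is a hypothetical example holding the expected positions, and io.StringIO is used only to capture what print(*r) writes so the two results can be compared.

```python
import io

r = [2, 4, 10]  # hypothetical list of 1-based motif positions

# Original approach: convert each int to str, then join with single spaces
joined = " ".join(map(str, r))

# Unpacking approach: print(*r) passes each element as a separate argument,
# and print() separates its arguments with sep=" " by default
buf = io.StringIO()
print(*r, file=buf)
unpacked = buf.getvalue().rstrip("\n")

print(joined)    # 2 4 10
print(unpacked)  # 2 4 10
```

Both produce the same line; print(*r) just avoids the explicit str conversion.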

Wow! I did not know about that. Thank you!! It worked.

3.8 years ago by caro-ca ▴ 20

score 2 · Answer 1 · 2020-07-18

3.8 years ago

hugo.avila ▴ 490

I'm not sure why your code doesn't work; maybe it is because re resumes searching after the end of the previous match instead of one position after its start. Here is my solution:

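s = "GATATATGCATATACTT"
motif = "ATAT"

# Try every start position; stepping one base at a time also finds overlapping hits
for i in range(len(s) - len(motif) + 1):
    if s.startswith(motif, i):  # same test as s[i:].startswith(motif), without copying
        print(i + 1)  # Rosalind positions are 1-based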
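To check the guess above that re resumes searching after the previous match, one can print the span of each match object. A minimal sketch on the example sequence from the question:

```python
import re

s = "GATATATGCATATACTT"

# Each match object records its start and end (0-based, end exclusive)
spans = [m.span() for m in re.finditer("ATAT", s)]
print(spans)  # [(1, 5), (9, 13)]
```

The first match covers indices 1 to 4, so the scan resumes at index 5 and the occurrence starting at index 3 is never tried.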

Thank you! You were really helpful. It worked!

3.8 years ago by caro-ca ▴ 20

score 2 · Answer 2 · 2020-07-18

3.8 years ago

Mensur Dlakic ★ 27k

The documentation page for the re library says:

re.finditer(pattern, string, flags=0)
Return an iterator yielding match objects over all non-overlapping matches for the RE pattern in string. The string is scanned left-to-right, and matches are returned in the order found. Empty matches are included in the result.

Your matches overlap, so finditer reports only the first of each overlapping pair: after a match, scanning resumes at the end of that match. You should be able to solve this by looping through the string using re.search, which is described on the same page, or by using re.findall as described here.
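A minimal sketch of the search-loop idea. Note that the module-level re.search(pattern, string) takes no start position, so this assumes a compiled pattern, whose Pattern.search(string, pos) does; the loop restarts one character after each match's start. The function name find_motif_search is hypothetical.

```python
import re


def find_motif_search(seq_dna, motif):
    """Return 1-based start positions of every occurrence, overlaps included."""
    if not motif:
        return []  # an empty pattern would match everywhere and never advance
    pattern = re.compile(re.escape(motif))  # escape in case motif has regex metacharacters
    positions = []
    pos = 0
    while True:
        m = pattern.search(seq_dna, pos)  # compiled patterns accept a start position
        if m is None:
            break
        positions.append(m.start() + 1)
        pos = m.start() + 1  # restart one base after this match's start, not its end
    return positions


print(*find_motif_search("GATATATGCATATACTT", "ATAT"))  # 2 4 10
```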

Another option is to use the third-party regex module, which is a drop-in replacement for re but, IIRC, supports overlapping matches natively.

3.8 years ago by Joe 21k
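A sketch of the regex approach, assuming the third-party regex package is installed (pip install regex); its finditer accepts overlapped=True, which re.finditer does not. So that it runs either way, the sketch falls back to a zero-width lookahead with the standard re module when regex is missing. find_motif_overlapped is a hypothetical name.

```python
import re

try:
    import regex  # third-party package: pip install regex
except ImportError:  # regex not installed; use the re-based fallback below
    regex = None


def find_motif_overlapped(seq_dna, motif):
    """Return 1-based start positions of all motif occurrences, overlaps included."""
    if regex is not None:
        # regex.finditer accepts overlapped=True; re.finditer has no such option
        hits = regex.finditer(regex.escape(motif), seq_dna, overlapped=True)
    else:
        # Fallback: a zero-width lookahead also reports overlapping starts with re
        hits = re.finditer("(?=" + re.escape(motif) + ")", seq_dna)
    return [m.start() + 1 for m in hits]


print(*find_motif_overlapped("GATATATGCATATACTT", "ATAT"))  # 2 4 10
```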

Thank you so much for your help. However, I could not understand what flags means. Is it possible to get overlapping matches by setting it to another value? I could not find that information on the documentation page.

3.8 years ago by caro-ca ▴ 20

Here is an explanation of those flags. There is no flag for overlapping matches. But the flags re.MULTILINE and re.IGNORECASE could be useful in another context.

3.8 years ago by user_without_id ▴ 150
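To illustrate that flags change how characters match rather than whether matches may overlap, here is a sketch using re.IGNORECASE on a hypothetical soft-masked sequence (lowercase bases, as used to mark repeats in some genome assemblies). The zero-width lookahead is still what makes the overlapping hits visible.

```python
import re

# Hypothetical soft-masked version of the example sequence: two bases are lowercase
seq = "GATatATGCATATACTT"

# Without the flag, only the all-uppercase occurrence matches
case_sensitive = [m.start() + 1 for m in re.finditer("(?=ATAT)", seq)]

# With re.IGNORECASE, lowercase bases match too; overlaps come from the lookahead
case_insensitive = [m.start() + 1 for m in re.finditer("(?=ATAT)", seq, flags=re.IGNORECASE)]

print(case_sensitive)    # [10]
print(case_insensitive)  # [2, 4, 10]
```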

Thank you so much, it was really helpful.

3.8 years ago by caro-ca ▴ 20

score 1 · Answer 3 · 2020-07-19

3.8 years ago

user_without_id ▴ 150

This is only a minor modification of your code using the lookahead from this answer.

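import re

OFFSET = 1  # convert 0-based indices to Rosalind's 1-based positions

def find_motif(seq_dna, motif):
    # "(?=...)" is a zero-width lookahead: each match is empty, so finditer
    # never consumes characters and can report overlapping occurrences
    return [m.start() + OFFSET for m in re.finditer("(?=" + motif + ")", seq_dna)]

print(find_motif("GATATATGCATATACTT", "ATAT"))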

Prints: [2, 4, 10]

start() and ?= were new to me. Great comment! Thank you

3.8 years ago by caro-ca ▴ 20

How can I get the result without the [] and the commas? I need the output to look like this: 2 4 10. How can I do that?

3.1 years ago by carmengozalbo200140 • 0

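[ ] means it's a list in Python: printing a list shows its representation, brackets and commas included. To get 2 4 10 you need to print the list's entries rather than the list itself, which you can do in a number of different ways.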

https://www.programiz.com/python-programming/list

3.1 years ago by Joe 21k
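For the question about dropping the brackets and commas, a minimal sketch: print(*positions) unpacks the list into separate arguments, while " ".join(...) builds the string explicitly. Both print the positions separated by single spaces, the format shown in the question.

```python
import re


def find_motif(seq_dna, motif):
    # Same lookahead approach as in the answer above; returns 1-based positions
    return [m.start() + 1 for m in re.finditer("(?=" + motif + ")", seq_dna)]


positions = find_motif("GATATATGCATATACTT", "ATAT")

print(positions)                      # [2, 4, 10]  (the list's representation)
print(*positions)                     # 2 4 10
print(" ".join(map(str, positions)))  # 2 4 10
```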