Counting Repeat Sequence
5
1
10.4 years ago
Takeo ▴ 10

Hello-

I want to count repeats in DNA sequences using Phython.

ex ) "AGTCATCATGTGTAAGCGTAGCATCATCATCATCATCATCATCATCATCATCCGTGAGTCAGAGAT"

1. How many times is 'CAT' repeated?

2. If 'CAT' is repeated fewer than 4 times, print 'boy'; if it is repeated 5 or more times, print 'girl' (just an example!! :-)

Finally, I hope to see something like this:

"Your 'CAT' repeat is 6,

      so, you are a girl!!!"


repeats sequence python • 7.3k views
0

Phython? No such language, so far as I know :)

0

@Takeo, welcome to Biostars.org

7
10.4 years ago
fransua ▴ 390

Perhaps the fastest way:

s="AGTCATCATGTGTAAGCGTAG*CATCATCATCATCATCATCATCATCATCAT*CCGTGAGTCAGAGA"
print('girl' if s.count('CAT') > 4 else 'boy')


EDIT: in order to find repeats specifically:

import re
print('girl' if len(re.findall('((?<=CAT)CAT)', s)) > 4 else 'boy')

0

+1 Wow, this is the best.

0

This counts all occurrences not just repeat occurrences.
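To make the difference concrete, here is a minimal sketch comparing the two counts on a short sequence. The helper names `total_occurrences` and `longest_tandem_run` are made up for illustration and do not come from any post in this thread; the sketch assumes a non-empty motif.

```python
import re

def total_occurrences(seq, motif):
    """Count every non-overlapping occurrence of motif, adjacent or not."""
    return seq.count(motif)

def longest_tandem_run(seq, motif):
    """Return the largest number of back-to-back copies of motif in seq."""
    # (?:...)+ matches one or more adjacent copies; re.escape keeps the
    # motif literal even if it contains regex metacharacters.
    pattern = re.compile("(?:%s)+" % re.escape(motif))
    runs = [len(m.group(0)) // len(motif) for m in pattern.finditer(seq)]
    return max(runs) if runs else 0

seq = "CATGGGGGCATCATCAT"
print(total_occurrences(seq, "CAT"))   # 4: every CAT in the sequence
print(longest_tandem_run(seq, "CAT"))  # 3: the longest consecutive stretch
```

Which of the two numbers is wanted depends on whether isolated copies should count as "repeats".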

0

@Farhat @Aleksandr Levchuk this is true, and so your solution is good... I also edited my post to give another solution. Thanks!

0

Great! That's a better way.

0

@fransua Thank you so much!!! I have one more question: how can I add more options? For example:

if CAT repeats < 4 times ---- boy
if CAT repeats > 4 times ---- girl
if CAT repeats > 5 times ---- blue eye girl
if CAT repeats > 6 times ---- black eye girl
if CAT repeats > 7 times ---- brown eye girl

(ALSO, JUST AN EXAMPLE!!!!)
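One possible way to handle tiered rules like these is a small table of thresholds that is checked from the most specific rule down. This is only a sketch: `RULES` and `label_for` are hypothetical names, and because the rules as written do not say what to do with exactly 4 repeats, the sketch returns "unspecified" for that case as an explicit assumption.

```python
# (exclusive lower bound, label), ordered from the most specific rule down
RULES = [
    (7, "brown eye girl"),
    (6, "black eye girl"),
    (5, "blue eye girl"),
    (4, "girl"),
]

def label_for(repeats):
    """Map a repeat count to a label using the first rule that applies."""
    for lower_bound, label in RULES:
        if repeats > lower_bound:
            return label
    if repeats < 4:
        return "boy"
    # Assumption: exactly 4 repeats is not covered by the rules above.
    return "unspecified"

for n in (2, 4, 5, 6, 7, 10):
    print(n, label_for(n))
```

Adding another category then only means adding one more row to the table, as long as the rows stay sorted from the highest bound to the lowest.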

5
10.4 years ago
Farhat ★ 2.9k

Regular expressions would be ideal for dealing with this.

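import re

patt = '(CAT)+'  # one or more back-to-back copies of CAT

string = 'asdsaCATCATCATsdaCATasa'

p = re.compile(patt)
# length in characters of every run of consecutive CATs
replen = [sp.end() - sp.start() for sp in p.finditer(string)]

# len(patt) - 3 drops the "(", ")" and "+" to get the motif length;
# the longest run in this string is 3 copies
print(max(replen) // (len(patt) - 3))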
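If the position of each run matters as well as its length, the same regular-expression idea can be extended to list every run. The following is a sketch under assumptions: `tandem_runs` and its `min_copies` parameter are illustrative names, not part of the answer above, and the motif is assumed to be non-empty.

```python
import re

def tandem_runs(seq, motif, min_copies=2):
    """Return (start, copies) for each run of at least min_copies adjacent motifs."""
    pattern = re.compile("(?:%s)+" % re.escape(motif))
    runs = []
    for m in pattern.finditer(seq):
        # Each match is a whole number of motif copies, so this divides evenly.
        copies = (m.end() - m.start()) // len(motif)
        if copies >= min_copies:
            runs.append((m.start(), copies))
    return runs

print(tandem_runs("asdsaCATCATCATsdaCATasa", "CAT"))     # [(5, 3)]
print(tandem_runs("asdsaCATCATCATsdaCATasa", "CAT", 1))  # [(5, 3), (17, 1)]
```

With `min_copies=1` the single isolated CAT is reported too, which makes it easy to see which matches are true tandem repeats.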

4
10.4 years ago

I think the question is, "was CAT repeated 4 times in a row?". That would be useful for counting tandem repeats. Using that definition, Aleksandr's code which reports 4 for "CATGGGGGCATCATCAT" wouldn't give the correct answer. Here's quick and dirty code to get the maximum number of consecutive repeats for a string:

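s = "AGTCATCATGTGTAAGCGTAG*CATCATCATCATCATCATCATCATCATCAT*CCGTGAGTCAGAGA"
search = "CAT"
n = len(search)
x = 0
reps = 0             # length of the current run of consecutive matches
last = (-1 * n) - 1  # start of the previous match (sentinel: no match yet)
maxreps = 0          # longest run seen so far
while x > -1:
    x = s.find(search, x)
    print(x)  # debug: start of each match, -1 once none remain
    if x > -1:
        if x == last + n:
            reps += 1  # this match directly follows the previous one
        else:
            reps = 1   # a new run starts here
        # keep the longest run even if a shorter run comes later
        maxreps = max(maxreps, reps)
        last = x
        x = x + n

print(maxreps)      # prints 10
print(maxreps > 4)  # prints True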

1
10.4 years ago

Here is one way to do it:

def count_repeats(seq):
    subject = "CAT"
    # split() cuts the sequence at every CAT, so the number of pieces
    # minus one is the number of occurrences (adjacent or not)
    return len(seq.split(subject)) - 1

# Testing
assert count_repeats("CATGGGGGCATCATCAT") == 4
assert count_repeats("ACATGGGGGCATCATCATGGGGGG") == 4
assert count_repeats("ACATGGGGGCATCATCAT") == 4
assert count_repeats("CATGGGGGCATCATCATGGGGGG") == 4

if count_repeats("CATGGGGGCATCATCAT") < 4:
    print("Boy")
else:
    print("Girl")

0
10.4 years ago
Eric Fournier ★ 1.4k

Is there any particular reason why you want to use Python for this? If you're dealing with Repeats, RepeatMasker is the way to go. It will detect short tandem repeats like the one you have, and tell you just how many instances of the repeated element are found.