Counting Repeat Sequence
12.7 years ago
Takeo ▴ 10

Hello-

I want to count repeats in DNA sequences using Phython.

ex ) "AGTCATCATGTGTAAGCGTAGCATCATCATCATCATCATCATCATCATCATCCGTGAGTCAGAGAT"

  1. How many times is 'CAT' repeated?

  2. If 'CAT' is repeated fewer than 4 times, print 'boy'; if 'CAT' is repeated 5 or more times, print 'girl' (just an example!! :-)

Finally, I hope to see something like this:

"Your 'CAT' repeat is 6,

      so, you are a girl!!!"

Please tell me and help me with how to write this source code...

repeats sequence python • 9.4k views

Phython? No such language, so far as I know :)


@Takeo, welcome to Biostars.org

12.7 years ago
fransua ▴ 390

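Perhaps the fastest way:

s = "AGTCATCATGTGTAAGCGTAGCATCATCATCATCATCATCATCATCATCATCCGTGAGTCAGAGA"
# str.count counts every non-overlapping occurrence of 'CAT', adjacent or not
print('girl' if s.count('CAT') > 4 else 'boy')

EDIT: to count only tandem repeats specifically:

import re
# counts each 'CAT' that is immediately preceded by another 'CAT'
print('girl' if len(re.findall('(?<=CAT)CAT', s)) > 4 else 'boy')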

+1 Wow, this is the best.


This counts all occurrences not just repeat occurrences.
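
To make the difference concrete, here is a small sketch (the test sequence is borrowed from another answer in this thread; the variable names are only illustrative). It compares the total number of occurrences with the number of copies that directly follow another copy:

```python
import re

seq = "CATGGGGGCATCATCAT"
motif = "CAT"

# Every non-overlapping occurrence, wherever it sits in the sequence.
total = seq.count(motif)

# Only copies that directly follow another copy (lookbehind on the motif).
pattern = "(?<=" + re.escape(motif) + ")" + re.escape(motif)
adjacent = len(re.findall(pattern, seq))

print(total, adjacent)
```

The isolated CAT at the start contributes to the first number but not to the second, so the two counts disagree whenever the motif also occurs outside a tandem run.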


@Farhat @Aleksandr Levchuk this is true, and so your solution is good... I also edited my post to give another solution. Thanks


Great! That's a better way.


@fransua Thank you so much!!! I have one more question!! If I want to add some options, how can I do that? e.g.

if CAT repeat < 4 times ---- boy
if CAT repeat > 4 times ---- girl
if CAT repeat > 5 times ---- blue eye girl
if CAT repeat > 6 times ---- black eye girl
if CAT repeat > 7 times ---- brown eye girl

(ALSO, JUST AN EXAMPLE!!!!)
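
One possible way to handle several options like these is a small table of thresholds checked from the highest down, so the most specific label wins. This is only a sketch: the names `RULES`, `label_for` and `report` are made up for illustration, and since the list above does not say what exactly 4 repeats should give, the sketch returns "unspecified" for that case as an assumption.

```python
# (threshold, label) pairs: a label applies when the repeat count is
# strictly greater than its threshold. Checked from the highest down,
# so the most specific label wins.
RULES = [
    (7, "brown eye girl"),
    (6, "black eye girl"),
    (5, "blue eye girl"),
    (4, "girl"),
]


def label_for(repeats):
    """Return the example label for a given repeat count."""
    for threshold, label in RULES:
        if repeats > threshold:
            return label
    if repeats < 4:
        return "boy"
    # Exactly 4 is not covered by the rules in the question (assumption).
    return "unspecified"


def report(repeats, motif="CAT"):
    """Format the message asked for in the original question."""
    return "Your '%s' repeat is %d, so you are a %s!" % (
        motif, repeats, label_for(repeats))


print(report(6))
```

Adding another option then only means adding one more pair to `RULES`, kept in descending order of threshold.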

12.7 years ago
Farhat ★ 2.9k

Regular expressions would be ideal for dealing with this.

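import re

motif = 'CAT'
patt = '(?:' + motif + ')+'  # one or more consecutive copies of the motif

string = 'asdsaCATCATCATsdaCATasa'

p = re.compile(patt)
# length in characters of every tandem run of the motif
replen = [sp.end() - sp.start() for sp in p.finditer(string)]

# longest run, expressed as a number of repeats (0 if the motif never occurs)
print(max(replen, default=0) // len(motif))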
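
If the position of each run matters as well as its length, the same regular-expression idea could be wrapped in a couple of helpers. This is a sketch under assumptions: `tandem_runs` and `longest_run` are hypothetical names, and the motif is treated as a plain literal string (hence `re.escape`).

```python
import re


def tandem_runs(seq, motif):
    """Return (start, repeats) for every run of consecutive motif copies."""
    pattern = re.compile("(?:" + re.escape(motif) + ")+")
    return [(m.start(), (m.end() - m.start()) // len(motif))
            for m in pattern.finditer(seq)]


def longest_run(seq, motif):
    """Largest number of consecutive copies, or 0 if the motif is absent."""
    return max((n for _, n in tandem_runs(seq, motif)), default=0)


runs = tandem_runs("asdsaCATCATCATsdaCATasa", "CAT")
print(runs)
print(longest_run("asdsaCATCATCATsdaCATasa", "CAT"))
```

Returning 0 for a sequence without the motif avoids the `ValueError` that `max` raises on an empty list.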
12.7 years ago

I think the question is, "was CAT repeated 4 times in a row?". That would be useful for counting tandem repeats. Using that definition, Aleksandr's code, which reports 4 for "CATGGGGGCATCATCAT", wouldn't give the correct answer. Here's some quick and dirty code to get the maximum number of consecutive repeats in a string:

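s = "AGTCATCATGTGTAAGCGTAGCATCATCATCATCATCATCATCATCATCATCCGTGAGTCAGAGA"
search = "CAT"
n = len(search)
x = 0
reps = 0
last = -n - 1  # sentinel so the first match never looks adjacent to a previous one
maxreps = 0
while x > -1:
    x = s.find(search, x)
    if x > -1:
        if x == last + n:
            # this match starts right where the previous one ended
            reps += 1
        else:
            # gap since the previous match: start a new run
            reps = 1
        # keep the longest run seen so far, even if a shorter run follows it
        if reps > maxreps:
            maxreps = reps
        last = x
        x = x + n

print(maxreps)      # returns 10
print(maxreps > 4)  # returns True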
12.7 years ago

Here is one way to do it:

def count_repeats(seq):
    # Counts every occurrence of the subject, whether or not they are adjacent
    subject = "CAT"
    return len(seq.split(subject)) - 1

# Testing
assert count_repeats("CATGGGGGCATCATCAT") == 4
assert count_repeats("ACATGGGGGCATCATCATGGGGGG") == 4
assert count_repeats("ACATGGGGGCATCATCAT") == 4
assert count_repeats("CATGGGGGCATCATCATGGGGGG") == 4

if count_repeats("CATGGGGGCATCATCAT") < 4:
    print("Boy")
else:
    print("Girl")
12.7 years ago
Eric Fournier ★ 1.4k

Is there any particular reason why you want to use Python for this? If you're dealing with repeats, RepeatMasker is the way to go. It will detect short tandem repeats like the one you have, and tell you just how many instances of the repeated element are found.
