Python function to generate kmers with a window step
21 months ago
flogin ▴ 270

Hello guys, I was reading about built-in functions in Python for working with kmers, and I found this one:

mySeq = 'AAATTAAAGACAAAATCCCAGAATGCCCG'

def getKmers(sequence, size):
    return [sequence[x:x+size].upper() for x in range(len(sequence) - size + 1)]

Called as getKmers(mySeq, 6), it returns:

['AAATTA', 'AATTAA', 'ATTAAA', 'TTAAAG', 'TAAAGA', 'AAAGAC', 'AAGACA', 'AGACAA', 'GACAAA', 'ACAAAA', 'CAAAAT', 'AAAATC', 'AAATCC', 'AATCCC', 'ATCCCA', 'TCCCAG', 'CCCAGA', 'CCAGAA', 'CAGAAT', 'AGAATG', 'GAATGC', 'AATGCC', 'ATGCCC', 'TGCCCG']

As we can see, the kmers are created with a window step of 1. How can I define a window step greater than 1, for example 3, to generate kmers like this:

['AAATTA', 'TTAAAG', 'AAGACA', 'ACAAAA', 'AAATCC', ...]

Can anyone help?

python kmer biopython fasta • 1.2k views
21 months ago
cschu181 ★ 2.6k
def getKmers(sequence, size, step):
  # "+ 1" keeps the last full-length window, as in the original function
  return [sequence[x:x+size] for x in range(0, len(sequence) - size + 1, step)]

You should probably write it as a generator, though:

def getKmers(sequence, size, step):
  # "+ 1" keeps the last full-length window, as in the original function
  for x in range(0, len(sequence) - size + 1, step):
    yield sequence[x:x+size]
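
For illustration, here is a minimal, self-contained sketch of the generator approach applied to the sequence from the question. The name kmers_with_step, the default step of 1, and the input check are assumptions added for this example, not part of the original code. The range bound uses len(sequence) - size + 1, as in the function from the question, so that the final full-length window is not dropped when it happens to fall on a step boundary.

```python
def kmers_with_step(sequence, size, step=1):
    """Yield kmers of length `size`, moving the window `step` bases at a time."""
    # Hypothetical guard: a non-positive step would make range() fail or loop oddly.
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive integers")
    # "+ 1" so that the last full-length window is included.
    for start in range(0, len(sequence) - size + 1, step):
        yield sequence[start:start + size].upper()


my_seq = 'AAATTAAAGACAAAATCCCAGAATGCCCG'

# A generator is lazy, so wrap it in list() to see all kmers at once.
print(list(kmers_with_step(my_seq, 6, 3)))
# ['AAATTA', 'TTAAAG', 'AAGACA', 'ACAAAA', 'AAATCC', 'TCCCAG', 'CAGAAT', 'AATGCC']
```

With step=1 this should give the same 24 kmers as the function in the question. Because it is a generator, it can also be iterated directly in a for loop without building the whole list in memory, which helps with long sequences.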

Thanks cschu181, that's exactly it!
