Python function to generate k-mers with a window (step) size
4.3 years ago
flogin ▴ 280

Hello guys, I was reading about ways to work with k-mers in Python, and I found this function:

mySeq = 'AAATTAAAGACAAAATCCCAGAATGCCCG'

def getKmers(sequence, size):
    return [sequence[x:x+size].upper() for x in range(len(sequence) - size + 1)]

Calling getKmers(mySeq, 6) returns:

['AAATTA', 'AATTAA', 'ATTAAA', 'TTAAAG', 'TAAAGA', 'AAAGAC', 'AAGACA', 'AGACAA', 'GACAAA', 'ACAAAA', 'CAAAAT', 'AAAATC', 'AAATCC', 'AATCCC', 'ATCCCA', 'TCCCAG', 'CCCAGA', 'CCAGAA', 'CAGAAT', 'AGAATG', 'GAATGC', 'AATGCC', 'ATGCCC', 'TGCCCG']

As we can see, the k-mers are generated with a window step of 1. How can I use a step greater than 1, for example 3, to generate k-mers like this:

['AAATTA', 'TTAAAG', 'AAGACA', 'ACAAAA', 'AAATCC', ...]

Can anyone help?

python kmer biopython fasta • 3.1k views
4.3 years ago
cschu181 ★ 2.8k
def getKmers(sequence, size, step):
  # + 1 so a k-mer ending exactly at the last base is not skipped
  return [sequence[x:x+size] for x in range(0, len(sequence) - size + 1, step)]

You should probably write it as a generator, though:

def getKmers(sequence, size, step):
  # + 1 so a k-mer ending exactly at the last base is not skipped
  for x in range(0, len(sequence) - size + 1, step):
    yield sequence[x:x+size]
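
For a quick check against the sequence from the question, here is a minimal self-contained sketch of the stepped generator. The name `stepped_kmers`, the default `step=1`, the argument validation, and the `.upper()` call (carried over from the question's function) are illustrative choices, not part of the answer as posted. The upper bound `len(sequence) - size + 1` is used so that a window ending exactly at the last base is still yielded.

```python
def stepped_kmers(sequence, size, step=1):
    """Yield k-mers of length `size`, advancing `step` bases each time."""
    # Hypothetical helper for illustration; validation is an added assumption.
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    # + 1 so that a window ending exactly at the last base is included.
    for start in range(0, len(sequence) - size + 1, step):
        yield sequence[start:start + size].upper()


my_seq = "AAATTAAAGACAAAATCCCAGAATGCCCG"
kmers = list(stepped_kmers(my_seq, 6, 3))
print(kmers)
# ['AAATTA', 'TTAAAG', 'AAGACA', 'ACAAAA', 'AAATCC', 'TCCCAG', 'CAGAAT', 'AATGCC']
```

Note that trailing bases that do not fill a complete window are dropped: here the final `CG` is never the start of a 6-mer. The `+ 1` matters when the sequence length lines up with the step, e.g. `list(stepped_kmers("ACGTAC", 3, 3))` gives `['ACG', 'TAC']`, whereas a bound of `len(sequence) - size` would return only `['ACG']`.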

Thanks cschu181, that's exactly it!

14 months ago

Consider using this fast parser:

https://github.com/moorembioinfo/KmerAperture/tree/main/parser
