I'm trying to write a Python script that uses a sliding window. Here is the code:

    v = open("ex.fasta", "r")

    def sliding_window(sequence, winSize, step):
        numOfChunks = ((len(sequence)-winSize)/step)+1
        for i in range(0,numOfChunks*step,step):
            yield sequence[i:i+winSize]

    size = int(14786)
    w = 500
    while size > w:
        for line in v:
            if not line.startswith(">"):
                myseq = line.rstrip()
                myvect = sliding_window(myseq, 500, 500)
                for r in myvect:
                    print(r)

I want it to produce chunks of the sequence with a window size of 500 and a step size of 500, i.e. no overlap. The trouble is that every line in the FASTA file is 76 bp long, except the last, which is ~40 bp. Any window size <= 76 produces the desired outcome; anything > 76 does not work. I've also tried building a single string from the whole sequence (rather than a list), but it still does not work. Any help is appreciated.
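For reference, the loop above applies the window to each 76-character line separately, so a window wider than one line never fits. Below is a minimal sketch of the behaviour being asked for, assuming the file holds a single record and that a shorter final chunk should be kept (the original function drops it). The helper `read_sequence`, the in-memory `demo` data, and the small window size of 5 are illustrative assumptions, not part of the original script.

```python
import io


def read_sequence(handle):
    """Join all non-header lines of a single-record FASTA handle into one string."""
    return "".join(line.strip() for line in handle if not line.startswith(">"))


def sliding_window(sequence, win_size, step):
    """Yield slices of up to win_size characters, starting every step characters.

    With step == win_size the slices do not overlap; the last one may be shorter.
    """
    for start in range(0, len(sequence), step):
        yield sequence[start:start + win_size]


# Hypothetical stand-in for ex.fasta: short lines, with a shorter last line.
demo = io.StringIO(">seq1\nACGTACGT\nACGTAC\nGG\n")

seq = read_sequence(demo)                 # whole sequence as one string
chunks = list(sliding_window(seq, 5, 5))  # window 5, step 5 (500/500 in the real case)

for chunk in chunks:
    print(chunk)
```

With the real file, the same idea would be `open("ex.fasta")` in place of `demo` and `sliding_window(seq, 500, 500)`. The windows are cut from the joined sequence, so line length no longer limits the window size.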