Python: Slicing Sequences In Fasta File
10.7 years ago

I want to slice the sequences in a FASTA file. I take the first three sequences (I must calculate the length of each one). For example, I have these three sequences, and I want to divide each of them into sub-sequences of the same length.

That is, the length of the first is 28, the second is 39, and the third is 46. I divide each sequence into pieces of 9: 28/9 = 3 with a remainder of 1, so the last sub-sequence contains only one base, 'G', and in this case I must pad it with the character '-'. Likewise 39/9 = 4 (handled the same way as the first sequence) and 46/9 = 5 (the same).

>gi|2765658|emb|Z78533.1|CIZ78533 C.irapeanum 5.8S rRNA gene and ITS1 and ITS2 DNA
CGTAACAAGGTTTCCGTAGGTGAACCTG

>gi|2765657|emb|Z78532.1|CCZ78532 C.californicum 5.8S rRNA gene and ITS1 and ITS2 DNA
CGTAACAAGGTTTCCGTAGGTGAACCTGCGGAAGGATC


>gi|2765646|emb|Z78521.1|CCZ78521 C.calceolus 5.8S rRNA gene and ITS1 and ITS2 DNA
GTAGGTGAACCTGCGGAAGGATCATTGTTGAGACAGTAGAATATAT

Then I take the first three sub-sequences from each sequence:

CGTAACAAG GTTTCCGTA GGTGAACCT 
CGTAACAAG GTTTCCGTA GGTGAACCT
GTAGGTGAA CCTGCGGAA GGATCATTG

Then I apply some function to each sub-sequence:

function1('CGTAACAAG'), function1('GTTTCCGTA'), ...

The same thing with

function2

I want to apply this to all the sequences in the FASTA file, taking three sequences at a time.

What can I do?

10.7 years ago

From your example it is unclear what the role of the - character is; your output does not seem to show it. Splitting a string into fixed-size pieces could be done like so:

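step = 9
seq = "CGTAACAAGGTTTCCGTAGGTGAACCTGCGGAAGGATC"

# Collect each full-length piece; integer division (//) drops
# any trailing remainder shorter than step ("TC" in this case).
parts = []
for i in range(len(seq) // step):
    sub = seq[i * step: (i + 1) * step]
    parts.append(sub)

print(parts)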

would print

['CGTAACAAG', 'GTTTCCGTA', 'GGTGAACCT', 'GCGGAAGGA']
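
If the short trailing piece should be kept and padded with '-' as the question describes, something along these lines might work. This is only a sketch under some assumptions: `read_fasta`, `split_padded` and `batches` are helper names made up for this example, `function1` is a stand-in (it just counts G and C) for whatever the real function does, and the headers are shortened to the accession numbers. The records are read from an in-memory string here; an open file handle would be used the same way.

```python
from io import StringIO
from itertools import islice


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted file object."""
    header, lines = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(lines)
            header, lines = line[1:], []
        else:
            lines.append(line)
    if header is not None:
        yield header, "".join(lines)


def split_padded(seq, step=9, pad="-"):
    """Split seq into pieces of length step; pad the last piece with pad."""
    parts = [seq[i:i + step] for i in range(0, len(seq), step)]
    if parts and len(parts[-1]) < step:
        parts[-1] = parts[-1].ljust(step, pad)
    return parts


def batches(iterable, size=3):
    """Yield lists of up to size items, e.g. three records at a time."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def function1(sub):
    # Hypothetical placeholder for the asker's function1: counts G and C.
    return sub.count("G") + sub.count("C")


# The three sequences from the question, with shortened headers.
example = StringIO(""">Z78533.1
CGTAACAAGGTTTCCGTAGGTGAACCTG

>Z78532.1
CGTAACAAGGTTTCCGTAGGTGAACCTGCGGAAGGATC

>Z78521.1
GTAGGTGAACCTGCGGAAGGATCATTGTTGAGACAGTAGAATATAT
""")

results = {}
for group in batches(read_fasta(example), 3):
    for header, seq in group:
        first_three = split_padded(seq)[:3]  # first three sub-sequences
        results[header] = [function1(sub) for sub in first_three]
        print(header, first_three, results[header])
```

A second function (function2 in the question) could be applied in the same loop. For real data, a FASTA parser from an established library such as Biopython's SeqIO is probably a safer choice than the hand-written reader above.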