Question

Reading a multi-FASTA, slicing sequences and outputting several different files

0


8.0 years ago

djd17 • 0

Hello, I am running a simple script with Biopython that slices a nucleotide sequence into chunks of 40 nt:

x = range(100, 500, 100)  # slice start offsets: 100, 200, 300, 400

sequence = 'AAAGGG...some sequence'

# Write one 40-nt slice per start offset into a single FASTA file
with open('_SLQuery.fasta', 'w') as export_file:
    for i in x:
        name = 'query%d' % i
        query = sequence[i:i+40]
        export_file.write('>' + name + '\n' + query + '\n\n')

Currently I have to manually enter each sequence from a multi-FASTA file (500+ records) one by one and specify a new output file name each time. Any ideas how I could get this script to import the whole FASTA file at once and export the queries for each record to a different output file (23_SLQuery.fasta, 74_SLQuery.fasta, etc., where 23 and 74 are the record IDs)? I have tried SeqIO.parse, but I still only get one sequence, and I could not figure it out using SeqIO.index. Any help would be appreciated, thanks.

sequence bio python • 2.5k views

updated 8.0 years ago by natasha.sernova ★ 4.0k • written 8.0 years ago by djd17 • 0

score 0 · Answer 1 · 2016-05-07

0


8.0 years ago

natasha.sernova ★ 4.0k

See the following posts:

How To Split One Big Sequence File Into Multiple Files With Less Than 1000 Sequences In A Single File

How To Split A Multiple Fasta

There are several Python solutions here:

how to convert a long fasta-file into many separate single fasta sequences


0


Thanks, but I am not trying to split the original FASTA file. I want to read a multi-FASTA file (contig 1, 2, ... 501, etc.) and, for each sequence in the file, run the script to slice it into 40-nt chunks from 100 upstream to 500 upstream, then output those query sequences into a separate FASTA file for each sequence in the original FASTA. I apologize if I am missing something in your answer.
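One way to do what this comment describes is to loop over every record and open a new output file named after the record ID inside the loop. The sketch below is only an illustration under some assumptions: `read_fasta` and `write_queries` are hypothetical helper names, the parser is a plain-Python stand-in so the example has no dependencies, and the first word of each header line is taken as the record ID. With Biopython installed, the same loop could instead iterate over `SeqIO.parse(path, "fasta")` and use `record.id` and `str(record.seq)`. Unlike the original script, it writes a single newline after each query and skips offsets that fall beyond the end of a short record.

```python
import os


def read_fasta(path):
    """Yield (record_id, sequence) pairs from a multi-FASTA file."""
    record_id, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # A new header closes the previous record, if there was one
                if record_id is not None:
                    yield record_id, "".join(chunks)
                # Assumption: the record ID is the first word after '>'
                fields = line[1:].split()
                record_id = fields[0] if fields else ""
                chunks = []
            elif line and record_id is not None:
                chunks.append(line)
    # Emit the last record in the file
    if record_id is not None:
        yield record_id, "".join(chunks)


def write_queries(fasta_path, out_dir=".", starts=range(100, 500, 100), length=40):
    """Write one '<id>_SLQuery.fasta' file per input record; return the paths."""
    written = []
    for record_id, sequence in read_fasta(fasta_path):
        out_path = os.path.join(out_dir, "%s_SLQuery.fasta" % record_id)
        with open(out_path, "w") as export_file:
            for i in starts:
                query = sequence[i:i + length]
                if query:  # skip offsets beyond the end of a short record
                    export_file.write(">query%d\n%s\n" % (i, query))
        written.append(out_path)
    return written
```

Calling `write_queries("contigs.fasta")` (the file name is a placeholder) would then produce files such as 23_SLQuery.fasta and 74_SLQuery.fasta in the current directory, each holding the slices for one record. Record IDs containing characters that are not valid in file names would need to be sanitized first.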