Question: Update Multiple Sequences From Fasta File And Write Them As A New Fasta In Biopython
1
gravatar for Abdullah
8.2 years ago by
Abdullah100
Germany
Abdullah100 wrote:

Hi,

I have a set of sequences in a fasta file, i read them into biopython and then i want to make some changes to them (adding or replacing or deleting) and then i want to save them to a new fastafile.

what i'm doing in the example is a small shifting of 5 nucleotides :

>     allrecords=list(SeqIO.parse("test.fasta", 'fasta'))
>             fullseq=0
>             tochangeseq =0 
>              oldfullseq=0     
> 
> for y in range(len(allrecords)):
> 
> 
>           oldfullseq=allrecords[y].seq
>             tochange=5        #might be dynamically included by another for loop
>             tochangeseq = allrecords[y].seq[0:tochange]
>             fullseq = oldfullseq[tochange:len(oldfullseq)]
>             fullseq += tochangeseq
>             allrecords[y].seq = fullseq
> print allrecords[0].seq    ### it should print the new updated sequence, but it's not

but the problem is when i print the sequences after the loop, i found that they did not changed, how can i update them and then write the new version to a fasta file.

help would be appreciated

biopython • 3.2k views
ADD COMMENTlink modified 8 months ago by Biostar ♦♦ 20 • written 8.2 years ago by Abdullah100
2

The code for the sequence manipulation is a bit convoluted. You can just do this:

newSeq = allrecords[y].seq[tochange:] + allrecords[y].seq[:tochange]

allrecords[y].seq = newSeq

But I don't see anything wrong with it. It looks like you deleted a lot of the code. Can you post the full code?

ADD REPLYlink modified 8.2 years ago • written 8.2 years ago by Damian Kao15k

Hi, when i try your method, i get Memory error .. besides .. that's the whole code .. i just want to update the sequences and output the new ones to a new fasta file .. but it's not working

ADD REPLYlink written 8.2 years ago by Abdullah100

Just do this then:

tochange = 5
for record in SeqIO.parse('test.fasta','fasta')):
   seq = str(record.seq)
   print '>' + record.description
   print seq[tochange:] + seq[:tochange]

pipe the results to a new file.

ADD REPLYlink modified 8.2 years ago • written 8.2 years ago by Damian Kao15k

i think this not a good idea to do this .. i want to fill the new sequences inside an array because i might need them for other computational stuff ...the question is why my method is not working, if i print full seq it prints the new sequence but it's not being assigned to the record object ..

ADD REPLYlink written 8.2 years ago by Abdullah100
2

Honestly I don't really know why it doesn't work for you. That's why I thought maybe you didn't post the full code. I tried this:

from Bio import SeqIO

allRecords = list(SeqIO.parse('test.fa','fasta'))

tochange = 5
for i in range(len(allRecords)):
    newSeq = allRecords[i].seq[tochange:] + '.' + allRecords[i].seq[:tochange]
    allRecords[i].seq = newSeq

print allRecords[0].seq
print allRecords[1].seq

On this test.fa:

>test1
AGTGCTAGCTAGCTATTATATGCGACTGATCGTACTGGTAGCTGGTCA
>test2
ACGATCGTAGTCTTGATGCTAGTTTATATGGTCGTAGTCAGTCGTACA

And it worked for me. Giving me this:

TAGCTAGCTATTATATGCGACTGATCGTACTGGTAGCTGGTCA.AGTGC
CGTAGTCTTGATGCTAGTTTATATGGTCGTAGTCAGTCGTACA.ACGAT
ADD REPLYlink modified 8.2 years ago • written 8.2 years ago by Damian Kao15k

It's not a good idea to work on large sequences (which I assume you are doing) and store them all in-memory. This is why Damian's previously suggested method resulted in a memory error for you.

On another note, I also tried your posted code with a smaller mock fasta file, it works just fine. It prints the new sequences, and when I inspected the SeqRecord objects, their sequences have changed as well. Have you posted the entire code?

ADD REPLYlink written 8.2 years ago by bow790
1

Hi. I don't see the problem here. Your code works for me...

>>> allrecords[0].seq

Seq('GGAACGAGTTATCTGAACCTTCGGGGGACGATAACGGCGTCGAGCGGCGGACGG...AGC', SingleLetterAlphabet())

>>>for y in range(len(allrecords)):
    oldfullseq=allrecords[y].seq
    tochange=5    
    tochangeseq = allrecords[y].seq[0:tochange]
    fullseq = oldfullseq[tochange:len(oldfullseq)]
    fullseq += tochangeseq
    allrecords[y].seq = fullseq

.

>>>allrecords[0].seq

Seq('GAGTTATCTGAACCTTCGGGGGACGATAACGGCGTCGAGCGGCGGACGGGTGAG...AAC',SingleLetterAlphabet())

ADD REPLYlink modified 8.2 years ago • written 8.2 years ago by Manu Prestat4.0k

Hi, thank you all for your help, i found the error .. it's working now .. and your codes also works ..

ADD REPLYlink written 8.2 years ago by Abdullah100

You forgot to open the 'test.fasta' file:) Try: allrecords = list(SeqIO.parse(open("test.fasta"), "fasta"))

ADD REPLYlink modified 8.2 years ago • written 8.2 years ago by a.zielezinski9.4k

No, in modern Biopython, SeqIO.parse can take a filename as a string or a file handle

ADD REPLYlink written 8.2 years ago by Ben2.0k

good to know. thanks

ADD REPLYlink written 8.2 years ago by a.zielezinski9.4k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1037 users visited in the last hour
_