Change sequence in SEQIO
1
0
Entering edit mode
4.3 years ago

Hi Biostars, I think I have a very basic question, for which however could not find an answer in web.

I need to read fasta using SeqIO, change sequence for every record, and then write new records to file using again SeqIO.

What I do is:

from Bio import SeqIO
import re
sequences=[]

for record in SeqIO.parse("test.fasta", "fasta") :
record.seq = re.sub('[^GATC]', "", str(record.seq).upper())
sequences.append(record)

SeqIO.write(sequences,"output.fasta","fasta")


I get multiple errors running this code:

Traceback (most recent call last):
File "script.py", line 15, in <module>
SeqIO.write(sequences,"output.fasta","fasta")
File "/home/ghovhannisyan/Software/anaconda2/lib/python2.7/site-packages/Bio/SeqIO/__init__.py", line 481, in write
count = writer_class(fp).write_file(sequences)
File "/home/ghovhannisyan/Software/anaconda2/lib/python2.7/site-packages/Bio/SeqIO/Interfaces.py", line 209, in write_file
count = self.write_records(records)
File "/home/ghovhannisyan/Software/anaconda2/lib/python2.7/site-packages/Bio/SeqIO/Interfaces.py", line 194, in    write_records
self.write_record(record)
File "/home/ghovhannisyan/Software/anaconda2/lib/python2.7/site-packages/Bio/SeqIO/FastaIO.py", line 202, in   write_record
data = self._get_seq_string(record)  # Catches sequence being None
File "/home/ghovhannisyan/Software/anaconda2/lib/python2.7/site-packages/Bio/SeqIO/Interfaces.py", line 100, in _get_seq_string
% record.id)
TypeError: SeqRecord (id=TCONS_00000001) has an invalid sequence.


I can bypass SeqIO.write method by parsing list and creating file, but I am wondering what is the problem here. Thanks

SeqIO Biopython • 3.4k views
0
Entering edit mode

In order to provide closure for this thread it may be best to include @Peter's solution as an answer below (with the link attribution above) and then accept that answer.

2
Entering edit mode
4.3 years ago

"Biopython's SeqIO expects the SeqRecord object's .seq to be a Seq object (or similar), not a plain string. Try:

seq_record.seq = Seq(re.sub('[^GATC]',"",str(sequence).upper()))


For FASTA output there is no need to set the sequence's alphabet."