Question: Change sequence in SEQIO
0
gravatar for grant.hovhannisyan
16 months ago by
grant.hovhannisyan1.2k wrote:

Hi Biostars, I think I have a very basic question, for which however could not find an answer in web.

I need to read fasta using SeqIO, change sequence for every record, and then write new records to file using again SeqIO.

What I do is:

from Bio import SeqIO
import re
sequences=[]

for record in SeqIO.parse("test.fasta", "fasta") :
    record.seq = re.sub('[^GATC]', "", str(record.seq).upper())
    sequences.append(record)

SeqIO.write(sequences,"output.fasta","fasta")

I get multiple errors running this code:

Traceback (most recent call last):
  File "script.py", line 15, in <module>
  SeqIO.write(sequences,"output.fasta","fasta")
  File "/home/ghovhannisyan/Software/anaconda2/lib/python2.7/site-packages/Bio/SeqIO/__init__.py", line 481, in write
  count = writer_class(fp).write_file(sequences)
  File "/home/ghovhannisyan/Software/anaconda2/lib/python2.7/site-packages/Bio/SeqIO/Interfaces.py", line 209, in write_file
count = self.write_records(records)
File "/home/ghovhannisyan/Software/anaconda2/lib/python2.7/site-packages/Bio/SeqIO/Interfaces.py", line 194, in    write_records
    self.write_record(record)
  File "/home/ghovhannisyan/Software/anaconda2/lib/python2.7/site-packages/Bio/SeqIO/FastaIO.py", line 202, in   write_record
    data = self._get_seq_string(record)  # Catches sequence being None
   File "/home/ghovhannisyan/Software/anaconda2/lib/python2.7/site-packages/Bio/SeqIO/Interfaces.py", line 100, in _get_seq_string
    % record.id)
TypeError: SeqRecord (id=TCONS_00000001) has an invalid sequence.

I can bypass SeqIO.write method by parsing list and creating file, but I am wondering what is the problem here. Thanks

seqio biopython • 960 views
ADD COMMENTlink modified 16 months ago • written 16 months ago by grant.hovhannisyan1.2k

In order to provide closure for this thread it may be best to include @Peter's solution as an answer below (with the link attribution above) and then accept that answer.

ADD REPLYlink modified 16 months ago • written 16 months ago by genomax58k
1
gravatar for grant.hovhannisyan
16 months ago by
grant.hovhannisyan1.2k wrote:

Answered in StackOverflow by peterjc

"Biopython's SeqIO expects the SeqRecord object's .seq to be a Seq object (or similar), not a plain string. Try:

seq_record.seq = Seq(re.sub('[^GATC]',"",str(sequence).upper()))

For FASTA output there is no need to set the sequence's alphabet."

ADD COMMENTlink written 16 months ago by grant.hovhannisyan1.2k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 818 users visited in the last hour