Question: Change sequence in SEQIO
0
gravatar for grant.hovhannisyan
11 months ago by
grant.hovhannisyan880 wrote:

Hi Biostars, I think I have a very basic question, for which however could not find an answer in web.

I need to read fasta using SeqIO, change sequence for every record, and then write new records to file using again SeqIO.

What I do is:

from Bio import SeqIO
import re
sequences=[]

for record in SeqIO.parse("test.fasta", "fasta") :
    record.seq = re.sub('[^GATC]', "", str(record.seq).upper())
    sequences.append(record)

SeqIO.write(sequences,"output.fasta","fasta")

I get multiple errors running this code:

Traceback (most recent call last):
  File "script.py", line 15, in <module>
  SeqIO.write(sequences,"output.fasta","fasta")
  File "/home/ghovhannisyan/Software/anaconda2/lib/python2.7/site-packages/Bio/SeqIO/__init__.py", line 481, in write
  count = writer_class(fp).write_file(sequences)
  File "/home/ghovhannisyan/Software/anaconda2/lib/python2.7/site-packages/Bio/SeqIO/Interfaces.py", line 209, in write_file
count = self.write_records(records)
File "/home/ghovhannisyan/Software/anaconda2/lib/python2.7/site-packages/Bio/SeqIO/Interfaces.py", line 194, in    write_records
    self.write_record(record)
  File "/home/ghovhannisyan/Software/anaconda2/lib/python2.7/site-packages/Bio/SeqIO/FastaIO.py", line 202, in   write_record
    data = self._get_seq_string(record)  # Catches sequence being None
   File "/home/ghovhannisyan/Software/anaconda2/lib/python2.7/site-packages/Bio/SeqIO/Interfaces.py", line 100, in _get_seq_string
    % record.id)
TypeError: SeqRecord (id=TCONS_00000001) has an invalid sequence.

I can bypass SeqIO.write method by parsing list and creating file, but I am wondering what is the problem here. Thanks

seqio biopython • 620 views
ADD COMMENTlink modified 11 months ago • written 11 months ago by grant.hovhannisyan880

In order to provide closure for this thread it may be best to include @Peter's solution as an answer below (with the link attribution above) and then accept that answer.

ADD REPLYlink modified 11 months ago • written 11 months ago by genomax49k
1
gravatar for grant.hovhannisyan
11 months ago by
grant.hovhannisyan880 wrote:

Answered in StackOverflow by peterjc

"Biopython's SeqIO expects the SeqRecord object's .seq to be a Seq object (or similar), not a plain string. Try:

seq_record.seq = Seq(re.sub('[^GATC]',"",str(sequence).upper()))

For FASTA output there is no need to set the sequence's alphabet."

ADD COMMENTlink written 11 months ago by grant.hovhannisyan880
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 777 users visited in the last hour