Question: Change sequence in SeqIO
grant.hovhannisyan wrote, 2.4 years ago:

Hi Biostars, I think I have a very basic question, but I could not find an answer on the web.

I need to read a FASTA file with SeqIO, change the sequence of every record, and then write the new records to a file, again with SeqIO.

What I do is:

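from Bio import SeqIO
import re

records = []
for record in SeqIO.parse("test.fasta", "fasta"):
    # Upper-case the sequence and drop every character that is not G, A, T or C
    record.seq = re.sub('[^GATC]', "", str(record.seq).upper())
    records.append(record)

SeqIO.write(records, "output.fasta", "fasta")  # output filename is a placeholder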
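The substitution itself is plain Python and can be checked in isolation. A minimal sketch, assuming Python 3; `clean_dna` and `NON_ACGT` are hypothetical names used only for illustration. Upper-casing happens first because the character class lists only uppercase letters, so lowercase bases would otherwise be deleted too.

```python
import re

# Matches any single character that is not an uppercase G, A, T or C
# (hypothetical helper name, not part of Biopython).
NON_ACGT = re.compile("[^GATC]")


def clean_dna(sequence: str) -> str:
    """Upper-case the sequence, then delete every non-ACGT character."""
    return NON_ACGT.sub("", sequence.upper())


print(clean_dna("acgtNNx-g"))  # ACGTG
```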


Running this code gives the following traceback:

Traceback (most recent call last):
  File "", line 15, in <module>
  File "/home/ghovhannisyan/Software/anaconda2/lib/python2.7/site-packages/Bio/SeqIO/", line 481, in write
    count = writer_class(fp).write_file(sequences)
  File "/home/ghovhannisyan/Software/anaconda2/lib/python2.7/site-packages/Bio/SeqIO/", line 209, in write_file
    count = self.write_records(records)
  File "/home/ghovhannisyan/Software/anaconda2/lib/python2.7/site-packages/Bio/SeqIO/", line 194, in write_records
  File "/home/ghovhannisyan/Software/anaconda2/lib/python2.7/site-packages/Bio/SeqIO/", line 202, in write_record
    data = self._get_seq_string(record)  # Catches sequence being None
  File "/home/ghovhannisyan/Software/anaconda2/lib/python2.7/site-packages/Bio/SeqIO/", line 100, in _get_seq_string
TypeError: SeqRecord (id=TCONS_00000001) has an invalid sequence.

I can bypass the SeqIO.write method by building a list myself and writing the file manually, but I am wondering what the problem is here. Thanks
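For reference, the manual workaround mentioned in the question could look roughly like this. It is a minimal pure-Python sketch, assuming Python 3 and records given as (identifier, sequence) string pairs; `write_fasta` and the 60-character wrap width are hypothetical choices, not part of Biopython.

```python
import io
from typing import Iterable, TextIO, Tuple


def write_fasta(records: Iterable[Tuple[str, str]], handle: TextIO, width: int = 60) -> int:
    """Write (identifier, sequence) pairs as FASTA and return how many were written."""
    count = 0
    for identifier, sequence in records:
        handle.write(f">{identifier}\n")
        # Wrap the sequence into lines of at most `width` characters.
        for start in range(0, len(sequence), width):
            handle.write(sequence[start:start + width] + "\n")
        count += 1
    return count


out = io.StringIO()
n = write_fasta([("s1", "ACGTG"), ("s2", "A" * 65)], out)
print(n)  # 2
print(out.getvalue(), end="")
```

This works, but it reimplements what SeqIO.write already does, so it is only worth keeping if the Biopython writer cannot be used.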


In order to provide closure for this thread, it may be best to include @Peter's solution as an answer below (with the link attribution above) and then accept that answer.

(comment by genomax, 2.3 years ago)
grant.hovhannisyan wrote, 2.3 years ago:

Answered on Stack Overflow by peterjc:

"Biopython's SeqIO expects the SeqRecord object's .seq to be a Seq object (or similar), not a plain string. Try:

from Bio.Seq import Seq
seq_record.seq = Seq(re.sub('[^GATC]', "", str(seq_record.seq).upper()))

For FASTA output there is no need to set the sequence's alphabet."
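Putting the fix into the original read, clean and write loop gives something like the following. It is a minimal sketch, assuming Python 3 and a recent Biopython; `clean_fasta` is a hypothetical helper, and in-memory handles stand in for real files so the example is self-contained. The only substantive change from the question's code is wrapping the cleaned string in `Seq(...)`.

```python
import io
import re

from Bio import SeqIO
from Bio.Seq import Seq


def clean_fasta(in_handle, out_handle) -> int:
    """Strip non-ACGT characters from every record and write the result as FASTA."""
    def cleaned():
        for record in SeqIO.parse(in_handle, "fasta"):
            # Wrap the cleaned string in a Seq object; assigning a plain str
            # is what makes SeqIO.write raise "has an invalid sequence".
            record.seq = Seq(re.sub("[^GATC]", "", str(record.seq).upper()))
            yield record

    # SeqIO.write accepts any iterable of SeqRecords and returns the count written.
    return SeqIO.write(cleaned(), out_handle, "fasta")


source = io.StringIO(">s1\nacgtNNx-g\n>s2\nGGaaNt\n")
target = io.StringIO()
count = clean_fasta(source, target)
print(count)  # 2
print(target.getvalue(), end="")
```

Because `cleaned()` is a generator, records are streamed one at a time instead of being held in a list. SeqIO.parse and SeqIO.write also accept filenames, so with real data you could pass "test.fasta" and an output path of your choice instead of the StringIO objects.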
