Biopython SeqIO: AttributeError: 'str' object has no attribute 'id'
1
0
7 weeks ago
mdgn ▴ 10

I am trying to filter out sequences using SeqIO but I am getting this error.

Traceback (most recent call last):
File "paralog_warning_filter.py", line 61, in <module>
.
.
.
SeqIO.write(desired_proteins, "filtered.fasta","fasta")
AttributeError: 'str' object has no attribute 'id'


I checked other similar questions but still couldn't understand what is wrong with my script.

Here is the relevant part of the script I am trying:

fh=open('lineageV_paralog_warning_genes.fasta')
for s_record in SeqIO.parse(fh,'fasta'):
    name = s_record.id
    seq = s_record.seq
    for i in paralogs_in_all:
        if name.endswith(i):
            desired_proteins=seq
            output_file=SeqIO.write(desired_proteins, "filtered.fasta","fasta")
            output_file
fh.close()


I have a separate list, paralogs_in_all, which is the source of the IDs. When I print name, it returns proper ID strings in this format: >coronopifolia_tair_real-AT2G35040.1@10.

Can you help me understand my problem? Thanks in advance.

filtering biopython fasta python sequence • 465 views
2

Bio.SeqIO.write() expects a SeqRecord object. Instead of desired_proteins, you can pass the record itself: SeqIO.write(s_record, ...).

Note: filtered.fasta will only have the last s_record in lineageV_paralog_warning_genes.fasta that is found in paralogs_in_all because filtered.fasta will be overwritten during the loop.
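To make both points concrete, here is a minimal sketch that keeps whole SeqRecord objects and calls SeqIO.write only once, so nothing is overwritten. The function name filter_records and its parameters are made up for illustration; it assumes the IDs should be matched by suffix, as in the question.

```python
from Bio import SeqIO


def filter_records(in_path, out_path, suffixes):
    """Write every record whose ID ends with one of `suffixes`.

    Returns the number of records written (hypothetical helper).
    """
    suffixes = tuple(suffixes)  # str.endswith accepts a tuple of suffixes
    with open(in_path) as handle:
        # Keep the SeqRecord objects themselves, not record.seq
        wanted = (rec for rec in SeqIO.parse(handle, "fasta")
                  if rec.id.endswith(suffixes))
        # A single call writes all matching records to one file
        return SeqIO.write(wanted, out_path, "fasta")


# Example call, using the file names from the question:
# filter_records("lineageV_paralog_warning_genes.fasta", "filtered.fasta", paralogs_in_all)
```

SeqIO.write accepts any iterable of SeqRecord objects, so the generator avoids holding all records in memory at once.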

0

Thank you, I understand it now. You were right about the overwriting as well. After fixing that, the script worked smoothly!

1

I guess you need to write s_record instead of the parsed sequence, i.e. desired_proteins. Something like this?

.........
output_file=SeqIO.write(s_record, "filtered.fasta","fasta")
.........

0

With desired_proteins=seq you are assigning only the sequence (a Seq object) here, not the whole record.
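A small sketch of the difference, in case it helps: a bare Seq carries no ID, while a SeqRecord wraps a Seq together with its ID and description. The sequence and the ID "example_id" are invented for the example.

```python
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

seq = Seq("MKTAYIAKQR")  # bare sequence: no id attribute
# Wrap the sequence in a record; the ID here is a placeholder
record = SeqRecord(seq, id="example_id", description="")

print(hasattr(seq, "id"))      # the Seq has nothing for the FASTA header
print(record.format("fasta"))  # the record can be rendered as FASTA
```

When parsing with SeqIO.parse you already have the SeqRecord (s_record), so there is usually no need to rebuild one by hand.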

2
6 weeks ago
Shred ▴ 440

You're passing the wrong object to SeqIO.write: it expects a SeqRecord, but it gets a Seq object instead (see the documentation).

The output file gets overwritten every time the if condition is true. I would suggest storing each ID and sequence as a key:value pair in a dictionary; once the loop over the records ends, you can write every stored sequence to the file. Something like:

fh=open('lineageV_paralog_warning_genes.fasta')
paralog = {}
for s_record in SeqIO.parse(fh,'fasta'):
    name = s_record.id
    seq = s_record.seq
    for i in paralogs_in_all:
        if name.endswith(i):
            paralog[name] = str(seq)
fh.close()

with open('filtered.fasta', 'w') as oput_file:
    for key in paralog.keys():
        oput_file.write(key +'\n' + paralog[key])

0

Thank you for the detailed correction. Both the suggestion above (writing SeqRecord objects and fixing the overwriting inside the for loop) and your dictionary approach solved my issue. This was the first time I used SeqIO, and now I get it!

Just for the sake of mentioning it, I changed the last line and it's a bit tidier now :)

oput_file.write('>' + key +'\n' + paralog[key] +'\n')


Lastly, and just out of curiosity, the order of the sequences is reversed in the dictionary. Can you explain the reason for this?

1

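Dictionaries in Python do not preserve insertion order if you're using a version < 3.6 (the ordering only became a language guarantee in 3.7). If you want to preserve the order of the sequences, consider updating your Python version or using an OrderedDict from the collections module.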
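A quick sketch of the OrderedDict option, with made-up gene names and sequences standing in for the parsed records:

```python
from collections import OrderedDict

# OrderedDict remembers insertion order on every Python version
paralog = OrderedDict()
for name, seq in [("gene_c", "MKT"), ("gene_a", "AYI"), ("gene_b", "AKQ")]:
    paralog[name] = seq  # placeholder IDs and sequences

# Build the FASTA text in the same order the records were stored
fasta_text = "".join(
    ">{}\n{}\n".format(name, seq) for name, seq in paralog.items()
)
print(fasta_text)
```

The rest of the dictionary approach stays the same; only the container changes.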