Biopython SeqIO: AttributeError: 'str' object has no attribute 'id'
2.7 years ago
mdgn ▴ 10

I am trying to filter out sequences using SeqIO but I am getting this error.

Traceback (most recent call last):
  File "paralog_warning_filter.py", line 61, in <module>
.
.
.
    SeqIO.write(desired_proteins, "filtered.fasta","fasta")
AttributeError: 'str' object has no attribute 'id'

I checked other similar questions but still couldn't understand what is wrong with my script.

Here is the relevant part of the script I am trying:

fh=open('lineageV_paralog_warning_genes.fasta')
for s_record in SeqIO.parse(fh,'fasta'):
    name = s_record.id
    seq = s_record.seq
    for i in paralogs_in_all:
        if name.endswith(i):
            desired_proteins=seq
            output_file=SeqIO.write(desired_proteins, "filtered.fasta","fasta")
output_file
fh.close()

I have a separate list, paralogs_in_all, which is the source of the IDs. When I print name, it returns proper ID strings in this format: >coronopifolia_tair_real-AT2G35040.1@10.

Can you help me understand my problem? Thanks in advance.

filtering biopython fasta python sequence • 3.8k views

Bio.SeqIO.write() expects a SeqRecord (or an iterable of SeqRecord objects), but desired_proteins holds only the sequence. Pass the record itself instead: SeqIO.write(s_record, ...).

Note: filtered.fasta will only have the last s_record in lineageV_paralog_warning_genes.fasta that is found in paralogs_in_all because filtered.fasta will be overwritten during the loop.
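
A minimal sketch of both points, assuming Biopython is installed: collect the matching SeqRecord objects first, then call SeqIO.write once so nothing is overwritten. The sequences, the second and third IDs, the suffix list and the helper name filter_records are made up for illustration; only the first ID comes from the question.

```python
import os
import tempfile

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord


def filter_records(records, suffixes):
    """Return the records whose ID ends with any of the given suffixes."""
    wanted = tuple(suffixes)  # str.endswith accepts a tuple of suffixes
    return [rec for rec in records if rec.id.endswith(wanted)]


# Toy input: three records with made-up sequences.
toy_records = [
    SeqRecord(Seq("MKTAYIAK"), id="coronopifolia_tair_real-AT2G35040.1@10", description=""),
    SeqRecord(Seq("MSDNELLQ"), id="coronopifolia_tair_real-AT1G01010.1@3", description=""),
    SeqRecord(Seq("MAAPLVRG"), id="other_species-AT2G35040.1@10", description=""),
]
paralogs_in_all = ["AT2G35040.1@10"]  # hypothetical suffix list

with tempfile.TemporaryDirectory() as tmp:
    in_path = os.path.join(tmp, "input.fasta")
    out_path = os.path.join(tmp, "filtered.fasta")
    SeqIO.write(toy_records, in_path, "fasta")

    # Keep whole SeqRecord objects, not just record.seq.
    kept = filter_records(SeqIO.parse(in_path, "fasta"), paralogs_in_all)

    # One write call after the filtering: the file is created once with all hits.
    written = SeqIO.write(kept, out_path, "fasta")
    kept_ids = [rec.id for rec in SeqIO.parse(out_path, "fasta")]

print(written, kept_ids)
```

SeqIO.write returns the number of records it wrote, which is a handy sanity check after filtering.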


Thank you, I understand it now. You were right about the overwriting as well. After fixing that, the script worked smoothly!


I guess you need to write s_record instead of the parsed sequence, i.e. desired_proteins. Something like this?

.........
output_file=SeqIO.write(s_record, "filtered.fasta","fasta")
.........

With desired_proteins=seq you are assigning only the sequence, not the whole record.

2.7 years ago
Shred ★ 1.4k

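You're passing the wrong object to SeqIO.write: it expects a SeqRecord (or an iterable of them), and instead it gets a Seq object. Iterating over a Seq yields single-character strings, which is most likely why the error mentions a 'str' object with no 'id' attribute. See the Bio.SeqIO documentation.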
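
To see where the 'str' comes from without Biopython at hand, here is a toy stand-in. write_ids is a hypothetical function (not Biopython's actual code) that, like a FASTA writer, loops over its input and reads .id from each item. A plain string stands in for the Seq, since both yield single characters when iterated.

```python
from collections import namedtuple

# Hypothetical minimal record type; stands in for SeqRecord.
Record = namedtuple("Record", ["id", "seq"])


def write_ids(records):
    """Toy writer: iterate over the input and read .id from every item."""
    return [">" + rec.id for rec in records]


record = Record(id="coronopifolia_tair_real-AT2G35040.1@10", seq="MKTAYIAK")

# Passing a list of records works.
headers = write_ids([record])

# Passing only the sequence fails: iteration yields 'M', 'K', ... (plain str).
try:
    write_ids(record.seq)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(headers)
print(error_message)
```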

The output file is overwritten every time the if condition is true. I would suggest storing each ID and sequence as a key:value pair in a dictionary; once the loop over the records ends, you can write every stored sequence to the file. Something like:

fh=open('lineageV_paralog_warning_genes.fasta')
paralog = {}
for s_record in SeqIO.parse(fh,'fasta'):
    name = s_record.id
    seq = s_record.seq
    for i in paralogs_in_all:
        if name.endswith(i):
            paralog[name] = str(seq)
fh.close()

with open('filtered.fasta', 'w') as oput_file:
    for key in paralog.keys():
        oput_file.write(key +'\n' + paralog[key])

Thank you for the detailed correction. Both suggestions solved my issue: writing the SeqRecord and fixing the overwriting in the loop, as suggested above, and your dictionary approach. This was the first time I used SeqIO, and now I get it!

Just for the sake of mentioning it, I changed the last line, and it's a bit tidier now :)

oput_file.write('>' + key +'\n' + paralog[key] +'\n')
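
With that line in place, the writing step of the dictionary approach could be sketched as below. The helper name dict_to_fasta, the second ID and both sequences are hypothetical; io.StringIO is used only so the example runs without touching the disk.

```python
import io


def dict_to_fasta(sequences, handle):
    """Write an {id: sequence} mapping to handle as FASTA; return the count."""
    count = 0
    for name, seq in sequences.items():
        # '>' marks the header line; the trailing '\n' keeps records apart.
        handle.write(">" + name + "\n" + seq + "\n")
        count += 1
    return count


paralog = {
    "coronopifolia_tair_real-AT2G35040.1@10": "MKTAYIAK",  # made-up sequence
    "other_species-AT2G35040.1@10": "MAAPLVRG",  # made-up ID and sequence
}

buffer = io.StringIO()
n_written = dict_to_fasta(paralog, buffer)
fasta_text = buffer.getvalue()
print(fasta_text, end="")
```

With a real file, the same function can be given the handle from open('filtered.fasta', 'w') instead of the in-memory buffer.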

Lastly, just out of curiosity: the order of the sequences is reversed in the dictionary. Can you explain the reason for this?


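Dictionaries in Python do not preserve insertion order in versions before 3.6 (ordering is an implementation detail of CPython 3.6 and is guaranteed by the language from 3.7). If you want to preserve the order of the sequences, consider updating your Python version or using a collections.OrderedDict.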
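
A small sketch of the OrderedDict option, with made-up IDs and sequences: the keys come back in the order they were inserted, regardless of the Python version.

```python
from collections import OrderedDict

# Hypothetical (id, sequence) pairs in the order they appear in the input file.
parsed = [
    ("gene_c@1", "MKT"),
    ("gene_a@2", "MSD"),
    ("gene_b@3", "MAA"),
]

paralog = OrderedDict()
for name, seq in parsed:
    paralog[name] = seq  # insertion order is remembered

ordered_ids = list(paralog.keys())
print(ordered_ids)
```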
