AttributeError: 'list' object has no attribute 'SeqRecord' - Slice multiple sequences with Biopython>SeqIO from fasta file
2.8 years ago
coyot001 ▴ 10

I am trying to generate N- and C-terminal slices of varying length (1, 2, 3, 4, 5, 6, 7 residues). But before I get there, I am having problems just reading in my FASTA file. I was following the 'Random subsequences' tutorial from https://biopython.org/wiki/SeqIO, but that example uses only one sequence, so maybe that is where I went wrong. Below are my code with example sequences and the error I get. Any help would be much appreciated. I am clearly out of my depth. Thanks!

Two example sequences in my file domains.fasta:

>GA98
TTYKLILNLKQAKEEAIKELVDAGTAEKYFKLIANAKTVEGVWTLKDEIKTFTVTE
>GB98
TTYKLILNLKQAKEEAIKELVDAGTAEKYFKLIANAKTVEGVWTYKDEIKTFTVTE


my code that is not working:

from Bio import SeqIO
from Bio.SeqRecord import SeqRecord

domains = list(SeqIO.parse("domains.fa",'fasta'))

#set up receiving arrays
home=[]
num=1

#slice data
for i in range(0, 6):
    num = num+1
    domain = domains
    seq_n = domains.seq[0:num]
    seq_c = domains.seq[len(domain)-num:len(domain)]
    name = domains.id
    record_d = SeqRecord(domain,'%s' % (name), '', '')
    home.append(record_d)
    record_n = SeqRecord(seq_n,'%s_n_%i' % (name,num), '', '')
    home.append(record_n)
    record_c = SeqRecord(seq_c,'%s_c_%i' % (name,num), '', '')
    home.append(record_c)
SeqIO.write(home, "domains_variants.fasta", "fasta")


error I get is:

Traceback (most recent call last):
  File "~/fasta_nc_sequences.py", line 20, in <module>
    seq_n = domains.seq[0:num]
AttributeError: 'list' object has no attribute 'SeqRecord'


When I print out 'domains = list(SeqIO.parse("domains.fa",'fasta'))' I get this:

[SeqRecord(seq=Seq('TTYKLILNLKQAKEEAIKELVDAGTAEKYFKLIANAKTVEGVWTLKDEIKTFTVTE', SingleLetterAlphabet()), id='GA98', name='GA98', description='GA98', dbxrefs=[]), SeqRecord(seq=Seq('TTYKLILNLKQAKEEAIKELVDAGTAEKYFKLIANAKTVEGVWTYKDEIKTFTVTE', SingleLetterAlphabet()), id='GB98', name='GB98', description='GB98', dbxrefs=[])]


I am not sure why I cannot access what is within the SeqRecord. Maybe it is because I wrapped the SeqIO.parse in a list because before I was being thrown a different error:

AttributeError: 'generator' object has no attribute 'seq'

python biopython sequence SeqIO • 2.8k views

Hello coyot001,

this gives you a list of sequences:

domains = list(SeqIO.parse("domains.fa",'fasta'))


Later you try this

seq_n = domains.seq[0:num]


You have to define which element of the list you want to access, e.g.

seq_n = domains[0].seq[0:num]
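To make the difference concrete, here is a small sketch (not part of the original post) that builds the two example records in memory, so it does not need domains.fa. The variable names list_has_seq, first_two and first_id are only illustrative.

```python
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Stand-in for list(SeqIO.parse("domains.fa", "fasta")), using the two example sequences
domains = [
    SeqRecord(Seq("TTYKLILNLKQAKEEAIKELVDAGTAEKYFKLIANAKTVEGVWTLKDEIKTFTVTE"), id="GA98"),
    SeqRecord(Seq("TTYKLILNLKQAKEEAIKELVDAGTAEKYFKLIANAKTVEGVWTYKDEIKTFTVTE"), id="GB98"),
]

# The list itself has no .seq attribute; only its elements do
list_has_seq = hasattr(domains, "seq")

# Index one element first, then slice its sequence
first_two = str(domains[0].seq[0:2])
first_id = domains[0].id
```

The same indexing works for domains[1], which is why a loop over the list is the natural next step.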


By the way:

domain = domains


Why do you copy domains to domain and never use it later?

fin swimmer


I was trying to slice all the sequences in the list. Do I have to iterate through them in an additional for loop? For some reason I was under the impression that SeqIO.parse() would handle them...

I use domain later in record_d = SeqRecord(domain,'%s' % (name), '', ''), so that I can keep a copy of the complete domains as well as the sliced sequences.


Leiven is correct, you're one level too 'high' in your list.

You can't slice all elements of a list in one go like that (you might be able to hack something with map() and some of the hidden methods for the object, but that's not a good way to go).

You have a few options:

1. Use SeqIO in a loop:

    for record in SeqIO.parse(...):
        for i in range(0,6):
            # do slicing

2. Use another loop over your list of domains. This is functionally equivalent to the above, but the loops can be nested range-then-sequence instead of sequence-then-range (which is better, I don't know, but I suspect the former). This will be slower than option 1, though probably negligibly so.

3. Use list comprehensions. These are a bit faster and can lead to more compact code, but they aren't the easiest if you're new to Python.

Fundamentally however, all of the above are just extra layers of loops. I'd go with option 1 personally.
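As a rough illustration of options 1 and 3, the sketch below collects only the N-terminal slices, once with nested loops and once with a list comprehension. It reads the two example sequences from an in-memory string rather than domains.fa so that it runs on its own; FASTA_TEXT, n_slices_loop and n_slices_comp are names made up for this example.

```python
from io import StringIO
from Bio import SeqIO

# The two example sequences from the question, held in memory instead of domains.fa
FASTA_TEXT = """>GA98
TTYKLILNLKQAKEEAIKELVDAGTAEKYFKLIANAKTVEGVWTLKDEIKTFTVTE
>GB98
TTYKLILNLKQAKEEAIKELVDAGTAEKYFKLIANAKTVEGVWTYKDEIKTFTVTE
"""

# Option 1: loop over the parser directly, then over the slice lengths
n_slices_loop = []
for record in SeqIO.parse(StringIO(FASTA_TEXT), "fasta"):
    for num in range(1, 7):  # slice lengths 1..6
        n_slices_loop.append((record.id, str(record.seq[:num])))

# Option 3: the same nesting written as a list comprehension
n_slices_comp = [
    (record.id, str(record.seq[:num]))
    for record in SeqIO.parse(StringIO(FASTA_TEXT), "fasta")
    for num in range(1, 7)
]
```

Both versions produce the same list; the comprehension is just the two loops folded into one expression.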


Thanks, that helped a lot. I had to debug a little bit afterwards but it was straightforward. I attached the working version below as an answer.

2.8 years ago
coyot001 ▴ 10

Working code. I was not iterating through the records.

from Bio import SeqIO
from Bio.SeqRecord import SeqRecord

# Load data:
domains = list(SeqIO.parse("examples/data/domains.fa",'fasta'))
print(domains)

# set up receiving array
home=[]

# subset data
for record in domains:
    num = 0
    domain = record.seq
    name = record.id
    # keep a copy of the complete domain
    record_d = SeqRecord(domain,'%s' % (name), '', '')
    home.append(record_d)
    for i in range(0, 6):
        num = num+1
        # N-terminal slice: first num residues
        seq_n = record.seq[0:num]
        # C-terminal slice: last num residues
        seq_c = record.seq[len(record.seq)-num:len(record.seq)]
        record_n = SeqRecord(seq_n,'%s_n_%i' % (name,num), '', '')
        home.append(record_n)
        record_c = SeqRecord(seq_c,'%s_c_%i' % (name,num), '', '')
        home.append(record_c)
SeqIO.write(home, "domains_variants.fasta", "fasta")
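For anyone finding this later, here is one possible more compact variant of the same idea, offered as a sketch rather than the definitive way to do it. It uses negative slicing for the C-terminus and wraps the slicing in a helper; terminal_variants, max_len and FASTA_TEXT are hypothetical names introduced for this example, and the input and output are in-memory handles so it runs without any files.

```python
from io import StringIO
from Bio import SeqIO
from Bio.SeqRecord import SeqRecord


def terminal_variants(record, max_len=6):
    """Return the full record plus N- and C-terminal slices of length 1..max_len."""
    variants = [SeqRecord(record.seq, id=record.id, description="")]
    for num in range(1, max_len + 1):
        # First num residues (N-terminus)
        variants.append(
            SeqRecord(record.seq[:num], id=f"{record.id}_n_{num}", description="")
        )
        # Last num residues (C-terminus); a negative start counts from the end
        variants.append(
            SeqRecord(record.seq[-num:], id=f"{record.id}_c_{num}", description="")
        )
    return variants


# The two example sequences from the question, held in memory instead of domains.fa
FASTA_TEXT = """>GA98
TTYKLILNLKQAKEEAIKELVDAGTAEKYFKLIANAKTVEGVWTLKDEIKTFTVTE
>GB98
TTYKLILNLKQAKEEAIKELVDAGTAEKYFKLIANAKTVEGVWTYKDEIKTFTVTE
"""

home = []
for record in SeqIO.parse(StringIO(FASTA_TEXT), "fasta"):
    home.extend(terminal_variants(record))

# Written to an in-memory handle here; a filename such as
# "domains_variants.fasta" can be passed to SeqIO.write instead
out = StringIO()
written = SeqIO.write(home, out, "fasta")
```

With max_len=6 this yields the same set of records as the loop above: one full-length copy plus six N-terminal and six C-terminal slices per input sequence.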