Question

AttributeError: 'list' object has no attribute 'SeqRecord' - Slice multiple sequences with Biopython>SeqIO from fasta file

0

4.2 years ago

coyot001 ▴ 10

I am trying to generate N- and C-terminal slices of varying length (1, 2, 3, 4, 5, 6, 7). But before I get there, I am having problems just reading in my FASTA file. I was following the tutorial under the 'Random subsequences' heading at https://biopython.org/wiki/SeqIO, but in that case there is only one sequence, so maybe that is where I went wrong. The code, example sequences, and my errors are below. Any help would be much appreciated. I am clearly out of my depth. Thanks!

Two example sequences in my file domains.fasta:

>GA98
TTYKLILNLKQAKEEAIKELVDAGTAEKYFKLIANAKTVEGVWTLKDEIKTFTVTE
>GB98
TTYKLILNLKQAKEEAIKELVDAGTAEKYFKLIANAKTVEGVWTYKDEIKTFTVTE

my code that is not working:

from Bio import SeqIO
from Bio.SeqRecord import SeqRecord


# Load data:
domains = list(SeqIO.parse("domains.fa",'fasta'))

#set up receiving arrays
home=[]
num=1

#slice data
for i in range(0, 6):
    num = num+1
    domain = domains
    seq_n = domains.seq[0:num]
    seq_c = domains.seq[len(domain)-num:len(domain)]
    name = domains.id
    record_d = SeqRecord(domain,'%s' % (name), '', '')
    home.append(record_d)
    record_n = SeqRecord(seq_n,'%s_n_%i' % (name,num), '', '')
    home.append(record_n)
    record_c = SeqRecord(seq_c,'%s_c_%i' % (name,num), '', '')
    home.append(record_c)
SeqIO.write(home, "domains_variants.fasta", "fasta")

error I get is:

Traceback (most recent call last):
  File "~/fasta_nc_sequences.py", line 20, in <module>
    seq_n = domains.seq[0:num]
AttributeError: 'list' object has no attribute 'SeqRecord'

When I print out 'domains = list(SeqIO.parse("domains.fa",'fasta'))' I get this:

[SeqRecord(seq=Seq('TTYKLILNLKQAKEEAIKELVDAGTAEKYFKLIANAKTVEGVWTLKDEIKTFTVTE', SingleLetterAlphabet()), id='GA98', name='GA98', description='GA98', dbxrefs=[]), SeqRecord(seq=Seq('TTYKLILNLKQAKEEAIKELVDAGTAEKYFKLIANAKTVEGVWTYKDEIKTFTVTE', SingleLetterAlphabet()), id='GB98', name='GB98', description='GB98', dbxrefs=[])]

I am not sure why I cannot access what is within the SeqRecord. Maybe it is because I wrapped the SeqIO.parse in a list because before I was being thrown a different error:

AttributeError: 'generator' object has no attribute 'seq'

python biopython sequence SeqIO • 4.5k views

ADD COMMENT • link 4.2 years ago by coyot001 ▴ 10

0

Hello coyot001,

this gives you a list of sequences:

domains = list(SeqIO.parse("domains.fa",'fasta'))

Later you try this

seq_n = domains.seq[0:num]

You have to define which element of the list you want to access, e.g.

seq_n = domains[0].seq[0:num]
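
To make the difference concrete, here is a minimal sketch that runs without Biopython. The `SimpleNamespace` objects are only stand-ins for the `SeqRecord` objects that `SeqIO.parse` returns (an assumption for illustration), and the variable names are hypothetical. The point is that the attribute lives on each element, not on the list:

```python
from types import SimpleNamespace

# Hypothetical stand-ins for SeqRecord objects; real ones come from SeqIO.parse.
domains = [
    SimpleNamespace(id="GA98", seq="TTYKLILNLKQAKEEAIKELVDAGTAEKYFKLIANAKTVEGVWTLKDEIKTFTVTE"),
    SimpleNamespace(id="GB98", seq="TTYKLILNLKQAKEEAIKELVDAGTAEKYFKLIANAKTVEGVWTYKDEIKTFTVTE"),
]

# The list itself has no .seq attribute; only its elements do.
list_has_seq = hasattr(domains, "seq")

# Index one element first, then slice its sequence.
first_n_term = domains[0].seq[0:3]

# To reach every record, loop over the list.
n_termini = {record.id: record.seq[0:3] for record in domains}
```

Here `list_has_seq` is `False`, which is why `domains.seq` raises an AttributeError, while `domains[0].seq[0:3]` works.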

By the way:

domain = domains

Why do you copy domains to domain and never use it later?

fin swimmer

ADD REPLY • link 4.2 years ago by finswimmer 16k

0

I was trying to slice all the sequences in the list. Do I have to iterate through them in an additional for loop? For some reason I was under the impression that SeqIO.parse() would handle them...

I use domain later in: record_d = SeqRecord(domain,'%s' % (name), '', ''), so that I can keep a copy of the complete domains as well as the sliced sequences.

ADD REPLY • link 4.2 years ago by coyot001 ▴ 10

1

Leiven is correct, you're one level too 'high' in your list.

You can't slice all elements of a list in one go like that. You might be able to hack something together with map() and some of the object's hidden methods, but that's not a good way to go.

You have a few options:

1. Use SeqIO in a loop:

for record in SeqIO.parse(...):
    for i in range(0, 6):
        # do slicing

2. Use another loop over your list of domains. This is functionally equivalent to the above, but the loops can be nested range-then-sequence instead of sequence-then-range (which is better, I don't know, but I suspect the former). This will be slower than option 1, though probably negligibly so.

3. Use list comprehensions. This is a bit faster and can lead to more compact code, but they aren't the easiest if you're new to Python.

Fundamentally however, all of the above are just extra layers of loops. I'd go with option 1 personally.
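
As a rough sketch of how the explicit loop and the list comprehension compare, the snippet below builds the same terminal slices both ways. The helper `terminal_slices` and the `records` list are hypothetical names introduced for illustration, and plain strings stand in for the Seq objects (which are sliced with the same syntax), so it runs without Biopython:

```python
def terminal_slices(name, seq, max_len=6):
    """Return (label, subsequence) pairs for N- and C-terminal slices of length 1..max_len."""
    pairs = []
    for num in range(1, max_len + 1):
        pairs.append(("%s_n_%i" % (name, num), seq[:num]))   # first num residues
        pairs.append(("%s_c_%i" % (name, num), seq[-num:]))  # last num residues
    return pairs

# Hypothetical (id, sequence) pairs standing in for parsed records.
records = [
    ("GA98", "TTYKLILNLKQAKEEAIKELVDAGTAEKYFKLIANAKTVEGVWTLKDEIKTFTVTE"),
    ("GB98", "TTYKLILNLKQAKEEAIKELVDAGTAEKYFKLIANAKTVEGVWTYKDEIKTFTVTE"),
]

# Explicit nested loops: outer loop over records, inner loop inside the helper.
looped = []
for name, seq in records:
    looped.extend(terminal_slices(name, seq, max_len=2))

# The same result as a list comprehension.
variants = [pair for name, seq in records for pair in terminal_slices(name, seq, max_len=2)]
```

Both `looped` and `variants` hold the same eight pairs, starting with `("GA98_n_1", "T")` and `("GA98_c_1", "E")`. With real data, the outer loop would be `for record in SeqIO.parse(...)`, passing `record.id` and `record.seq` to the helper.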

ADD REPLY • link 4.2 years ago by Joe 21k

1

Thanks, that helped a lot. I had to debug a little bit afterwards but it was straightforward. I attached the working version below as an answer.

ADD REPLY • link 4.2 years ago by coyot001 ▴ 10

score 0 · Answer 1 · 2020-02-10

Working code below. I was not iterating through the records in the list.

from Bio import SeqIO
from Bio.SeqRecord import SeqRecord

# Load data:
domains = list(SeqIO.parse("examples/data/domains.fa", "fasta"))
print(domains)

# Set up the receiving list
home = []

# Subset data: keep each full domain, then add its N- and C-terminal slices
for record in domains:
    name = record.id
    home.append(SeqRecord(record.seq, id=name, name="", description=""))
    for num in range(1, 7):  # slice lengths 1 to 6
        seq_n = record.seq[:num]   # first num residues (N terminus)
        seq_c = record.seq[-num:]  # last num residues (C terminus)
        home.append(SeqRecord(seq_n, id="%s_n_%i" % (name, num), name="", description=""))
        home.append(SeqRecord(seq_c, id="%s_c_%i" % (name, num), name="", description=""))

SeqIO.write(home, "domains_variants.fasta", "fasta")