Question: Code for splitting long string into Genbank record gives error
19 months ago by
toth.joe20 wrote:

I am trying to read a long amino acid string into a Biopython GenBank record object so that a GenBank file can be written. Here is a truncated example:

FEATURES Location/Qualifiers
CDS 687..3158

My code reads a CSV file containing the data I want to put into the GenBank file:

        feature = Feature()
        feature.key = "CDS"
        feature.location = "1..{}".format(len(row['DNA']))
        feature.qualifiers = ["/translation=", "{}".format(row['Seq'])]
        with open(row['FullCloneName'] + '.gb', 'w') as output_file:

However, I get this error:

File "/usr/local/lib/python2.7/dist-packages/Bio/GenBank/", line 631, in __str__
if no_space_key in qualifier.key:
AttributeError: 'str' object has no attribute 'key'

Can someone explain how the Feature class in the GenBank Record source uses the information held in Qualifier objects? My input string has no breaks; it is a single 120-character string. How does the __str__ method break up the long string to format it for the GenBank file? Do I need to break up the string myself with a split character such as ','?

class Feature(object):
    """Hold information about a Feature in the Feature Table of GenBank record.

    Attributes:
     - key - The key name of the feature (ie. source)
     - location - The string specifying the location of the feature.
     - qualifiers - A listing Qualifier objects in the feature.

    """

    def __init__(self):
        """Initialize."""
        self.key = ''
        self.location = ''
        self.qualifiers = []

    def __str__(self):
        """Return feature as a GenBank format string."""
        output = Record.INTERNAL_FEATURE_FORMAT % self.key
        output += _wrapped_genbank(self.location, Record.GB_FEATURE_INDENT,
                                   split_char=',')
        for qualifier in self.qualifiers:
            output += " " * Record.GB_FEATURE_INDENT

            # determine whether we can wrap on spaces
            space_wrap = 1
            for no_space_key in \
                    Bio.GenBank._BaseGenBankConsumer.remove_space_keys:
                if no_space_key in qualifier.key:
                    space_wrap = 0

            output += _wrapped_genbank(qualifier.key + qualifier.value,
                                       Record.GB_FEATURE_INDENT, space_wrap)
        return output


class Qualifier(object):
    """Hold information about a qualifier in a GenBank feature.

    Attributes:
     - key - The key name of the qualifier (ie. /organism=)
     - value - The value of the qualifier ("Dictyostelium discoideum").

    """

    def __init__(self):
        """Initialize."""
        self.key = ''
        self.value = ''
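
On the wrapping question: in the listing above, __str__ hands qualifier.key + qualifier.value to _wrapped_genbank together with a space_wrap flag, which suggests the caller is not expected to insert break characters. The following is only a rough sketch of fixed-width wrapping, not Biopython's _wrapped_genbank. The function name wrap_fixed_width and the widths (21-column indent, 79-column line) are assumptions made for illustration.

```python
def wrap_fixed_width(text, indent=21, line_length=79):
    """Split text into fixed-width chunks and indent every line.

    indent and line_length are assumed values for illustration only.
    """
    width = line_length - indent
    chunks = [text[i:i + width] for i in range(0, len(text), width)]
    return "\n".join(" " * indent + chunk for chunk in chunks)


# Hypothetical 120-character translation with no spaces or commas.
qualifier_text = '/translation="' + "ERNDAYGHFIS" * 10 + "ERNDAYGHFI" + '"'
print(wrap_fixed_width(qualifier_text))
```

When no space or comma is available, the text is simply cut every (line_length - indent) characters, so an unbroken amino acid string can still be laid out across several lines.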


Might be time to call in the experts @ Peter

written 19 months ago by Joe18k
19 months ago by
Scotland, UK
Peter5.8k wrote:

The SeqRecord approach is intended to be more 'high level', with fewer of the file format details exposed directly. The GenBank-specific Record object approach is quite 'low level', with lots of details you have to handle yourself. But the immediate problem is that you need to use a list of Qualifier objects, something like this:

from Bio.GenBank.Record import Record, Feature, Qualifier

record = Record(...)

for row in ...:
    feature = Feature()
    feature.key = "CDS"
    feature.location = "1..{}".format(len(row['DNA']))
    # feature.qualifiers should be a list of Qualifier objects:
    feature.qualifiers = [
        # These values do not need quoting:
        Qualifier("/transl_table=", "1"),
        Qualifier("/codon_start=", "1"),
        # If the value needs double quotes, you must add them:
        Qualifier("/translation=", '"%s"' % row['Seq']),
    ]
    with open(row['FullCloneName'] + '.gb', 'w') as output_file:

Note your example has every CDS starting at base one, but it looks like you are making minimal GenBank files each with only one CDS.


Thanks for clarifying the source code. I entered the code as you suggested but still get an error.

        feature = Feature()
        feature.key = "CDS"
        feature.location = "1..{}".format(len(row['DNA']))
        feature.qualifiers = [
            Qualifier("/translation=", "%s % row['Seq']"),

File "", line 84, in main
Qualifier("/translation=", "%s % row['Seq']"),
TypeError: __init__() takes exactly 1 argument (3 given)

I'm still trying to understand the Qualifier class. I tried this approach but still can't get the Qualifier data into the feature object:

        feature = Feature()
        feature.key = "CDS"
        feature.location = "1..{}".format(len(row['DNA']))
        qualifier = Qualifier() 
        Qualifier.key = "/translation="
        Qualifier.value = "{}".format(row['Seq'])
        feature.qualifiers = [qualifier]    
        print (feature)
        print (Qualifier.key, Qualifier.value)
        print feature.qualifiers

There is no translation line in the Genbank file written by the script. The Qualifier attributes were read correctly, but not passed to the Feature method. The dot means "no data" for Genbank files. Here is the terminal output from the print statements.

CDS             1..375  

('/translation=', 'ERNDAYGHFIS')
[<Bio.GenBank.Record.Qualifier object at 0x7f3fb1f6a6d0>]
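
One likely reason the translation line is missing: the snippet assigns to Qualifier.key and Qualifier.value (the class) instead of qualifier.key and qualifier.value (the instance placed in the list). The sketch below reproduces this with a stand-in class copied from the quoted source, so it runs without Biopython; the sequence is the one shown in the output above.

```python
class Qualifier(object):
    """Stand-in mirroring the quoted Qualifier source (not imported from Biopython)."""

    def __init__(self):
        self.key = ''
        self.value = ''


qualifier = Qualifier()

# Assigning on the class creates class attributes. The instance still
# carries the empty strings set in __init__, and those take precedence.
Qualifier.key = "/translation="
Qualifier.value = "ERNDAYGHFIS"
print(repr(qualifier.key), repr(qualifier.value))  # both still empty

# Assigning on the instance is what changes the object held in the list.
qualifier.key = "/translation="
qualifier.value = '"ERNDAYGHFIS"'
print(qualifier.key + qualifier.value)
```

This matches the observed behaviour: printing Qualifier.key shows the class attribute, while the Feature only ever sees the empty instance attributes.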

modified 19 months ago • written 19 months ago by toth.joe20

You need Biopython 1.73 or later for this to work:

qualifier = Qualifier("/translation=", '"%s"' % row['Seq'])

As you worked out, on older versions you must use:

qualifier = Qualifier()
qualifier.key = "/translation="
qualifier.value = '"%s"' % row['Seq']
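
If the same script has to run on both sides of that version boundary, one option is a small helper that tries the two-argument constructor first and falls back to attribute assignment. This is only a sketch: make_qualifier and OldStyleQualifier are hypothetical names, and the stand-in class copies the older source quoted in the question so the example runs without Biopython.

```python
def make_qualifier(qualifier_class, key, value):
    """Build a qualifier whether or not the constructor accepts arguments."""
    try:
        return qualifier_class(key, value)
    except TypeError:
        # Older style: no-argument constructor, then set instance attributes.
        qualifier = qualifier_class()
        qualifier.key = key
        qualifier.value = value
        return qualifier


class OldStyleQualifier(object):
    """Stand-in for the older class quoted in the question."""

    def __init__(self):
        self.key = ''
        self.value = ''


seq = "ERNDAYGHFIS"  # placeholder for row['Seq']
q = make_qualifier(OldStyleQualifier, "/translation=", '"%s"' % seq)
print(q.key + q.value)
```

With Biopython installed, passing Bio.GenBank.Record.Qualifier as qualifier_class should take whichever path the installed version supports.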
modified 19 months ago • written 19 months ago by Peter5.8k
19 months ago by
toth.joe20 wrote:

Thanks for the help, everyone. I was able to get the GenBank feature writer to work. It wasn't a problem with long strings; rather, I had to pass the Qualifier key and value attributes correctly to the Feature object.

container = Record() = row['Sample']
container.size = len(row['DNA'])
container.data_file_division="PRI" = ("%d-%b-%Y")) # today's date
container.definition = row['FullCloneName']
container.accession = [row['Vgene']]
container.comment = 'project xyz'
container.version = getpass.getuser()
container.keywords = [row['ProjectName']]
feature = Feature()
feature.key = "CDS"
feature.location = "1..{}".format(len(row['DNA']))
Qualifier.key = "/translation="
Qualifier.value = '"{}"'.format(row['Seq'])

# Save as GenBank file
with open(row['ProjectName'] + '_' + row['FullCloneName'] + '.gb', 'w') as output_file:


It seems you must have been using an older version of Biopython. The code I shared needs Biopython 1.73 onwards (which I didn't realise at the time). It is always a good idea to state the version of the tool you are using in your question, and also the version of Python, as that can be very important too.
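
A quick way to collect both versions for a question is sketched below. The helper name tool_versions is hypothetical; Bio.__version__ is the version string Biopython exposes, and the import is guarded so the snippet also runs where Biopython is not installed.

```python
import platform


def tool_versions():
    """Return the Python version and, if importable, the Biopython version."""
    versions = {"python": platform.python_version()}
    try:
        import Bio
        versions["biopython"] = Bio.__version__
    except ImportError:
        versions["biopython"] = None  # Biopython not installed here
    return versions


print(tool_versions())
```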

written 19 months ago by Peter5.8k