Question: Problem Adding Qualifiers To A Genbank File With Python
jcastrofigueroa (Norwich, UK) wrote 5.7 years ago:

Hi guys:

I have this GenBank file:

LOCUS       sctg_0006_0001        172997 bp    DNA              UNK 01-JAN-1980
DEFINITION  sctg_0006_0001  length=172997
ACCESSION   sctg_0006_0001
VERSION     sctg_0006_0001
KEYWORDS    .
SOURCE      .
  ORGANISM  .

FEATURES             Location/Qualifiers
     CDS             <3..182
                     /note="ID=1_1;partial=10;start_type=Edge;rbs_motif=None;rbs_spacer=None;gc_cont=0.722;conf=99.97;score=35.07;cscore=31.85;sscore=3.22;rscore=0.00;uscore=0.00;tscore=3.22;"
     CDS             372..1145
                     /note="ID=1_2;partial=00;start_type=ATG;rbs_motif=GGA/GAG/AGG;rbs_spacer=5-10bp;gc_cont=0.755;conf=100.00;score=149.21;cscore=143.89;sscore=5.32;rscore=-0.60;uscore=1.69;tscore=4.88;"
     CDS[Many Many More]...

As you can see, it has the features with their respective locations and a note qualifier. What I'm trying to do is add a new qualifier called locus_tag to each CDS in this big file.

I have written this code, but I'm getting some problems:

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqFeature import SeqFeature, FeatureLocation
from Bio.SeqRecord import SeqRecord


annotation_handle = open("/Users/jcastrof/Desktop/prueba/prueba_str.gbk","rU")

for record in SeqIO.parse(annotation_handle,"genbank"):

    a = len(record.features)

    for_rast = open("/Users/k/Desktop/prueba/contig_for_rast.gbk","w")

    for x in range(0, a):

        locus_tag = {"locus_tag":"%s_%s" % (record.id,x+1)}


        new_record = (SeqFeature(qualifiers = locus_tag))


        record.features.append(new_record)


        SeqIO.write(record, for_rast, "genbank")
for_rast.close()

And I've got this error:

Traceback (most recent call last):
  File "/Users/k/Desktop/add_tag_locus.py", line 32, in <module>
    SeqIO.write(record, for_rast, "genbank")
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Bio/SeqIO/__init__.py", line 426, in write
    count = writer_class(fp).write_file(sequences)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Bio/SeqIO/Interfaces.py", line 254, in write_file
    count = self.write_records(records)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Bio/SeqIO/Interfaces.py", line 239, in write_records
    self.write_record(record)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Bio/SeqIO/InsdcIO.py", line 775, in write_record
    self._write_feature(feature, rec_length)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Bio/SeqIO/InsdcIO.py", line 305, in _write_feature
    assert feature.type, feature
AssertionError: type: 
location: None
qualifiers: 
    Key: locus_tag, Value: sctg_0006_0001_1

What would you suggest? (please try to help me out :D ). Thanks!


I remember that with Artemis (http://www.sanger.ac.uk/resources/software/artemis/) you can load your GenBank file and add qualifiers (such as locus_tag) to all features or to a filtered subset (for example, all CDS features). Maybe this could save you some work.

written 5.7 years ago by skymningen

Thanks, I'll try it. But what I need is to do this for many files.

written 5.7 years ago by jcastrofigueroa (modified 5.7 years ago)

I think it will be fine for "many" as in 5 to 10 files, but if it's closer to 200 you might need a different solution.

written 5.7 years ago by skymningen
Brad Chapman (Boston, MA) wrote 5.7 years ago:

Your code creates a new feature containing only your locus_tag qualifier. The error occurs because this new feature has no type (like CDS), so Biopython can't write it out in GenBank format.

From your description, it sounds like you want to add a qualifier to the existing CDS features rather than create a new feature, so you want something like:

x = 0
final_features = []
for f in record.features:
    if f.type == "CDS":
        f.qualifiers["locus_tag"] = "%s_%s" % (record.id, x+1)
        x += 1
    final_features.append(f)

record.features = final_features
with open("/Users/k/Desktop/prueba/contig_for_rast.gbk","w") as for_rast:
    SeqIO.write(record, for_rast, "genbank")

Hope this helps


Tip: You can simplify the last two lines by just calling the write function with a filename instead of a handle.

Brad: Do you need the final_features list bit? Can't you remove that as you are editing the features in situ?

written 5.7 years ago by Peter (modified 5.6 years ago)

Peter, you're right on both counts. For final_features, I was just trying to be explicit about the modification. I'm picking up that habit from working with immutable objects in Clojure. Your approach will work great as well and be shorter.

written 5.7 years ago by Brad Chapman

You can (I presume) still edit your answer if you want to. The Clojure influence makes sense now that you've explained it ;)

written 5.6 years ago by Peter

Yeah it worked!

written 5.7 years ago by jcastrofigueroa
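
Putting the answer and Peter's two suggestions together, a shorter version might look like the sketch below. It edits the features in place and passes filenames to SeqIO rather than open handles. The function names add_locus_tags and tag_genbank_file are made up for illustration and are not part of Biopython, and storing the tag as a one-element list is an assumption chosen to match how the GenBank parser stores qualifier values.

```python
def add_locus_tags(record, feature_type="CDS"):
    """Add a locus_tag qualifier, in place, to each feature of the given type.

    Tags are numbered from 1 in file order: <record.id>_1, <record.id>_2, ...
    Returns the number of features that were tagged.
    """
    count = 0
    for feature in record.features:
        if feature.type == feature_type:
            count += 1
            # Assumption: use a one-element list, like the values the
            # GenBank parser produces for existing qualifiers.
            feature.qualifiers["locus_tag"] = ["%s_%s" % (record.id, count)]
    return count


def tag_genbank_file(in_path, out_path):
    """Tag every CDS in every record of in_path and write them to out_path."""
    from Bio import SeqIO  # requires Biopython to be installed

    records = []
    for record in SeqIO.parse(in_path, "genbank"):
        add_locus_tags(record)
        records.append(record)
    # SeqIO.write accepts a filename as well as a handle, and writing all
    # records in one call avoids rewriting the file inside the loop.
    return SeqIO.write(records, out_path, "genbank")
```

To handle many files, tag_genbank_file could be called in a loop, for example over the paths returned by glob.glob, with a separate output path for each input.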