Problem Adding Qualifiers To A Genbank File With Python
7.7 years ago

Hi guys:

I have this GenBank file:

LOCUS       sctg_0006_0001        172997 bp    DNA              UNK 01-JAN-1980
DEFINITION  sctg_0006_0001  length=172997
ACCESSION   sctg_0006_0001
VERSION     sctg_0006_0001
KEYWORDS    .
SOURCE      .
  ORGANISM  .

FEATURES             Location/Qualifiers
     CDS             <3..182
                     /note="ID=1_1;partial=10;start_type=Edge;rbs_motif=None;rbs_spacer=None;gc_cont=0.722;conf=99.97;score=35.07;cscore=31.85;sscore=3.22;rscore=0.00;uscore=0.00;tscore=3.22;"
     CDS             372..1145
                     /note="ID=1_2;partial=00;start_type=ATG;rbs_motif=GGA/GAG/AGG;rbs_spacer=5-10bp;gc_cont=0.755;conf=100.00;score=149.21;cscore=143.89;sscore=5.32;rscore=-0.60;uscore=1.69;tscore=4.88;"
     CDS[Many Many More]...

As you can see, it lists each feature with its location and a note qualifier. What I'm trying to do is add a new qualifier called locus_tag to each CDS in this big file.

I have written this code, but I'm getting some problems:

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqFeature import SeqFeature, FeatureLocation
from Bio.SeqRecord import SeqRecord


annotation_handle = open("/Users/jcastrof/Desktop/prueba/prueba_str.gbk","rU")

for record in SeqIO.parse(annotation_handle,"genbank"):

    a = len(record.features)

    for_rast = open("/Users/k/Desktop/prueba/contig_for_rast.gbk","w")

    for x in range(0, a):

        locus_tag = {"locus_tag":"%s_%s" % (record.id,x+1)}


        new_record = (SeqFeature(qualifiers = locus_tag))


        record.features.append(new_record)


        SeqIO.write(record, for_rast, "genbank")
for_rast.close()

And I've got this error:

Traceback (most recent call last):
  File "/Users/k/Desktop/add_tag_locus.py", line 32, in <module>
    SeqIO.write(record, for_rast, "genbank")
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Bio/SeqIO/__init__.py", line 426, in write
    count = writer_class(fp).write_file(sequences)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Bio/SeqIO/Interfaces.py", line 254, in write_file
    count = self.write_records(records)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Bio/SeqIO/Interfaces.py", line 239, in write_records
    self.write_record(record)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Bio/SeqIO/InsdcIO.py", line 775, in write_record
    self._write_feature(feature, rec_length)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Bio/SeqIO/InsdcIO.py", line 305, in _write_feature
    assert feature.type, feature
AssertionError: type: 
location: None
qualifiers: 
    Key: locus_tag, Value: sctg_0006_0001_1

What would you suggest? (please try to help me out :D ). Thanks!

biopython genbank feature • 4.4k views

I remember that with Artemis (http://www.sanger.ac.uk/resources/software/artemis/) you can load your GenBank file and add qualifiers (such as locus_tag) to all features or to a filtered subset (for example, to all CDS features). Maybe this could save you some work.


Thanks, I'll try it. But what I need is to do this for many files.


I think it will be fine for "many" as in 5 to 10, but if it's more around 200 you might have to get back to another solution.

7.7 years ago

Your code creates a new feature containing only your locus_tag qualifier. The error occurs because this new feature has no type (such as CDS), so it can't be written out in GenBank format.

From your description, it sounds like you want to add a qualifier to the existing CDS features rather than create a new feature, so you want something like:

x = 0
final_features = []
for f in record.features:
    if f.type == "CDS":
        f.qualifiers["locus_tag"] = "%s_%s" % (record.id, x+1)
        x += 1
    final_features.append(f)

record.features = final_features
with open("/Users/k/Desktop/prueba/contig_for_rast.gbk","w") as for_rast:
    SeqIO.write(record, for_rast, "genbank")

Hope this helps


Tip: You can simplify the last two lines by just calling the write function with a filename instead of a handle.

Brad: Do you need the final_features list bit? Can't you remove that as you are editing the features in situ?


Peter, you're right on both counts. For final_features, I was just trying to be explicit about the modification. I'm picking up that habit from working with immutable objects in Clojure. Your approach will work great as well and be shorter.
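
For anyone reading later, here is a rough sketch of what Peter's two suggestions might look like together: the features are edited in place (no final_features list) and SeqIO.write is given a filename instead of a handle. The function names add_locus_tags and tag_genbank_file are made up for this example, and storing the tag as a one-item list is an assumption made to match the shape Biopython's GenBank parser uses for qualifier values.

```python
def add_locus_tags(record, feature_type="CDS"):
    """Add a locus_tag qualifier to matching features in place.

    Hypothetical helper: returns the number of features tagged.
    """
    count = 0
    for feature in record.features:
        if feature.type == feature_type:
            count += 1
            # One-item list, the same shape the GenBank parser
            # produces for qualifier values (an assumption here;
            # a plain string is what the answer above used).
            feature.qualifiers["locus_tag"] = ["%s_%d" % (record.id, count)]
    return count


def tag_genbank_file(in_path, out_path):
    """Read all records from in_path, tag CDS features, write out_path."""
    # Imported here so add_locus_tags can be used without Biopython.
    from Bio import SeqIO

    records = list(SeqIO.parse(in_path, "genbank"))
    for record in records:
        add_locus_tags(record)
    # SeqIO.write accepts a filename as well as an open handle.
    return SeqIO.write(records, out_path, "genbank")
```

To cover many files, tag_genbank_file could be called in a loop over whatever list of input paths you have, writing each result to its own output file.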


You can (I presume) still edit your answer if you want to. The clojure influence makes sense now you've explained that ;)


Yeah it worked!
