Hi guys:
I have this GenBank file:
LOCUS       sctg_0006_0001        172997 bp    DNA              UNK 01-JAN-1980
DEFINITION  sctg_0006_0001  length=172997
ACCESSION   sctg_0006_0001
VERSION     sctg_0006_0001
KEYWORDS    .
SOURCE      .
  ORGANISM  .
FEATURES             Location/Qualifiers
     CDS             <3..182
                     /note="ID=1_1;partial=10;start_type=Edge;rbs_motif=None;rbs_spacer=None;gc_cont=0.722;conf=99.97;score=35.07;cscore=31.85;sscore=3.22;rscore=0.00;uscore=0.00;tscore=3.22;"
     CDS             372..1145
                     /note="ID=1_2;partial=00;start_type=ATG;rbs_motif=GGA/GAG/AGG;rbs_spacer=5-10bp;gc_cont=0.755;conf=100.00;score=149.21;cscore=143.89;sscore=5.32;rscore=-0.60;uscore=1.69;tscore=4.88;"
     CDS[Many Many More]...
As you can see, it has the features with their respective locations and a note qualifier. What I'm trying to do is add a new qualifier called locus_tag to each CDS in this big file.
I have written this code, but I'm getting some problems:
from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqFeature import SeqFeature, FeatureLocation
from Bio.SeqRecord import SeqRecord
annotation_handle = open("/Users/jcastrof/Desktop/prueba/prueba_str.gbk","rU")
for record in SeqIO.parse(annotation_handle,"genbank"):
    a = len(record.features)
    for_rast = open("/Users/k/Desktop/prueba/contig_for_rast.gbk","w")
    for x in range(0, a):
        locus_tag = {"locus_tag":"%s_%s" % (record.id,x+1)}
        new_record = (SeqFeature(qualifiers = locus_tag))
        record.features.append(new_record)
        SeqIO.write(record, for_rast, "genbank")
for_rast.close()
And I've got this error:
Traceback (most recent call last):
  File "/Users/k/Desktop/add_tag_locus.py", line 32, in <module>
    SeqIO.write(record, for_rast, "genbank")
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Bio/SeqIO/__init__.py", line 426, in write
    count = writer_class(fp).write_file(sequences)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Bio/SeqIO/Interfaces.py", line 254, in write_file
    count = self.write_records(records)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Bio/SeqIO/Interfaces.py", line 239, in write_records
    self.write_record(record)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Bio/SeqIO/InsdcIO.py", line 775, in write_record
    self._write_feature(feature, rec_length)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Bio/SeqIO/InsdcIO.py", line 305, in _write_feature
    assert feature.type, feature
AssertionError: type: 
location: None
qualifiers: 
    Key: locus_tag, Value: sctg_0006_0001_1
What would you suggest? (please try to help me out :D ). Thanks!
I remember that with Artemis ( http://www.sanger.ac.uk/resources/software/artemis/) you can load your GenBank file and add qualifiers (such as locus_tag) either to all features or to a filtered subset (for example, to all CDS). Maybe this could save you some work.
Thanks, I'll try it. But what I need is to do this for many files.
I think it will be fine for "many" as in 5 to 10, but if it's closer to 200 you might have to fall back on another solution.
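For a few hundred files, a scripted route is probably more practical. The AssertionError in the traceback comes from the writer hitting a feature with an empty type and no location: the posted code builds a brand-new SeqFeature that holds only the locus_tag qualifier and appends it to the record, instead of adding the qualifier to the CDS features that are already there. Here is a minimal sketch of the in-place approach, assuming Python 3 and a reasonably recent Biopython in which SeqIO.parse and SeqIO.write accept file names. The function names, the "*.gbk" pattern and the directory arguments are my own placeholders, not anything from the original script, and I have not run this against the file shown above.

```python
import glob
import os


def add_locus_tags(record, prefix=None):
    """Add a locus_tag qualifier to every CDS feature of one record, in place.

    Tags are numbered per record as <prefix>_1, <prefix>_2, ...
    The prefix defaults to record.id (an assumption; use any scheme you like).
    Returns the number of CDS features that were tagged.
    """
    prefix = prefix or record.id
    count = 0
    for feature in record.features:
        if feature.type != "CDS":
            continue  # leave non-CDS features untouched
        count += 1
        # Biopython stores qualifier values as lists of strings.
        feature.qualifiers["locus_tag"] = ["%s_%d" % (prefix, count)]
    return count


def tag_genbank_file(in_path, out_path):
    """Read every record in in_path, tag its CDS features, write to out_path."""
    # Imported here so add_locus_tags can be used and tested on its own.
    from Bio import SeqIO

    records = []
    for record in SeqIO.parse(in_path, "genbank"):
        add_locus_tags(record)
        records.append(record)
    # Write once per file, after all records have been modified.
    return SeqIO.write(records, out_path, "genbank")


def tag_directory(in_dir, out_dir):
    """Hypothetical batch helper: process every *.gbk file found in in_dir."""
    os.makedirs(out_dir, exist_ok=True)
    for in_path in sorted(glob.glob(os.path.join(in_dir, "*.gbk"))):
        out_path = os.path.join(out_dir, os.path.basename(in_path))
        tag_genbank_file(in_path, out_path)
```

Calling tag_directory("input_dir", "output_dir") with your own paths would write one tagged copy per input file. Note that the numbering here counts only CDS features, so the tags will differ from the feature-index numbering in the original loop if the file also contains other feature types.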