Question

Problem with writing edited feature qualifier to a new gbk file in Biopython

0

Entering edit mode

9.1 years ago

A • 0

Hi (Bio) python experts,

I have written a script in (bio)python below which parses through a gbk file, locates feature.qualifier['protein_id'] by matching it to protein_ids which I have given it (as keys in a dictionary) and amends the feature.qualifier['product'] (given as values to corresponding keys in dictionary). When I print the changes in the end I can see that the feature.qualifier['product'] has changed but the problem is when I try saving the changed features/record as a new gbk file it goes into a never ending loop.

What I am looking for to have is a new gbk file with everything else the same as the original gbk file but only the feature.qualifier['product'] changed. Kindly suggest where I am going wrong. I would really really appreciate your help.

PS: I have used all possible forums where they have added/changed features and written new gbk file but nothing seems to solve it.

Very many thanks in advance

A

locustaglist=[]
newannotationlist=[]
LTandNAdic={}

with open ("CJM1cam_consensus_annotation_table_for_gbk_110315.txt", "rU") as file1:
    for l,line in enumerate (file1):
        line=line.rstrip()
        if l==0:
            pass
        else:
            CJM1camlocustag,CJM1camlocus,CJM1cam_consensuslocus,CJM1cam_annotation,CJM1cam_consensus_annotation,COG_description=line.split("\t")
            modifiedlocustag="VBC:"+str(CJM1camlocustag)
            locustaglist.append(modifiedlocustag)
            newannotationlist.append(str(CJM1cam_consensus_annotation))
for loc in range(len(locustaglist)):
    LTandNAdic[locustaglist[loc]]=newannotationlist[loc]
#print LTandNAdic

for k, v in LTandNAdic.items():
    #print k, v
    final_features=[]
    for index,record in enumerate (SeqIO.parse("CJM1cam_CJM1locustag_consensus_annot.gbk", "genbank")):
    #print record.annotations
        for f, feature in enumerate(record.features):
            if feature.type == "CDS":
                if str(feature.qualifiers['protein_id'][0])==k:
                    feature.qualifiers['product'][0]=v
                   # print feature.qualifiers['protein_id'],feature.qualifiers['product'](#desired chnage has happened but prints an ever-ending loop)
                    #print feature(#again continuous loop)
            final_features.append(feature)
        record.features=final_features
        #print record.features
        with open ("CJM1cam_CJM1locustag_consensus_locus_annotation.gbk", "w") as new_gbk:
            SeqIO.write(record,new_gbk,"genbank")(# continuous loop)

gene-sequence genome-sequencing • 3.4k views

ADD COMMENT • link updated 23 months ago by Ram 43k • written 9.1 years ago by A • 0

Ram · Accepted Answer · 2015-03-17

3

Entering edit mode

9.1 years ago

Peter 6.0k

Each time round the loop you are creating a new file CJM1cam_CJM1locustag_consensus_locus_annotation.gbk and writing a single GenBank record to it. i.e. You are overwriting the output file each time round the loop. [And in fact you're doing this with two levels of looping, looping over LTandNAdic and looping over the input file.]

This is the "deliberate mistake" in the example length_filter_naive.py in my introductory Biopython workshop, see https://github.com/peterjc/biopython_workshop/tree/master/writing_sequence_files

Instead open a handle to CJM1cam_CJM1locustag_consensus_locus_annotation.gbk once, do the loop over the input file (CJM1cam_CJM1locustag_consensus_annot.gbk), edit the features using your mapping dictionary LTandNAdic, write out the revised record, and then after the loop close the output file.

Try something like this (untested):

output_handle = open("CJM1cam_CJM1locustag_consensus_locus_annotation.gbk", "w")
for record in SeqIO.parse("CJM1cam_CJM1locustag_consensus_annot.gbk", "genbank"):
    for feature in record.features:
        if feature.type == "CDS" and "protein_id" in feature.qualifiers:
            prot_id = feature.qualifiers['protein_id'][0]
            if prot_id in LTandNAdic:
                feature.qualifiers['product'][0] = LTandNAdic[prot_id]
    SeqIO.write(record, output_handle, "genbank")
output_handle.close()

ADD COMMENT • link 9.1 years ago by Peter 6.0k

0

Entering edit mode

Many thanks for such a super quick response, Peter. I am trying to edit the script now as you have suggested. I will provide feedback if it works. Regards, A

ADD REPLY • link updated 23 months ago by Ram 43k • written 9.1 years ago by A • 0

0

Entering edit mode

Dear Peter,

The script now works wonderfully, thanks to your magic bit of code. Many upvotes to your answer!

ADD REPLY • link updated 23 months ago by Ram 43k • written 9.1 years ago by A • 0

1

Entering edit mode

Great - thanks for remembering to let us know it worked for you.

ADD REPLY • link 9.1 years ago by Peter 6.0k