Question: Problem with writing edited feature qualifier to a new gbk file in Biopython
0
gravatar for A
2.7 years ago by
A0
United Kingdom
A0 wrote:

Hi (Bio) python experts,

I have written a script in (bio)python below which parses through a gbk file, locates feature.qualifier['protein_id'] by matching it to protein_ids which I have given it (as keys in a dictionary) and amends the feature.qualifier['product'] (given as values to corresponding keys in dictionary). When I print the changes in the end I can see that the feature.qualifier['product'] has changed but the problem is when I try saving the changed features/record as a new gbk file it goes into a never ending loop.

What I am looking for to have is a new gbk file with everything else the same as the original gbk file but only the feature.qualifier['product'] changed. Kindly suggest where I am going wrong. I would really really appreciate your help.

PS: I have used all possible forums where they have added/changed features and written new gbk file but nothing  seems to solve it.

Very many thanks in advance

A

----------------------------------------------------------------------------------------------------------------------------------------------------------------

locustaglist=[]
newannotationlist=[]
LTandNAdic={}

with open ("CJM1cam_consensus_annotation_table_for_gbk_110315.txt", "rU") as file1:
    for l,line in enumerate (file1):
        line=line.rstrip()
        if l==0:
            pass
        else:
            CJM1camlocustag,CJM1camlocus,CJM1cam_consensuslocus,CJM1cam_annotation,CJM1cam_consensus_annotation,COG_description=line.split("\t")
            modifiedlocustag="VBC:"+str(CJM1camlocustag)
            locustaglist.append(modifiedlocustag)
            newannotationlist.append(str(CJM1cam_consensus_annotation))
for loc in range(len(locustaglist)):
    LTandNAdic[locustaglist[loc]]=newannotationlist[loc]
#print LTandNAdic

for k, v in LTandNAdic.items():
    #print k, v
    final_features=[]
    for index,record in enumerate (SeqIO.parse("CJM1cam_CJM1locustag_consensus_annot.gbk", "genbank")):
    #print record.annotations
        for f, feature in enumerate(record.features):
            if feature.type == "CDS":
                if str(feature.qualifiers['protein_id'][0])==k:
                    feature.qualifiers['product'][0]=v
                   # print feature.qualifiers['protein_id'],feature.qualifiers['product'](#desired chnage has happened but prints an ever-ending loop)
                    #print feature(#again continuous loop)
            final_features.append(feature)
        record.features=final_features
        #print record.features
        with open ("CJM1cam_CJM1locustag_consensus_locus_annotation.gbk", "w") as new_gbk:
            SeqIO.write(record,new_gbk,"genbank")(# continuous loop)
ADD COMMENTlink modified 2.7 years ago by Peter5.6k • written 2.7 years ago by A0
3
gravatar for Peter
2.7 years ago by
Peter5.6k
Scotland, UK
Peter5.6k wrote:

Each time round the loop you are creating a new file CJM1cam_CJM1locustag_consensus_locus_annotation.gbk and writing a single GenBank record to it. i.e. You are overwriting the output file each time round the loop. [And in fact you're doing this with two levels of looping, looping over LTandNAdic and looping over the input file.]

This is the "deliberate mistake" in the example length_filter_naive.py in my introductory Biopython workshop, see https://github.com/peterjc/biopython_workshop/tree/master/writing_sequence_files

Instead open a handle to CJM1cam_CJM1locustag_consensus_locus_annotation.gbk once, do the loop over the input file (CJM1cam_CJM1locustag_consensus_annot.gbk), edit the features using your mapping dictionary LTandNAdic, write out the revised record, and then after the loop close the output file.

Try something like this (untested):

output_handle = open("CJM1cam_CJM1locustag_consensus_locus_annotation.gbk", "w")
for record in SeqIO.parse("CJM1cam_CJM1locustag_consensus_annot.gbk", "genbank"):
    for feature in record.features:
        if feature.type == "CDS" and "protein_id" in feature.qualifiers:
            prot_id = feature.qualifiers['protein_id'][0]
            if prot_id in LTandNAdic:
                feature.qualifiers['product'][0] = LTandNAdic[prot_id]
    SeqIO.write(record, output_handle, "genbank")
output_handle.close()
ADD COMMENTlink modified 2.7 years ago • written 2.7 years ago by Peter5.6k

Many thanks for such a super quick response, Peter. I am trying to edit the script now as you have suggested. I will provide feedback if it works. Regards, A 

ADD REPLYlink modified 2.7 years ago • written 2.7 years ago by A0

Dear Peter,

The script now works wonderfully, thanks to your magic bit of code. Many upvotes to your answer!

 

ADD REPLYlink written 2.7 years ago by A0
1

Great - thanks for remembering to let us know it worked for you.

ADD REPLYlink written 2.7 years ago by Peter5.6k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 615 users visited in the last hour