Problem with writing edited feature qualifier to a new gbk file in Biopython
1
0
Entering edit mode
6.1 years ago
A • 0

Hi (Bio) python experts,

I have written a script in (bio)python below which parses through a gbk file, locates feature.qualifier['protein_id'] by matching it to protein_ids which I have given it (as keys in a dictionary) and amends the feature.qualifier['product'] (given as values to corresponding keys in dictionary). When I print the changes in the end I can see that the feature.qualifier['product'] has changed but the problem is when I try saving the changed features/record as a new gbk file it goes into a never ending loop.

What I am looking for to have is a new gbk file with everything else the same as the original gbk file but only the feature.qualifier['product'] changed. Kindly suggest where I am going wrong. I would really really appreciate your help.

PS: I have used all possible forums where they have added/changed features and written new gbk file but nothing  seems to solve it.

Very many thanks in advance

A

----------------------------------------------------------------------------------------------------------------------------------------------------------------

locustaglist=[]
newannotationlist=[]
LTandNAdic={}

with open ("CJM1cam_consensus_annotation_table_for_gbk_110315.txt", "rU") as file1:
    for l,line in enumerate (file1):
        line=line.rstrip()
        if l==0:
            pass
        else:
            CJM1camlocustag,CJM1camlocus,CJM1cam_consensuslocus,CJM1cam_annotation,CJM1cam_consensus_annotation,COG_description=line.split("\t")
            modifiedlocustag="VBC:"+str(CJM1camlocustag)
            locustaglist.append(modifiedlocustag)
            newannotationlist.append(str(CJM1cam_consensus_annotation))
for loc in range(len(locustaglist)):
    LTandNAdic[locustaglist[loc]]=newannotationlist[loc]
#print LTandNAdic

for k, v in LTandNAdic.items():
    #print k, v
    final_features=[]
    for index,record in enumerate (SeqIO.parse("CJM1cam_CJM1locustag_consensus_annot.gbk", "genbank")):
    #print record.annotations
        for f, feature in enumerate(record.features):
            if feature.type == "CDS":
                if str(feature.qualifiers['protein_id'][0])==k:
                    feature.qualifiers['product'][0]=v
                   # print feature.qualifiers['protein_id'],feature.qualifiers['product'](#desired chnage has happened but prints an ever-ending loop)
                    #print feature(#again continuous loop)
            final_features.append(feature)
        record.features=final_features
        #print record.features
        with open ("CJM1cam_CJM1locustag_consensus_locus_annotation.gbk", "w") as new_gbk:
            SeqIO.write(record,new_gbk,"genbank")(# continuous loop)
genome sequencing gene sequence Forum • 2.3k views
ADD COMMENT
3
Entering edit mode
6.1 years ago
Peter 5.9k

Each time round the loop you are creating a new file CJM1cam_CJM1locustag_consensus_locus_annotation.gbk and writing a single GenBank record to it. i.e. You are overwriting the output file each time round the loop. [And in fact you're doing this with two levels of looping, looping over LTandNAdic and looping over the input file.]

This is the "deliberate mistake" in the example length_filter_naive.py in my introductory Biopython workshop, see https://github.com/peterjc/biopython_workshop/tree/master/writing_sequence_files

Instead open a handle to CJM1cam_CJM1locustag_consensus_locus_annotation.gbk once, do the loop over the input file (CJM1cam_CJM1locustag_consensus_annot.gbk), edit the features using your mapping dictionary LTandNAdic, write out the revised record, and then after the loop close the output file.

Try something like this (untested):

output_handle = open("CJM1cam_CJM1locustag_consensus_locus_annotation.gbk", "w")
for record in SeqIO.parse("CJM1cam_CJM1locustag_consensus_annot.gbk", "genbank"):
    for feature in record.features:
        if feature.type == "CDS" and "protein_id" in feature.qualifiers:
            prot_id = feature.qualifiers['protein_id'][0]
            if prot_id in LTandNAdic:
                feature.qualifiers['product'][0] = LTandNAdic[prot_id]
    SeqIO.write(record, output_handle, "genbank")
output_handle.close()
ADD COMMENT
0
Entering edit mode

Many thanks for such a super quick response, Peter. I am trying to edit the script now as you have suggested. I will provide feedback if it works. Regards, A 

ADD REPLY
0
Entering edit mode

Dear Peter,

The script now works wonderfully, thanks to your magic bit of code. Many upvotes to your answer!

 

ADD REPLY
1
Entering edit mode

Great - thanks for remembering to let us know it worked for you.

ADD REPLY

Login before adding your answer.

Traffic: 1195 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6