Question: parsing gbk file to get fna file
ulises.rodriguez wrote (11 months ago):

I'm using the following script to get .fna files from .gbk files:

import sys
from Bio import SeqIO

lista = open('list_gbk.txt')  
for line in lista:
        line = line.rstrip()  
        fasta = line+".fna" 
        sys.stdout=open(fasta,"w")
        for rec in SeqIO.parse(line, "genbank"):
                if rec.features:
                        for feature in rec.features:
                                if feature.type == "CDS":
                                        print ">", feature.location, feature.qualifiers['product'],"\n",feature.location.extract(rec).seq
        sys.stdout.close()

But I get the following error message:

Traceback (most recent call last):
  File "parser_gbk_2.py", line 14, in <module>
    print ">", feature.location, feature.qualifiers['product'],"\n",feature.location.extract(rec).seq
KeyError: 'product'

Side note: Looks like you're using python2.7 - time to abandon that and switch to python3.6. Python 2.7 will be retired in under 6 months: https://pythonclock.org/

Comment written 11 months ago by RamRS
finswimmer wrote (11 months ago):

Hello,

You have a CDS feature that has no "product" qualifier. That is why Python couldn't find that key and raised the KeyError.
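A minimal sketch of one way to tolerate the missing key, using plain dictionaries as stand-ins for feature.qualifiers so it runs without any GenBank file. The helper name product_of, the fallback text, and the example values are all made up for illustration; the sketch assumes qualifier values are lists of strings, which is how Biopython's GenBank parser stores them.

```python
def product_of(qualifiers, default="unknown product"):
    """Return the first 'product' value, or `default` if the key is absent."""
    # Each qualifier value is assumed to be a list of strings,
    # so take the first element when the key is present and non-empty.
    values = qualifiers.get("product")
    return values[0] if values else default


# Plain dicts standing in for feature.qualifiers (made-up example values).
with_product = {"locus_tag": ["abc_0001"], "product": ["DNA gyrase subunit A"]}
without_product = {"locus_tag": ["abc_0002"], "pseudo": [""]}

print(product_of(with_product))
print(product_of(without_product))
```

Using .get() instead of indexing means a CDS without a product still produces a FASTA header rather than stopping the whole run.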

As RamRS said, it looks like you are using Python 2, and it's time to switch to Python 3. Here are some more hints on your code, if you like:

line = line.rstrip()  
fasta = line+".fna"

In Python 3.6 and later you can use f-strings to shorten this to:

fasta = f"{line.strip()}.fna"

sys.stdout=open(fasta,"w")

I don't see any reason to reassign sys.stdout just to write to a file. It's better to use the with statement, so that Python takes care of closing the file for you.
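As a rough illustration of the with statement, here is a small self-contained sketch that writes FASTA-style entries to a temporary file. The function name write_fasta, the file name, and the entries are hypothetical; in the real script the headers and sequences would come from the CDS features.

```python
import os
import tempfile


def write_fasta(path, entries):
    """Write (header, sequence) pairs to `path` in FASTA format."""
    # The with block closes the file when it ends, even if an exception is raised.
    with open(path, "w") as outfile:
        for header, sequence in entries:
            outfile.write(f">{header}\n{sequence}\n")


# Made-up entries used only to exercise the function.
entries = [("cds_1 hypothetical protein", "ATGAAATAA"), ("cds_2", "ATGCCCTGA")]
path = os.path.join(tempfile.mkdtemp(), "example.fna")
write_fasta(path, entries)

with open(path) as handle:
    print(handle.read(), end="")
```

Because sys.stdout is never touched, ordinary print calls elsewhere in the script still go to the terminal.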

print ">", feature.location, feature.qualifiers['product'],"\n",feature.location.extract(rec).seq

In Python 3, print is a function, so you need to call it as print(...).
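If you prefer to keep print rather than switch to write(), the Python 3 function accepts a file argument. The sketch below uses io.StringIO as a stand-in for an open output file, and the header and sequence strings are made up for illustration.

```python
import io

# io.StringIO stands in for an open output file so the sketch needs no disk access.
outfile = io.StringIO()

header = "[0:9](+) hypothetical protein"  # made-up header text
sequence = "ATGAAATAA"  # made-up sequence

# print() appends the newline itself; sep="" avoids a space after ">".
print(">", header, sep="", file=outfile)
print(sequence, file=outfile)

result = outfile.getvalue()
```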

So altogether your code could look like this:

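from Bio import SeqIO

with open("list_gbk.txt") as lista:
    for line in lista:
        gbk = line.strip()  # drop the trailing newline before using it as a path
        if not gbk:
            continue  # skip blank lines in the list
        with open(f"{gbk}.fna", "w") as outfile:
            for rec in SeqIO.parse(gbk, "genbank"):
                for feature in rec.features:
                    if feature.type != "CDS":
                        continue
                    # qualifier values are lists; fall back to "" if there is no product
                    product = feature.qualifiers.get("product", [""])[0]
                    outfile.write(f">{feature.location} {product}\n")
                    # extract from rec.seq to get a Seq; write() needs a str plus a newline
                    outfile.write(f"{feature.location.extract(rec.seq)}\n")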