Question: Print gene locations from a .gbk file
1
gravatar for kxd419
5.7 years ago by
kxd41910
United Kingdom
kxd41910 wrote:

Hello, 

I am trying extract certain information from a gbk file I can extract the locus tag and the amino acid sequence however I am struggling to extract the gene location as it not in the same format in the file eg:  /locus_tag="NCTC86_00002" 

 

This is my scrip so far: 
 

from Bio import GenBank
from Bio import SeqIO

gbk_filename = "HS.gb"
faa_filename = "HS_converted.faa"

input_handle  = open(gbk_filename, "r")
output_handle = open(faa_filename, "w")

for seq_record in SeqIO.parse(input_handle, "genbank") :
    print "Dealing with GenBank record %s" % seq_record.id
    for seq_feature in seq_record.features :
        if seq_feature.type=="CDS" :
            assert len(seq_feature.qualifiers['translation'])==1
            output_handle.write(">%s from %s\n%s\n" % (
                   seq_feature.qualifiers['locus_tag'][0],
                   seq_record.name,
                   seq_feature.qualifiers['translation'][0]))

output_handle.close()
input_handle.close()
print "Done"

 

 

gbk python gene • 3.4k views
ADD COMMENTlink modified 3.3 years ago by nhaituan10 • written 5.7 years ago by kxd41910

So also if I wanted to print the gene annotation, not every  CDS entry contains a /gene="" 

Do I need to put in a if there is no /gene="" clause?

ADD REPLYlink modified 5.7 years ago • written 5.7 years ago by kxd41910
3
gravatar for mgalactus
5.7 years ago by
mgalactus750
United Kingdom
mgalactus750 wrote:

Hi,

the SeqFeature objects have a "location" attribute that contains the start/stop position of the feature.

from Bio import SeqIO

gbk_filename = "HS.gb"
faa_filename = "HS_converted.faa"

output_handle = open(faa_filename, "w")

for seq_record in SeqIO.parse(gbk_filename, "genbank") :
    print "Dealing with GenBank record %s" % seq_record.id
    for seq_feature in seq_record.features :
        if seq_feature.type=="CDS":
            assert len(seq_feature.qualifiers['translation'])==1
            output_handle.write(">%s from %s\n%s\n" % (
                   seq_feature.qualifiers['locus_tag'][0],
                   seq_record.name,
                   seq_feature.qualifiers['translation'][0]))
            print('Start: %d, Stop: %d, Strand: %d'%(int(seq_feature.location.start),
                                                     int(seq_feature.location.end),
                                                     seq_feature.strand))

output_handle.close()
print "Done"

 

Hope this helps

ADD COMMENTlink modified 5.7 years ago • written 5.7 years ago by mgalactus750

Hi mgalactus

 

Thank you so much for your help. Im brand new to python and am trying to learn my best. 

When I run that script I get this error: 
Traceback (most recent call last):


  File "gbkaafasta.py", line 18, in <module>
    print('Start: %d, Stop: %d, Strand: %d'%(int(seq_feature.locations.start),
AttributeError: 'SeqFeature' object has no attribute 'locations'

 

 

Thank you so much for your help

ADD REPLYlink written 5.7 years ago by kxd41910

My fault, it should have been "location" and not "locations" (I've updated the reply)

ADD REPLYlink written 5.7 years ago by mgalactus750

You are fantastic! It works perfectly. Thank you so much

ADD REPLYlink written 5.7 years ago by kxd41910
1
gravatar for nhaituan
3.3 years ago by
nhaituan10
nhaituan10 wrote:

Thank you very much, here my modified codes:

from Bio import SeqIO
gbk_filename = "N16961_Ch1.gbk"
faa_filename = "N16961_Ch1.faa"
output_handle = open(faa_filename, "w")
for seq_record in SeqIO.parse(gbk_filename, "genbank") :
    print "Dealing with GenBank record %s" % seq_record.id
    for seq_feature in seq_record.features :
        if seq_feature.type=="CDS":
            assert len(seq_feature.qualifiers['translation'])==1
            output_handle.write(">%s Start: %s Stop: %s Strand: %s From %s\n" % (
                   seq_feature.qualifiers['locus_tag'][0],
                   seq_feature.location.start,
                   seq_feature.location.end,
                   seq_feature.strand,
                   seq_record.name))

output_handle.close()
print "Done"
ADD COMMENTlink written 3.3 years ago by nhaituan10
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2279 users visited in the last hour
_