Problems with extracting genes from a genbank file using biopython
5.6 years ago

I am trying to extract the gene positions from a GenBank file using Biopython. This is the function I have written so far:

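from Bio import SeqIO

def get_CDS(file):
    # SeqIO.read expects a file containing exactly one record
    record = SeqIO.read(file, "genbank")
    cds = []
    for feature in record.features:
        if feature.type == 'CDS':
            print(feature.location)
            # start/end span the whole feature, even for a join(...)
            start_i = feature.location.start
            end_i = feature.location.end
            cds.append((start_i, end_i))
    return cds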

However, I noticed that some entries have locations like this:

join{[4585844:4586295](-), [4584940:4585845](-)}

For these, the start and end positions come back as 4584940 and 4586295, i.e. the span of the whole join.

Does anyone know how I can also get the positions of each part of the gene separately: first [4585844:4586295] and then [4584940:4585845]?

gene genome biopython python • 1.5k views

Could you please provide the accession number of the GenBank file you are trying to parse with this code?


For example, one of the pestis genomes causes this problem: NC_003143

5.6 years ago
Sej Modha 5.3k

Hi There,

I used a sample GenBank file here; the following should work for yours too.

This produces the following output:

['335:4642', '335:1838', '4586:5165', '5104:5396', '5376:7970', '5515:8199', '5607:5856', '5770:8341', '6918:7488', '8342:8963']
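
For the join(...) case in the question specifically, here is a minimal sketch of one way to get each segment on its own. It relies on the `parts` attribute of Biopython location objects, which lists the individual segments of a compound location (a simple location gives a one-element list). The helper name `cds_parts` and the in-memory features are made up for illustration; they are not taken from NC_003143 itself, apart from the coordinates quoted in the question.

```python
from Bio.SeqFeature import SeqFeature, FeatureLocation, CompoundLocation


def cds_parts(features):
    """Return one (start, end, strand) tuple per segment of every CDS feature."""
    segments = []
    for feature in features:
        if feature.type != "CDS":
            continue
        # .parts lists the individual segments of a join(...);
        # a simple, non-joined location yields a one-element list.
        for part in feature.location.parts:
            segments.append((int(part.start), int(part.end), part.strand))
    return segments


# Hypothetical in-memory features; the first mirrors the join from the question.
joined = CompoundLocation([
    FeatureLocation(4585844, 4586295, strand=-1),
    FeatureLocation(4584940, 4585845, strand=-1),
])
features = [
    SeqFeature(joined, type="CDS"),
    SeqFeature(FeatureLocation(335, 1838, strand=1), type="CDS"),
    SeqFeature(FeatureLocation(0, 100, strand=1), type="gene"),  # skipped: not a CDS
]

print(cds_parts(features))
# [(4585844, 4586295, -1), (4584940, 4585845, -1), (335, 1838, 1)]
```

With a real file you would pass `record.features` from `SeqIO.read(path, "genbank")` instead of the hand-built list. Keep in mind that Biopython positions are 0-based and end-exclusive, so they will not match the 1-based coordinates printed in the GenBank file exactly.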