Problem with downloading CDS sequences using Biopython
1
1
Entering edit mode
8.0 years ago
cl10101 ▴ 80

I'm trying to download CDS sequences for a given genome using Biopython. My script looks like this:

from Bio import Entrez 
from Bio import SeqIO

Entrez.email     = "c...@gmail.com"  
genomeAccessions = ['NC_021353.1', 'NC_020913.1']
handle = Entrez.efetch(db="nucleotide", id=genomeAccessions, rettype="gb")
records = SeqIO.parse(handle, "gb")

for i,record in enumerate(records):

    print(len(record.features))

    for feature in record.features:
        if feature.type == "CDS":
            print feature.location
            print feature.qualifiers["protein_id"]
            print feature.location.extract(record).seq

But using this code I get only one feature (for example for genome NC_021353) even though there are many features http://www.ncbi.nlm.nih.gov/nuccore/NC_021353.

I would be grateful for any suggestion what I'm doing wrong.

biopython ncbi entrez • 3.2k views
ADD COMMENT
1
Entering edit mode

I think something similar has been already discussed here:

But this solution is not in Biopython.

Download cds region coordinates

This one is.

How To Extract Just 'Cds' From Genbank File Into Another Genbank File?

There are some other suggestions here:

BioPython error parsing standard GenBank file

ADD REPLY
2
Entering edit mode
8.0 years ago
Markus ▴ 320

Your rettype must be "gbwithparts", this should work:

handle = Entrez.efetch(db="nucleotide", id=genomeAccessions, rettype="gbwithparts")

Running your program:

>>> 
======================== RESTART: D:/Desktop/test.py ========================
3823
[580:1822](+)
['WP_020447944.1']
ATGGACAACATCAGCGGCAACATCTTTG...
...
[1818:2619](-)
['WP_020447945.1']
GTGAATAAAGAGAACAAATCTACTTTAA...
...

Your code will break later, possibly there are entries without a protein_id:

Traceback (most recent call last):
  File "D:/Desktop/test.py", line 20, in <module>
    print feature.qualifiers["protein_id"]
KeyError: 'protein_id'
ADD COMMENT

Login before adding your answer.

Traffic: 2441 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6