Question: How To Extract Just 'Cds' From Genbank File Into Another Genbank File?
0
gravatar for microbeatic
6.0 years ago by
microbeatic80
hanover, NH
microbeatic80 wrote:

I have a genbank file with multiple genomes (concatenated using cat). I want to produce a subset of this file with only 'CDS' sequences. It seems very trivial, but i am having trouble with this. Here is the code which succesfully prints each CDS record, but fails to produce a genbank file with those CDS.

from Bio import SeqIO,SeqFeature
import sys

gbank=SeqIO.parse(open(sys.argv[1],"rU"),"genbank"

for genome in gbank:
    print "looking in %s" %genome.id)
    for gene in genome.features:
        if gene.type == 'CDS':
            CDS=gene
            print CDS

output_handle=open("all_CDS.gbk","w")
SeqIO.write(CDS,output_handle,"genbank")    
output_handle.close()

The code prints CDS, but it produces following error at the end.

AttributeError: 'int' object has no attribute 'name'
biopython genbank • 7.3k views
ADD COMMENTlink modified 6.0 years ago by Peter5.8k • written 6.0 years ago by microbeatic80
3
gravatar for Peter
6.0 years ago by
Peter5.8k
Scotland, UK
Peter5.8k wrote:

The error you are getting is because you are not giving SeqIO.write a SeqRecord object, but a single SeqFeature object (the final CDS looked at in the loop).

You need to edit the SeqRecord object to remove the unwanted SeqFeature objects, and then save that, e.g.

from Bio import SeqIO
record = SeqIO.parse("original.gbk","genbank")
record.features = [f for f in record.features if f.type == "CDS"]
SeqIO.write(record, "only_cds.gbk", "genbank")

That should give you a simpler GenBank file where the feature table only contains the CDS features. I'm not sure why you'd do that, but that is what I think you are asking for.

ADD COMMENTlink written 6.0 years ago by Peter5.8k

Thanks @Peter. It worked. I needed to "slimdown" the genbank file which i was using to extract location information and add to a sequence header. I am pretty novice to biopython and python (a month), so i realize its highly inefficient.

ADD REPLYlink written 6.0 years ago by microbeatic80
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1225 users visited in the last hour