Question: Extracting exons from CDS CompoundLocations of a genbank file
5.6 years ago, lhirsch wrote:

Hi all,

Using Biopython, I'm dealing with a GenBank file in which the only annotated feature type (feature.type) is CDS. To extract the exon sequences across the whole genome, I'm trying to get their start and end positions from each feature's location attribute, but I can't work out how CompoundLocation works.

For example:

 CompoundLocation([FeatureLocation(ExactPosition(368), ExactPosition(378), strand=1), FeatureLocation(ExactPosition(712), ExactPosition(1170), strand=1)], 'join')

With record.features.location.[start|end].position I only get the start position of the first exon (368) and the end position of the last exon (1170).

 Apparently the GenBank class has a function called _split_compound_loc() , but it only takes a list of the positions as an argument, which is exactly what I need in the first place.

 Is there a way to overcome these difficulties without having to parse the file manually? 

Many thanks

5.5 years ago, Peter (Scotland, UK) wrote:

Python methods starting with a single underscore are by convention private, and you are best off avoiding them.

If you want the CDS sequence (as explained in the documentation), use the .extract(...) method of the SeqFeature (or location object).
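
As a rough illustration of the .extract(...) approach, here is a minimal sketch using the location from the question. The 1200 bp toy sequence and the variable names are assumptions made for the example; with a real file the feature would come from a parsed record rather than being built by hand.

```python
from Bio.Seq import Seq
from Bio.SeqFeature import SeqFeature, FeatureLocation, CompoundLocation

# Toy parent sequence (assumption): 1200 bp, long enough to cover the example.
genome = Seq("ACGT" * 300)

# The compound location from the question; "join" is the default operator.
location = CompoundLocation(
    [
        FeatureLocation(368, 378, strand=1),
        FeatureLocation(712, 1170, strand=1),
    ]
)
cds = SeqFeature(location, type="CDS")

# extract() joins the parts in order and returns the spliced CDS sequence.
cds_seq = cds.extract(genome)
print(len(cds_seq))

# start/end on the compound location only give the outer bounds,
# which is why the question saw just 368 and 1170.
print(int(location.start), int(location.end))
```

The spliced length is the sum of the two part lengths (10 + 458), not the span from 368 to 1170.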

If you want the individual exons from a CDS feature, they are the individual parts of the CompoundLocation, accessed via its .parts attribute (which is a list). For your example, this would be a list containing only FeatureLocation(ExactPosition(368), ExactPosition(378), strand=1) and FeatureLocation(ExactPosition(712), ExactPosition(1170), strand=1).
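
A sketch of how the per-exon coordinates and sequences might be collected from the parts of each CDS location. The helper name cds_exons and the toy record are hypothetical, standing in for a record parsed from the GenBank file with Bio.SeqIO.

```python
from Bio.Seq import Seq
from Bio.SeqFeature import SeqFeature, FeatureLocation, CompoundLocation
from Bio.SeqRecord import SeqRecord


def cds_exons(record):
    """Yield (start, end, strand, sequence) for each part of every CDS feature.

    Hypothetical helper for illustration; not part of Biopython.
    """
    for feature in record.features:
        if feature.type != "CDS":
            continue
        # .parts is a list for compound locations; a simple location
        # also has .parts, containing just itself.
        for part in feature.location.parts:
            yield int(part.start), int(part.end), part.strand, part.extract(record.seq)


# Toy record (assumption) carrying the CDS location from the question.
location = CompoundLocation(
    [
        FeatureLocation(368, 378, strand=1),
        FeatureLocation(712, 1170, strand=1),
    ]
)
record = SeqRecord(
    Seq("ACGT" * 300),
    id="toy",
    features=[SeqFeature(location, type="CDS")],
)

exons = list(cds_exons(record))
for start, end, strand, seq in exons:
    print(start, end, strand, len(seq))
```

The coordinates are Python-style (0-based start, exclusive end), so each exon's length is simply end minus start.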

I suggest reading the docstrings, either directly within Python using the help(...) command, on GitHub, or here:
