Using Biopython, I'm working with a GenBank file in which CDS is the only annotated feature type. To extract the exon sequences across the whole genome, I'm trying to get their start and end positions from each feature's location, but I can't work out how CompoundLocation works. A typical location looks like this:
CompoundLocation([FeatureLocation(ExactPosition(368), ExactPosition(378), strand=1), FeatureLocation(ExactPosition(712), ExactPosition(1170), strand=1)], 'join')
Using feature.location.start.position and feature.location.end.position (for each feature in record.features), I only get the start position of the first exon (368) and the end position of the last exon (1170).
Apparently Bio.GenBank has a private function called _split_compound_loc(), but it only takes a list of the positions as an argument, which is exactly what I need in the first place.
Is there a way to overcome these difficulties without having to parse the file manually?
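
For reference, here is a minimal sketch of what I am after. It assumes the location object exposes its individual pieces through a parts attribute, as CompoundLocation does in recent Biopython releases. The helper names exon_coordinates and cds_exons are hypothetical, not part of Biopython.

```python
def exon_coordinates(location):
    """Return a (start, end, strand) tuple for every piece of a location.

    Assumption: a compound location exposes its pieces via `.parts`.
    A location without `.parts` is treated as a single piece.
    Biopython coordinates are 0-based with an exclusive end.
    """
    parts = getattr(location, "parts", [location])
    return [(int(part.start), int(part.end), part.strand) for part in parts]


def cds_exons(genbank_path):
    """Yield (record id, exon coordinates) for each CDS feature in a file."""
    # Imported here so the helper above can be used without Biopython installed.
    from Bio import SeqIO

    for record in SeqIO.parse(genbank_path, "genbank"):
        for feature in record.features:
            if feature.type == "CDS":
                yield record.id, exon_coordinates(feature.location)
```

For the location shown above, exon_coordinates should give (368, 378, 1) and (712, 1170, 1). If the sequences are needed rather than the coordinates, calling part.extract(record.seq) on each piece looks like the natural next step.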