Hi all,
I'm using Biopython to work with a GenBank file in which CDS is the only annotated feature type. To extract the exon sequences across the whole genome, I'm trying to get their start and end positions from the FeatureLocation objects, but I can't work out how CompoundLocation works.
For example:
CompoundLocation([FeatureLocation(ExactPosition(368), ExactPosition(378), strand=1), FeatureLocation(ExactPosition(712), ExactPosition(1170), strand=1)], 'join')
Reading the location's start and end (feature.location.start.position and feature.location.end.position, for each feature in record.features), I only get the start position of the first exon (368) and the end position of the last exon (1170).
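To make this reproducible without my GenBank file, here is a minimal sketch that rebuilds the location above by hand. It assumes Biopython is installed, and the variable names are only illustrative; in my script the location comes from a feature parsed out of the record.

```python
from Bio.SeqFeature import CompoundLocation, ExactPosition, FeatureLocation

# Rebuild the two-exon location from the example above by hand
# (illustrative names; normally this is feature.location).
exon1 = FeatureLocation(ExactPosition(368), ExactPosition(378), strand=1)
exon2 = FeatureLocation(ExactPosition(712), ExactPosition(1170), strand=1)
location = CompoundLocation([exon1, exon2], "join")

# start and end describe the overall span of the compound location,
# not the boundaries of the individual exons.
span_start = int(location.start)
span_end = int(location.end)
print(span_start, span_end)
```

This gives me only the outer boundaries, 368 and 1170; the inner boundaries (378 and 712) are what I'm missing.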
Apparently the GenBank module has a function called _split_compound_loc(), but it only takes a list of the positions as an argument, which is exactly what I need in the first place.
Is there a way to overcome these difficulties without having to parse the file manually?
Many thanks