Feature.extract(SeqRecord) misses features
0
0
Entering edit mode
11 weeks ago
niklas • 0

Hi,

I am somewhat new to using Biopython and have the following problem: I have a plasmid with annotations (features), that I open with SeqIO.parse. From this plasmid, I want to extract a certain region, together with the respective features. Hence I create a new SeqFeature with the FeatureLocation that I want to cut out of the plasmid. With this created feature, I do

cutout = feature.extract(plasmid)

However, the function extract only extracts features that are within the given range. It does not partially copy features and crops them. Is there a way, to do so? Or would I need to manually write a function to do so? Furthermore, it is unable to copy features with compound locations, for example features that wrap around the end and origin of the sequence.

Here is an example script

seq = "TA" + "A".join(["C"*9] * 10) + "AG"
test_plasmid = SeqRecord(
    seq,
    features=[
        SeqFeature(FeatureLocation(0, len(seq)), type="source"),
        SeqFeature(FeatureLocation(5, 15), type="CDS"),
        SeqFeature(FeatureLocation(92, 99), type="CDS"),
        SeqFeature(FeatureLocation(99, len(seq))+FeatureLocation(0, 5), type="CDS"), # compound feature wrapping around origin
        SeqFeature(FeatureLocation(4, 96), type="misc_feature"),
        SeqFeature(FeatureLocation(20, 50), type="gene"),
    ],
    annotations={
        "molecule_type": "ds-DNA",
        "topology": "circular",
    },
    id="test",
    name="test",
    description="test",
)
feature = SeqFeature(FeatureLocation(90, len(seq))+FeatureLocation(0, 13))

cutout = feature.extract(test_plasmid)

print(test_plasmid .features)
print(cutout.features)

While the test_plasmid has six features, only one (the third one) gets copied. I understand that cropping features is not done. But why is feature four not copied? It also lies within the range, although it is a compound location.

I am happy about any help!!!

Best,
Niklas

Biopython • 240 views
ADD COMMENT
0
Entering edit mode

Am I right in thinking you're just trying to get features within a given range? Biopython supports this natively - see here for instance: Slicing Genbank File by 'gene_id' range . Pretty sure you're using SeqFeatures and .extract wrong above.

Compound features will complicate things however, and I think you may have to write logic specifically to handle these. I can't say I've worked with these closely though so I would advise looking at the documentation.

You might also find this discussion about circular sequences and compound features helpful: https://github.com/biopython/biopython/issues/2158

This approach might be more useful: Extract genbank features inside a given range

Unless you are adamant about only isolating features with a specific range, I think the way to go might be to check if any part of any of the features you care about falls inside the range, and then pull out the feature in full. Its coords might terminate outside your specified initial range but maybe that's OK, I don't know.

You will also need to be mindful of the type of feature (e.g a CDS or a source etc), IIRC, not all of them are extracted as features in the sense most would use it. Someone who has tested recently can correct me, but I think, for instance the source record is ignored unless you specifically try to retrieve it.

ADD REPLY
0
Entering edit mode

Thank you for your quick reply.

Am I right in thinking you're just trying to get features within a given range? Biopython supports this natively - see here for instance: Slicing Genbank File by 'gene_id' range . Pretty sure you're using SeqFeatures and .extract wrong above.

I was using .extract, as I want to crop wrapped around the origin, which the native [start, end] notation does not support. But the native slicing also does not support copying features that start or stop outside of the start..end range.

However, I will look into the other posts also.

ADD REPLY

Login before adding your answer.

Traffic: 1953 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6