Entering edit mode
4.7 years ago
hugo.avila
▴
530
I need to remove features from a gbk file that have split coordinates:
join{[5355286:5355459](+), [0:17](+)}
join{[5355286:5355459](+), [0:17](+)}
So far i was able to test features to check if they are CompaundLocation types, but i culd not remove all off then from the record object:
#!/usr/bin/env python
from Bio import SeqIO
import sys
records = SeqIO.parse(sys.argv[1],"genbank")
for record in records:
for index, feature in enumerate(record.features):
try:
if feature.location.operator == "join":
record.features.pop(index)
except:
pass
SeqIO.write(record, "out.gbk", "genbank")
Can you post on the genbank accession number for the file you are playing with so we can test some code?
Not at a computer to test right now, but it should be possible to do something like:
If the
features
element is a list (I forget now). However, what you'll need to do is find out exactly what the feature object is, or its index within the genome. Your current approach should be what you need, but you wantremove()
instead ofpop
.Edit: see comment below, del needed instead of pop
Thank you for the help. Replace pop() for remove() works, but only for genes, CDS still present on the output file.
Before:
After:
Hi, sorry i forgot to mention, here is the acc of the file NZ_CP012744.1.
I think the approach you have above will work, but you need
del
instead ofpop
.Alternatively you can use filtering list comprehensions or filter functions to create a new list of features, e.g:
will remove anything that has the feature type
'gene'
. You can replace this with logic to remove joins instead, then re-write a new genbank file with the modified list of features.This works:
But i can't write extracted features to a new file. Error message:
You dont want to just write the features, you need to add those features to a new genbank (or whatever).
You're trying to write the record, but passing it a list of the records features instead.
Ok i will do it Thanks for all the help !