How to remove features from GBK files with biopython ?
0
0
Entering edit mode
20 months ago
hugo.avila ▴ 190

I need to remove features from a gbk file that have split coordinates:

join{[5355286:5355459](+), [0:17](+)}
join{[5355286:5355459](+), [0:17](+)}

So far i was able to test features to check if they are CompaundLocation types, but i culd not remove all off then from the record object:

#!/usr/bin/env python

from Bio import SeqIO
import sys

records = SeqIO.parse(sys.argv[1],"genbank")

for record in records:
    for index, feature in enumerate(record.features):
        try:
            if feature.location.operator == "join":
                record.features.pop(index)
        except:
            pass
    SeqIO.write(record, "out.gbk", "genbank")
biopython • 618 views
ADD COMMENT
0
Entering edit mode

Can you post on the genbank accession number for the file you are playing with so we can test some code?

ADD REPLY
1
Entering edit mode

Not at a computer to test right now, but it should be possible to do something like:

records.features.remove(feature)

If the features element is a list (I forget now). However, what you'll need to do is find out exactly what the feature object is, or its index within the genome. Your current approach should be what you need, but you want remove() instead of pop.

Edit: see comment below, del needed instead of pop

ADD REPLY
0
Entering edit mode

Thank you for the help. Replace pop() for remove() works, but only for genes, CDS still present on the output file.

Before:

FEATURES             Location/Qualifiers
     source          1..5355459
                     /organism="Klebsiella pneumoniae subsp. pneumoniae"
                     /mol_type="genomic DNA"
                     /strain="TGH10"
                     /isolation_source="wound swab"
                     /host="Homo sapiens"
                     /sub_species="pneumoniae"
                     /db_xref="taxon:72407"
                     /country="Greece"
                     /lat_lon="37.97 N 23.73 E"
                     /collection_date="02-Mar-2013"
                     /collected_by="Olympia"
     **gene            join(5355287..5355459,1..17)**
                     /locus_tag="AOG31_RS26915"
                     /pseudo=""
     **CDS             join(5355287..5355459,1..17)**
                     /locus_tag="AOG31_RS26915"
                     /inference="COORDINATES: similar to AA
                     sequence:RefSeq:WP_006808765.1"
                     /note="frameshifted; internal stop; Derived by automated
                     computational analysis using gene prediction method:
                     Protein Homology."
                     /pseudo=""
                     /codon_start=1
                     /transl_table=11
                     /product="hypothetical protein"

After:

FEATURES             Location/Qualifiers
     source          1..5355459
                     /organism="Klebsiella pneumoniae subsp. pneumoniae"
                     /mol_type="genomic DNA"
                     /strain="TGH10"
                     /isolation_source="wound swab"
                     /host="Homo sapiens"
                     /sub_species="pneumoniae"
                     /db_xref="taxon:72407"
                     /country="Greece"
                     /lat_lon="37.97 N 23.73 E"
                     /collection_date="02-Mar-2013"
                     /collected_by="Olympia"
     **CDS             join(5355287..5355459,1..17)**
                     /locus_tag="AOG31_RS26915"
                     /inference="COORDINATES: similar to AA
                     sequence:RefSeq:WP_006808765.1"
                     /note="frameshifted; internal stop; Derived by automated
                     computational analysis using gene prediction method:
                     Protein Homology."
                     /pseudo=""
                     /codon_start=1
                     /transl_table=11
                     /product="hypothetical protein"
ADD REPLY
0
Entering edit mode

Hi, sorry i forgot to mention, here is the acc of the file NZ_CP012744.1.

ADD REPLY
1
Entering edit mode

I think the approach you have above will work, but you need del instead of pop.

Alternatively you can use filtering list comprehensions or filter functions to create a new list of features, e.g:

feat = [feature for feature in [feature for feature in feat  if feature.type == 'gene']]

will remove anything that has the feature type 'gene'. You can replace this with logic to remove joins instead, then re-write a new genbank file with the modified list of features.

ADD REPLY
0
Entering edit mode

This works:

#!/usr/bin/env python

from Bio import SeqIO
import sys

def check_feature(feature):
    try:
        if feature.location.operator == "join":
            return True
    except:
        return False

#save only the first record
record = [i for i in SeqIO.parse(sys.argv[1],"genbank")][0]

features = [feature for feature in [feature for feature in record.features if not check_feature(feature)]]
SeqIO.write(features, "out.gbk", "genbank")

But i can't write extracted features to a new file. Error message:

Traceback (most recent call last):
  File "./filter.py", line 17, in <module>
    SeqIO.write(features, "out.gbk", "genbank")
  File "/home/hugo/miniconda3/lib/python3.7/site-packages/Bio/SeqIO/__init__.py", line 539, in write
    count = writer_class(fp).write_file(sequences)
  File "/home/hugo/miniconda3/lib/python3.7/site-packages/Bio/SeqIO/Interfaces.py", line 253, in write_file
    count = self.write_records(records)
  File "/home/hugo/miniconda3/lib/python3.7/site-packages/Bio/SeqIO/Interfaces.py", line 238, in write_records
    self.write_record(record)
  File "/home/hugo/miniconda3/lib/python3.7/site-packages/Bio/SeqIO/InsdcIO.py", line 862, in write_record
    self._write_the_first_line(record)
  File "/home/hugo/miniconda3/lib/python3.7/site-packages/Bio/SeqIO/InsdcIO.py", line 596, in _write_the_first_line
    locus = record.name
AttributeError: 'SeqFeature' object has no attribute 'name'
ADD REPLY
0
Entering edit mode

You dont want to just write the features, you need to add those features to a new genbank (or whatever).

You're trying to write the record, but passing it a list of the records features instead.

ADD REPLY
0
Entering edit mode

Ok i will do it Thanks for all the help !

ADD REPLY

Login before adding your answer.

Traffic: 2084 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6