Question: Parsing GFF with biopython throws error
Juliofdiaz130 (Toronto, Ontario, Canada) wrote, 10 weeks ago:

I am using Biopython to parse a GFF file, starting from some of the sample code I found on their website (Basic GFF parsing section).

from BCBio import GFF

in_file = "my_genome.gff"

in_handle = open(in_file)
for rec in GFF.parse(in_handle):
    print(rec)
in_handle.close()

When I run it on my system, I get the following error:

Traceback (most recent call last):
  File "/home/zoo/zool2417/anaconda2/lib/python3.7/site-packages/Bio/SeqIO/Interfaces.py", line 42, in __init__
    self.stream = open(source, "r" + mode)
TypeError: expected str, bytes or os.PathLike object, not FakeHandle

During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "test.py", line 6, in <module>
        for rec in GFF.parse(in_handle):
      File "/home/zoo/user/anaconda2/lib/python3.7/site-packages/BCBio/GFF/GFFParser.py", line 746, in parse
        target_lines):
      File "/home/zoo/user/anaconda2/lib/python3.7/site-packages/BCBio/GFF/GFFParser.py", line 322, in parse_in_parts
        for results in self.parse_simple(gff_files, limit_info, target_lines):
      File "/home/zoo/user/anaconda2/lib/python3.7/site-packages/BCBio/GFF/GFFParser.py", line 343, in parse_simple
        for results in self._gff_process(gff_files, limit_info, target_lines):
      File "/home/zoo/zool2417/anaconda2/lib/python3.7/site-packages/BCBio/GFF/GFFParser.py", line 637, in _gff_process
        for out in self._lines_to_out_info(line_gen, limit_info, target_lines):
      File "/home/zoo/user/anaconda2/lib/python3.7/site-packages/BCBio/GFF/GFFParser.py", line 699, in _lines_to_out_info
        fasta_recs = self._parse_fasta(FakeHandle(line_iter))
      File "/home/zoo/user/anaconda2/lib/python3.7/site-packages/BCBio/GFF/GFFParser.py", line 560, in _parse_fasta
        return list(SeqIO.parse(in_handle, "fasta"))
      File "/home/zoo/user/anaconda2/lib/python3.7/site-packages/Bio/SeqIO/__init__.py", line 627, in parse
        i = iterator_generator(handle)
      File "/home/zoo/user/anaconda2/lib/python3.7/site-packages/Bio/SeqIO/FastaIO.py", line 181, in __init__
        super().__init__(source, alphabet=alphabet, mode="t", fmt="Fasta")
      File "/home/zoo/user/anaconda2/lib/python3.7/site-packages/Bio/SeqIO/Interfaces.py", line 46, in __init__
        if source.read(0) != "":
    TypeError: read() takes 1 positional argument but 2 were given

I am running Python 3.7, Biopython 1.77, and bcbio-gff 0.6.6, installed with Bioconda.

Any clues?


Hi,

Not related to your problem, but you forgot a bracket in the print call.

António

written 10 weeks ago by antonioggsousa1.4k

Thanks, fixed it in the post

written 10 weeks ago by Juliofdiaz130

I didn't forget the bracket (in my comment) and it still wasn't displayed in the code block. It works with extra spaces.

written 10 weeks ago by user_without_id140

For me, this seems to work both with open() and without it:

from BCBio import GFF
from tempfile import NamedTemporaryFile as TempFile

gff = """
X   Ensembl Repeat  2419108 2419128 42  .   .   hid=trf; hstart=1; hend=21
"""


with TempFile() as t:
    t.write(gff.encode())
    t.flush()

    for x in GFF.parse( open( t.name ) ):
        print("OK!", len(x))

    for x in GFF.parse( t.name ):
        print("OK!", len(x))

Tested with bcbio-gff 0.6.6 and biopython 1.77.

modified 10 weeks ago • written 10 weeks ago by user_without_id140

When I try your code I get a different exception:

Traceback (most recent call last):
  File "test2.py", line 13, in <module>
    for x in GFF.parse( open( t.name ) ):
  File "/home/zoo/user/anaconda2/lib/python3.7/site-packages/BCBio/GFF/GFFParser.py", line 746, in parse
    target_lines):
  File "/home/zoo/user/anaconda2/lib/python3.7/site-packages/BCBio/GFF/GFFParser.py", line 322, in parse_in_parts
    for results in self.parse_simple(gff_files, limit_info, target_lines):
  File "/home/zoo/user/anaconda2/lib/python3.7/site-packages/BCBio/GFF/GFFParser.py", line 343, in parse_simple
    for results in self._gff_process(gff_files, limit_info, target_lines):
  File "/home/zoo/user/anaconda2/lib/python3.7/site-packages/BCBio/GFF/GFFParser.py", line 637, in _gff_process
    for out in self._lines_to_out_info(line_gen, limit_info, target_lines):
  File "/home/zoo/user/anaconda2/lib/python3.7/site-packages/BCBio/GFF/GFFParser.py", line 667, in _lines_to_out_info
    results = self._map_fn(line, params)
  File "/home/zoo/user/anaconda2/lib/python3.7/site-packages/BCBio/GFF/GFFParser.py", line 177, in _gff_line_map
    assert len(parts) >= 8, line
AssertionError: X   Ensembl Repeat  2419108 2419128 42  .   .   hid=trf; hstart=1; hend=21

written 9 weeks ago by Juliofdiaz130

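That's because the tabs in the string gff = "..." didn't survive the copy-paste, so the parser sees fewer than the 8 tab-separated columns it asserts on. Adding the line gff = "\t".join(gff.split()) before the string is written to the file would fix it.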
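The tab repair can be sketched on its own, without any parser installed. The helper name retab_gff_line below is hypothetical, and the sketch assumes that none of the first eight columns contains whitespace. It limits the split to nine columns, because the one-liner above also splits on the spaces inside the attributes column ("hid=trf; hstart=1; hend=21").

```python
def retab_gff_line(line, n_columns=9):
    """Rejoin a whitespace-separated GFF feature line with tabs.

    Hypothetical helper: assumes the first n_columns - 1 fields contain
    no whitespace, so only the last (attributes) column may keep spaces.
    """
    parts = line.split(None, n_columns - 1)
    return "\t".join(part.strip() for part in parts)


# The feature line from the example, as it looks after tabs became spaces.
pasted = "X   Ensembl Repeat  2419108 2419128 42  .   .   hid=trf; hstart=1; hend=21"

fixed = retab_gff_line(pasted)
columns = fixed.split("\t")
print(len(columns), "columns; attributes:", columns[8])
```

For a multi-line GFF string, the same helper would be applied line by line, skipping comment lines that start with "#".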

Anyway, the point was: try passing the file name to GFF.parse directly, without open.
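A minimal sketch of that suggestion follows. It writes one tab-separated feature line to a temporary file and hands the path, not an open handle, to GFF.parse. The helper names write_temp_gff and parse_by_path are hypothetical, and the parse step only runs if bcbio-gff is importable. The thread does not confirm whether this avoids the FakeHandle error from the original question.

```python
import importlib.util
import os
import tempfile

# One GFF feature line with real tabs between its 9 columns.
GFF_LINE = "\t".join(
    ["X", "Ensembl", "Repeat", "2419108", "2419128", "42", ".", ".",
     "hid=trf; hstart=1; hend=21"]
) + "\n"


def write_temp_gff(text):
    """Write text to a temporary .gff file and return its path (hypothetical helper)."""
    fd, path = tempfile.mkstemp(suffix=".gff")
    with os.fdopen(fd, "w") as handle:
        handle.write(text)
    return path


def parse_by_path(path):
    """Pass the file name itself to GFF.parse instead of an open handle."""
    from BCBio import GFF  # provided by the bcbio-gff package
    return list(GFF.parse(path))


path = write_temp_gff(GFF_LINE)
try:
    if importlib.util.find_spec("BCBio") is not None:
        for rec in parse_by_path(path):
            print("OK!", rec.id, len(rec.features))
    else:
        print("bcbio-gff is not installed; skipping the parse step")
finally:
    os.remove(path)  # clean up the temporary file
```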

modified 9 weeks ago • written 9 weeks ago by user_without_id140
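The final TypeError in the original traceback ("read() takes 1 positional argument but 2 were given") can be reproduced in isolation. The sketch below does not use the real FakeHandle from bcbio-gff. LineHandle and SizedLineHandle are hypothetical stand-ins, written on the assumption (inferred only from the error message) that the wrapper defines read() without a size parameter, while the traceback shows Bio/SeqIO/Interfaces.py calling source.read(0).

```python
class LineHandle:
    """Hypothetical stand-in for a minimal file-like wrapper around lines."""

    def __init__(self, lines):
        self._lines = iter(lines)

    def read(self):  # note: no size parameter
        return "".join(self._lines)

    def readline(self):
        return next(self._lines, "")


class SizedLineHandle(LineHandle):
    """Same wrapper, but read() accepts an optional size like real file objects."""

    def read(self, size=-1):
        if size == 0:
            return ""  # a zero-length read returns an empty string
        return "".join(self._lines)


def probe(handle):
    """Call read(0) on the handle, the call shown in the traceback."""
    try:
        return handle.read(0)
    except TypeError as err:
        return f"TypeError: {err}"


fasta_lines = [">seq1\n", "ACGT\n"]
failing = probe(LineHandle(fasta_lines))
working = probe(SizedLineHandle(fasta_lines))
print(failing)
print(repr(working))
```

If that assumption holds, the error comes from how the two libraries interact in these versions rather than from the GFF file or the calling script.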