Parsing a GTF file with BCBio-gff: AttributeError
15 months ago
Flavia

I'm trying to parse a GTF file using this code:

from BCBio import GFF

gtf_rec = []
in_file = 'cuffcmp.combined.gtf'
out_file = 'extract.gtf'

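wanted = ('class_code "x"', 'class_code "u"', 'class_code "i"')

with open(in_file) as f:
    for line in f:
        # test each class code separately; 'a' or 'b' in line is always true
        if any(code in line for code in wanted):
            gtf_rec.append(line)

with open(out_file, "w") as out_handle:
    GFF.write(gtf_rec, out_handle)

# no close() calls needed: the with blocks close both files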

When I print(gtf_rec), the required information is filtered out, but when I try to write it to a new file I get this AttributeError:

  File "/excise.py", line 18, in <module>
    GFF.write(gtf, out_handle)
  File "/GFFOutput.py", line 202, in write
    return writer.write(recs, out_handle, include_fasta)
  File "/GFFOutput.py", line 80, in write
    self._write_rec(rec, out_handle)
  File "/GFFOutput.py", line 108, in _write_rec
    if len(rec.seq) > 0:
AttributeError: 'str' object has no attribute 'seq'

I'm new to bioinformatics, and I have spent too much time trying to solve this. The general explanations of this error haven't helped me fix it.

I would like to know if any of you can find the cause of the error or suggest another way to do this parsing.

There is extensive material on working with formats other than GTF.

Thank you very much!

15 months ago

The issue is that you parse your input manually into a custom list, gtf_rec, whose items are plain strings. However, GFF.write expects SeqRecord objects. Instances of that class have the seq attribute the writer checks (len(rec.seq) in your traceback), which is why a str fails with this AttributeError.

Ideally, you should replace your custom input parsing with code that already uses the classes and functions provided by Biopython and BCBio-gff, for example GFF.parse, which yields SeqRecord objects.
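If the goal is only to copy the matching lines, another option (different from the suggestion above, and keeping the question's text-based approach) is to skip GFF.write entirely: the filtered lines are already GTF text and can be written out unchanged. A minimal sketch, assuming plain substring matching on class_code is good enough; filter_gtf_lines and extract are hypothetical helper names:

```python
# Class codes the question wants to keep, written as they appear in the GTF
WANTED = ('class_code "x"', 'class_code "u"', 'class_code "i"')


def filter_gtf_lines(lines, wanted=WANTED):
    """Yield the lines that contain any of the wanted substrings, unchanged."""
    for line in lines:
        # Check every substring against the line. The expression
        # 'class_code "x"' or ... in line is always true, because a
        # non-empty string on the left of `or` is truthy by itself.
        if any(code in line for code in wanted):
            yield line


def extract(in_path, out_path):
    """Copy the matching lines from in_path to out_path as plain text."""
    with open(in_path) as in_handle, open(out_path, "w") as out_handle:
        # The lines are already GTF text, so they can be written directly;
        # GFF.write is only needed when working with SeqRecord objects.
        out_handle.writelines(filter_gtf_lines(in_handle))
```

For the files in the question this would be called as extract("cuffcmp.combined.gtf", "extract.gtf"). Substring matching is simple but loose: it would also keep a line where the same text happens to appear inside another attribute.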


Oh, of course. I had already tried to find a way to limit SeqRecord parsing to specific attributes, but I couldn't.

But you explained the cause of the error, and your answer will certainly help me make progress. Thank you.


I think GFF.parse(..., limit_info=...) should do the trick to restrict the output to specific attributes. See the section "Limiting to features of interest" in the BCBio-gff tutorial.
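As far as I know, limit_info selects on the sequence ID, source and feature type columns (keys such as gff_id, gff_source and gff_type in the BCBio-gff tutorial) rather than on column-9 attributes like class_code, so that filter may still have to be applied separately. A library-free sketch of an exact match on the class_code attribute follows; parse_gtf_attributes and select_by_class_code are hypothetical names, and the attribute parser assumes no ';' inside quoted values:

```python
def parse_gtf_attributes(attr_field):
    """Turn a GTF column-9 string such as 'gene_id "G1"; class_code "u";' into a dict.

    Simplifications (assumptions): no ';' inside quoted values, and only the
    last value is kept when a key repeats.
    """
    attrs = {}
    for part in attr_field.split(";"):
        part = part.strip()
        if not part:
            continue
        key, _, value = part.partition(" ")
        attrs[key] = value.strip().strip('"')
    return attrs


def select_by_class_code(lines, codes=frozenset({"x", "u", "i"})):
    """Yield feature lines whose class_code attribute is exactly one of `codes`."""
    for line in lines:
        if line.startswith("#"):
            continue  # skip comment/header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a 9-column feature line
        if parse_gtf_attributes(fields[8]).get("class_code") in codes:
            yield line
```

Because the match is on the parsed value, class_code "x" is kept while a value that merely contains x elsewhere on the line is not. Selected lines can be written with out_handle.writelines(...) as plain text.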
