Parsing a GTF file with BCBio-gff: AttributeError
11 weeks ago
Flavia • 0

I'm trying to parse a gtf file using this code:

    from BCBio import GFF

    gtf_rec = []
    in_file = 'cuffcmp.combined.gtf'
    out_file = 'extract.gtf'

    with open(in_file) as f:
        for line in f:
            if 'class_code "x"' or 'class_code "u"' or 'class_code "i"' in line:
                gtf_rec.append(line)

    with open(out_file, "w") as out_handle:
        GFF.write(gtf_rec, out_handle)

    in_file.close()
    out_handle.close()


When I print(gtf_rec), the required information has been extracted, but when I try to write it to a new file I get this AttributeError:

      File "/excise.py", line 18, in <module>
        GFF.write(gtf, out_handle)
      File "/GFFOutput.py", line 202, in write
        return writer.write(recs, out_handle, include_fasta)
      File "/GFFOutput.py", line 80, in write
        self._write_rec(rec, out_handle)
      File "/GFFOutput.py", line 108, in _write_rec
        if len(rec.seq) > 0:
    AttributeError: 'str' object has no attribute 'seq'


I'm new to bioinformatics, and I have spent too much time trying to solve this. The general explanations of this error haven't helped me fix it.

I would like to know if any of you can find the cause of the error or suggest another way to do this parsing.

There is extensive material on working with file formats other than GTF.

Thank you very much!

gtf gff AttributeError • 480 views
11 weeks ago

Your issue is that you parse your input manually into a custom list, gtf_rec, whose items are plain strings. However, the function GFF.write expects SeqRecord objects. Instances of that class have the seq attribute that the traceback reports as missing on 'str'.

Ideally, you should replace your custom input parsing with one that already makes use of the handy classes and functions provided by Biopython.
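If the goal is only to copy the matching lines into a new file, one alternative that avoids SeqRecord altogether is to write the strings back out directly. The following is a minimal sketch under that assumption, not the BCBio-gff way of doing it. The helper names has_wanted_class_code and extract_lines are made up for illustration. It also addresses two other problems in the posted code. First, the condition 'class_code "x"' or ... in line is always true, because a non-empty string literal is truthy, so every line is kept. Second, in_file is a string, so in_file.close() would fail, and the with blocks already close the files.

```python
# Class codes to keep, taken from the question.
WANTED_CODES = ("x", "u", "i")


def has_wanted_class_code(line, codes=WANTED_CODES):
    """Return True if the GTF line contains class_code "<code>" for any wanted code."""
    # Each code is tested against the line separately; chaining string
    # literals with `or` would always evaluate to True.
    return any(f'class_code "{code}"' in line for code in codes)


def extract_lines(in_path, out_path, codes=WANTED_CODES):
    """Copy matching lines from in_path to out_path and return how many were kept."""
    kept = 0
    # Both files are closed automatically when the with block ends.
    with open(in_path) as in_handle, open(out_path, "w") as out_handle:
        for line in in_handle:
            if has_wanted_class_code(line, codes):
                out_handle.write(line)  # plain text out, no SeqRecord needed
                kept += 1
    return kept
```

With the file names from the question, the call would be extract_lines('cuffcmp.combined.gtf', 'extract.gtf'). This treats the GTF purely as text, so it does not validate the records in any way.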


Oh, for sure. I already tried to find a SeqRecord-based way to limit by attributes, but I couldn't find one.

But you explained the cause of the error, and your answer will certainly help me get somewhere. Thank you.


I think GFF.parse(..., limit_info=) should do the trick to restrict the output to specific attributes. See the section "Limiting to features of interest" in the tutorial.
