Question: MAKER GFF3 file issues
alslonik wrote (11 weeks ago):

Hi community,

This is really a technical question; I hope it is OK to post it here...

I am trying to import the GFF3 file from MAKER into my JBrowse to view the annotations. I am using the maker2jbrowse script and keep getting errors. Nothing indicates that MAKER produced a problematic file; the logs are free of errors. Still, I am getting this output:

GFF3 parse error: some features reference other features that do not exist in the file (or in the same '###' scope).

Head of my gff3 file:

##gff-version 3
Chr6    .   contig  1   41368575    .   .   .   ID=Chr6;Name=Chr6
Chr6    maker   gene    9418414 9419484 .   -   .   ID=maker-Chr6-exonerate_protein2genome-gene-94.9;Name=maker-Chr6-exonerate_protein2genome-gene-94.9
Chr6    maker   mRNA    9418414 9419484 594 -   .   ID=maker-Chr6-exonerate_protein2genome-gene-94.9-mRNA-1;Parent=maker-Chr6-exonerate_protein2genome-gene-94.9;Name=maker-Chr6-exonerate_protein2genome-gene-94.9-mRNA-1;_AED=0.31;_eAED=0.43;_QI=0|0|0|1|0|0|2|0|197
Chr6    maker   exon    9418414 9418727 .   -   .   ID=maker-Chr6-exonerate_protein2genome-gene-94.9-mRNA-1:exon:382;Parent=maker-Chr6-exonerate_protein2genome-gene-94.9-mRNA-1
Chr6    maker   exon    9419205 9419484 .   -   .   ID=maker-Chr6-exonerate_protein2genome-gene-94.9-mRNA-1:exon:381;Parent=maker-Chr6-exonerate_protein2genome-gene-94.9-mRNA-1
Chr6    maker   CDS 9419205 9419484 .   -   0   ID=maker-Chr6-exonerate_protein2genome-gene-94.9-mRNA-1:cds;Parent=maker-Chr6-exonerate_protein2genome-gene-94.9-mRNA-1
Chr6    maker   CDS 9418414 9418727 .   -   2   ID=maker-Chr6-exonerate_protein2genome-gene-94.9-mRNA-1:cds;Parent=maker-Chr6-exonerate_protein2genome-gene-94.9-mRNA-1
Chr6    maker   gene    9469345 9471102 .   -   .   ID=maker-Chr6-exonerate_protein2genome-gene-94.15;Name=maker-Chr6-exonerate_protein2genome-gene-94.15
Chr6    maker   mRNA    9469345 9471102 588 -   .   ID=maker-Chr6-exonerate_protein2genome-gene-94.15-mRNA-1;Parent=maker-Chr6-exonerate_protein2genome-gene-94.15;Name=maker-Chr6-exonerate_protein

The file is 2.1 GB. How do I check whether the file is valid and, more importantly, how do I fix it if it is not?


alslonik

Hello alslonik,

I don't know MAKER and haven't worked with JBrowse, but the error message seems quite clear to me. In a GFF3 file, every value given in Parent= must match an entry elsewhere in the file that has the same value in ID=. This is not always the case in your file.
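A minimal sketch of the check described above, assuming a tab-separated GFF3 file. It collects the ID= values and reports every Parent= value that has no matching ID in the same '###' scope (the scope mentioned in the JBrowse error). Per the GFF3 specification, a '###' line means all references so far are resolved, while inside a scope a child line may come before its parent. The sketch stops at a ##FASTA or '>' line because GFF3 allows sequences at the end of the file. The function names are hypothetical and not part of MAKER or JBrowse. All IDs of the current scope are held in memory, which matters for a 2.1 GB file.

```python
import sys
from urllib.parse import unquote


def parse_attributes(col9):
    """Split GFF3 column 9 into a dict mapping each tag to a list of values."""
    attrs = {}
    for pair in col9.strip().split(";"):
        if "=" not in pair:
            continue  # empty or malformed tag=value pair
        tag, value = pair.split("=", 1)
        # Multiple values (e.g. several parents) are comma-separated;
        # literal commas inside a value are escaped as %2C, so split first.
        attrs[tag.strip()] = [unquote(v) for v in value.split(",")]
    return attrs


def find_dangling_parents(lines):
    """Return (line_number, parent_id) pairs whose Parent= matches no ID=.

    A '###' directive closes a scope: features after it may not refer back
    to features before it, so references are checked and IDs forgotten there.
    Inside a scope, a child may appear before its parent.
    """
    dangling = []
    ids = set()
    pending = []  # (line_number, parent_id) references in the current scope

    def close_scope():
        for lineno, parent in pending:
            if parent not in ids:
                dangling.append((lineno, parent))
        ids.clear()
        pending.clear()

    for lineno, line in enumerate(lines, start=1):
        if line.startswith("##FASTA") or line.startswith(">"):
            break  # sequence section: no more feature lines
        if line.strip() == "###":
            close_scope()
            continue
        if not line.strip() or line.startswith("#"):
            continue  # blank line, comment or other directive
        cols = line.rstrip("\r\n").split("\t")
        if len(cols) != 9:
            continue  # malformed line, not checked here
        attrs = parse_attributes(cols[8])
        ids.update(attrs.get("ID", []))
        for parent in attrs.get("Parent", []):
            pending.append((lineno, parent))

    close_scope()
    return dangling


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        for lineno, parent in find_dangling_parents(handle):
            print(f"line {lineno}: Parent={parent} has no matching ID")
```

Run it with the GFF3 path as the only argument. Each reported line number points at a child feature whose parent is missing, or whose parent sits on the wrong side of a '###' line.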

fin swimmer

Thanks, finswimmer, I understand what you mean. The question is how to deal with this. Is there a way to fix this in a GFF3 file? Also, maybe it is a matter of sorting the file correctly? I have never worked with GFF3 before, hence the questions...

alslonik

Is your GFF3 file the output of gff3_merge without filtering? It appears that you have filtered it to keep only features whose source (i.e., 2nd column) is maker. Perhaps you need to rerun gff3_merge and load its unfiltered output into JBrowse, as some features of the GFF3 file seem to be missing.
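One quick way to see what the merged file actually contains is to tally the source column (column 2) of every feature line. If only a single source appears, the file may have been filtered. The following is a minimal sketch, assuming a tab-separated GFF3 file. `count_sources` is a hypothetical helper, not part of MAKER.

```python
import sys
from collections import Counter


def count_sources(lines):
    """Tally column 2 (source) over the feature lines of a GFF3 file."""
    counts = Counter()
    for line in lines:
        if line.startswith("##FASTA") or line.startswith(">"):
            break  # sequence section: no more feature lines
        if not line.strip() or line.startswith("#"):
            continue  # blank line, comment or directive
        cols = line.rstrip("\r\n").split("\t")
        if len(cols) >= 2:
            counts[cols[1]] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        for source, n in count_sources(handle).most_common():
            print(f"{source}\t{n}")
```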

jean.elbers

Not sure that I understand... Yes, I did:

gff3_merge -d logfile

I did not do any filtering while merging.

alslonik

Juke-34 wrote (11 weeks ago):

I have often had this kind of problem running MAKER on our cluster (LSF + Open MPI). I never managed to find where the problem comes from. I end up with some parent features missing, or with duplicated features. I have developed a library that standardises any kind of GTF/GFF file, fixes all kinds of problems, and produces a complete GFF3 output.

Clone this repository and install it. Then just run it with:

-gff input.gff -o output.gff

You can also add the option -v 1 to see which problems are being corrected.
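To check for the duplicated features mentioned above independently of any tool, here is a minimal sketch that reports exact repeats of whole feature lines. In GFF3, several lines may legitimately share one ID. The two CDS lines in the head of the file above are an example: they form one multi-line feature. For that reason, the sketch compares complete lines rather than IDs. This is an assumed, simplistic definition of "duplicated", not what Juke-34's library checks. `find_duplicate_lines` is a hypothetical name.

```python
import hashlib
import sys


def find_duplicate_lines(lines):
    """Return (line_number, first_line_number) for each exact repeat of a feature line."""
    first_seen = {}  # digest of the full line -> line number of its first occurrence
    duplicates = []
    for lineno, line in enumerate(lines, start=1):
        if line.startswith("##FASTA") or line.startswith(">"):
            break  # sequence section: no more feature lines
        stripped = line.rstrip("\r\n")
        if not stripped or stripped.startswith("#"):
            continue  # blank line, comment or directive (including '###')
        # Keep a 16-byte digest instead of the line to save memory on large files.
        digest = hashlib.blake2b(stripped.encode("utf-8"), digest_size=16).digest()
        if digest in first_seen:
            duplicates.append((lineno, first_seen[digest]))
        else:
            first_seen[digest] = lineno
    return duplicates


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        for lineno, first in find_duplicate_lines(handle):
            print(f"line {lineno} repeats line {first}")
```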

Juke-34

WOW. Thanks, Juke-34, I am going to try it. At least I am not the only one!!! And I ran MAKER with Open MPI too... Thank you very much!

alslonik

Hi, an update: your script gives me a list of awful errors and no output:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: OBO File Format Error - 
Cannot find tag format-version and/ default-namespace . These are required header.

STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/share/perl5/Bio/Root/
STACK: Bio::OntologyIO::obo::_header /usr/share/perl5/Bio/OntologyIO/
STACK: Bio::OntologyIO::obo::parse /usr/share/perl5/Bio/OntologyIO/
STACK: BILS::Handler::GXFhandler::try {...}  /home/alex/bin/GAAS/annotation/BILS/Handler/
STACK: Try::Tiny::try /usr/share/perl5/Try/
STACK: BILS::Handler::GXFhandler::_handle_ontology /home/alex/bin/GAAS/annotation/BILS/Handler/
STACK: BILS::Handler::GXFhandler::slurp_gff3_file_JD /home/alex/bin/GAAS/annotation/BILS/Handler/

Let's continue without feature-ontology information.
No data retrieved among the feature-ontology.
=>GFF version parser used: 3

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: [  -   .   ID=Chr7:hsp:23258:;Parent=Chr7:hit:14758:;Target=gb|OWM91437.1| 350 575;Gap=M85 I4 M44 I4 M62 I4 M5 D7 M18] does not look like GFF3 to me
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/share/perl5/Bio/Root/
STACK: Bio::Tools::GFF::_from_gff3_string /usr/share/perl5/Bio/Tools/
STACK: Bio::Tools::GFF::from_gff_string /usr/share/perl5/Bio/Tools/
STACK: Bio::Tools::GFF::next_feature /usr/share/perl5/Bio/Tools/
STACK: BILS::Handler::GXFhandler::slurp_gff3_file_JD /home/alex/bin/GAAS/annotation/BILS/Handler/

Does this mean that the file is corrupted? What do you think it means about my Maker run? Thanks again...

It sounds like a line in your file does not have 9 columns. The tool cannot fix that. You have to find this line manually (the one cited in the error) and fix it. I have seen this before... it's rare, but a few times a single line has been split and written over 2 lines...
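In a 2.1 GB file, lines like the one in the error are easier to find with a script than by hand. The fragment quoted in the error appears to start at the strand column, which fits a line that was split in two. Below is a minimal sketch, assuming the file should be strictly tab-separated. It reports every non-comment line before the FASTA section that does not split into exactly 9 tab-separated columns. A split line should then show up as two consecutive reported lines. `find_bad_column_lines` is a hypothetical name.

```python
import sys


def find_bad_column_lines(lines, expected=9):
    """Yield (line_number, column_count, text) for feature lines without 9 tab-separated columns."""
    for lineno, line in enumerate(lines, start=1):
        if line.startswith("##FASTA") or line.startswith(">"):
            break  # sequence section: no more feature lines
        text = line.rstrip("\r\n")
        if not text or text.startswith("#"):
            continue  # blank line, comment or directive
        n_cols = len(text.split("\t"))
        if n_cols != expected:
            yield lineno, n_cols, text


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        for lineno, n_cols, text in find_bad_column_lines(handle):
            print(f"line {lineno}: {n_cols} columns: {text[:80]}")
```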

Juke-34
