How to solve the "Non-unique top level ID" error in the MAKER annotation pipeline?
2.3 years ago
Upendra • 0

Hi,

I am trying to run a second round of MAKER annotation with a SNAP-trained HMM file. However, when I pass the first-round GFF via maker_gff=maker.all.gff.file.from.the.first.round.gff, I get a "Non-unique top level ID" error for all the scaffolds. The first part of maker_opts.ctl looks like this:

#-----Genome (these are always required)
genome=path/to/assembly.fasta #genome sequence (fasta file or fasta embeded in GFF3 file)
organism_type=eukaryotic #eukaryotic or prokaryotic. Default is eukaryotic

#-----Re-annotation Using MAKER Derived GFF3
maker_gff=path/to/maker/gff/file/from/the/first/round/maker.round.1.all.gff #MAKER derived GFF3 file
est_pass=1 #use ESTs in maker_gff: 1 = yes, 0 = no
altest_pass=1 #use alternate organism ESTs in maker_gff: 1 = yes, 0 = no
protein_pass=1 #use protein alignments in maker_gff: 1 = yes, 0 = no
rm_pass=1 #use repeats in maker_gff: 1 = yes, 0 = no
model_pass=1 #use gene models in maker_gff: 1 = yes, 0 = no
pred_pass=1 #use ab-initio predictions in maker_gff: 1 = yes, 0 = no
other_pass=1 #passthrough anyything else in maker_gff: 1 = yes, 0 = no

#-----EST Evidence (for best results provide a file for at least one)
est= #set of ESTs or assembled mRNA-seq in fasta format
altest= #EST/cDNA sequence file in fasta format from an alternate organism
est_gff=path/to/maker.round1.est2genome.gff #aligned ESTs or mRNA-seq from an external GFF3 file
altest_gff= #aligned ESTs from a closly relate species in GFF3 format

#-----Protein Homology Evidence (for best results provide a file for at least one)
protein=  #protein sequence file in fasta format (i.e. from mutiple oransisms)
protein_gff=path/to/maker.round1.protein2genome.gff  #aligned protein homology evidence from an external GFF3 file

#-----Repeat Masking (leave values blank to skip repeat masking)
model_org= #select a model organism for RepBase masking in RepeatMasker
rmlib= #provide an organism specific repeat library in fasta format for RepeatMasker
repeat_protein= #provide a fasta file of transposable element proteins for RepeatRunner
rm_gff=path/to/maker.round1.repeats.gff #pre-identified repeat elements from an external GFF3 file
prok_rm=0 #forces MAKER to repeatmask prokaryotes (no reason to change this), 1 = yes, 0 = no
softmask=1 #use soft-masking rather than hard-masking in BLAST (i.e. seg and dust filtering)

#-----Gene Prediction
snaphmm=path/to/snap/trained/file/from/first/round/snap.round1.hmm #SNAP HMM file
gmhmm= #GeneMark HMM file
augustus_species= my_species #Augustus gene prediction species model

Part of the error log looks like this:


#---------------------------------------------------------------------
Now starting the contig!!
SeqID: Scaff_17
Length: 21773
#---------------------------------------------------------------------


setting up GFF3 output and fasta chunks
setting up GFF3 output and fasta chunks
doing repeat masking
doing repeat masking
ERROR: Non-unique top level ID for Scaff_17:hit:0:1.3.0.0
While this is technically legal in GFF3, it usually
indicates a poorly fomatted GFF3 file (perhaps you
tried to merge two GFF3 files without accounting for
unique IDs).  MAKER will not handle these correctly.

--> rank=5, hostname=wbl008
ERROR: Failed while doing repeat masking
ERROR: Chunk failed at level:0, tier_type:1
FAILED CONTIG:Scaff_17

ERROR: Chunk failed at level:2, tier_type:0
FAILED CONTIG:Scaff_17

examining contents of the fasta file and run log
ERROR: Non-unique top level ID for Scaff_1:hit:0:1.3.0.0
While this is technically legal in GFF3, it usually
indicates a poorly fomatted GFF3 file (perhaps you
tried to merge two GFF3 files without accounting for
unique IDs).  MAKER will not handle these correctly.

--> rank=6, hostname=wbl008
ERROR: Failed while doing repeat masking
ERROR: Chunk failed at level:0, tier_type:1
FAILED CONTIG:Scaff_1

ERROR: Chunk failed at level:2, tier_type:0
FAILED CONTIG:Scaff_1

ERROR: Non-unique top level ID for Scaff_16:hit:0:1.3.0.0
While this is technically legal in GFF3, it usually
indicates a poorly fomatted GFF3 file (perhaps you
tried to merge two GFF3 files without accounting for
unique IDs).  MAKER will not handle these correctly.

--> rank=14, hostname=wbl008
ERROR: Failed while doing repeat masking
ERROR: Chunk failed at level:0, tier_type:1
FAILED CONTIG:Scaff_16

It fails for all the scaffolds, not just a few.

maker_round_1_master_datastore_index.log reports a failure for every scaffold.
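
For readers who want to confirm this from the index log rather than by eye, here is a minimal sketch. It assumes the master datastore index is tab-separated with three columns (sequence name, datastore directory, status) and that a contig can appear on several lines as its status changes; `last_status` and `failed_contigs` are hypothetical helper names, and the status strings should be checked against your own log.

```python
def last_status(lines):
    """Return {sequence name: most recent status} from index-log lines.

    Assumption: each line is 'seqid<TAB>directory<TAB>status'.
    """
    status = {}
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 3:
            continue  # skip blank or unexpected lines
        seqid, _directory, state = cols
        status[seqid] = state  # later lines overwrite earlier ones
    return status


def failed_contigs(lines):
    """Sorted names of contigs whose last recorded status is FAILED."""
    return sorted(
        seqid for seqid, state in last_status(lines).items() if state == "FAILED"
    )


if __name__ == "__main__":
    import sys

    # Usage: python failed_contigs.py maker_round_1_master_datastore_index.log
    for path in sys.argv[1:]:
        with open(path) as handle:
            for seqid in failed_contigs(handle):
                print(seqid)
```

If every scaffold is printed, the problem is most likely in the shared inputs or control file rather than in individual contigs.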

I tried gff3_merge with and without the -l flag; both resulting GFF3 files gave the same error. I also tried gaas_maker_merge_outputs_from_datastore.pl and passed the resulting maker_mix.gff as maker_gff. It fails with the same error.

When I grep for one of the non-unique IDs, e.g.:

grep -n "Scaff_14:hit:0:1.3.0.0" maker_mix.gff

It shows two hits:

3383054:Scaff_14    repeatmasker    match   7723    7762    14  +   .   ID=Scaff_14:hit:0:1.3.0.0;Name=species:%28ATATA%29n|genus:Simple_repeat;Target=species:%28ATATA%29n|genus:Simple_repeat 1 41 +
3383055:Scaff_14    repeatmasker    match_part  7723    7762    14  +   .   ID=Scaff_14:hsp:0:1.3.0.0;Parent=Scaff_14:hit:0:1.3.0.0;Target=species:%2528ATATA%2529n|genus:Simple_repeat 1 41 +
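
Only the first of these two lines is a top-level feature (it has no Parent attribute), so within this one file the ID is not actually repeated. A minimal sketch for checking whether a top-level ID recurs across several GFF3 files handed to MAKER follows. It assumes standard tab-separated GFF3 with nine columns; `top_level_ids` and `duplicated_top_level_ids` are hypothetical helper names, not part of MAKER.

```python
from collections import defaultdict


def top_level_ids(lines):
    """Yield the ID of every GFF3 feature line that has no Parent attribute."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # embedded sequence section: no more feature lines
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            continue
        # Column 9 holds 'key=value' pairs separated by ';'
        attrs = dict(
            field.split("=", 1) for field in cols[8].split(";") if "=" in field
        )
        if "ID" in attrs and "Parent" not in attrs:
            yield attrs["ID"]


def duplicated_top_level_ids(sources):
    """Map each top-level ID seen more than once to where it was seen.

    `sources` maps a label (e.g. a file name) to an iterable of GFF3 lines.
    """
    seen = defaultdict(list)
    for label, lines in sources.items():
        for feature_id in top_level_ids(lines):
            seen[feature_id].append(label)
    return {fid: labels for fid, labels in seen.items() if len(labels) > 1}


if __name__ == "__main__":
    import sys

    # Usage: python dup_ids.py first.gff second.gff ...
    handles = {path: open(path) for path in sys.argv[1:]}
    try:
        for fid, labels in duplicated_top_level_ids(handles).items():
            print(fid, ",".join(labels), sep="\t")
    finally:
        for handle in handles.values():
            handle.close()
```

Running it on the maker_gff file together with the files given to est_gff, protein_gff and rm_gff would show whether the same ID arrives from more than one input.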

I have seen some tutorials that do not pass maker_gff in the second round of MAKER, but when I do that, the number of gene models decreases.

Can someone help me, please?

Thank you,

Upendra

gff3_file Genome_Annotation Non-unique_ID_error Maker_error • 1.6k views

Any chance one of your input fasta files contains multiple records with the same name? In any case, the best place to ask about Maker is the mailing list.


Thank you for your reply. I had already renamed the sequence headers in the input FASTA to rule that out.

I will write to the mailing list. Thank you.

U

2.2 years ago
Juke34 8.5k

You are providing the same information twice, so you end up with duplicates. Either use maker_gff (which contains all the tracks in one file) and set the relevant *_pass options to 1, or provide the files independently using est_gff, protein_gff, etc., but not both.
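
As an illustration of this advice, here is a minimal sketch that scans a maker_opts.ctl for evidence that is supplied both ways. It only covers the four option pairs visible in the control file quoted in the question, and it assumes values contain no '#' character; `parse_ctl`, `double_supplied` and `PASS_TO_EXTERNAL` are hypothetical names, not part of MAKER.

```python
# Assumption: each *_pass switch re-uses, from maker_gff, the same kind of
# evidence that the paired *_gff option supplies from an external file.
PASS_TO_EXTERNAL = {
    "est_pass": "est_gff",
    "altest_pass": "altest_gff",
    "protein_pass": "protein_gff",
    "rm_pass": "rm_gff",
}


def parse_ctl(text):
    """Parse 'key=value #comment' lines of a control file into a dict."""
    options = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        options[key.strip()] = value.strip()
    return options


def double_supplied(options):
    """Return (pass option, gff option) pairs that feed the same evidence twice."""
    if not options.get("maker_gff"):
        return []  # nothing is passed through, so nothing can be doubled
    return [
        (pass_opt, gff_opt)
        for pass_opt, gff_opt in PASS_TO_EXTERNAL.items()
        if options.get(pass_opt) == "1" and options.get(gff_opt)
    ]


if __name__ == "__main__":
    import sys

    # Usage: python check_ctl.py maker_opts.ctl
    for path in sys.argv[1:]:
        with open(path) as handle:
            for pass_opt, gff_opt in double_supplied(parse_ctl(handle.read())):
                print(f"{pass_opt}=1 and {gff_opt} are both set")
```

For the control file in the question, this would flag est_pass/est_gff, protein_pass/protein_gff and rm_pass/rm_gff; clearing one side of each pair is the fix suggested above.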


Thanks for your comment, Juke. That makes sense.
