Question: MAKER annotation gives weird results
4.4 years ago, int11ap1 wrote:

I am performing gene annotation on an assembly and, for a specific locus, I have the following support:

Pp05    blastn  expressed_sequence_match    11506941    11508983    2043    +   .   ID=Pp05:hit:646548:;Name=asmbl_43336
Pp05    est2genome  expressed_sequence_match    11506941    11508983    10215   +   .   ID=Pp05:hit:656896:;Name=asmbl_43336
Pp05    cdna2genome expressed_sequence_match    11507201    11509028    9136    +   .   ID=Pp05:hit:666951:;Name=asmbl_12674
Pp05    tblastx translated_nucleotide_match 11507201    11509028    3245    -   .   ID=Pp05:hit:665604:;Name=asmbl_12674
Pp05    snap    match   11507276    11508937    95.824  +   .   ID=Pp05:hit:683530:;Name=snap-Pp05-abinit-gene-2.249-mRNA-1
Pp05    protein2genome  protein_match   11507330    11508925    1009    +   .   ID=Pp05:hit:676148:;Name=sp|Q9FYG4|GLOX1_ARATH
Pp05    blastx  protein_match   11507333    11508925    1129    +   .   ID=Pp05:hit:669247:;Name=sp|Q9FYG4|GLOX1_ARATH
Pp05    blastx  protein_match   11507363    11508925    1009    +   .   ID=Pp05:hit:669248:;Name=sp|Q3HRQ2|GLOX_VITPS
Pp05    protein2genome  protein_match   11507363    11508772    821    +   .   ID=Pp05:hit:676149:;Name=sp|Q3HRQ2|GLOX_VITPS

Why is this not annotated as a gene by MAKER?
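
One way to confirm that no final model was emitted at this locus is to scan the MAKER GFF3 output for features in the same interval whose source column is `maker`. The following is a minimal sketch, not part of the original post: the helper names `parse_gff3_line` and `features_in_region` are hypothetical, and it assumes final gene models carry `maker` in column 2 while evidence lines carry the aligner or predictor name, as in the lines above.

```python
def parse_gff3_line(line):
    """Split one GFF3 data line into a dict; return None for comments or non-feature lines."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    # Pasted GFF3 may use spaces instead of tabs, so split on any whitespace
    # and keep the 9th column (attributes) in one piece.
    fields = line.split(None, 8)
    if len(fields) < 9:
        return None
    return {
        "seqid": fields[0], "source": fields[1], "type": fields[2],
        "start": int(fields[3]), "end": int(fields[4]),
        "score": fields[5], "strand": fields[6], "phase": fields[7],
        "attributes": fields[8],
    }


def features_in_region(lines, seqid, start, end, source=None):
    """Return features on seqid that overlap [start, end], optionally filtered by source."""
    hits = []
    for line in lines:
        feat = parse_gff3_line(line)
        if feat is None or feat["seqid"] != seqid:
            continue
        if source is not None and feat["source"] != source:
            continue
        # Two closed intervals overlap when each starts before the other ends.
        if feat["start"] <= end and feat["end"] >= start:
            hits.append(feat)
    return hits


# Two of the evidence lines from the question, used as sample input.
sample = [
    "Pp05    blastn  expressed_sequence_match    11506941    11508983    2043    +   .   ID=Pp05:hit:646548:;Name=asmbl_43336",
    "Pp05    snap    match   11507276    11508937    95.824  +   .   ID=Pp05:hit:683530:;Name=snap-Pp05-abinit-gene-2.249-mRNA-1",
]
all_hits = features_in_region(sample, "Pp05", 11506941, 11509028)
maker_hits = features_in_region(sample, "Pp05", 11506941, 11509028, source="maker")
print(len(all_hits), "features in region;", len(maker_hits), "from source 'maker'")
```

On a real run, `sample` would be replaced by the lines of the per-contig or merged GFF3 file; an empty `maker_hits` list alongside non-empty evidence would match the situation described here.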


In maker_opts.ctl I have:

#-----Genome (these are always required)
genome=/Synology/final_assembly.fasta #genome sequence (fasta file or fasta embedded in GFF3 file)
organism_type=eukaryotic #eukaryotic or prokaryotic. Default is eukaryotic

#-----Re-annotation Using MAKER Derived GFF3
maker_gff= #MAKER derived GFF3 file
est_pass=0 #use ESTs in maker_gff: 1 = yes, 0 = no
altest_pass=0 #use alternate organism ESTs in maker_gff: 1 = yes, 0 = no
protein_pass=0 #use protein alignments in maker_gff: 1 = yes, 0 = no
rm_pass=0 #use repeats in maker_gff: 1 = yes, 0 = no
model_pass=0 #use gene models in maker_gff: 1 = yes, 0 = no
pred_pass=0 #use ab-initio predictions in maker_gff: 1 = yes, 0 = no
other_pass=0 #passthrough anything else in maker_gff: 1 = yes, 0 = no

#-----EST Evidence (for best results provide a file for at least one)
est=/Synology/104_Lt_2.assemblies.fasta #set of ESTs or assembled mRNA-seq in fasta format
altest=/Synology/104_ppersica_transcripts.assemblies.fasta #EST/cDNA sequence file in fasta format from an alternate organism
est_gff=/Synology/104_Lt_2.pasa_assemblies.renamed.gff3 #aligned ESTs or mRNA-seq from an external GFF3 file
altest_gff=/Synology/104_ppersica_transcripts.pasa_assemblies.renamed.gff3 #aligned ESTs from a closely related species in GFF3 format

#-----Protein Homology Evidence (for best results provide a file for at least one)
protein=/DATA/SwissProt_2016_09/uniprot_sprot_plants.fasta #protein sequence file in fasta format (i.e. from multiple organisms)
protein_gff=  #aligned protein homology evidence from an external GFF3 file

#-----Repeat Masking (leave values blank to skip repeat masking)
model_org= #select a model organism for RepBase masking in RepeatMasker
rmlib= #provide an organism specific repeat library in fasta format for RepeatMasker
repeat_protein= #provide a fasta file of transposable element proteins for RepeatRunner
rm_gff= #pre-identified repeat elements from an external GFF3 file
prok_rm=0 #forces MAKER to repeatmask prokaryotes (no reason to change this), 1 = yes, 0 = no
softmask=1 #use soft-masking rather than hard-masking in BLAST (i.e. seg and dust filtering)

#-----Gene Prediction
snaphmm=/Synology/snap_3/p_dulcis.hmm #SNAP HMM file
gmhmm= #GeneMark HMM file
augustus_species= #Augustus gene prediction species model
fgenesh_par_file= #FGENESH parameter file
pred_gff= #ab-initio predictions from an external GFF3 file
model_gff= #annotated gene models from an external GFF3 file (annotation pass-through)
run_evm=0 #run EvidenceModeler, 1 = yes, 0 = no
est2genome=0 #infer gene predictions directly from ESTs, 1 = yes, 0 = no
protein2genome=0 #infer predictions from protein homology, 1 = yes, 0 = no
trna=0 #find tRNAs with tRNAscan, 1 = yes, 0 = no
snoscan_rrna= #rRNA file to have Snoscan find snoRNAs
snoscan_meth= #-O-methylation site file to have Snoscan find snoRNAs
unmask=0 #also run ab-initio prediction programs on unmasked sequence, 1 = yes, 0 = no

#-----Other Annotation Feature Types (features MAKER doesn't recognize)
other_gff= #extra features to pass-through to final MAKER generated GFF3 file

#-----External Application Behavior Options
alt_peptide=C #amino acid used to replace non-standard amino acids in BLAST databases
cpus=40 #max number of cpus to use in BLAST and RepeatMasker (not for MPI, leave 1 when using MPI)

#-----MAKER Behavior Options
max_dna_len=5000000 #length for dividing up contigs into chunks (increases/decreases memory usage)
min_contig=1000 #skip genome contigs below this length (under 10kb are often useless)

pred_flank=200 #flank for extending evidence clusters sent to gene predictors
pred_stats=0 #report AED and QI statistics for all predictions as well as models
AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and 1)
min_protein=0 #require at least this many amino acids in predicted proteins
alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no
always_complete=0 #extra steps to force start and stop codons, 1 = yes, 0 = no
map_forward=0 #map names and attributes forward from old GFF3 genes, 1 = yes, 0 = no
keep_preds=0 #Concordance threshold to add unsupported gene prediction (bound by 0 and 1)

split_hit=10000 #length for the splitting of hits (expected max intron size for evidence alignments)
min_intron=20 #minimum intron length (used for alignment polishing)
single_exon=0 #consider single exon EST evidence when generating annotations, 1 = yes, 0 = no
single_length=250 #min length required for single exon ESTs if 'single_exon is enabled'
correct_est_fusion=0 #limits use of ESTs in annotation to avoid fusion genes

tries=2 #number of times to try a contig if there is a failure for some reason
clean_try=0 #remove all data from previous run before retrying, 1 = yes, 0 = no
clean_up=0 #removes theVoid directory with individual analysis files, 1 = yes, 0 = no
TMP= #specify a directory other than the system default temporary directory for temporary files
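
For readers comparing their own control file against the one above, here is a minimal sketch of reading the `key=value #comment` lines into a dictionary so that the options actually set stand out. The function name `parse_maker_ctl` is hypothetical (it is not part of MAKER), and the sketch assumes that everything after the first `#` on a line is a comment, as in the file shown.

```python
def parse_maker_ctl(text):
    """Return {option: value} from MAKER control-file text, dropping comments."""
    options = {}
    for raw in text.splitlines():
        # Assumption: the first '#' starts a comment, so values cannot contain '#'.
        line = raw.split("#", 1)[0].strip()
        if "=" not in line:
            continue  # blank lines and '#-----' section headers end up here
        key, value = line.split("=", 1)
        options[key.strip()] = value.strip()
    return options


# A few lines copied from the control file in the question.
example = """\
#-----Gene Prediction
snaphmm=/Synology/snap_3/p_dulcis.hmm #SNAP HMM file
augustus_species= #Augustus gene prediction species model
est2genome=0 #infer gene predictions directly from ESTs, 1 = yes, 0 = no
protein2genome=0 #infer predictions from protein homology, 1 = yes, 0 = no
"""

opts = parse_maker_ctl(example)
# Keep only options that were given a non-empty value.
set_options = {key: value for key, value in opts.items() if value != ""}
for key, value in set_options.items():
    print(f"{key} = {value}")
```

In practice the text would come from `open("maker_opts.ctl").read()`; the inline string is only there to keep the sketch self-contained.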

Could you show us the maker_opts.ctl file, please?

written 4.4 years ago by Juke34

I have edited my post with the content of maker_opts.ctl.

written 4.4 years ago by int11ap1

Do you have any MAKER results in your output? Did the MAKER run finish correctly? You could load the different tracks (est2genome, protein2genome, snap) in a genome browser. It often helps to understand what happened.
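
To load evidence as separate tracks, the GFF3 output can be split on its source column (column 2). The following is a minimal sketch under stated assumptions: the names `split_gff3_by_source` and `write_tracks` are hypothetical helpers, not MAKER tools, and the sketch assumes that any embedded sequence follows a `##FASTA` line, as the GFF3 format specifies.

```python
from collections import defaultdict


def split_gff3_by_source(lines):
    """Group GFF3 feature lines by their source column (column 2)."""
    tracks = defaultdict(list)
    for line in lines:
        stripped = line.strip()
        if stripped == "##FASTA":
            break  # embedded sequence follows; no more features
        if not stripped or stripped.startswith("#"):
            continue
        # Split on any whitespace so pasted, space-separated lines also work,
        # keeping the attributes column (9th) in one piece.
        fields = stripped.split(None, 8)
        if len(fields) < 9:
            continue
        # Re-join with tabs so the output is tab-delimited as GFF3 requires.
        tracks[fields[1]].append("\t".join(fields))
    return dict(tracks)


def write_tracks(tracks, prefix):
    """Write one GFF3 file per source, named <prefix>.<source>.gff3."""
    for source, features in tracks.items():
        with open(f"{prefix}.{source}.gff3", "w") as handle:
            handle.write("##gff-version 3\n")
            handle.write("\n".join(features) + "\n")


# Three of the evidence lines from the question, used as sample input.
sample = [
    "Pp05    est2genome  expressed_sequence_match    11506941    11508983    10215   +   .   ID=Pp05:hit:656896:;Name=asmbl_43336",
    "Pp05    snap    match   11507276    11508937    95.824  +   .   ID=Pp05:hit:683530:;Name=snap-Pp05-abinit-gene-2.249-mRNA-1",
    "Pp05    protein2genome  protein_match   11507330    11508925    1009    +   .   ID=Pp05:hit:676148:;Name=sp|Q9FYG4|GLOX1_ARATH",
]
tracks = split_gff3_by_source(sample)
print(sorted(tracks))
```

Calling `write_tracks(tracks, "Pp05_locus")` would then produce one file per source, each of which can be opened as its own track in a genome browser.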

It looks like the gene model synthesis didn't work properly. You should relaunch MAKER with the option -t 10 to make sure that everything ran correctly. (Keep everything in your folder as it is when you re-run MAKER.)

written 4.4 years ago by Juke34

I am facing a similar problem. How did you solve yours? I would be grateful if you could share the solution. Thanks.

written 2.9 years ago by urmi20830