Question: Genome annotation with BRAKER: how to interpret the results?
5 weeks ago by
Brussels, Belgium
svitlana.lukicheva10 wrote:


I am new to genome annotation and I am struggling to interpret the results produced by BRAKER.

  • I have a de novo assembly of an insect genome (N50 = 350kb, length = 1.9 Gb).
  • I masked the repeats using RepeatModeler and RepeatMasker.
  • I mapped the RNA-Seq data from the same species to the (hard) masked genome with HISAT2.
  • I used BRAKER to annotate my (soft masked) genome with the bam file produced by HISAT2.

I have 54,000 entries in the resulting augustus.hints.gff file. Does that mean that AUGUSTUS predicted 54k genes? We expect between 10k and 20k genes for our species, so I would like to understand why the prediction contains so many.
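One way to check what the 54,000 entries actually are is to tally the feature types in column 3 of the GFF file, since gene, transcript, and exon-level records are all separate lines. The following is only a minimal Python sketch and not part of BRAKER: the function name `count_feature_types` is hypothetical, and it assumes a standard nine-column, tab-separated GFF in which comment lines start with `#`.

```python
from collections import Counter
import sys


def count_feature_types(lines):
    """Tally column 3 (feature type) over the non-comment lines of a GFF file."""
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 9:  # assumption: a valid record has nine columns
            counts[fields[2]] += 1
    return counts


if __name__ == "__main__":
    # Usage: python count_features.py augustus.hints.gff
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as handle:
            for feature, n in count_feature_types(handle).most_common():
                print(feature, n, sep="\t")
```

Comparing the count for the gene feature with the counts for transcript-level features shows whether the 54k figure refers to genes or to transcripts.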

Among these 54k entries, 38k entries contain the following information:

# % of transcript supported by hints (any source): 0

Does this mean that these predictions are of poor quality, and that I should only keep predictions with a substantial percentage of hint support?
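To see how the support percentages are distributed before deciding on any cutoff, the comment lines quoted above can be collected and summarised. This is a hedged sketch rather than an official BRAKER or AUGUSTUS tool: the names `hint_support_values` and `summarize` are hypothetical, and the regular expression assumes the comment text appears exactly as quoted in this post.

```python
import re

# Matches the comment line quoted in the post, e.g.
# "# % of transcript supported by hints (any source): 0"
SUPPORT_RE = re.compile(
    r"^#\s*% of transcript supported by hints \(any source\):\s*(\d+(?:\.\d+)?)"
)


def hint_support_values(lines):
    """Return the support percentage found on every matching comment line."""
    values = []
    for line in lines:
        match = SUPPORT_RE.match(line)
        if match:
            values.append(float(match.group(1)))
    return values


def summarize(values):
    """Split the transcripts into those with zero and non-zero hint support."""
    unsupported = sum(1 for v in values if v == 0)
    return {
        "total": len(values),
        "unsupported": unsupported,
        "supported": len(values) - unsupported,
    }
```

Calling `summarize(hint_support_values(open("augustus.hints.gff")))` would report how many transcripts have no hint support at all. Whether those should be discarded is a separate judgment that this tally does not settle.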

Any other suggestions on how to enhance the annotation of my genome are welcome!

braker annotation • 123 views
modified 5 weeks ago • written 5 weeks ago by svitlana.lukicheva10

That cannot be the only output file, can it? Can you check how many sequences the output FASTA files contain?

Also, what was the exact command you executed?

modified 5 weeks ago • written 5 weeks ago by lieven.sterck5.5k

Thank you for your reply!

I also have one FASTA file with amino acid sequences and another with coding sequences, both containing 54k genes.
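Since a FASTA file of predictions may hold one record per transcript rather than one per gene, it can help to count both. The sketch below is illustrative only: the function name `count_fasta_records` is hypothetical, and it assumes record IDs of the form `g1.t1` (gene ID followed by a `.tN` transcript suffix), which should be checked against the actual headers.

```python
import re

# Assumption: transcript IDs end in ".t<number>", e.g. "g1.t1".
TRANSCRIPT_SUFFIX = re.compile(r"\.t\d+$")


def count_fasta_records(lines):
    """Return (number of records, number of distinct gene IDs) in a FASTA file."""
    records = 0
    genes = set()
    for line in lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        records += 1
        parts = line[1:].split()
        record_id = parts[0] if parts else ""
        # Strip the transcript suffix to recover the gene ID.
        genes.add(TRANSCRIPT_SUFFIX.sub("", record_id))
    return records, len(genes)
```

If the two numbers differ, the 54k figure counts transcripts, and the number of genes is lower.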

The command I executed is: --cores 16 --species=mySpecies --genome=genome_softmasked.fa --bam=rnaseq_masked_sorted.bam --softmasking --gff3
written 4 weeks ago by svitlana.lukicheva10