Genome annotation with BRAKER: how to interpret the results?
4.9 years ago

Hello,

I am new to genome annotation and I'm struggling to interpret the results produced by BRAKER.

  • I have a de novo assembly of an insect genome (N50 = 350 kb, length = 1.9 Gb).
  • I masked the repeats using RepeatModeler and RepeatMasker.
  • I mapped RNA-Seq data from the same species to the hard-masked genome with HISAT2.
  • I used BRAKER to annotate my soft-masked genome with the BAM file produced by HISAT2.

I have 54,000 entries in the resulting augustus.hints.gff file. Does that mean AUGUSTUS predicted 54k genes? We expect between 10k and 20k genes for our species, so I would like to understand why the prediction contains so many.
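Whether 54,000 entries means 54k genes depends on what was counted: a GFF file has several feature lines per gene (e.g. gene, transcript, CDS), and AUGUSTUS may report more than one transcript per gene. A minimal Python sketch that tallies the feature type in column 3 so gene and transcript counts can be compared; it assumes standard tab-separated GFF/GTF records, and the script name `count_features.py` is hypothetical.

```python
import sys
from collections import Counter
from typing import Iterable


def count_feature_types(lines: Iterable[str]) -> Counter:
    """Tally GFF/GTF feature types (column 3), skipping comments and blank lines."""
    counts: Counter = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 9:  # a well-formed GFF/GTF record has 9 tab-separated columns
            counts[fields[2]] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # usage (hypothetical script name): python count_features.py augustus.hints.gff
    with open(sys.argv[1]) as handle:
        for feature, n in count_feature_types(handle).most_common():
            print(f"{feature}\t{n}")
```

If the `gene` count is well below the `transcript` (or `mRNA`) count, part of the excess comes from alternative transcripts rather than extra loci.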

Of these 54k entries, 38k contain the following line:

# % of transcript supported by hints (any source): 0

Does this mean these predictions are of poor quality, and that I should keep only predictions with a substantial percentage of hint support?
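To keep only predictions with hint support, the percentage on that comment line can be parsed per transcript. A minimal sketch, assuming raw AUGUSTUS-style output in which each transcript's evidence comments follow its own feature lines, and that the transcript ID is the bare ninth column, a GFF3 `ID=` attribute on an `mRNA` line, or a GTF `transcript_id` attribute. The function names, the script name and the cut-off parameter are hypothetical; which threshold to use is a separate judgment.

```python
import re
import sys
from typing import Dict, Iterable, Optional, Set

# Comment line AUGUSTUS writes after a transcript when hints were used
SUPPORT_RE = re.compile(r"^# % of transcript supported by hints \(any source\):\s*([\d.]+)")


def transcript_id(attributes: str) -> str:
    """Extract a transcript ID from column 9 (GFF3 ID=, GTF transcript_id, or bare ID)."""
    match = re.search(r"(?:^|;)ID=([^;]+)", attributes)
    if match:
        return match.group(1)
    match = re.search(r'transcript_id "([^"]+)"', attributes)
    if match:
        return match.group(1)
    return attributes.strip()


def hint_support(lines: Iterable[str]) -> Dict[str, float]:
    """Map each transcript ID to its '% of transcript supported by hints' value."""
    support: Dict[str, float] = {}
    current: Optional[str] = None  # most recent transcript/mRNA feature seen
    for line in lines:
        match = SUPPORT_RE.match(line)
        if match:
            if current is not None:
                support[current] = float(match.group(1))
            continue
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 9 and fields[2] in ("transcript", "mRNA"):
            current = transcript_id(fields[8])
    return support


def supported_ids(support: Dict[str, float], min_percent: float = 0.0) -> Set[str]:
    """IDs whose hint support is strictly above min_percent (default keeps any support)."""
    return {tid for tid, pct in support.items() if pct > min_percent}


if __name__ == "__main__" and len(sys.argv) > 1:
    # usage (hypothetical script name): python filter_supported.py augustus.hints.gff
    with open(sys.argv[1]) as handle:
        kept = supported_ids(hint_support(handle))
    print("\n".join(sorted(kept)))
```

The resulting ID list could then be used to subset the GFF and FASTA files; comparing the counts at a few cut-offs shows how sensitive the gene number is to the threshold.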

Any other suggestions on how to enhance the annotation of my genome are welcome!

Tags: annotation, braker

That cannot be the only output file, no? Can you check how many sequences the FASTA output files contain?

Also: what was the exact command you executed?


Thank you for your reply!

I also have a FASTA file of amino-acid sequences and another of coding sequences, both containing 54k genes.
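If the FASTA files hold one record per transcript, 54k records need not mean 54k genes. A minimal Python sketch that counts records and distinct gene IDs, assuming AUGUSTUS-style headers such as `>g1.t1` (gene `g1`, transcript 1); the function and script names are hypothetical, and the file to pass is whichever FASTA BRAKER wrote.

```python
import sys
from typing import Iterable, List, Tuple


def fasta_ids(lines: Iterable[str]) -> List[str]:
    """Return the record IDs (first word after '>') of a FASTA file."""
    return [line[1:].split()[0] for line in lines if line.startswith(">") and line[1:].strip()]


def count_transcripts_and_genes(lines: Iterable[str]) -> Tuple[int, int]:
    """Count FASTA records and distinct gene IDs.

    Assumes AUGUSTUS-style IDs such as 'g1.t1', where the gene ID is the part before '.t'.
    """
    ids = fasta_ids(lines)
    genes = {record_id.rsplit(".t", 1)[0] for record_id in ids}
    return len(ids), len(genes)


if __name__ == "__main__" and len(sys.argv) > 1:
    # usage (hypothetical script name): python count_fasta.py <BRAKER amino-acid FASTA>
    with open(sys.argv[1]) as handle:
        n_records, n_genes = count_transcripts_and_genes(handle)
    print(f"records: {n_records}\tdistinct genes: {n_genes}")
```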

The command I executed is:

braker.pl --cores 16 --species=mySpecies --genome=genome_softmasked.fa --bam=rnaseq_masked_sorted.bam --softmasking --gff3