Question

Genome annotation with BRAKER: how to interpret the results?

4.9 years ago

svitlana.lukicheva ▴ 10

Hello,

I am new to genome annotation and am struggling to interpret the results produced by BRAKER.

I have a de novo assembly of an insect genome (N50 = 350 kb, length = 1.9 Gb).
I masked the repeats using RepeatModeler and RepeatMasker.
I mapped the RNA-Seq data from the same species to the hard-masked genome with HISAT2.
I used BRAKER to annotate my soft-masked genome with the BAM file produced by HISAT2.

I have 54,000 entries in the resulting augustus.hints.gff file. That means that Augustus predicted 54k genes, right? We expect between 10k and 20k genes for our species, so I would like to understand why the prediction contains so many.

Among these 54k entries, 38k entries contain the following information:

# % of transcript supported by hints (any source): 0

Does this mean that these predictions are of poor quality and that I should keep only predictions with a substantial percentage of hint support?
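
One way to check these numbers is to tally the file directly. The following is a minimal sketch in Python, not a definitive tool. It assumes the usual tab-separated GFF layout with the feature type in the third column, assumes gene features are labelled `gene` and transcripts `transcript` or `mRNA`, and matches the hint-support comment line exactly as quoted above. The function name `summarize_augustus_gff` and the sample records are hypothetical.

```python
import re
from collections import Counter

# Comment line format as quoted in the post above.
HINT_LINE = re.compile(
    r"^#\s*% of transcript supported by hints \(any source\):\s*(\d+(?:\.\d+)?)"
)


def summarize_augustus_gff(lines):
    """Tally gene/transcript features and hint-support comment lines."""
    features = Counter()
    support = []
    for raw in lines:
        line = raw.rstrip("\n")
        match = HINT_LINE.match(line)
        if match:
            support.append(float(match.group(1)))
            continue
        if not line or line.startswith("#"):
            continue  # skip blank lines and all other comment lines
        cols = line.split("\t")
        if len(cols) >= 9:  # a GFF record has nine tab-separated columns
            features[cols[2]] += 1
    return {
        "genes": features["gene"],
        # Assumption: transcripts are labelled "transcript" or "mRNA".
        "transcripts": features["transcript"] + features["mRNA"],
        "hint_lines": len(support),
        "unsupported": sum(1 for pct in support if pct == 0),
    }


# Hypothetical two-gene example; for a real file, pass an open file handle.
sample = [
    "# start gene g1",
    "\t".join(["chr1", "AUGUSTUS", "gene", "1", "900", "0.5", "+", ".", "g1"]),
    "\t".join(["chr1", "AUGUSTUS", "transcript", "1", "900", "0.5", "+", ".", "g1.t1"]),
    "# % of transcript supported by hints (any source): 0",
    "# end gene g1",
    "# start gene g2",
    "\t".join(["chr1", "AUGUSTUS", "gene", "2000", "2900", "0.9", "-", ".", "g2"]),
    "\t".join(["chr1", "AUGUSTUS", "transcript", "2000", "2900", "0.9", "-", ".", "g2.t1"]),
    "# % of transcript supported by hints (any source): 100",
    "# end gene g2",
]
print(summarize_augustus_gff(sample))
```

Comparing the `genes` and `transcripts` counts shows whether the 54k entries are genes or individual transcripts, and `unsupported` gives the number of hint-support lines that report 0.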

Any other suggestions on how to improve the annotation of my genome are welcome!

annotation braker • 2.2k views

4.9 years ago by svitlana.lukicheva ▴ 10

That cannot be the only output file, can it? Can you check how many sequences the FASTA output files contain?

Also, what was the exact command you executed?
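
A minimal sketch for counting the sequences in a FASTA file, assuming the standard format in which every record begins with a `>` header line. The function name `count_fasta_records` and the sample records are hypothetical:

```python
def count_fasta_records(lines):
    """Count sequences: each FASTA record starts with a '>' header line."""
    return sum(1 for line in lines if line.startswith(">"))


# Hypothetical in-memory example; for a real file, pass an open file handle,
# e.g. `with open(path) as fh: count_fasta_records(fh)`.
sample = [">g1.t1", "MKTAYIAK", ">g1.t2", "MKTAYLAK", ">g2.t1", "MAAQR"]
print(count_fasta_records(sample))
```

If the headers carry transcript identifiers, this count is the number of transcripts, which can be higher than the number of genes.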

4.9 years ago by lieven.sterck 15k

Thank you for your reply!

I also have a FASTA file with amino acid sequences and another with coding sequences, both containing 54k genes.

The command I executed is:

braker.pl --cores 16 --species=mySpecies --genome=genome_softmasked.fa --bam=rnaseq_masked_sorted.bam --softmasking --gff3

4.9 years ago by svitlana.lukicheva ▴ 10