Question

BRAKER3 genome annotation

3

14 days ago

manaswiniparija3 ▴ 40

I wanted to annotate a genome available in NCBI. As evidence for BRAKER3, I used RNA-seq data available for the same organism in EBI SRA and a set of proteins available in NCBI. I got the BRAKER3 result as a GFF3 file. It gives the mRNA positions in the genome but not the names or IDs of the proteins or mRNAs. How should I annotate further?

annotation BRAKER3 genome • 570 views

ADD COMMENT • link updated 12 days ago by dariober 14k • written 14 days ago by manaswiniparija3 ▴ 40

0

Could you share some lines from your gff3?

ADD REPLY • link 14 days ago by sansan_96 ▴ 80

0

Yes, sure:

CM041852.1  AUGUSTUS    mRNA    36422   45212   1   +   .   ID=g1.t1;Parent=g1;
CM041852.1  AUGUSTUS    start_codon 36422   36424   .   +   0   ID=g1.t1.start1;Parent=g1.t1;
CM041852.1  AUGUSTUS    CDS 36422   36793   1   +   0   ID=g1.t1.CDS1;Parent=g1.t1;
CM041852.1  AUGUSTUS    exon    36422   36793   .   +   .   ID=g1.t1.exon1;Parent=g1.t1;
CM041852.1  AUGUSTUS    intron  36794   40677   1   +   .   ID=g1.t1.intron1;Parent=g1.t1;
CM041852.1  AUGUSTUS    CDS 40678   40743   1   +   0   ID=g1.t1.CDS2;Parent=g1.t1;
CM041852.1  AUGUSTUS    exon    40678   40743   .   +   .   ID=g1.t1.exon2;Parent=g1.t1;
CM041852.1  AUGUSTUS    intron  40744   42124   1   +   .   ID=g1.t1.intron2;Parent=g1.t1;
CM041852.1  AUGUSTUS    CDS 42125   42277   1   +   0   ID=g1.t1.CDS3;Parent=g1.t1;
CM041852.1  AUGUSTUS    exon    42125   42277   .   +   .   ID=g1.t1.exon3;Parent=g1.t1;
CM041852.1  AUGUSTUS    intron  42278   43242   1   +   .   ID=g1.t1.intron3;Parent=g1.t1;
CM041852.1  AUGUSTUS    CDS 43243   43377   1   +   0   ID=g1.t1.CDS4;Parent=g1.t1;
CM041852.1  AUGUSTUS    exon    43243   43377   .   +   .   ID=g1.t1.exon4;Parent=g1.t1;
CM041852.1  AUGUSTUS    intron  43378   45038   1   +   .   ID=g1.t1.intron4;Parent=g1.t1;
CM041852.1  AUGUSTUS    CDS 45039   45212   1   +   0   ID=g1.t1.CDS5;Parent=g1.t1;
CM041852.1  AUGUSTUS    exon    45039   45212   .   +   .   ID=g1.t1.exon5;Parent=g1.t1;
CM041852.1  AUGUSTUS    stop_codon  45210   45212   .   +   0   ID=g1.t1.stop1;Parent=g1.t1;
CM041852.1  AUGUSTUS    gene    50766   84289   .   -   .   ID=g2;
CM041852.1  AUGUSTUS    mRNA    50766   84289   1   -   .   ID=g2.t1;Parent=g2;
CM041852.1  AUGUSTUS    stop_codon  50766   50768   .   -   0   ID=g2.t1.stop1;Parent=g2.t1;
CM041852.1  AUGUSTUS    CDS 50766   50874   1   -   1   ID=g2.t1.CDS1;Parent=g2.t1;
CM041852.1  AUGUSTUS    exon    50766   50874   .   -   .   ID=g2.t1.exon1;Parent=g2.t1;
CM041852.1  AUGUSTUS    intron  50875   58490   1   -   .   ID=g2.t1.intron1;Parent=g2.t1;
CM041852.1  AUGUSTUS    CDS 58491   58585   1   -   0   ID=g2.t1.CDS2;Parent=g2.t1;
CM041852.1  AUGUSTUS    exon    58491   58585   .   -   .   ID=g2.t1.exon2;Parent=g2.t1;
CM041852.1  AUGUSTUS    intron  58586   63406   1   -   .   ID=g2.t1.intron2;Parent=g2.t1;
CM041852.1  AUGUSTUS    CDS 63407   63569   1   -   1   ID=g2.t1.CDS3;Parent=g2.t1;
CM041852.1  AUGUSTUS    exon    63407   63569   .   -   .   ID=g2.t1.exon3;Parent=g2.t1;
CM041852.1  AUGUSTUS    intron  63570   83800   1   -   .   ID=g2.t1.intron3;Parent=g2.t1;
CM041852.1  AUGUSTUS    CDS 83801   83917   1   -   1   ID=g2.t1.CDS4;Parent=g2.t1;
CM041852.1  AUGUSTUS    exon    83801   83917   .   -   .   ID=g2.t1.exon4;Parent=g2.t1;
CM041852.1  AUGUSTUS    intron  83918   84224   1   -   .   ID=g2.t1.intron4;Parent=g2.t1;
CM041852.1  AUGUSTUS    CDS 84225   84289   1   -   0   ID=g2.t1.CDS5;Parent=g2.t1;
CM041852.1  AUGUSTUS    exon    84225   84289   .   -   .   ID=g2.t1.exon5;Parent=g2.t1;
CM041852.1  AUGUSTUS    start_codon 84287   84289   .   -   0   ID=g2.t1.start1;Parent=g2.t1;
CM041852.1  AUGUSTUS    gene    88181   95729   .   +   .   ID=g3;
CM041852.1  AUGUSTUS    mRNA    88181   95729   1   +   .   ID=g3.t1;Parent=g3;
CM041852.1  AUGUSTUS    start_codon 88181   88183   .   +   0   ID=g3.t1.start1;Parent=g3.t1;
CM041852.1  AUGUSTUS    CDS 88181   88879   1   +   0   ID=g3.t1.CDS1;Parent=g3.t1;
CM041852.1  AUGUSTUS    exon    88181   88879   .   +   .   ID=g3.t1.exon1;Parent=g3.t1;
CM041852.1  AUGUSTUS    intron  88880   90686   1   +   .   ID=g3.t1.intron1;Parent=g3.t1;
CM041852.1  AUGUSTUS    CDS 90687   90814   1   +   0   ID=g3.t1.CDS2;Parent=g3.t1;
CM041852.1  AUGUSTUS    exon    90687   90814   .   +   .   ID=g3.t1.exon2;Parent=g3.t1;
CM041852.1  AUGUSTUS    intron  90815   91483   1   +   .   ID=g3.t1.intron2;Parent=g3.t1;
CM041852.1  AUGUSTUS    CDS 91484   91641   1   +   1   ID=g3.t1.CDS3;Parent=g3.t1;
CM041852.1  AUGUSTUS    exon    91484   91641   .   +   .   ID=g3.t1.exon3;Parent=g3.t1;
CM041852.1  AUGUSTUS    intron  91642   93660   1   +   .   ID=g3.t1.intron3;Parent=g3.t1;
CM041852.1  AUGUSTUS    CDS 93661   93744   1   +   2   ID=g3.t1.CDS4;Parent=g3.t1;
CM041852.1  AUGUSTUS    exon    93661   93744   .   +   .   ID=g3.t1.exon4;Parent=g3.t1;
CM041852.1  AUGUSTUS    intron  93745   94643   1   +   .   ID=g3.t1.intron4;Parent=g3.t1;
CM041852.1  AUGUSTUS    CDS 94644   94768   1   +   2   ID=g3.t1.CDS5;Parent=g3.t1;
CM041852.1  AUGUSTUS    exon    94644   94768   .   +   .   ID=g3.t1.exon5;Parent=g3.t1;
CM041852.1  AUGUSTUS    intron  94769   94866   1   +   .   ID=g3.t1.intron5;Parent=g3.t1;
CM041852.1  AUGUSTUS    CDS 94867   95068   1   +   0   ID=g3.t1.CDS6;Parent=g3.t1;
CM041852.1  AUGUSTUS    exon    94867   95068   .   +   .   ID=g3.t1.exon6;Parent=g3.t1;
CM041852.1  AUGUSTUS    intron  95069   95445   1   +   .   ID=g3.t1.intron6;Parent=g3.t1;
CM041852.1  AUGUSTUS    CDS 95446   95729   1   +   2   ID=g3.t1.CDS7;Parent=g3.t1;
CM041852.1  AUGUSTUS    exon    95446   95729   .   +   .   ID=g3.t1.exon7;Parent=g3.t1;
CM041852.1  AUGUSTUS    stop_codon  95727   95729   .   +   0   ID=g3.t1.stop1;Parent=g3.t1;

ADD REPLY • link updated 14 days ago by dariober 14k • written 14 days ago by manaswiniparija3 ▴ 40

0

Perhaps eggNOG-mapper can be useful for functional annotation.

This will give you the description of your proteins.
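
As a rough sketch of the step after that: once a functional annotation tool has produced a description for each transcript, the descriptions can be written back into the GFF3 as a `product` attribute on the mRNA lines. The function name `add_products`, the transcript-to-description dictionary, and the example description are assumptions made for illustration and do not reflect eggNOG-mapper's actual output format. The sketch also assumes a tab-separated GFF3 file.

```python
from urllib.parse import quote


def add_products(gff_lines, products):
    """Yield GFF3 lines, appending product=<description> to known mRNA features.

    products maps a transcript ID (e.g. "g1.t1") to a description string.
    """
    for line in gff_lines:
        line = line.rstrip("\n")
        fields = line.split("\t")
        # Pass through comments, blank lines, and anything that is not an mRNA feature.
        if line.startswith("#") or len(fields) != 9 or fields[2] != "mRNA":
            yield line
            continue
        # Column 9 holds semicolon-separated key=value attributes.
        attrs = dict(
            pair.split("=", 1) for pair in fields[8].split(";") if "=" in pair
        )
        description = products.get(attrs.get("ID"))
        if description:
            # GFF3 reserves ';', '=', '&' and ',' in attribute values, so escape them.
            escaped = quote(description, safe=" ()/:+-.")
            fields[8] = fields[8].rstrip(";") + ";product=" + escaped
        yield "\t".join(fields)


# Hypothetical inputs: two lines from the thread and a made-up description.
gff = [
    "CM041852.1\tAUGUSTUS\tmRNA\t36422\t45212\t1\t+\t.\tID=g1.t1;Parent=g1;",
    "CM041852.1\tAUGUSTUS\tCDS\t36422\t36793\t1\t+\t0\tID=g1.t1.CDS1;Parent=g1.t1;",
]
products = {"g1.t1": "hypothetical kinase, putative"}
annotated = list(add_products(gff, products))
print("\n".join(annotated))
```

Only mRNA features are touched here. Whether the name belongs on the gene, the mRNA, or the CDS depends on what the downstream consumer of the GFF3 expects.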

ADD REPLY • link 13 days ago by sansan_96 ▴ 80

score 2 · Answer 1 · 2024-04-15

2

14 days ago

dariober 14k

[BRAKER3] gave the mRNA positions in the genome but did not give the name or id of proteins or mRNA

I guess you want to know which proteins in the newly annotated genome correspond to proteins in the reference annotation(s). One option is to run a program for finding orthologs, e.g. OrthoFinder, but I'd like to hear of alternatives.
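
To make the idea concrete, here is a minimal sketch of reciprocal best hits (RBH), a much cruder stand-in for a dedicated ortholog finder and not what OrthoFinder itself does. It assumes two all-vs-all protein searches (new annotation against reference, and the reverse) saved in the 12-column BLAST tabular layout (`-outfmt 6`). The helper `row`, the reference IDs `refA`/`refB`, and all scores are made up for illustration.

```python
def best_hits(rows):
    """Map each query ID to its highest-bitscore subject ID.

    rows are tab-separated lines in BLAST tabular layout (-outfmt 6):
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    best = {}
    for row_text in rows:
        cols = row_text.rstrip("\n").split("\t")
        query, subject, bitscore = cols[0], cols[1], float(cols[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(new_vs_ref, ref_vs_new):
    """Keep pairs where each protein is the other's best hit."""
    forward = best_hits(new_vs_ref)
    backward = best_hits(ref_vs_new)
    return {q: s for q, s in forward.items() if backward.get(s) == q}


def row(query, subject, bitscore):
    """Hypothetical helper: build one tabular line with filler alignment columns."""
    filler = ["90.0", "100", "0", "0", "1", "100", "1", "100", "1e-30"]
    return "\t".join([query, subject] + filler + [str(bitscore)])


# Made-up search results between BRAKER transcripts and reference proteins.
new_vs_ref = [row("g1.t1", "refA", 200), row("g1.t1", "refB", 80), row("g2.t1", "refA", 90)]
ref_vs_new = [row("refA", "g1.t1", 200), row("refA", "g2.t1", 90), row("refB", "g1.t1", 80)]

pairs = reciprocal_best_hits(new_vs_ref, ref_vs_new)
print(pairs)
```

In this toy input `g2.t1` has `refA` as its best hit, but `refA` prefers `g1.t1`, so only the `g1.t1`/`refA` pair survives. RBH handles paralogs and gene duplications poorly, which is one reason to prefer a proper orthology tool.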

EDIT:

Consider also Liftoff: "accurately maps annotations in GFF or GTF between assemblies of the same, or closely-related species"
Possibly related: you can also annotate proteins with Pfam domains using, for example, HMMER. This is biologically more meaningful than assigning names from one assembly to another.
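
A minimal sketch of how such domain hits could be collected per protein, assuming the predicted proteins were searched against Pfam with `hmmscan --domtblout`, so that each row's target is a Pfam model and its query is a protein. The function name `pfam_domains`, the E-value cutoff, and the sample rows (including the model names and accessions) are made up for illustration.

```python
from collections import defaultdict


def pfam_domains(domtbl_lines, max_ievalue=1e-5):
    """Group domain hits by protein from an hmmscan --domtblout table.

    Uses these whitespace-separated columns (0-based):
    0 model name, 1 model accession, 3 protein ID,
    12 independent E-value, 17/18 alignment start/end on the protein.
    """
    domains = defaultdict(list)
    for line in domtbl_lines:
        if line.startswith("#") or not line.strip():
            continue
        # 22 fixed columns, then a free-text description that may contain spaces.
        cols = line.split(None, 22)
        model, accession, protein = cols[0], cols[1], cols[3]
        ievalue = float(cols[12])
        ali_from, ali_to = int(cols[17]), int(cols[18])
        if ievalue <= max_ievalue:
            domains[protein].append((model, accession, ali_from, ali_to))
    return dict(domains)


# Made-up rows in the domtblout column order; not real Pfam results.
sample = [
    "# target name  accession  tlen  query name ...",
    "DomainA PF00000.1 250 g1.t1 - 300 1e-40 140.0 0.1 1 1 1e-43 2e-40 139.5 0.1 5 245 20 260 18 265 0.95 Made-up domain",
    "DomainB PF00001.1 100 g1.t1 - 300 0.5 10.0 0.0 1 1 0.3 0.5 9.8 0.0 1 50 270 295 268 299 0.60 Weak made-up hit",
    "DomainC PF00002.1 80 g2.t1 - 150 1e-20 70.0 0.0 1 1 1e-23 3e-20 69.0 0.0 1 80 10 95 8 97 0.90 Another made-up domain",
]
hits = pfam_domains(sample)
print(hits)
```

The weak `DomainB` hit is dropped by the cutoff. A real run would more likely rely on Pfam's curated gathering thresholds (`hmmscan --cut_ga`) than on a fixed E-value.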

If it is of any use, I have a Snakemake pipeline for running BRAKER and GALBA, merging the annotations from the two, and annotating with Pfam.

ADD COMMENT • link 12 days ago by dariober 14k

1

I agree with this answer. You only got the structural annotation (ORF coordinates) from BRAKER. Using an ortholog finder to assign a protein/gene ID is possible, but only if you use the same species as source and target (genomes to find orthologs) and those IDs have been validated.

ADD REPLY • link 14 days ago by Buffo ★ 2.4k

0

I also have this exact same question. I have my entire genome annotated and a braker.GTF file, but without protein "product" names.

ADD REPLY • link 13 days ago by andorjkiss ▴ 40

0

I suggest reading up on the difference between structural annotation (identifying the coordinates of the ORFs in a sequence) and functional annotation (assigning a function, or the most likely function, to an ORF/protein). These are independent processes. As far as I remember, BRAKER doesn't include functional annotation, which is why you can't find any "product names".

ADD REPLY • link 13 days ago by Buffo ★ 2.4k

0

Perhaps worth keeping in mind that protein "product" names are only a human construct. At the biological level you have protein biochemical properties and phylogenetic relationships. So while it's useful to assign protein names according to a reference annotation, ultimately you are still looking for orthologs.

ADD REPLY • link 12 days ago by dariober 14k