BRAKER3 genome annotation
6 weeks ago

I wanted to annotate a genome available in NCBI. As evidence for BRAKER3, I used RNA-seq data for the same organism from EBI SRA and a set of proteins from NCBI. BRAKER3 produced a GFF3 file that gives the mRNA positions in the genome but no names or IDs for the proteins or mRNAs. How should I annotate further?

annotation BRAKER3 genome • 715 views

Could you share some lines from your gff3?


Yes, sure:

CM041852.1  AUGUSTUS    mRNA    36422   45212   1   +   .   ID=g1.t1;Parent=g1;
CM041852.1  AUGUSTUS    start_codon 36422   36424   .   +   0   ID=g1.t1.start1;Parent=g1.t1;
CM041852.1  AUGUSTUS    CDS 36422   36793   1   +   0   ID=g1.t1.CDS1;Parent=g1.t1;
CM041852.1  AUGUSTUS    exon    36422   36793   .   +   .   ID=g1.t1.exon1;Parent=g1.t1;
CM041852.1  AUGUSTUS    intron  36794   40677   1   +   .   ID=g1.t1.intron1;Parent=g1.t1;
CM041852.1  AUGUSTUS    CDS 40678   40743   1   +   0   ID=g1.t1.CDS2;Parent=g1.t1;
CM041852.1  AUGUSTUS    exon    40678   40743   .   +   .   ID=g1.t1.exon2;Parent=g1.t1;
CM041852.1  AUGUSTUS    intron  40744   42124   1   +   .   ID=g1.t1.intron2;Parent=g1.t1;
CM041852.1  AUGUSTUS    CDS 42125   42277   1   +   0   ID=g1.t1.CDS3;Parent=g1.t1;
CM041852.1  AUGUSTUS    exon    42125   42277   .   +   .   ID=g1.t1.exon3;Parent=g1.t1;
CM041852.1  AUGUSTUS    intron  42278   43242   1   +   .   ID=g1.t1.intron3;Parent=g1.t1;
CM041852.1  AUGUSTUS    CDS 43243   43377   1   +   0   ID=g1.t1.CDS4;Parent=g1.t1;
CM041852.1  AUGUSTUS    exon    43243   43377   .   +   .   ID=g1.t1.exon4;Parent=g1.t1;
CM041852.1  AUGUSTUS    intron  43378   45038   1   +   .   ID=g1.t1.intron4;Parent=g1.t1;
CM041852.1  AUGUSTUS    CDS 45039   45212   1   +   0   ID=g1.t1.CDS5;Parent=g1.t1;
CM041852.1  AUGUSTUS    exon    45039   45212   .   +   .   ID=g1.t1.exon5;Parent=g1.t1;
CM041852.1  AUGUSTUS    stop_codon  45210   45212   .   +   0   ID=g1.t1.stop1;Parent=g1.t1;
CM041852.1  AUGUSTUS    gene    50766   84289   .   -   .   ID=g2;
CM041852.1  AUGUSTUS    mRNA    50766   84289   1   -   .   ID=g2.t1;Parent=g2;
CM041852.1  AUGUSTUS    stop_codon  50766   50768   .   -   0   ID=g2.t1.stop1;Parent=g2.t1;
CM041852.1  AUGUSTUS    CDS 50766   50874   1   -   1   ID=g2.t1.CDS1;Parent=g2.t1;
CM041852.1  AUGUSTUS    exon    50766   50874   .   -   .   ID=g2.t1.exon1;Parent=g2.t1;
CM041852.1  AUGUSTUS    intron  50875   58490   1   -   .   ID=g2.t1.intron1;Parent=g2.t1;
CM041852.1  AUGUSTUS    CDS 58491   58585   1   -   0   ID=g2.t1.CDS2;Parent=g2.t1;
CM041852.1  AUGUSTUS    exon    58491   58585   .   -   .   ID=g2.t1.exon2;Parent=g2.t1;
CM041852.1  AUGUSTUS    intron  58586   63406   1   -   .   ID=g2.t1.intron2;Parent=g2.t1;
CM041852.1  AUGUSTUS    CDS 63407   63569   1   -   1   ID=g2.t1.CDS3;Parent=g2.t1;
CM041852.1  AUGUSTUS    exon    63407   63569   .   -   .   ID=g2.t1.exon3;Parent=g2.t1;
CM041852.1  AUGUSTUS    intron  63570   83800   1   -   .   ID=g2.t1.intron3;Parent=g2.t1;
CM041852.1  AUGUSTUS    CDS 83801   83917   1   -   1   ID=g2.t1.CDS4;Parent=g2.t1;
CM041852.1  AUGUSTUS    exon    83801   83917   .   -   .   ID=g2.t1.exon4;Parent=g2.t1;
CM041852.1  AUGUSTUS    intron  83918   84224   1   -   .   ID=g2.t1.intron4;Parent=g2.t1;
CM041852.1  AUGUSTUS    CDS 84225   84289   1   -   0   ID=g2.t1.CDS5;Parent=g2.t1;
CM041852.1  AUGUSTUS    exon    84225   84289   .   -   .   ID=g2.t1.exon5;Parent=g2.t1;
CM041852.1  AUGUSTUS    start_codon 84287   84289   .   -   0   ID=g2.t1.start1;Parent=g2.t1;
CM041852.1  AUGUSTUS    gene    88181   95729   .   +   .   ID=g3;
CM041852.1  AUGUSTUS    mRNA    88181   95729   1   +   .   ID=g3.t1;Parent=g3;
CM041852.1  AUGUSTUS    start_codon 88181   88183   .   +   0   ID=g3.t1.start1;Parent=g3.t1;
CM041852.1  AUGUSTUS    CDS 88181   88879   1   +   0   ID=g3.t1.CDS1;Parent=g3.t1;
CM041852.1  AUGUSTUS    exon    88181   88879   .   +   .   ID=g3.t1.exon1;Parent=g3.t1;
CM041852.1  AUGUSTUS    intron  88880   90686   1   +   .   ID=g3.t1.intron1;Parent=g3.t1;
CM041852.1  AUGUSTUS    CDS 90687   90814   1   +   0   ID=g3.t1.CDS2;Parent=g3.t1;
CM041852.1  AUGUSTUS    exon    90687   90814   .   +   .   ID=g3.t1.exon2;Parent=g3.t1;
CM041852.1  AUGUSTUS    intron  90815   91483   1   +   .   ID=g3.t1.intron2;Parent=g3.t1;
CM041852.1  AUGUSTUS    CDS 91484   91641   1   +   1   ID=g3.t1.CDS3;Parent=g3.t1;
CM041852.1  AUGUSTUS    exon    91484   91641   .   +   .   ID=g3.t1.exon3;Parent=g3.t1;
CM041852.1  AUGUSTUS    intron  91642   93660   1   +   .   ID=g3.t1.intron3;Parent=g3.t1;
CM041852.1  AUGUSTUS    CDS 93661   93744   1   +   2   ID=g3.t1.CDS4;Parent=g3.t1;
CM041852.1  AUGUSTUS    exon    93661   93744   .   +   .   ID=g3.t1.exon4;Parent=g3.t1;
CM041852.1  AUGUSTUS    intron  93745   94643   1   +   .   ID=g3.t1.intron4;Parent=g3.t1;
CM041852.1  AUGUSTUS    CDS 94644   94768   1   +   2   ID=g3.t1.CDS5;Parent=g3.t1;
CM041852.1  AUGUSTUS    exon    94644   94768   .   +   .   ID=g3.t1.exon5;Parent=g3.t1;
CM041852.1  AUGUSTUS    intron  94769   94866   1   +   .   ID=g3.t1.intron5;Parent=g3.t1;
CM041852.1  AUGUSTUS    CDS 94867   95068   1   +   0   ID=g3.t1.CDS6;Parent=g3.t1;
CM041852.1  AUGUSTUS    exon    94867   95068   .   +   .   ID=g3.t1.exon6;Parent=g3.t1;
CM041852.1  AUGUSTUS    intron  95069   95445   1   +   .   ID=g3.t1.intron6;Parent=g3.t1;
CM041852.1  AUGUSTUS    CDS 95446   95729   1   +   2   ID=g3.t1.CDS7;Parent=g3.t1;
CM041852.1  AUGUSTUS    exon    95446   95729   .   +   .   ID=g3.t1.exon7;Parent=g3.t1;
CM041852.1  AUGUSTUS    stop_codon  95727   95729   .   +   0   ID=g3.t1.stop1;Parent=g3.t1;

Perhaps eggNOG-mapper can be useful for functional annotation:

  1. https://academic.oup.com/nar/article/47/D1/D309/5173662
  2. http://eggnog-mapper.embl.de/

This will give you the description of your proteins.
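As a rough sketch of the follow-up step: assuming eggNOG-mapper was run on the protein FASTA that BRAKER writes next to the annotation (typically braker.aa), and that the result is the tab-separated `*.emapper.annotations` file with a `#query` header line and `Preferred_name` and `Description` columns (as in eggNOG-mapper v2; check the header of your own file), the descriptions can be read into a lookup keyed by transcript ID. The function name and the sample rows are made up for illustration.

```python
import csv
import io


def read_emapper_annotations(handle):
    """Return {query_id: (preferred_name, description)} from an emapper table.

    Assumes the v2-style layout: '##' comment lines, one header line that
    starts with '#query', then tab-separated data rows.
    """
    header = None
    annotations = {}
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row or row[0].startswith("##"):
            continue  # skip blank lines and run-metadata comments
        if row[0] == "#query":
            header = row
            continue
        if header is None:
            continue  # no header seen yet, so columns cannot be named
        record = dict(zip(header, row))
        name = record.get("Preferred_name", "-")
        desc = record.get("Description", "-")
        annotations[record["#query"]] = (name, desc)
    return annotations


# Made-up rows, only to show the expected shape of the input.
example = io.StringIO(
    "## emapper example\n"
    "#query\tseed_ortholog\tevalue\tPreferred_name\tDescription\n"
    "g1.t1\t0000.XP_1\t1e-50\tABC1\tHypothetical kinase\n"
    "g2.t1\t0000.XP_2\t1e-20\t-\t-\n"
)
annotations = read_emapper_annotations(example)
print(annotations["g1.t1"])
```

Transcripts without a hit are simply absent from the table, and eggNOG-mapper uses "-" for empty fields, so both cases need handling before the names are written anywhere.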

6 weeks ago

[BRAKER3] gave the mRNA positions in the genome but did not give the name or id of proteins or mRNA

I guess you want to know which proteins in the newly annotated genome correspond to proteins in the reference annotation(s). One option is to run an ortholog-finding program, e.g. OrthoFinder, but I'd like to hear of alternatives.

EDIT:

  • Consider also Liftoff: "accurately maps annotations in GFF or GTF between assemblies of the same, or closely-related species"

  • Possibly related: you can also annotate proteins with Pfam domains, for example using HMMER. This is biologically more meaningful than transferring names from one assembly to another.

If it's of any use, I have a Snakemake pipeline for running BRAKER and GALBA, merging the annotations from the two, and annotating with Pfam.
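For the Pfam route, one way to turn HMMER output into per-transcript domain lists is sketched below. It assumes `hmmscan` was run with `--tblout` against a Pfam-A HMM database, so that each non-comment line has 18 whitespace-separated fields followed by a free-text description, with the family name in the first field, the query (transcript) ID in the third, and the full-sequence E-value in the fifth. The function name, the E-value cutoff, and the sample lines are illustrative only, not taken from the pipeline mentioned above.

```python
import io
from collections import defaultdict


def pfam_hits_by_query(handle, max_evalue=1e-5):
    """Map each query ID to the Pfam families it hits at or below max_evalue."""
    hits = defaultdict(list)
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue  # skip header/footer comments and blank lines
        # 18 fixed fields, then the description (which may contain spaces).
        fields = line.split(maxsplit=18)
        if len(fields) < 18:
            continue  # malformed or truncated line
        family, query, evalue = fields[0], fields[2], float(fields[4])
        if evalue <= max_evalue and family not in hits[query]:
            hits[query].append(family)
    return dict(hits)


# Made-up tblout lines: one strong hit and one weak hit for the same query.
example = io.StringIO(
    "# target name accession query name ...\n"
    "Pkinase PF00069.28 g1.t1 - 1.2e-40 140.1 0.0 1.5e-40 139.8 0.0 "
    "1.1 1 0 0 1 1 1 1 Protein kinase domain\n"
    "FakeDom PF99999.1 g1.t1 - 0.5 10.0 0.0 0.7 9.5 0.0 "
    "1.0 1 0 0 1 1 0 0 Invented weak hit\n"
)
hits = pfam_hits_by_query(example)
print(hits)
```

Filtering on the full-sequence E-value is only one possible choice; Pfam's curated gathering thresholds (`--cut_ga` in HMMER) are a common alternative.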


I agree with this answer. You only got the structural annotation (ORF coordinates) from BRAKER. Using an ortholog finder to assign protein/gene IDs is possible, but only if the source and target genomes are from the same species and those IDs have been validated.


I also have this exact same question. I have my entire genome annotated and a braker.GTF file, but without protein "product" names.


I suggest reading up on the difference between structural annotation (identifying the coordinates of the ORFs in a sequence) and functional annotation (assigning a function, or the most likely function, to an ORF/protein). These are independent processes. As far as I remember, BRAKER doesn't include functional annotation, which is why you can't find any "product names".
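To make the distinction concrete: once a functional-annotation tool has produced a description per transcript, that description can be attached to the structural annotation as an extra attribute. The sketch below assumes a tab-separated GFF3 with `ID=...;` attributes like the BRAKER lines posted earlier in the thread, and a plain dictionary from transcript ID to description. The `product` attribute name follows the convention used in NCBI GFF3 files; the function name and the sample description are invented for the example.

```python
from urllib.parse import quote


def add_product(gff_lines, products):
    """Yield GFF3 lines, appending product=... to mRNA features with a known product."""
    for line in gff_lines:
        line = line.rstrip("\n")
        cols = line.split("\t")
        if line.startswith("#") or len(cols) != 9 or cols[2] != "mRNA":
            yield line  # pass comments and non-mRNA features through unchanged
            continue
        attrs = dict(
            item.split("=", 1) for item in cols[8].split(";") if "=" in item
        )
        product = products.get(attrs.get("ID"))
        if product:
            # GFF3 reserves characters such as ';', '=', ',' and '&' inside
            # attribute values; quote() percent-encodes them (and some other
            # punctuation), while spaces are kept readable via safe=" ".
            cols[8] = cols[8].rstrip(";") + ";product=" + quote(product, safe=" ")
        yield "\t".join(cols)


gff = [
    "CM041852.1\tAUGUSTUS\tmRNA\t36422\t45212\t1\t+\t.\tID=g1.t1;Parent=g1;",
    "CM041852.1\tAUGUSTUS\tCDS\t36422\t36793\t1\t+\t0\tID=g1.t1.CDS1;Parent=g1.t1;",
]
annotated = list(add_product(gff, {"g1.t1": "Hypothetical kinase"}))
print(annotated[0])
```

The coordinates are never touched: the functional step only adds information to features that the structural step already defined.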


It's perhaps worth keeping in mind that protein "product" names are only a human construct. At the biological level you have protein biochemical properties and phylogenetic relationships. So while it's useful to assign protein names according to a reference annotation, ultimately you are still looking for orthologs.
