Hi everyone,
I am relatively new to bioinformatics. Currently, I’m working on Whole Genome Bisulfite Sequencing (WGBS) analysis for our target organism, which is a non-model fish species. As part of my research, I am also performing genome annotation.
I’ve completed the structural annotation using BRAKER3 and have obtained the corresponding .gtf, .aa, and .codingseq files. Now, using BLASTp, InterProScan, and EggNOG, I’ve identified gene names corresponding to the "gene_id" entries in my GTF file.
My question is: Is it possible to modify the GTF file so that, instead of arbitrary gene IDs, it displays the corresponding gene names? This would make it easier for our team to browse the genome and immediately recognize gene identities.
If this is feasible, should I use the results from BLASTp or EggNOG to make the replacements? Also, what tools or software would you recommend for editing the GTF file accordingly?
I’ve included an example below:
GTF:
JASCQY010000001.1 gmst gene 712854 716064 . + . g26
JASCQY010000001.1 gmst transcript 712854 716064 . + . g26.t1
JASCQY010000001.1 gmst start_codon 712854 712856 227.542526 + 0 transcript_id "g26.t1"; gene_id "g26";
JASCQY010000001.1 gmst CDS 712854 713246 227.542526 + 0 transcript_id "g26.t1"; gene_id "g26";
JASCQY010000001.1 gmst exon 712854 713246 227.542526 + 0 transcript_id "g26.t1"; gene_id "g26";
JASCQY010000001.1 gmst intron 713247 714043 227.542526 + 0 transcript_id "g26.t1"; gene_id "g26";
JASCQY010000001.1 gmst CDS 714044 714202 227.542526 + 0 transcript_id "g26.t1"; gene_id "g26";
JASCQY010000001.1 gmst exon 714044 714202 227.542526 + 0 transcript_id "g26.t1"; gene_id "g26";
JASCQY010000001.1 gmst intron 714203 714407 227.542526 + 0 transcript_id "g26.t1"; gene_id "g26";
JASCQY010000001.1 gmst CDS 714408 714606 227.542526 + 0 transcript_id "g26.t1"; gene_id "g26";
JASCQY010000001.1 gmst exon 714408 714606 227.542526 + 0 transcript_id "g26.t1"; gene_id "g26";
JASCQY010000001.1 gmst intron 714607 714770 227.542526 + 0 transcript_id "g26.t1"; gene_id "g26";
JASCQY010000001.1 gmst CDS 714771 714866 227.542526 + 2 transcript_id "g26.t1"; gene_id "g26";
JASCQY010000001.1 gmst exon 714771 714866 227.542526 + 2 transcript_id "g26.t1"; gene_id "g26";
JASCQY010000001.1 gmst intron 714867 715165 227.542526 + 0 transcript_id "g26.t1"; gene_id "g26";
JASCQY010000001.1 gmst CDS 715166 716064 227.542526 + 2 transcript_id "g26.t1"; gene_id "g26";
JASCQY010000001.1 gmst exon 715166 716064 227.542526 + 2 transcript_id "g26.t1"; gene_id "g26";
JASCQY010000001.1 gmst stop_codon 716062 716064 227.542526 + 0 transcript_id "g26.t1"; gene_id "g26";
Functional annotation table:
BLASTp and InterProScan
(unlabeled first column): true
Tags: [INTERPRO, BLASTED, MAPPED, ANNOTATED]
SeqName: g26.t1
Description: granulin a isoform X2
Length: 559
#Hits: 2
e-Value: 8.64E-4
sim mean: 92.86
#GO: 1
GO IDs: C:GO:0005576
GO Names: C:extracellular region
Enzyme Codes: (empty)
Enzyme Names: (empty)
InterPro IDs: no IPS match
InterPro GO IDs: no IPS match
InterPro GO Names: no IPS match
EggNOG
Type: KOG,ENOG
Query ID: g26.t1
Gene Name: GRN
EggNOG Description: Granulin
E-Value: 0.075
Bit-Score: 45.8
Best Tax-Level: Chiroptera
EC Codes: (empty)
#GO: 21.0
GOs: P:GO:0007566; P:GO:0010469; P:GO:0032355; P:GO:1900006; P:GO:0007618; P:GO:0060179; P:GO:0009725; P:GO:0060999; P:GO:0043312; C:GO:0005783; F:GO:0008083; P:GO:0045666; C:GO:0035578; P:GO:0061351; P:GO:0001835; P:GO:0048488; C:GO:0005768; C:GO:0005615; P:GO:0050769; P:GO:0035988; P:GO:0050679
GO Names: P:positive regulation of neuron differentiation; C:endoplasmic reticulum; P:positive regulation of dendritic spine development; P:response to estradiol; P:positive regulation of epithelial cell proliferation; P:positive regulation of dendrite development; C:extracellular space; C:endosome; P:positive regulation of neurogenesis; P:neural precursor cell proliferation; P:chondrocyte proliferation; P:regulation of signaling receptor activity; P:male mating behavior; P:embryo implantation; P:neutrophil degranulation; P:response to hormone; P:synaptic vesicle endocytosis; P:mating; C:azurophil granule lumen; P:blastocyst hatching; F:growth factor activity
KEGG KO: (empty)
KEGG Pathway: (empty)
Loosely related: it's better to work with the GFF3 format, since GTF is derived from GFF2, which is deprecated. As far as I know, BRAKER writes a GFF3 file (braker.gff3) only when it is run with the --gff3 option.
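On the original question: rather than overwriting gene_id (IDs need to stay unique, and two loci can easily end up with the same name), one common approach is to keep the IDs and append a gene_name attribute, as Ensembl-style GTFs do. Below is a minimal Python sketch of that idea, not a tested pipeline. It assumes the GTF is tab-separated, that IDs look like g26 / g26.t1 as in your example, and that you have already built a transcript-to-name dictionary from whichever table you trust more (BLASTp or EggNOG; the sketch takes no position on that). The function names are made up for illustration. Note that it also expands the bare IDs on the gene and transcript lines into key "value" form, which is a choice you may not want if other tools expect the original layout.

```python
import re

# Matches key "value" pairs in a GTF attribute column.
ATTR_RE = re.compile(r'(\w+) "([^"]*)"')


def names_by_gene(names_by_transcript):
    """Collapse transcript-level names (g26.t1 -> GRN) to gene level (g26 -> GRN)."""
    genes = {}
    for tid, name in names_by_transcript.items():
        gid = tid.rsplit(".", 1)[0]      # assumption: IDs are shaped like g26.t1
        genes.setdefault(gid, name)      # keep the first name seen per gene
    return genes


def add_gene_names(gtf_lines, names):
    """Yield GTF lines with a gene_name attribute appended where a name is known."""
    for line in gtf_lines:
        line = line.rstrip("\n")
        cols = line.split("\t")
        if line.startswith("#") or len(cols) != 9:
            yield line                   # pass comments and odd lines through
            continue
        feature, attrs = cols[2], cols[8].strip()
        if '"' not in attrs:
            # gene/transcript lines carry a bare ID; expand to key "value" form
            if feature == "gene":
                attrs = f'gene_id "{attrs}";'
            elif feature == "transcript":
                gid = attrs.rsplit(".", 1)[0]
                attrs = f'transcript_id "{attrs}"; gene_id "{gid}";'
        gid = dict(ATTR_RE.findall(attrs)).get("gene_id")
        name = names.get(gid)
        if name:
            # quotes and semicolons inside a name would break the attribute syntax
            safe = name.replace('"', "'").replace(";", ",")
            if not attrs.endswith(";"):
                attrs += ";"
            attrs += f' gene_name "{safe}";'
        cols[8] = attrs
        yield "\t".join(cols)
```

To use it, you would build the dictionary with something like names_by_gene({"g26.t1": "GRN"}), iterate add_gene_names over the open GTF file, and write each returned line plus a newline to a new file, leaving the original untouched. Genes without a name in the dictionary keep their lines as they were (apart from the expanded gene/transcript attributes).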
Hi!
Yes! I looked into GFF3, but I was not able to pass the --gff option to BRAKER when I ran it last month. Right now, I am redoing my annotation so that I get GFF3 output!
Thank you so much!