Correct way to write the names in the "product" field of a GTF file
3 months ago
arturo.marin ▴ 10

Hi,

I assembled and annotated the genome of a eukaryotic organism. Genes and their putative proteins were predicted with AUGUSTUS, and the predicted proteins were annotated using BLASTP against the RefSeq database. For some proteins, the "product" field contains a species name in square brackets, for example "[Leishmania infantum JPCM5]", because that is how the name appears in RefSeq. My question is whether this species name should be removed from the GTF file, since it is not my species. On the other hand, if I remove it, the "protein_id" field would still point to the RefSeq protein whose full name includes the bracketed species name. Part of my GTF file is included below as an example.

jcf7180000024611    AUGUSTUS    gene    2158    2691    1   -   .   gene_id "LPASSIMC3V1_1";
jcf7180000024611    AUGUSTUS    mRNA    2158    2691    1   -   .   gene_id "LPASSIMC3V1_1"; transcript_id "LPASSIMC3V1_1.t1";
jcf7180000024611    AUGUSTUS    stop_codon  2158    2160    .   -   0   gene_id "LPASSIMC3V1_1"; transcript_id "LPASSIMC3V1_1.t1";
jcf7180000024611    AUGUSTUS    CDS    2161    2691    1   -   0   gene_id "LPASSIMC3V1_1"; transcript_id "LPASSIMC3V1_1.t1";
jcf7180000024611    AUGUSTUS    start_codon 2689    2691    .   -   0   gene_id "LPASSIMC3V1_1"; transcript_id "LPASSIMC3V1_1.t1";
jcf7180000024611    AUGUSTUS    gene    3930    4637    1   -   .   gene_id "LPASSIMC3V1_2"; product "hypothetical protein, unknown function [Leishmania infantum JPCM5]"; protein_id "XP_001467570";
jcf7180000024611    AUGUSTUS    mRNA    3930    4637    1   -   .   gene_id "LPASSIMC3V1_2"; transcript_id "LPASSIMC3V1_2.t1"; product "hypothetical protein, unknown function [Leishmania infantum JPCM5]"; protein_id "XP_001467570";
jcf7180000024611    AUGUSTUS    stop_codon  3930    3932    .   -   0   gene_id "LPASSIMC3V1_2"; transcript_id "LPASSIMC3V1_2.t1"; product "hypothetical protein, unknown function [Leishmania infantum JPCM5]"; protein_id "XP_001467570";
jcf7180000024611    AUGUSTUS    CDS    3933    4637    1   -   0   gene_id "LPASSIMC3V1_2"; transcript_id "LPASSIMC3V1_2.t1"; product "hypothetical protein, unknown function [Leishmania infantum JPCM5]"; protein_id "XP_001467570";
jcf7180000024611    AUGUSTUS    start_codon 4635    4637    .   -   0   gene_id "LPASSIMC3V1_2"; transcript_id "LPASSIMC3V1_2.t1"; product "hypothetical protein, unknown function [Leishmania infantum JPCM5]"; protein_id "XP_001467570";
jcf7180000024611    AUGUSTUS    gene    5850    6671    1   -   .   gene_id "LPASSIMC3V1_3"; product "hypothetical protein, unknown function [Leishmania infantum JPCM5]"; protein_id "XP_001467570";
jcf7180000024611    AUGUSTUS    mRNA    5850    6671    1   -   .   gene_id "LPASSIMC3V1_3"; transcript_id "LPASSIMC3V1_3.t1"; product "hypothetical protein, unknown function [Leishmania infantum JPCM5]"; protein_id "XP_001467570";
jcf7180000024611    AUGUSTUS    stop_codon  5850    5852    .   -   0   gene_id "LPASSIMC3V1_3"; transcript_id "LPASSIMC3V1_3.t1"; product "hypothetical protein, unknown function [Leishmania infantum JPCM5]"; protein_id "XP_001467570";
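
For reference, if the bracketed organism name were to be removed, one possible way to do it is sketched below in Python. This is only an illustration under some assumptions: the function name `strip_species` and the regular expression are hypothetical, the species tag is assumed to always be the last thing inside the quoted "product" value, and the "protein_id" accession is left untouched. It does not settle whether removing the name is the right choice.

```python
import re

# Matches a product attribute whose quoted value ends in " [Organism name]".
# Group 1 captures everything up to (but not including) the bracketed suffix.
PRODUCT_SPECIES = re.compile(r'(product "[^"\[]*?)\s*\[[^\]"]*\]"')


def strip_species(line: str) -> str:
    """Return the GTF line with the trailing [organism] removed from product.

    Lines without a product attribute, or without a bracketed suffix,
    are returned unchanged. The protein_id attribute is never modified.
    """
    return PRODUCT_SPECIES.sub(r'\1"', line)


if __name__ == "__main__":
    example = (
        'gene_id "LPASSIMC3V1_2"; '
        'product "hypothetical protein, unknown function '
        '[Leishmania infantum JPCM5]"; protein_id "XP_001467570";'
    )
    print(strip_species(example))
```

Applied line by line to the file, this would keep the description ("hypothetical protein, unknown function") and the RefSeq accession while dropping only the source organism.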
gtf rnaseq annotation