GFF3 files
9 months ago
Percy • 0

How can I create a GFF file that includes the "Name" attribute? I have tried both Prodigal and Prokka; however, the GFF files they produce lack the Name attribute, which I need for a downstream analysis.


Context is missing.


For instance, I annotated my MAGs with Prokka and with Prodigal. The GFF files I obtain afterwards lack the "Name" attribute, e.g. Prodigal:

##gff-version  3
# Sequence Data: seqnum=1;seqlen=59792;seqhdr="NODE_23_length_59792_cov_23.204747"
# Model Data: version=Prodigal.v2.6.3;run_type=Metagenomic;model="39|Rickettsia_conorii_Malish_7|B|32.4|11|1";gc_cont=32.40;transl_table=11;uses_sd=1
NODE_23_length_59792_cov_23.204747  Prodigal_v2.6.3 CDS 1   147 19.5    -   0   ID=1_1;partial=10;start_type=TTG;rbs_motif=None;rbs_spacer=None;gc_cont=0.299;conf=98.71;score=18.89;cscore=30.86;sscore=-11.98;rscore=-0.99;uscore=-0.73;tscore=-9.61;
NODE_23_length_59792_cov_23.204747  Prodigal_v2.6.3 CDS 523 1983    198.6   -   0   ID=1_2;partial=00;start_type=ATG;rbs_motif=None;rbs_spacer=None;gc_cont=0.300;conf=99.99;score=198.00;cscore=196.39;sscore=1.61;rscore=-0.99;uscore=0.35;tscore=2.90;


Prokka:

##gff-version 3
##sequence-region NODE_23_length_59792_cov_23.204747 1 59792
##sequence-region NODE_111_length_30229_cov_21.362365 1 30229
##sequence-region NODE_186_length_24472_cov_19.556948 1 24472
##sequence-region NODE_198_length_23498_cov_18.236489 1 23498
##sequence-region NODE_240_length_21525_cov_17.369632 1 21525


I need a GFF file whose attributes include "Name", e.g.:

LT795054.1      EMBL    CDS     2243    2450    .       -       0       ID=cds-SJX60001.1;Parent=gene-SRS1_00846;Dbxref=NCBI_GP:SJX60001.1;Name=SJX60001.1;gbkey=CDS;locus_tag=SRS1_00846;product=uncharacterized protein;protein_id=SJX60001.1


(...)


Sigh... This comment is not an answer; you'd better add it to your original post. Also add some formatting: for example, enclose the GFF sections in code blocks (the 101010 icon) and put each line on its own line.

Getting the 'Name' attribute is a data analyst's job. Prodigal gives you the gene predictions; you'll need to match those with functional annotations.
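A minimal sketch of that matching step, assuming you already have a tab-separated table with two columns: the Prodigal gene ID (e.g. 1_1) and a name taken from whatever functional annotation you ran. The file names annotations.tsv and predicted.gff, the function add_names, and the name geneX are all hypothetical. The awk script loads the table first, then appends Name= to each feature whose ID appears in it and leaves every other line unchanged.

```shell
#!/bin/sh
# Sketch: add Name= to Prodigal features from a hypothetical two-column
# table (gene ID <TAB> name) produced by some functional annotation step.

add_names() {
    # $1 = ID-to-name table, $2 = GFF3 file
    awk -F'\t' '
        NR == FNR { name[$1] = $2; next }      # first file: load ID -> name (table must be non-empty)
        /^#/ || NF < 9 { print; next }         # pass headers and non-feature lines through
        {
            id = ""
            n = split($9, attrs, ";")
            for (i = 1; i <= n; i++)
                if (attrs[i] ~ /^ID=/) id = substr(attrs[i], 4)
            line = $0
            sub(/;$/, "", line)                # Prodigal ends column 9 with a semicolon
            if (id in name) print line ";Name=" name[id]
            else print $0                      # no annotation: leave the line unchanged
        }
    ' "$1" "$2"
}

# Tiny example input (hypothetical IDs and names)
printf '1_1\tgeneX\n' > annotations.tsv
printf '##gff-version 3\n' > predicted.gff
printf 'NODE_23\tProdigal_v2.6.3\tCDS\t1\t147\t19.5\t-\t0\tID=1_1;partial=10;\n' >> predicted.gff
printf 'NODE_23\tProdigal_v2.6.3\tCDS\t523\t1983\t198.6\t-\t0\tID=1_2;partial=00;\n' >> predicted.gff

add_names annotations.tsv predicted.gff
```

Per the GFF3 specification, characters such as ;, =, & and , inside attribute values must be percent-encoded, so clean the names coming out of your annotation tool before inserting them.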

9 months ago
iraun 5.9k

You can use awk. With this one-liner, the accession in Dbxref (the part after the colon) is copied into a new "Name" attribute.

awk -F'\t' '/^#/ || NF < 9 {print; next} {split($9,a,";"); split(a[3],b,":"); line=$0; sub(/;$/,"",line); print line ";Name=" b[2]}' your.gff3


This one-liner assumes that Dbxref is the 3rd attribute in the 9th column.
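If the attribute order varies between lines, a position-based split picks up the wrong field. A slightly more defensive sketch looks Dbxref up by its key instead. It assumes tab-separated GFF3 and Dbxref values of the form DB:accession (like NCBI_GP:SJX60001.1 in the example above); when several comma-separated values are present, only the first is used. The function name dbxref_to_name and the file example.gff are made up for illustration.

```shell
#!/bin/sh
# Sketch: copy the accession part of Dbxref (text after the first colon)
# into a new Name attribute, locating Dbxref by key rather than position.

dbxref_to_name() {
    awk -F'\t' '
        /^#/ || NF < 9 { print; next }          # headers, FASTA section, blank lines
        {
            acc = ""
            n = split($9, attrs, ";")
            for (i = 1; i <= n; i++)
                if (attrs[i] ~ /^Dbxref=/) {
                    val = substr(attrs[i], 8)   # drop the Dbxref= prefix
                    split(val, v, ",")          # first value only
                    acc = substr(v[1], index(v[1], ":") + 1)  # text after the first colon
                }
            if (acc == "") { print; next }      # no Dbxref: unchanged
            line = $0
            sub(/;$/, "", line)
            print line ";Name=" acc
        }
    ' "$1"
}

printf '##gff-version 3\n' > example.gff
printf 'LT795054.1\tEMBL\tCDS\t2243\t2450\t.\t-\t0\tDbxref=NCBI_GP:SJX60001.1;ID=cds-SJX60001.1;gbkey=CDS\n' >> example.gff
dbxref_to_name example.gff
```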


If OP had formatted their added info in a more readable way, you'd probably have seen that Dbxref is not an attribute provided by the ORF caller:

NODE_23_length_59792_cov_23.204747  Prodigal_v2.6.3 CDS 1    147   19.5   -  0 ID=1_1;partial=10;start_type=TTG;rbs_motif=None;rbs_spacer=None;gc_cont=0.299;conf=98.71;score=18.89;cscore=30.86;sscore=-11.98;rscore=-0.99;uscore=-0.73;tscore=-9.61;
NODE_23_length_59792_cov_23.204747  Prodigal_v2.6.3 CDS 523  1983  198.6  -  0 ID=1_2;partial=00;start_type=ATG;rbs_motif=None;rbs_spacer=None;gc_cont=0.300;conf=99.99;score=198.00;cscore=196.39;sscore=1.61;rscore=-0.99;uscore=0.35;tscore=2.90;


Definitely, that is very true. I hope he can play with the command, though, and adapt it to his needs; it is quite straightforward. For example, to copy the content of the ID attribute, change the code to the following:

awk -F'\t' '/^#/ || NF < 9 {print; next} {split($9,a,";"); split(a[1],b,"="); line=$0; sub(/;$/,"",line); print line ";Name=" b[2]}' your.gff3
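Prodigal IDs such as 1_1 are numbered per run (sequence number, then gene number), so if GFFs from several MAGs are later merged, the copied names can collide. A sketch of the ID-to-Name copy with an optional prefix (for example the MAG name) follows; it also leaves a feature alone if it already has a Name attribute, since a duplicated Name tag is likely to confuse downstream parsers. The function id_to_name, the prefix MAG1_ and the file mag1.gff are hypothetical.

```shell
#!/bin/sh
# Sketch: copy ID into Name, optionally prefixed (e.g. with a MAG name)
# so names stay distinct when several Prodigal GFFs are merged.
# Features that already have a Name attribute are left untouched.

id_to_name() {
    # $1 = prefix (may be empty), $2 = GFF3 file
    awk -F'\t' -v prefix="$1" '
        /^#/ || NF < 9 { print; next }
        $9 ~ /(^|;)Name=/ { print; next }       # keep an existing Name
        {
            id = ""
            n = split($9, attrs, ";")
            for (i = 1; i <= n; i++)
                if (attrs[i] ~ /^ID=/) id = substr(attrs[i], 4)
            if (id == "") { print; next }       # no ID: nothing to copy
            line = $0
            sub(/;$/, "", line)                 # avoid a doubled semicolon on Prodigal lines
            print line ";Name=" prefix id
        }
    ' "$2"
}

printf '##gff-version  3\n' > mag1.gff
printf 'NODE_23\tProdigal_v2.6.3\tCDS\t1\t147\t19.5\t-\t0\tID=1_1;partial=10;\n' >> mag1.gff
printf 'NODE_23\tProdigal_v2.6.3\tCDS\t523\t1983\t198.6\t-\t0\tID=1_2;Name=kept;\n' >> mag1.gff
id_to_name MAG1_ mag1.gff
```

Pass an empty prefix (id_to_name '' file.gff) to get the plain ID as the name.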

9 months ago
Juke34 7.7k

You can play with agat_sq_manage_attributes.pl from AGAT to create a Name attribute from the ID.