Correct gtf file format (AGAT toolkit)
3.9 years ago
tianshenbio ▴ 170

I used agat_convert_sp_gff2gtf.pl from the AGAT toolkit to convert my GFF file to GTF. In the converted GTF file, the double quotes around the gene_id values are missing:

Bany_Scaf1  maker   gene    201136  207903  .   +   .   Alias "maker-Bany_Scaf1-snap-gene-2.23"; Dbxref "InterPro:IPR019774" "Pfam:PF00351"; ID Bany_03723; Name Bany_03723; Ontology_term "GO:0016714" "GO:0055114"; gene_id Bany_03723
Bany_Scaf1  maker   transcript  201136  207903  .   +   .   Alias "maker-Bany_Scaf1-snap-gene-2.23-mRNA-1"; Dbxref "InterPro:IPR019774" "Pfam:PF00351"; ID "Bany_03723-RA"; Name "Bany_03723-RA"; Ontology_term "GO:0016714" "GO:0055114"; Parent Bany_03723; _AED "0.06"; _QI "45|1|1|1|1|1|7|425|530"; _eAED "0.06"; gene_id Bany_03723; original_biotype mrna; transcript_id "Bany_03723-RA" 
Bany_Scaf1  maker   exon    201136  201304  .   +   .   ID "Bany_03723-RA:1"; Parent "Bany_03723-RA"; gene_id Bany_03723; transcript_id "Bany_03723-RA" 
Bany_Scaf1  maker   exon    202687  202770  .   +   .   ID "Bany_03723-RA:2"; Parent "Bany_03723-RA"; gene_id Bany_03723; transcript_id "Bany_03723-RA" 
Bany_Scaf1  maker   exon    202886  202921  .   +   .   ID "Bany_03723-RA:3"; Parent "Bany_03723-RA"; gene_id Bany_03723; transcript_id "Bany_03723-RA" 
Bany_Scaf1  maker   exon    203004  203820  .   +   .   ID "Bany_03723-RA:4"; Parent "Bany_03723-RA"; gene_id Bany_03723; transcript_id "Bany_03723-RA" 
Bany_Scaf1  maker   exon    206097  206223  .   +   .   ID "Bany_03723-RA:5"; Parent "Bany_03723-RA"; gene_id Bany_03723; transcript_id "Bany_03723-RA" 
Bany_Scaf1  maker   exon    206649  206878  .   +   .   ID "Bany_03723-RA:6"; Parent "Bany_03723-RA"; gene_id Bany_03723; transcript_id "Bany_03723-RA" 
Bany_Scaf1  maker   exon    207304  207903  .   +   .   ID "Bany_03723-RA:7"; Parent "Bany_03723-RA"; gene_id Bany_03723; transcript_id "Bany_03723-RA" 
Bany_Scaf1  maker   CDS 201181  201304  .   +   0   ID "Bany_03723-RA:cds"; Parent "Bany_03723-RA"; gene_id Bany_03723; transcript_id "Bany_03723-RA" 
Bany_Scaf1  maker   CDS 202687  202770  .   +   2   ID "Bany_03723-RA:cds"; Parent "Bany_03723-RA"; gene_id Bany_03723; transcript_id "Bany_03723-RA" 
Bany_Scaf1  maker   CDS 202886  202921  .   +   2   ID "Bany_03723-RA:cds"; Parent "Bany_03723-RA"; gene_id Bany_03723; transcript_id "Bany_03723-RA" 
Bany_Scaf1  maker   CDS 203004  203820  .   +   2   ID "Bany_03723-RA:cds"; Parent "Bany_03723-RA"; gene_id Bany_03723; transcript_id "Bany_03723-RA" 
Bany_Scaf1  maker   CDS 206097  206223  .   +   1   ID "Bany_03723-RA:cds"; Parent "Bany_03723-RA"; gene_id Bany_03723; transcript_id "Bany_03723-RA" 
Bany_Scaf1  maker   CDS 206649  206878  .   +   0   ID "Bany_03723-RA:cds"; Parent "Bany_03723-RA"; gene_id Bany_03723; transcript_id "Bany_03723-RA" 
Bany_Scaf1  maker   CDS 207304  207478  .   +   1   ID "Bany_03723-RA:cds"; Parent "Bany_03723-RA"; gene_id Bany_03723; transcript_id "Bany_03723-RA" 
Bany_Scaf1  maker   five_prime_utr  201136  201180  .   +   .   ID "Bany_03723-RA:five_prime_utr"; Parent "Bany_03723-RA"; gene_id Bany_03723; original_biotype five_prime_UTR; transcript_id "Bany_03723-RA" 
Bany_Scaf1  maker   three_prime_utr 207479  207903  .   +   .   ID "Bany_03723-RA:three_prime_utr"; Parent "Bany_03723-RA"; gene_id Bany_03723; original_biotype three_prime_UTR; transcript_id "Bany_03723-RA"

My GFF (already corrected by AGAT):

Bany_Scaf1  maker   gene    201136  207903  .   +   .   ID=Bany_03723;Alias=maker-Bany_Scaf1-snap-gene-2.23;Dbxref=InterPro:IPR019774,Pfam:PF00351;Name=Bany_03723;Ontology_term=GO:0016714,GO:0055114
Bany_Scaf1  maker   mRNA    201136  207903  .   +   .   ID=Bany_03723-RA;Parent=Bany_03723;Alias=maker-Bany_Scaf1-snap-gene-2.23-mRNA-1;Dbxref=InterPro:IPR019774,Pfam:PF00351;Name=Bany_03723-RA;Ontology_term=GO:0016714,GO:0055114;_AED=0.06;_QI=45|1|1|1|1|1|7|425|530;_eAED=0.06
Bany_Scaf1  maker   exon    201136  201304  .   +   .   ID=Bany_03723-RA:1;Parent=Bany_03723-RA
Bany_Scaf1  maker   exon    202687  202770  .   +   .   ID=Bany_03723-RA:2;Parent=Bany_03723-RA
Bany_Scaf1  maker   exon    202886  202921  .   +   .   ID=Bany_03723-RA:3;Parent=Bany_03723-RA
Bany_Scaf1  maker   exon    203004  203820  .   +   .   ID=Bany_03723-RA:4;Parent=Bany_03723-RA
Bany_Scaf1  maker   exon    206097  206223  .   +   .   ID=Bany_03723-RA:5;Parent=Bany_03723-RA
Bany_Scaf1  maker   exon    206649  206878  .   +   .   ID=Bany_03723-RA:6;Parent=Bany_03723-RA
Bany_Scaf1  maker   exon    207304  207903  .   +   .   ID=Bany_03723-RA:7;Parent=Bany_03723-RA
Bany_Scaf1  maker   CDS 201181  201304  .   +   0   ID=Bany_03723-RA:cds;Parent=Bany_03723-RA
Bany_Scaf1  maker   CDS 202687  202770  .   +   2   ID=Bany_03723-RA:cds;Parent=Bany_03723-RA
Bany_Scaf1  maker   CDS 202886  202921  .   +   2   ID=Bany_03723-RA:cds;Parent=Bany_03723-RA
Bany_Scaf1  maker   CDS 203004  203820  .   +   2   ID=Bany_03723-RA:cds;Parent=Bany_03723-RA
Bany_Scaf1  maker   CDS 206097  206223  .   +   1   ID=Bany_03723-RA:cds;Parent=Bany_03723-RA
Bany_Scaf1  maker   CDS 206649  206878  .   +   0   ID=Bany_03723-RA:cds;Parent=Bany_03723-RA
Bany_Scaf1  maker   CDS 207304  207478  .   +   1   ID=Bany_03723-RA:cds;Parent=Bany_03723-RA
Bany_Scaf1  maker   five_prime_UTR  201136  201180  .   +   .   ID=Bany_03723-RA:five_prime_utr;Parent=Bany_03723-RA
Bany_Scaf1  maker   three_prime_UTR 207479  207903  .   +   .   ID=Bany_03723-RA:three_prime_utr;Parent=Bany_03723-RA

How can I add the missing double quotes?

RNA-Seq gff gtf AGAT gff3 • 2.5k views

AGAT should produce a correct GTF file. Is there anything wrong with your GFF? A new version of AGAT was released recently with a few issues fixed, so try updating.

I added my GFF to the post. The file has already been checked and corrected by AGAT, and I am using v0.3.0.

3.9 years ago
Juke34 8.5k

Hi, thank you for pointing this out; I had forgotten about it! The problem is related to BioPerl (see here). I have a patch that fixes the problem in BioPerl, but I was waiting for feedback. I will try to include the necessary changes directly in the agat_convert_sp_gff2gtf.pl script. It should be fixed in an upcoming release.

Hi, thank you for the update! I managed to fix it on Linux, and I hope this can be fixed in the next release.
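
The thread does not show the workaround that was used. For anyone hitting the same issue before a fixed release, here is one possible post-processing sketch in Python. It is not part of AGAT; the function name `quote_gtf_attributes` is hypothetical. It assumes the real file is tab-delimited with nine columns (the spacing in the post above is just how the forum renders it) and that no attribute value contains a semicolon.

```python
import re

# One GTF attribute: a key, whitespace, then the raw value text.
_ATTR = re.compile(r'^(\S+)\s+(.*)$')


def quote_gtf_attributes(line: str) -> str:
    """Wrap every unquoted attribute value in column 9 in double quotes.

    Assumptions (not taken from AGAT): tab-delimited input, nine columns,
    and no ';' inside attribute values.
    """
    # Leave comments, blank lines and malformed records untouched.
    if line.startswith("#") or not line.strip():
        return line
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        return line

    fixed = []
    for attr in fields[8].split(";"):
        attr = attr.strip()
        if not attr:
            continue
        match = _ATTR.match(attr)
        if match is None:
            # Key without a value: keep it as it is.
            fixed.append(attr)
            continue
        key, value = match.groups()
        # Values that already start with a quote (including multi-valued
        # ones such as Dbxref "a" "b") are kept unchanged.
        if not value.startswith('"'):
            value = f'"{value}"'
        fixed.append(f"{key} {value}")

    # Terminate every attribute, including the last one, with ';'.
    fields[8] = "; ".join(fixed) + ";"
    newline = "\n" if line.endswith("\n") else ""
    return "\t".join(fields) + newline
```

To use it, read the converted GTF line by line, pass each line through `quote_gtf_attributes`, and write the result to a new file. On the exon lines above this would turn `gene_id Bany_03723` into `gene_id "Bany_03723"` and leave the already quoted `transcript_id` alone.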
