How to fix GTF files by adding specific strings into empty gene_id ""
2.7 years ago
sasa ▴ 10

Hi,

I want to repair a GTF file by filling each empty gene_id "" with a unique string (such as the product name). I would really appreciate it if anyone could suggest a solution.

For example:

grep -m1 'gene_id ""' mygtf.gtf

NC_001717.1 RefSeq  exon    1004    1071    .   +   .   gene_id ""; transcript_id "unknown_transcript_1"; anticodon "(pos:1034..1036)"; gbkey "tRNA"; note "putative"; product "tRNA-Phe"; exon_number "1";

I want to add the product name between the double quotes right after the gene_id like:

NC_001717.1 RefSeq  exon    1004    1071    .   +   .   gene_id "tRNA-Phe"; transcript_id "unknown_transcript_1"; anticodon "(pos:1034..1036)"; gbkey "tRNA"; note "putative"; product "tRNA-Phe"; exon_number "1";
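The transformation described above can be sketched in a few lines of Python. This is only a minimal illustration, not a tested pipeline: the function names `fill_empty_gene_id` and `fix_gtf` are hypothetical, and the sketch assumes every line with an empty gene_id carries a `product "..."` attribute on the same line. Lines without a product are left unchanged so they can be reviewed by hand.

```python
import re

# Matches an empty gene_id attribute, e.g. gene_id ""
EMPTY_GENE_ID = re.compile(r'gene_id\s+""')
# Captures the value of the product attribute, e.g. product "tRNA-Phe"
PRODUCT = re.compile(r'product\s+"([^"]+)"')


def fill_empty_gene_id(line: str) -> str:
    """Return the GTF line with an empty gene_id replaced by its product value."""
    if not EMPTY_GENE_ID.search(line):
        return line  # gene_id is already set, or the line has no gene_id
    match = PRODUCT.search(line)
    if match is None:
        return line  # nothing to copy from; leave the line for manual review
    product = match.group(1)
    # A function is used as the replacement so the product text is inserted literally.
    return EMPTY_GENE_ID.sub(lambda _: f'gene_id "{product}"', line, count=1)


def fix_gtf(in_path: str, out_path: str) -> None:
    """Copy a GTF file line by line, filling empty gene_id values on the way."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            dst.write(fill_empty_gene_id(line))
```

Calling `fix_gtf("mygtf.gtf", "mygtf.fixed.gtf")` would write a repaired copy and leave the original file untouched (the output file name is just an example).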

I have 24 empty gene_id attributes and need to fix all of them. I obtained this file from NCBI RefSeq. Unfortunately, this species is not available in the Ensembl database.

The reason I want to fix the GTF file is that I need to filter it with cellranger mkgtf. I am getting the error below, so I need to modify the GTF file.

cellranger.reference.GtfParseError: Error while parsing GTF file /~/genome/mygtf.gtf Property 'gene_id' is empty in GTF line 1809658: NC_001717.1 RefSeq exon 1004 1071 . + gene_id ""; transcript_id "unknown_transcript_1"; anticodon "(pos:1034..1036)"; gbkey "tRNA"; note "putative"; product "tRNA-Phe"; exon_number "1";

Thank you!

cellranger GTF Annotation UNIX • 2.0k views
2.7 years ago
Juke34 8.5k

AGAT has a script for this type of task: see agat_sp_manage_attributes.pl with --att product/gene_id --cp --overwrite.
That said, I doubt this is the best solution, because different genes may have the same product value. A better approach would be to use agat_sp_manage_attributes.pl to remove all gene_id attributes, then run agat_convert_sp_gff2gtf.pl, which should recreate proper gene_id attributes.
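To see whether the concern about shared product values applies to a given file, one could list the products that occur on more than one transcript among the lines with an empty gene_id. The following is a rough Python sketch under stated assumptions: the function name `shared_products` is hypothetical, attributes are assumed to follow the `key "value";` pattern shown in the question, and transcript_id is assumed to distinguish the affected features.

```python
import re
from collections import defaultdict

# Matches an empty gene_id attribute, e.g. gene_id ""
EMPTY_GENE_ID = re.compile(r'gene_id\s+""')
# Captures key/value pairs written as: key "value"
ATTRIBUTE = re.compile(r'(\w+)\s+"([^"]*)"')


def shared_products(lines):
    """Return {product: sorted transcript_ids} for products seen on more than one transcript.

    Only lines whose gene_id is empty are considered.
    """
    transcripts_by_product = defaultdict(set)
    for line in lines:
        if line.startswith("#") or not EMPTY_GENE_ID.search(line):
            continue  # skip header comments and lines that already have a gene_id
        attrs = dict(ATTRIBUTE.findall(line))
        product = attrs.get("product")
        if product:
            transcripts_by_product[product].add(attrs.get("transcript_id", ""))
    return {
        product: sorted(transcripts)
        for product, transcripts in transcripts_by_product.items()
        if len(transcripts) > 1
    }
```

Running it as `shared_products(open("mygtf.gtf"))` returns an empty dictionary when every product is unique among the affected lines. Any entry it does return marks a product that would give two different features the same gene_id if copied over.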


Thank you so much for your suggestion. And yes, my idea of using the product name was not great. I was able to convert my GTF using the AGAT scripts you listed above. Thanks again!
