editing gtf file
16 months ago

I have a GTF file as follows:

KB705106        VEuPathDB       exon    3645    3767    0       -       .       gene_id ""; transcript_id "AARA010197-RA";
KB705106        VEuPathDB       CDS     3645    3767    0       -       2       gene_id ""; transcript_id "AARA010197-RA";
KB705106        VEuPathDB       exon    3975    4065    0       -       .       gene_id ""; transcript_id "AARA010198-RA";


I want to copy the first 10 characters of each transcript_id into the corresponding empty gene_id, as follows:

KB705106        VEuPathDB       exon    3645    3767    0       -       .       gene_id "AARA010197"; transcript_id "AARA010197-RA";
KB705106        VEuPathDB       CDS     3645    3767    0       -       2       gene_id "AARA010197"; transcript_id "AARA010197-RA";
KB705106        VEuPathDB       exon    3975    4065    0       -       .       gene_id "AARA010198"; transcript_id "AARA010198-RA";


Please, what is the easiest way to do this?

Thank you. ~DD


what is the easiest way to do this?

There are many ways to parse and reformat text files, and the easiest one for you depends on the scripting language you know best. I would personally use R (with the read.table(), sapply() and strsplit() functions), but there are also good options in Python or Perl, and the most efficient approach would probably be bash/awk. Which do you prefer?

16 months ago

Here is a Perl one-liner that would do the job:

perl -pe 's/gene_id ""; transcript_id "([^"]{1,10})/gene_id "$1"; transcript_id "$1/' input.gtf > output.gtf


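The pattern [^"]{1,10} captures up to the first 10 characters of the transcript_id, so it still matches when the ID is shorter than 10 characters. The captured text ($1) is written into gene_id, and the rest of the transcript_id is left unchanged. Lines whose gene_id is not empty are not modified.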
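
For anyone who prefers Python, the same substitution can be sketched roughly as below. This is only an illustration of the regex approach above, not a tested GTF parser: the names fill_gene_id and rewrite_gtf are made up for this example, and it assumes the attribute column is written exactly as in the question (gene_id ""; followed by a single space and transcript_id).

```python
import re

# Matches an empty gene_id followed by a transcript_id; group 1 captures
# up to the first 10 characters of the transcript ID.
# Assumption: attributes are spaced exactly as in the example lines.
PATTERN = re.compile(r'gene_id ""; transcript_id "([^"]{1,10})')


def fill_gene_id(line: str) -> str:
    """Copy the first (up to) 10 characters of transcript_id into an empty gene_id."""
    # The remainder of the transcript ID is not part of the match, so it is kept.
    return PATTERN.sub(r'gene_id "\1"; transcript_id "\1', line, count=1)


def rewrite_gtf(src_path: str, dst_path: str) -> None:
    """Apply fill_gene_id to every line of src_path and write the result to dst_path."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            dst.write(fill_gene_id(line))
```

Calling rewrite_gtf("input.gtf", "output.gtf") would then play the role of the one-liner. Lines that already have a non-empty gene_id pass through unchanged.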