editing gtf file
3.3 years ago

I have a GTF file as follows:

KB705106        VEuPathDB       exon    3645    3767    0       -       .       gene_id ""; transcript_id "AARA010197-RA";
KB705106        VEuPathDB       CDS     3645    3767    0       -       2       gene_id ""; transcript_id "AARA010197-RA";
KB705106        VEuPathDB       exon    3975    4065    0       -       .       gene_id ""; transcript_id "AARA010198-RA";

I want to copy the first 10 characters of each transcript_id into the corresponding (currently empty) gene_id, as follows:

KB705106        VEuPathDB       exon    3645    3767    0       -       .       gene_id "AARA010197"; transcript_id "AARA010197-RA";
KB705106        VEuPathDB       CDS     3645    3767    0       -       2       gene_id "AARA010197"; transcript_id "AARA010197-RA";
KB705106        VEuPathDB       exon    3975    4065    0       -       .       gene_id "AARA010198"; transcript_id "AARA010198-RA";

Please, what is the easiest way to do this?

Thank you. ~DD

gtf gee edit • 1.0k views

what is the easiest way to do this?

There are many ways to parse and reformat text files. The easiest for you will depend on the scripting language you are most familiar with. For instance, I would personally use R (with the read.table(), sapply() and strsplit() functions), but there are also good options in Python or Perl, and the most efficient way would probably be bash/awk. Which do you prefer?
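
For illustration, here is a rough Python sketch of the split-based approach (the same idea as strsplit() in R). It assumes the file is tab-separated with the attributes in the ninth column, and that attribute values never contain a semicolon; the function name fill_gene_id_by_split is made up for this example.

```python
def fill_gene_id_by_split(line: str, n_chars: int = 10) -> str:
    """Fill an empty gene_id with the first n_chars of transcript_id."""
    newline = "\n" if line.endswith("\n") else ""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        # Not a standard 9-column GTF record (comment, header, ...): keep as-is.
        return line

    # Split the attribute column into entries such as 'gene_id ""'.
    # Assumption: no attribute value contains a ';'.
    attrs = [a.strip() for a in fields[8].split(";") if a.strip()]
    values = {}
    for attr in attrs:
        key, _, value = attr.partition(" ")
        values[key] = value.strip('"')

    transcript_id = values.get("transcript_id", "")
    if values.get("gene_id") == "" and transcript_id:
        # Only touch gene_id when it is present and empty.
        attrs = [
            f'gene_id "{transcript_id[:n_chars]}"'
            if attr.split(" ")[0] == "gene_id"
            else attr
            for attr in attrs
        ]

    fields[8] = "; ".join(attrs) + ";"
    return "\t".join(fields) + newline
```

You would apply it to each line of the input file and write the result to a new file. Records whose gene_id is already filled in keep their existing value.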

3.3 years ago

Here is a Perl one-liner that would do the job:

perl -pe 's/gene_id ""; transcript_id "([^"]{1,10})/gene_id "$1"; transcript_id "$1/' input.gtf > output.gtf

The pattern [^"]{1,10} captures up to the first 10 characters of the transcript_id, so it still works when the ID is shorter than 10 characters. Lines whose gene_id is not empty are left unchanged.
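
If you would rather stay in Python, the same substitution can be sketched with the re module. This is only an equivalent of the one-liner above under the same assumption (the empty gene_id is immediately followed by transcript_id); the function name fill_gene_id is made up for the example.

```python
import re

# Same pattern as the Perl one-liner: an empty gene_id directly followed by
# transcript_id, capturing at most the first 10 non-quote characters.
PATTERN = re.compile(r'gene_id ""; transcript_id "([^"]{1,10})')


def fill_gene_id(line: str) -> str:
    """Copy up to 10 leading characters of transcript_id into an empty gene_id."""
    return PATTERN.sub(r'gene_id "\1"; transcript_id "\1', line, count=1)


# Small demonstration on two attribute strings: one long ID, one short ID.
examples = [
    'gene_id ""; transcript_id "AARA010197-RA";',
    'gene_id ""; transcript_id "AB1";',
]
for example in examples:
    print(fill_gene_id(example))
```

To process a whole file, call fill_gene_id() on each line of input.gtf and write the results to output.gtf.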
