Question: replace gene names in gff file
arunprasanna83 wrote (5 weeks ago):

I have a GFF file generated by BRAKER. It assigns default gene names like the following:

# start gene g1
CC151   AUGUSTUS    gene    5487 15014 0.36 -   .   g1
CC151   AUGUSTUS    transcript  5487 15014 0.36 -   .   g1.t1
CC151   AUGUSTUS    terminal    5487 5697 1 -   1   transcript_id "g1.t1"; gene_id "g1";
CC151   AUGUSTUS    intron  6385 6467 1 -   2   transcript_id "g1.t1"; gene_id "g1";
CC151   AUGUSTUS    intron  6550 6622 1 -   0   transcript_id "g1.t1"; gene_id "g1";
CC151   AUGUSTUS    intron  6714 6854 1 -   0   transcript_id "g1.t1"; gene_id "g1";
CC151   AUGUSTUS    CDS 6998 7110 1 -   2   transcript_id "g1.t1"; gene_id "g1";
CC151   AUGUSTUS    CDS 7888 7941 1 -   0   transcript_id "g1.t1"; gene_id "g1";
# end gene g1

I wish to change the default g1 to a custom name, for example "CC151_gene1". I created a list of all gene IDs with their corresponding replacement text and tried the following:

g1   CC151_gene1
g2   CC151_gene2

grep -f gene.replacement.txt mygfffile.gff > replaced.gfffile.gff

However, the gene names in my file were not replaced. Can anyone suggest a better method?

Thanks in advance.

Juke-34 wrote (5 weeks ago):

I would suggest using agat_sp_manage_IDs.pl from AGAT. At the same time, it will standardize your output file, which is not correct as it stands: the 9th column should contain tag-value attributes, and that is not the case for the gene and transcript lines.
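For context on why the attempt in the question changed nothing: grep -f treats each line of the list as a search pattern and only prints matching lines; it never substitutes text. If a plain rename with a custom list is all that is needed, something like the following Python sketch could work. It is not how AGAT does it, and it rests on assumptions: the real file is tab-separated with 9 columns, IDs look like g<number> with an optional .t<number> transcript suffix, and the function names (parse_mapping, rename_line) are made up for illustration.

```python
import re

# A gene ID such as g1 or g12, optionally followed by a transcript suffix (.t1).
# Matching the whole token means a mapping for g1 never touches g10.
ID_RE = re.compile(r"\b(g\d+)((?:\.t\d+)?)\b")


def parse_mapping(text):
    """Read 'old new' pairs (whitespace-separated, one pair per line)."""
    mapping = {}
    for row in text.splitlines():
        parts = row.split()
        if len(parts) == 2:
            old, new = parts
            mapping[old] = new
    return mapping


def _swap(match, mapping):
    """Replace the gene part if it is in the mapping; keep the suffix."""
    gene, suffix = match.group(1), match.group(2)
    return mapping.get(gene, gene) + suffix


def rename_line(line, mapping):
    """Rename IDs in comment lines and in column 9 of feature lines."""
    if line.startswith("#"):
        return ID_RE.sub(lambda m: _swap(m, mapping), line)
    fields = line.split("\t")
    if len(fields) != 9:
        return line  # not a standard 9-column feature line: leave untouched
    fields[8] = ID_RE.sub(lambda m: _swap(m, mapping), fields[8])
    return "\t".join(fields)


if __name__ == "__main__":
    mapping = parse_mapping("g1   CC151_gene1\ng2   CC151_gene2\n")
    demo = 'CC151\tAUGUSTUS\tCDS\t6998\t7110\t1\t-\t2\ttranscript_id "g1.t1"; gene_id "g1";'
    print(rename_line(demo, mapping))
```

To apply it to a file, read the list with parse_mapping, then write rename_line(line.rstrip("\n"), mapping) for every input line to a new file. Only column 9 is edited on feature lines, so a sequence name in column 1 that happens to look like a gene ID is left alone.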


Hi Juke, thanks for suggesting AGAT. It does work, but the problem is that the generated names are too long. For instance, I have 28540 genes in total, but I get gene names like M000000000001, where the number is padded to about 12 places.

written 5 weeks ago by arunprasanna83

Yes, I implemented it that way to follow what Ensembl does. Now that your file has been standardized by AGAT, you could use agat_sq_manage_ID.pl. (Do not use this script on your original file, because it expects a properly formatted GFF3. All scripts with the sq prefix need a proper GFF3 file.)
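To illustrate the shorter naming being asked for, here is a minimal Python sketch that builds an old-to-new ID list with only as much zero padding as the gene count requires (5 digits for 28540 genes), or with no padding at all. It is independent of AGAT; the function name make_gene_ids and the default prefix "CC151_gene" are assumptions taken from the example in the question, and it assumes the original IDs run from g1 to gN without gaps.

```python
def make_gene_ids(n_genes, prefix="CC151_gene", pad=True):
    """Map default IDs g1..gN to custom names.

    With pad=True the number is zero-padded to the width of n_genes,
    so names sort correctly as text; with pad=False it is left as-is.
    """
    width = len(str(n_genes)) if pad else 0
    return {
        f"g{i}": f"{prefix}{str(i).zfill(width)}"
        for i in range(1, n_genes + 1)
    }


if __name__ == "__main__":
    ids = make_gene_ids(28540)
    # Write the pairs as a two-column, tab-separated list.
    for old in ("g1", "g2", "g28540"):
        print(f"{old}\t{ids[old]}")
```

Padding is optional: it keeps names in numeric order when sorted alphabetically, which is the usual reason annotation pipelines pad, but unpadded names such as CC151_gene1 are equally valid identifiers.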

written 5 weeks ago by Juke-34

I have fixed it in AGAT version 0.1.0

written 4 weeks ago by Juke-34