Question: Replace gene names in a GFF file
arunprasanna83 wrote (11 months ago):

I have a GFF file generated by BRAKER. It assigns default gene names such as the following:

# start gene g1
CC151   AUGUSTUS    gene    5487 15014 0.36 -   .   g1
CC151   AUGUSTUS    transcript  5487 15014 0.36 -   .   g1.t1
CC151   AUGUSTUS    terminal    5487 5697 1 -   1   transcript_id "g1.t1"; gene_id "g1";
CC151   AUGUSTUS    intron  6385 6467 1 -   2   transcript_id "g1.t1"; gene_id "g1";
CC151   AUGUSTUS    intron  6550 6622 1 -   0   transcript_id "g1.t1"; gene_id "g1";
CC151   AUGUSTUS    intron  6714 6854 1 -   0   transcript_id "g1.t1"; gene_id "g1";
CC151   AUGUSTUS    CDS 6998 7110 1 -   2   transcript_id "g1.t1"; gene_id "g1";
CC151   AUGUSTUS    CDS 7888 7941 1 -   0   transcript_id "g1.t1"; gene_id "g1";
# end gene g1

I want to change the default name g1 to a custom name, for example "CC151_gene1". I created a two-column list of all gene IDs and their corresponding replacement names and tried the following:

g1   CC151_gene1
g2   CC151_gene2

grep -f gene.replacement.txt mygfffile.gff > replaced.gfffile.gff

However, the gene names in my file were not replaced. Can anyone suggest a better method?

Thanks in advance.
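Note on the command above: grep -f treats each line of gene.replacement.txt as a search pattern and prints the lines of the GFF that match one of them; it never substitutes text, so it cannot rename anything. A mapping-based substitution can be sketched in Python as below. This is a minimal sketch, not a tested tool: it assumes the mapping file is whitespace-separated with two columns (old ID, new ID) and that the default IDs have the AUGUSTUS form g<number>, so that a transcript ID such as g1.t1 becomes CC151_gene1.t1. The script name rename_gff.py is hypothetical.

```python
import re
import sys


def load_mapping(path):
    """Read a two-column, whitespace-separated file: old_id new_id."""
    mapping = {}
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) >= 2:  # skip blank or malformed lines
                mapping[fields[0]] = fields[1]
    return mapping


# Matches AUGUSTUS-style IDs such as g1 or g12 as whole tokens; in "g1.t1"
# only the "g1" part matches, so transcript IDs are renamed consistently.
# Assumption: no other token in the file looks like g<digits>.
GENE_ID = re.compile(r"\bg\d+\b")


def rename_line(line, mapping):
    """Replace every known gene ID in one line; unknown IDs are kept."""
    return GENE_ID.sub(lambda m: mapping.get(m.group(0), m.group(0)), line)


def rename_gff(in_path, map_path, out_path):
    """Write a renamed copy of in_path to out_path (input is not modified)."""
    mapping = load_mapping(map_path)
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            dst.write(rename_line(line, mapping))


if __name__ == "__main__" and len(sys.argv) == 4:
    # python rename_gff.py mygfffile.gff gene.replacement.txt replaced.gfffile.gff
    rename_gff(sys.argv[1], sys.argv[2], sys.argv[3])
```

Matching whole tokens (word boundaries on both sides) is the important design choice here: a plain text replacement of g1 would also rewrite g10, g11, and so on.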

Juke34 (Sweden) wrote (11 months ago):

I would suggest using agat_sp_manage_IDs.pl from AGAT. At the same time, it will standardize your file, which is not correctly formatted: the 9th column should contain 'tag value' attributes, and that is not the case for your gene and transcript lines.
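To see which lines this refers to, one could list the feature lines whose 9th column contains no tag-value pair at all. The Python sketch below is a rough check under stated assumptions, not a GFF3 validator (AGAT does much more): it accepts either the GFF3 form (tag=value) or the GTF form (tag "value", as in the transcript_id/gene_id lines of the BRAKER output), and it splits columns on any whitespace, since columns 1 to 8 contain no spaces. The function name is hypothetical.

```python
import re
import sys

# One GFF3-style pair (tag=value) or one GTF-style pair (tag "value").
TAG_VALUE = re.compile(r'[^\s=;]+=[^;]*|\w+\s+"[^"]*"')


def lines_without_attributes(lines):
    """Yield (line_number, feature_type, column9) for feature lines whose
    9th column contains no tag-value pair (e.g. a bare 'g1')."""
    for number, line in enumerate(lines, start=1):
        if line.startswith("#") or not line.strip():
            continue  # comments such as '# start gene g1' and blank lines
        # Columns 1-8 never contain spaces, so splitting on any whitespace
        # at most 8 times keeps column 9 intact.
        fields = line.rstrip("\n").split(None, 8)
        if len(fields) < 9:
            continue  # not a 9-column feature line
        if not TAG_VALUE.search(fields[8]):
            yield number, fields[2], fields[8]


if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1]) as gff:
        for number, feature, column9 in lines_without_attributes(gff):
            print(f"line {number}: {feature} has no tag-value attribute: {column9!r}")
```

On the excerpt in the question, this should report only the gene line (g1) and the transcript line (g1.t1).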


Hi Juke, thanks for suggesting AGAT. It does work, but the names are too long. For instance, I have 28540 genes in total, yet I get gene names like M000000000001, with 12 digits.

(reply by arunprasanna83, 11 months ago)
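The length actually needed is set by the gene count: 28540 has 5 digits, so 5-digit numbers are enough to give every gene a unique, zero-padded ID. Below is a minimal Python sketch that builds a two-column mapping (old ID, new ID) in the same format as gene.replacement.txt from the question, numbering genes in file order. Assumptions: gene lines have feature type gene and carry the bare ID in column 9, as in the BRAKER output shown in the question; the prefix CC151_gene is just the example from the question; the function and script names are hypothetical.

```python
import sys


def build_mapping(gene_ids, prefix="CC151_gene", pad=True):
    """Map each old gene ID to prefix + running number (1, 2, 3, ...).

    With pad=True the number is zero-padded to the number of digits in the
    total gene count (28540 genes -> 5 digits); with pad=False no padding.
    """
    width = len(str(len(gene_ids))) if pad else 0
    return {old: prefix + str(i).zfill(width)
            for i, old in enumerate(gene_ids, start=1)}


def gene_ids_from_gff(lines):
    """Collect gene IDs, in file order, from 'gene' lines whose 9th column
    holds the bare ID (as in the BRAKER/AUGUSTUS output)."""
    ids = []
    for line in lines:
        if line.startswith("#"):
            continue  # skip '# start gene g1' style comments
        fields = line.rstrip("\n").split(None, 8)
        if len(fields) == 9 and fields[2] == "gene":
            ids.append(fields[8].strip())
    return ids


if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1]) as gff:
        ids = gene_ids_from_gff(gff)
    for old, new in build_mapping(ids).items():
        print(f"{old}\t{new}")
```

Redirecting the output to gene.replacement.txt would give lines such as g1 followed by its new name; for a file with 28540 genes, the first gene would become CC151_gene00001. Padding keeps all IDs the same length so they sort correctly as text; pad=False gives CC151_gene1 instead.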

Yes, I implemented it that way to follow what Ensembl does. Now that your file has been standardized by AGAT, you could use agat_sq_manage_ID.pl. (Do not use this script on your original file, because it expects a properly formatted GFF3; all scripts with the sq prefix need a proper GFF3 file.)

(reply by Juke34, 11 months ago)

I have fixed this in AGAT version 0.1.0.

(reply by Juke34, 11 months ago)