Question: changing ID in an existing GFF3 file
Ric330 wrote:

I have an annotation in GFF3 format, but I no longer have the amino acid and CDS sequences. Is there a tool that can extract those sequences from a genome in FASTA format and a GFF3 file?

Thank you in advance

annotation gene • 561 views
ADD COMMENTlink modified 10 months ago by lieven.sterck8.9k • written 11 months ago by Ric330

Could anyone please revert the question and title to the previous version?

ADD REPLYlink written 10 months ago by Ric330

Hi Ric

We were able to recover the original post title, but we are not able to restore the original post content. Perhaps you are best placed to re-create it?

ADD REPLYlink written 10 months ago by lieven.sterck8.9k
Juke344.9k wrote:

You can try agat_sp_manage_IDs.pl from AGAT.

ADD COMMENTlink modified 11 months ago • written 11 months ago by Juke344.9k

I installed AGAT. Could you please show me how to change the IDs from g65212 to AT1G01010.1, AT1G01020.1, AT1G01030.1 with agat_sp_manage_IDs.pl?

ADD REPLYlink written 10 months ago by Ric330

Did you look at the script's help output?

ADD REPLYlink written 10 months ago by Juke344.9k

I looked at the help but I did not understand it.

ADD REPLYlink written 10 months ago by Ric330

I agree, it should be improved :)

agat_sp_manage_IDs.pl --gff yourfile.gff --prefix AT1 -o result.gff

The first gene will get the ID AT1G1, the second AT1G2, and so on.
The first mRNA will get the ID AT1M1, the second AT1M2, and so on.
I hope this is what you want.
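
To illustrate the naming scheme described here (prefix, then a letter for the feature type, then a running number), below is a minimal Python sketch. It is not AGAT's code: the helper `make_id`, the letter table and the `width` parameter are assumptions made for illustration. The letters and the 11-digit padding are taken from the output pasted further down in this thread.

```python
# Hypothetical helper, not part of AGAT: builds an ID as
# prefix + feature-type letter + (optionally zero-padded) number.
TYPE_LETTERS = {"gene": "G", "mRNA": "M", "exon": "E", "CDS": "C"}


def make_id(prefix, feature_type, number, width=0):
    letter = TYPE_LETTERS[feature_type]
    # zfill pads with leading zeros up to `width`; width=0 leaves the number as-is
    return prefix + letter + str(number).zfill(width)


print(make_id("AT1", "gene", 1))            # unpadded form
print(make_id("AT1", "mRNA", 2))
print(make_id("AT1", "gene", 1, width=11))  # padded form, as seen in the pasted output
```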

ADD REPLYlink modified 10 months ago • written 10 months ago by Juke344.9k

I updated my question which might explain better how I would like to change the IDs.

ADD REPLYlink written 10 months ago by Ric330

AGAT does not number IDs that way currently, but it could be updated in a future version to follow this convention if it is widely used (by several large databases).

ADD REPLYlink written 10 months ago by Juke344.9k

For example, the Arabidopsis group does it this way: ftp://ftp.arabidopsis.org/home/tair/Genes/TAIR10_genome_release/TAIR10_gff3/TAIR10_GFF3_genes.gff . Can AGAT do anything close to it?
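
For readers who have not seen this convention, the IDs asked for above (AT1G01010.1, AT1G01020.1, AT1G01030.1) can be read as "AT", the chromosome number, "G", a five-digit number, and a ".N" transcript suffix. The sketch below generates IDs of that shape. It is only an illustration: the function names, the starting value 1010 and the step of 10 are assumptions inferred from the three example IDs, and none of this is AGAT functionality.

```python
# Hypothetical helpers that produce IDs shaped like AT1G01010.1.
def tair_like_gene_id(chrom_number, gene_index, start=1010, step=10):
    """gene_index is the 0-based position of the gene on its chromosome."""
    number = start + gene_index * step
    return f"AT{chrom_number}G{number:05d}"


def transcript_id(gene_id, isoform=1):
    """Append the isoform number to the gene ID."""
    return f"{gene_id}.{isoform}"


ids = [transcript_id(tair_like_gene_id(1, i)) for i in range(3)]
print(ids)
```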

ADD REPLYlink modified 10 months ago • written 10 months ago by Ric330

I updated my question (update 2). However, why does the first gene have the ID AT1G00000068467 while its mRNA has the ID AT1M00000076570? Should the gene ID not start at 1 because of --nb 1?

ADD REPLYlink written 10 months ago by Ric330

With or without --nb 1, numbering should start at 1. What do you get if you grep '0000001' in the file? You should find something starting at 1.

ADD REPLYlink written 10 months ago by Juke344.9k

I found NbV1Ch04 AUGUSTUS gene 61731467 61732149 0.15 + . ID=AT1G00000010000. Why are there 0000 after the 1, and why is NbV1Ch04 the first one rather than NbV1Ch04?

ADD REPLYlink written 10 months ago by Ric330

That should not be the only result: this is gene number 10000. If you grep for "AT1G00000000001" you will find the first one. I just checked and it works for me. The only problem is that the IDs are not assigned in the same order in which the results are printed at the end, so the first line of the output does not necessarily carry the first number. I will open an issue in the repo to improve that for the next release.
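
Since the output order need not follow the numbering, one way to locate the lowest-numbered gene without guessing a grep pattern is to read the trailing number of each gene ID and take the minimum. This is a minimal sketch under stated assumptions: the two sample lines are rebuilt from lines quoted in this thread, and `id_number` and `gene_ids` are hypothetical helpers, not AGAT functions.

```python
import re


def id_number(feature_id):
    """Trailing digits of an ID as an int, e.g. AT1G00000010000 -> 10000."""
    return int(re.search(r"(\d+)$", feature_id).group(1))


def gene_ids(gff_lines):
    """Yield the ID attribute of every 'gene' feature in GFF3 lines."""
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments, directives and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) == 9 and cols[2] == "gene":
            attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
            yield attrs["ID"]


lines = [
    "\t".join(["NbV1Ch04", "AUGUSTUS", "gene", "61731467", "61732149",
               "0.15", "+", ".", "ID=AT1G00000010000"]),
    "\t".join(["NbV1Ch19", "AUGUSTUS", "gene", "97401", "99254",
               "0.03", "-", ".", "ID=AT1G00000000001"]),
]

# The lowest-numbered gene, wherever it appears in the file
print(min(gene_ids(lines), key=id_number))
```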

ADD REPLYlink modified 10 months ago • written 10 months ago by Juke344.9k

I found it in my last chromosome:

NbV1Ch19        AUGUSTUS        gene    97401   99254   0.03    -       .       ID=AT1G00000000001
NbV1Ch19        AUGUSTUS        mRNA    97401   99254   0.03    -       .       ID=AT1M00000000001;Parent=AT1G00000000001
NbV1Ch19        AUGUSTUS        exon    97401   99007   .       -       .       ID=AT1E00000000001;Parent=AT1M00000000001
NbV1Ch19        AUGUSTUS        exon    99101   99254   .       -       .       ID=AT1E00000000002;Parent=AT1M00000000001
NbV1Ch19        AUGUSTUS        CDS     98823   99007   0.36    -       2       ID=AT1C00000000001;Parent=AT1M00000000001
NbV1Ch19        AUGUSTUS        CDS     99101   99230   0.68    -       0       ID=AT1C00000000002;Parent=AT1M00000000001
NbV1Ch19        AUGUSTUS        five_prime_utr  99231   99254   0.25    -       .       ID=AT1F00000000001;Parent=AT1M00000000001
NbV1Ch19        AUGUSTUS        intron  99008   99100   0.69    -       .       ID=AT1I00000000001;Parent=AT1M00000000001
NbV1Ch19        AUGUSTUS        start_codon     99228   99230   .       -       0       ID=AT1S00000000001;Parent=AT1M00000000001
NbV1Ch19        AUGUSTUS        stop_codon      98823   98825   .       -       0       ID=AT1ST00000000001;Parent=AT1M00000000001
NbV1Ch19        AUGUSTUS        three_prime_utr 97401   98822   0.05    -       .       ID=AT1T00000000001;Parent=AT1M00000000001

Why is there such a big difference in IDs between a gene and its sub-features?

NbV1Ch01        AUGUSTUS        gene    97932   99714   0.06    -       .       ID=AT1G00000068467
NbV1Ch01        AUGUSTUS        mRNA    97932   99714   0.06    -       .       ID=AT1M00000076570;Parent=AT1G00000068467
NbV1Ch01        AUGUSTUS        exon    97932   98571   .       -       .       ID=AT1E00000339808;Parent=AT1M00000076570
NbV1Ch01        AUGUSTUS        exon    98679   98844   .       -       .       ID=AT1E00000339809;Parent=AT1M00000076570
NbV1Ch01        AUGUSTUS        exon    99134   99325   .       -       .       ID=AT1E00000339810;Parent=AT1M00000076570
NbV1Ch01        AUGUSTUS        exon    99417   99714   .       -       .       ID=AT1E00000339811;Parent=AT1M00000076570
NbV1Ch01        AUGUSTUS        CDS     98177   98571   1       -       2       ID=AT1C00000294005;Parent=AT1M00000076570
NbV1Ch01        AUGUSTUS        CDS     98679   98844   1       -       0       ID=AT1C00000294006;Parent=AT1M00000076570
NbV1Ch01        AUGUSTUS        CDS     99134   99325   1       -       0       ID=AT1C00000294007;Parent=AT1M00000076570
NbV1Ch01        AUGUSTUS        CDS     99417   99668   0.65    -       0       ID=AT1C00000294008;Parent=AT1M00000076570
NbV1Ch01        AUGUSTUS        five_prime_utr  99669   99714   0.14    -       .       ID=AT1F00000101217;Parent=AT1M00000076570
NbV1Ch01        AUGUSTUS        intron  98572   98678   1       -       .       ID=AT1I00000123933;Parent=AT1M00000076570
NbV1Ch01        AUGUSTUS        intron  98845   99133   1       -       .       ID=AT1I00000123934;Parent=AT1M00000076570
NbV1Ch01        AUGUSTUS        intron  99326   99416   1       -       .       ID=AT1I00000123935;Parent=AT1M00000076570
NbV1Ch01        AUGUSTUS        start_codon     99666   99668   .       -       0       ID=AT1S00000057436;Parent=AT1M00000076570
NbV1Ch01        AUGUSTUS        stop_codon      98177   98179   .       -       0       ID=AT1ST00000057445;Parent=AT1M00000076570
NbV1Ch01        AUGUSTUS        three_prime_utr 97932   98176   0.44    -       .       ID=AT1T00000096168;Parent=AT1M00000076570
ADD REPLYlink written 10 months ago by Ric330

Let's say every gene has 10 exons. By the time you reach the 150th gene, 1490 exons have already been numbered, so its first exon will be numbered 1491 and its last exon 1500. The number simply reflects how many features of that type have been encountered before.
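
To make the arithmetic concrete: with one running exon counter and exactly 10 exons per gene, the 149 genes before gene 150 use up 1490 exon numbers. The small sketch below computes the first and last exon number for any gene rank under that simplifying assumption; the function name is made up for illustration.

```python
def exon_numbers_for_gene(gene_rank, exons_per_gene):
    """Return (first, last) exon counter values for the gene at 1-based gene_rank,
    assuming every gene has the same number of exons and one shared exon counter."""
    already_numbered = (gene_rank - 1) * exons_per_gene
    return already_numbered + 1, already_numbered + exons_per_gene


print(exon_numbers_for_gene(150, 10))  # (1491, 1500)
```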

ADD REPLYlink modified 10 months ago • written 10 months ago by Juke344.9k

What confused me in the output I pasted in my previous comment is that the gene ID is AT1G00000068467, the mRNA ID is AT1M00000076570 and the first exon ID is AT1E00000339808. Why is the mRNA ID not AT1G00000068468 and the first exon ID not AT1G00000068469?

ADD REPLYlink written 10 months ago by Ric330

Because each feature type (3rd column) is numbered independently. Here is an example:

NbV1Ch01        AUGUSTUS        gene    97932   99714   0.06    -       .       ID=gene1
NbV1Ch01        AUGUSTUS        mRNA    97932   99714   0.06    -       .       ID=mRNA1
NbV1Ch01        AUGUSTUS        exon    97932   98571   .       -       .       ID=exon1
NbV1Ch01        AUGUSTUS        exon    98679   98844   .       -       .       ID=exon2
NbV1Ch01        AUGUSTUS        exon    99134   99325   .       -       .       ID=exon3
NbV1Ch01        AUGUSTUS        exon    99417   99714   .       -       .       ID=exon4
NbV1Ch01        AUGUSTUS        CDS     98177   98571   1       -       2       ID=cds1
NbV1Ch01        AUGUSTUS        CDS     98679   98844   1       -       0       ID=cds2
NbV1Ch01        AUGUSTUS        CDS     99134   99325   1       -       0       ID=cds3
NbV1Ch01        AUGUSTUS        CDS     99417   99668   0.65    -       0       ID=cds4
NbV1Ch01        AUGUSTUS        mRNA    97935   99711   0.06    -       .       ID=mRNA2
NbV1Ch01        AUGUSTUS        exon    97935   98571   .       -       .       ID=exon5
NbV1Ch01        AUGUSTUS        exon    98679   98844   .       -       .       ID=exon6
NbV1Ch01        AUGUSTUS        exon    99134   99325   .       -       .       ID=exon7
NbV1Ch01        AUGUSTUS        exon    99417   99711   .       -       .       ID=exon8
NbV1Ch01        AUGUSTUS        CDS     98177   98571   1       -       2       ID=cds5
NbV1Ch01        AUGUSTUS        CDS     98679   98844   1       -       0       ID=cds6
NbV1Ch01        AUGUSTUS        CDS     99134   99325   1       -       0       ID=cds7
NbV1Ch01        AUGUSTUS        gene    109665  112554  0.04    -       .       ID=gene2
NbV1Ch01        AUGUSTUS        mRNA    109665  112554  0.04    -       .       ID=mRNA3
NbV1Ch01        AUGUSTUS        exon    109665  110489  .       -       .       ID=exon9
NbV1Ch01        AUGUSTUS        exon    110608  111042  .       -       .       ID=exon10
NbV1Ch01        AUGUSTUS        exon    111592  111844  .       -       .       ID=exon11
NbV1Ch01        AUGUSTUS        exon    112128  112554  .       -       .       ID=exon12
NbV1Ch01        AUGUSTUS        CDS     109839  110489  0.69    -       0       ID=cds8
NbV1Ch01        AUGUSTUS        CDS     110608  111042  0.21    -       0       ID=cds9
NbV1Ch01        AUGUSTUS        CDS     111592  111844  0.23    -       1       ID=cds10
NbV1Ch01        AUGUSTUS        CDS     112128  112450  0.95    -       0       ID=cds11
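
The independent numbering shown in this example amounts to keeping one counter per feature type. Here is a minimal Python sketch of that idea; it is not AGAT's implementation, and `number_by_type` and the `LABELS` table are names invented for illustration.

```python
from collections import defaultdict

# Display label per feature type; the example above writes CDS IDs in lower case.
LABELS = {"CDS": "cds"}


def number_by_type(feature_types):
    """Assign IDs using a separate running counter for each feature type."""
    counters = defaultdict(int)
    ids = []
    for ftype in feature_types:
        counters[ftype] += 1
        ids.append(f"{LABELS.get(ftype, ftype)}{counters[ftype]}")
    return ids


# Feature types in file order (3rd GFF3 column), shortened from the example
types = ["gene", "mRNA", "exon", "exon", "CDS",
         "mRNA", "exon", "CDS",
         "gene", "mRNA", "exon"]
print(number_by_type(types))
```

Each counter only advances when its own feature type is seen, which is why the gene, mRNA and exon numbers drift apart as the file grows.
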
ADD REPLYlink written 10 months ago by Juke344.9k

I understand, but would it not be less confusing if the numbering of the feature types were dependent rather than independent?

ADD REPLYlink written 10 months ago by Ric330

That sounds sensible and is noted. I will modify the code for a future release. Could you check whether this issue describes what you expect? https://github.com/NBISweden/AGAT/issues/16

ADD REPLYlink modified 10 months ago • written 10 months ago by Juke344.9k

Could you not simply replace NbV1Ch01 with chr1 (and so on for the others) using sed or a similar tool? Changing AUGUSTUS to TAIR10 could be done in a similar way, but does it make sense? That is just an identifier anyway.
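
If a plain sed substitution feels too broad, a small script can restrict the replacement to the first column only. The following is a minimal sketch, assuming a tab-separated GFF3 file; `rename_seqids` and the mapping are hypothetical and would need to be adapted to the real sequence names.

```python
def rename_seqids(gff_lines, mapping):
    """Yield GFF3 lines with column 1 (seqid) replaced according to mapping.
    Comment and directive lines (starting with '#') are passed through unchanged."""
    for line in gff_lines:
        if line.startswith("#") or "\t" not in line:
            yield line
            continue
        cols = line.split("\t")
        cols[0] = mapping.get(cols[0], cols[0])  # unknown seqids are left as-is
        yield "\t".join(cols)


mapping = {"NbV1Ch01": "chr1"}
sample = [
    "##gff-version 3\n",
    "NbV1Ch01\tAUGUSTUS\tgene\t97932\t99714\t0.06\t-\t.\tID=gene1\n",
]
for out_line in rename_seqids(sample, mapping):
    print(out_line, end="")
```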

ADD REPLYlink written 10 months ago by genomax92k

I am sorry for the confusion, but I would like to change the IDs in the last column.

ADD REPLYlink written 10 months ago by Ric330