Question: Convert an 'intron-style' GFF3 file into an 'exon-style' GFF3 file
Asked 4.8 years ago by Dan (Cambridge):

I have a GFF3 file that doesn't have exons, instead it has introns, UTRs, start and stop codons:

0001.scaffold00002      AUGUSTUS        gene    1386    2772    0.12    +       .       ID=Bv_00001z1_qhas;Name=Bv_00001z1_qhas
0001.scaffold00002      AUGUSTUS        mRNA    1386    2772    0.12    +       .       ID=Bv_00001z1_qhas.t1;Parent=Bv_00001z1_qhas;Name=Bv_00001z1_qhas.t1 0%;Note=cDNAcoverage_0%
0001.scaffold00002      AUGUSTUS        five_prime_UTR  1386    1976    .       +       .       ID=Bv_00001z1_qhas.t1.UTR;Parent=Bv_00001z1_qhas.t1
0001.scaffold00002      AUGUSTUS        start_codon     1977    1979    .       +       0       ID=Bv_00001z1_qhas.t1.start_codon;Parent=Bv_00001z1_qhas.t1
0001.scaffold00002      AUGUSTUS        CDS     1977    2325    0.96    +       0       ID=Bv_00001z1_qhas.t1.CDS;Parent=Bv_00001z1_qhas.t1
0001.scaffold00002      AUGUSTUS        intron  2326    2619    0.81    +       .       ID=Bv_00001z1_qhas.t1.intron;Parent=Bv_00001z1_qhas.t1
0001.scaffold00002      AUGUSTUS        CDS     2620    2747    0.8     +       2       ID=Bv_00001z1_qhas.t1.CDS;Parent=Bv_00001z1_qhas.t1
0001.scaffold00002      AUGUSTUS        stop_codon      2745    2747    .       +       0       ID=Bv_00001z1_qhas.t1.stop_codon;Parent=Bv_00001z1_qhas.t1
0001.scaffold00002      AUGUSTUS        three_prime_UTR 2748    2772    .       +       .       ID=Bv_00001z1_qhas.t1.UTR;Parent=Bv_00001z1_qhas.t1

I can convert this to 'exon-style' by calculating the exons from the above, but I'm wondering if there is an 'off the shelf' solution?

Cheers,
Dan.

intron conversion exon format gff3 • 2.8k views
Thread last modified 4.5 years ago by Daniel Standage.
Answer by Dan (Cambridge), 4.8 years ago:

Actually, this can be done with GenomeTools. The dupfeat command duplicates features of the type given by -source and outputs the copies with the type given by -dest. The mergefeat command merges adjacent features of the same type:

gt dupfeat -dest exon -source CDS your.gff3 \
  | gt dupfeat -dest exon -source three_prime_UTR \
  | gt dupfeat -dest exon -source five_prime_UTR \
  | gt mergefeat \
  | gt gff3 -retainids -sort -tidy -o your.new.gff3

 

Pretty slick!
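
For readers without GenomeTools at hand, the idea behind the pipeline (relabel CDS and UTR features as exons, then merge the ones that touch) can be sketched in a few lines of Python. This is only an illustration of the concept, not how GenomeTools is implemented; the function name `exons_from_parts` and the `record` helper are hypothetical, and the sketch assumes every CDS/UTR feature carries a single Parent attribute. Unlike a strict "adjacent only" merge, it also merges overlapping intervals.

```python
from collections import defaultdict

# Feature types that are treated as pieces of an exon (assumption of this sketch).
EXON_PARTS = {"five_prime_UTR", "CDS", "three_prime_UTR"}

def exons_from_parts(gff3_lines):
    """Return {parent_id: [(start, end), ...]} by merging abutting or
    overlapping CDS/UTR intervals that share a Parent (1-based, inclusive)."""
    parts = defaultdict(list)
    for line in gff3_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] not in EXON_PARTS:
            continue
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        parent = attrs.get("Parent")
        if parent is not None:
            parts[parent].append((int(cols[3]), int(cols[4])))

    exons = {}
    for parent, intervals in parts.items():
        merged = []
        for start, end in sorted(intervals):
            if merged and start <= merged[-1][1] + 1:
                # touches or overlaps the previous interval: extend it
                merged[-1] = (merged[-1][0], max(merged[-1][1], end))
            else:
                merged.append((start, end))
        exons[parent] = merged
    return exons

def record(ftype, start, end, parent="Bv_00001z1_qhas.t1"):
    """Hypothetical helper: build one tab-separated GFF3 line for the demo."""
    return "\t".join(["0001.scaffold00002", "AUGUSTUS", ftype,
                      str(start), str(end), ".", "+", ".", f"Parent={parent}"])

# Coordinates taken from the example transcript in the question.
records = [
    record("five_prime_UTR", 1386, 1976),
    record("CDS", 1977, 2325),
    record("intron", 2326, 2619),
    record("CDS", 2620, 2747),
    record("three_prime_UTR", 2748, 2772),
]
result = exons_from_parts(records)
print(result)  # {'Bv_00001z1_qhas.t1': [(1386, 2325), (2620, 2772)]}
```

On the example transcript this yields two exons, 1386-2325 and 2620-2772, which is what the dupfeat/mergefeat approach is expected to produce for these coordinates.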

 


good to know

(Reply by Istvan Albert, 4.8 years ago)

GenomeTools is the answer for many questions I have about GFF3 processing!

(Reply by Daniel Standage, 4.5 years ago)
Answer by Dan (Cambridge), 4.8 years ago:

Here is my answer in full, complicated by the fact that the dumb format wasn't consistent in its stupidity:

Answer by Istvan Albert (University Park, USA), 4.8 years ago:

Looks like your CDS features are the exons, except that the CDS features also include the stop codon, which is not actually part of the mRNA.

I don't think that there is a tool to do what you need in one step.


Right, the CDS are the exons except when interrupted by a start (stop) codon, in which case the exon includes the five (three) prime UTR.... I guess?

(Reply by Dan, 4.8 years ago)

The definitions of these terms are actually a lot more complicated, and I suspect tool developers may be a little cavalier in labeling. I would not be surprised if there were inconsistencies along the way. It all depends on what the file is needed for.

Exon: http://www.sequenceontology.org/browser/current_svn/term/SO:0000147

CDS: http://www.sequenceontology.org/browser/current_svn/term/SO:0000316

(Reply by Istvan Albert, 4.8 years ago)

The definitions are (now) clear (and the GFF validates OK). The pain is knowing whether your CDS abuts a five (three) prime UTR (or both!) and whether your five (three) prime UTR is a separate exon... Actually, my solution has been ignoring the intron features, and these are what let me solve it! I'll post Perl when I'm done.

(Reply by Dan, 4.8 years ago)
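
The intron-based idea in the last reply can be sketched as follows: the exons of a transcript are whatever remains of the mRNA span once the introns are subtracted, so there is no need to work out which UTR abuts which CDS. This is a minimal Python sketch rather than the poster's Perl (which is not shown in the thread); the function name `exons_from_introns` is hypothetical, and it assumes 1-based inclusive coordinates with introns lying inside the mRNA span.

```python
def exons_from_introns(mrna_start, mrna_end, introns):
    """Return exon (start, end) pairs: the parts of the mRNA span that are
    not covered by any intron. Coordinates are 1-based and inclusive."""
    exons = []
    cursor = mrna_start  # first position not yet assigned to an exon or intron
    for intron_start, intron_end in sorted(introns):
        if intron_start > cursor:
            exons.append((cursor, intron_start - 1))
        cursor = max(cursor, intron_end + 1)
    if cursor <= mrna_end:
        exons.append((cursor, mrna_end))
    return exons

# mRNA 1386-2772 with one intron at 2326-2619, as in the question's example.
print(exons_from_introns(1386, 2772, [(2326, 2619)]))
# [(1386, 2325), (2620, 2772)]
```

Because only coordinates are compared, the same function works for either strand, and an exon that consists purely of UTR falls out automatically as the gap between two introns.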