StringTie to GFF3
4.6 years ago
Ric ▴ 430

Hi, I ran StringTie and converted its results to GFF3 with the following commands:

> gffread -E stringtie_merged.gtf -o- > stringtie_merged.gff3
> sed -i.bak 's|transcript|mRNA|g' stringtie_merged.gff3

##gff-version 3
NbV1Ch01        StringTie       mRNA    212226  219213  1000.00 -       .       ID=STRG.3.1;geneID=STRG.3
NbV1Ch01        StringTie       exon    212226  212731  1000.00 -       .       Parent=STRG.3.1
NbV1Ch01        StringTie       exon    212829  212968  1000.00 -       .       Parent=STRG.3.1
NbV1Ch01        StringTie       exon    218080  219213  1000.00 -       .       Parent=STRG.3.1
NbV1Ch01        StringTie       mRNA    212226  219213  1000.00 -       .       ID=STRG.3.2;geneID=STRG.3
NbV1Ch01        StringTie       exon    212226  212731  1000.00 -       .       Parent=STRG.3.2
NbV1Ch01        StringTie       exon    212829  212968  1000.00 -       .       Parent=STRG.3.2
NbV1Ch01        StringTie       exon    218080  218969  1000.00 -       .       Parent=STRG.3.2
NbV1Ch01        StringTie       exon    219061  219213  1000.00 -       .       Parent=STRG.3.2

How can I group the two mRNA features above (different splice forms of the same locus) under a single gene feature, as shown below?

##gff-version 3
NbV1Ch01        StringTie       gene    212226  219213  1000.00 -       .       ID=STRG.3
NbV1Ch01        StringTie       mRNA    212226  219213  1000.00 -       .       ID=STRG.3.1;Parent=STRG.3
NbV1Ch01        StringTie       exon    212226  212731  1000.00 -       .       Parent=STRG.3.1
NbV1Ch01        StringTie       exon    212829  212968  1000.00 -       .       Parent=STRG.3.1
NbV1Ch01        StringTie       exon    218080  219213  1000.00 -       .       Parent=STRG.3.1
NbV1Ch01        StringTie       mRNA    212226  219213  1000.00 -       .       ID=STRG.3.2;Parent=STRG.3
NbV1Ch01        StringTie       exon    212226  212731  1000.00 -       .       Parent=STRG.3.2
NbV1Ch01        StringTie       exon    212829  212968  1000.00 -       .       Parent=STRG.3.2
NbV1Ch01        StringTie       exon    218080  218969  1000.00 -       .       Parent=STRG.3.2
NbV1Ch01        StringTie       exon    219061  219213  1000.00 -       .       Parent=STRG.3.2

Thank you in advance,

gene assembly RNA-Seq • 1.4k views

It seems like you need a short script that reads all mRNA lines, takes the smallest start coordinate and the largest end coordinate among the mRNAs sharing a geneID, and writes a line with the gene designation. That should be doable with a few lines of code in a variety of scripting languages.
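A minimal Python sketch of that idea, not a tested tool. It assumes the GFF3 is tab-delimited, that every mRNA line carries a geneID attribute as in the gffread output above, and that all mRNAs of one gene sit on the same sequence and strand. The function name add_gene_features is made up for illustration, and copying the score, strand and phase columns from the first mRNA onto the gene line is a choice made here to match the desired output in the question.

```python
def add_gene_features(lines, gene_attr="geneID"):
    """Insert a gene line before the first mRNA of each gene and
    replace the mRNA's geneID attribute with Parent=<gene id>."""
    rows = [line.rstrip("\n") for line in lines]

    def gene_id_of(cols):
        # Look up the gene identifier in column 9 (key=value;key=value).
        for field in cols[8].split(";"):
            key, _, value = field.partition("=")
            if key == gene_attr:
                return value
        return None

    # Pass 1: smallest start and largest end over all mRNAs of each gene.
    spans = {}
    for row in rows:
        cols = row.split("\t")
        if len(cols) == 9 and cols[2] == "mRNA":
            gid = gene_id_of(cols)
            if gid is None:
                continue
            start, end = int(cols[3]), int(cols[4])
            if gid in spans:
                spans[gid] = (min(spans[gid][0], start), max(spans[gid][1], end))
            else:
                spans[gid] = (start, end)

    # Pass 2: emit the gene line once, then the re-parented mRNA lines.
    out, seen = [], set()
    for row in rows:
        cols = row.split("\t")
        if len(cols) == 9 and cols[2] == "mRNA":
            gid = gene_id_of(cols)
            if gid is not None:
                if gid not in seen:
                    seen.add(gid)
                    start, end = spans[gid]
                    # Score, strand and phase are copied from the first mRNA.
                    gene = cols[:2] + ["gene", str(start), str(end)] + cols[5:8] + ["ID=" + gid]
                    out.append("\t".join(gene))
                attrs = [f for f in cols[8].split(";")
                         if f and not f.startswith(gene_attr + "=")]
                attrs.append("Parent=" + gid)
                cols[8] = ";".join(attrs)
                row = "\t".join(cols)
        out.append(row)  # comments, exons and other lines pass through unchanged
    return out


# Small demo on two of the mRNA lines from the question (tabs restored).
demo = [
    "##gff-version 3",
    "NbV1Ch01\tStringTie\tmRNA\t212226\t219213\t1000.00\t-\t.\tID=STRG.3.1;geneID=STRG.3",
    "NbV1Ch01\tStringTie\texon\t212226\t212731\t1000.00\t-\t.\tParent=STRG.3.1",
    "NbV1Ch01\tStringTie\tmRNA\t212226\t219213\t1000.00\t-\t.\tID=STRG.3.2;geneID=STRG.3",
    "NbV1Ch01\tStringTie\texon\t212226\t212731\t1000.00\t-\t.\tParent=STRG.3.2",
]
print("\n".join(add_gene_features(demo)))
```

To run it on a real file, something like `add_gene_features(open("stringtie_merged.gff3"))` and writing the returned lines back out should be enough. Reading the file twice in memory keeps the gene line ahead of its first mRNA even when later isoforms extend the span.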

4.6 years ago
Juke34 8.5k

gxf_to_gff3.pl from the GAAS toolkit should give you the structure you are after. Run it on the StringTie GTF:

gxf_to_gff3.pl -g stringtie.gtf -o stringtie_standardized.gff3