4.6 years ago
Ric
Hi, I ran StringTie and converted its results to GFF3 with the following commands:
> gffread -E stringtie_merged.gtf -o- > stringtie_merged.gff3
> sed -i.bak 's|transcript|mRNA|g' stringtie_merged.gff3
##gff-version 3
NbV1Ch01 StringTie mRNA 212226 219213 1000.00 - . ID=STRG.3.1;geneID=STRG.3
NbV1Ch01 StringTie exon 212226 212731 1000.00 - . Parent=STRG.3.1
NbV1Ch01 StringTie exon 212829 212968 1000.00 - . Parent=STRG.3.1
NbV1Ch01 StringTie exon 218080 219213 1000.00 - . Parent=STRG.3.1
NbV1Ch01 StringTie mRNA 212226 219213 1000.00 - . ID=STRG.3.2;geneID=STRG.3
NbV1Ch01 StringTie exon 212226 212731 1000.00 - . Parent=STRG.3.2
NbV1Ch01 StringTie exon 212829 212968 1000.00 - . Parent=STRG.3.2
NbV1Ch01 StringTie exon 218080 218969 1000.00 - . Parent=STRG.3.2
NbV1Ch01 StringTie exon 219061 219213 1000.00 - . Parent=STRG.3.2
How can I group the two mRNA features above (different splice forms) under a single gene feature, as shown below?
##gff-version 3
NbV1Ch01 StringTie gene 212226 219213 1000.00 - . ID=STRG.3
NbV1Ch01 StringTie mRNA 212226 219213 1000.00 - . ID=STRG.3.1;Parent=STRG.3
NbV1Ch01 StringTie exon 212226 212731 1000.00 - . Parent=STRG.3.1
NbV1Ch01 StringTie exon 212829 212968 1000.00 - . Parent=STRG.3.1
NbV1Ch01 StringTie exon 218080 219213 1000.00 - . Parent=STRG.3.1
NbV1Ch01 StringTie mRNA 212226 219213 1000.00 - . ID=STRG.3.2;Parent=STRG.3
NbV1Ch01 StringTie exon 212226 212731 1000.00 - . Parent=STRG.3.2
NbV1Ch01 StringTie exon 212829 212968 1000.00 - . Parent=STRG.3.2
NbV1Ch01 StringTie exon 218080 218969 1000.00 - . Parent=STRG.3.2
NbV1Ch01 StringTie exon 219061 219213 1000.00 - . Parent=STRG.3.2
Thank you in advance,
It seems like you need a short script that reads all mRNA lines, picks the smallest start coordinate and the largest end coordinate for each gene, and writes a line with the gene designation. That should be doable with a few lines of code in a variety of scripting languages.
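For what it's worth, here is a minimal Python sketch of that idea, not a tested production tool. It assumes the GFF3 is tab-separated with nine columns, that every mRNA line carries a geneID attribute as in the gffread output above, and that each geneID sits on a single sequence and strand. The function name add_gene_features and the helper parse_attrs are made up for this example. The gene line copies its score, strand and phase columns from the first mRNA of that gene.

```python
import sys


def parse_attrs(field):
    """Turn 'ID=a;geneID=b' into a dict that keeps the original key order."""
    return dict(item.split("=", 1) for item in field.strip().split(";") if "=" in item)


def add_gene_features(lines):
    """Return GFF3 lines with one gene feature inserted per geneID (assumed layout)."""
    rows = [line.rstrip("\n") for line in lines]

    # First pass: smallest start and largest end over the mRNAs of each gene.
    spans = {}
    for row in rows:
        cols = row.split("\t")
        if row.startswith("#") or len(cols) != 9 or cols[2] != "mRNA":
            continue
        gene_id = parse_attrs(cols[8]).get("geneID")
        if gene_id is None:
            continue
        start, end = int(cols[3]), int(cols[4])
        if gene_id in spans:
            old_start, old_end = spans[gene_id]
            spans[gene_id] = (min(old_start, start), max(old_end, end))
        else:
            spans[gene_id] = (start, end)

    # Second pass: write the gene line before its first mRNA and
    # replace geneID=... with Parent=... on every mRNA line.
    out = []
    seen = set()
    for row in rows:
        cols = row.split("\t")
        if row.startswith("#") or len(cols) != 9 or cols[2] != "mRNA":
            out.append(row)  # comments, exons and anything else pass through
            continue
        attrs = parse_attrs(cols[8])
        gene_id = attrs.pop("geneID", None)
        if gene_id is None:
            out.append(row)  # mRNA without geneID is left untouched
            continue
        if gene_id not in seen:
            seen.add(gene_id)
            start, end = spans[gene_id]
            gene_cols = cols[:2] + ["gene", str(start), str(end)] + cols[5:8] + ["ID=" + gene_id]
            out.append("\t".join(gene_cols))
        attrs["Parent"] = gene_id
        cols[8] = ";".join(f"{key}={value}" for key, value in attrs.items())
        out.append("\t".join(cols))
    return out


# Only runs when a file name is given on the command line.
if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1]) as handle:
        for out_line in add_gene_features(handle):
            print(out_line)
```

Saved under a name of your choice (say add_genes.py), it could be run as: python add_genes.py stringtie_merged.gff3 > stringtie_with_genes.gff3. Two passes are used so that the gene line can be written before its first mRNA even when later isoforms extend the span.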