Question: StringTie output GTF file only contains STRG transcripts, no original annotations
2.3 years ago, jammydodger123456 wrote:


I am trying to generate a more complete transcriptome with StringTie so that I can look for lowly expressed transcripts. I took 2 read files from the same study of Opisthorchis viverrini, aligned them together (as if they were replicates) to the genome using STAR, and converted SAM to BAM with samtools. My StringTie command is below.

stringtie -p 8 -G Oviv_Annos.gff3 New_Annos.bam -o New_Annos.gtf

StringTie runs with no errors, but when I extract the sequences from New_Annos.gtf, every transcript is annotated "STRG". I was hoping that only the novel transcripts would have this label. Is there an option I am missing that would do this?
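One quick check before changing the workflow: as far as I know, the StringTie manual describes a reference_id attribute that is added to an assembled transcript when it matches a transcript from the -G file, even if its transcript_id still starts with STRG. The sketch below counts how many transcript records carry that attribute. The function name is made up for illustration, and the attribute behaviour may differ between StringTie versions, so treat this as a rough check rather than a definitive test.

```shell
# Count transcript records in a StringTie GTF and report how many carry a
# reference_id attribute (i.e. were matched to a transcript from the -G file).
# count_ref_matched is a hypothetical helper name, not part of StringTie.
count_ref_matched() {
    awk -F'\t' '
        $3 == "transcript" {
            total++
            if ($9 ~ /reference_id "/) matched++
        }
        END { printf "total=%d matched=%d novel=%d\n", total, matched, total - matched }
    ' "$1"
}

# Usage with the output file named in the question (skipped if it is absent):
if [ -f New_Annos.gtf ]; then
    count_ref_matched New_Annos.gtf
fi
```

If matched is well above zero, the reference transcripts are in the output and only the naming differs; if it is zero, the annotation was probably not used at all.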


Edit (FIXED): I think I have figured the problem out. Running both samples as one sample gave annotations containing nothing but STRGs. Running StringTie on each sample individually to generate a per-sample annotation, then merging these, gave the original annotations plus novel STRGs.

The "running samples as one" approach gave ~21,000 transcripts; the "running samples individually, then merging" approach gave ~30,000 transcripts. It seems the latter approach may give a deeper view of the transcriptome.
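For anyone landing here later, the per-sample-then-merge approach described in the edit might look roughly like the sketch below. The sample names (sample1, sample2) and the output name merged.gtf are placeholders, not the files actually used, and the run wrapper is only there so the commands can be previewed safely. stringtie --merge is the real merge mode; everything else is an assumption about the setup.

```shell
# Dry-run by default: commands are printed, not executed.
# Set DRY_RUN=0 to actually run them (requires stringtie on PATH).
DRY_RUN=${DRY_RUN:-1}
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

REF=Oviv_Annos.gff          # reference annotation mentioned in the thread
SAMPLES="sample1 sample2"   # hypothetical names; expects sample1.bam, sample2.bam

# 1) Assemble each sample separately, guided by the reference annotation.
GTFS=""
for s in $SAMPLES; do
    run stringtie -p 8 -G "$REF" -o "$s.gtf" "$s.bam"
    GTFS="$GTFS $s.gtf"
done

# 2) Merge the per-sample assemblies together with the reference annotation.
#    $GTFS is deliberately unquoted so it splits into one argument per file.
run stringtie --merge -G "$REF" -o merged.gtf $GTFS
```

The BAM files need to be coordinate-sorted for StringTie. In the merged output, transcripts that match the reference should keep their reference identifiers, which fits what the edit above reports.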

Thanks to those who offered advice.


Are the contig/chromosome names the same between the reference FASTA and the reference annotation (gff3)?

written 2.3 years ago by cpad011214k

For my first run, they actually weren't (strangely, despite coming from the same source). Using annotations and the corresponding gff file (note: not the gff3 that was in my original command), it still only produces STRG transcripts. It does increase the total number of transcripts from 16,356 to 21,972, if that helps at all.

written 2.3 years ago by jammydodger123456
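The name mismatch discussed above is easy to check from the command line. A minimal sketch, assuming the genome is a plain FASTA file and the annotation is tab-delimited GFF/GTF; the function name and the genome.fa file name are placeholders, not anything from the thread:

```shell
# Print sequence names that appear in the annotation (column 1) but not
# among the FASTA headers of the genome the reads were aligned to.
# seqnames_missing_from_genome is a hypothetical helper name.
seqnames_missing_from_genome() {
    genome_fa=$1
    annotation=$2
    fa_names=$(mktemp)
    ann_names=$(mktemp)
    # FASTA header up to the first whitespace, without the leading ">"
    grep '^>' "$genome_fa" | sed 's/^>//; s/[[:space:]].*$//' | sort -u > "$fa_names"
    # Column 1 of every non-comment line that has all 9 GFF/GTF columns
    grep -v '^#' "$annotation" | awk -F'\t' 'NF >= 9 { print $1 }' | sort -u > "$ann_names"
    # comm -13 keeps lines found only in the second file (annotation-only names)
    comm -13 "$fa_names" "$ann_names"
    rm -f "$fa_names" "$ann_names"
}

# Usage (genome.fa is a placeholder for the genome FASTA used with STAR):
if [ -f genome.fa ] && [ -f Oviv_Annos.gff ]; then
    seqnames_missing_from_genome genome.fa Oviv_Annos.gff
fi
```

An empty result means every annotated sequence name exists in the genome. Any name printed belongs to annotation records that cannot overlap the alignments.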

I am not sure, but can you try using GTF format? I think StringTie looks for "transcript_id" in the gtf/gff file. If that field is not there, it probably skips the annotation and just assembles transcripts from the reads (STRGs). I vaguely remember a similar thing happening to me; I switched to GTF and that solved it. Worth exploring.

written 2.3 years ago by pbpanigrahi190
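To test the suggestion above, one option is to check whether the exon lines of the annotation carry a GTF-style transcript_id, and to convert the GFF3 to GTF if they do not. The sketch assumes gffread is installed (it is a separate tool, not mentioned in the thread) and uses its -T option for GTF output; the helper function name and the output file name Oviv_Annos.gtf are made up for illustration.

```shell
# Count exon lines that do / do not contain a GTF-style transcript_id "..."
# attribute in column 9. count_exon_transcript_ids is a hypothetical helper.
count_exon_transcript_ids() {
    awk -F'\t' '
        /^#/ || NF < 9 || $3 != "exon" { next }
        $9 ~ /transcript_id "/ { has++; next }
        { lacks++ }
        END { printf "has=%d lacks=%d\n", has, lacks }
    ' "$1"
}

# Convert GFF3 to GTF with gffread (-T selects GTF output), if it is installed
# and the input file from the question is present, then re-check the result.
if command -v gffread >/dev/null 2>&1 && [ -f Oviv_Annos.gff3 ]; then
    gffread Oviv_Annos.gff3 -T -o Oviv_Annos.gtf
    count_exon_transcript_ids Oviv_Annos.gtf
fi
```

A GFF3 file will normally report has=0 because it links exons to transcripts through Parent= instead, so this only shows which format the file is in. StringTie is documented to accept both GTF and GFF3 for -G, so a conversion may not change anything; it is a cheap thing to rule out.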

Please also refer to the numerous questions on that topic here on Biostars. This issue is common for Stringtie beginners.

written 2.2 years ago by ATpoint39k