Question: Stringtie output gtf file only contains STRG transcripts, no original annotations
14 months ago by jammydodger123456 wrote:

Hi,

I am trying to generate a more complete transcriptome using stringtie so that I can look for lowly expressed transcripts. I took 2 read files from the same study for Opisthorchis viverrini, aligned them together (as if they were replicates) to the genome using STAR, and converted the SAM output to BAM with samtools. My command for stringtie is below.

stringtie -p 8 -G Oviv_Annos.gff3 New_Annos.bam -o New_Annos.gtf

Stringtie runs with no errors, but when I extract the sequences from New_Annos.gtf, every transcript is annotated "STRG". I was hoping that only the novel transcripts would carry this label. Is there an option I am missing that would do this?

Thanks

Edit (FIXED): I think I have figured the problem out. Running both samples as one sample gave annotations containing nothing but STRGs. Running each sample individually, generating annotations for each sample, and then merging them gave the original annotations plus novel STRGs.

The "running samples as one" approach gave ~21,000 transcripts, while the "running samples individually then merging" approach gave ~30,000 transcripts. It seems that the latter approach may actually give a deeper view of the transcriptome.

Thanks to those who offered advice
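
For anyone landing here later, the "run each sample individually, then merge" approach described in the edit can be sketched roughly as below. This is only a sketch under assumptions: the function name `assemble_and_merge` and the sample file names are made up for illustration, each BAM is assumed to be coordinate-sorted, and the exact options the original poster used for the per-sample and merge steps are not given in the post.

```shell
# Sketch: assemble each sample on its own, then merge the per-sample GTFs.
# assemble_and_merge is a hypothetical helper, not part of StringTie.
assemble_and_merge() {
    local ref_annot="$1"   # reference annotation (GFF3 or GTF)
    local merged_out="$2"  # path for the merged GTF
    shift 2                # remaining arguments: one sorted BAM per sample

    local gtf_list bam sample_gtf
    gtf_list="$(mktemp)"

    for bam in "$@"; do
        sample_gtf="${bam%.bam}.gtf"
        # Reference-guided assembly of a single sample
        stringtie -p 8 -G "$ref_annot" -o "$sample_gtf" "$bam" || return 1
        echo "$sample_gtf" >> "$gtf_list"
    done

    # Merge mode reads a text file listing one per-sample GTF per line
    stringtie --merge -G "$ref_annot" -o "$merged_out" "$gtf_list" || return 1
    rm -f "$gtf_list"
}

# Example call (sample1.bam and sample2.bam are placeholder names):
# assemble_and_merge Oviv_Annos.gff3 merged.gtf sample1.bam sample2.bam
```

Passing the reference annotation with `-G` in the merge step as well is what lets the merged file include the reference transcripts alongside the novel ones.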

modified 13 months ago • written 14 months ago by jammydodger123456

Are the contig/chromosome names the same in the reference fasta and the reference annotation (gff3)?

written 14 months ago by cpad0112
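
One possible way to check the contig-name question above is to compare the sequence names in the two files directly. The helper below is a minimal sketch: `contigs_missing_from_fasta` is a made-up name, and it assumes a plain-text (uncompressed) FASTA and a tab-separated GFF/GTF.

```shell
# Print sequence names that appear in the annotation but not in the genome FASTA.
# Empty output means every annotated contig name exists in the FASTA.
contigs_missing_from_fasta() {
    local fasta="$1" annot="$2"
    local fa_names gff_names
    fa_names="$(mktemp)"
    gff_names="$(mktemp)"

    # FASTA headers: drop '>' and keep only the text before the first whitespace
    grep '^>' "$fasta" | sed 's/^>//' | awk '{print $1}' | sort -u > "$fa_names"
    # Annotation column 1, skipping comment/directive lines and blank lines
    awk -F'\t' '!/^#/ && NF > 1 {print $1}' "$annot" | sort -u > "$gff_names"

    # comm -13 keeps lines found only in the second file (annotation-only names)
    comm -13 "$fa_names" "$gff_names"
    rm -f "$fa_names" "$gff_names"
}

# Example call (genome.fa is a placeholder name):
# contigs_missing_from_fasta genome.fa Oviv_Annos.gff3
```

Any name printed here is a contig the annotation refers to that the aligner's reference never saw, so annotations on it cannot line up with the alignments.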

For my first run, they actually weren't (strangely, despite coming from the same source). Taking the annotations and the corresponding gff file (note: not the gff3 that was in my original command), it still only produces STRG transcripts. It does increase the total number of transcripts from 16,356 to 21,972, if that helps at all.

written 14 months ago by jammydodger123456

I am not sure, but can you try using gtf format? I think stringtie looks for "transcript_id" in the gtf/gff file. If that field is not there, it probably skips those records and does its own transcript assembly to get transcripts (STRGs). I vaguely remember a similar thing happening to me; I used gtf and that solved it. Worth exploring.

written 13 months ago by pbpanigrahi
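
The suggestion above could be tried along these lines. This is a sketch, not a confirmed fix: whether a missing "transcript_id" is really what makes stringtie ignore the guide annotation is the commenter's guess. The helper name `count_missing_transcript_id` is made up, and the conversion line assumes gffread is installed.

```shell
# Count non-gene records in a GTF whose attribute column lacks transcript_id.
# A count of 0 means every transcript-level record carries the attribute.
count_missing_transcript_id() {
    awk -F'\t' '
        !/^#/ && NF >= 9 && $3 != "gene" && $9 !~ /transcript_id "/ { n++ }
        END { print n + 0 }
    ' "$1"
}

# Convert the GFF3 to GTF with gffread (-T selects GTF output), then check it:
# gffread Oviv_Annos.gff3 -T -o Oviv_Annos.gtf
# count_missing_transcript_id Oviv_Annos.gtf
# stringtie -p 8 -G Oviv_Annos.gtf -o New_Annos.gtf New_Annos.bam
```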

Please also refer to the numerous questions on that topic here on Biostars. This issue is common for Stringtie beginners.

written 13 months ago by ATpoint