Question: Stringtie merge generates close duplicates of original transcripts
9 months ago, jammydodger123456 wrote:


I am analysing data from a study of the parasitic flatworm Schistosoma mansoni that looks at different developmental stages, although I have noticed that this problem also occurs with the same pipeline in other species.

In an attempt to generate a deeper transcriptome, I made a GTF for each biological replicate (20 bio reps in total) using:

stringtie -p 8 -G ../SmanAnnos.gtf -o FA_B1.gtf FA_B1.bam

Subsequently, I merged them all together using:

stringtie --merge -p 8 -G ../SmanAnnos.gtf -o stringtie_merged.gtf mergelist.txt
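
For anyone reproducing this: mergelist.txt is just a plain text file with one per-replicate GTF path per line. A minimal sketch of building it is below. The function name make_mergelist is made up for illustration, and it assumes all per-replicate GTFs sit in a single directory and end in .gtf.

```shell
# Hypothetical helper (not part of StringTie): write one GTF path per line.
# Assumes the per-replicate GTFs are in one directory and end in .gtf.
make_mergelist() {
    gtf_dir=$1
    out=$2
    : > "$out"                              # start with an empty list
    for f in "$gtf_dir"/*.gtf; do
        [ -e "$f" ] || continue             # no .gtf files matched the glob
        case "$(basename "$f")" in
            stringtie_merged.gtf) continue ;;   # skip a previous merge output
        esac
        printf '%s\n' "$f" >> "$out"
    done
}

# Usage: make_mergelist . mergelist.txt
```

Skipping stringtie_merged.gtf avoids feeding an earlier merge result back into a new merge if the command is rerun in the same directory.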

As expected, in addition to the original Smp_* annotations, it has produced MSTRGs. Whenever I BLAST queries against this new transcriptome, I find that in many cases a query matches an Smp_* as well as an MSTRG with almost identical scores. When I BLAST these close duplicates back against the genome, they both match the same regions of the genome.

Is there a step I am missing that looks through the merged GTF and removes MSTRGs that overlap Smp_* transcripts? I am worried that these duplicates will distort FPKM values and the subsequent DE calculations.
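
To make the kind of check I mean concrete, here is a rough sketch that only flags candidates and does not remove anything. It lists MSTRG transcripts whose intron chain (the exon boundaries flanking every intron) is identical to that of a non-MSTRG transcript on the same sequence and strand, so the two differ at most in their outer ends. The function name flag_matching_intron_chains is made up. It assumes novel transcripts have a transcript_id starting with MSTRG, that the exon lines of each transcript are listed in ascending coordinate order, and it ignores single-exon transcripts. A dedicated comparison tool such as gffcompare would be the more thorough option.

```shell
# Hypothetical helper: print "MSTRG_transcript<TAB>reference_transcript" pairs
# that share an identical intron chain in a GTF file.
flag_matching_intron_chains() {
    awk -F'\t' '
    # Exon lines only; pull the transcript_id value out of the attribute column.
    $3 == "exon" && match($9, /transcript_id "[^"]+"/) {
        tid = substr($9, RSTART + 15, RLENGTH - 16)
        # From the second exon onward, record previous exon end and this exon start.
        if (tid in prev_end) {
            chain[tid] = chain[tid] prev_end[tid] "-" $4 ","
        }
        prev_end[tid] = $5
        loc[tid] = $1 ":" $7        # sequence name and strand
    }
    END {
        # Index the intron chains of non-MSTRG (reference) transcripts.
        for (tid in chain) {
            if (tid !~ /^MSTRG/) ref[loc[tid] "|" chain[tid]] = tid
        }
        # Report MSTRG transcripts whose chain matches an indexed one.
        for (tid in chain) {
            key = loc[tid] "|" chain[tid]
            if (tid ~ /^MSTRG/ && (key in ref)) print tid "\t" ref[key]
        }
    }' "$1" | sort
}

# Usage: flag_matching_intron_chains stringtie_merged.gtf
```

If several reference transcripts share one intron chain, only one of them is reported per MSTRG transcript, so the output is a list of candidates to inspect rather than a complete mapping.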

Thanks, Duncan


This is a good point. Surely duplicating transcripts in the transcriptome is going to distort transcript quantification and downstream differential expression? Did you find an answer to this?

written 7 weeks ago by chris86250