Stringtie merge generates close duplicates of original transcripts
3.5 years ago


I am analysing a study of the parasitic flatworm Schistosoma mansoni that looks at different developmental stages, although I have noticed this problem occurs with the same pipeline in different species.

In an attempt to generate a deeper transcriptome, I made a GTF for each biological replicate (20 bio reps in total) using:

stringtie -p 8 -G ../SmanAnnos.gtf -o FA_B1.gtf FA_B1.bam

Subsequently, I merged these all together using:

stringtie --merge -p 8 -G ../SmanAnnos.gtf -o stringtie_merged.gtf mergelist.txt

As expected, in addition to the original Smp_* annotations, it has produced MSTRGs. Whenever I blast queries against this new transcriptome, I find that in many cases a query matches both an Smp_* and an MSTRG with almost identical scores. When I blast these close duplicates back against the genome, they both match the same regions.

Is there a step that I am missing that looks through the merged GTF and removes MSTRGs that overlap Smp_* transcripts? I am worried that these duplicates would distort FPKM values and subsequent DE calculations.
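
To make the question concrete, here is a minimal sketch of the kind of check I have in mind. It is not part of StringTie. It assumes that reference transcript IDs in the merged GTF start with `Smp_` and novel ones with `MSTRG` (both prefixes are parameters), and it treats a multi-exon MSTRG as a "duplicate" only if its intron chain is identical to that of a reference transcript on the same contig and strand. Single-exon transcripts must match exactly. The function names are hypothetical, and near-duplicates that differ at a splice site would not be caught.

```python
import re
from collections import defaultdict

# Matches GTF attribute pairs of the form: key "value";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def read_exons(gtf_lines):
    """Collect exon coordinates and location for every transcript."""
    exons = defaultdict(list)  # transcript_id -> [(start, end), ...]
    loc = {}                   # transcript_id -> (contig, strand)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        attrs = dict(ATTR_RE.findall(fields[8]))
        tid = attrs.get("transcript_id")
        if tid is None:
            continue
        exons[tid].append((int(fields[3]), int(fields[4])))
        loc[tid] = (fields[0], fields[6])
    return exons, loc


def structure_key(contig, strand, exon_list):
    """Key that is equal for transcripts sharing the same intron chain."""
    exon_list = sorted(exon_list)
    if len(exon_list) == 1:
        # No introns: fall back to exact exon coordinates.
        return (contig, strand, "single", tuple(exon_list))
    introns = tuple((a[1], b[0]) for a, b in zip(exon_list, exon_list[1:]))
    return (contig, strand, "introns", introns)


def find_duplicates(gtf_lines, ref_prefix="Smp_", novel_prefix="MSTRG"):
    """Map each novel transcript to the reference transcripts it duplicates.

    The ID prefixes are assumptions about how the merged GTF is labelled.
    """
    exons, loc = read_exons(gtf_lines)
    ref_by_key = defaultdict(list)
    for tid, ex in exons.items():
        if tid.startswith(ref_prefix):
            ref_by_key[structure_key(*loc[tid], ex)].append(tid)
    duplicates = {}
    for tid, ex in exons.items():
        if tid.startswith(novel_prefix):
            matches = ref_by_key.get(structure_key(*loc[tid], ex))
            if matches:
                duplicates[tid] = sorted(matches)
    return duplicates


# Example use (the file name is taken from the merge command above):
# with open("stringtie_merged.gtf") as fh:
#     for novel, refs in find_duplicates(fh).items():
#         print(novel, ",".join(refs), sep="\t")
```

This only lists candidates; whether they should actually be dropped before quantification is exactly what I am unsure about.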

Thanks, Duncan

Stringtie • 1.2k views

This is a good point. Surely duplicating transcripts in the transcriptome is going to screw up transcript quantification and differential expression downstream? Did you find an answer to this?

