Question: Stringtie merge generates close duplicates of original transcripts
jammydodger123456 wrote (2.2 years ago):


I am analysing a study of different developmental stages of the parasitic flatworm Schistosoma mansoni, although I have noticed the same problem when running this pipeline on other species.

In an attempt to generate a deeper transcriptome, I assembled a GTF for each biological replicate (20 in total) using:

stringtie -p 8 -G ../SmanAnnos.gtf -o FA_B1.gtf FA_B1.bam
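With 20 replicates, the per-replicate command above can be generated rather than typed by hand. The following is a minimal Python sketch, assuming each output GTF is named after its BAM (as in FA_B1.bam giving FA_B1.gtf) and that stringtie is on the PATH. The helper names, the `execute` flag and the "every *.bam is a replicate" layout are hypothetical; only the StringTie flags already shown above are used.

```python
import subprocess
from pathlib import Path

def assembly_commands(bams, ref_gtf="../SmanAnnos.gtf", threads=8):
    """Build one StringTie assembly command per replicate BAM (hypothetical helper)."""
    cmds = []
    for bam in bams:
        # FA_B1.bam -> FA_B1.gtf, written to the current directory
        out_gtf = Path(bam).with_suffix(".gtf").name
        cmds.append(["stringtie", "-p", str(threads), "-G", ref_gtf,
                     "-o", out_gtf, bam])
    return cmds

def run_all(cmds, execute=False):
    """Print each command; run it only when execute=True (assumes stringtie is on PATH)."""
    for cmd in cmds:
        print(" ".join(cmd))
        if execute:
            subprocess.run(cmd, check=True)

if __name__ == "__main__":
    # Hypothetical layout: every *.bam in the working directory is one replicate.
    bams = sorted(str(p) for p in Path(".").glob("*.bam"))
    run_all(assembly_commands(bams))
```

Run as-is, it is a dry run that only prints the commands; passing `execute=True` to `run_all` runs them one after another.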

I then merged them all using:

stringtie --merge -p 8 -G ../SmanAnnos.gtf -o stringtie_merged.gtf mergelist.txt
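The mergelist.txt passed to `--merge` is a plain text file naming the per-replicate GTFs, one path per line (the list format described in the StringTie manual). A minimal sketch that writes such a list and rebuilds the merge command above as an argument list; the function names are hypothetical.

```python
from pathlib import Path

def write_mergelist(gtf_paths, list_path="mergelist.txt"):
    """Write one GTF path per line, the list format passed to `stringtie --merge`."""
    Path(list_path).write_text("".join(f"{p}\n" for p in gtf_paths))
    return list_path

def merge_command(list_path="mergelist.txt", ref_gtf="../SmanAnnos.gtf",
                  out_gtf="stringtie_merged.gtf", threads=8):
    """Return the merge command shown above as an argument list."""
    return ["stringtie", "--merge", "-p", str(threads), "-G", ref_gtf,
            "-o", out_gtf, list_path]
```

For example, `merge_command(write_mergelist(["FA_B1.gtf", "FA_B2.gtf"]))` writes mergelist.txt and returns the same arguments as the command above. Keeping stringtie_merged.gtf out of the list matters if the GTFs are collected with a wildcard.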

As expected, in addition to the original Smp_* annotations, the merged GTF contains novel MSTRG transcripts. However, when I BLAST queries against this new transcriptome, many of them match both an Smp_* transcript and an MSTRG transcript with almost identical scores. When I BLAST these near-duplicates back against the genome, both map to the same regions.
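The near-duplicate pattern described above can be flagged systematically rather than by eye. A minimal sketch, assuming the BLAST results were saved in tabular format (`-outfmt 6` with its default 12 columns, so the subject ID is column 2 and the bitscore column 12) and that subject IDs start with Smp_ or MSTRG. The function name and the 2% tolerance are hypothetical choices.

```python
from collections import defaultdict

def near_duplicate_hits(blast_lines, rel_tol=0.02):
    """Return queries whose best Smp_* and best MSTRG hits have near-identical bitscores.

    Assumes BLAST tabular output (-outfmt 6, default 12 columns): column 2 is the
    subject ID and column 12 the bitscore.
    """
    best = defaultdict(dict)  # query -> {"Smp": best bitscore, "MSTRG": best bitscore}
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        if subject.startswith("Smp_"):
            kind = "Smp"
        elif subject.startswith("MSTRG"):
            kind = "MSTRG"
        else:
            continue
        best[query][kind] = max(bitscore, best[query].get(kind, 0.0))
    flagged = []
    for query, scores in best.items():
        if "Smp" in scores and "MSTRG" in scores:
            high, low = max(scores.values()), min(scores.values())
            # Flag when the two best scores differ by at most rel_tol of the higher one.
            if high - low <= rel_tol * high:
                flagged.append(query)
    return flagged
```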

Is there a step I am missing that scans the merged GTF and removes MSTRG transcripts overlapping Smp_* transcripts? I am worried that these duplicates will distort the FPKM values and the downstream differential expression (DE) analysis.
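One way to experiment with the filtering step asked about here is a small post-processing script. The sketch below is hypothetical and not a StringTie feature. It assumes reference transcript IDs start with Smp_ and novel ones with MSTRG; it keys on transcript_id because the merged GTF may give reference transcripts MSTRG gene_ids. It drops any MSTRG transcript whose exonic bases are at least `min_frac` covered by reference exons on the same chromosome and strand; the 90% default is an arbitrary choice.

```python
import re
from collections import defaultdict

# Matches the transcript_id attribute in column 9 of a GTF line.
TID = re.compile(r'transcript_id "([^"]+)"')

def merge_intervals(intervals):
    """Merge overlapping 1-based, inclusive (start, end) intervals into disjoint ones."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged

def mstrg_duplicates(gtf_lines, min_frac=0.9, ref_prefix="Smp_", novel_prefix="MSTRG"):
    """Return IDs of novel transcripts whose exonic bases are mostly covered by
    reference exons on the same chromosome and strand (hypothetical filter)."""
    ref_exons = defaultdict(list)    # (chrom, strand) -> [(start, end), ...]
    novel_exons = defaultdict(list)  # transcript_id -> [(chrom, strand, start, end), ...]
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        if len(f) < 9 or f[2] != "exon":
            continue
        m = TID.search(f[8])
        if not m:
            continue
        tid, chrom, strand = m.group(1), f[0], f[6]
        start, end = int(f[3]), int(f[4])
        if tid.startswith(ref_prefix):
            ref_exons[(chrom, strand)].append((start, end))
        elif tid.startswith(novel_prefix):
            novel_exons[tid].append((chrom, strand, start, end))
    # Disjoint reference intervals, so overlapping reference isoforms are not double-counted.
    ref_merged = {key: merge_intervals(iv) for key, iv in ref_exons.items()}
    dups = set()
    for tid, exons in novel_exons.items():
        total = covered = 0
        for chrom, strand, start, end in exons:
            total += end - start + 1
            for ref_start, ref_end in ref_merged.get((chrom, strand), []):
                covered += max(0, min(end, ref_end) - max(start, ref_start) + 1)
        if total and covered / total >= min_frac:
            dups.add(tid)
    return dups

def filter_gtf(gtf_lines, drop_ids):
    """Yield GTF lines, skipping every line that belongs to a transcript in drop_ids."""
    for line in gtf_lines:
        m = TID.search(line)
        if m and m.group(1) in drop_ids:
            continue
        yield line
```

For example, `filter_gtf(lines, mstrg_duplicates(lines))` over the lines of stringtie_merged.gtf yields the filtered annotation. A coverage cutoff like this will also discard genuine novel isoforms whose exons are all shared with a reference transcript (an exon-skipping variant, for instance), so the threshold deserves checking against the data. The gffcompare utility, which assigns each transcript a class code relative to a reference annotation, is an alternative way to identify such overlaps.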

Thanks, Duncan


This is a good point: surely duplicating transcripts in the transcriptome will distort transcript quantification and downstream differential expression? Did you find an answer to this?

(Reply by chris86340, 18 months ago.)