Stringtie merge generates close duplicates of original transcripts

Asked by jammydodger123456, 5.8 years ago

Hi,

I am analysing a study of different developmental stages of the parasitic flatworm Schistosoma mansoni, although I have seen the same problem when running this pipeline on other species.

In an attempt to generate a deeper transcriptome, I assembled a GTF for each biological replicate (20 in total) using:

stringtie -p 8 -G ../SmanAnnos.gtf -o FA_B1.gtf FA_B1.bam
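With 20 replicates, the per-replicate command above is easy to script. Below is a minimal Python sketch, assuming every replicate BAM follows the `FA_B1.bam` naming and sits in one directory (the glob pattern and the function names `assembly_command` and `assemble_all` are hypothetical). It builds the same `stringtie -p 8 -G ../SmanAnnos.gtf -o <rep>.gtf <rep>.bam` call for each BAM and only executes them when `dry_run=False`.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def assembly_command(bam: Path, annotation: str, threads: int = 8) -> list[str]:
    """Build the per-replicate StringTie call: FA_B1.bam -> FA_B1.gtf."""
    return ["stringtie", "-p", str(threads), "-G", annotation,
            "-o", str(bam.with_suffix(".gtf")), str(bam)]


def assemble_all(bams: list[Path], annotation: str, dry_run: bool = True) -> list[list[str]]:
    """Return (and, unless dry_run, run) one StringTie command per replicate."""
    commands = [assembly_command(bam, annotation) for bam in sorted(bams)]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)  # stop at the first failing replicate
    return commands


if __name__ == "__main__":
    # Hypothetical layout: all replicate BAMs are in the current directory.
    for cmd in assemble_all(list(Path(".").glob("*.bam")), "../SmanAnnos.gtf"):
        print(" ".join(cmd))
```

Printing the commands first (the default dry run) makes it easy to check that every replicate is picked up before spending CPU time on the assemblies.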

I then merged them all using:

stringtie --merge -p 8 -G ../SmanAnnos.gtf -o stringtie_merged.gtf mergelist.txt
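The `mergelist.txt` passed to `stringtie --merge` is a plain text file listing one GTF path per line. Here is a small sketch of building it, assuming the per-replicate GTFs are gathered with a glob (helper names are hypothetical). It skips `stringtie_merged.gtf` so that a re-run does not feed the previous merge back in as if it were a replicate.

```python
from __future__ import annotations

from pathlib import Path

MERGED = "stringtie_merged.gtf"  # output name used in the merge command above


def mergelist_text(gtfs: list[Path]) -> str:
    """One GTF path per line, excluding any previous merge output."""
    return "".join(f"{gtf}\n" for gtf in sorted(gtfs) if gtf.name != MERGED)


def write_mergelist(gtfs: list[Path], out: Path = Path("mergelist.txt")) -> Path:
    """Write the list file that `stringtie --merge` reads."""
    out.write_text(mergelist_text(gtfs))
    return out


def merge_command(mergelist: Path, annotation: str, threads: int = 8) -> list[str]:
    """The same `stringtie --merge` call as in the question."""
    return ["stringtie", "--merge", "-p", str(threads), "-G", annotation,
            "-o", MERGED, str(mergelist)]
```

For example, `write_mergelist(list(Path(".").glob("FA_*.gtf")))` followed by `merge_command(Path("mergelist.txt"), "../SmanAnnos.gtf")` reproduces the command above; make sure the glob only matches replicate GTFs.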

As expected, in addition to the original Smp_* annotations, the merge has produced MSTRG transcripts. Whenever I BLAST queries against this new transcriptome, I find that in many cases a query matches both an Smp_* transcript and an MSTRG transcript with almost identical scores. When I BLAST these near-duplicates back against the genome, both match the same regions.
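One way to check whether these near-duplicates are really the same gene models is to compare intron chains directly in `stringtie_merged.gtf` instead of relying on BLAST scores. This is a hedged sketch, assuming reference transcripts keep their `Smp_` transcript IDs in the merged GTF and novel ones are named `MSTRG.*` (function names are hypothetical). It reports multi-exon MSTRG transcripts whose intron chain exactly matches an Smp_* transcript on the same sequence and strand; such pairs can differ at most in where their first and last exons start and end. Single-exon transcripts are skipped because they have no introns to compare.

```python
import re
from collections import defaultdict

TID = re.compile(r'transcript_id "([^"]+)"')


def read_exons(lines):
    """Map transcript_id -> (seqname, strand, sorted list of exon (start, end))."""
    exons = defaultdict(list)
    where = {}
    for line in lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = TID.search(fields[8])
        if not match:
            continue
        tid = match.group(1)
        exons[tid].append((int(fields[3]), int(fields[4])))
        where[tid] = (fields[0], fields[6])
    return {tid: (*where[tid], sorted(ex)) for tid, ex in exons.items()}


def intron_chain(exon_list):
    """Introns as (end of exon i, start of exon i+1); empty for one exon."""
    return tuple((a[1], b[0]) for a, b in zip(exon_list, exon_list[1:]))


def same_intron_chain(transcripts, ref_prefix="Smp_", novel_prefix="MSTRG."):
    """Map each multi-exon novel transcript to reference IDs with an identical intron chain."""
    ref_chains = defaultdict(list)
    for tid, (seq, strand, ex) in transcripts.items():
        if tid.startswith(ref_prefix) and len(ex) > 1:
            ref_chains[(seq, strand, intron_chain(ex))].append(tid)
    hits = {}
    for tid, (seq, strand, ex) in transcripts.items():
        if tid.startswith(novel_prefix) and len(ex) > 1:
            refs = ref_chains.get((seq, strand, intron_chain(ex)))
            if refs:
                hits[tid] = refs
    return hits
```

Usage would be `same_intron_chain(read_exons(open("stringtie_merged.gtf")))`. gffcompare, run with the reference annotation, reports the same kind of full intron-chain match as class code `=`.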

Is there a step I am missing that scans the merged GTF and removes MSTRG transcripts that overlap Smp_* transcripts? I am worried that these duplicates will distort the FPKM estimates and the downstream DE calculations.
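The filtering step asked about above can be sketched by hand. This is a do-it-yourself filter, not a StringTie feature, and it makes the same ID-prefix assumptions as the intron-chain check (`Smp_` for reference, `MSTRG.` for novel transcripts; function names are hypothetical). It drops every MSTRG transcript with at least one exon overlapping an Smp_* exon on the same sequence and strand, and keeps every other line. Overlap lookup uses reference exons sorted by start plus a running maximum of their ends, so each query is a binary search rather than a scan.

```python
import re
from bisect import bisect_right
from collections import defaultdict
from itertools import accumulate

TID = re.compile(r'transcript_id "([^"]+)"')


def parse(line):
    """Return (seqname, feature, start, end, strand, transcript_id), or None."""
    fields = line.rstrip("\n").split("\t")
    if line.startswith("#") or len(fields) < 9:
        return None
    match = TID.search(fields[8])
    tid = match.group(1) if match else None
    return fields[0], fields[2], int(fields[3]), int(fields[4]), fields[6], tid


def drop_overlapping_novel(lines, ref_prefix="Smp_", novel_prefix="MSTRG."):
    """Remove all lines of novel transcripts that overlap a reference exon."""
    records = [parse(line) for line in lines]
    ref = defaultdict(list)
    for rec in records:
        if rec and rec[1] == "exon" and rec[5] and rec[5].startswith(ref_prefix):
            ref[(rec[0], rec[4])].append((rec[2], rec[3]))
    index = {}
    for key, exons in ref.items():
        exons.sort()
        starts = [s for s, _ in exons]
        max_end = list(accumulate((e for _, e in exons), max))
        index[key] = (starts, max_end)

    def overlaps(seq, strand, start, end):
        starts, max_end = index.get((seq, strand), ([], []))
        i = bisect_right(starts, end) - 1  # last reference exon starting at or before `end`
        return i >= 0 and max_end[i] >= start

    drop = {rec[5] for rec in records
            if rec and rec[1] == "exon" and rec[5] and rec[5].startswith(novel_prefix)
            and overlaps(rec[0], rec[4], rec[2], rec[3])}
    return [line for line, rec in zip(lines, records) if not (rec and rec[5] in drop)]
```

This is deliberately blunt: it also discards genuine novel isoforms that share an exon with an annotated gene, so the intron-chain comparison or a minimum-overlap threshold may be a better compromise. To apply it, read the merged GTF's lines and write the result back out, e.g. `dst.writelines(drop_overlapping_novel(src.readlines()))`.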

Thanks, Duncan

This is a good point: surely duplicating the transcriptome is going to skew transcript quantification and differential expression downstream? Did you find an answer to this?

Reply by chris86, 5.1 years ago