Question: StringTie --merge generates close duplicates of original transcripts
jammydodger123456 wrote (9 months ago):

Hi,

I am analysing a study of the parasitic flatworm Schistosoma mansoni that compares different developmental stages. I have also seen this problem when running the same pipeline on other species.

To build a deeper transcriptome, I assembled a GTF for each biological replicate (20 in total) using:

stringtie -p 8 -G ../SmanAnnos.gtf -o FA_B1.gtf FA_B1.bam
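Typing that command 20 times invites mistakes. Below is a minimal sketch that builds the identical per-replicate call for every BAM. It assumes the BAMs are named like FA_B1.bam and sit in one directory. The helper names `assembly_commands` and `run_all` and the `*.bam` glob are hypothetical; only the StringTie flags come from the post.

```python
import subprocess
from pathlib import Path

REF_GTF = "../SmanAnnos.gtf"  # reference annotation from the post
THREADS = "8"                 # same -p value as the post

def assembly_commands(bams):
    """Return one `stringtie` argv per BAM, mirroring the per-replicate command.

    The output GTF takes the BAM's stem, e.g. FA_B1.bam -> FA_B1.gtf.
    """
    cmds = []
    for bam in sorted(str(b) for b in bams):
        out_gtf = Path(bam).with_suffix(".gtf").name
        cmds.append(["stringtie", "-p", THREADS, "-G", REF_GTF, "-o", out_gtf, bam])
    return cmds

def run_all(bam_dir="."):
    """Hypothetical driver: assemble every *.bam in bam_dir, one replicate at a time."""
    for cmd in assembly_commands(Path(bam_dir).glob("*.bam")):
        subprocess.run(cmd, check=True)  # stop on the first failed assembly
```

Building argv lists instead of shell strings keeps the commands easy to inspect before anything is actually run.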

I then merged them all using:

stringtie --merge -p 8 -G ../SmanAnnos.gtf -o stringtie_merged.gtf mergelist.txt
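`stringtie --merge` accepts mergelist.txt as a plain text file listing one GTF path per line. Below is a minimal sketch for generating that file and the merge call. It assumes the working directory holds only the 20 replicate GTFs plus, possibly, an earlier stringtie_merged.gtf. The helper names are hypothetical, and excluding the old merged output is my own precaution, not a StringTie feature.

```python
import subprocess
from pathlib import Path

def write_mergelist(gtfs, path="mergelist.txt", exclude=("stringtie_merged.gtf",)):
    """Write one GTF path per line, the list format `stringtie --merge` accepts.

    The merged output is skipped so a re-run does not feed it back in as a replicate.
    """
    keep = sorted(str(g) for g in gtfs if Path(g).name not in exclude)
    Path(path).write_text("\n".join(keep) + "\n")
    return keep

def merge_command(mergelist="mergelist.txt"):
    """The merge call from the post, as an argv list."""
    return ["stringtie", "--merge", "-p", "8", "-G", "../SmanAnnos.gtf",
            "-o", "stringtie_merged.gtf", mergelist]

def run_merge(gtf_dir="."):
    """Hypothetical driver: list every *.gtf in gtf_dir, then merge them."""
    write_mergelist(Path(gtf_dir).glob("*.gtf"))
    subprocess.run(merge_command(), check=True)
```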

As expected, in addition to the original Smp_* annotations, it has produced new MSTRG transcripts. Whenever I BLAST queries against this new transcriptome, I often find that a query matches an Smp_* transcript and an MSTRG transcript with almost identical scores. When I BLAST these close duplicates back against the genome, both match the same regions.

Is there a step I am missing that scans the merged GTF and removes MSTRGs that overlap Smp_* transcripts? I am worried that these duplicates will distort the FPKM values and the downstream differential expression (DE) analysis.
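This is not a StringTie option. It is a hypothetical post-hoc check that shows how many such overlaps exist before you decide whether to drop anything. The sketch assumes reference transcripts in stringtie_merged.gtf keep their Smp_ transcript IDs and novel ones start with MSTRG.; both prefixes are parameters. For each MSTRG transcript it finds the Smp_ transcript on the same chromosome and strand that shares the most exonic bases, and reports that shared fraction. All function names are illustrative.

```python
import re
from collections import defaultdict

TX_ID = re.compile(r'transcript_id "([^"]+)"')

def read_exons(lines):
    """Map transcript_id -> (chrom, strand, [(start, end), ...]) from GTF exon lines."""
    tx = {}
    for line in lines:
        if line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        if len(f) < 9 or f[2] != "exon":
            continue
        m = TX_ID.search(f[8])
        if not m:
            continue
        entry = tx.setdefault(m.group(1), (f[0], f[6], []))
        entry[2].append((int(f[3]), int(f[4])))  # GTF: 1-based, inclusive
    return tx

def shared_bases(a, b):
    """Bases shared by two exon lists (exons within one transcript do not overlap)."""
    total = 0
    for s1, e1 in a:
        for s2, e2 in b:
            total += max(0, min(e1, e2) - max(s1, s2) + 1)
    return total

def flag_overlaps(tx, ref_prefix="Smp_", novel_prefix="MSTRG."):
    """Yield (novel_id, best_ref_id, fraction of novel exonic bases shared with it)."""
    refs = defaultdict(list)  # (chrom, strand) -> [(ref_id, exons)]
    for tid, (chrom, strand, exons) in tx.items():
        if tid.startswith(ref_prefix):
            refs[(chrom, strand)].append((tid, exons))
    for tid, (chrom, strand, exons) in tx.items():
        if not tid.startswith(novel_prefix):
            continue
        length = sum(e - s + 1 for s, e in exons)
        best_id, best = None, 0
        for ref_id, ref_exons in refs[(chrom, strand)]:  # same chrom and strand only
            shared = shared_bases(exons, ref_exons)
            if shared > best:
                best_id, best = ref_id, shared
        if best_id is not None:
            yield tid, best_id, best / length

def report(path="stringtie_merged.gtf"):
    """Hypothetical driver: print novel_id, ref_id, shared fraction as TSV."""
    with open(path) as fh:
        for novel, ref, frac in flag_overlaps(read_exons(fh)):
            print(f"{novel}\t{ref}\t{frac:.2f}")
```

A fraction near 1.0 means the MSTRG model lies almost entirely within an annotated Smp_ model. That pattern would fit the near-identical BLAST hits described above. Removing such transcripts before quantification is still a judgement call, because some may be genuine alternative isoforms.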

Thanks, Duncan

This is a good point. Surely duplicating the transcriptome is going to screw up transcript quantification and the downstream differential expression analysis? Did you find an answer to this?

(Reply written 7 weeks ago by chris86250)
Powered by Biostar version 2.3.0