Question: Trinity assembly way too many transcripts!
15 months ago, karsa692 wrote:

I am hoping someone might be able to help me troubleshoot some issues I am having building an assembly that I would like to use as a reference for DE analysis. As the title states, I am using Trinity and I am getting way too many transcripts. Before I go any further, I will state that I have read the article "There are too many transcripts, what do I do?", but I think my problem is a bit beyond the scope of that post.

I am trying to build a de novo transcriptome assembly from pooled samples of sea urchin larvae (four samples, one for each of four treatment levels). I am getting around 800,000 transcripts, where I would normally expect around 70,000 to 100,000 for similar data sets. I know that at least part of the issue is that I have many duplicate genes/isoforms due to the nature of the samples (many pooled individuals); however, this still seems extreme. I have tried collapsing my assemblies using Grouper and CD-HIT, but the assemblies are so large that even the collapsed versions remain very large. Does anyone have any suggestions?
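Before collapsing anything, it can help to see whether the 800,000 figure is driven by isoforms or by Trinity "genes". The following is a minimal sketch, assuming the default Trinity header convention (for example `TRINITY_DN1000_c115_g5_i1`, where the `_i<N>` suffix marks the isoform). The function names `read_fasta_lengths`, `n50` and `summarize` are made up for this illustration and are not part of Trinity.

```python
import re
from collections import defaultdict

# Assumption: default Trinity IDs such as TRINITY_DN1000_c115_g5_i1, where
# everything before "_i<N>" identifies the Trinity "gene".
ISOFORM_SUFFIX = re.compile(r"_i\d+$")


def read_fasta_lengths(lines):
    """Return {sequence_id: length} from an iterable of FASTA lines."""
    lengths = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Keep only the ID, dropping "len=..." and "path=..." annotations.
            current = line[1:].split()[0]
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths


def n50(lengths):
    """Length L such that sequences of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


def summarize(lengths):
    """Count transcripts and Trinity genes, and report the largest isoform cluster."""
    isoforms_per_gene = defaultdict(int)
    for seq_id in lengths:
        isoforms_per_gene[ISOFORM_SUFFIX.sub("", seq_id)] += 1
    return {
        "transcripts": len(lengths),
        "genes": len(isoforms_per_gene),
        "max_isoforms_per_gene": max(isoforms_per_gene.values(), default=0),
        "n50": n50(list(lengths.values())),
    }
```

Passing an open file handle for the assembly FASTA to `read_fasta_lengths` and the result to `summarize` gives the counts. A gene count close to the transcript count would suggest fragmentation rather than isoform inflation, although that interpretation is only a rule of thumb.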


Tags: rna-seq, assembly

It would help to know how much data went into this analysis, the size of the genome, and how you pre-processed the data.

Out of curiosity, is there a specific reason to do a de novo assembly? The sea urchin genome has been available for over a decade and has a defined transcriptome (for S. purpuratus, available from Echinobase, NCBI or Ensembl). You could use this known transcriptome to weed out spurious transcripts from your assembly.
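One way to sketch the "weed out" step: align the assembled transcripts against the reference transcriptome, write the hits in BLAST tabular format (`-outfmt 6`, whose first four columns are qseqid, sseqid, pident, length), and keep only the transcripts with a sufficiently good hit. The thresholds and function names below are assumptions made for illustration, not recommended values, and the identity cutoff would need to be loosened if the larvae are not S. purpuratus.

```python
def transcripts_with_reference_hit(blast_lines, min_identity=90.0, min_aln_len=200):
    """Collect query IDs that have at least one hit passing both thresholds.

    Expects BLAST tabular lines (-outfmt 6): qseqid, sseqid, pident, length, ...
    The default thresholds are placeholders chosen for this example only.
    """
    supported = set()
    for line in blast_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        query_id = fields[0]
        identity = float(fields[2])
        aln_len = int(fields[3])
        if identity >= min_identity and aln_len >= min_aln_len:
            supported.add(query_id)
    return supported


def filter_fasta(fasta_lines, keep_ids):
    """Yield the FASTA lines of records whose ID is in keep_ids."""
    keep = False
    for line in fasta_lines:
        if line.startswith(">"):
            header = line[1:].split()
            keep = bool(header) and header[0] in keep_ids
        if keep:
            yield line
```

Transcripts without a reference hit are not necessarily spurious (they could be novel or lineage-specific), so it may be safer to inspect the discarded set than to delete it outright.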

written 15 months ago by genomax
15 months ago, amandine.velt wrote:


I had the same problem as you. My solution was to use the DRAP tool, which runs Trinity and then applies filters to the result. The results are very good, and I get a number of transcripts close to the expected number.

First, you need to assemble the transcriptome of each of your conditions with the runDrap command. Then, DRAP provides a tool to merge the per-condition transcriptomes into a single complete transcriptome, with the runMeta command.

If you're interested, here is the link to the tool:

Best, Amandine


Thank you for your suggestion, I will give this a try.

written 14 months ago by karsa692