Question: Trinity assembly way too many transcripts!
3 months ago by karsa692:

I am hoping someone can help me troubleshoot some issues I am having building an assembly that I would like to use as a reference for DE analysis. As the title states, I am using Trinity and I am getting way too many transcripts. Before I go any further, I will state that I have read the article "There are too many transcripts, what do I do?" but I think my problem is a bit beyond the scope of that post.

I am trying to build a de novo transcriptome assembly from pooled samples of sea urchin larvae (four samples, one per treatment level). I am getting around 800,000 transcripts, where I would normally expect around 70,000 to 100,000 for similar data sets. I know that at least part of the issue is that I have many duplicate genes/isoforms due to the nature of the samples (many pooled individuals); however, this still seems extreme. I have tried collapsing my assemblies using Grouper and CD-HIT, but the assemblies are so large that the collapsed versions are still very large. Does anyone have any suggestions?
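
When an assembly is this large, it can help to see how the transcript count breaks down before collapsing anything. The following is a minimal sketch, not part of Trinity itself: it assumes the usual Trinity header layout (`>TRINITY_DN…_c…_g…_i… len=…`), in which stripping the trailing `_i<number>` gives the Trinity "gene" ID. The function names and the `min_len` cutoff of 300 are hypothetical choices for illustration.

```python
import re
from collections import Counter

# Assumption: Trinity-style IDs end in _i<number> (the isoform index).
ISOFORM_SUFFIX = re.compile(r"_i\d+$")


def read_fasta_lengths(lines):
    """Yield (transcript_id, sequence_length) for each FASTA record."""
    name, length = None, 0
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, length
            parts = line[1:].split()
            name, length = (parts[0] if parts else ""), 0
        elif line and name is not None:
            length += len(line)
    if name is not None:
        yield name, length


def summarize(lines, min_len=300):
    """Count transcripts, Trinity 'genes', and transcripts of at least min_len bases."""
    isoforms_per_gene = Counter()
    n_transcripts = 0
    n_long = 0
    for transcript_id, length in read_fasta_lengths(lines):
        n_transcripts += 1
        if length >= min_len:
            n_long += 1
        # Drop the isoform suffix to group isoforms under one gene ID.
        isoforms_per_gene[ISOFORM_SUFFIX.sub("", transcript_id)] += 1
    return {
        "transcripts": n_transcripts,
        "genes": len(isoforms_per_gene),
        "transcripts_ge_min_len": n_long,
        "max_isoforms_per_gene": max(isoforms_per_gene.values(), default=0),
    }
```

Passing an open file handle for the assembly FASTA to `summarize` returns the four counts as a dictionary. If most of the 800,000 transcripts turn out to be very short or to belong to a small number of genes with many isoforms, that points to a different fix than if they are spread evenly over many genes.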



It would help if you told us how much data went into this analysis, the size of the genome, and how you pre-processed the data.

Out of curiosity, is there a specific reason to do a de novo assembly? The sea urchin genome has been available for over a decade and has a defined transcriptome (for S. purpuratus, available from Echinobase, NCBI or Ensembl). You could use this known transcriptome to weed out spurious transcripts from your assembly.
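
One way to sketch that filtering step, assuming the assembly has already been aligned against the reference transcriptome with BLAST and the results saved in tabular format (`-outfmt 6`, where column 1 is the query ID, column 3 the percent identity and column 11 the e-value). The function names and the thresholds (90% identity, e-value 1e-10) are hypothetical and would need tuning, especially if the larvae are not S. purpuratus.

```python
def ids_with_hits(blast_lines, min_identity=90.0, max_evalue=1e-10):
    """Collect query IDs that have at least one hit passing both thresholds."""
    keep = set()
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.rstrip("\n").split("\t")
        qseqid = cols[0]
        pident = float(cols[2])
        evalue = float(cols[10])
        if pident >= min_identity and evalue <= max_evalue:
            keep.add(qseqid)
    return keep


def filter_fasta(fasta_lines, keep_ids):
    """Yield the FASTA lines of records whose ID is in keep_ids."""
    writing = False
    for line in fasta_lines:
        if line.startswith(">"):
            parts = line[1:].split()
            writing = bool(parts) and parts[0] in keep_ids
        if writing:
            yield line
```

Writing the lines from `filter_fasta` to a new file gives an assembly restricted to transcripts with support in the known transcriptome. Transcripts without a hit are not necessarily spurious (they could be novel or divergent), so it may be worth keeping them in a separate file rather than discarding them.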

written 3 months ago by genomax
3 months ago by amandine.velt:


I had the same problem as you. My solution was to use the DRAP tool, which runs Trinity and then filters the resulting assembly. The results are very good, and I get a number of transcripts close to what I expected.

First, you need to assemble the transcriptome of each of your conditions separately with the runDrap command. Then, DRAP provides a tool to merge the per-condition transcriptomes into a single complete transcriptome, the runMeta command.

If you're interested, here is the link to the tool:

Best, Amandine


Thank you for your suggestion, I will give this a try.

written 3 months ago by karsa692