Question: Trinity assembly way too many transcripts!
3 months ago, karsa692 wrote:

I am hoping someone can help me troubleshoot some issues I am having building an assembly that I would like to use as a reference for DE analysis. As the title states, I am using Trinity and I am getting way too many transcripts. Before I go any further, I will state that I have read the article "There are too many transcripts, what do I do?", but I think my problem is a bit beyond the scope of that post.

I am trying to build a de novo transcriptome assembly using pooled samples of sea urchin larvae (four pooled samples, one per treatment level). I am getting around 800,000 transcripts, where I would normally expect around 70,000 to 100,000 for similar data sets. I know that at least part of the issue is that I have lots of duplicate genes/isoforms due to the nature of the samples (many pooled individuals); however, this still seems extreme. I have tried collapsing my assemblies using Grouper and CD-HIT, but the assemblies are so large that the collapsed versions are still very large. Does anyone have any suggestions?
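
For anyone wanting to see how a transcript count like this splits into Trinity "genes" versus isoforms, tallying the FASTA headers is a quick first check. This is only a minimal sketch: it assumes the usual Trinity header naming (for example TRINITY_DN1_c0_g1_i1, where the trailing _i<N> marks the isoform), and the function name and example records are made up for illustration.

```python
import re
from collections import Counter

# Assumption: Trinity-style IDs such as TRINITY_DN1_c0_g1_i1, where
# everything before the trailing _i<N> identifies the "gene".
ISOFORM_SUFFIX = re.compile(r"_i\d+$")


def summarize_trinity_headers(lines):
    """Return (n_transcripts, n_genes, max_isoforms_per_gene) from FASTA lines."""
    isoforms_per_gene = Counter()
    for line in lines:
        if not line.startswith(">"):
            continue  # skip sequence lines
        parts = line[1:].split()
        if not parts:
            continue  # skip malformed empty headers
        transcript_id = parts[0]
        gene_id = ISOFORM_SUFFIX.sub("", transcript_id)
        isoforms_per_gene[gene_id] += 1
    n_transcripts = sum(isoforms_per_gene.values())
    n_genes = len(isoforms_per_gene)
    max_isoforms = max(isoforms_per_gene.values(), default=0)
    return n_transcripts, n_genes, max_isoforms


# Hypothetical toy records, not real data.
example = [
    ">TRINITY_DN1_c0_g1_i1 len=300",
    "ACGT",
    ">TRINITY_DN1_c0_g1_i2 len=280",
    "ACGT",
    ">TRINITY_DN2_c0_g1_i1 len=500",
    "ACGT",
]
print(summarize_trinity_headers(example))  # (3, 2, 2)
```

For a real assembly, pass an open file handle on the Trinity FASTA instead of the toy list. A gene count far below the transcript count points towards isoform inflation rather than hundreds of thousands of distinct loci.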

Thanks.

rna-seq assembly • 191 views

It would help if you told us how much data went into this analysis, the size of the genome, and how you pre-processed the data.

Out of curiosity, is there a specific reason to do a de novo assembly? The sea urchin genome has been available for over a decade and has a defined transcriptome (for S. purpuratus, available from Echinobase, NCBI or Ensembl). You could use this known transcriptome to weed out spurious transcripts from your assembly.
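
One possible way to do that weeding, sketched in Python: align the assembled transcripts against the reference transcriptome with BLAST, save the tabular output (outfmt 6, where column 1 is the query ID and column 11 is the e-value), and keep only the transcripts that have a hit. The function names, the 1e-5 cutoff, and the toy IDs below are assumptions for illustration, not a recommended setting.

```python
def ids_with_reference_hit(blast_lines, max_evalue=1e-5):
    """Collect query IDs with at least one BLAST outfmt 6 hit at or below max_evalue."""
    keep = set()
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # not a standard 12-column outfmt 6 row
        query_id, evalue = fields[0], float(fields[10])
        if evalue <= max_evalue:
            keep.add(query_id)
    return keep


def filter_fasta(fasta_lines, keep_ids):
    """Yield the FASTA lines of records whose ID (first header token) is in keep_ids."""
    writing = False
    for line in fasta_lines:
        if line.startswith(">"):
            parts = line[1:].split()
            writing = bool(parts) and parts[0] in keep_ids
        if writing:
            yield line


# Hypothetical toy inputs; subject IDs are placeholders.
blast = [
    "TRINITY_DN1_c0_g1_i1\tREF_0001\t98.5\t300\t4\t0\t1\t300\t10\t309\t1e-150\t540",
    "TRINITY_DN2_c0_g1_i1\tREF_0002\t80.0\t50\t10\t0\t1\t50\t1\t50\t0.5\t30",
]
fasta = [
    ">TRINITY_DN1_c0_g1_i1 len=300",
    "ACGTACGT",
    ">TRINITY_DN2_c0_g1_i1 len=500",
    "GGGGCCCC",
    ">TRINITY_DN3_c0_g1_i1 len=200",
    "TTTTAAAA",
]
kept = list(filter_fasta(fasta, ids_with_reference_hit(blast)))
print(kept)  # ['>TRINITY_DN1_c0_g1_i1 len=300', 'ACGTACGT']
```

Keep in mind that a filter like this also throws away anything genuinely novel, so it fits best when your species is the same as, or very close to, the reference.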

written 3 months ago by genomax
3 months ago, amandine.velt wrote:

Hi,

I had the same problem as you. My solution was to use the DRAP tool, which uses Trinity but applies filters afterwards. The results are very good, and I get a number of transcripts close to the expected number.

First, you need to assemble the transcriptome of each of your conditions separately with the runDrap command. Then DRAP provides a tool, the runMeta command, to merge the per-condition transcriptomes into a single complete transcriptome.

If you're interested, here is the link to the tool: http://www.sigenae.org/drap/

Best, Amandine


Thank you for your suggestion, I will give this a try.

written 3 months ago by karsa692