Question

Trinity assembly produces way too many transcripts!

4.6 years ago

karsa692 • 0

I am hoping someone can help me troubleshoot some issues I am having building an assembly that I would like to use as a reference for DE analysis. As the title states, I am using Trinity and I am getting way too many transcripts. Before I go any further, I will state that I have read the article "There are too many transcripts, what do I do?", but I think my problem is beyond the scope of that post.

I am trying to build a de novo transcriptome assembly from pooled samples of sea urchin larvae (four samples, one for each of four treatment levels). I am getting around 800,000 transcripts, where I would normally expect around 70,000 to 100,000 for similar data sets. I know that at least part of the issue is that I have many duplicate genes/isoforms due to the nature of the samples (many pooled individuals); however, this still seems extreme. I have tried collapsing my assemblies using Grouper and CD-HIT, but the assemblies are so large that the collapsed versions are still very large. Does anyone have any suggestions?
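For reference, a rough way to see how much of the count is isoform-level redundancy is to compare the number of Trinity "genes" with the number of transcripts. The sketch below assumes standard Trinity FASTA headers such as `>TRINITY_DN1000_c115_g5_i1 len=247`, where the part before the trailing `_i<N>` identifies the gene; the function name `count_genes_and_isoforms` is hypothetical and only for illustration.

```python
import re
from collections import Counter

# Assumption: Trinity-style IDs, e.g. TRINITY_DN1000_c115_g5_i1.
# Everything before the trailing "_i<N>" is treated as the gene ID.
ISOFORM_SUFFIX = re.compile(r"_i\d+$")


def count_genes_and_isoforms(fasta_lines):
    """Return (gene count, transcript count, Counter of isoforms per gene)."""
    per_gene = Counter()
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # skip sequence lines
        parts = line[1:].split()
        if not parts:
            continue  # skip empty headers
        gene_id = ISOFORM_SUFFIX.sub("", parts[0])
        per_gene[gene_id] += 1
    return len(per_gene), sum(per_gene.values()), per_gene


if __name__ == "__main__":
    # Tiny made-up example; in practice, pass an open file handle instead.
    example = [
        ">TRINITY_DN1_c0_g1_i1 len=300",
        "ACGT",
        ">TRINITY_DN1_c0_g1_i2 len=280",
        "ACGT",
        ">TRINITY_DN2_c0_g1_i1 len=500",
        "ACGT",
    ]
    genes, transcripts, per_gene = count_genes_and_isoforms(example)
    print(f"{genes} genes, {transcripts} transcripts")
    print("most isoforms per gene:", per_gene.most_common(1))
```

If the gene count is also far above expectation, the excess is probably not just isoforms, which may point to other sources such as polymorphism among pooled individuals or contamination.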

Thanks.

RNA-Seq Assembly • 1.6k views

updated 4.6 years ago by amandine.velt ▴ 40 • written 4.6 years ago by karsa692 • 0

It would help if you told us how much data went into this analysis, the size of the genome, and how you pre-processed the data.

Out of curiosity, is there a specific reason to do a de novo assembly? The sea urchin genome has been available for over a decade and has a defined transcriptome (for S. purpuratus, available from Echinobase, NCBI or Ensembl). You could use this known transcriptome to weed out spurious transcripts from your assembly.
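One possible way to do that weeding, sketched under some assumptions: align the assembly against the known transcriptome with BLAST using tabular output (`-outfmt 6`, where column 1 is the query ID and column 11 is the e-value), then keep only the assembled transcripts that have a hit. The function names and the `1e-5` e-value cutoff below are illustrative choices, not a recommended setting.

```python
def ids_with_hits(blast_lines, max_evalue=1e-5):
    """Collect query IDs that have at least one hit at or below max_evalue.

    Assumes BLAST tabular output (outfmt 6) with the 12 default columns.
    """
    keep = set()
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # not a default outfmt 6 row
        if float(fields[10]) <= max_evalue:  # column 11 = e-value
            keep.add(fields[0])  # column 1 = query (assembled transcript) ID
    return keep


def filter_fasta(fasta_lines, keep_ids):
    """Yield only the FASTA records whose ID is in keep_ids."""
    keeping = False
    for line in fasta_lines:
        if line.startswith(">"):
            parts = line[1:].split()
            keeping = bool(parts) and parts[0] in keep_ids
        if keeping:
            yield line


if __name__ == "__main__":
    # Made-up rows for illustration only.
    blast = [
        "t1\tref1\t98.5\t200\t3\t0\t1\t200\t10\t209\t1e-50\t350",
        "t2\tref2\t80.0\t50\t10\t0\t1\t50\t1\t50\t0.5\t30",
    ]
    fasta = [">t1 len=200", "ACGT", ">t2 len=50", "GGCC", ">t3", "TTAA"]
    for line in filter_fasta(fasta, ids_with_hits(blast)):
        print(line)
```

Note that a filter like this also discards genuinely novel transcripts and anything too divergent from S. purpuratus to align, so it fits best when the study species is close to the reference.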

4.6 years ago by GenoMax 141k

score 3 · Answer 1 · 2019-09-05

4.6 years ago

amandine.velt ▴ 40

Hi,

I had the same problem as you. My solution was to use the DRAP tool, which runs Trinity and then applies filters afterwards. The results are very good, and I get a number of transcripts close to the expected number.

First, you need to assemble the transcriptome of each of your conditions with the runDrap command. Then, DRAP provides a tool to merge the transcriptomes of each condition into a single complete transcriptome, with the runMeta command.

If you're interested, here is the link to the tool: http://www.sigenae.org/drap/

Best, Amandine

4.6 years ago by amandine.velt ▴ 40

Thank you for your suggestion; I will give this a try.

4.6 years ago by karsa692 • 0