Question

merged transcripts from RNA de novo assembly to create a reference transcriptome

2

9.5 years ago

teixe005 ▴ 30

Hello,

I have created de novo assemblies from RNA-seq reads using Velvet/Oases for different subjects at several time points. For every subject I have a merged file with ~100,000 de novo transcripts, produced by merging transcripts assembled with different k-mer sizes. My ultimate goal is to perform differential expression analysis on this data set. The next step is to create a reference transcriptome that contains all the transcripts from all subjects and time points, with no ambiguity, so I can map the de novo transcripts to that reference transcriptome and quantify expression.

I am looking for a program that will merge all the transcripts from all subjects and time points into a transcriptome that contains just one copy of each transcript and does not miss any of the de novo transcripts that were found. Any suggestions? Is cd-hit a good option?

Thank you in advance for your help. I really appreciate it.

Assembly RNA-Seq • 7.7k views

ADD COMMENT • link updated 13 months ago by Ram 43k • written 9.5 years ago by teixe005 ▴ 30

0

Are you working with a species that does not have a reference genome sequence?

ADD REPLY • link 9.5 years ago by Malachi Griffith 19k

0

I'm working with the equine genome. There is a reference genome, but we know it has assembly and annotation problems. That is why we performed a de novo assembly of the RNA reads (using Velvet/Oases) in addition to the reference-based one (using Bowtie/TopHat for mapping, followed by Cufflinks).

ADD REPLY • link 9.5 years ago by teixe005 ▴ 30

score 4 · Answer 1 · 2014-10-30

4

9.5 years ago

Richard Smith-Unna ▴ 140

Corset [software | paper] is much better at merging transcriptome assemblies than cd-hit-est. Strictly speaking, it is a tool for clustering contigs in a transcriptome assembly, but that makes it useful for merging, as demonstrated in the paper.

ADD COMMENT • link 9.5 years ago by Richard Smith-Unna ▴ 140

Ram · Answer 2 · 2014-10-23

1

9.5 years ago

Ram 43k

When I worked on my de novo transcriptome, we used cd-hit-est to cluster the merged assemblies from Velvet/Oases. It is one of the ways to go - the only one I know, in fact - but I was never completely comfortable with it. The technique is self-referential, so validation feels a bit quirky.

ADD COMMENT • link 2.2 years ago by Ram 43k

1

Thank you very much for your comment. I'll merge them using cd-hit-est. We'll see how it goes.

ADD REPLY • link updated 2.2 years ago by Ram 43k • written 9.5 years ago by teixe005 ▴ 30

1

As RamRS said, it is a little tricky, especially choosing the similarity cutoff for merging the shorter transcripts. Lowering the similarity cutoff merges isoforms and paralogs, while raising it retains the spurious contigs that were generated. So we ran cd-hit-est at multiple cutoffs and conservatively chose the cutoffs at which there was no drastic drop in the number of merged contigs. Still, this is not "the way" to do it.
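
A minimal sketch of the cutoff sweep described above, not the posters' actual script. It assumes `cd-hit-est` is on the PATH and that all per-subject assemblies have already been concatenated into one FASTA file. The function names, file names, and the list of cutoffs are hypothetical; `-i`, `-o`, `-c`, and `-n` are standard cd-hit-est options, and the word-length table follows the CD-HIT user guide's recommendations.

```python
import subprocess


def word_size_for(identity: float) -> int:
    """Word length (-n) recommended for a given cd-hit-est identity cutoff (-c)."""
    if identity >= 0.95:
        return 10
    if identity >= 0.90:
        return 8
    if identity >= 0.88:
        return 7
    if identity >= 0.85:
        return 6
    if identity >= 0.80:
        return 5
    return 4


def output_path(out_prefix: str, identity: float) -> str:
    """Hypothetical naming scheme: one clustered FASTA per cutoff."""
    return f"{out_prefix}_c{identity:.2f}.fasta"


def cd_hit_est_command(fasta: str, out_prefix: str, identity: float) -> list:
    """Build the cd-hit-est command line for one identity cutoff."""
    return [
        "cd-hit-est",
        "-i", fasta,
        "-o", output_path(out_prefix, identity),
        "-c", f"{identity:.2f}",
        "-n", str(word_size_for(identity)),
    ]


def count_fasta_records(path: str) -> int:
    """Count sequences in a FASTA file by counting header lines."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def relative_drops(counts: dict) -> dict:
    """Fractional loss of contigs at each cutoff relative to the next-higher cutoff."""
    cutoffs = sorted(counts, reverse=True)
    return {
        lower: (counts[higher] - counts[lower]) / counts[higher]
        for higher, lower in zip(cutoffs, cutoffs[1:])
    }


def run_sweep(fasta: str, out_prefix: str, identities: list) -> dict:
    """Run cd-hit-est once per cutoff and return {cutoff: surviving contig count}."""
    counts = {}
    for identity in identities:
        subprocess.run(cd_hit_est_command(fasta, out_prefix, identity), check=True)
        counts[identity] = count_fasta_records(output_path(out_prefix, identity))
    return counts

# Example call (hypothetical file names):
# counts = run_sweep("all_subjects.fasta", "merged", [0.95, 0.90, 0.85, 0.80])
# print(relative_drops(counts))
```

A large jump in `relative_drops` between two neighbouring cutoffs would suggest that isoforms or paralogs are starting to collapse, so the cutoff just above that jump is the conservative choice in the sense used here.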

ADD REPLY • link updated 2.2 years ago by Ram 43k • written 9.5 years ago by Prakki Rama ★ 2.7k

0

I agree. A similarity cutoff of ~90% is quite stringent, and at 80% the number of contigs fell dramatically, at least in my case. Without prior knowledge of the number of genes in the organism, gauging accuracy can be difficult.

ADD REPLY • link 2.2 years ago by Ram 43k

Ram · Answer 3 · 2014-11-27

0

9.4 years ago

h.mon 35k

This paper points to the EvidentialGene pipeline as providing a high-quality merged transcriptome. I've used Corset; the results were a bit puzzling and did not match the description in the manual, but I did not follow through.

ADD COMMENT • link updated 2.2 years ago by Ram 43k • written 9.4 years ago by h.mon 35k