Question: merged transcripts from RNA de novo assembly to create a reference transcriptome
4.5 years ago by
teixe00530
United States
teixe00530 wrote:

Hello,

I have created de novo assemblies from RNA-seq reads using Velvet/Oases for different subjects at several time points. For every subject I have a merged file with ~100,000 de novo transcripts, created by merging transcripts assembled with different k-mer sizes. My ultimate goal is to perform differential expression analysis on this data set. The next step is to create a reference transcriptome that contains all the transcripts from all subjects and time points, with no ambiguity, so I can map the de novo transcripts to that reference transcriptome and quantify expression.

My question is about a program that will merge all the transcripts from all subjects and time points and create a transcriptome that has just one copy of each transcript and is not missing any of the de novo transcripts that were found. Any suggestions? Is cd-hit a good option?

Thank you in advance for your help. I really appreciate it.
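
Whichever clustering tool ends up being used, the per-subject files first have to be combined into one FASTA. A minimal sketch of that step, assuming each merged Oases file is plain FASTA and that transcript names (e.g. `Locus_1_Transcript_1/3_...`) can repeat across subjects, so each header gets a sample label prepended. The function name `merge_fasta`, the labels, and the file names are hypothetical, not part of any tool mentioned here.

```python
def merge_fasta(inputs, merged_path):
    """Concatenate per-sample FASTA files into one file.

    `inputs` maps a sample label (hypothetical, e.g. "subj1") to the path of
    that sample's merged assembly. Each header is rewritten as
    ">label|original_header" so identical transcript names coming from
    different assemblies stay distinguishable. Returns the record count.
    """
    n_records = 0
    with open(merged_path, "w") as out:
        for label, fasta_path in inputs.items():
            with open(fasta_path) as handle:
                for line in handle:
                    if line.startswith(">"):
                        # Header line: prepend the sample label.
                        out.write(f">{label}|{line[1:].rstrip()}\n")
                        n_records += 1
                    elif line.strip():
                        # Sequence line: copy as-is, skipping blank lines.
                        out.write(line.rstrip() + "\n")
    return n_records
```

For example, `merge_fasta({"subj1": "subj1_merged.fa", "subj2": "subj2_merged.fa"}, "all_subjects.fa")` would write one combined file that can then be passed to a clustering tool. This only concatenates; it does not remove any redundancy by itself.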

rna-seq forum assembly • 5.6k views
ADD COMMENTlink modified 4.4 years ago by h.mon24k • written 4.5 years ago by teixe00530

Are you working with a species that does not have a reference genome sequence?

ADD REPLYlink written 4.5 years ago by Malachi Griffith17k

I'm working with the equine genome. There is a reference genome, but we know this reference has problems with assembly and annotation. This is why we performed a de novo assembly of the RNA reads (using Velvet/Oases), in addition to the reference-based one (using Bowtie/TopHat for mapping followed by Cufflinks).

ADD REPLYlink written 4.5 years ago by teixe00530
4.5 years ago by
UK
Richard Smith-Unna130 wrote:

Corset [software | paper] is much better at merging transcriptome assemblies than cd-hit-est. Strictly speaking, it is a tool for clustering contigs in a transcriptome assembly, which also makes it useful for merging, as demonstrated in the paper.

ADD COMMENTlink modified 4.5 years ago • written 4.5 years ago by Richard Smith-Unna130
4.5 years ago by
RamRS21k
Houston, TX
RamRS21k wrote:

When I worked on my de novo transcriptome, we used cd-hit-est to cluster the merged assemblies from Velvet/Oases. It is one of the ways to go - the only one I know, in fact - but I was never completely comfortable with it. The technique is self-referential and hence validation feels a bit quirky.

ADD COMMENTlink written 4.5 years ago by RamRS21k

Thank you very much for your comment. I'll merge them using cd-hit-est. We'll see how it goes.

ADD REPLYlink written 4.5 years ago by teixe00530

As RamRS said, it is a little tricky, especially selecting the similarity cutoff for merging the shorter transcripts. Lowering the similarity cutoff merges isoforms and paralogs, while raising it retains spurious contigs. So we ran cd-hit-est at multiple cutoffs and conservatively chose the cutoff at which there was no drastic drop in the number of merged contigs. Still, this is not "the way" to go about it.
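
A rough sketch of how such a cutoff sweep could be scripted, not the exact procedure used in this thread. It assumes `cd-hit-est` is on the PATH and uses its `-i`, `-o`, `-c` and `-n` options, with word sizes taken from the ranges suggested in the CD-HIT user guide. The helper names, the output prefix, and the 10% "drastic drop" threshold (`max_loss`) are illustrative assumptions.

```python
import subprocess


def count_fasta_records(path):
    """Count sequences in a FASTA file by counting header lines."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def word_size_for(cutoff):
    """Word size (-n) for a given identity cutoff, per the CD-HIT user guide."""
    if cutoff >= 0.95:
        return 10
    if cutoff >= 0.90:
        return 8
    if cutoff >= 0.88:
        return 7
    if cutoff >= 0.85:
        return 6
    if cutoff >= 0.80:
        return 5
    return 4


def run_sweep(merged_fasta, cutoffs, prefix="cdhit"):
    """Run cd-hit-est once per cutoff; return {cutoff: number of contigs kept}."""
    counts = {}
    for cutoff in cutoffs:
        out = f"{prefix}_{cutoff:.2f}.fasta"  # hypothetical output naming
        subprocess.run(
            ["cd-hit-est", "-i", merged_fasta, "-o", out,
             "-c", f"{cutoff:.2f}", "-n", str(word_size_for(cutoff))],
            check=True,
        )
        counts[cutoff] = count_fasta_records(out)
    return counts


def relative_drops(counts):
    """Fraction of contigs lost between consecutive cutoffs, most stringent first."""
    ordered = sorted(counts, reverse=True)
    return [(hi, lo, (counts[hi] - counts[lo]) / counts[hi])
            for hi, lo in zip(ordered, ordered[1:])]


def most_permissive_before_big_drop(counts, max_loss=0.10):
    """Lowest cutoff reached before a step loses more than `max_loss` of contigs.

    `max_loss` is an arbitrary illustrative threshold, not a recommended value.
    """
    chosen = max(counts)
    for _hi, lo, loss in relative_drops(counts):
        if loss > max_loss:
            break
        chosen = lo
    return chosen
```

Something like `most_permissive_before_big_drop(run_sweep("all_subjects.fa", [0.99, 0.95, 0.90, 0.85]))` would then report the last cutoff before the contig count falls sharply. The threshold only formalises "no drastic drop"; it does not tell you whether the merged contigs are biologically correct.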

ADD REPLYlink written 4.5 years ago by Prakki Rama2.2k

I agree. A similarity cutoff of ~90% is quite stringent, and at 80% the number of contigs fell dramatically, at least in my case. Without prior knowledge of the number of genes in the organism, gauging accuracy can be difficult.

ADD REPLYlink written 4.5 years ago by RamRS21k
4.4 years ago by
h.mon24k
Brazil
h.mon24k wrote:

This paper points to the EvidentialGene pipeline as providing a high quality merged transcriptome. I've used Corset and the results were a bit puzzling and did not follow the manual description, but I did not follow through.

ADD COMMENTlink written 4.4 years ago by h.mon24k