Question: strand-specific transcriptome Oases vs. CLC
wd wrote (5.2 years ago):


I assembled an animal transcriptome de novo from strand-specific paired-end Illumina data using the Oases/Velvet package, which supports strand-specific data. From the same reads I also assembled a transcriptome with CLC Genomics Workbench, which does not support strand-specific data for de novo assembly. Comparing the two assemblies (Oases vs. CLC) for more than 50 reference genes showed that the CLC assembly was much better than the Oases one: in the CLC transcriptome, genes were not fragmented into several contigs, and more full-length genes were assembled.

I understand that strand-specific data are very useful for measuring strand-specific expression, but is it also favourable to use strand information when assembling a transcriptome? A literature search didn't make me much wiser.



Tags: rna-seq, next-gen, assembly
Damian Kao wrote (5.2 years ago):

I've done some tests where I performed two Trinity assemblies from the same set of stranded PE data: one specifying strandedness and the other not. I then mapped the stranded PE reads back to each assembly to see how many reads would map in mixed orientations in the stranded and unstranded assemblies.
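To make the mapping-back check concrete, here is a minimal Python sketch that tallies, per contig, how many aligned reads agree ("sense") or disagree ("antisense") with the contig's orientation. It assumes the reads were aligned to the assembly and written as SAM text, and that the library is dUTP-style, where read 1 comes from the antisense strand (what Trinity calls `--SS_lib_type RF`). The function `count_orientations` and its `lib_type` parameter are hypothetical, not part of any tool; the flag bits follow the SAM specification.

```python
from collections import defaultdict

# SAM FLAG bits (from the SAM format specification)
PAIRED = 0x1
UNMAPPED = 0x4
REVERSE = 0x10
READ1 = 0x40
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def count_orientations(sam_lines, lib_type="RF"):
    """Count reads per contig whose inferred fragment strand agrees ('sense')
    or disagrees ('antisense') with the contig, for a stranded library.

    lib_type "RF": read 1 comes from the antisense strand (e.g. dUTP).
    lib_type "FR": read 1 comes from the sense strand.
    """
    counts = defaultdict(lambda: {"sense": 0, "antisense": 0})
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        flag, contig = int(fields[1]), fields[2]
        # Ignore unmapped reads and non-primary alignments
        if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
            continue
        # Unpaired reads are treated like read 1
        is_read1 = bool(flag & READ1) or not (flag & PAIRED)
        on_reverse = bool(flag & REVERSE)
        if lib_type == "RF":
            # Read 1 aligns reverse (read 2 forward) when the transcript is sense
            fragment_sense = on_reverse if is_read1 else not on_reverse
        else:
            # Read 1 aligns forward (read 2 reverse) when the transcript is sense
            fragment_sense = (not on_reverse) if is_read1 else on_reverse
        counts[contig]["sense" if fragment_sense else "antisense"] += 1
    return dict(counts)


example = [
    "@HD\tVN:1.6\tSO:unsorted",
    # flag 83 = paired + proper + reverse + read 1 -> sense under RF
    "\t".join(["frag1", "83", "c1", "100", "60", "50M", "=", "10", "-140", "*", "*"]),
    # flag 163 = paired + proper + mate reverse + read 2 (forward) -> sense under RF
    "\t".join(["frag1", "163", "c1", "10", "60", "50M", "=", "100", "140", "*", "*"]),
    # flag 99 = paired + proper + mate reverse + read 1 (forward) -> antisense under RF
    "\t".join(["frag2", "99", "c1", "10", "60", "50M", "=", "100", "140", "*", "*"]),
]
print(count_orientations(example))  # {'c1': {'sense': 2, 'antisense': 1}}
```

Each mate is counted separately, so the totals are read counts rather than fragment counts.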

I designated any transcript with more than 5 reads mapping in a single direction as single orientation, and any transcript with more than 5 reads mapping in both directions as mixed orientation.
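The per-transcript rule above can be written as a small classifier. This sketch assumes one reading of the rule: "mixed" when more than 5 reads map in each direction, "single" when more than 5 map in one direction but not the other, and no call otherwise. The names `classify` and `summarize` and the toy counts are hypothetical, for illustration only.

```python
def classify(sense, antisense, min_reads=5):
    """Assumed reading of the rule: 'mixed' if more than min_reads map in
    each direction, 'single' if more than min_reads map in only one."""
    above = (sense > min_reads, antisense > min_reads)
    if all(above):
        return "mixed"
    if any(above):
        return "single"
    return None  # too few reads in either direction to call


def summarize(counts, min_reads=5):
    """Fraction of all transcripts labelled 'single' and 'mixed'."""
    labels = [classify(c["sense"], c["antisense"], min_reads) for c in counts.values()]
    if not labels:
        return {}
    return {kind: labels.count(kind) / len(labels) for kind in ("single", "mixed")}


# Toy per-transcript read counts (hypothetical, for illustration only)
toy = {
    "t1": {"sense": 40, "antisense": 0},  # single
    "t2": {"sense": 12, "antisense": 9},  # mixed
    "t3": {"sense": 2, "antisense": 1},   # too few reads to call
    "t4": {"sense": 30, "antisense": 3},  # single
}
print(summarize(toy))  # {'single': 0.5, 'mixed': 0.25}
```

Transcripts with too few reads get no label, so the "single" and "mixed" fractions need not add up to 1.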

For my libraries, I found ~25% single-direction and ~1% mixed-direction in the stranded assembly, and ~25% single-direction and ~2-3% mixed-direction in the unstranded assembly. So more reads mapped in mixed directions in the unstranded assembly.

There were also far fewer transcripts in the unstranded assembly (~180k vs. 210k in the stranded one).

I think that, for transcriptome assembly, strandedness doesn't matter much for the majority of transcripts. But for the small proportion where there may be antisense transcription, an unstranded assembly might fuse sense and antisense transcripts.



Powered by Biostar version 2.3.0