Question: strand-specific transcriptome Oases vs. CLC
4.7 years ago
wd wrote:


I assembled an animal transcriptome de novo from strand-specific paired-end Illumina data using the Oases/Velvet package, which supports strand-specific data. Using the same reads, I also assembled a transcriptome with CLC Genomics Workbench, which does not support strand-specific data for de novo assembly. Comparing the two assemblies (Oases vs. CLC) for more than 50 reference genes showed that the CLC assembly was much better than the Oases one: in the CLC transcriptome, genes were not fragmented into several contigs, and more full-length genes were assembled.

I understand that strand-specific sequence data is very useful for measuring strand-specific expression, but is it also advantageous to use the strand information when assembling a transcriptome? A literature search did not leave me much wiser.



4.7 years ago
Damian Kao wrote:

I've done some tests where I ran two Trinity assemblies on the same set of stranded PE data, one specifying strandedness and the other treating the data as unstranded. Then I mapped the stranded PE reads back to each assembly to see how many transcripts had reads mapping in mixed orientations in the stranded and unstranded assemblies.

Any transcript with more than 5 reads mapping in only one direction, I designated single orientation. Any transcript with more than 5 reads mapping in each of the two directions, I designated mixed orientation.
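To make the rule above concrete, here is a minimal Python sketch of how such a classification could be done once you have per-transcript read counts by direction. This is not the script used for the test; the function names, the "unclassified" label for transcripts below the threshold, and the example counts are all assumptions made for illustration. It also assumes "mixed" means more than 5 reads in each direction, and "single" means only one direction passes that threshold.

```python
from collections import Counter

MIN_READS = 5  # threshold from the post: "more than 5 reads"


def classify_orientation(forward, reverse, min_reads=MIN_READS):
    """Label one transcript from its forward- and reverse-mapped read counts.

    Assumed interpretation of the rule:
      mixed        -> more than min_reads in BOTH directions
      single       -> more than min_reads in exactly one direction
      unclassified -> neither direction passes the threshold
    """
    fwd_supported = forward > min_reads
    rev_supported = reverse > min_reads
    if fwd_supported and rev_supported:
        return "mixed"
    if fwd_supported or rev_supported:
        return "single"
    return "unclassified"


def orientation_fractions(counts):
    """Return the fraction of transcripts per label.

    counts: dict mapping transcript id -> (forward_reads, reverse_reads)
    """
    labels = Counter(classify_orientation(f, r) for f, r in counts.values())
    total = sum(labels.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in labels.items()}


# Hypothetical counts, for illustration only (not real data)
example_counts = {
    "tx1": (40, 0),   # only forward reads
    "tx2": (12, 9),   # both directions pass the threshold
    "tx3": (3, 2),    # too few reads either way
    "tx4": (0, 25),   # only reverse reads
}
fractions = orientation_fractions(example_counts)
```

The per-direction counts themselves would come from the read-to-assembly alignments, for example by tallying the strand flag of each mapped read per transcript; that step is left out here because it depends on the aligner and library type.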

For my libraries, I found ~25% single direction and ~1% mixed direction for the stranded assembly, and ~25% single direction and ~2-3% mixed direction for the unstranded assembly. So a larger share of transcripts had reads mapping in mixed directions in the unstranded assembly.

There were also far fewer transcripts in the unstranded assembly (~180k vs. ~210k in the stranded one).

I think that, in terms of transcriptome assembly, strandedness doesn't matter much for the majority of transcripts. But for the small proportion where there may be antisense transcription, an unstranded assembly might fuse transcripts that should be kept separate.

