Question

Are there any de novo assemblers that can handle either transcriptomic or genomic reads?

6.9 years ago

maxwhjohn1988 ▴ 130

I'm looking to make some proposals for QC pipelines for a range of sequencing applications. Two of the applications I'm thinking about are genome sequencing and transcriptome sequencing. When a reference is not available, a quick-and-dirty assembly to map reads back to is desirable.

I wondered if anyone knows of an assembler that contains modules for both genomic and transcriptomic de novo assembly. This would be desirable because I could then recommend a single assembler for reference-free QC of both applications, with an option to indicate which type of assembly is desired. I've not been able to find anything by googling.

I appreciate that this is very unlikely given the disparities between the characteristics of the two types of data. To make matters worse, I'm looking for something that produces decent results quickly (i.e. not something liable to hold up QC by taking days to run). The assembly doesn't need to be perfect or even particularly good; it just needs to be good enough to give confidence in metrics such as the percentage of reads that map to contigs. Freely available software would be a massive bonus.
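For anyone building such a QC step: the "percentage of reads that map to contigs" metric can be computed from a SAM file of reads aligned back to the draft assembly with any aligner. The sketch below is a hypothetical helper (`percent_mapped` is an illustrative name, not part of any tool). It counts each read once via its primary record, using the standard SAM FLAG bits 0x4 (unmapped), 0x100 (secondary) and 0x800 (supplementary). `samtools flagstat` reports comparable numbers directly.

```python
from typing import Iterable

# Standard SAM FLAG bits.
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def percent_mapped(sam_lines: Iterable[str]) -> float:
    """Percentage of primary SAM records that are mapped (hypothetical QC helper)."""
    total = 0
    mapped = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and @HD/@SQ/@PG header lines
        flag = int(line.split("\t")[1])
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # count each read once, via its primary record
        total += 1
        if not (flag & UNMAPPED):
            mapped += 1
    return 100.0 * mapped / total if total else 0.0


if __name__ == "__main__":
    # Toy records: one mapped read, one unmapped read.
    toy_sam = [
        "@SQ\tSN:contig1\tLN:500",
        "r1\t0\tcontig1\t10\t60\t4M\t*\t0\t0\tACGT\tIIII",
        "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    ]
    print(f"{percent_mapped(toy_sam):.1f}% of reads mapped")
```

With paired-end data each mate is a separate primary record, so mates are counted individually. An open file handle (e.g. `open("aln.sam")`) can be passed in place of the list.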

Thanks.

Assembly DNA RNA • 1.5k views

updated 6.9 years ago by Damian Kao 16k • written 6.9 years ago by maxwhjohn1988 ▴ 130

score 2 · Answer 1 · 2018-08-02

6.9 years ago

Damian Kao 16k

The two confounding factors that differentiate transcriptomic and genomic assemblies are read coverage and splicing.

In a genomic assembly, you are assuming relatively even coverage across the genome, with spikes of higher coverage for repetitive/multi-copy regions. This coverage information can then be used as a heuristic when processing the assembly graph.
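As a toy illustration of that heuristic, the sketch below treats the median unitig coverage as the expected coverage of single-copy sequence and estimates each unitig's copy number as its ratio to that median. The function names and the median-based rule are assumptions for illustration only; real assemblers use their own, more elaborate coverage models.

```python
from statistics import median


def estimate_copy_numbers(coverage: dict[str, float]) -> dict[str, int]:
    """Estimate each unitig's copy number as coverage / median unitig coverage.

    Assumes most unitigs are single-copy, so the median approximates the
    expected coverage of a unique genomic region.
    """
    if not coverage:
        return {}
    expected = median(coverage.values())
    if expected <= 0:
        raise ValueError("median coverage must be positive")
    return {uid: max(1, round(cov / expected)) for uid, cov in coverage.items()}


def likely_repeats(coverage: dict[str, float]) -> list[str]:
    """Unitig IDs whose estimated copy number is greater than one."""
    copies = estimate_copy_numbers(coverage)
    return sorted(uid for uid, n in copies.items() if n > 1)


if __name__ == "__main__":
    # Hypothetical per-unitig mean k-mer coverage values.
    toy = {"u1": 30.0, "u2": 31.0, "u3": 29.0, "u4": 62.0, "u5": 90.0}
    print(estimate_copy_numbers(toy))
    print(likely_repeats(toy))
```

The same rule would misfire on RNA-seq unitigs, where coverage tracks expression level rather than copy number.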

In a transcriptomic assembly, you can't assume even coverage, since genes are expressed at very different levels. Splice forms also mean that your assembly graph will have multiple "correct" paths. Using coverage for graph processing in this case can be very complex.
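To make the "multiple correct paths" point concrete, this toy sketch enumerates every path through a small, hypothetical exon-level graph containing one skippable exon. Real transcriptome assemblers work on k-mer graphs and must decide which paths the reads actually support; this only shows how a single graph can encode several valid isoforms.

```python
def enumerate_paths(graph: dict[str, list[str]], source: str, sink: str) -> list[list[str]]:
    """Return every source-to-sink path in a directed acyclic graph (depth-first)."""
    paths: list[list[str]] = []

    def walk(node: str, path: list[str]) -> None:
        if node == sink:
            paths.append(path)
            return
        for nxt in graph.get(node, []):
            walk(nxt, path + [nxt])

    walk(source, [source])
    return paths


# Hypothetical exon graph: exon2 is a cassette exon that can be skipped,
# so two different paths (isoforms) are both "correct".
SPLICE_GRAPH = {
    "exon1": ["exon2", "exon3"],
    "exon2": ["exon3"],
    "exon3": [],
}

if __name__ == "__main__":
    for isoform in enumerate_paths(SPLICE_GRAPH, "exon1", "exon3"):
        print(" -> ".join(isoform))
```

With n independently skippable exons in series, the number of source-to-sink paths grows as 2^n.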

As a quick-and-dirty solution, you can just use a generic assembler to build unitigs from either transcriptomic or genomic data, because genomic and transcriptomic assembly usually share a common unitig-generation stage.
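The shared unitig stage can be sketched in a few lines: collect the k-mers present in the reads, then merge chains of k-mers in which each link is unambiguous (one successor on one side, one predecessor on the other). This is a deliberately simplified, hypothetical illustration (single strand only, no reverse complements, no k-mer counting or error filtering), not how ABySS or any other assembler is actually implemented.

```python
BASES = "ACGT"


def kmer_set(reads: list[str], k: int) -> set[str]:
    """All distinct k-mers occurring in the reads (single strand only)."""
    return {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}


def successors(kmer: str, kmers: set[str]) -> list[str]:
    return [kmer[1:] + b for b in BASES if kmer[1:] + b in kmers]


def predecessors(kmer: str, kmers: set[str]) -> list[str]:
    return [b + kmer[:-1] for b in BASES if b + kmer[:-1] in kmers]


def unitigs(reads: list[str], k: int) -> list[str]:
    """Compact maximal non-branching chains of k-mers into unitig sequences."""
    kmers = kmer_set(reads, k)
    visited: set[str] = set()
    result: list[str] = []

    def can_extend_back(kmer: str) -> bool:
        # True if the unitig containing `kmer` continues to the left.
        preds = predecessors(kmer, kmers)
        return (len(preds) == 1 and preds[0] != kmer
                and len(successors(preds[0], kmers)) == 1)

    def walk(start: str) -> str:
        visited.add(start)
        contig, current = start, start
        while True:
            succs = successors(current, kmers)
            if len(succs) != 1:
                break  # dead end or branch point
            nxt = succs[0]
            if nxt in visited or len(predecessors(nxt, kmers)) != 1:
                break  # cycle closed, or nxt is a merge point
            visited.add(nxt)
            contig += nxt[-1]
            current = nxt
        return contig

    # Start a unitig at every k-mer that cannot be extended to the left.
    for kmer in sorted(kmers):
        if kmer not in visited and not can_extend_back(kmer):
            result.append(walk(kmer))
    # Anything left lies on an isolated cycle with no branch point.
    for kmer in sorted(kmers):
        if kmer not in visited:
            result.append(walk(kmer))
    return result


if __name__ == "__main__":
    print(unitigs(["GATTACAGG", "GATTACATT"], 5))
```

On these two toy reads, the shared prefix becomes one unitig and the branch point splits the two alternative endings into separate unitigs, each overlapping the first by k-1 bases.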

ABySS and Trans-ABySS both use the same initial program to generate unitigs from genomic and transcriptomic data. You can stop the ABySS assembly process after the unitig stage and use the unitigs as a quick-and-dirty result.

Thanks, I'll check out ABySS with some toy data. Much appreciated.

6.9 years ago by maxwhjohn1988 ▴ 130

Just thought I'd provide an update (might be helpful for anyone else reading this in the future).

I abandoned my idea of a single piece of software for all cases. For a quick-and-dirty de novo transcriptome assembly I have settled on SOAPdenovo-Trans. It's not packaged with SOAPdenovo but is based on it. On some mouse RNA-seq FASTQ files I was given, I found it to be very fast (~30 minutes). N50 wasn't great (lots of small scaffolds), but that's likely because I had to guess some library specifics such as insert size. I'm now testing it with a human RNA-seq library.
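Since N50 is the headline metric here, a minimal sketch of how it is computed from a FASTA file of contigs or scaffolds: sort the sequence lengths in descending order and report the length at which the running total first reaches half of the total assembly size. The function names are illustrative, not from any particular tool.

```python
from typing import Iterable


def fasta_lengths(lines: Iterable[str]) -> list[int]:
    """Sequence lengths from FASTA text (handles multi-line records)."""
    lengths: list[int] = []
    current = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0  # start a new record
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths: Iterable[int]) -> int:
    """Length L such that sequences of length >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty assembly


if __name__ == "__main__":
    toy_fasta = [">s1", "ACGTACGTAC", ">s2", "ACGT", ">s3", "ACGTAC"]
    print(n50(fasta_lengths(toy_fasta)))
```

An open file handle can be passed to `fasta_lengths` directly, since each line is stripped of its newline.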

6.9 years ago by maxwhjohn1988 ▴ 130