Question: Are there any de novo assemblers which can handle (either) transcriptomic or genomic reads?
18 months ago, maxwhjohn1988 wrote:

I'm looking to make some proposals for QC pipelines for a range of sequencing applications. Two of the applications I'm thinking about are genome sequencing and transcriptome sequencing. When a reference is not available, a quick and dirty assembly is desirable to map reads to.

I wondered if anyone knows of any assembler which contains modules for doing both genomic and transcriptomic de novo assembly. This would be desirable as I could then recommend using a single assembler for reference-free QC of both applications, with an option set to indicate which type of assembly is desired. I've not been able to find anything by googling.

I appreciate that this is very unlikely given the disparities between the characteristics of the two types of data. To make matters worse, I'm looking for something which produces decent results quickly (i.e. not something which is liable to hold up QC by taking days to run). The assembly doesn't need to be perfect or even particularly good; it just needs to be good enough to give confidence in metrics such as the percentage of reads that map to contigs. Freely available software would be a massive bonus.
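To make the "% of reads which map to contigs" metric concrete, here is a minimal Python sketch that computes it from SAM records. It assumes the reads have already been aligned to the contigs with some aligner and that the output is SAM text; the function name `mapped_fraction` and the toy records are made up for illustration. It relies only on the standard SAM FLAG bits (0x4 unmapped, 0x100 secondary, 0x800 supplementary).

```python
def mapped_fraction(sam_lines):
    """Fraction of primary SAM records that are mapped (each mate counted separately)."""
    total = 0
    mapped = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # not a valid alignment record
        flag = int(fields[1])
        if flag & 0x900:
            continue  # skip secondary (0x100) and supplementary (0x800) records
        total += 1
        if not flag & 0x4:
            mapped += 1  # 0x4 unset means the read is mapped
    return mapped / total if total else 0.0


# Toy records, truncated to the first three columns; invented for illustration.
toy_sam = [
    "@SQ\tSN:contig1\tLN:1000",
    "r1\t0\tcontig1",
    "r2\t4\t*",
    "r3\t16\tcontig1",
    "r3\t256\tcontig2",
    "r4\t4\t*",
]
print(f"{mapped_fraction(toy_sam):.1%} of reads map to contigs")
```

On a real file you would pass an open file handle instead of the toy list; for BAM input, a tool such as `samtools flagstat` reports a comparable figure.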

Thanks.

Tags: dna, rna, assembly
18 months ago, Damian Kao (USA) wrote:

The two factors that differentiate transcriptomic and genomic assemblies are read coverage and splicing.

In a genomic assembly, you are assuming a relatively even coverage across your genome with spikes of higher coverage for repetitive/multi-copy regions. This coverage information can then be used as a heuristic in how you process the assembly graph.

In a transcriptomic assembly, you can't assume even coverage, since genes are expressed at very different levels. The addition of splice forms means your assembly graph will have multiple "correct" paths. Using coverage to guide graph processing is therefore much more complex in this case.
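As a rough illustration of why a coverage heuristic transfers poorly, the sketch below applies one simple rule (flag anything below a fraction of the median coverage) to two toy data sets. The rule, the function name `flag_low_coverage`, the 0.1 threshold and all the numbers are assumptions made up for this example; real assemblers use more elaborate logic.

```python
from statistics import median


def flag_low_coverage(coverages, fraction=0.1):
    """Return names of sequences whose coverage is below fraction * median coverage."""
    cutoff = fraction * median(coverages.values())
    return sorted(name for name, cov in coverages.items() if cov < cutoff)


# Toy numbers, invented for illustration only.
# Genomic: roughly even coverage, one likely error (u4) and one repeat (u5).
genomic = {"u1": 30, "u2": 31, "u3": 29, "u4": 2, "u5": 62}
# Transcriptomic: coverage follows expression level, so it spans orders of magnitude.
transcriptomic = {"geneA": 900, "geneB": 850, "geneC": 1200, "geneD": 40, "geneE": 15}

print(flag_low_coverage(genomic))
print(flag_low_coverage(transcriptomic))
```

With the genomic numbers the rule picks out only the probable error. With the transcriptomic numbers it flags two genuinely expressed but low-abundance genes, which is the kind of mistake a genome-style coverage cutoff would make on RNA data.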

As a quick and dirty solution, you can use a generic assembler to build unitigs from either transcriptomic or genomic data, since both kinds of assembly usually share a common unitig generation stage.

ABySS and Trans-ABySS both use the same initial program to generate unitigs for genomic and transcriptomic data. You can stop the ABySS assembly process after the unitig stage and use the unitigs as a quick and dirty result.
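A minimal sketch of what stopping at the unitig stage might look like when driven from a Python QC script. It assumes the `abyss-pe` driver is installed and that your version accepts `unitigs` as a target alongside the usual `name=`, `k=` and `in=` parameters; check this against the documentation for your installed release. The helper name `abyss_unitig_command`, the k value and the file names are hypothetical.

```python
import shlex


def abyss_unitig_command(name, k, reads):
    """Build an abyss-pe command line that requests only the unitig stage."""
    # abyss-pe takes make-style variable assignments; all input files go in one in= value.
    in_arg = "in=" + " ".join(reads)
    return ["abyss-pe", f"name={name}", f"k={k}", in_arg, "unitigs"]


# Hypothetical sample name, k-mer size and read files.
cmd = abyss_unitig_command("toy", 31, ["reads_1.fq", "reads_2.fq"])
print(shlex.join(cmd))

# To actually run it (requires ABySS on PATH):
# import subprocess
# subprocess.run(cmd, check=True)
```

If the run succeeds, the unitigs should end up in a FASTA file named after the `name=` value (assumed here to be `toy-unitigs.fa`), which can then be used as the mapping target for QC.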


Thanks, I'll check out ABySS with some toy data. Much appreciated.

Reply written 18 months ago by maxwhjohn1988

Just thought I'd provide an update (might be helpful for anyone else reading this in the future).

I abandoned my idea of a single piece of software for all cases. For a quick and dirty de novo transcriptome assembly I have settled on SOAPdenovo-Trans. It's not packaged with SOAPdenovo but is based on it. On some mouse RNA-seq FASTQ files I was given, I found it to be very fast (about half an hour). N50 wasn't great (lots of small scaffolds), but that is likely because I had to guess at some library specifics such as insert size. I'm now testing it with a human RNA-seq library.
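For anyone wanting to reproduce the N50 check, here is a small Python sketch of the usual definition: the length of the scaffold at which the running total, taken from longest to shortest, first reaches half of the total assembly length. The function name `n50` and the toy lengths are made up for illustration and are not from the mouse data.

```python
def n50(lengths):
    """Length L such that scaffolds of length >= L cover at least half the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty input


# Toy scaffold lengths, invented for illustration.
balanced = [100, 200, 300, 400, 500]
fragmented = [5000] + [200] * 50  # one long scaffold plus many short ones

print(n50(balanced))
print(n50(fragmented))
```

The second toy set shows how lots of small scaffolds pull N50 down even when a few long scaffolds are present.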

Reply written 18 months ago by maxwhjohn1988