Question: Are there any de novo assemblers that can handle both transcriptomic and genomic reads?
maxwhjohn1988 wrote (2.5 years ago):

I'm looking to make some proposals for QC pipelines for a range of sequencing applications. Two of the applications I'm thinking about are genome sequencing and transcriptome sequencing. When a reference is not available, a quick and dirty assembly is desirable to map reads to.

I wondered if anyone knows of any assembler which contains modules for doing both genomic and transcriptomic de novo assembly. This would be desirable as I could then recommend using a single assembler for reference-free QC of both applications, with an option set to indicate which type of assembly is desired. I've not been able to find anything by googling.

I appreciate that this is very unlikely given the disparities between the characteristics of the two types of data. To make matters worse, I'm looking for something which produces decent results quickly (i.e. not something which is liable to hold up QC by taking days to run). The assembly doesn't need to be perfect or even particularly good; it just needs to be good enough to give confidence in metrics like the percentage of reads which map to contigs. Freely available software would be a massive bonus.

Thanks.

Tags: dna, rna, assembly
Damian Kao wrote (2.5 years ago):

The two confounding factors that differentiate transcriptomic and genomic assemblies are read coverage and splicing.

In a genomic assembly, you are assuming a relatively even coverage across your genome with spikes of higher coverage for repetitive/multi-copy regions. This coverage information can then be used as a heuristic in how you process the assembly graph.

In a transcriptomic assembly, you can't assume even coverage since genes are expressed at very different levels. Alternative splice forms also mean your assembly graph will have multiple "correct" paths. Using coverage to guide graph processing is therefore much more complex in this case.

As a quick and dirty solution, it is possible to just use a generic assembler to assemble unitigs from both transcriptomic and genomic data, since genomic and transcriptomic assembly usually share a common unitig generation stage.

ABySS and Trans-ABySS both use the same initial program to generate unitigs for genomic and transcriptomic data. You can stop the ABySS assembly process after the unitig stage and use those unitigs as a quick and dirty result.
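
To make the shared unitig stage concrete, here is a minimal Python sketch of what "generating unitigs" means: collapsing the non-branching paths of a de Bruijn graph built from the reads. It is an illustration only, not how ABySS is implemented. It assumes error-free reads on a single strand (no reverse complements, no coverage filtering), and the function names `build_graph` and `unitigs` are made up for this example.

```python
from collections import defaultdict


def build_graph(reads, k):
    """Build a de Bruijn graph: each distinct k-mer is an edge
    from its (k-1)-mer prefix to its (k-1)-mer suffix."""
    succ = defaultdict(set)
    pred = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            succ[kmer[:-1]].add(kmer[1:])
            pred[kmer[1:]].add(kmer[:-1])
    return succ, pred


def unitigs(reads, k):
    """Return the maximal non-branching paths (unitigs) as strings.

    Simplifications (assumptions of this sketch): one strand only,
    no error correction, and isolated cycles are not reported.
    """
    succ, pred = build_graph(reads, k)
    nodes = set(succ) | set(pred)

    def is_simple(node):
        # Exactly one way in and one way out: safe to merge through.
        return len(pred.get(node, ())) == 1 and len(succ.get(node, ())) == 1

    paths = []
    for start in sorted(nodes):
        if is_simple(start):
            continue  # unitigs begin only at branch points or tips
        for nxt in sorted(succ.get(start, ())):
            path = start + nxt[-1]
            while is_simple(nxt):
                nxt = next(iter(succ[nxt]))
                path += nxt[-1]
            paths.append(path)
    return paths


# Two reads that share a prefix and then diverge, as at a SNP or splice junction.
print(unitigs(["AAGCT", "AAGCG"], 3))  # ['AAGC', 'GCG', 'GCT']
```

The shared prefix collapses into one unitig and each branch becomes its own. Up to this point no decision about coverage or splicing has been made, which is why the same stage can serve both data types.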


Thanks, I'll check out ABySS with some toy data. Much appreciated.

(reply by maxwhjohn1988, 2.5 years ago)

Just thought I'd provide an update (might be helpful for anyone else reading this in the future).

I abandoned my idea of a single piece of software for all cases. For a quick and dirty de novo transcriptome assembly I have settled on SOAPdenovo-Trans. It is not packaged with SOAPdenovo but is based on it. On some mouse RNA-seq FASTQ files I was given, it was very fast (about half an hour). N50 wasn't great (lots of small scaffolds), but that is likely because I had to guess at some library specifics such as insert size. I'm now testing it with a human RNA-seq library.
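
For anyone unfamiliar with the N50 figure mentioned above, here is a minimal Python sketch of the usual definition: the length of the shortest contig or scaffold in the smallest set of longest sequences that together cover at least half of the total assembly length. The function name `n50` is only illustrative, and the lengths in the example are made up, not taken from the mouse assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig/scaffold lengths.

    Sort lengths from longest to shortest and accumulate them;
    N50 is the length at which the running total first reaches
    half of the total assembly length.
    """
    if not lengths:
        return 0
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Illustrative lengths only: total is 54, and 10 + 9 + 8 = 27 reaches half.
print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8
```

Lots of small scaffolds pull this number down, which matches the pattern described above when the insert size has to be guessed.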

(reply by maxwhjohn1988, 2.4 years ago)