Question: What should I use for reference-guided assembly of bacterial genomes intended for comparative genomic analyses?
4.7 years ago, jerrybug109 (United States) wrote:

Hi everyone!

I'm new to genomics and bioinformatics and was hoping to get some advice from seasoned bioinformaticians.

My lab has sequenced the genomes of several dozen strains of Bacillus subtilis, and we now have the whole-genome short reads (paired-end, 150 bp, Illumina HiSeq) back. We now need to assemble their genomes. After that we want to do a comparative genomic analysis of them: compare gene content and function, differential regimes of positive selection on genes, shared/unique genes, known/novel genes responsible for ecological adaptations in nature, etc.

Our lab has done comparative studies on these strains before, based on phylogenetic analyses of one or three housekeeping genes, but this is our first time getting our hands on their whole genome sequences, and I don't have any experience with genome assembly.

There are plenty of B. subtilis reference strains available, so I'm guessing a reference-guided assembly would be our safest bet (I'm not sure why we would want to do de novo assembly if we have references available). I'm not familiar with the software or reference-guided assembly pipelines out there. Do you guys have any suggestions for software or pipelines/approaches we should use to assemble our genomes?

Excited to join the field, thanks!

4.7 years ago, piet (planet earth) wrote:

Assemble every genome de novo with SPAdes. Then map the contigs from all assemblies to an appropriate finished genome in one run. Except for repeats, SPAdes will assemble most of the genome with high accuracy. Assembling a single B. subtilis genome will presumably take about 20 min on a desktop or notebook.
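As a rough illustration of the per-strain de novo step, the sketch below pairs up FASTQ files and builds one SPAdes command line per strain. It assumes that `spades.py` is on the PATH and that the read files follow a `<strain>_R1...` / `<strain>_R2...` naming scheme; the function names and the directory layout are made up for this example and are not part of SPAdes. Only the basic `-1`, `-2`, `-o` and `-t` options are used.

```python
import subprocess
from pathlib import Path


def pair_reads(fastq_names):
    """Group paired-end files by strain, assuming '<strain>_R1...' / '<strain>_R2...' names."""
    found = {}
    for name in sorted(fastq_names):
        base = Path(name).name
        for tag in ("_R1", "_R2"):
            if tag in base:
                strain = base.split(tag)[0]
                found.setdefault(strain, {})[tag[1:]] = name
    # Keep only strains for which both mates are present.
    return {s: (p["R1"], p["R2"]) for s, p in found.items() if "R1" in p and "R2" in p}


def spades_command(r1, r2, outdir, threads=4):
    """Build (but do not run) a SPAdes command for one paired-end library."""
    return ["spades.py", "-1", str(r1), "-2", str(r2), "-o", str(outdir), "-t", str(threads)]


def assemble_all(read_dir, out_root, threads=4):
    """Run SPAdes once per strain; each assembly lands in <out_root>/<strain>/."""
    names = [str(p) for p in Path(read_dir).glob("*.fastq.gz")]
    for strain, (r1, r2) in pair_reads(names).items():
        cmd = spades_command(r1, r2, Path(out_root) / strain, threads)
        subprocess.run(cmd, check=True)
```

Calling `assemble_all("reads", "assemblies")` (both directory names are placeholders) would then leave a `contigs.fasta` in each strain's output directory, which is the file to use for the mapping step.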

Laboratory strain 168 (AL009126.3) is the model organism for all Firmicutes and its genome is very well annotated, but I would not be surprised if some field isolates of B. subtilis map very poorly to it.
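One way to see how poorly a field isolate maps is to count how many of its contigs, and how many bases, have no hit in the reference. The sketch below does this for SAM-formatted alignments of contigs against the reference; the function name and the returned keys are invented for this example. It assumes that primary records carry the full contig sequence in the SEQ column, and it skips secondary and supplementary records so that each contig is counted once.

```python
def mapping_summary(sam_lines):
    """Count contigs and bases that did not align, from SAM lines (primary records only)."""
    contigs = unmapped_contigs = total_bp = unmapped_bp = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & 0x100 or flag & 0x800:
            continue  # secondary / supplementary alignment of a contig already counted
        length = len(fields[9])  # SEQ column; assumed to hold the full contig
        contigs += 1
        total_bp += length
        if flag & 0x4:  # 0x4 = segment unmapped
            unmapped_contigs += 1
            unmapped_bp += length
    return {
        "contigs": contigs,
        "unmapped_contigs": unmapped_contigs,
        "total_bp": total_bp,
        "unmapped_bp": unmapped_bp,
    }
```

A large `unmapped_bp` relative to `total_bp` would suggest that strain 168 is a poor reference for that isolate. Note that a contig counts as mapped here even if only part of it aligns, so this is a lower bound on the unaligned sequence.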



Can SPAdes also do the mapping/reference assembly? I noticed that some de novo assemblers seem to be able to do reference-based assembly as well, but was wondering if it would be better to find a reference-guided tool specifically for that step. Thanks for your advice!

Reply written 4.7 years ago by jerrybug109

Mapping and assembly are very different things. Usually the best programs are those dedicated to a single task. I recommend using 'bwa mem' to map SPAdes contigs to a reference genome, see here.

I often use this for contigs of bacterial genomes and it works quite well, even though bwa mem is intended for aligning short reads.
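A minimal sketch of the mapping step described above, assuming `bwa` is on the PATH. To map the contigs from all assemblies in one run, the per-strain contig files are first merged into a single FASTA, with the strain name prefixed to each header so that alignments can be traced back to their assembly. The helper names, the `strain|contig` header convention and all file names are assumptions made for this example, not part of bwa or SPAdes.

```python
import subprocess


def merge_contigs(assemblies, merged_path):
    """Concatenate per-strain contig FASTA files, prefixing headers with '<strain>|'.

    `assemblies` maps strain name -> path to that strain's contig FASTA.
    Returns the number of contigs written.
    """
    n_contigs = 0
    with open(merged_path, "w") as out:
        for strain, fasta in sorted(assemblies.items()):
            with open(fasta) as handle:
                for line in handle:
                    if not line.endswith("\n"):
                        line += "\n"  # guard against a missing final newline
                    if line.startswith(">"):
                        out.write(f">{strain}|{line[1:]}")
                        n_contigs += 1
                    else:
                        out.write(line)
    return n_contigs


def bwa_commands(reference, contigs, threads=4):
    """Build the 'bwa index' and 'bwa mem' command lines without running them."""
    index_cmd = ["bwa", "index", str(reference)]
    mem_cmd = ["bwa", "mem", "-t", str(threads), str(reference), str(contigs)]
    return index_cmd, mem_cmd


def map_contigs(reference, contigs, sam_path, threads=4):
    """Index the reference, then align the contigs; bwa mem writes SAM to stdout."""
    index_cmd, mem_cmd = bwa_commands(reference, contigs, threads)
    subprocess.run(index_cmd, check=True)
    with open(sam_path, "w") as sam:
        subprocess.run(mem_cmd, stdout=sam, check=True)
```

For example, `merge_contigs({"strainA": "assemblies/strainA/contigs.fasta"}, "all_contigs.fasta")` followed by `map_contigs("AL009126.3.fasta", "all_contigs.fasta", "all_contigs.sam")` would produce a single SAM file covering every strain; the paths here are placeholders.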

Reply written 4.6 years ago by piet (modified 2.1 years ago by RamRS)
4.6 years ago, indexofire (Hong Kong) wrote:

For reference-based assembly, you can try Ragout.
