Question: How to assemble contigs?
Paul (India) wrote, 23 months ago:

Hi, I have a new bacterial strain whose genome was sequenced using Illumina.

The data are paired-end reads, from which I have done a de novo assembly and generated contigs with a minimum length of 200 bp using CLC Genomics Workbench and online servers.

Now, my aim is to assemble these contigs into a complete genome sequence. Is there any software (for Windows 7) or online server that can assemble the contigs into a single genome?

Vijay Lakhujani (India) wrote, 23 months ago:

Refrain from working with CLC GW unless you are not familiar with Linux at all. You could have used SOAPdenovo for the bacterial genome assembly.

Anyway, are you sure that you have contigs from CLC? Look at this image:

[Image not available]

What you get from CLC is a FASTA file containing scaffolds. You can check this by exporting the FASTA file, opening it in a text editor (since you are working on Windows), and looking for 'n'/'N' characters in the sequences, which represent gaps.
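If the exported file is too large to inspect by eye, the same check can be scripted. The sketch below is a minimal, hypothetical helper (the names `parse_fasta` and `find_gaps` are illustrative, not part of CLC or any tool mentioned in this thread); it assumes a plain-text FASTA export and reports the 0-based start and length of every run of N/n in each sequence. A sequence with such runs is a scaffold rather than a gap-free contig.

```python
import re
from typing import Dict, List, Tuple

N_RUN = re.compile(r"[Nn]+")  # one or more consecutive gap characters


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Parse FASTA-formatted text into (header, sequence) pairs."""
    records: List[Tuple[str, str]] = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        else:
            chunks.append(line)  # sequence lines may be wrapped
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def find_gaps(text: str, min_run: int = 1) -> Dict[str, List[Tuple[int, int]]]:
    """Map each FASTA header to the (0-based start, length) of its N runs
    that are at least min_run bases long."""
    gaps: Dict[str, List[Tuple[int, int]]] = {}
    for header, seq in parse_fasta(text):
        runs = []
        for match in N_RUN.finditer(seq):
            length = match.end() - match.start()
            if length >= min_run:
                runs.append((match.start(), length))
        gaps[header] = runs
    return gaps
```

For example, `find_gaps(open("export.fa").read(), min_run=10)` (file name hypothetical) returns a dict in which an empty list means that sequence contains no N runs of 10 or more bases.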

It is not possible to get a single sequence representing the entire genome (for obvious reasons of shotgun sequencing). However, it is possible to judge the quality of the assembly. Check out these posts here and here.
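One widely used contiguity measure is N50: the length L such that sequences of length L or longer together cover at least half of the total assembly length. The linked posts are not reproduced here, so this is only a minimal sketch of that one metric (function names hypothetical), assuming the lengths are read from a FASTA file; dedicated assessment tools report many more statistics.

```python
from typing import Dict, Iterable, List, Optional


def fasta_lengths(text: str) -> List[int]:
    """Return the sequence length of each record in FASTA-formatted text."""
    lengths: List[int] = []
    current: Optional[int] = None  # length of the record being read; None before the first header
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def assembly_stats(lengths: Iterable[int]) -> Dict[str, int]:
    """Count, total length, largest sequence and N50 for a set of sequence lengths."""
    sizes = sorted((n for n in lengths if n > 0), reverse=True)
    if not sizes:
        raise ValueError("no non-empty sequences")
    total = sum(sizes)
    running = 0
    n50 = sizes[0]
    for size in sizes:
        running += size
        if 2 * running >= total:  # first point where at least half the bases are covered
            n50 = size
            break
    return {"count": len(sizes), "total_length": total, "largest": sizes[0], "n50": n50}
```

For lengths 100, 200, 300, 400 and 500 bp the total is 1,500 bp; the two largest sequences already cover 900 bp (more than half), so the N50 is 400 bp.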

PacBio data can produce a single contig representing the entire genome.


Refrain from working with CLC GW unless you are not familiar with Linux at all.

Vijay Lakhujani: It is not appropriate to tell other users what they should or should not do, since we don't know their circumstances. CLC gw is a perfectly valid option for users restricted to using Windows. CLC has been around for many years and is actively developed and supported.

You can certainly suggest other/better software options if you want to help.

It is not possible to get a single sequence representing the entire genome (for obvious reasons of shotgun sequencing)

That is also not correct. With bacterial genomes it is certainly possible to get a single contig representing the entire genome, provided one had the right kind of libraries/coverage.

Reply by genomax, 23 months ago.

CLC gw is a perfectly valid option for users restricted to using Windows.

I might be opening another debate here (open-source vs. commercial software). Commercial tools often hide fine algorithmic details for obvious trade/business reasons. The downside of adopting a commercial solution is, inevitably, some loss of flexibility and configurability. A significant danger is the temptation to simply apply a preconfigured workflow and treat it as a "black box" without fully considering or understanding whether each step is appropriate for a particular project's objectives and datasets. Additionally, commercial software tends to replace standard scientific terms with other terms (for example, "word" instead of "k-mer"), which can confuse users even though they mean the same thing. Not everything hard-coded inside is disclosed, which forces users to place blind faith in the software.

On the other hand, the source code of open-source software is publicly available and can easily be modified as needed. Additionally, open-source tools have prescriptive, published protocols. Everything can be clearly understood, and any error or deviation from expectations can be traced.

With bacterial genomes it is certainly possible to get a single contig representing the entire genome, provided one had the right kind of libraries/coverage.

It's possible in rare circumstances, where one is ready to pay for the "required coverage" to get an assembly in one contig.
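The "required coverage" can at least be estimated with the usual back-of-the-envelope formula C = N × L / G (number of reads × read length / genome size). Below is a minimal sketch with hypothetical function names; for paired-end data both mates are counted as reads. Note that coverage alone does not guarantee a single contig, since repeats longer than the reads or library inserts can still break the assembly.

```python
import math


def expected_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Mean fold coverage C = N * L / G. Count both mates of a pair in num_reads."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return num_reads * read_length / genome_size


def reads_needed(target_coverage: float, read_length: int, genome_size: int) -> int:
    """Smallest number of reads of the given length that reaches target_coverage."""
    if read_length <= 0:
        raise ValueError("read_length must be positive")
    return math.ceil(target_coverage * genome_size / read_length)
```

For example, 1,000,000 read pairs of 2 × 150 bp on a hypothetical 5 Mb genome gives `expected_coverage(2_000_000, 150, 5_000_000)` = 60.0, i.e. about 60x mean coverage.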

Reply by Vijay Lakhujani, 23 months ago.

In addition, if you have a very closely related species (at the genomic level) with a completely assembled genome, you can use it as a reference to assemble your reads (SPAdes or idba_hybrid; on Linux, via the command line). But with paired-end reads it would be very difficult (or even impossible) to get a complete genome.

Reply by Buffo, 23 months ago.
vmicrobio wrote, 23 months ago:

If you have a reference, I would recommend using Scaffold Builder (or, better, the Scaffold Builder version on SourceForge) to map your contigs against a closely related reference.
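To illustrate the general idea of reference-guided ordering (a toy sketch, not Scaffold Builder's actual algorithm), assume you already have, for each contig, its start coordinate and strand on the reference from some alignment step. The hypothetical `Hit` records and `build_pseudo_scaffold` function below place the contigs in reference order, reverse-complement those that map to the minus strand, and join them with a fixed run of Ns. The 100-N spacer is an arbitrary assumption, and contigs without a hit are simply left out.

```python
from dataclasses import dataclass
from typing import Dict, List

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T/N only)."""
    return seq.translate(_COMPLEMENT)[::-1]


@dataclass
class Hit:
    """Hypothetical alignment summary: where a contig lands on the reference."""
    contig: str
    ref_start: int
    strand: str  # "+" or "-"


def build_pseudo_scaffold(contigs: Dict[str, str], hits: List[Hit], gap: int = 100) -> str:
    """Order mapped contigs by reference start and join them with `gap` Ns."""
    pieces = []
    for hit in sorted(hits, key=lambda h: h.ref_start):
        seq = contigs[hit.contig]
        if hit.strand == "-":
            seq = reverse_complement(seq)  # orient the contig like the reference
        pieces.append(seq)
    return ("N" * gap).join(pieces)
```

A real tool also has to deal with overlapping contigs, repeats that map to several places, and rearrangements relative to the reference, which is why a dedicated tool is preferable to a script like this.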
