Question: Assembler for large genome de novo assembly with Illumina paired-end reads of 150 bp
2
5.5 years ago by
alecloic40
France
alecloic40 wrote:

Hello,

I need some advice. I have never done a genome assembly before. I have to do a de novo assembly of a large genome (2.5 Gb) with short Illumina paired-end reads of 150 bp.
I looked into the different assemblers, but none matches my needs; there is always one criterion that rules it out (for example, ABySS, ALLPATHS-LG and SOAPdenovo work with much shorter reads, while others like SPAdes do not work for genomes of this size).

Do you have an idea of which short-read de novo assembler I could use, and which would give the best results?

Kind regards,

 

A. GUYOMARD

France, Lyon

next-gen forum assembly genome • 5.6k views
modified 5.5 years ago • written 5.5 years ago by alecloic40
3
5.5 years ago by
Walnut Creek, USA
Brian Bushnell17k wrote:

Allpaths and Soap both work fine with 150bp reads.  I don't know what the best assembler is for genomes of that size, though.

written 5.5 years ago by Brian Bushnell17k

Not Allpaths: it requires at least two libraries, one paired-end and one mate-pair (see B1 and B3 in https://www.broadinstitute.org/software/allpaths-lg/blog/?page_id=215).

 

modified 5.5 years ago • written 5.5 years ago by Rayan Chikhi1.4k
3

Actually, we use Allpaths routinely here with only one library.  You can feed Allpaths the short library again instead of a long library, or you can assemble the short library with something like Velvet and generate synthetic LMP reads from the contigs, which is the approach we take.  It seems silly, but it works and gives good results.
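To make the "synthetic LMP reads from the contigs" idea concrete, here is a minimal sketch that tiles read pairs across contigs at a fixed spacing. It is only an illustration under stated assumptions: the function name `synthetic_pairs`, the FASTA-on-stdin interface, and the deterministic tiling are all hypothetical, and this is not the procedure or tool actually used by the poster.

```shell
# Sketch: emit synthetic "innie" read pairs tiled along each contig.
# Hypothetical helper, not part of Allpaths, Velvet or BBMap.
# Usage: synthetic_pairs READ_LEN INSERT_SIZE STEP < contigs.fa
synthetic_pairs() {
  awk -v len="${1:-150}" -v ins="${2:-4000}" -v step="${3:-1000}" '
    # Reverse-complement a sequence; any base other than A, C, G, T becomes N.
    function revcomp(s,    i, out, c) {
      out = ""
      for (i = length(s); i >= 1; i--) {
        c = substr(s, i, 1)
        if (c in comp) out = out comp[c]
        else out = out "N"
      }
      return out
    }
    # Emit one pair every "step" bases while the full insert still fits.
    function emit(name, seq,    pos, n) {
      n = 0
      for (pos = 1; pos + ins - 1 <= length(seq); pos += step) {
        n++
        # Read 1: forward strand, at the left end of the insert.
        printf ">%s_%d/1\n%s\n", name, n, substr(seq, pos, len)
        # Read 2: reverse complement of the right end of the insert.
        printf ">%s_%d/2\n%s\n", name, n, revcomp(substr(seq, pos + ins - len, len))
      }
    }
    BEGIN { comp["A"] = "T"; comp["C"] = "G"; comp["G"] = "C"; comp["T"] = "A" }
    /^>/  { if (name != "") emit(name, seq); name = substr($1, 2); seq = ""; next }
          { seq = seq toupper($0) }
    END   { if (name != "") emit(name, seq) }
  '
}
```

For example, `synthetic_pairs 150 4000 1000 < contigs.fa > lmp_pairs.fa` would write interleaved FASTA pairs (the output file name is a placeholder). Contigs shorter than the insert size produce no pairs, so the synthetic library can only link regions that the short-read assembly already joined.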

written 5.5 years ago by Brian Bushnell17k

Thanks for your comment. Just curious: did you control for misassemblies?

written 5.5 years ago by Rayan Chikhi1.4k
1

When testing a new assembler or assembly method, we use data of known organisms and run the assembly through Quast, which counts misassemblies, to verify that the approach is valid.

written 5.5 years ago by Brian Bushnell17k
1

QUAST with a reference genome is indeed a very good approach to evaluate an assembly.  If you had few or no misassemblies, then I'd be inclined to think it's fine.

I'd be curious to hear from Allpaths developers what they think of this usage of their tool.

modified 5.5 years ago • written 5.5 years ago by Rayan Chikhi1.4k

Do you have a script to generate synthetic LMP reads from a contig file that you could share, please? :)

written 4.7 years ago by MathGon10
2

You can use the BBMap package for that:

randomreads.sh ref=contigs.fa reads=1000000 out=lmp.fq paired interleaved len=150 mininsert=3600 maxinsert=4400

They come out in "innie" orientation; you can use reformat.sh with the rcomp or rcompmate flag to transform them to a different orientation if you need to.
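As a rough illustration of what changing orientation means for a single read, the sketch below reverse-complements the bases of one FASTQ record and reverses its quality string so each score stays with its base. The helper names `revcomp` and `flip_fastq_record` are hypothetical, and the sketch assumes the `rev` utility is available. It is not how reformat.sh is implemented; for whole files the `rcomp` and `rcompmate` flags mentioned above are the practical route.

```shell
# Reverse-complement a DNA string (only A, C, G, T are translated;
# other characters such as N pass through unchanged).
revcomp() {
  printf '%s\n' "$1" | rev | tr 'ACGTacgt' 'TGCAtgca'
}

# Flip one 4-line FASTQ record read from stdin:
# header and "+" line are kept, bases are reverse-complemented,
# and the quality string is reversed to match the new base order.
flip_fastq_record() {
  IFS= read -r header
  IFS= read -r bases
  IFS= read -r plus
  IFS= read -r quals
  printf '%s\n%s\n%s\n%s\n' \
    "$header" \
    "$(revcomp "$bases")" \
    "$plus" \
    "$(printf '%s\n' "$quals" | rev)"
}
```

Applying this to both reads of an "innie" pair would give an "outie" pair; applying it to only one read of the pair would put both reads on the same strand.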

written 4.7 years ago by Brian Bushnell17k

You can feed Allpaths the short library again instead of a long library, or you can assemble the short library with something like Velvet and generate synthetic LMP reads from the contigs, which is the approach we take.

Is there a recommendation for which option might work better - short library again Vs. synthetic LMPs?

OR does it differ on a case-by-case basis, and if so, how does one determine which option might better serve one's genome assembly goals?

AND I wonder if ALLPATHS-LG, for a medium-sized, haploid eukaryotic genome (~50 Mb), has been empirically shown to be any better or worse than a5miseq, SPAdes, or ABySS. I'm comparing assemblers to pick one, but I've got to stop my comparative analyses to move on with the "chosen" one. Hence this question.

I hope you do not mind me tagging you two here: Rayan Chikhi, and Brian Bushnell. Thanks!

modified 2.8 years ago • written 2.8 years ago by Anand Rao310
1

Actually, we don't do that anymore as far as I know :) I'm not sure if it's a good idea or not, or what the procedure was for validating that it did not lead to misassemblies (if any validation was performed). So if you do go that route, I suggest you validate it on genomes with finished references first.

We have extensively tested AllPaths versus other assemblers multiple times, but assembly results can be very version-specific and Spades especially has changed a lot since the last test.

Spades tends to be our best microbial assembler but I'm not sure how it does on fungi.

modified 2.8 years ago • written 2.8 years ago by Brian Bushnell17k
2
5.5 years ago by
rtliu2.1k
New Zealand
rtliu2.1k wrote:

If you don't have a big-memory machine (512GB+), you could try Minia.

Manual: http://minia.genouest.org/files/manual.pdf

written 5.5 years ago by rtliu2.1k
1
5.5 years ago by
alecloic40
France
alecloic40 wrote:

Hello,
Thank you for your advice; it will help me choose.
I thought ALLPATHS-LG and SOAP needed shorter reads, so I may now use SOAP. I can't use ALLPATHS-LG because "ALLPATHS-LG requires a minimum of 2 paired-end libraries – one short and one long" and I do not have that.
Minia could indeed be a good solution; I had not come across this software during my research.
I have not yet seen IDBA-UD tested on a large genome, but why not.

best regards

A. GUYOMARD
France, Lyon

written 5.5 years ago by alecloic40
0
5.5 years ago by
5heikki8.9k
Finland
5heikki8.9k wrote:

How about IDBA-UD?

written 5.5 years ago by 5heikki8.9k

Not sure if it works for Gbp-sized genomes.

written 5.5 years ago by Rayan Chikhi1.4k
0
5.5 years ago by
alecloic40
France
alecloic40 wrote:

In case it is useful to someone, there seem to be other interesting tools for this case: JR-Assemble, Contrail and maybe Gossamer.

modified 5.5 years ago • written 5.5 years ago by alecloic40