Question

Best de novo assembler for an insect genome?

0

7.9 years ago

Picasa ▴ 640

Hello,

I have an insect genome to assemble (max size: 500 Mb) from Illumina data consisting of paired-end and mate-pair libraries.

I'm thinking of using SOAPdenovo and SPAdes.

Can you recommend a better assembler for my data?

assembler • 3.4k views

ADD COMMENT • link updated 7.5 years ago by Chris Fields ★ 2.2k • written 7.9 years ago by Picasa ▴ 640

0

There is no clear answer to that question, but you are advised to try several assemblers; I would suggest ABySS and SOAPdenovo. After that, as suggested in the answer by @harold.smith.tarheel, you can compare N50 and/or align your reads back to the assembly to see how it behaves: if many reads fail to align, your assembly is probably missing some regions. Since you have paired-end and mate-pair data, a low rate of concordantly aligned pairs suggests rearrangements (mis-assemblies) in your assembly. You can also compare against a related species to see how your assembly looks.

You can also use tools like REAPR (for de novo assemblies), misFinder (identifies mis-assemblies in an unbiased manner using a reference and paired-end reads), and QUAST.
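
To make the read-back check above a bit more concrete, here is a minimal Python sketch that summarizes SAM records by their FLAG field. The function name `pair_stats` and the toy records are made up for illustration; only the FLAG bit meanings (0x1 paired, 0x2 properly paired, 0x4 unmapped, 0x100 secondary, 0x800 supplementary) come from the SAM format. In practice a dedicated tool would likely do this for you; this is just to show what the numbers mean.

```python
def pair_stats(sam_lines):
    """Summarize mapping and proper-pair rates from SAM text lines."""
    total = mapped = proper = 0
    for line in sam_lines:
        if line.startswith("@"):          # skip header lines
            continue
        flag = int(line.split("\t")[1])   # FLAG is the second SAM column
        if flag & 0x900:                  # skip secondary (0x100) / supplementary (0x800)
            continue
        total += 1
        if not flag & 0x4:                # 0x4 set means the read is unmapped
            mapped += 1
        if flag & 0x1 and flag & 0x2:     # paired and aligned as a proper pair
            proper += 1
    return {
        "reads": total,
        "mapped_fraction": mapped / total if total else 0.0,
        "proper_pair_fraction": proper / total if total else 0.0,
    }

# Toy records (hypothetical): only the read name and FLAG matter here.
flags = [99, 147, 65, 133, 355]  # 355 = 99 + 256, a secondary alignment
sam = ["@HD\tVN:1.6"] + [
    f"r{i}\t{flag}\tctg1\t1\t60\t*\t=\t1\t0\t*\t*" for i, flag in enumerate(flags)
]
stats = pair_stats(sam)
print(stats)
```

A low `mapped_fraction` hints at missing sequence, while a low `proper_pair_fraction` with a high mapped fraction hints at structural problems in the assembly.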

ADD REPLY • link 6.2 years ago by Medhat 9.7k

score 2 · Answer 1 · 2016-06-17

2

7.9 years ago

harold.smith.tarheel ★ 4.9k

"Best assembler" is in the eye of the beholder. What are your requirements? Longest NG50? Most comprehensive gene coverage? Accurate resolution of heterozygosity? Best long range connectivity? Most reads remapping to your assembly?

There is no single best assembler, or single best metric for determining the best assembly. I recommend the Assemblathon 2 paper for its discussion of assembly evaluation, as well as challenges posed by heterozygosity, repetitive sequences, etc.
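
Since N50 and NG50 come up repeatedly in this thread, here is a small Python sketch of how the two differ. The helper name `nx` and the toy contig lengths are illustrative assumptions, not taken from any particular tool: N50 uses the total assembly length as the denominator, while NG50 uses an (estimated) genome size instead.

```python
def nx(lengths, fraction=0.5, total=None):
    """Length L such that contigs of length >= L cover at least
    `fraction` of `total` bases (total defaults to the assembly size)."""
    if total is None:
        total = sum(lengths)
    target = fraction * total
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= target:
            return length
    return 0  # assembly too short to reach the target (can happen for NG50)

contigs = [80, 70, 50, 40, 30, 20, 10]  # toy contig lengths, sum = 300
n50 = nx(contigs)                        # denominator: assembly size (300)
ng50 = nx(contigs, total=400)            # denominator: assumed genome size (400)
print(n50, ng50)
```

Because NG50 is tied to the genome size rather than to whatever the assembler happened to output, it is the fairer number for comparing assemblies of the same genome.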

ADD COMMENT • link 7.9 years ago by harold.smith.tarheel ★ 4.9k

0

I know that paper, but its tests were done on large eukaryotic genomes, while I am trying to assemble an insect genome.

By best, I mostly mean best N50.

ADD REPLY • link 7.9 years ago by Picasa ▴ 640

0

Those vertebrate genomes were only 2X-3X larger than your insect (1.0-1.6 Gb vs 500 Mb), so the sizes are comparable. And no single assembler gives a consistently best NG50 across all data sets. That metric is strongly dependent upon the degree of heterozygosity and repetitive DNA, which varies by genome.

ADD REPLY • link 7.9 years ago by harold.smith.tarheel ★ 4.9k

score 2 · Answer 2 · 2016-10-24

My suggestion would be to do some preliminary QC on the sequence data first, which may help dictate which assemblers you want to look into. Run a k-mer analysis to estimate the actual coverage and the complexity of the data (you could use Jellyfish, khmer, or any of a whole slew of tools to generate this). We also run preQC for a more complete assessment.
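
As a rough illustration of what a k-mer analysis gives you, here is a toy Python sketch. All function names and the tiny example "genome" are made up, and it ignores reverse complements for simplicity; for real data you would use a dedicated counter such as Jellyfish. The idea is that the main peak of the k-mer histogram approximates the k-mer coverage, and the total k-mer count divided by that peak approximates the genome size.

```python
from collections import Counter

def kmer_counts(reads, k):
    """Count every k-mer in the reads (no reverse-complement handling)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def kmer_histogram(counts):
    """Map multiplicity -> number of distinct k-mers seen that many times."""
    return Counter(counts.values())

def main_peak(hist, min_multiplicity=2):
    """Multiplicity with the most distinct k-mers, skipping the low-count
    tail that is usually dominated by sequencing errors."""
    candidates = {m: n for m, n in hist.items() if m >= min_multiplicity}
    return max(candidates, key=candidates.get) if candidates else None

def estimate_genome_size(hist, peak, min_multiplicity=2):
    """Total non-error k-mers divided by the peak (k-mer coverage)."""
    total = sum(m * n for m, n in hist.items() if m >= min_multiplicity)
    return total // peak

genome = "ACGTTGCATGGCTAAC"          # toy genome with 13 distinct 4-mers
reads = [genome] * 5 + ["ACGTAGCA"]  # 5x coverage plus one read with an error
counts = kmer_counts(reads, k=4)
hist = kmer_histogram(counts)
peak = main_peak(hist)
size = estimate_genome_size(hist, peak)
print(peak, size)
```

On real data, a second peak at roughly half the main coverage is the usual sign of high heterozygosity, and a long high-multiplicity tail points to repeats.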

This, plus the library types you have, normally helps dictate which assemblers may work best. If you have overlapping shotgun libraries and a genome with low heterozygosity, ALLPATHS-LG or DISCOVAR are great (with the latter you would need to scaffold with a separate tool). Which of the two to use depends on the read length you have.

If the het. rate is pretty high you could give Platanus a go; we've had fairly reasonable luck with it on a few troublesome genomes. You can also use SOAPdenovo, though I believe it's now deprecated in favor of MEGAHIT (we haven't tried this one yet).

score 1 · Answer 3 · 2016-10-24

1

7.5 years ago

Lina F ▴ 200

Here is a recent paper discussing using DISCOVAR for insect assembly: http://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-016-2531-7

Might be helpful

ADD COMMENT • link 7.5 years ago by Lina F ▴ 200