Question

Anyone have suggestions for de novo assembly of a genome from MiSeq 250 PE data?

1

11.6 years ago

Dan D 7.4k

I have some MiSeq data from a prokaryotic specimen. The data are paired-end with read lengths of 250 bp and an average library size of 440 bp. I ran these data through Velvet using the following parameters:

velveth:
hash_length of 31
shortPaired read type

velvetg:
-exp_cov auto   (automatically infer unique region coverage)
-ins_length 200

The contigs came out looking pretty nonsensical. They were shorter than expected and there were lots of repeated sequences. After browsing the literature, it seems like Velvet is more for shorter reads coming from older technology like 454 and Solexa.

Does anyone have any advice for how to assemble a genome from the data I have now? I'm pretty new to genome assembly, so please forgive me if I've left out relevant info. I'll be happy to provide it if asked.

denovo assembly miseq • 6.9k views

ADD COMMENT • link updated 11.6 years ago by liz.batty ▴ 30 • written 11.6 years ago by Dan D 7.4k

0

Unless the quality of MiSeq reads is a lot lower than that of the usual Illumina reads, I would increase the word length when using a de Bruijn assembler. I think we get okay results using CLC with a word size of 50-75. We also use Newbler, ABySS, and Celera for assembly, but I have no experience with prokaryotes and MiSeq.

ADD REPLY • link 11.6 years ago by Ketil 4.1k

score 2 · Answer 1 · 2012-10-09

11.6 years ago

Istvan Albert 100k

Assembling longer reads is a very common task with an extensive literature. Look for publications that discuss assembling 454 reads; these have been hundreds of bp long from the very beginning.

A good start would be:

Comparing de novo assemblers for 454 transcriptome data

ADD COMMENT • link 11.6 years ago by Istvan Albert 100k

0

Thanks, that's very helpful!

ADD REPLY • link 11.6 years ago by Dan D 7.4k

0

Hm - I'm skeptical. IME, assemblers are often very dependent on the kind of data, and something that performs well on 454 isn't automatically a good performer on Illumina.

ADD REPLY • link 11.6 years ago by Ketil 4.1k

score 2 · Answer 2 · 2012-10-09

11.6 years ago

liz.batty ▴ 30

Velvet does a pretty good job on bacteria sequenced with 150 bp Illumina reads. Try using a higher k-mer length: you need to compile Velvet with a larger MAXKMERLENGTH than the default, and you can use VelvetOptimiser (which comes with Velvet) to run your assembly over a range of k-mer lengths and select the optimal one (by default it optimises for N50). There are a whole lot of other assemblers out there you might try; one is A5, a new microbial assembly pipeline which promises to run lots of optimisation and quality control steps for you.
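
To make the idea of scanning a k-mer range more concrete, here is a minimal Python sketch. It is not part of Velvet or VelvetOptimiser; it only lists odd k values and the k-mer coverage each would give, using the relation from the Velvet manual, Ck = C * (L - k + 1) / L. The 100x base coverage, the 31 to 127 range, and the function names are assumptions chosen for illustration.

```python
def kmer_coverage(base_coverage, read_length, k):
    """k-mer coverage Ck = C * (L - k + 1) / L, as given in the Velvet manual."""
    if not 0 < k <= read_length:
        raise ValueError("k must be between 1 and the read length")
    return base_coverage * (read_length - k + 1) / read_length


def odd_kmers(start, end):
    """Odd k values in [start, end]; Velvet expects an odd hash length."""
    first = start if start % 2 == 1 else start + 1
    return list(range(first, end + 1, 2))


if __name__ == "__main__":
    READ_LENGTH = 250      # read length from the question
    BASE_COVERAGE = 100.0  # hypothetical value, not taken from the thread
    # Print every 8th candidate to keep the table short.
    for k in odd_kmers(31, 127)[::8]:
        ck = kmer_coverage(BASE_COVERAGE, READ_LENGTH, k)
        print(f"k={k:3d}  k-mer coverage ~ {ck:.1f}x")
```

A longer k leaves fewer k-mers per read, so coverage per k-mer drops as k grows. That trade-off is one reason to scan a range rather than simply picking the largest k the build allows.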

With reads this long, you could also try using FLASH to turn your paired 250bp reads into a single longer read before assembly, and use those in assemblers designed for 454-length reads.
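
As a rough sanity check on whether merging is plausible for this library, the sketch below estimates how much the two mates should overlap and shows a toy exact-match merge. It assumes the "library size of 440" in the question means the sequenced fragment length without adapters; that is my reading, not something the thread confirms. The merge function is a simplified illustration and not FLASH's algorithm (FLASH tolerates mismatches in the overlap, which this does not).

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse-complement a DNA string made of A, C, G, T, N."""
    return seq.translate(COMPLEMENT)[::-1]


def expected_overlap(read_length, fragment_length):
    """Bases shared by the two mates; zero or negative means no overlap."""
    return 2 * read_length - fragment_length


def merge_pair(read1, read2, min_overlap=10):
    """Toy merge: join read1 and the reverse complement of read2 at the
    longest exact suffix/prefix overlap, or return None if none is found."""
    mate = reverse_complement(read2)
    for size in range(min(len(read1), len(mate)), min_overlap - 1, -1):
        if read1[-size:] == mate[:size]:
            return read1 + mate[size:]
    return None


if __name__ == "__main__":
    # 250 bp reads and a 440 bp fragment, as described in the question.
    print("expected overlap:", expected_overlap(250, 440), "bp")
```

Under that assumption the average pair would overlap by about 60 bp, and pairs from shorter fragments would overlap by more, so a large share of the reads could plausibly be merged.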

ADD COMMENT • link 11.6 years ago by liz.batty ▴ 30

0

Neat paper (FLASH); I had not heard of it before.

ADD REPLY • link 11.6 years ago by Istvan Albert 100k

0

N50 is a horrid metric to optimize for.
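
For readers new to the metric, here is a short Python sketch of how N50 is computed, plus one commonly cited weakness: wrongly joining contigs raises N50 even though the assembly got worse. Whether that is the objection meant here is my assumption, and the contig lengths are made up for illustration.

```python
def n50(contig_lengths):
    """Length of the contig at which the running total, taken from the
    longest contig downwards, first reaches half of the assembly size."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("need at least one contig")
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length


if __name__ == "__main__":
    correct = [100, 100, 100, 100]  # hypothetical contig lengths
    misjoined = [400]               # same bases, all joined (possibly wrongly)
    print("N50 of correct contigs:  ", n50(correct))
    print("N50 of misjoined contigs:", n50(misjoined))
```

Because the metric only looks at lengths, an optimiser that maximises N50 cannot tell a correct long contig from a chimeric one, so it is worth checking the chosen assembly against other evidence as well.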

ADD REPLY • link 11.6 years ago by Ketil 4.1k