Question: Anyone Have Suggestions For De Novo Assembly Of A Genome From Miseq 250 Pe Data?
Asked 6.5 years ago by Dan D (Tennessee):

I have some MiSeq data from a prokaryotic specimen. The data are paired-end with a read length of 250 bp and an average library size of 440 bp. I ran these data through Velvet using the following parameters:

velveth:
hash_length of 31
shortPaired read type

velvetg:
-exp_cov auto   (automatically infer unique region coverage)
-ins_length 200

The contigs came out looking pretty nonsensical. They were shorter than expected and there were lots of repeated sequences. After browsing the literature, it seems like Velvet is more for shorter reads coming from older technology like 454 and Solexa.

Does anyone have any advice for how to assemble a genome from the data I have now? I'm pretty new to genome assembly, so please forgive me if I've left out relevant info. I'll be happy to provide it if asked.


Unless the quality of MiSeq reads is a lot lower than that of usual Illumina reads, I would increase the word (k-mer) length when using a de Bruijn assembler. I think we get okay results using CLC with a word size of 50-75. We also use Newbler, ABySS, and Celera for assembly, but I have no experience with prokaryotes and MiSeq.
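To illustrate why 250 bp reads leave room for a larger word size, here is a minimal Python sketch of the k-mer coverage relation given in the Velvet manual, Ck = C * (L - k + 1) / L. The base coverage of 100x is a hypothetical value chosen for the example, not something reported in this thread.

```python
def kmer_coverage(base_coverage: float, read_length: int, k: int) -> float:
    """Return k-mer coverage Ck = C * (L - k + 1) / L.

    base_coverage (C) is the per-base coverage, read_length (L) the read
    length in bp, and k the word (hash) length.
    """
    if not 1 <= k <= read_length:
        raise ValueError("k must be between 1 and the read length")
    return base_coverage * (read_length - k + 1) / read_length


if __name__ == "__main__":
    coverage = 100.0   # hypothetical base coverage, for illustration only
    read_length = 250  # MiSeq read length from the question
    for k in (31, 51, 75, 99):
        print(f"k={k:3d}  k-mer coverage={kmer_coverage(coverage, read_length, k):.1f}")
```

Under these assumptions, going from k=31 to k=75 still keeps roughly 70% of the base coverage as k-mer coverage, so longer words are affordable with reads of this length.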

Reply written 6.5 years ago by Ketil
Answer written 6.5 years ago by Istvan Albert (University Park, USA):

Assembling longer reads is a very common task with an extensive literature. Look for publications that discuss assembling 454 reads; these have been hundreds of bp long from the very beginning.

A good start would be:

Comparing de novo assemblers for 454 transcriptome data


Thanks, that's very helpful!

Reply written 6.5 years ago by Dan D

Hm - I'm skeptical. IME, assemblers are often very dependent on the kind of data, and something that performs well on 454 isn't automatically a good performer on Illumina.

Reply written 6.5 years ago by Ketil
Answer written 6.5 years ago by liz.batty:

Velvet does a pretty good job on bacteria sequenced with 150 bp Illumina reads. Try using a higher k-mer length: you need to compile Velvet with a larger MAXKMERLENGTH than the default, and you can use VelvetOptimiser (which comes with Velvet) to run your assembly over a range of k-mer lengths and select the optimal one (by default it uses N50 for optimisation). There are a whole lot of other assemblers out there you might try; A5 is a new microbial assembly pipeline which promises to run lots of optimisation and quality control steps for you.
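For readers unfamiliar with the N50 statistic mentioned above, here is a minimal Python sketch of how it is commonly computed from a list of contig lengths. The contig lengths are made up for illustration, and this is not VelvetOptimiser's own code.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length of the contig at which, walking from the longest
    contig downwards, the running total first reaches half of the total
    assembly length.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("need at least one contig length")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


if __name__ == "__main__":
    # Hypothetical contig lengths (bp) from two assemblies of the same genome.
    fragmented = [40_000, 30_000, 20_000, 10_000]
    joined = [70_000, 20_000, 10_000]
    print(n50(fragmented), n50(joined))
```

Note that N50 only measures contiguity: an assembly that joins contigs incorrectly can score a higher N50 than a more fragmented but correct one.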

With reads this long, you could also try using FLASH to turn your paired 250bp reads into a single longer read before assembly, and use those in assemblers designed for 454-length reads.
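As a rough illustration of why merging is plausible here: with 250 bp reads and a 440 bp fragment, the two mates would overlap by about 2 * 250 - 440 = 60 bp, assuming the reported library size of 440 refers to the full fragment length. The Python sketch below shows that arithmetic plus a deliberately naive exact-match merge. It is not FLASH's algorithm (FLASH scores candidate overlaps and tolerates mismatches); the function names and the `min_overlap` default are hypothetical.

```python
# Complement table for DNA bases, including N for ambiguous calls.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def expected_overlap(read_length: int, fragment_length: int) -> int:
    """Expected mate overlap in bp, assuming fragment_length spans both reads."""
    return max(0, 2 * read_length - fragment_length)


def merge_pair(read1: str, read2: str, min_overlap: int = 10):
    """Merge a read pair by the longest exact suffix/prefix overlap.

    read2 is reverse-complemented first so both reads are on the same
    strand. Returns the merged sequence, or None if no exact overlap of
    at least min_overlap bases is found. Quality scores and mismatches
    are ignored, which real merging tools do not do.
    """
    mate = reverse_complement(read2)
    longest = min(len(read1), len(mate))
    for overlap in range(longest, min_overlap - 1, -1):
        if read1[-overlap:] == mate[:overlap]:
            return read1 + mate[overlap:]
    return None


if __name__ == "__main__":
    print(expected_overlap(250, 440))  # overlap for the data in the question

    # Toy 30 bp fragment sequenced with 20 bp reads from each end.
    fragment = "ACGTTGCAAGGCTTACGATCGGATCCTAGA"
    read1 = fragment[:20]
    read2 = reverse_complement(fragment[-20:])
    print(merge_pair(read1, read2) == fragment)
```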


neat paper (FLASH), I have not heard of it before

Reply written 6.5 years ago by Istvan Albert

N50 is a horrid metric to optimize for.

Reply written 6.5 years ago by Ketil