Question

Should I use MiSeq or HiSeq to generate data for assembling the blowfly genome?

9.9 years ago

leven001 • 0

Hey!

I'm working with a blowfly genome (genome size about 650 Mb). I have already used Ion Torrent PGM for sequencing, but it only yielded about 2-3x coverage and around 4M usable reads (size selection targeted 400 bp but fragments were actually around 250 bp; single-end reads). I'm looking to sequence my samples on an Illumina platform, but I don't know whether to use MiSeq or HiSeq. I am using the sequencing data to do de novo assembly (using CLCbio, V7) since there are no closely related genomes available (the closest annotated genome would be Drosophila). Later on, I plan to use the assemblies to locate genes, microsatellites, transposable elements, etc.

What would be more useful: more coverage or longer reads? Any input would be great since I'm new to the bioinformatics field.

genome Assembly sequencing • 9.8k views

updated 2.5 years ago by Ram 43k • written 9.9 years ago by leven001 • 0

Someone correct me if I'm wrong, but I would assume longer reads would be more informative for de novo assembly.

9.9 years ago by Katie D'Aco ★ 1.1k

Yes, that is true. On the other hand, having higher coverage helps a lot, so it is a tradeoff.

9.9 years ago by Istvan Albert 100k

Ram · Answer 1 · 2014-05-28

I assume you will make new libraries, and therefore aren't limited by the short fragment sizes you had before.

I don't agree with the other post saying that the MiSeq has higher error rates. The "official" word, other publications, and my own experience all indicate that MiSeq is actually better in terms of per-base accuracy (see e.g. here or here informally).

The tradeoff is read length (MiSeq wins) versus total coverage (HiSeq wins, for fixed cost). For assembly, and particularly for looking at microsatellites and transposons, I would definitely favor longer reads.

For concreteness, you could get 25M 300+300 PE reads from the MiSeq for $1800. That's 8.3M bases per dollar, and one lane would give you 23X coverage.

On a HiSeq, you could get 200M 100+100 PE reads (though maybe some cores do longer?) for $2500. That's 16M bases per dollar, and one lane would give you 62X coverage.
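As a quick check of the arithmetic above, here is a minimal Python sketch that recomputes bases per dollar and fold coverage. It assumes "25M" and "200M" refer to read pairs (which is what reproduces the quoted figures) and takes the 650 Mb genome size from the question; the function name `run_stats` is just illustrative.

```python
GENOME_SIZE = 650e6  # blowfly genome size in bases, taken from the question


def run_stats(read_pairs, read_len, cost_usd, genome_size=GENOME_SIZE):
    """Return (total bases, bases per dollar, fold coverage) for a paired-end run.

    Assumes read_pairs counts pairs, so each pair contributes 2 * read_len bases.
    """
    total_bases = read_pairs * 2 * read_len
    return total_bases, total_bases / cost_usd, total_bases / genome_size


# Figures quoted in this answer (prices and yields are a 2014 snapshot).
miseq = run_stats(read_pairs=25e6, read_len=300, cost_usd=1800)
hiseq = run_stats(read_pairs=200e6, read_len=100, cost_usd=2500)

for name, (bases, per_dollar, coverage) in [("MiSeq", miseq), ("HiSeq", hiseq)]:
    print(f"{name}: {bases / 1e9:.0f} Gb, "
          f"{per_dollar / 1e6:.1f}M bases/$, {coverage:.0f}X coverage")
```

Swapping in your own provider's read counts and prices gives the same comparison for current quotes.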

You can evaluate based on your budget and scientific goals, but personally I would do a MiSeq run. You can start to get good assembly results at ~20X coverage, though you can increase that later as you want to close gaps and get longer contigs.

Caveats: these prices may not be the same for all providers and the read lengths, read totals, and prices are always changing... so this snapshot will probably become out-of-date soon.

score 1 · Answer 2 · 2014-05-28

9.9 years ago

Adrian Pelin ★ 2.6k

I would pick HiSeq. I had some issues with MiSeq, namely a higher error rate and uneven coverage of single-copy regions.

Especially in light of your short fragment sizes, there is little benefit to be had from the longer MiSeq reads: your reads will overlap.

9.9 years ago by Istvan Albert 100k
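To make the overlap point concrete, here is a rough Python sketch of how much of a read pair is redundant for a given fragment length. It is a simplification that assumes reads are trimmed at the end of the fragment (no adapter read-through) and ignores sequencing errors; the function names are hypothetical.

```python
def pair_overlap(fragment_len, read_len):
    """Bases covered by both reads of a pair sequenced from one fragment."""
    effective = min(read_len, fragment_len)  # a read cannot extend past the fragment
    return max(0, 2 * effective - fragment_len)


def unique_bases(fragment_len, read_len):
    """Distinct fragment bases covered by the pair."""
    effective = min(read_len, fragment_len)
    return min(fragment_len, 2 * effective)


# 250 bp is the fragment size reported in the question; the others are for contrast.
for fragment_len in (250, 500, 800):
    print(fragment_len,
          pair_overlap(fragment_len, 300),
          unique_bases(fragment_len, 300))
```

With ~250 bp fragments, a 300+300 pair covers the same 250 bases twice, so the extra read length only pays off if new libraries with longer inserts are prepared.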