Should I use MiSeq or HiSeq to generate data for assembling the blowfly genome?
8.8 years ago
leven001 • 0

Hey!

I'm working with a blowfly genome (~650 Mb). I have already sequenced it on an Ion Torrent PGM, but that only yielded about 2-3x coverage and around 4M usable reads (single-end; size selection targeted 400 bp, but the reads actually came out around 250 bp). I'm now looking to sequence my samples on an Illumina platform, but I don't know whether to use MiSeq or HiSeq. I will use the sequencing data for de novo assembly (with CLCbio v7), since no closely related genomes are available (the closest annotated genome is Drosophila). Later on, I plan to use the assemblies to locate genes, microsatellites, transposable elements, etc.

What would be more useful: more coverage or longer reads? Any input would be great since I'm new to the bioinformatics field.

genome Assembly sequencing

Someone correct me if I'm wrong, but I would assume longer reads would be more informative for de novo assembly.


Yes, that is true; on the other hand, higher coverage also helps a lot, so it is a tradeoff.

8.8 years ago
matted 7.7k

I assume you will make new libraries, and therefore aren't limited by the short fragment sizes you had before.

I don't agree with the other post saying that the MiSeq has higher error rates - the "official" word, other publications, and my own experience is that MiSeq is actually better (in terms of per-base accuracy, see e.g. here or here informally).

The tradeoff is read length (MiSeq wins) versus total coverage (HiSeq wins, for fixed cost). For assembly and particularly looking at microsatellites and transposons I would definitely favor longer reads.

For concreteness, you could get 25M 300+300 PE reads from the MiSeq for $1800. That's 8.3M bases per dollar, and one run would give you 23X coverage. On a HiSeq, you could get 200M 100+100 PE reads (though maybe some cores do longer?) for $2500. That's 16M bases per dollar, and one lane would give you 62X coverage.
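A small sketch of the arithmetic behind those numbers, in case you want to plug in your own provider's quotes. It assumes "25M" and "200M" count read pairs (which is what makes the 8.3M and 16M bases-per-dollar figures work out), uses the ~650 Mb genome size from the question, and counts raw sequenced bases only (no trimming or duplicate removal). `run_stats` is a hypothetical helper, not part of any tool.

```python
GENOME_SIZE_BP = 650e6  # ~650 Mb blowfly genome, from the question


def run_stats(read_pairs, read_len, cost_usd, genome_size=GENOME_SIZE_BP):
    """Return (total bases, bases per dollar, fold coverage) for one run or lane.

    Assumes read_pairs counts pairs, so each pair contributes 2 * read_len bases.
    """
    total_bases = read_pairs * 2 * read_len
    return total_bases, total_bases / cost_usd, total_bases / genome_size


# Yields and prices quoted above; these vary by provider and over time.
miseq = run_stats(read_pairs=25e6, read_len=300, cost_usd=1800)
hiseq = run_stats(read_pairs=200e6, read_len=100, cost_usd=2500)

for name, (bases, per_dollar, cov) in [("MiSeq", miseq), ("HiSeq", hiseq)]:
    print(f"{name}: {bases / 1e9:.1f} Gb, {per_dollar / 1e6:.1f}M bases/$, {cov:.0f}X")
```

Swapping in a different read length, pair count, or price is enough to re-run the comparison for another quote.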

You can evaluate based on your budget and scientific goals, but personally I would do a MiSeq run. You can start to get good assembly results at ~20X coverage, though you can increase that later as you want to close gaps and get longer contigs.
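If you start at ~20X and want to top up later, a rough way to budget is to divide the target number of bases by the yield of one run or lane and round up. This sketch reuses the per-run yields quoted in this answer, assumes they count read pairs, and ignores trimming, duplicates, and read overlap. The 50X and 100X targets are just illustrative values, and `runs_needed` is a hypothetical helper.

```python
import math

GENOME_SIZE_BP = 650e6  # ~650 Mb genome, from the question

# Raw yield per run/lane, assuming 25M and 200M are read-pair counts.
BASES_PER_RUN = {
    "MiSeq 2x300 run": 25e6 * 2 * 300,
    "HiSeq 2x100 lane": 200e6 * 2 * 100,
}


def runs_needed(target_coverage, bases_per_run, genome_size=GENOME_SIZE_BP):
    """Smallest whole number of runs/lanes whose raw yield reaches the target coverage."""
    return math.ceil(target_coverage * genome_size / bases_per_run)


for platform, bases in BASES_PER_RUN.items():
    for target in (20, 50, 100):  # illustrative coverage targets
        print(f"{platform}: {target}X needs {runs_needed(target, bases)}")
```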

Caveats: these prices may not be the same for all providers and the read lengths, read totals, and prices are always changing... so this snapshot will probably become out-of-date soon.


Good luck getting fragments bigger than 600 bp with Nextera, which is commonly used on the MiSeq. The paired-end reads are very likely to overlap, so the 8.3M bases per dollar figure would have to account for the fact that some bases will be redundant.
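To see how much overlap costs, here is a rough sketch. With 2x300 reads, a pair can cover at most 600 bp of distinct sequence, and at most the fragment length when the fragment is shorter. It assumes 25M read pairs per run at $1800 and a ~650 Mb genome (the figures from the answer above), and treats each library as having a single insert size, whereas real libraries have a size distribution. The insert sizes in the loop are hypothetical.

```python
READ_LEN = 300          # MiSeq 2x300 run
READ_PAIRS = 25e6       # assumed: 25M read pairs per run
COST_USD = 1800
GENOME_SIZE_BP = 650e6  # ~650 Mb genome, from the question


def unique_bases_per_pair(insert_size, read_len=READ_LEN):
    """Distinct genomic bases covered by one pair.

    If the fragment is shorter than 2 * read_len, the two reads overlap and the
    shared bases are sequenced twice, so only insert_size bases are unique.
    """
    return min(insert_size, 2 * read_len)


def effective_stats(insert_size):
    """Return (unique bases per dollar, unique fold coverage) for one run."""
    unique = READ_PAIRS * unique_bases_per_pair(insert_size)
    return unique / COST_USD, unique / GENOME_SIZE_BP


for insert in (400, 500, 600, 800):  # hypothetical mean insert sizes
    per_dollar, cov = effective_stats(insert)
    print(f"insert {insert} bp: {per_dollar / 1e6:.1f}M unique bases/$, {cov:.1f}X")
```

Beyond a 600 bp insert the numbers stop improving, because the reads no longer overlap and nothing more is gained per pair.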

8.8 years ago

I would pick HiSeq. I had some issues with MiSeq, namely a higher error rate and uneven coverage of single-copy regions.
