What read lengths are produced by modern Illumina sequencers?
8.3 years ago
I am working with a program that generates artificial FASTQ files from a given reference genome and it allows for read length customization. I am trying to produce several FASTQ files with different read lengths. Hence, I want to know what read lengths are produced by real sequencers such as those from Illumina. I found this site which states the maximum read length of some sequencers.

However, when stating, for example, that the maximum read length is 2 x 150 bp, is that supposed to mean that each end of a contig will be 150 bp long? Also, if that is the case, does this mean that the sequencer could be set up to produce read lengths equal to any number of base pairs less than 150? i.e. could the sequencer produce read lengths of 2 x 1 bp, 2 x 2 bp, 2 x 3 bp, ..., 2 x 148 bp, 2 x 149 bp? Lastly, which of those sequencers is most commonly used for genomes about as long as, say, that of E. coli?

You can also take a look at this link. It has the best explanation of common terminology: library (contig), insert, fragments, etc.

8.3 years ago
However, when stating, for example, that the maximum read length is 2 x 150 bp, is that supposed to mean that each end of a contig will be 150 bp long?

No. A contig is something else: a contiguous sequence assembled from many overlapping reads and represented by their consensus.

2 x 150 bp means that you get two 150 bp reads of sequence data from a single piece of DNA, one read from each end. The two reads are separated by an unsequenced stretch whose length depends on the insert size (usually 200 to 1200 bp), which is size-selected during library prep. This picture might make things clearer.

Mapping Reads by Suspencewl - Own work. Licensed under CC0 via Wikimedia Commons.

The enlarged read pair is 2 x 35 bp. The insert size (bp between the sequencing adapters) is 400-500 bp. All of the reads in the picture could be used to assemble a single contig based on the consensus sequence.
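To make the relationship between read length and insert size concrete, here is a small sketch of the unsequenced gap ("inner distance") between the two reads of a pair. It assumes, as in the picture, that the insert size counts only the DNA between the adapters, so each read starts at one end of the insert; `inner_distance` is a hypothetical helper name, not part of any tool.

```python
def inner_distance(insert_size: int, read_length: int) -> int:
    """Unsequenced gap between the two reads of a pair.

    Assumes insert_size is the DNA between the adapters and that each read
    starts at one end of it. A negative result means the reads overlap.
    """
    return insert_size - 2 * read_length


# The 2 x 35 read pair from a 400-500 bp insert in the figure:
for insert in (400, 450, 500):
    print(insert, inner_distance(insert, 35))  # 330, 380, 430 bp gaps

# A 2 x 150 pair on a short 250 bp insert: the reads overlap by 50 bp.
print(inner_distance(250, 150))  # -50
```

This is also why read simulators usually ask for a mean fragment (insert) size in addition to the read length.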

For Illumina sequencing, the read length is specified by the reagent kit, so you have limited flexibility there.

The MiSeq is capable of up to 15 Gb of sequence output per run. The E. coli genome is ~4.6 Mb in length, so in a single run a MiSeq could easily sequence an E. coli genome at ultra-high depth.
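A quick back-of-the-envelope check of "ultra-high depth", using the usual coverage estimate (total sequenced bases divided by genome size). This sketch assumes the full 15 Gb yield goes to a single E. coli sample and ignores duplicates, adapter trimming and multiplexing; the helper names are hypothetical.

```python
import math


def mean_depth(total_bases: float, genome_size: float) -> float:
    """Expected average coverage: total sequenced bases / genome size."""
    return total_bases / genome_size


def read_pairs_for_depth(depth, genome_size, read_length, reads_per_fragment=2):
    """Read pairs needed to reach a target average depth (same simple model)."""
    bases_needed = depth * genome_size
    bases_per_pair = read_length * reads_per_fragment
    return math.ceil(bases_needed / bases_per_pair)


# 15 Gb over a ~4.6 Mb genome:
print(f"{mean_depth(15e9, 4.6e6):.0f}x")  # 3261x
# Read pairs for 100x average depth with 2 x 150 reads:
print(read_pairs_for_depth(100, 4_600_000, 150))  # 1533334
```

In practice this is why small bacterial genomes are usually multiplexed many per MiSeq run rather than sequenced one per run.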

8.3 years ago

The read lengths (up to the maximum) are somewhat configurable. However, there are some very commonly used lengths that have predominated at different times. In earlier days you would see a lot of 2x36, 2x50, and 2x76 from the Illumina GA or GAII. With the HiSeq 2000/2500 it was most common to see 2x100 or 2x150 (you also often see 2x101 or 2x151). The MiSeq often produces 2x250 or 2x300. The HiSeq 4000, HiSeq X, and NovaSeq 5000/6000 seem to most often be run at 2x150. I would probably go with 2x100 for your simulations.
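For the simulation itself, here is a minimal sketch of how fixed-length paired-end reads such as 2x100 can be cut from a reference. It is not the program from the question: `simulate_pairs` and `revcomp` are hypothetical names, the reads are error-free with constant quality 'I' (Phred 40 in Phred+33), and the insert size is drawn from a normal distribution whose default mean and SD (350 and 50 bp) are arbitrary assumptions. Dedicated simulators such as ART or wgsim also model sequencing errors.

```python
import random

# Complement table for A/C/G/T/N; other characters are left unchanged.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def simulate_pairs(reference, read_length, n_pairs,
                   insert_mean=350, insert_sd=50, seed=0):
    """Yield (read1, read2) FASTQ records of fixed length from one reference.

    Hypothetical helper: error-free reads, constant 'I' qualities, insert
    sizes drawn from a normal distribution clamped to
    [read_length, len(reference)].
    """
    if len(reference) < read_length:
        raise ValueError("reference is shorter than the read length")
    rng = random.Random(seed)
    qual = "I" * read_length
    for i in range(n_pairs):
        insert = int(rng.gauss(insert_mean, insert_sd))
        insert = min(len(reference), max(read_length, insert))
        start = rng.randrange(len(reference) - insert + 1)
        fragment = reference[start:start + insert]
        # Fragments can come from either strand of the genome.
        if rng.random() < 0.5:
            fragment = revcomp(fragment)
        # Read 1 starts at one end of the fragment; read 2 starts at the
        # other end, on the opposite strand.
        r1 = fragment[:read_length]
        r2 = revcomp(fragment)[:read_length]
        yield (f"@sim_{i}/1\n{r1}\n+\n{qual}\n",
               f"@sim_{i}/2\n{r2}\n+\n{qual}\n")


if __name__ == "__main__":
    # A random 5 kb "genome" stands in for a real reference FASTA here.
    ref = "".join(random.Random(1).choice("ACGT") for _ in range(5000))
    for rec1, rec2 in simulate_pairs(ref, read_length=100, n_pairs=2):
        print(rec1 + rec2, end="")
```

Writing read 1 and read 2 records to two separate files gives the usual `_R1`/`_R2` FASTQ layout; changing `read_length` to 150 or 250 covers the other common lengths mentioned above.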

8.3 years ago
The read lengths are mostly determined by the kits that are available for purchase for the instrument:

It is possible to sequence shorter reads, but this would be a waste of money, because the unused reagents in a kit cannot be stored for a later run.

If no kit is available for the desired read length, multiple kits are used, e.g. 2x50 for SR100 (a single-read 100 bp run).
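The kit arithmetic behind the last two points can be sketched as follows: each base of each read costs one sequencing cycle, and a kit supplies a fixed number of cycles. The kit sizes used below (50 and 300 cycles) are illustrative inputs, not a catalogue of Illumina products; reading "2x50 for SR100" as two 50-cycle kits covering 100 cycles is an interpretation; and whether index reads draw on the same reagents depends on the instrument, so they are only an optional input here. The helper names are hypothetical.

```python
import math


def cycles_needed(read_lengths, index_lengths=()):
    """One cycle per base of each read (and of each index read, if given)."""
    return sum(read_lengths) + sum(index_lengths)


def kits_needed(cycles, kit_cycles):
    """How many kits of `kit_cycles` cycles are needed to cover `cycles`."""
    return math.ceil(cycles / kit_cycles)


def unused_cycles(cycles, kit_cycles):
    """Reagent cycles bought but not used; they cannot be kept for later."""
    return kits_needed(cycles, kit_cycles) * kit_cycles - cycles


# SR100 with 50-cycle kits: two kits, nothing left over.
print(kits_needed(cycles_needed([100]), 50), unused_cycles(100, 50))  # 2 0
# A 2 x 100 run on a 300-cycle kit: 100 cycles of reagent are wasted.
print(unused_cycles(cycles_needed([100, 100]), 300))  # 100
```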


