Question

Reads, Cluster Numbers, Study Design Comparison

8.2 years ago

carabiniero8 ▴ 10

Hi All,

I know these are very basic questions, and I apologize if they are redundant or have been answered elsewhere, but I have not been able to find straightforward answers in this forum or anywhere else, so I would appreciate your help.

1. What is the definition of a "read"? Is a read the number of clusters with DNA molecule fragments from the same library (sample), or is it the total number of DNA molecule fragments from the same library? Or is it something else altogether?
2. Can someone tell me the total number of clusters on a lane of a MiSeq flow cell? Also, on a lane of a HiSeq 2500 and a HiSeq X flow cell?
3. Lastly, say I am doing a study with PE150 sequencing (150 bases in from each end of each fragment) for 96 samples (libraries), which I want to run on a single lane of a HiSeq 2500. This would translate into 1.5-2 million reads/library. What would be the difference in data generation, and the advantages or disadvantages, if I did the same thing with PE300 (300 bases in from each end)? Is the only difference the cost of the chemistry used to generate the reads? That is, PE300 would sequence the fragments for longer, cost more, and generate more data, so the reads are likely to be more precise? The total number of reads, as I understand it, would remain the same, right? So is it more a matter of whether I want to spend the extra money to identify and later map each DNA fragment with greater precision?

Thank you!

Cara

cluster-number Read • 6.2k views

updated 21 months ago by Ram 43k • written 8.2 years ago by carabiniero8 ▴ 10

You have a sample library, which consists of fragments. These fragments are sequenced from the ends to get "reads". If only one end is sequenced, you have a single-end read; if the other end of the same fragment is also sequenced, you have a pair of reads from that fragment. A pair of reads provides spatial information (especially if you have a reference to align against).

Take a look at the MiSeq spec: http://www.illumina.com/systems/miseq/performance_specifications.html The single-end "reads passing filter" mentioned on that page equate to clusters that pass the quality/purity criteria. Those clusters originated from fragments that may not necessarily be unique.

Specs for a HiSeq 4000 (which is like a single HiSeq X) are here: http://www.illumina.com/content/illumina-marketing/amr/en_US/systems/hiseq-3000-4000/specifications.html For HiSeq 2500: http://www.illumina.com/content/illumina-marketing/amr/en_US/systems/hiseq_2500_1500/performance_specifications.html

Be careful when reading these specs. Both the HiSeq 2500 and the HiSeq 4000 can run two flowcells at the same time, whereas the MiSeq runs only one flowcell with one lane. Flowcells on the 2500/4000 have 8 lanes, so the per-flowcell single-read numbers should be divided by 8 (e.g. 2.1B/8) to get the clusters on a single lane. With the patterned flowcells of a HiSeq 4000, the total cluster number is irrelevant (since it is always the same). It is the number of clusters passing filter (optimally around 75%) that is important.
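To make the per-lane arithmetic concrete, here is a rough sketch in Python. The 2.1B figure and the 8 lanes come from the example above, and the 96 libraries come from the question. The function names and the assumption of perfectly even pooling are mine, so treat the result as an upper-bound estimate rather than an expected yield.

```python
def clusters_per_lane(single_reads_per_flowcell: float, lanes: int = 8) -> float:
    """Divide a per-flowcell single-read figure by the number of lanes."""
    return single_reads_per_flowcell / lanes


def clusters_per_library(lane_clusters: float, n_libraries: int) -> float:
    """Split one lane across libraries, assuming perfectly even pooling
    (a simplifying assumption; real pools are never this balanced)."""
    return lane_clusters / n_libraries


lane = clusters_per_lane(2.1e9)               # the 2.1B/8 example from the post
per_library = clusters_per_library(lane, 96)  # 96 libraries, as in the question

print(f"clusters per lane:    {lane:,.0f}")
print(f"clusters per library: {per_library:,.0f}")
```

Actual numbers depend on the run mode, loading concentration and how well the libraries are balanced, so check the spec page for your exact configuration.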

You would simply get more data from a 300-cycle run than from a 150-cycle run: the number of reads stays the same, only their length changes. If your insert size is such that the two reads from a fragment overlap in the middle, the overlap gives you a longer contiguous representation of the fragment. Since Illumina reads have some errors towards the end of long reads (~300 cycles), such an overlap can also be advantageous for correcting those errors.
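As a back-of-the-envelope illustration of the two points above, the sketch below compares the bases generated per library and the expected read overlap for PE150 versus PE300. The 2 million fragments per library is taken from the upper end of the estimate in the question. The 450 bp insert size and the function names are hypothetical, chosen only to show the effect.

```python
def total_bases(n_fragments: int, read_length: int, paired: bool = True) -> int:
    """Bases sequenced: one read per fragment, or two if paired-end."""
    return n_fragments * read_length * (2 if paired else 1)


def overlap_bases(insert_size: int, read_length: int) -> int:
    """Bases covered by both reads of a pair.

    Zero when the reads do not meet in the middle, and never more than
    the insert itself (reads longer than the insert run into adapter).
    """
    return max(0, min(insert_size, 2 * read_length - insert_size))


fragments = 2_000_000  # per library, from the question's estimate
insert = 450           # hypothetical insert size in bp

for read_length in (150, 300):
    print(
        f"PE{read_length}: {total_bases(fragments, read_length):,} bases, "
        f"{overlap_bases(insert, read_length)} bp overlap per pair"
    )
```

With these assumed numbers, the fragment count is identical in both cases; PE300 doubles the bases and is the only configuration where the two reads overlap.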

updated 4.3 years ago by Ram 43k • written 8.2 years ago by GenoMax 141k