Question: Number of passing clusters vs. number of read pairs vs. total number of reads
14 months ago, AP wrote:

Hi all,

I apologize for a rather basic question but I am confused about the terminology.

What is really the difference between:

  • Number of clusters passing filter
  • Number of clusters
  • Number of read pairs per lane
  • Total number of reads

For instance, a HiSeq 4000 should produce about 300M reads per lane. What does that mean exactly? If I sequence at PE150, does that mean the total number of expected reads should be 600M? This is quite important when budgeting a project. Would 75,000 fragments at 20X coverage require 1,500,000 reads, and so 0.0025 lanes of a HiSeq 4000 (1,500,000/600M)?

Any help clarifying this would be highly appreciated!

14 months ago, genomax (United States) wrote:
  • Number of clusters: library fragments anchored to the flowcell that are capable of producing sequence. This number is fixed for patterned flowcells but variable for other flowcells, and it depends on library quality.
  • Number of clusters passing the chastity filter: clusters that pass Illumina's initial data-processing filter (e.g. pure signal, a certain quality).

Chastity is defined as the ratio of the brightest base intensity to the sum of the brightest and second-brightest base intensities. A cluster "passes filter" if no more than 1 base call has a chastity value below 0.6 in the first 25 cycles.
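The rule above can be sketched in a few lines of Python. This is only an illustration of the definition as stated here, not Illumina's actual implementation. The function names, the four-channel intensity tuples, and the example values are all made up for the sketch, and intensities are assumed to be positive.

```python
def chastity(intensities):
    """Chastity of one base call: brightest / (brightest + second brightest).

    `intensities` holds the signal for the four channels (A, C, G, T) in one
    cycle. Positive values are assumed (an assumption of this sketch).
    """
    top, second = sorted(intensities, reverse=True)[:2]
    return top / (top + second)


def passes_filter(cycles, threshold=0.6, window=25, max_failures=1):
    """True if at most `max_failures` of the first `window` cycles have a
    chastity value below `threshold`."""
    failures = sum(1 for c in cycles[:window] if chastity(c) < threshold)
    return failures <= max_failures


# Hypothetical per-cycle intensities, invented for illustration only.
clean_cycle = (900.0, 50.0, 30.0, 20.0)   # chastity = 900 / 950, about 0.95
mixed_cycle = (500.0, 450.0, 30.0, 20.0)  # chastity = 500 / 950, about 0.53

print(passes_filter([clean_cycle] * 24 + [mixed_cycle]))      # True: one low-chastity call
print(passes_filter([clean_cycle] * 23 + [mixed_cycle] * 2))  # False: two low-chastity calls
```

A cluster with a single mixed cycle still passes; a second one within the first 25 cycles removes it from the "passing filter" count.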

  • Number of read pairs per lane = Number of clusters passing filter in that lane (x2, if counting actual reads)

Illumina generally counts both reads of a pair, so the quoted number of reads usually corresponds to only half as many unique library fragments.
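To make the bookkeeping concrete, here is a small Python sketch of how the three counts relate. The function name and the 300M figure (taken from the question's "about 300M per lane") are illustrative assumptions, not output from any Illumina tool.

```python
def counts_from_pf_clusters(pf_clusters, paired_end=True):
    """Relate clusters passing filter to fragments, read pairs and reads.

    Each passing cluster is one library fragment. In a paired-end run it
    yields one read pair, i.e. two reads; in a single-end run, one read.
    """
    reads_per_cluster = 2 if paired_end else 1
    return {
        "fragments": pf_clusters,
        "read_pairs": pf_clusters if paired_end else 0,
        "reads": pf_clusters * reads_per_cluster,
    }


# Hypothetical lane with 300M clusters passing filter, sequenced paired-end.
lane = counts_from_pf_clusters(300_000_000, paired_end=True)
print(lane["read_pairs"])  # 300000000
print(lane["reads"])       # 600000000
```

So "300M" and "600M" can describe the same lane, depending on whether clusters (read pairs) or individual reads are being counted.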


OK, thank you very much for the clarification! So, does that mean I should use 300M reads when calculating the number of lanes required for, e.g., 20X coverage (as in the example above)? Or should I double the number of reads?

Reply written 14 months ago by AP

Use the Illumina sequencing coverage calculator :-)

Reply written 14 months ago by genomax (modified 14 months ago)

Thanks, but I don't find it very helpful or clear. I like being able to calculate this by hand.

Reply written 14 months ago by AP

Using the published specifications for the HiSeq 3000/4000:

2,500,000,000 single-end reads per 8 lanes = 312,500,000 reads per lane OR
5,000,000,000 paired-end reads per 8 lanes = 625,000,000 reads per lane

625,000,000 x 150 = 9.375e10 total bases (about 93.75 Gb) per lane for paired-end reads.

What is the average length of your 75,000 fragments going to be? You would be sampling the sequence between the two ends.
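For doing this by hand, the arithmetic can be written out as a short Python sketch. The per-lane figures come from the specification numbers quoted above; the 500 bp mean fragment length is purely a placeholder assumption, since that value is not given in this thread. The sketch also assumes every read is usable and coverage is uniform, so treat the result as a lower bound rather than a budget.

```python
def bases_per_lane(reads_per_lane, read_length):
    """Total sequenced bases in one lane."""
    return reads_per_lane * read_length


def lanes_needed(n_fragments, mean_fragment_length, coverage,
                 reads_per_lane, read_length):
    """Fraction of a lane needed to reach `coverage` over all fragments.

    Idealised: no duplicates, no filtered reads, perfectly even coverage.
    """
    target_bases = n_fragments * mean_fragment_length
    required_bases = target_bases * coverage
    return required_bases / bases_per_lane(reads_per_lane, read_length)


READS_PER_LANE_PE = 5_000_000_000 // 8  # 625,000,000 reads, counting both mates
READ_LENGTH = 150                       # PE150

# Placeholder value, NOT from the thread: substitute your real mean length.
MEAN_FRAGMENT_LENGTH = 500

fraction = lanes_needed(75_000, MEAN_FRAGMENT_LENGTH, 20,
                        READS_PER_LANE_PE, READ_LENGTH)

print(bases_per_lane(READS_PER_LANE_PE, READ_LENGTH))  # 93750000000
print(f"{fraction:.4f}")                               # 0.0080
```

The key point is that coverage is computed from bases (reads x read length divided by target size), not from read counts alone, which is why the fragment length matters.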

Reply written 14 months ago by genomax (modified 14 months ago)