Question: Why does the base quality of reads generally decrease toward the end of the read?
4.0 years ago by
lakhujanivijay4.7k wrote:

Why does the base quality of reads generally decrease toward the end of the read? I have learned that it can also affect alignment, but why does it happen in the first place?

I know it has something to do with the sequencing chemistry. Can somebody please explain?

base quality • 4.2k views
modified 4.0 years ago by memory_donk290 • written 4.0 years ago by lakhujanivijay4.7k

Check this previous answer: Why Does The Base Quality Drop Towards The End For Illumina Reads ?

written 4.0 years ago by Fidel1.9k
4.0 years ago by
memory_donk290 wrote:

The really specific answer depends on what platform you're using, but I'll go out on a (short) limb and guess it's Illumina. If so, the drop-off is caused by phasing errors.

With Illumina, DNA fragments are first bound to a flow cell. A well-prepared flow cell has even spacing between all DNA fragments. Before sequencing, the DNA fragments are amplified with a technique called bridge amplification, resulting in clusters of copies of the same DNA molecule at each spot. Ideally no clusters overlap with each other (this is important for distinguishing clusters from each other). In each cycle, the sequencer washes the flow cell with all 4 fluorescently labeled nucleotides, each carrying a reversible blocking group, so that only 1 base gets added to each molecule of DNA per cycle. Different clusters may add different bases, but within a cluster it should always be the same base.

This is how things work in a perfect world. In reality, a few molecules in each cluster will likely fail to add a nucleotide in any given cycle. So let's say we're on cycle 50 of 150. In this cycle, 10 out of 1000 molecules fail to add a new nucleotide (an A). In the next cycle (cycle 51), when the 990 other molecules add a G, the 10 that failed last cycle will add the A instead. From now on they will be at least 1 cycle behind the rest, polluting the light signal that the sequencer's camera has to read. In cycle 52, some more molecules fall behind the main group, and some from the group that was already behind fall even further behind. You can see that by cycle 150, a noticeable fraction of the molecules may well be out of sync with the cycle number, so by the end of the sequencing run the last N bases will have a less pure light signal than the first. This signal purity is what Illumina sequencers use to calculate quality scores. New chemistries are largely intended to minimize this phasing problem, increasing the length of reads before quality begins to drop.
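To put rough numbers on the description above, here is a minimal sketch of how the in-sync fraction of a cluster might shrink over a run. It assumes every molecule independently fails to incorporate with the same probability in every cycle and ignores molecules that run ahead (prephasing). The function name `lag_distribution`, the `max_lag` cutoff, and the 0.1% rate are assumptions for illustration only; the 1% rate is the 10-in-1000 figure from the example.

```python
def lag_distribution(n_cycles, p_fail, max_lag=5):
    """Fraction of molecules in a cluster at each lag after n_cycles.

    Index 0 is "in sync"; index k is "k cycles behind". The last bucket
    collects every molecule that is max_lag or more cycles behind.
    Simplifying assumption: each molecule fails to add a base with the
    same independent probability p_fail in every cycle.
    """
    dist = [1.0] + [0.0] * max_lag
    for _ in range(n_cycles):
        new = [0.0] * (max_lag + 1)
        for lag, frac in enumerate(dist):
            # Incorporation succeeded: the lag stays the same.
            new[lag] += frac * (1.0 - p_fail)
            # Incorporation failed: the molecule falls one more cycle behind.
            new[min(lag + 1, max_lag)] += frac * p_fail
        dist = new
    return dist


# Hypothetical per-cycle failure rates: 0.1% and the 1% used in the example.
for p_fail in (0.001, 0.01):
    for cycle in (1, 50, 100, 150):
        in_sync = lag_distribution(cycle, p_fail)[0]
        print(f"p_fail={p_fail:.3f} cycle={cycle:3d} in-sync fraction={in_sync:.3f}")
```

Under this model the in-sync fraction after n cycles is simply (1 - p_fail)^n, so it can only shrink as the run goes on, which is why the signal gets noisier toward the end of the read.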





modified 4.0 years ago • written 4.0 years ago by memory_donk290
4.0 years ago by
planet earth
piet1.7k wrote:

Because you are doing sequencing by synthesis. The more cycles you run, the more errors you accumulate.

written 4.0 years ago by piet1.7k

Can you please elaborate? Why is it specific to the end of reads, then?

modified 4.0 years ago • written 4.0 years ago by lakhujanivijay4.7k

One cycle = one base for a given read. The cycles toward the end of a read are therefore the bases near the end of that read. For example, cycle 147 of a 300-cycle MiSeq kit (2 x 150 bp reads) is the 147th bp of a 150 bp read (for read 1, that is; the 147th base of read 2 would be cycle 150 + 147 = 297, ignoring any index-read cycles).
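The cycle-to-base mapping can be sketched as a small helper. This is only an illustration: `cycle_to_read_position` is a hypothetical function, and it assumes a plain paired-end layout with no index-read cycles between read 1 and read 2.

```python
def cycle_to_read_position(cycle, read_length=150):
    """Map a 1-based cycle number to (read number, 1-based position in that read).

    Assumes a paired-end run of 2 x read_length cycles with no index reads.
    """
    if not 1 <= cycle <= 2 * read_length:
        raise ValueError("cycle is outside the run")
    if cycle <= read_length:
        return (1, cycle)            # still sequencing read 1
    return (2, cycle - read_length)  # read 2 starts after read 1 ends


print(cycle_to_read_position(147))  # position 147 of read 1
print(cycle_to_read_position(297))  # position 147 of read 2
```

Under this simplified layout, the 147th base of read 2 falls on cycle 150 + 147 = 297.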

If each cycle has a small, but measurable, error in the incorporation of nucleotides, the ends of the read will have the most errors.

A similar situation exists with oligo synthesis. Each cycle has an error rate associated with incorporating new bases, so the number of oligos that are truncation products grows with every cycle. Thus, long oligos have a lower percentage of full-length products.
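The oligo analogy can be made concrete with a back-of-the-envelope calculation. The sketch below assumes a constant per-step coupling efficiency and that an n-mer needs n - 1 coupling steps; the function name and the 99% efficiency are illustrative assumptions, not figures from any particular synthesis protocol.

```python
def full_length_fraction(oligo_length, coupling_efficiency=0.99):
    """Expected fraction of full-length product for an oligo of the given length.

    Assumes the first base is already attached to the support, so an
    n-mer needs n - 1 couplings, each succeeding independently with
    the same probability.
    """
    if oligo_length < 1:
        raise ValueError("oligo_length must be at least 1")
    return coupling_efficiency ** (oligo_length - 1)


# Longer oligos give a smaller share of full-length product.
for length in (20, 50, 100, 150):
    print(f"{length:3d}-mer: {full_length_fraction(length):.1%} full length")
```

The same compounding applies to sequencing cycles: a small per-cycle error rate adds up over many cycles.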

written 21 months ago by Hunter110