Question: In The Various DNA Sequencing Methods What Restricts The Process From Sequencing Reads Of 100K Base Pairs And Above?
8.7 years ago by
Delinquentme • 200 wrote:

I posted to reddit/askscience/. I posted to Quora.

Then I realized there is a website where someone might actually KNOW what's going on!

So I present to you the best explanation I've found so far, from Wikipedia:

Current methods can directly sequence only relatively short (300–1000 nucleotides long) DNA fragments in a single reaction. The main obstacle to sequencing DNA fragments above this size limit is insufficient power of separation for resolving large DNA fragments that differ in length by only one nucleotide. In all cases the use of a primer with a free 3' end is essential.

I'm interpreting the "insufficient power of separation for resolving large DNA fragments that differ in length by only one nucleotide" to mean that as inputs you need lengths of DNA of exactly ... 523 bp (or whatever the machine specifies).

To ME (with my limited knowledge of the subject matter) this seems trivial. If what is needed is to "resolve" DNA lengths with more precision, why can't we just lower the viscosity of the gel, make the bath longer, and run the electrophoresis for an extended period of time?

... but this still doesn't make sense to me.

In the process of prepping a sample for DNA sequencing, the researcher will run an electrophoresis and remove a specific chunk of DNA corresponding to a particular sequence length from the gel.

The question is: Why are we selecting particular lengths of DNA instead of just using the longest possible lengths we've got in the DNA solution? Note: As each sequencing technology is different, feel free to specify which you are most familiar with (I'm interested in all their limitations).

short dna next-gen sequencing • 3.8k views
modified 6.8 years ago by bede.portz • 490 • written 8.7 years ago by Delinquentme • 200
8.7 years ago by
Swbarnes2 • 1.5k wrote:

That wiki entry doesn't sound right.

In general, most of these sequencing processes rely on enzymes, fluorescence, or both.

Enzymes occasionally screw up, and over time the errors mount up, and then your signal is lost in a sea of noise. Fluorescent chemicals fade, which doesn't help either.

For instance, in Illumina sequencing, the enzymes have to climb up the DNA molecule exactly one base per cycle, and they must cleave the tag off the previous nucleotide. If the odds of one of those steps failing are 0.1%, then after 100 cycles, a hell of a lot of molecules have had a mistake, and your data looks like a mess. In Illumina terms, you get a cluster that is a mix of all the colors, because some molecules in the cluster are a base ahead, some are a base behind, and some still have old fluorescent tags from previous steps on. Overall, your signal is also lower than it was at the start, because your fluorescent tags and enzymes have been on the instrument instead of in a nice freezer for a few days.

In order to get more sequence, you need a system that has a much lower error rate, so that it can do the same thing 1000 times and most of the components of the system have not messed up even once.
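The arithmetic behind this argument can be sketched in a few lines of Python. This is a rough illustration only: it assumes every cycle fails independently with the same probability, which is a simplification, and the function name `fraction_in_phase` is made up for this example rather than taken from any sequencing software.

```python
def fraction_in_phase(per_cycle_failure: float, cycles: int) -> float:
    """Fraction of molecules with no failure at all after `cycles` steps.

    Assumption: each cycle fails independently with the same probability.
    """
    return (1.0 - per_cycle_failure) ** cycles


# Compare the 0.1% failure rate from the text with a tenfold lower rate.
for failure_rate in (1e-3, 1e-4):
    for cycles in (100, 1000):
        clean = fraction_in_phase(failure_rate, cycles)
        print(f"failure={failure_rate:g}  cycles={cycles:4d}  "
              f"error-free fraction={clean:.3f}")
```

Under these assumptions, a 0.1% failure rate leaves roughly 90% of molecules error-free after 100 cycles but only about 37% after 1000, while a tenfold lower failure rate gets 1000 cycles back to roughly 90%.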

written 8.7 years ago by Swbarnes2 • 1.5k

Just a note that surprised me when I first discovered it: on an Illumina machine, the enzymes are actually "climbing down" the DNA (not up). Sequencing happens from the "top" of the read down towards the flowcell.

written 8.7 years ago by Steve Lianoglou • 5.0k

I would imagine secondary structure also becomes an issue for longer strands of DNA.

written 8.7 years ago by Daniel Swan • 13k

In my mind, the enzymes are little monkeys planting colored flags on the trunk of a coconut tree, while a helicopter takes pictures from above, and no way is a monkey going to plant a flag and then take the flags out one at a time while climbing down. The monkey is just going to scramble down and forget the flags. So it has to be up.

I work with computers and large text files and scripts. Enzymes, monkeys, mostly the same thing, right?

written 8.7 years ago by Swbarnes2 • 1.5k

So I know that in the case of PacBio the fluorescent tag is on a custom nucleotide. Is this the case with Illumina? I don't understand why that would chemically fade. Wouldn't a solution then be to constantly wash a fresh chemical solution over them?

written 8.7 years ago by Delinquentme • 200

For PacBio the degradation is apparently very binary: the polymerase in the well fails due to light exposure after some period (according to their marketing at least). The mechanism is a bit different, and as a result it's possible, although unlikely, to get extremely long reads in the >10kb range with that technology.

written 8.6 years ago by Erik Garrison • 2.3k
8.7 years ago by
Cardiff University
Daniel • 3.8k wrote:

I assume from the terminology used in the question we're talking about Sanger sequencing here. The problem with longer lengths is that after a dye-terminated read has run along the fixed length, there may be wobble in the travelling speed. It's minor in the short sequences but gets amplified in the longer ones (i.e., the bases at the end of a large fragment).

Imagine synchronising 50 analogue watches at midnight. Sure, they'll be perfectly in sync for a few hours, but come back a day or so later and there are bound to be some discrepancies. The same applies here.
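The watch analogy can be turned into a toy simulation. This is only a sketch of the analogy: the 50 watches come from the text, but the Gaussian per-hour error, its 1-second size, the fixed seed, and the function name `spread_after` are all assumptions chosen for illustration. Nothing here models real electrophoresis.

```python
import random
import statistics


def spread_after(hours: int, n_watches: int = 50,
                 hourly_error_s: float = 1.0, seed: int = 0) -> float:
    """Standard deviation (seconds) of the watch offsets after `hours`.

    Assumption: every watch picks up an independent Gaussian timing
    error each hour, so the offsets perform a random walk.
    """
    rng = random.Random(seed)
    offsets = [0.0] * n_watches
    for _ in range(hours):
        offsets = [o + rng.gauss(0.0, hourly_error_s) for o in offsets]
    return statistics.pstdev(offsets)


# The disagreement between watches keeps growing the longer they run.
for hours in (1, 24, 100):
    print(f"after {hours:3d} h: spread = {spread_after(hours):.2f} s")
```

In this model the spread grows roughly with the square root of the elapsed time, so small per-step wobbles that are invisible early on become obvious later.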

written 8.7 years ago by Daniel • 3.8k

Updated my question. If you'd answer for whatever methods you're most familiar with, that'd be super duper.

written 8.7 years ago by Delinquentme • 200
6.8 years ago by
United States
bede.portz • 490 wrote:

Your initial question has an answer that, while accurate, isn't relevant to high-throughput sequencing, in that you are not electrophoresing DNA on a gel in order to visualize the output. However, this was done with radionucleotides and sequencing gels in Sanger sequencing. I mention this because some of the modern high-throughput DNA sequencing methodologies are essentially variants of Sanger sequencing, which relies on chain termination. Illumina sequencing, for example, exploits chain termination and fluorescently labeled (rather than radioisotope-labeled) nucleotides, with the additional distinctions being that the chain termination is reversible and there is no electrophoresis step.

A good place to start understanding Illumina sequencing is by first reading about and understanding Sanger sequencing. As a point of clarification, one could theoretically alter the length and composition of an electrophoresis gel to resolve different fragments of DNA, but in actuality molecules differing by a small number of nucleotides relative to the overall length of the molecule become impossible to resolve from one another.
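To put a number on "a small number of nucleotides relative to the overall length", here is a small calculation. It is only arithmetic, not a model of gel physics, and the helper name `relative_length_difference` is invented for this example.

```python
def relative_length_difference(length_nt: int) -> float:
    """Relative size difference between fragments of length_nt and length_nt + 1."""
    return 1.0 / length_nt


# One extra nucleotide is a shrinking fraction of the whole fragment.
for length_nt in (100, 1_000, 100_000):
    diff = relative_length_difference(length_nt)
    print(f"{length_nt:>7} nt vs {length_nt + 1:>7} nt: "
          f"{diff * 100:.3f}% difference")
```

At 100 nt a one-base difference is 1% of the fragment, at 1,000 nt it is 0.1%, and at 100,000 nt it is only 0.001%, which gives a feel for why single-base resolution gets harder as fragments grow.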

Once you understand Sanger sequencing, you can more easily understand Illumina sequencing, and your answer becomes clear. The system attempts to control the elongation, the chain termination, and the reversal of the chain termination. Each step can have errors; for example, the polymerase can misincorporate nucleotides, fail to add a nucleotide, add additional nucleotides, etc. The errors are random, and the system relies on the signal amplification gained from many identical sequences in close proximity to detect a fluorescence signal. Eventually the random errors accumulate and you get noise. The more nucleotides sequenced, the more errors occur.
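The way a cluster of identical molecules drifts out of step can be illustrated with a toy Monte Carlo simulation. Everything numeric here is an assumption made for illustration (1000 molecules per cluster, 0.5% chance per cycle of failing to extend, 0.5% chance of extending twice), and `simulate_cluster` is a hypothetical helper, not part of any Illumina software. Misincorporation and fluorophore fading are ignored.

```python
import random


def simulate_cluster(n_molecules: int = 1000, cycles: int = 150,
                     p_lag: float = 0.005, p_lead: float = 0.005,
                     seed: int = 1) -> list[float]:
    """Per-cycle fraction of molecules that sit at the expected base."""
    rng = random.Random(seed)
    positions = [0] * n_molecules
    in_phase_fraction = []
    for cycle in range(1, cycles + 1):
        for i in range(n_molecules):
            r = rng.random()
            if r < p_lag:
                step = 0   # failed to add a nucleotide: falls behind
            elif r < p_lag + p_lead:
                step = 2   # added an extra nucleotide: runs ahead
            else:
                step = 1   # normal single-base extension
            positions[i] += step
        in_phase = sum(1 for pos in positions if pos == cycle)
        in_phase_fraction.append(in_phase / n_molecules)
    return in_phase_fraction


purity = simulate_cluster()
for cycle in (1, 50, 100, 150):
    print(f"cycle {cycle:3d}: {purity[cycle - 1]:.2f} of the cluster in phase")
```

The in-phase fraction starts near 1 and falls steadily, so later cycles report a blend of the current base and its neighbours, which is the noise described above.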

written 6.8 years ago by bede.portz • 490
8.7 years ago by
Washington University in St. Louis, MO
Chris Miller • 21k wrote:

Good question - It's important to understand at least the basics of the technologies you're going to be working with.

The others have all been talking about Illumina chemistry, and their answers are accurate, but it's also important to remember that there are other types of sequencing. Sequencers like PacBio's latest offering feed a single strand of DNA through a modified polymerase and measure the fluorescence given off as each base is incorporated. In this case, the limiting factor isn't synchronicity, but the fact that the laser used to illuminate the fluorophores "burns out" the enzyme - it gets heated up and eventually denatures. Despite this, they're capable of getting much longer reads (albeit with a pretty gnarly error rate).

There are other technologies on the horizon that may help extend read lengths as well, including those that utilize nanopores. In my opinion, the single molecule techniques are the most likely to result in the kind of long reads you're talking about. The main benefit is that we'll be able to map them into repetitive portions of the genome, which are currently difficult to assay. These techniques aren't likely to replace short-read technologies, but instead, they'll complement them.

written 8.7 years ago by Chris Miller • 21k

I recall something about PacBio using ribosomes with fluorescing tags ... are you saying the lasers are used to read these tags? I guess I thought that was simply a sensor picking up the flashing dots?

written 8.7 years ago by Delinquentme • 200

Wouldn't a sensor work better? I guess I don't see the logic behind using something that would heat a sample, if the heat destroys its function.

written 8.7 years ago by Delinquentme • 200

Fluorophores just sit there until they're excited by energy coming in. Without the laser, you have no way to visualize them. IIRC, Illumina's system also uses lasers to excite the fluorescent molecules and get a reading. Since the liquid volumes are so much higher and the reads are so much shorter, heat isn't as much of an issue.

written 8.7 years ago by Chris Miller • 21k