Question

What Is Next Generation Sequence Data Analysis And Third Generation Sequence Analysis And Difference Between Them?

3

13.9 years ago

Ssonia ▴ 30

Can anyone explain to me in detail what next-generation sequence analysis and third-generation sequence analysis are, and how they differ?

next-gen sequencing • 11k views

updated 11.8 years ago by Biostar 20 • written 13.9 years ago by Ssonia ▴ 30

score 13 · Answer 1 · 2011-08-24

I recommend you go and read these blog posts by Luke Jostins:

Basics: Sequencing DNA, Part 1 (1st generation)
Basics: Sequencing DNA, Part 2 (2nd/3rd generation)

Or this open-access review:

A window into third-generation sequencing (PDF)

Briefly, "generation" refers to the chemistry and technology used by the sequencing process. First generation generally refers to Sanger sequencing. "Next-generation", when you think about it, is a meaningless term, but is generally used to refer to any of the high-throughput methods which were developed after Sanger (e.g. 454, Illumina). And no-one quite knows what third-generation means, but some people use the term to refer to single-molecule methods.

I think the important thing is to understand the underlying process behind each sequencing procedure and not worry too much about silly jargon and buzz phrases.

score 5 · Answer 2 · 2011-08-24

There is no clear definition of 2nd and 3rd generation sequencing, but there are two common views:

Illumina, SOLiD, and Roche are 2nd generation (they detect light: fluorescent labels for Illumina and SOLiD, luciferase-generated light for Roche 454) while Ion Torrent and single-molecule technologies like Helicos are 3rd-generation ("post-light").
Illumina, SOLiD, Roche, and Ion Torrent are 2nd generation ("shotgun sequencing") while single-molecule technologies are 3rd generation.

The data analysis will depend more on technology than on what generation the technology is classified as. 2nd and 3rd generation are both high-throughput. Different technologies have different error rates and patterns. For example:

SOLiD uses colorspace
Roche and Ion Torrent tend to have homopolymer errors
PacBio has very long reads, but relatively low raw accuracy (I've heard 86%).
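To make the first two error patterns concrete, here is a minimal Python sketch. It assumes the standard two-base colorspace scheme (each color is the XOR of two adjacent bases coded A=0, C=1, G=2, T=3); the function names are my own and do not come from any vendor software. It shows why a single miscalled color corrupts every downstream base when you naively decode to bases, and how a homopolymer length error looks once runs are collapsed.

```python
from itertools import groupby

# Two-bit base codes; a color is the XOR of two adjacent base codes.
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = "ACGT"


def encode_colorspace(seq):
    """Return one color (0-3) per adjacent pair of bases."""
    codes = [BASE_TO_BITS[b] for b in seq]
    return [a ^ b for a, b in zip(codes, codes[1:])]


def decode_colorspace(first_base, colors):
    """Rebuild the base sequence from a known first base and the colors."""
    current = BASE_TO_BITS[first_base]
    bases = [first_base]
    for color in colors:
        current ^= color
        bases.append(BITS_TO_BASE[current])
    return "".join(bases)


def homopolymer_runs(seq):
    """Run-length encode a sequence, e.g. AAATTG -> [(A, 3), (T, 2), (G, 1)]."""
    return [(base, len(list(group))) for base, group in groupby(seq)]


def collapse_homopolymers(seq):
    """Keep one base per run, hiding run-length (homopolymer) errors."""
    return "".join(base for base, _ in groupby(seq))


colors = encode_colorspace("ATGGC")
print(colors)                                   # [3, 1, 0, 3]
print(decode_colorspace("A", colors))           # ATGGC
# One wrong color (second call 1 -> 2) changes every base after it.
print(decode_colorspace("A", [3, 2, 0, 3]))     # ATCCG
# A flow-based read that drops one A from a run still collapses identically.
print(collapse_homopolymers("AAAATG") == collapse_homopolymers("AAATG"))  # True
```

This is why colorspace reads are usually aligned in colorspace rather than decoded first, and why homopolymer run lengths from flow-based platforms deserve extra suspicion.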

score 0 · Answer 3 · 2011-08-24

Most of the sequence analysis work since 2006 has focused on dealing with very short reads. If you recall, Solexa started out with <30bp reads, not that much longer than MPSS. After adapter trimming, only about 20-25bp of usable sequence was left. The BLAT website would not even accept such short reads.

Although SBS read lengths have been growing ever since, the aligners and assemblers that were developed to deal with short reads still tend to be highly indexed and very stringent. Bowtie and MAQ aligned reads without gaps, so they did not accept indels. Period. Think about that.
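To see what a lack of indel support costs, here is a toy Python sketch. It is not how Bowtie or MAQ are implemented; the functions and the example sequences are made up for illustration. It scores the same read against a reference twice: once with mismatches only (ungapped), and once with an edit-distance dynamic program that allows insertions and deletions.

```python
def best_ungapped_mismatches(read, ref):
    """Fewest mismatches over every ungapped placement of read on ref."""
    return min(
        sum(a != b for a, b in zip(read, ref[i:i + len(read)]))
        for i in range(len(ref) - len(read) + 1)
    )


def best_gapped_edits(read, ref):
    """Fewest edits (mismatch, insertion, deletion) against any substring of ref."""
    # Row 0 is all zeros so the alignment may start anywhere in ref.
    prev = [0] * (len(ref) + 1)
    for i, r in enumerate(read, 1):
        cur = [i]  # read bases consumed before any ref base
        for j, c in enumerate(ref, 1):
            cur.append(min(
                prev[j - 1] + (r != c),  # match or mismatch
                prev[j] + 1,             # extra base in the read
                cur[j - 1] + 1,          # base skipped in the reference
            ))
        prev = cur
    # Taking the minimum of the last row lets the alignment end anywhere in ref.
    return min(prev)


ref = "ACGTTGCATGCCGATAGCTA"
# The read is ref[2:16] with the single base at position 8 deleted.
read = ref[2:8] + ref[9:16]

print(best_gapped_edits(read, ref))         # 1
print(best_ungapped_mismatches(read, ref))  # several mismatches, not 1
```

A one-base deletion costs a single edit when gaps are allowed, but shows up as a pile of mismatches without them, which is usually enough to push the read past a stringent mismatch cutoff.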

The verbose alignment formats that worked for BLAST now seem like flowery Victorian love letters.

With longer reads, the tide is shifting back toward being more sensitive in detecting and dealing with larger indels. Aligning and assembling single-molecule reads will be a more Bayesian and error-prone affair and will require that more biological knowledge be translated into the software. There is also weird stuff like phased reads from PacBio, which will require some retooling. And direct detection of methylation, if that pans out.