Question

Comprehensive Intro To Next Generation Sequencing

13.6 years ago

bow ▴ 790

Back when Sanger sequencing was the most popular method, analyzing the sequence data was a simple task that required relatively little background. Now, with the rapid and overwhelming development of next-generation sequencing, even learning about it seems like a daunting task. As someone still very unfamiliar with the field, I feel lost in the amount of information I need to digest: the assembly and alignment methods, the number of available applications, the pipelines, the different platforms, etc. I'm not sure where I should start!

So my question is: what is the most comprehensive resource for learning next-generation sequencing data analysis? Preferably covering SOLiD and/or Illumina, going from the raw data all the way to the final sequence.

I realize the answer might be very long-winded, but what I'm looking for is at least some pointers so I can figure out what I need to know next on my own. I very much want to be able to analyze and interpret the flood of data coming out of the field, but I'm pretty clueless right now. What I know I learned from various separate sources, and sometimes it's hard to tie them together. So this information would be a huge help for me (and, I'm sure, for other initiates alike :) ).

A little background: I'm familiar with UNIX (Linux) and know a bit of Python and Java (if that helps). I am currently doing a research project that involves building and assembling SOLiD and Illumina RNA-seq data against a reference genome sequenced using 454.

next-gen sequencing solid illumina

updated 13.6 years ago by Alastair Kerr 5.3k • written 13.6 years ago by bow ▴ 790

score 10 · Answer 1 · 2010-09-28

I find an RSS feed of the bioinformatics section of SeqAnswers pretty much invaluable, along with a subscription to the mailing lists for all the tools I use (MAQ, Bowtie, BWA, etc.) and an eye on bioc-sig-sequencing for R/Bioconductor-related material.

What I have found recently is that the field has no established best practices, that the tool ecosystem has not yet undergone enough selection to produce clear winners for most NGS tasks, and that documentation is scarce. I think this is to be expected in a rapidly moving field, so there is currently no substitute for 1) experience and 2) keeping an eye on what everyone else is doing.

Most of my recent 'oh really?' moments have come from chatting with other bioinformaticians doing similar analyses with similar aims (in my case, variant detection from exome capture).

Just my $0.02

EDIT: There is also a recent set of summary papers in Briefings in Bioinformatics.

score 6 · Answer 2 · 2010-09-28

13.6 years ago

Ian 6.0k

It might be useful for you to check out the Galaxy 'NGS TOOLBOX' at http://main.g2.bx.psu.edu/. It contains a selection of analysis tools for different aspects of NGS, e.g. variant analysis and ChIP-seq. There is also a section for quality control (QC) of reads, which is an essential step. This is only a small selection of what is available, but it will give you a feel for what can be done. You may also find some of the Galaxy "Quickie" movies useful.
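One basic read QC check looks at the per-base quality scores. As a minimal sketch, assuming base-space FASTQ with exactly four lines per record and Sanger (Phred+33) quality encoding, the Python below drops reads whose mean Phred score falls below an arbitrary, illustrative threshold of 20. Older Illumina pipelines used Phred+64 (pass `offset=64`), and SOLiD colour-space csfasta/qual files are not handled here. The function names are hypothetical and do not correspond to any Galaxy tool.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) from a four-line-per-record FASTQ stream.

    Assumption: no multi-line (wrapped) sequence or quality lines.
    """
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\r\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        yield header[1:], seq, qual


def mean_phred(qual: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (offset 33 assumes Sanger encoding)."""
    if not qual:
        return 0.0
    return sum(ord(ch) - offset for ch in qual) / len(qual)


def filter_reads(handle: TextIO, min_mean_q: float = 20.0, offset: int = 33):
    """Yield (name, sequence) for reads whose mean quality reaches min_mean_q.

    The default threshold of 20 is illustrative only.
    """
    for name, seq, qual in read_fastq(handle):
        if mean_phred(qual, offset) >= min_mean_q:
            yield name, seq


if __name__ == "__main__":
    # 'I' encodes Phred 40 and '#' encodes Phred 2 under Phred+33.
    example = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\n####\n")
    print(list(filter_reads(example)))  # [('r1', 'ACGT')]
```

Averaging over the whole read is the simplest possible check; dedicated QC tools also look at per-position quality profiles and trim low-quality read ends.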

You may also want to familiarise yourself with the SAM/BAM format (http://samtools.sourceforge.net/), which makes data analysis and storage easier.
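To give a feel for the format: a SAM file is tab-separated text, with optional header lines starting with `@` and one line per alignment holding 11 mandatory columns (QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ, QUAL). BAM is its compressed binary equivalent. The following is a minimal sketch, assuming uncompressed SAM text as input, that parses the first six columns and counts mapped alignments per reference using the FLAG bits 0x4 (segment unmapped) and 0x10 (reverse strand). The helper names `parse_sam` and `mapped_per_reference` are illustrative and are not part of samtools.

```python
from collections import Counter
from typing import Iterable, Iterator, NamedTuple

# Bit values from the FLAG field of the SAM specification.
FLAG_UNMAPPED = 0x4
FLAG_REVERSE = 0x10


class Alignment(NamedTuple):
    qname: str  # read name
    flag: int   # bitwise FLAG
    rname: str  # reference sequence name
    pos: int    # 1-based leftmost mapping position
    mapq: int   # mapping quality
    cigar: str  # CIGAR string


def parse_sam(lines: Iterable[str]) -> Iterator[Alignment]:
    """Yield the first six mandatory columns of each SAM alignment line."""
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines (@HD, @SQ, @RG, ...)
        cols = line.rstrip("\r\n").split("\t")
        if len(cols) < 11:
            raise ValueError(f"expected 11 mandatory SAM columns, got {len(cols)}")
        yield Alignment(cols[0], int(cols[1]), cols[2], int(cols[3]),
                        int(cols[4]), cols[5])


def mapped_per_reference(alignments: Iterable[Alignment]) -> Counter:
    """Count alignment lines without the unmapped bit, per reference name.

    Note: secondary alignments (FLAG 0x100), if present, are counted too.
    """
    counts: Counter = Counter()
    for aln in alignments:
        if not aln.flag & FLAG_UNMAPPED:
            counts[aln.rname] += 1
    return counts


if __name__ == "__main__":
    sam = [
        "@SQ\tSN:chr1\tLN:1000",
        "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
        "r2\t16\tchr1\t200\t60\t4M\t*\t0\t0\tACGT\tIIII",
        "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    ]
    alns = list(parse_sam(sam))
    print(mapped_per_reference(alns))  # Counter({'chr1': 2})
    print([a.qname for a in alns if a.flag & FLAG_REVERSE])  # ['r2']
```

A BAM file can be converted to this text form with `samtools view`, and its output lines can be fed to `parse_sam` (for example via `sys.stdin`).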

In terms of viewing data, I like to upload my data to the UCSC Genome Browser (http://genome.ucsc.edu/), but standalone browsers such as IGV (http://www.broadinstitute.org/igv/) or Savant (http://compbio.cs.toronto.edu/savant/index.html) have their own strengths.

Have fun!

I have got more use out of IGV and UCSC than I have out of Galaxy. Does anyone enjoy using Galaxy? I see its point, but dislike using it when there is a Unix shell available.

13.6 years ago by User 59 13k

I really do like Galaxy for downstream (post-mapping) analyses that involve comparing genome coordinates. I can easily see all the comparisons I have done, and I think it is a very easy way to access coordinate-based data sets, e.g. UCSC conservation information, alignments, etc.

13.6 years ago by Ian 6.0k

I haven't really used it for that, I must admit. I'll have to take a look at that functionality.

13.6 years ago by User 59 13k

score 4 · Answer 3 · 2010-09-28

13.6 years ago

Alastair Kerr 5.3k

Also have a look at the Bioinformatics NGS virtual issue. It keeps track of a range of latest tools.

NAR? Bioinformatics surely..

13.6 years ago by User 59 13k

oops; well spotted..

13.6 years ago by Alastair Kerr 5.3k

This is completely off-topic but Alastair, I believe you and I were lab partners back in Edinburgh, 1989-93. Hello! Good to see you also found your way to bioinformatics.

13.6 years ago by Neilfws 49k

Hi Neil! I've been in the field since my PhD back in 1993: I never did like bench science :>

Can you send personal messages via this web site?

13.5 years ago by Alastair Kerr 5.3k