Question: Comprehensive Intro To Next Generation Sequencing
bow wrote:

Back when Sanger sequencing was the most popular method, analyzing the sequence data was a simple task that required relatively little background. Now, with the rapid and overwhelming development of next-generation sequencing, even learning about it seems like a daunting task. As someone still very unfamiliar with the field, I feel lost in the amount of information I need to digest: the assembly and alignment methods, the number of available applications, the pipelines, the different platforms, etc. I'm confused about where I should start!

So my question is: what is the most comprehensive resource for learning next-generation sequencing data analysis? Preferably using SOLiD and/or Illumina data (going from the raw data all the way to the final sequence).

I realize the answer might be very long-winded, but what I'm looking for is at least some pointers so I can figure out what I need to learn next on my own. I very much want to be able to analyze and interpret the flood of data coming out of the field, but I'm pretty clueless right now. What I know I learned from various separate sources, and it's sometimes hard to tie them together. So this information would be a huge help for me (and, I'm sure, for other initiates as well :) ).

A little background: I'm familiar with UNIX (Linux) and know a bit of Python and Java (if that helps). I'm currently doing a research project that involves building and assembling SOLiD and Illumina RNA-seq data against a reference genome that was sequenced using 454.

written 9.0 years ago by bow
Daniel Swan (Aberdeen, UK) wrote:

I find an RSS feed of the bioinformatics section of SeqAnswers pretty much invaluable, as is a subscription to the mailing lists for all the tools I use (MAQ, Bowtie, BWA, etc.), as well as keeping an eye on bioc-sig-sequencing for R/Bioconductor-related topics.

What I have found recently is that the field has no established best practices yet, the tools ecosystem has not undergone enough selection to produce clear winners for most NGS tasks, and documentation is scarce. I think this is to be expected in a rapidly moving field, so there is currently no substitute for 1) experience and 2) keeping an eye on what everyone else is doing.

Most of my recent 'oh really?' moments have come from chatting with other bioinformaticians doing similar analyses with similar aims (in my case, variant detection from exome capture).

Just my $0.02

EDIT: There's also a recent set of summary papers in Briefings in Bioinformatics

modified 9.0 years ago • written 9.0 years ago by Daniel Swan

SeqAnswers seems golden, thanks! Briefings in Bioinformatics looks great, too!

written 9.0 years ago by bow
Ian (University of Manchester, UK) wrote:

It might be useful for you to check out the Galaxy 'NGS TOOLBOX'. It contains a selection of analysis tools for different aspects of NGS, e.g. variant analysis and ChIP-seq. There is also a section for QC (quality control) of reads, which is an essential step. This is only a small selection of what is available, but it will give you a feel for what can be done. You may also find some of the Galaxy "Quickie" movies useful.
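To make the read-QC step concrete, here is a minimal Python sketch (not Galaxy's implementation) that filters reads by mean base quality. It assumes plain 4-line FASTQ records and Phred+33 quality encoding; older Illumina pipelines used Phred+64, so the offset is a parameter. The function names and the Q20 cutoff are illustrative choices, not standards.

```python
import io
from typing import Iterator, List, TextIO, Tuple

def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        yield header[1:], seq, qual

def mean_quality(qual: str, offset: int = 33) -> float:
    """Average Phred score of one read; each ASCII character encodes one base's score."""
    if not qual:
        return 0.0
    return sum(ord(c) - offset for c in qual) / len(qual)

def passing_reads(handle: TextIO, min_mean_q: float = 20.0, offset: int = 33) -> List[str]:
    """Return IDs of reads whose mean quality is at least min_mean_q (hypothetical cutoff)."""
    return [rid for rid, _seq, qual in read_fastq(handle)
            if mean_quality(qual, offset) >= min_mean_q]

example = io.StringIO(
    "@read1\nACGT\n+\nIIII\n"  # 'I' is ASCII 73, so Q = 73 - 33 = 40
    "@read2\nACGT\n+\n####\n"  # '#' is ASCII 35, so Q = 35 - 33 = 2
)
print(passing_reads(example))  # ['read1']
```

Real QC tools also look at per-position quality, adapter content and duplication, so treat this only as an illustration of how quality strings are decoded.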

You may also want to familiarise yourself with the SAM/BAM alignment format, which makes data analysis and storage easier.
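As a rough illustration of what a SAM alignment line holds, here is a minimal Python sketch that splits one line into the 11 mandatory tab-separated fields (QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ, QUAL) plus optional tags, decodes two FLAG bits (0x4 unmapped, 0x10 reverse strand) and computes the reference span from the CIGAR string. The class and function names are hypothetical; header lines (starting with `@`) are not handled, and for real work a library such as pysam or the samtools suite is the usual choice.

```python
import re
from dataclasses import dataclass
from typing import List

SAM_FIELDS = ["qname", "flag", "rname", "pos", "mapq", "cigar",
              "rnext", "pnext", "tlen", "seq", "qual"]

@dataclass
class SamRecord:
    qname: str
    flag: int
    rname: str
    pos: int       # 1-based leftmost mapping position
    mapq: int
    cigar: str
    seq: str
    tags: List[str]

    @property
    def is_unmapped(self) -> bool:
        return bool(self.flag & 0x4)

    @property
    def is_reverse(self) -> bool:
        return bool(self.flag & 0x10)

    def reference_length(self) -> int:
        """Reference bases covered: sum of CIGAR ops that consume the reference (M, D, N, =, X)."""
        if self.cigar == "*":
            return 0
        return sum(int(n) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", self.cigar)
                   if op in "MDN=X")

def parse_sam_line(line: str) -> SamRecord:
    """Parse one SAM alignment (non-header) line."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 11:
        raise ValueError("SAM alignment lines need at least 11 tab-separated fields")
    f = dict(zip(SAM_FIELDS, cols[:11]))
    return SamRecord(f["qname"], int(f["flag"]), f["rname"], int(f["pos"]),
                     int(f["mapq"]), f["cigar"], f["seq"], cols[11:])

rec = parse_sam_line("r1\t16\tchr1\t100\t60\t5M2D3M\t*\t0\t0\tACGTACGT\tIIIIIIII\tNM:i:2")
print(rec.rname, rec.pos, rec.pos + rec.reference_length() - 1, rec.is_reverse)  # chr1 100 109 True
```

BAM is the compressed binary form of the same records, which is why tools can index it and fetch alignments by region quickly.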

For viewing data, I like to upload mine to the UCSC Genome Browser, but standalone browsers such as IGV or Savant have their own strengths.

Have fun!

modified 9.0 years ago • written 9.0 years ago by Ian

I have got more use out of IGV and UCSC than out of Galaxy. Does anyone enjoy using Galaxy? I see its point, but I dislike using it when a Unix shell is available.

written 9.0 years ago by Daniel Swan

I really do like Galaxy for downstream (post-mapping) analyses that involve comparing genome coordinates. I can easily see all the comparisons I have done. I think it is a very easy way to access coordinate-based data sets, e.g. UCSC conservation info, alignments, etc.
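For readers wondering what "comparing genome coordinates" amounts to, here is a minimal Python sketch of an interval intersection, e.g. asking which peaks overlap which genes. It is not Galaxy's implementation; the tuple layout, function names and example intervals are hypothetical, and coordinates are assumed to be 0-based, half-open (as in BED files), so intervals that merely touch do not overlap.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int, str]  # (chrom, start, end, name), 0-based half-open

def overlaps(a: Interval, b: Interval) -> bool:
    """True if two intervals share at least one base on the same chromosome."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def intersect(queries: List[Interval], targets: List[Interval]) -> List[Tuple[str, str]]:
    """Return (query_name, target_name) for every overlapping pair."""
    by_chrom: Dict[str, List[Interval]] = defaultdict(list)
    for t in targets:
        by_chrom[t[0]].append(t)
    for chrom in by_chrom:
        by_chrom[chrom].sort(key=lambda t: t[1])  # sort targets by start
    hits = []
    for q in queries:
        for t in by_chrom.get(q[0], []):
            if t[1] >= q[2]:
                break  # sorted by start, so no later target can overlap this query
            if overlaps(q, t):
                hits.append((q[3], t[3]))
    return hits

peaks = [("chr1", 100, 200, "peak1"), ("chr1", 500, 600, "peak2"), ("chr2", 10, 50, "peak3")]
genes = [("chr1", 150, 400, "geneA"), ("chr1", 590, 900, "geneB"), ("chr2", 50, 80, "geneC")]
print(intersect(peaks, genes))  # [('peak1', 'geneA'), ('peak2', 'geneB')]
```

Sorting targets by start lets the inner loop stop early; dedicated tools use indexed or sweep-line structures to scale this to genome-wide data sets.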

written 9.0 years ago by Ian

I must admit I haven't really used it for that; I'll have to take a look at that functionality.

written 9.0 years ago by Daniel Swan
Alastair Kerr (The University of Edinburgh, UK) wrote:

Also have a look at the Bioinformatics NGS virtual issue. It keeps track of a range of the latest tools.

modified 9.0 years ago • written 9.0 years ago by Alastair Kerr

NAR? Bioinformatics, surely.

written 9.0 years ago by Daniel Swan

Oops; well spotted.

written 9.0 years ago by Alastair Kerr

This is completely off-topic but Alastair, I believe you and I were lab partners back in Edinburgh, 1989-93. Hello! Good to see you also found your way to bioinformatics.

written 9.0 years ago by Neilfws

Hi Neil! I've been in the field since my PhD back in 1993: I never did like bench science :>

Can you send personal messages via this web site?

written 9.0 years ago by Alastair Kerr
Powered by Biostar version 2.3.0