Question: I am preparing a course on NGS: any suggestions?
42
Pierre Lindenbaum (101k), France/Nantes/Institut du Thorax - INSERM UMR1087, wrote 7.1 years ago:

Hi all, I am preparing a course on NGS: there will be seven students for 4 hours, and I want them to play with some NGS data. No programming skills are required.

Here is what I plan to do:

  • short intro to NGS
  • structure of a (small) FASTQ file
  • map it with BWA on public Galaxy http://main.g2.bx.psu.edu/
  • index the genome and map the fastqs with MAQ
  • index the genome and map the fastqs with BWA
  • structure of a SAM file
  • GATK recalibration (?)
  • call the SNPs with samtools pileup and generate a VCF
  • explore a BAM file with samtools tview
  • find the rs## at UCSC (table browser or mysql )
  • predict the consequences of a set of SNPs with polyphen2 (btw is there a way to generate a random fastq file with a set of 'forced' mutations ?)
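
On the 'forced' mutations question in the last bullet, here is a minimal Python sketch rather than an existing tool. It plants chosen substitutions in a reference string and then samples error-free reads as 4-line FASTQ records, which also shows the record structure. The function names, the read names and the constant quality string are assumptions made for illustration; a real read simulator would also model sequencing errors and paired ends.

```python
import random

def apply_snps(reference, snps):
    """Return a copy of `reference` with forced substitutions.

    `snps` maps 0-based positions to the replacement base.
    """
    seq = list(reference)
    for pos, alt in snps.items():
        if seq[pos] == alt:
            raise ValueError(f"position {pos} already carries {alt}")
        seq[pos] = alt
    return "".join(seq)

def simulate_fastq(genome, n_reads, read_len, seed=0):
    """Yield error-free single-end reads as 4-line FASTQ records."""
    rng = random.Random(seed)
    for i in range(n_reads):
        start = rng.randint(0, len(genome) - read_len)
        bases = genome[start:start + read_len]
        # 'I' encodes Phred quality 40 in the Phred+33 (Sanger) encoding
        quals = "I" * read_len
        # line 1: @name, line 2: bases, line 3: '+', line 4: qualities
        yield f"@read{i}_pos{start}\n{bases}\n+\n{quals}\n"

if __name__ == "__main__":
    rng = random.Random(42)
    # hypothetical 200 bp random "genome"
    reference = "".join(rng.choice("ACGT") for _ in range(200))
    # force a mutation at position 100: any base that differs from the reference
    alt = next(b for b in "ACGT" if b != reference[100])
    mutated = apply_snps(reference, {100: alt})
    for record in simulate_fastq(mutated, n_reads=2, read_len=36):
        print(record, end="")
```

Because the planted positions are known, the students can later check whether the SNP caller recovers them.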

my other ideas:

  • running something in the cloud: do you know if there is a way to run something for free on Amazon ? what kind of analysis could I run ?
  • storing something (the VCF ?) in a database (mysql ? sqlite3 ?) and using rails to display the data
  • generating the tool for using a webservice (SOAP/REST...): what service could I use for this course ?

Any other suggestion ? What would you like to see during this course ?

I'll validate the answer with highest number of votes next week.

Thanks,

Pierre

EDIT: the course should give them the opportunity to see what the work of someone working with NGS looks like and to get some experience with real data. I don't know their skill level but, AFAIK, they are supposed to have some programming courses later.

My only experience is the analysis of "exome capture" data = SNP.

Update: I posted my slides on slideshare: http://www.slideshare.net/lindenb/20101210-ngscourse

next-gen galaxy sequencing • 12k views
modified 4.3 years ago by sarahhunter (590) • written 7.1 years ago by Pierre Lindenbaum (101k)
5

would you accept attendees? ;)

written 7.1 years ago by Jorge Amigo (10.0k)
2

You could consider applying to AWS for an education credit http://aws.amazon.com/education ... no guarantees, but it might help cover AWS costs for the workshop

written 7.0 years ago by Mndoci (1.2k)

same question as Jorge

written 7.1 years ago by Fred Fleche (4.2k)

@jorge @Fred, unfortunately I'm far from being an expert with the NGS data :-)

written 7.1 years ago by Pierre Lindenbaum (101k)

@Pierre: If you can't accept attendees, you may stream this class online :).

written 7.1 years ago by Khader Shameer (17k)

@Khader , do you speak French ? ;-)

written 7.1 years ago by Pierre Lindenbaum (101k)

Course is in French ! So if I attend the course, I can improve my French as well. How about the course materials, slides etc ?

written 7.1 years ago by Khader Shameer (17k)

@Pierre and @Khader: I am starting to play with the YouTube API, so I would be happy to display the course on Bioinformatics.fr ;-)

written 7.0 years ago by Fred Fleche (4.2k)

That's cool Fred. So Pierre, you will have enough audience via Bioinformatics.fr. Now the question is will you stream (preferably, an English) version of your course or not ?

written 7.0 years ago by Khader Shameer (17k)
30
Istvan Albert ♦♦ (74k), University Park, USA, wrote 7.1 years ago:

Titus Brown at Michigan State University has run a course on Analyzing Next Generation Sequencing Data and as the link shows he has built an amazing resource around it.

His tutorials might give you a good sense on what topics to include and what level of detail may be appropriate.

8

and of course drop me a note if you discover bugs, problems, etc. And re-use the material as much as you want -- CC-BY-SA.

written 6.4 years ago by Titus Brown (80)

Istvan, you're my hero

written 7.1 years ago by Pierre Lindenbaum (101k)

Well, the credit should go to Titus - I just happened to know about it

written 7.1 years ago by Istvan Albert ♦♦ (74k)
20
lh3 (30k), United States, wrote 7.1 years ago:
  1. Drop MAQ, as it is no longer as widely used as before.
  2. Choose between Galaxy and the command line, depending on which suits them best. It should not be hard to find a consensus among 7 students. The same goes for cloud computing.
  3. I do not work with raw data now. I would guess Illumina base quality should be better than before. In that case, I am not sure if recalibration is absolutely necessary.
  4. Introduce IGV instead of samtools' tview. Although tview is useful in a few scenarios, IGV is in general more powerful and user friendly. IGV works with VCF. No need to set up database or services.
  5. I know SQL well, but except for setting up a serious web server, I never use it. SQL is overkill here. Graphical viewers and UCSC custom tracks are much more convenient.
  6. Mention Picard and GATK, which are both great packages.
  7. I do not know how seriously duplicates affect results, but you should mention that this is a potential concern.
  8. As others suggested, it would be good to introduce ChIP/RNA-seq and the discovery of structural variations even if these are not the main purposes.

It is important to let students play with real or simulated data.

EDIT: a further comment:

I used to give a two-hour course on variant discovery. I gave each attendee a tar-ball which includes a bacterial genome (S. suis), a variant/read simulator (wgsim), a mapper (bwa), a SNP caller (samtools) and a few scripts. It is only a couple of MB in size, suitable for email. With these, one can do simulation, mapping, SNP calling, visualization and evaluation, nearly the entire pipeline. I have lost the tar-ball, though.
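
For the evaluation step with simulated data, here is a small Python sketch (not one of the scripts from the lost tar-ball). It assumes variants are represented as hypothetical (chromosome, position, alternate base) tuples and compares the simulated truth set with the called set.

```python
def evaluate_calls(truth, calls):
    """Compare called variants against the simulated truth set.

    Both arguments are iterables of (chrom, pos, alt) tuples.
    """
    truth, calls = set(truth), set(calls)
    tp = len(truth & calls)   # simulated and called
    fp = len(calls - truth)   # called but never simulated
    fn = len(truth - calls)   # simulated but missed
    precision = tp / (tp + fp) if calls else 0.0
    recall = tp / (tp + fn) if truth else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall}

if __name__ == "__main__":
    # made-up example values, for illustration only
    truth = [("chr1", 100, "A"), ("chr1", 250, "T"), ("chr1", 900, "G")]
    calls = [("chr1", 100, "A"), ("chr1", 250, "T"), ("chr1", 400, "C")]
    print(evaluate_calls(truth, calls))
```

With simulated reads the truth set is known exactly, which is what makes this kind of check possible in a short course.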

modified 7.0 years ago • written 7.1 years ago by lh3 (30k)

UCSC custom track is another nice idea. Thanks

written 7.1 years ago by Pierre Lindenbaum (101k)

There are also a number of existing tracks--especially in the ENCODE data--that show NGS data types. A survey of a few of those might help people to grasp some of the aspects and challenges. But I think we've already gone way over 4 hours....

written 7.1 years ago by Mary (11k)
9
Michael Dondrup (43k), Bergen, Norway, wrote 7.1 years ago:

I think this could be a bit too much for 4 hours, but it depends on the skill of the students. If I were running a hands-on course, which this seems to be, it might be OK to use only one alignment tool and then focus more on the possible downstream analyses.

Some points to consider:

  • What is the aim of the course, can you give a more descriptive title than playing with NGS, what should they take home?
  • What is the skill level of your students?

  • Which applications do you want to present, e.g. re-sequencing, RNA-seq, ChIP-seq, SNP calling, copy-number variation? I guess you could at least mention all these approaches and provide a core toolset that is useful in all or most of applications. Or do you want to focus on SNPs?

  • I wouldn't do recalibration, it's maybe too specific.
  • Samtools is definitely a good toolkit
  • alternatively, I like the Bioconductor short-read tools
  • presenting the various file-formats (FASTQ, SAM, BAM) is definitely interesting
  • do you want to mention filtering of reads and alignments based on qualities?

It's all your decision, and I think it depends mostly on the first 3 points.

Regarding the EC2 cloud: we are using it via a grant (so someone is paying for us). I also remember a cloud application for RNA-seq (called Myrna). I don't know if you can use EC2 for free, but then, it's just like a remote-login computer, so I don't see the benefit for your course. And your students couldn't use it at home.

Then:

storing something (the VCF ?) in a database (mysql ? sqlite3 ?) and using rails to display the data generating the tool for using a webservice (SOAP/REST...): what service could I use for this course ?

sounds more programming-heavy, and you said that wasn't a requirement. So I think that, for 4 hours, an introduction to existing tools and toolkits would best serve most students who are new to NGS.

Edit: Oh, yeah, before I forget: prepare your computer lab in good time and have the software installed on the computers. If you let the students bring their own laptops, you will spend two hours installing stuff on Windows, figuring out where this damn terminal application was in Windows, or working out why they cannot compile the alignment tool using gcc....

modified 7.1 years ago • written 7.1 years ago by Michael Dondrup (43k)

Michael, I've never played with Bioconductor, what would you suggest to do with it ?

written 7.1 years ago by Pierre Lindenbaum (101k)

As far as I remember it is possible to quickly setup a webserver by just creating a SQL table and invoking RAILS. So I thought it would be awesome to visualize a simple SQL table (e.g: chrom,position,SNP, quality)
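
To make the "simple SQL table" idea concrete without Rails, here is a hedged Python sketch using the standard-library sqlite3 module. The table name `snp`, the extra ref/alt columns and the example rows are assumptions for illustration and are not taken from any real VCF.

```python
import sqlite3

def load_snps(conn, rows):
    """Create the (hypothetical) snp table and insert the given rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS snp ("
        " chrom TEXT NOT NULL, position INTEGER NOT NULL,"
        " ref TEXT NOT NULL, alt TEXT NOT NULL, quality REAL)"
    )
    conn.executemany("INSERT INTO snp VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()

def snps_above(conn, min_quality):
    """Return SNPs with quality >= min_quality, sorted by coordinate."""
    cur = conn.execute(
        "SELECT chrom, position, ref, alt, quality FROM snp"
        " WHERE quality >= ? ORDER BY chrom, position",
        (min_quality,),
    )
    return cur.fetchall()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # nothing is written to disk
    # made-up rows standing in for parsed VCF records
    load_snps(conn, [
        ("chr1", 1500, "A", "G", 50.0),
        ("chr1", 300, "C", "T", 12.5),
        ("chr2", 42, "G", "A", 99.0),
    ])
    for row in snps_above(conn, 30):
        print(row)
```

An in-memory database keeps the demo self-contained; pointing `sqlite3.connect` at a file name would persist the table for a web front end.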

written 7.1 years ago by Pierre Lindenbaum (101k)

It might be difficult to teach BioConductor if one never used it, on the other hand, for you, maybe it's easier. I would show how to read in aligned reads, filter them, compute the coverage and detect peaks, do gene binning, and differential expression analysis. Again a bit too much maybe, but you can choose, most of it can also be done with other tools. A visualization option is nice, and if this rails SQL stuff is really so simple, why not? On the other hand something like IGV might be easier to reproduce by the students.

written 7.1 years ago by Michael Dondrup (43k)

About Bioconductor, I was thinking about a simple example, an illustration of a simple script (for example, I found this: http://www.bioconductor.org/help/workflows/high-throughput-sequencing/ )

written 7.1 years ago by Pierre Lindenbaum (101k)

Something like it, but that example workflow is so boring. Maybe look at http://www.bioconductor.org/packages/release/bioc/vignettes/ShortRead/inst/doc/Overview.pdf, the documentation of the ShortRead package. On the other hand, this might be better in a separate course.

written 7.0 years ago by Michael Dondrup (43k)
8
brentp (22k), Salt Lake City, UT, wrote 7.1 years ago:

I agree with @Michael, that's a lot for 4 hours! I think that unless they are comfortable at the command line, it's a good idea to stick with galaxy. Use BFast or bowtie instead of bwa as those seem to be more current. For 4 hours, I think a reasonable workflow would be:

  • get a fastq from SRA (or provided by you)
  • look at the quality distribution using fastqc or fastx-toolkit (or whatever galaxy supports)
  • do some read trimming/filtering and look at quality again
  • map the reads to a reference
  • use samtools to sort, index
  • look at the alignment graphically: either igv, samtools tview, or galaxy

then if there's time, you could do something with Picard like MarkDuplicates
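
To show what the quality and trimming steps do under the hood, here is a minimal Python sketch. It is not how fastqc or fastx-toolkit are implemented; it assumes Phred+33 (Sanger) quality encoding, whereas some older Illumina files use Phred+64, and the threshold of 20 is an arbitrary example value.

```python
def phred33_scores(quality_string):
    """Decode a FASTQ quality line, assuming Phred+33 encoding."""
    return [ord(ch) - 33 for ch in quality_string]

def mean_quality(quals):
    """Average Phred score of one read (0.0 for an empty read)."""
    scores = phred33_scores(quals)
    return sum(scores) / len(scores) if scores else 0.0

def trim_3prime(bases, quals, min_q=20):
    """Remove trailing bases whose quality is below `min_q`."""
    scores = phred33_scores(quals)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return bases[:end], quals[:end]

if __name__ == "__main__":
    # made-up read: six good bases followed by two poor ones
    bases, quals = "ACGTACGT", "IIIIII#!"
    print("mean quality before:", mean_quality(quals))
    bases, quals = trim_3prime(bases, quals)
    print("trimmed read:", bases)
    print("mean quality after:", mean_quality(quals))
```

Looking at the mean quality before and after trimming mirrors the "look at quality again" step in the workflow.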

5

The BFAST author recommends using BWA for Illumina mapping. Bowtie is fine for ChIP/RNA-seq, but it is not a good choice for SNP calling as it does not do gapped alignment. BWA has possibly mapped most of the Illumina data in the world, because 1) the 1000 Genomes project is using BWA; 2) Sanger is using BWA exclusively for mammalian resequencing projects; 3) Broad, another major sequencing center, is using bwa exclusively; 4) so far as I know, WashU and UWash are also using bwa for resequencing. These major sequencing centers have all carefully evaluated popular mappers. There are reasons.

written 7.1 years ago by lh3 (30k)
1

I would guess bwa arrived right at the time when people needed a mapper balanced between openness, speed, memory, features, support and, in particular, accuracy. Now bwa is not the only one that achieves the balance, but it is arguably the first.

written 7.0 years ago by lh3 (30k)

Most of the large resequencing projects at Sanger are also based exclusively on bwa alignment.

written 7.1 years ago by lh3 (30k)

good to know. i will look into bwa more carefully. do you know the reasons you mention in your last sentence?

written 7.1 years ago by brentp (22k)
7
Larry_Parnell (16k), Boston, MA USA, wrote 7.0 years ago:

In my view, a NextGen Sequencing course should have three main data types:

  1. Discovery of variants in human genomes, possibly linking or associating with a disease trait (phenotype). An example of this is the sequencing of "cancer" genomes from a patient's tumor and his/her normal genome (skin, e.g.).

  2. RNAseq - as a means to assess gene expression. Read depth correlates with gene expression levels and detection of variants. This is especially cool when comparing to the genome and one sees imbalance in heterozygotes: The A allele has 78% of the reads while the G allele has 22% of the RNAseq reads. Hmmm.

  3. Environmental genomics samples. What kind of biological diversity do you see in the human gut, in a hot spring, in a brown field site (contaminated site), in the ocean, etc. etc.

These are the three areas where I see a lot of work being done in research labs and in companies. Make the course as relevant to real-world situations as you can. Your students will thank you for it!
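
The allelic imbalance in point 2 can be turned into a small exercise. The sketch below is an exact two-sided binomial test against a 50/50 expectation, written in plain Python. The read counts (78 and 22 out of 100) are an assumption, since only percentages are given above, and the function name is made up.

```python
from math import comb

def binomial_two_sided_p(k, n):
    """Exact two-sided p-value for k of n reads carrying one allele,
    under the null hypothesis that both alleles are equally expressed
    (p = 0.5). The symmetry of that null lets us double one tail."""
    tail = max(k, n - k)
    upper = sum(comb(n, i) for i in range(tail, n + 1))
    return min(1.0, 2 * upper / 2 ** n)

if __name__ == "__main__":
    # hypothetical counts: 78 reads with the A allele, 22 with G
    p = binomial_two_sided_p(78, 100)
    print(f"p-value for a 78/22 split: {p:.3g}")
    print("p-value for a 50/50 split:", binomial_two_sided_p(50, 100))
```

The same percentages from only a handful of reads would give a much weaker result, which is a useful point to make about read depth.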

6
Mitch Skinner (650), Emeryville, CA, wrote 7.1 years ago:

Amazon does have a free "usage tier", where someone who is new to EC2 can run a "micro" instance free for a year, with 10 gigabytes of storage and a certain amount of IO and external internet transfer. CPU and RAM on micro instances are pretty sharply limited, though; RAM is ~600MB, and CPU is only allowed to be fully used in short bursts (3 minutes if I recall correctly, after which your instance gets throttled).

So, if your data size is small enough, and your students don't already have AWS accounts, then you might be able to get away with it.

If you need more CPU or RAM, though, then it's not too expensive to use EC2 if you only run instances for short periods of time ($0.095 to $2.28 per hour in the EU region, depending on how much CPU and RAM you need). If all of the tools can be used from the command line, or over the web, or with X forwarding, then I think it's a good choice; Titus Brown's course materials have a lot of detail about how it could be done.

If you use an instance with an EBS root volume, then you can set up the disk image ahead of time with all the tools installed. The students could each start an instance based on that. And then, at the end of the course, they could shut down their instance (and not have to pay to keep it running), while still being able to spin it up again later to use or to refer back to. While the instance isn't running, the only cost would be for the EBS storage, which is $0.10 per gigabyte per month.

So I'd expect that you could use EC2; it might cost something, but I think if it's done right then the cost would be quite low.
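
As a back-of-the-envelope check, the arithmetic can be sketched in Python using the prices quoted above. The 10 GB image size and one month of storage are assumptions for illustration, and current AWS prices will differ.

```python
def course_cost(students, hours, hourly_rate,
                ebs_gb=0, ebs_months=0, ebs_rate=0.10):
    """Estimate the total cost in dollars: compute time plus EBS storage.

    One instance per student; `ebs_rate` is dollars per gigabyte per month.
    """
    compute = students * hours * hourly_rate
    storage = students * ebs_gb * ebs_months * ebs_rate
    return round(compute + storage, 2)

if __name__ == "__main__":
    # 7 students, 4 hours, cheapest and most expensive quoted hourly rates
    print("low end: ", course_cost(7, 4, 0.095))
    print("high end:", course_cost(7, 4, 2.28))
    # keeping a (hypothetical) 10 GB EBS volume per student for one month
    print("storage: ", course_cost(7, 0, 0.0, ebs_gb=10, ebs_months=1))
```

For seven students and a four-hour session the compute cost stays between a few dollars and a few tens of dollars, depending on the instance size.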

modified 7.1 years ago • written 7.1 years ago by Mitch Skinner (650)

nice point with setting up a unified image containing all tools. It adds a bit more complexity though, you will have to install the cloud tool on the local computers, fire up the instances and so on. It might divert from the main focus a bit, such that in the end it's a bit more like a cloud course.

written 7.0 years ago by Michael Dondrup (43k)

The challenge with the micros is that they are not good for computing, but rather for serving web requests. I'd recommend applying for education credits via the link I posted above. Even without them, always a good idea for coursework.

written 7.0 years ago by Mndoci (1.2k)

@Michael: I usually just use the web console to manage my instances; I installed the EC2 command-line tools for launch/shutdown/etc., but I don't use them very often. If I had to manage instances programmatically, then it would be different, but for a course like this I think the web tool is fine. The main thing you need on the local machines is SSH.

@mndoci: I agree that micro instances are probably not appropriate; I didn't know about the education credits, thanks for the link.

written 7.0 years ago by Mitch Skinner (650)
6
Obi Griffith (16k), Washington University, St Louis, USA, wrote 5.6 years ago:

You might also explore the past lectures from the Canadian Bioinformatics Workshop series (scroll down to Past Courses at bottom). They have complete slide sets (pdf|ppt: CC-BY-SA) and even videos of the lectures. Topics in 2011 included "Informatics on High Throughput Sequencing Data" as well as "Exploratory Data Analysis and Essential Statistics using R", "Pathway and Network Analysis on – omics Data", "Informatics and Statistics for Metabolomics", "Microarray Data Analysis", and "Bioinformatics for Cancer Genomics".

4
Khader Shameer (17k), Manhattan, NY, wrote 7.1 years ago:

I have recently seen this post by Jeff: "Servers for Nothing, Bits for Free". I think you could get free access to the AWS cloud using a non-paid Linux AMI.

Disclaimer: I haven't tried this yet.

4
Thaman (3.2k), Finland, wrote 7.0 years ago:

If it's just a 4-hour class, then I think the OpenHelix tutorial on Galaxy is enough; it covers uploading, preparing, filtering and analyzing data on different servers.

http://www.openhelix.com//cgi/tutorialInfo.cgi?id=82

4
Jeremy Leipzig (17k), Philadelphia, PA, wrote 7.0 years ago:

The EC2 thing has definite pros and cons. We had a fellow at work take the MSU course and he said everything ran very smoothly at the course, but when he got home he couldn't replicate the setup because he didn't know how to do things like alter his PATH, or interpret errors related to missing dependencies.

If you go the opposite route and force students to install everything and encounter real problems, they will resent you during the course but will benefit in the long run.

Last month I taught a session in NGS for CSHL's Programming for Biology. These slides might be helpful to you: http://gorgonzola.cshl.edu/pfb/2010/LectureNotes/ngs2/ngs2.pdf


Just a quick comment in defense of the MSU course here -- that's what we did have them do :). But there is a limit to what you can effectively teach. This year we gave them pre-packaged AMIs. We'll see how that shakes out over time.

written 6.4 years ago by Titus Brown (80)
4
Bioinfosm (610), earth, wrote 5.5 years ago:

Too late for Pierre's question, but I found this presentation by John McPherson really extensive and informative. The first half is almost all intro to NGS for first timers:

http://bioinformatics.ca/workshops/2011/course-content

Module 1: Introduction to cancer genomics (Faculty: John McPherson)

PDF | PPT | MP4 (VIDEO)

2

never too late ;-) - thanks for pointing out this great resource

written 5.5 years ago by Istvan Albert ♦♦ (74k)
3
Daniel Standage (3.7k), Davis, California, USA, wrote 7.1 years ago:

I would agree with the suggestion to use only one alignment tool (maybe discuss another shortly and give usage examples in some take-home notes, but don't spend too much time on it). Perhaps that time could be invested in discussing and using NGS visualization tools (IGV or Tablet, for example). The "no programming skills" scientists I know all have a healthy amount of skepticism for bioinformatics software and like to visually analyze any results coming out the tail end of an analysis. Sometimes these visualization tools are useful in identifying issues that are hard to read using less or vim!

3
jvijai (1.1k), United States, wrote 7.0 years ago:

I attended the CSH Adv Sequencing Course and the structure looks similar. The hardest part I thought was for split read aligners and deciphering StrVars. Something to stay away from for now. Do you plan on explaining SM-hashing and BWA? 4hrs is tough; no? I have to agree with previous posters that in 4 hrs, your best bet is to get them to install a local copy of Galaxy and run through 2-3 workflows, save workflows, modify and run again on larger dataset as a homework assignment. ~JVJ

1

You can learn more about the Advanced Sequencing Technologies and Applications course here: http://meetings.cshl.edu/courses/c-seqtech12.shtml

written 5.6 years ago by Malachi Griffith (16k)

ok what is SM-hashing?

written 7.0 years ago by Jeremy Leipzig (17k)

Typo, I meant as Smith Waterman, hashing and Burrows Wheeler. Sorry.

written 6.9 years ago by jvijai (1.1k)
2
Niek De Klein (2.4k), Netherlands, wrote 6.2 years ago:

Maybe some normalisation of the NGS data, which can be done easily in Excel without programming skills, giving them an idea of the different biases, where the data comes from and how the final output is generated.
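
For reference, one simple normalisation (chosen here as an assumption, since no method is named above) is reads per million, which removes differences in sequencing depth between samples. The same formula is a one-liner in a spreadsheet; the Python sketch below uses made-up feature names and counts.

```python
def reads_per_million(counts):
    """Scale raw read counts so that each sample sums to one million."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads counted")
    return {name: c * 1_000_000 / total for name, c in counts.items()}

if __name__ == "__main__":
    # hypothetical raw counts for two features in one sample
    sample = {"geneA": 500, "geneB": 1500}
    print(reads_per_million(sample))
```

Comparing the raw and scaled values for two samples of different depth is a quick way to show why normalisation matters.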

2
rob234king (530), UK/Harpenden/Rothamsted Research, wrote 4.3 years ago:

Again, maybe the thread's a bit old now, but hopefully this may be of some use. I've put together a tutorial website to share PDF bioinformatics tutorials. There are no tutorials online at present.

This website was created to share bioinformatics tutorials and create a dynamic learning environment that does not become dated by allowing the user community to upload and review tutorials, PDF contributions welcome. I would be interested to get some feedback.

http://elvis.ccc.cranfield.ac.uk/CUBELP/faces/login-page.xhtml

modified 4.3 years ago • written 4.3 years ago by rob234king (530)

In this other question, "A: New free community tutorial website for bioinformatics learning", you mention that you removed all the tutorials from your site. Should you delete this answer redirecting to it?

modified 4.3 years ago • written 4.3 years ago by Eric Normandeau (9.6k)

Yes, thanks. Corrected now.

written 4.3 years ago by rob234king (530)
1
sarahhunter (590), Cambridge, UK, wrote 4.3 years ago:

EBI offers a number of (in person and online) courses for NGS analysis (one is currently underway as I type!).

see:
