What Kind Of Bioinformatics Tutorials Would You Like To See Online?
10.0 years ago
User 59 13k

This is a two-part question, so bear with me!

I work on Knowledgeblog, which is a lightweight publication system for scientific code, data, and results based around WordPress and extended by an ecosystem of off-the-shelf and custom plugins.

We're currently putting together a 'writeathon' to provide some bioinformatics tutorial material on a Knowledgeblog. What topics do people think would be good to cover?

We're looking for tutorials that might be good for all levels - computer scientists interested in learning some biology, biologists getting interested in bioinformatics, and of course tutorials aimed at bioinformaticians by bioinformaticians.

The second part of the question is more of a call to arms. We have a travel budget, and would be happy to spend some of this encouraging people to come to Newcastle for a day (Tuesday 21st June) to write away with us. Obviously this is more likely to occur if you're in the UK, but close international travel could also be supported in a limited number of cases.

All tutorials will be given a citable DOI, and no promises, but we will go for PubMed inclusion if we get enough content. You could also contribute remotely on the day, should travel be impossible but you still want to get some content up!

Suggestions for tutorial topics under this question would be great; votes will allow us to work out which topics we cover and whom we invite! If you're interested in joining us in Newcastle at the end of June, please drop me an email directly (d.c.swan@ncl.ac.uk).

For examples of existing Knowledgeblogs you can have a look at Ontogenesis and Taverna kblogs.

tutorial education
Will authors be able to edit the tutorials after the review?

Good initiative and best of luck! To add to Jan's question: will authors be able to edit tutorials that they have not written themselves? This is vital IMHO.

Jan, very good question - whether an article is canonical is important. The way we handle this right now is that when an article is revised, the old versions remain on the site, linked at the bottom of the article.

Michael, it doesn't work so much like a wiki. Articles can of course have multiple authors, but I don't think we envisage people changing other people's articles! The idea would be to have more of a post-publication review - in the comments, or via trackbacks/pingbacks to other blog discussions - that the author could address at some point.

Good luck with this, Daniel. Are all images and text under a Creative Commons (or similar) licence? It would be nice to be able to use material from the tutorials in both workshops and seminars without breaking copyright. On a related note, do you have a recommended image resolution for the wiki, or should the images link to a higher-resolution version? This would be ideal for their inclusion in other seminars.

Alastair, good point. I think we all feel an appropriate CC licence should be in place for this, but there is no decision on it yet. I guess the image resolution depends on how you author the tutorial. If the images are embedded in a Word document and then posted, I suspect they would remain at 'Word' resolution. If you were to edit the post in the WordPress interface, you would be able to exercise more control over the formatting. We would support both approaches, but the idea of Knowledgeblog was to allow people to post articles to the system using whatever their current toolchain is.

Regarding whether it should be a wiki, it definitely should not! I might want to publish a tutorial for solving problem X using tool Y, and I don't want others editing it to use tool Z because the community believes tool Z is better. They should write their own tutorial on using tool Z.

10.0 years ago

Excellent effort, Daniel! Best wishes in advance.

I would start with a section on statistics, followed by in-depth tutorials. The statistical concepts would then serve as reference material for the various tutorial sections.

I think it will be interesting to see the tutorials organized by biological data / experiments.

For example:

Genome sequence:

• Sequence similarity search
• NGS/WES (QC, alignment, variant calling, annotation)
• Phylogeny

Gene expression:

• Mining public data resources for expression data pertaining to specific cellular events
• Analysis of gene expression data using BioConductor packages

GWAS:

• Background on Statistical Genetics
• dbGaP
• Visualization tools

Protein sequence:

• Homology
• Domain/Motif assignment
• Analysis of unassigned regions
• Sequence classification (family, superfamily, fold level)

Protein Structure:

• Modeling
• Structure analysis (Hydrogen bond, solvent accessibility, disulphide bonds, higher order interactions)
• Structure classification
• Quality assessment of protein structures

Protein-protein interaction:

• Databases
• Visualization of PPI (Cytoscape, BioLayout Express 3D, etc.)
• Reasoning over the data

Others:

• Machine learning (discussing various aspects of soft computing algorithms using published datasets)
• Data integration and data mining topics

Looks like a fabulous beginning for an advanced course in bioinformatics!

Thanks, Larry. Do you think we could really organize such a course that spans both genome and proteome? EMBO is doing a great job by providing grants for teaching; is there anything similar in the US?

Thanks Khader, some good suggestions there and at least some areas we have some expertise in that we could leverage locally.

Thanks, Daniel. Please let me know if I can contribute one or two tutorials. I will be happy to be a part of it!

10.0 years ago
Dave Clements ▴ 610

A few approaches to consider:

1. For software installation/configuration tutorials, I recommend the approach used in the GMOD Tutorials. These include ready-to-start virtual system images (using VMware), sample data, and step-by-step instructions. Most of these came out of the annual GMOD courses and reflect exactly what was covered in the course. One drawback of having a starting system image is that those images get stale and need to be refreshed periodically (at GMOD this happens once a year). The instructors create these tutorials in this format for the course.
2. For using software, short video tutorials work very well. The Galaxy Project puts out wildly popular quickies, video tutorials that highlight how to do specific tasks in Galaxy. These only require a few minutes from the user (but take a long time to make).
3. Finally, I also like the OpenHelix approach. OpenHelix creates comprehensive hour-long video and slide-based tutorials that include worked examples. These take an enormous amount of time to make, but excel at being thorough and clear.
I have a lot of respect for GMOD, but I feel like providing ready-to-use virtual instances leaves beginners helpless when they inevitably need to install dependencies and muck with their PATH to get something working. This is something I've seen first-hand.

OpenHelix is a great resource. It's just a shame not all of the tutorials are free :( The Galaxy webcasts are also excellent.

Dave, we've used VMs for tutorials before in our Master's course, so it's not an alien idea to us. More screencast-style tutorials are something we had not necessarily considered but perhaps should.

At Ensembl we also have quite a few short video tutorials, focusing on specific tasks in Ensembl and BioMart. These are made using Camtasia (http://en.wikipedia.org/wiki/Camtasia_Studio) and are made available through YouTube (http://www.ensembl.org/info/website/tutorials/index.html). They seem to be rather popular, but take quite a lot of time to make...

Jeremy, I agree that starting with ready-made virtual systems can leave users frustrated when they get outside the safety of that system. You can set "traps" in your teaching examples and then talk about things like checking logs, the screen command and so on, but that won't be comprehensive. I don't have a good idea on how to teach system debugging skills (in any depth) and bioinformatics tools in a short course.

10.0 years ago
Gareth Palidwor ★ 1.6k

On a more advanced level I'd like to see:

- Multiple testing corrections
- Getting started with MEDLINE text mining
- Building bioinformatics web apps backed by SQL databases
- Integrating multiple large data sets
- Bioinformatics projects: structure and lifecycle
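For the multiple testing corrections item, a tutorial could start from something as small as the Benjamini-Hochberg false discovery rate adjustment. Below is a minimal plain-Python sketch; the function name `benjamini_hochberg` and the toy p-values are illustrative assumptions, not taken from any existing tutorial. It is intended to behave like R's `p.adjust(p, method = "BH")`, but treat it as a teaching sketch rather than a reference implementation.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Step-up procedure: for the p-value of ascending rank r out of m,
    the adjusted value is min over ranks >= r of p * m / rank, capped at 1.
    """
    m = len(pvalues)
    # Visit indices from the largest p-value to the smallest
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    for position, i in enumerate(order):
        rank = m - position  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Made-up p-values for five hypothetical genes (not real data)
    raw = [0.01, 0.04, 0.03, 0.005, 0.20]
    for p, q in zip(raw, benjamini_hochberg(raw)):
        print(f"raw p = {p:.3f}  BH-adjusted = {q:.3f}")
```

Walking from the largest p-value down and keeping a running minimum is what keeps the adjusted values monotone with respect to the raw ones.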


An additional one I thought of this morning was "databases in bioinformatics". In my experience, bioinformatics people use text files or SQL databases for data persistence and access, and not a lot else. A tutorial outlining the other options (Berkeley DB, key-value stores, Lucene, object serialization, object-oriented databases, etc.) with examples for each may give even experienced bioinformatics developers some new tools to work with.
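As a small illustration of two of the non-SQL options just mentioned, Python's standard-library `shelve` module combines a key-value store (a dbm file) with object serialization (pickle). The gene record and its field names below are hypothetical, chosen only for the example.

```python
import os
import shelve
import tempfile

# Hypothetical record for one gene; the field names are arbitrary choices
record = {"symbol": "TP53", "chromosome": "17"}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "genes")
    # shelve pickles each value and stores it under a string key in a dbm file
    with shelve.open(path) as db:
        db["ENSG00000141510"] = record
    # Re-open the store to show the object survives a round trip through disk
    with shelve.open(path) as db:
        restored = db["ENSG00000141510"]
        print(sorted(db.keys()))

print(restored)
```

Keys must be strings, but values can be any picklable Python object, which is what distinguishes this from a plain text-file or SQL approach.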

I'm pretty sure we're going to hit Integration as a topic anyway, but that's a good list. I might get one of our stats lecturers in to cover MTC, as I think it's a topic only ever mentioned 'in passing' with datasets!

10.0 years ago

My suggestion is not a topic but an approach. The tutorial certainly should be hands-on - there is no doubt about that - but it should go further and offer an interactive feature or critique/accolades from the tutorial leader or writer. A tutorial is about learning and bioinformatics is best taught in a more interactive style than by data dump/slide dump/read the notes on your own time.

Agreed. My preferred approach is a standard data set and a progressive series of analyses applied to it, each building on the previous one.

Larry, you're right; I think there's a lot of scope for critique in something like this, which is often lacking from the format.

10.0 years ago

my wishes :-)

• how to write a plugin for Taverna2
• how to "something-bio" using "language-1" when your favorite language is "language-2"
• the internals of NCBI BLAST
• biostatistics for dummies
• ...
How to write a Taverna plugin is covered in the 2.x user manual, but I can't point you to a link as the Taverna web server has been down for 2 days.

@pi, the documentation for T2 is, from my point of view, incomplete and unreadable.

Love the cross-language idea :)

We've already got a knowledgeblog for Taverna (taverna.knowledgeblog.org). If anyone wants to write a "how to write a plugin" tutorial, this would be a good place to add it.

There is a tutorial on writing plugins for Taverna 2 here.

@alaninmcr, thanks! This tutorial looks far more complete than the last time I saw it. (I removed my previous comment about it.)

10.0 years ago
Gareth Palidwor ★ 1.6k

I prefer task oriented tutorials that use a standard data set to demonstrate a bunch of standard analyses. I do a lot of bioinformatics consulting for scientists and grad students and much of the work is just variations on the same tasks, for example:

• Microarray data
  • Quality analysis
  • Normalization
  • Annotation
  • Fold change analysis
  • Gene Ontology enrichment analysis
• ChIP Seq
  • Quality analysis
  • Peak identification
  • Peak annotation (association with genes)

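For the Gene Ontology enrichment step, a common approach is a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test) of whether a term is over-represented among the differentially expressed genes. The sketch below assumes you already have the four counts; the function name `hypergeom_enrichment_p` and the toy numbers are hypothetical, and real GO tools also apply multiple testing correction and account for the GO hierarchy.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """One-sided over-representation p-value P(X >= k), X ~ Hypergeometric.

    N: genes in the background (e.g. all genes measured on the array)
    K: background genes annotated to the GO term
    n: genes in the differentially expressed list
    k: genes in the list that are annotated to the term
    """
    total = comb(N, n)
    # Sum the probabilities of drawing k or more annotated genes into the list;
    # comb() returns 0 for impossible combinations, so no extra guards are needed
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / total


if __name__ == "__main__":
    # Made-up counts: 20 background genes, 5 annotated, 4 in the list, 3 of them annotated
    print(round(hypergeom_enrichment_p(k=3, n=4, K=5, N=20), 4))  # prints 0.032
```

Using exact integer binomial coefficients keeps the sketch dependency-free; for genome-scale counts a library routine working in log space would be the more practical choice.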
Scripts in Perl and R are helpful, but I've found TM4 MeV to be particularly useful for non-programmers dealing with microarray data.

I've worked on a few tutorials similar to what you describe; the affymetrix one (http://www.stemcore.ca/projects/SCNcourse) is getting rather old (doesn't handle the exon/gene chips), and the ChIP Seq one (http://regulome.ca/2010workshop) should be updated as well.

My background is array data, so that's definitely along the lines of the kind of tutorials I was going to try and get written myself.

The ChIP Seq work would be interesting; I've done a bit of this recently, and the QA/PI stage would be of great interest.

I always do a QA step first; not much point in proceeding with analysis of crappy data.

10.0 years ago
Genotepes ▴ 950

Most of the interesting things have already been listed, but I would add:

Annotation tools for GWAS results - databases and visualisation scripts

Coalescent models

Haplotype and imputation analyses

These are tutorials centered on problems to solve rather than on a language or a database.

Christian

GWAS is an interesting topic, but one we'd need someone to come in and do! Suggestions? Volunteers? :)

I could write one or two things, although there are more experienced researchers who are native English speakers (and more of them in the UK, even in Newcastle - Heather Cordell, if I remember correctly).

But I can definitely write short notes on some issues around GWAS.

10.0 years ago
Hranjeev ★ 1.5k

As for the tutorials, I'd like to see lots of existing papers reverse engineered using their sample datasets, so that we can walk through them step by step and know that we got them right. This is much like a set of questions, except that the answers are worked out for you; a bit like a journal club, except that it is online.

The tutorials could also include short pointers/links to background reading material, such as fundamental facts, structured review articles, or something along those lines.

A great example of this is: Sémon, M., Lobry, J.R., Duret, L. (2006) No Evidence for Tissue-Specific Adaptation of Synonymous Codon Usage in Humans. Molecular Biology and Evolution, 23:523-529, which has online datasets with interactive web-based R (!), so you can reproduce their analysis completely (http://pbil.univ-lyon1.fr/datasets/SemonLobryDuret2005/)

Jean Lobry does a lot of this sort of thing, check the "online reproducibility" links: http://pbil.univ-lyon1.fr/members/lobry/

HRanjeev, this is something we've been thinking of doing with Knowledgeblog anyway. The thinking at the moment is more of an 'enhanced paper', where data and code are embedded into the article and can be 'read' by R, so that the work can be recapitulated on the fly to check that what is published is indeed correct. Nice to see someone is in line with our thinking!

Great thinking! We are in the internet age, so I'm glad that someone is actually considering taking it beyond the traditional publishing model. I'm excited to see how your concept flourishes. I'm also following the academia.edu site, but I don't see it as an interactive avenue just yet. Sometimes authors don't 'feel' enough tangible credit to share their comments, or even to take part in a public peer-review process. I hope this can be different with Knowledgeblog. Good luck!

That was an excellent resource, gawp. It's something new to me, and Jean Lobry is really doing a good job there.

10.0 years ago
Lythimus ▴ 210

I think Khader Shameer covered the spectrum fairly well. What I would personally like to see, though, is a primer on converting a command-line pipeline into Galaxy. I'm becoming a fan of it, but I'm having some issues with the advanced features and frankly can't find all the information I'd like about Galaxy's capabilities, such as whether the load balancing (Torque/PBS, I believe) is customizable, or whether it does such a good job that I wouldn't need to mark tasks as disk-, RAM-, or CPU-intensive.

I believe there's a fair-sized market for this, and it would ultimately make people's workflows more accessible, not to mention strengthen the Galaxy framework as people add more tools and datatypes to it.

Great idea - I was going through the Galaxy docs for this last week actually, and you're right I think this would have very broad appeal.

10.0 years ago
Nataly • 0

As a biologist doing genetics, you will make my day. Thx!

