0
48
7.5 years ago

### Update (October 26, 2016)

In the meantime, the book has moved from LeanPub to GitBook. It also turned out to be a lot more demanding than I thought. It is coming together though: http://read.biostarhandbook.com/

Looking for contributors for various chapters:

1. dbSNP
2. 1000 genomes project

and many others.

I have been teaching bioinformatics courses for a few years now and I have always felt that existing resources were inadequate.

Most are either too programming and Unix oriented or too focused on one particular "protocol", ignoring alternatives that may produce different results. In addition, most resources tend to focus on installing and running the tool rather than on understanding the outputs. Disclaimer: I am guilty of this as well! I always felt that I had to start from zero each time I wrote a guide, and towards the end there was already too much material and I had to cut things short at the most interesting parts. But that is because there has never been an up-to-date and reliable resource that I could refer people to. Until now.

I am starting a "bioinformatics handbook" resource called the Biostar Handbook. I would like it to be a repository of practical advice on bioinformatics methods, a resource that is useful to both beginners and advanced users, a collection of curated experiences of bioinformaticians around the world.

The book will be comprehensive, with ebook and online components that will continue to grow and expand over the years. It will come at a very low cost of about $25 to ensure that the task of maintaining, correcting and supporting it won't solely require personal enthusiasm and could be contracted out if necessary.

I would like to invite everyone to contribute via GitHub: you will retain authorship, copyright and distribution rights on all content you create. And since we are creating the ultimate guide to Bioinformatics ;-) I think it will be a great adventure for everyone involved.

Contribution guide: http://biostarhandbook.com/contribute.html

Book website: https://leanpub.com/biostarhandbook

Help us create the best bioinformatics resource that was ever conceived!

2

### Update

In the meantime, the book has moved from LeanPub to GitBook. It also turned out to be a lot more demanding than I thought. It is coming together though: http://read.biostarhandbook.com/

Looking for contributors for various chapters:

1. dbSNP
2. 1000 genomes project

and many others.

1

Oh this is interesting. But what happened between 13 months ago (when the thread was created) and 12 hours ago (when you posted an update)? Are people contributing or is this all 'just' your effort so far?

0

For me this worked out a bit like software development - where some code that I write is not published. The previous year was mostly exploring what works and what does not.
After the announcement I put a lot of work into it; there is almost a full book's worth of material that I wrote, and I then taught from it for a semester in the Spring of 2016. In the end I disliked that format and ditched it (though I have reused some chapters). That book was primarily in pdf format, then there were slides based on the book, and there was an associated website with code snippets - it ended up a bit disjointed and confusing - I myself got confused after a while and could not find what I was looking for.

The lessons that I learned from that experience made me rethink the book format. So this is Book 2.0, with 1.0 staying in the drawer.

0

Ok, if you yourself are confused, then I don't feel bad that I am as well. I do understand that this is a work in progress, anyway:

1. The release date of December 2016 is no longer relevant, I assume? (Some chapters are not there at all and some others look like they could use some work.)
2. So am I right to assume that what we see online on GitHub is Book 2.0? And the pdf and additional material is ditched? Because it makes no sense to edit the GitHub stuff if all we see is just 'additional' material and there is more somewhere else that we don't know about. Did you also ditch the idea of releasing it as a hard copy? (Just aiming for an open source online version of it? Or do you still think about doing a proper book, which, in my opinion, has quite a few consequences ... copyrights, quality, etc.)
3. Do you want to keep it so that an author is responsible for a chapter? What happens if somebody just changes little bits of a chapter - there could be a conflict with the author of that chapter. (Except if you wrote most of it and you say you don't care, that makes it easier.)
4. If you want people to contribute, should it all be via GitHub? Is that the plan?

I think, especially after your experiences the last year, a little bit of a plan is needed.
So far, not too many people have contributed, is that correct?

0

I am going to make a new top post on this once I get more feedback. Here is what I learned over the past year and what my current approach is:

1. December 1st, 2016 is the release date for what we can call a Minimally Viable Product - a book that is a good introduction and discusses most aspects of modern bioinformatics but can be greatly improved. I have all the materials, it is just a matter of putting them together. Of course it will be basic and minimal, but appropriate for beginners.
2. I don't see the book ever being "finished" - it will grow new chapters over the years, some sections will become obsolete, and others will be greatly reworked.
3. There will be related chapters on more advanced techniques - they will be distributed from the same site but may have a separate entry point so as not to overcomplicate the introductory content.
4. The book will only be free to access for moderators and people with high scores on Biostar - for everyone else there will be some cost, probably subscription based, $25/year or so. This will be used to maintain the resource and Biostars. Eventually I can see us being able to offer payment for editorial type of jobs or contributions. Make no mistake - editing a book is a tedious and not so fun job.
5. The book is designed for the web but will also come in pdf and ebook formats.
6. Authorship will follow the standard scientific practice - authors that substantially and materially contribute will be listed as authors on a separate authorship page; people that contribute small changes will be listed as contributors in an acknowledgment section.
7. Contributions need to be made via https://github.com/biostars/biostar-handbook-contrib - create a subfolder and add your markdown files/data there. We'll have editors merge content. People may advance to different roles, but first we have to see who wants to contribute in which way. The main gitbook repository will be private and invitation only.

0

How would you feel if I replaced all the installation code with conda? Is that really less informative?

0

I never got to use conda myself hence I don't know what state it is in. I always considered it a python package manager but it may be more than that.

I do know from experience that homebrew is robust and stays out of the way - the downside is that it is OSX only.

If conda works comparably well I'd be more than happy to add that as either an alternative or as replacement if it turns out to be better.

0

For reference, conda is now largely the preferred package manager in Galaxy, so that covers the popular stuff (bowtie2, bwa, salmon, samtools, etc.) at least. You can also install bioconductor packages with it.
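
As a rough sketch of what a conda-based install of the tools named above could look like (this is not the handbook's actual installation routine): the snippet below only builds and prints the command, it does not run it. The helper name `conda_install_command` and the channel order (conda-forge first, then bioconda) are assumptions made for illustration, as is the expectation that these tool names match the Bioconda package names.

```python
import shlex


def conda_install_command(packages, channels=("conda-forge", "bioconda")):
    """Build the argument list for a single `conda install` call.

    Hypothetical helper: the channel list is an assumption, adjust as needed.
    """
    cmd = ["conda", "install", "--yes"]
    for channel in channels:
        cmd += ["--channel", channel]
    cmd += list(packages)
    return cmd


# Tools mentioned in the comment above.
tools = ["bowtie2", "bwa", "salmon", "samtools"]
command = conda_install_command(tools)

# Print the command so it can be reviewed before being run in a shell.
print(shlex.join(command))
```

Printing instead of executing keeps the example safe to run anywhere; passing the same list to `subprocess.run(command, check=True)` would perform the install on a machine where conda is available.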

0

OK so I will slowly provide replacements to the installation routines throughout this guide. By providing a universal means of specifying project dependencies, IMHO conda/bioconda have tremendous momentum in the bioinformatics community well outside of python.

0

Send me your GitHub account and I can add you as a collaborator.

This applies to everyone else that wants to collaborate.

0

I also feel there should be links to actual Biostars questions and tags at the bottom of each page. This handbook seems a bit divorced from Biostars as it stands.

1

Sounds good to me.

Also, you'd be pleased to know that, based on your suggestion, conda has been integrated into the installation instructions, to the point that I am hoping to be able to provide a single link that installs everything in one shot.

0

Great to see it. Can we categorize for which areas contributors are needed?

1

We need contributors for everything that is missing ;-)

On a more serious note - everything can be improved or expanded. What I am trying to address, and I am hoping that this will come through, is to go beyond the "typing velvetg hashsize=35 cov=auto is how you assemble a genome" - because that is not really true.

When we run these tools we continuously assess and evaluate the results - I really want to demonstrate and teach the thought process rather than just the method. Any contribution that helps clarify and strengthen this aspect is welcome.

0

In how much detail did you want to discuss the sequencing technologies (can't help but mention that your data about MinION is outdated)? In my humble opinion it's vital to have a good understanding of the technology you are working with before starting an analysis.

1

One of the challenges with the new technologies is that they evolve so fast. The chapter on MinION and other technologies should reflect the state of the art - and would need to be updated regularly. The level of detail needs to be the one that has relevance to a bioinformatics data analysis.

How the technique works can be written out in a few paragraphs - what the data looks like and how to deal with it - is probably more complicated and may need other sections.

One realization that I found liberating was to not feel the pressure to provide complete and encyclopedic content. We'll let that come later, if ever. We'll just put in important things that are of high relevance and see what happens.

At the same time data from these instruments is being deposited and stored in SRA hence we need to also know what it looked like last year - we may need to re-analyze that as well. So we have to address more outdated technologies as well. Here is the content of the SRA broken down by platforms (how many runs per platform) - I think these should be mentioned to some extent as well:

1,231,420 ILLUMINA
182,407 LS454
31,865 ABI_SOLID
31,223 PACBIO_SMRT
14,550 ION_TORRENT
3,736 COMPLETE_GENOMICS
869 HELICOS
398 CAPILLARY
96 OXFORD_NANOPORE
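
To put those run counts in proportion, here is a small sketch that turns the listing above into percentage shares. The counts are copied verbatim from the listing; the variable names are just for illustration.

```python
# Number of SRA runs per platform, copied from the listing above.
sra_runs = {
    "ILLUMINA": 1_231_420,
    "LS454": 182_407,
    "ABI_SOLID": 31_865,
    "PACBIO_SMRT": 31_223,
    "ION_TORRENT": 14_550,
    "COMPLETE_GENOMICS": 3_736,
    "HELICOS": 869,
    "CAPILLARY": 398,
    "OXFORD_NANOPORE": 96,
}

total_runs = sum(sra_runs.values())

# Print platforms from most to least common with their share of all runs.
for platform, runs in sorted(sra_runs.items(), key=lambda item: item[1], reverse=True):
    print(f"{platform:<18}{runs:>10,}{runs / total_runs:>9.2%}")
```

With these numbers Illumina accounts for roughly 82% of all runs, which is a reasonable guide for how much space each platform deserves.
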
1

I was always thinking about this, but not as a book. I wished there could be something like a wiki page in Biostars with summarized information (from Biostars) on very common problems/pipelines, to avoid duplicate questions and to serve as reference material for beginners. But a book will really be a very good initiative.

0

The book sections will be a high level overview of the "what and why", with examples that may not be runnable without extra setup. The web sections will present every single command systematically.

I found that these two goals cannot be satisfied in a single resource. They either make the book way too long or the code too verbose. You can't really interrupt the commands with lengthy interpretation as they require a very different mindset.

0

The NGS wikibook has those kinds of aims and contains a lot of content migrated from the seqanswers wiki. Earlier this year there were talks about reinvigorating the project but I'm unsure of its current status.

1

I think efforts like this need to be framed in the context of a bigger goal beyond just creating a comprehensive resource.

My goal is to use and reuse this resource in my courses/workshops and other training efforts. Up to this point I have always created a brand new site with partial overlap with the old one, and that is not a good way of approaching this. I have at least four sites like that. The same thing happened with the MSU Bioinformatics workshop: there are six or seven versions of it, all in various states of abandon now, as the old ones are not fully maintained and may contain outdated information. This just makes bioinformatics even more confusing; you can easily find guides that don't actually work anymore.

I am hoping that this will align with other people's goals and we can build a resource we can all use and keep up to date for reasons other than just the greater good - one that actually saves time and makes us all more productive.

0

Neat initiative. Will it be ebook/online only or what? I'm guessing this will be a great way to write guides for software you get published.

Would you even consider a chapter for dinky scripts or is it for published software only? As an example: I have one that allows you to access biomart from the command line, and while it is useful (and could probably be used for 95% of the "convert my genes" questions here) it is obscure. Would/could it get a short chapter in the biomart/conversion section or what?

1

I think scripting is what most people need from bioinformatics: putting together the pieces in a coherent and useful way that solves a real problem.

So yes, what you describe is what I hope the book will primarily do - show how to get stuff done in the real world.

What we will do to these scripts is bring a level of consistency and uniformity to the formatting across all, document them the same way etc.

0

Somehow I totally missed it. I would love to contribute. I am working on my dissertation right now but would love to participate once I am done.

0

You are most welcome, please do! I've been hacking away at this furiously.

I am now planning a whole other section that is modeled after Mission Impossible: "you have 1 minute to analyze the Ebola Genome", where one would show how to solve a realistic problem in a very short amount of time.

Ok, perhaps the goal won't be to do it in just one minute but, say, in a way that is doable on a laptop, perhaps by solving the problem on a subset of the data (shortest chromosome). The goal is still to demonstrate a complete workflow.
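
One way the "subset of the data" idea could be sketched: pick the shortest sequence out of a multi-record FASTA file and run the workflow on that alone. This is a minimal illustration using only the standard library, not code from the handbook; the function names and the toy records are made up for the example.

```python
import io


def read_fasta(handle):
    """Yield (name, sequence) pairs from a FASTA-formatted text stream."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Use the first word of the header line as the record name.
            parts = line[1:].split()
            name, chunks = (parts[0] if parts else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def shortest_record(handle):
    """Return the (name, sequence) pair with the shortest sequence."""
    return min(read_fasta(handle), key=lambda record: len(record[1]))


# Toy input standing in for a real reference genome file.
demo = io.StringIO(">chr1 demo\nACGTACGTAC\nGT\n>chr2 demo\nACGT\n>chr3 demo\nACGTAC\n")
name, sequence = shortest_record(demo)
print(name, len(sequence))
```

For a real genome you would pass an opened file instead of the `io.StringIO` toy input, write the selected record to its own FASTA file, and use that as the reference for the rest of the workflow.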