Forum: Best Way To Learn Bioinformatics For System Level Programmers.
6.8 years ago by
Saudi Arabia
amjadcsu80 wrote:


I have a CS degree and experience in Linux system-level programming. In my current job at a medical research centre that does next-generation sequencing, I maintain the HPC infrastructure for researchers. To better understand the researchers' needs, I would like to learn bioinformatics. Can someone recommend the best books and tools for this transition?


bioinformatics forum • 4.6k views
modified 6.6 years ago by Drio920 • written 6.8 years ago by amjadcsu80
1 DNA seen through the eyes of a coder

written 6.8 years ago by Biomonika (Noolean)3.1k

Probably not the best book, but good for an easy start: Bioinformatics For Dummies. Advice: talk to researchers regularly, and spend some time on this forum! ;)

written 6.8 years ago by zx87549.7k
6.8 years ago by
to.stephen.henderson50 wrote:

My experience is that there are relatively few good practical bioinformatics books (as opposed to some good bioinformatics algorithm books).

They date very very quickly.

That said, I am hopeful about the forthcoming Bioinformatics Data Skills (Vince Buffalo).



Interesting. I hope the book turns out well and avoids the classic trap of being a Unix/Perl/Python book with a little bit of biology mixed into it.

modified 6.8 years ago • written 6.8 years ago by Istvan Albert ♦♦ 85k

I share this frustration too, which is why I am writing this book. It's the book I wish I had when learning bioinformatics. It's an intermediate book (it assumes readers know a bit of a scripting language), as this is what is lacking in current bioinformatics books. Many biologists learn a scripting language and a bit of Unix, and then begin doing bioinformatics. I think this can be dangerous and lead to non-reproducible or incorrect results. My book emphasizes working with data in a careful way using existing robust open-source tools and libraries. Fundamentally, a book on bioinformatics is a tricky thing, because everything goes out of date so quickly in this rapidly changing field. Bioinformaticians are able to keep ahead of changing technology because they have a core skillset: they can easily manipulate big datasets and actively check whether new software is working with their data. My book's goal is to share these data skills. I hope people enjoy it and find it useful! Folks can tweet or email me if they have inquiries.
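As a small illustration of what "actively checking" data can look like, here is a minimal Python sketch that validates the basic structure of FASTQ records: four lines per record, a header starting with `@`, a separator starting with `+`, and sequence and quality strings of equal length. The function name `check_fastq` and the sample reads are hypothetical, made up for this example and not taken from the book, and the sketch assumes unwrapped (single-line) sequences.

```python
import io


def check_fastq(handle):
    """Count records in a four-line FASTQ stream.

    Raises ValueError on the first malformed record.
    Assumes each sequence and quality string sits on a single line.
    """
    count = 0
    while True:
        header = handle.readline()
        if not header:  # end of file
            break
        seq = handle.readline().rstrip("\n")
        plus = handle.readline()
        qual = handle.readline().rstrip("\n")
        record_no = count + 1
        if not header.startswith("@"):
            raise ValueError(f"record {record_no}: header does not start with '@'")
        if not plus.startswith("+"):
            raise ValueError(f"record {record_no}: separator does not start with '+'")
        if len(seq) != len(qual):
            raise ValueError(f"record {record_no}: sequence/quality length mismatch")
        count += 1
    return count


if __name__ == "__main__":
    # Two made-up reads, only for demonstration.
    sample = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\nIII#\n")
    print(check_fastq(sample), "records look structurally valid")
```

A check like this is cheap to run on the output of any new tool before it is passed to the next step, which is roughly the habit described above.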

modified 10 months ago by RamRS30k • written 6.8 years ago by Vince Buffalo460

Sounds great. A lot of what we do is problem solving in the data context; the actual details of which particular tool we run will change all the time.

I am adding it to my pre-order list on Amazon. Keep us posted on any developments.

modified 6.8 years ago • written 6.8 years ago by Istvan Albert ♦♦ 85k
6.8 years ago by
Vancouver, BC
SES8.4k wrote:

The best way to gauge the needs of the researchers would be to arrange a meeting with some of the most active groups and have them explain their needs/uses of the cluster. We did this about a year ago at my university and the Sys Admin and IT staff said it was immensely helpful. That will give the researchers a chance to learn some different approaches to using the cluster more efficiently, and it will help someone in your position to find out where money should be spent on infrastructure.

Bioinformatics is so vast that I don't think it's easy to give advice on what to learn without knowing the intended applications. The type of research and the computational skill level of the users will determine what is needed in terms of support. I can tell you that bioinformatics involves a lot of scripting, so being proficient in a scripting language is a benefit. The most helpful thing I can offer is to be aware of what tools are available (e.g., there is a bioinformatics software list by category at SEQanswers). A common pitfall I see among people coming from other fields is trying to tackle every problem with a custom approach when tools already exist for the job.


I would also add that this is one of those examples of why teams of people are so important now in this field. To be frank, it would take an immense amount of study and practice for a developer, systems engineer, etc. to become a bioinformatics expert. It is good to know something about it in your case, but to be a good bioinformatician you need the biology background as well. Ideally you should have the wet-lab researchers, bioinformaticians, and developers/IT/engineers working together.

written 6.8 years ago by DG7.2k
6.8 years ago by
University Park, USA
Istvan Albert ♦♦ 85k wrote:

I would also look for review papers that summarize a technology or those that have introduced popular tools. Those always have a lot of data that you can use to familiarize yourself with the process.

6.8 years ago by
Manchester/UK/Cancer Biomarker Centre at CRUK-MI
Alastair Kerr5.3k wrote:

Nothing will be better than taking time to talk to potential users, as different fields have different requirements. To highlight this point:

  1. Cryo-electron microscopy benefits from fast GPU boxes.
  2. De novo assembly of deep-sequencing data (e.g. using the Velvet program) requires a server with lots of RAM. (A group here uses a server with 1TB of memory for some more difficult genomes.)
  3. Population variant analysis can benefit from larger clusters (e.g. the GATK pipeline from the Broad Institute).
  4. Deep sequencing in general can generate a lot of data, some of which can be compressed, and intermediate files can be removed. Managing these files cost-efficiently and sanely in an HPC environment is extremely useful.
  5. Mass-spec analysis still uses a lot of commercial software that is pay-by-processor, which can be cost-crippling if not installed on a VM.
  6. Balance between the latest tool and stability: some users need reproducible results, some need the latest tool or library installed, and all (should) require that provenance metadata from the programs and libraries is stored.

Once you know what is needed, I would then familiarise yourself with some of the key tools and repositories. Details of some I have not yet mentioned are in the OBF, and I would also look at Bioconductor, myExperiment and Galaxy.
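To make the provenance point (item 6) a little more concrete, here is a minimal Python sketch of a record that could be stored next to each analysis output. It only uses the standard library. The function names `sha256_of` and `provenance_record`, the field names, and the `aligner` tool in the demo are all hypothetical choices for illustration, not an established standard or anything a particular pipeline prescribes.

```python
import hashlib
import json
import os
import platform
import tempfile
from datetime import datetime, timezone


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def provenance_record(command, input_paths, tool_versions):
    """Build a JSON-serialisable description of one analysis step.

    tool_versions is supplied by the caller (e.g. collected from each
    tool's own version output); this sketch does not query tools itself.
    """
    return {
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
        "command": command,
        "python": platform.python_version(),
        "host": platform.node(),
        "tools": dict(tool_versions),
        "inputs": {path: sha256_of(path) for path in input_paths},
    }


if __name__ == "__main__":
    # Demo with a throwaway FASTA file; "aligner" is a made-up tool name.
    with tempfile.NamedTemporaryFile("w", suffix=".fa", delete=False) as tmp:
        tmp.write(">seq1\nACGT\n")
    try:
        record = provenance_record(
            "aligner --threads 4 " + tmp.name, [tmp.name], {"aligner": "0.0.0"}
        )
        print(json.dumps(record, indent=2))
    finally:
        os.remove(tmp.name)
```

Writing such a record as a small JSON file alongside each result is one simple option; workflow systems such as Galaxy keep this kind of history for you.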

6.6 years ago by
United States
Drio920 wrote:

I found this book particularly useful when I started to work with genomic datasets. It helped me to get my head around the biology, which in the end I think is what matters the most.

modified 10 months ago by RamRS30k • written 6.6 years ago by Drio920
6.8 years ago by
San Francisco
donfreed1.5k wrote:

For NGS, the Broad Institute has a lot of information on their GATK pipeline, which is applicable to most NGS pipelines.

And the videos from the Workshop are great.

These resources are the first place I send people interested in NGS and will give you a good idea of what the researchers are trying to do.

6.8 years ago by
Los Alamos, NM
Pavel Senin1.9k wrote:

It would certainly help if researchers could communicate their research hypothesis to you, along with the quantitative result they are looking for: a table, or some number. Knowing this goal not only helps to plan a sequence of steps that will get you there, but also greatly contributes to your motivation and understanding of the "bigger picture". In turn, knowing the sequence of steps and the tool chain, you and the researcher will be able to assess the risks (i.e. potential biases that tools can introduce, and so on), which is also important.
