Forum: Advice For Newcomers To The Bioinformatics Field
6.8 years ago by
Medhat8.6k wrote:

What advice would you like to give to newcomers to the bioinformatics field? What piece of information would have made your life much easier if someone had told you at the beginning of your career in bioinformatics, apart from having a biostars account :)

bioinformatics forum • 13k views
modified 21 months ago by hd1fernando0 • written 6.8 years ago by Medhat8.6k

related: "I want to learn bioinformatics! A guide for complete beginners." by Nick Loman

written 6.7 years ago by Pierre Lindenbaum127k
6.8 years ago by
Cambridge, MA
KCC4.0k wrote:

There were a few difficulties I had when I first joined the bioinformatics field that I would classify as cultural or psychological:

  1. Sometimes, there is no tool that does what you want. Although you might think it's basic and obvious and of course somebody else must have done this before, sometimes there just isn't one. Either it's so simple that everybody reinvents the wheel since it's just 5 minutes of programming, or each lab has its own version which seemed too unimportant to publish, or the approach that seems obvious from the point of view of your research really is special to just your research (i.e. relatively simple to do, but not needed by most). The solution is to google for a few days to get a sense of what's out there. Ask a question on biostars, including what you have found so far. Accept that if the community says there is no such tool, there probably isn't. When I was new to bioinformatics, it was hard to tell my wet-lab colleagues that there was no simple tool for the kind of analysis they wanted and that I would have to write something myself. Quite often they would bring me a paper where the analysis had been done, implying the solution must be out there somewhere if I googled hard enough.

  2. Sometimes, there are many, many tools that do what you want and there is no hierarchy or rating. This can be frustrating. Sometimes, you are lucky and there is a ranking of tools in a review paper. Quite often, this review will be a bit out-of-date. The solution is to pick a few that sound good and try them out. Look at how they work with your data. If you are lucky, they give very similar answers. In that case you might go with the one that is most widely used in the journals you want to publish in and is easiest to use. If they give widely different answers, you might have to keep looking at the results of each tool until you have a sense of which one produces believable results in terms of the biology. You might have to do a few experiments as well. Try to come up with bioinformatics tests as well. Eventually, if you work at this long enough, you will get a sense of which tool seems to work best for your data.

  3. Many questions depend highly on your data, and so there is no clear answer to what might seem like an obvious question. I work a lot with sequencing and there are many known biases to sequencing, for instance PCR bias. So, if you tell me that you see anomalies in the GC content of your data, I can immediately say that sometimes there is PCR bias. However, if you ask me what went wrong with your particular sample, it's hard. The amount of detective work involved ends up being almost as much work as a full research project.

  4. Many tools will require huge amounts of work to install and run. If you are used to the type of programs that one encounters as a casual OS X or Windows user, then you are used to downloading software, clicking install and having things work out. However, many of those tools are vastly more polished than the software you will encounter as someone new to bioinformatics. Installing software might now involve multiple steps, including installing other software. So you will need to accept the possibility that installing the software you need for an analysis might take as long as, or longer than, the analysis itself.

  5. Things will often take a long time to run. Find things to do while your software is running. Read journal articles, answer emails, discuss things with colleagues. Accept that your life probably involves a lot of waiting now.

  6. You will have to be proactive about software support. If the software you are using doesn't work then you might have to email authors, join mailing lists, ask questions on biostars and dig through forums to get the answers. I once had a bit of software that took me more than a year to figure out how to use. I emailed the authors several times and asked questions on biostars and still it took a while.

  7. Sometimes you have to get answers in weird places. I never feel very happy about telling my non-computational colleagues that I found some information on how to use a tool in an online forum like '' or ''. It doesn't feel like a very solid position to take. It's even worse if the person who wrote the answer is somebody with a name like 'burt5000'. You have to take these moments in stride and test what you have been told. You can only do your best.
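
As a rough illustration of point 2, one way to see how closely two tools agree is to compare the sets of features each one reports. The sketch below assumes each tool's output has already been reduced to a set of identifiers (the gene names and the helper names `jaccard` and `compare_calls` are made up for illustration) and uses the Jaccard index. It is only one of many possible comparisons, not a recommended standard.

```python
def jaccard(a, b):
    """Jaccard index of two collections: |A & B| / |A | B| (1.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


def compare_calls(calls_a, calls_b):
    """Summarise agreement between the result sets of two tools."""
    a, b = set(calls_a), set(calls_b)
    return {
        "shared": sorted(a & b),   # reported by both tools
        "only_a": sorted(a - b),   # reported by tool A only
        "only_b": sorted(b - a),   # reported by tool B only
        "jaccard": jaccard(a, b),
    }


# Hypothetical outputs from two tools run on the same data.
tool_a = {"geneA", "geneB", "geneC", "geneD"}
tool_b = {"geneB", "geneC", "geneD", "geneE"}

summary = compare_calls(tool_a, tool_b)
print(summary)
```

The items in `only_a` and `only_b` are usually the interesting ones: looking at them by hand is a quick way to judge which tool gives more believable results for your biology.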

modified 6.8 years ago • written 6.8 years ago by KCC4.0k
6.8 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum127k wrote:

Install linux

learn:
  • the fundamentals of Biology
  • command line
  • bash
  • make
  • the NCBI E-Utils/Biomart
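
To give a flavour of the E-Utils item above, here is a minimal sketch that only builds an `efetch` request URL. The helper name `efetch_url` is hypothetical and the accession is just an example; the base address and the `db`, `id`, `rettype` and `retmode` parameters are the ones described in the NCBI E-utilities documentation, which should be checked for current usage rules before sending real requests.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def efetch_url(db, record_id, rettype="fasta", retmode="text"):
    """Build (but do not send) an efetch URL for a single record."""
    params = {"db": db, "id": record_id, "rettype": rettype, "retmode": retmode}
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}"


# Example accession, used here only for illustration.
url = efetch_url("nucleotide", "NM_000546")
print(url)
```

The resulting URL can be fetched from the command line or from a script, which ties in with the command line and bash items in the same list.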

forget about

  • windows
  • GUIs
  • microsoft excel
  • microsoft excel
  • did I mention microsoft excel ?

later, learn:

  • a scripting language
  • a compiled language
  • an RCS (revision control system)
  • put all your new knowledge in a blog
modified 6.8 years ago • written 6.8 years ago by Pierre Lindenbaum127k

Yes, you did mention Microsoft :)

modified 21 months ago • written 6.8 years ago by Medhat8.6k

I couldn't agree more with regard to Microsoft Excel. I saved output from wANNOVAR in Excel format and later discovered that a gene name had been changed to a date. Thanks, Excel.
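
A minimal sketch of how one might catch this kind of damage after the fact: scan a gene-name column for values that look like spreadsheet dates. The pattern below is a heuristic and an assumption (it only covers the "day-month" style such as `1-Sep`), and the function name `flag_date_like` is made up for illustration.

```python
import re

# Heuristic: values such as "1-Sep" or "2-Mar" are what a spreadsheet tends to
# leave behind after turning a gene symbol into a date. Other date styles are
# not covered by this pattern.
DATE_LIKE = re.compile(
    r"^\d{1,2}-(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)$",
    re.IGNORECASE,
)


def flag_date_like(names):
    """Return the entries that look like dates rather than gene symbols."""
    return [name for name in names if DATE_LIKE.match(name.strip())]


# Hypothetical column read back from an exported spreadsheet.
column = ["TP53", "1-Sep", "BRCA1", "2-Mar"]
suspicious = flag_date_like(column)
print(suspicious)
```

Anything flagged still needs a manual look, since the original symbol cannot be recovered reliably from the date alone.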

written 6.8 years ago by Matt Miossec330

For the sake of discussion, it would be good to define some of the terms you use. For example, what is a 'Microsoft' anyways? :P

written 6.8 years ago by Eric Normandeau10k

As far as I can understand, Pierre is talking about Microsoft Excel and the Windows OS.

written 6.8 years ago by Medhat8.6k

I was just joking ;)

written 6.8 years ago by Eric Normandeau10k
6.8 years ago by
Damian Kao15k wrote:
  • Use a naming convention for your files.
  • Never assume anything works 100% correctly out of the box. Always spot check after you run a script/software package. I am still learning this...
  • Don't get too caught up with the methods and forget the question.
  • Know your file formats.
  • Start a blog. You'll find describing what you are doing to an internet audience will allow you to see holes in your work. Also a great way to store code.
  • There is no perfect data. Sometimes you just have to accept that no amount of massaging the data will make your analysis that much better.
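
The "spot check" and "know your file formats" points can be combined in a small sanity check. The sketch below assumes plain, unwrapped FASTQ (four lines per record) and reads everything into memory, so it is meant for a small sample of a file rather than a whole run. The function name `check_fastq` and the toy records are made up for illustration.

```python
import io


def check_fastq(handle):
    """Return (record_count, problems) for a four-line-per-record FASTQ stream."""
    problems = []
    lines = [line.rstrip("\n") for line in handle]
    if len(lines) % 4 != 0:
        problems.append(f"line count {len(lines)} is not a multiple of 4")
    n_records = len(lines) // 4
    for i in range(n_records):
        header, seq, plus, qual = lines[4 * i : 4 * i + 4]
        if not header.startswith("@"):
            problems.append(f"record {i + 1}: header does not start with '@'")
        if not plus.startswith("+"):
            problems.append(f"record {i + 1}: separator does not start with '+'")
        if len(seq) != len(qual):
            problems.append(
                f"record {i + 1}: sequence length {len(seq)} != quality length {len(qual)}"
            )
    return n_records, problems


# Toy inputs: one well-formed sample and one with a truncated quality string.
good = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nGG\n+\nII\n")
bad = io.StringIO("@r1\nACGT\n+\nIII\n")

good_result = check_fastq(good)
bad_result = check_fastq(bad)
print(good_result)
print(bad_result)
```

Running something like this on the first few thousand lines of an output file is cheap, and it catches truncated or mangled files before they reach the next step of a pipeline.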
written 6.8 years ago by Damian Kao15k

The perfect point:

  • Don't get too caught up with the methods and forget the question.
modified 4.7 years ago • written 4.7 years ago by Medhat8.6k
6.8 years ago by
United Kingdom
zam.iqbal.genome1.7k wrote:

Learn how to do things in unix (if you can't already). Becoming familiar with the filesystem, shell scripts, downloading, unzipping, installing, compiling. All these things seem obvious once you've been in the field a while, but they are a barrier to people - once overcome, many things are so much easier!

written 6.8 years ago by zam.iqbal.genome1.7k
6.8 years ago by
Maxime70 wrote:

Another point that I think was forgotten (even if it is in fact related to the blog idea):

  • be social, talk to people, talk to biologists, talk to computer scientists; the more points of view you get, the better your own understanding will be
written 6.8 years ago by Maxime70

funnily, as a professional asocial, I've upvoted your answer :-)

written 6.8 years ago by Pierre Lindenbaum127k
5.9 years ago by
Cambridge, US
Christian2.8k wrote:

I recently gave a talk at my former university titled "How to be a bioinformatician". You might find some useful slides in here:

written 5.9 years ago by Christian2.8k

Thanks a lot :) I wish I could give you more than +1

modified 5.9 years ago • written 5.9 years ago by Medhat8.6k
6.8 years ago by
Edinburgh, UK
Ben2.0k wrote:

In addition to the great answers already given:

  • Set up an RSS reader with some relevant pubmed search terms/authors, some journal/preprint feeds, relevant blogs (Google Reader we hardly knew ye)
  • Version control everything, not just production scripts but also LaTeX documents, SVGs ..., try to use meaningful commit messages even in private repos
  • Just because you learnt (e.g.) Python first, doesn't mean you should automatically use matplotlib rather than taking the time to learn R and ggplot2
  • Master a text editor (preferably Emacs)
written 6.8 years ago by Ben2.0k

I'm having a hard time resisting the temptation to edit 'Emacs' for 'vi' :)

written 6.8 years ago by Eric Normandeau10k

My advice: don't get caught up in "software X versus software Y" arguments :)

written 6.8 years ago by Neilfws48k
3.2 years ago by
Germany / Würzburg / Universität Würzburg (IMIB)
Eslam Samir100 wrote:

I would like to thank you all for this valuable information. After working in this field, I would like to add: please try to watch some MOOCs, but do the work yourself. After about 3 months I was able to write a program myself, which I shared for the public benefit.

Good Luck all

Sequence Database curator.

illustration of both approaches

modified 3.2 years ago • written 3.2 years ago by Eslam Samir100
21 months ago by
hd1fernando0 wrote:

Maybe you can have your answer here:

The Biostar Handbook. A bioinformatics e-book for beginners.

written 21 months ago by hd1fernando0

This question is 5 years old. It is not about reading a book; it is about good practice and the things you should learn or do that can't be found in books :)

written 21 months ago by Medhat8.6k