Forum: Advice For Newcomers To The Bioinformatics Field
9
33
7.9 years ago
Medhat 8.9k

What advice would you like to give to newcomers to the bioinformatics field? What piece of information would have made your life much easier if someone had told you at the beginning of your career in bioinformatics, apart from having a biostars account :)

bioinformatics Forum • 14k views
1

related: "I want to learn bioinformatics! A guide for complete beginners." by Nick Loman http://pathogenomics.bham.ac.uk/blog/2013/07/i-want-to-learn-bioinformatics-a-guide-for-complete-beginners/

29
7.9 years ago
KCC ★ 4.0k

There were a few difficulties that I had when I first joined the bioinformatics field that I would classify as cultural or psychological:

  1. Sometimes, there is no tool that does what you want. Although you might think it's basic and obvious, and of course somebody else must have done this before, sometimes there just isn't one. Either it's so simple that everybody reinvents the wheel since it's just 5 minutes of programming, or each lab has its own version that seemed too unimportant to publish, or the approach that seems obvious from the point of view of your research really is special to just your research (i.e. relatively simple to do, but not needed by most). The solution is to google for a few days to get a sense of what's out there. Ask a question on biostars, including what you have found so far. Accept that if the community says there is no such tool, there probably isn't. When I was new to bioinformatics, it was hard to tell my wet-lab colleagues that there was no simple tool for the kind of analysis they wanted, and that I would have to write something myself. Quite often they would bring me a paper where the analysis had been done, implying the solution must be out there somewhere if I googled hard enough.

  2. Sometimes, there are many, many tools that do what you want and there is no hierarchy or rating. This can be frustrating. Sometimes, you are lucky and there is a ranking of tools in a review paper. Quite often, this review will be a bit out-of-date. The solution is to pick a few that sound good and try them out. Look at how they work with your data. If you are lucky, they give very similar answers. In that case you might go with the one that is most widely used in the journals you want to publish in and is easiest to use. If they give widely different answers, you might have to keep looking at the results of each tool until you have a sense of which one produces believable results in terms of the biology. You might have to do a few experiments as well. Try to come up with bioinformatics tests as well. Eventually, if you work at this long enough, you will get a sense of which tool seems to work best for your data.

  3. Many questions depend highly on your data, and so there is no clear answer to what might seem like an obvious question. I work a lot with sequencing and there are many known biases to sequencing, for instance PCR bias. So, if you tell me that you see anomalies in the GC content of your data, I can immediately say that sometimes there is PCR bias. However, if you ask me what went wrong with your particular sample, it's hard. The amount of detective work involved ends up being almost as much work as a full research project.

  4. Many tools will require huge amounts of work to install and run. If you are used to the type of programs that one encounters as a casual OS X or Windows user, then you are used to downloading software, clicking install and having things work out. However, many of those tools are vastly more polished than the software you will encounter as someone new to bioinformatics. Now installing software might involve multiple steps, including installing other software. So you will need to accept the possibility that installing the software you need for an analysis might take as long as, or longer than, the analysis itself.

  5. Things will often take a long time to run. Find things to do while your software is running. Read journal articles, answer emails, discuss things with colleagues. Accept that your life probably involves a lot of waiting now.

  6. You will have to be proactive about software support. If the software you are using doesn't work then you might have to email authors, join mailing lists, ask questions on biostars and dig through forums to get the answers. I once had a bit of software that took me more than a year to figure out how to use. I emailed the authors several times and asked questions on biostars and still it took a while.

  7. Sometimes you have to get answers in weird places. I never feel very happy about telling my non-computational colleagues that I found some information on how to use a tool in an online forum like 'biostars.org' or 'seqanswers.com'. It doesn't feel like a very solid position to take. It's even worse if the person who wrote the answer is somebody with a name like 'burt5000'. You have to take these moments in stride and test what you have been told. You can only do your best.
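
To make point 2 a bit more concrete: one rough way to see whether two tools "give very similar answers" is to compare the sets of calls they produce. This is only a minimal sketch, assuming each tool's output has already been reduced to a set of identifiers. The gene names are made up, and the Jaccard index is just one reasonable choice of agreement score, not something prescribed above.

```python
def jaccard(calls_a, calls_b):
    """Return |A & B| / |A | B|; 1.0 means the two call sets are identical."""
    union = calls_a | calls_b
    if not union:
        return 1.0  # two empty outputs agree trivially
    return len(calls_a & calls_b) / len(union)


# Hypothetical outputs from two tools run on the same data.
tool_a = {"geneA", "geneB", "geneC", "geneD"}
tool_b = {"geneB", "geneC", "geneD", "geneE"}

agreement = jaccard(tool_a, tool_b)
only_a = sorted(tool_a - tool_b)  # calls made by tool A alone
only_b = sorted(tool_b - tool_a)  # calls made by tool B alone

print(f"Jaccard agreement: {agreement:.2f}")
print("only in tool A:", only_a)
print("only in tool B:", only_b)
```

The calls that only one tool makes are usually the interesting ones to inspect by hand when deciding which result is more believable biologically.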

28
7.9 years ago

Install linux

Learn

  • the fundamentals of Biology
  • command line
  • bash
  • make
  • the NCBI E-Utils/Biomart

forget about

  • windows
  • GUIs
  • microsoft excel
  • microsoft excel
  • did I mention microsoft excel ?

later, learn:

  • a scripting language
  • a compiled language
  • an RCS (revision control system)
  • put all your new knowledge in a blog

EDIT 2020:

  • learn nextflow and/or snakemake
2

Yes, you did mention Microsoft :)

1

I couldn't agree more with regard to Microsoft Excel. I saved output from wANNOVAR in Excel format and later discovered that a gene name had been changed to a date. Thanks, Excel.
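
A quick way to catch this after the fact is to scan the gene column for values that look like dates. A minimal sketch: the column contents below are made up, and the pattern only covers the common day-month form (for example MARCH1 turning into 1-Mar), which is an assumption about what the damage looks like in your file.

```python
import re

# Matches strings such as "1-Mar", "2-Sep" or "10-Dec", which is what a
# spreadsheet typically shows after turning a gene symbol into a date.
DATE_LIKE = re.compile(
    r"^\d{1,2}-(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)$",
    re.IGNORECASE,
)


def find_date_like(symbols):
    """Return the entries of `symbols` that look like day-month dates."""
    return [s for s in symbols if DATE_LIKE.match(s.strip())]


# Hypothetical gene column read back from an exported table.
gene_column = ["TP53", "1-Mar", "BRCA1", "2-Sep", "DEC1"]
suspicious = find_date_like(gene_column)
print(suspicious)
```

Anything it flags still needs checking against the original, unconverted file, since the date form alone does not tell you which symbol it used to be.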

0

For the sake of discussion, it would be good to define some of the terms you use. For example, what is a 'Microsoft' anyways? :P

0

As far as I can understand, Pierre is talking about Microsoft Excel and the Windows OS.

0

I was just joking ;)

18
7.9 years ago
  • Use a naming convention for your files.
  • Never assume anything works 100% correctly out of the box. Always spot check after you run a script/software package. I am still learning this...
  • Don't get too caught up with the methods and forget the question.
  • Know your file formats.
  • Start a blog. You'll find describing what you are doing to an internet audience will allow you to see holes in your work. Also a great way to store code.
  • There is no perfect data. Sometimes you just have to accept that no amount of massaging the data will make your analysis that much better.
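
As an illustration of the "spot check" and "know your file formats" points, here is a minimal sketch of a sanity check on the first records of a FASTQ file. It assumes the plain four-lines-per-record layout (no wrapped sequence lines), and the function name and example reads are made up for this post.

```python
import io


def check_fastq(handle, max_records=1000):
    """Spot check the first `max_records` FASTQ records; return a list of problems."""
    problems = []
    for n in range(1, max_records + 1):
        record = [handle.readline().rstrip("\n") for _ in range(4)]
        if not record[0]:
            break  # end of file (or a blank line where a header should be)
        header, seq, plus, qual = record
        if not header.startswith("@"):
            problems.append(f"record {n}: header does not start with '@'")
        if not plus.startswith("+"):
            problems.append(f"record {n}: separator line does not start with '+'")
        if len(seq) != len(qual):
            problems.append(f"record {n}: sequence and quality lengths differ")
    return problems


# Made-up example: the second record has a truncated quality string.
example = io.StringIO(
    "@read1\nACGT\n+\nIIII\n"
    "@read2\nACGTAC\n+\nIIII\n"
)
problems = check_fastq(example)
print(problems)
```

With a real file you would pass an open file handle instead of the `io.StringIO` object. An empty list only means the first records look sane, not that the whole file is fine.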
2

The perfect point:

  • Don't get too caught up with the methods and forget the question.
13
7.9 years ago

Learn how to do things in Unix (if you can't already). Become familiar with the filesystem, shell scripts, downloading, unzipping, installing and compiling. All these things seem obvious once you've been in the field a while, but they are a barrier to newcomers. Once you have overcome it, many things become so much easier!

8
7.9 years ago
Maxime ▴ 80

Another point that I think was forgotten (even if it is in fact related to the blog idea):

  • Be social: talk to people, talk to biologists, talk to computer scientists. The more points of view you get, the better your own understanding will be.
3

funnily, as a professional asocial, I've upvoted your answer :-)

7
7.0 years ago
Christian ★ 3.0k

I recently gave a talk at my former university titled "How to be a bioinformatician". You might find some useful slides in here:

http://www.slideshare.net/ChristianFrech/how-to-be-a-bioinformatician

0

Thanks a lot :) I wish I could give you more than +1

5
7.9 years ago
Ben ★ 2.0k

In addition to the great answers already given:

  • Set up an RSS reader with some relevant pubmed search terms/authors, some journal/preprint feeds, relevant blogs (Google Reader we hardly knew ye)
  • Version control everything, not just production scripts but also LaTeX documents, SVGs ..., try to use meaningful commit messages even in private repos
  • Just because you learnt (e.g.) Python first, doesn't mean you should automatically use matplotlib rather than taking the time to learn R and ggplot2
  • Master a text editor (preferably Emacs)
2

I'm having a hard time resisting the temptation to edit 'Emacs' for 'vi' :)

3

My advice: don't get caught up in "software X versus software Y" arguments :)

1
4.3 years ago
Eslam Samir ▴ 100

I would like to thank you all for this precious information. After working in this field, I would like to add: please try to watch some MOOCs, but you should also do the work yourself. After about 3 months I was able to write a program myself, which I shared for the public benefit.

Good Luck all

Sequence Database curator.


0
2.8 years ago

Maybe you can find your answer here:

The Biostar Handbook. A bioinformatics e-book for beginners.

0

This question is 5 years old. It is not about reading a book; it is about good practice and the things you should learn or do that can't be found in books :)
