What advice would you give to newcomers to the bioinformatics field? What piece of information would have made your life much easier if someone had told you at the beginning of your career in bioinformatics, apart from having a biostars account :)
There were a few difficulties that I had when I first joined bioinformatics that I would classify as cultural or psychological:
Sometimes, there is no tool that does what you want. Although you might think it's basic and obvious and of course somebody else must have done this before, sometimes there just isn't a tool that does what you want. Either it's so simple that everybody always reinvents the wheel since it's just 5 minutes of programming, or each lab has its own version which seemed too unimportant to publish, or the approach that seems obvious from the point of view of your research really is special to just your research (i.e. relatively simple to do, but not needed by most). The solution is to google for a few days to get a sense of what's out there. Ask a question on biostars, including what you have found so far. Accept that if the community says there is no such tool, there probably isn't. When I was new to bioinformatics, it was hard to say to my wet-lab colleagues that there was no simple tool that did the kind of analysis they wanted, and that I would have to write something myself. Quite often they would bring me a paper where the analysis had been done, implying the solution must be out there somewhere if I googled hard enough.
Sometimes, there are many, many tools that do what you want and there is no hierarchy or rating. This can be frustrating. Sometimes, you are lucky and there is a ranking of tools in a review paper. Quite often, this review will be a bit out-of-date. The solution is to pick a few that sound good and try them out. Look at how they work with your data. If you are lucky, they give very similar answers. In that case you might go with the one that is most widely used in the journals you want to publish in and is easiest to use. If they give widely different answers, you might have to keep looking at the results of each tool until you have a sense of which one produces believable results in terms of the biology. You might have to do a few experiments as well. Try to come up with bioinformatics tests as well. Eventually, if you work at this long enough, you will get a sense of which tool seems to work best for your data.
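One simple "bioinformatics test" along these lines is to measure how much the tools agree with each other on your own data. The following is only a minimal sketch: it assumes each tool's output has already been reduced to a set of calls (gene names here), and the tool names, gene names and the `compare_tools` helper are all made up for illustration.

```python
def jaccard(a, b):
    """Jaccard index of two collections: |intersection| / |union|."""
    a, b = set(a), set(b)
    union = a | b
    # Two empty result sets are treated as being in full agreement.
    return len(a & b) / len(union) if union else 1.0


def compare_tools(results):
    """Pairwise Jaccard index for a dict of tool name -> iterable of calls."""
    names = sorted(results)
    return {
        (x, y): jaccard(results[x], results[y])
        for i, x in enumerate(names)
        for y in names[i + 1:]
    }


# Hypothetical calls from three hypothetical tools run on the same data.
calls = {
    "tool_a": {"gene1", "gene2", "gene3", "gene4"},
    "tool_b": {"gene2", "gene3", "gene4", "gene5"},
    "tool_c": {"gene9"},
}

agreement = compare_tools(calls)
for pair, score in sorted(agreement.items()):
    print(pair, round(score, 2))
```

High agreement does not prove the tools are right, and an outlier is not necessarily wrong, but a number like this tells you where to start looking at the biology.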
Many questions depend highly on your data, and so there is no clear answer to what might seem like an obvious question. I work a lot with sequencing and there are many known biases to sequencing, for instance PCR bias. So, if you tell me that you see anomalies in the GC content of your data, I can immediately say that sometimes there is PCR bias. However, if you ask me what went wrong with your particular sample, it's hard. The amount of detective work involved ends up being almost as much work as a full research project.
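The first step of that detective work is usually just to look at the numbers. As a rough sketch (not a replacement for a proper QC tool), per-sequence GC content can be computed in a few lines of Python. The FASTA snippet and read names below are invented, and counting only unambiguous A/C/G/T bases is my own simplifying assumption.

```python
from collections import Counter


def gc_fraction(seq):
    """Fraction of G/C among unambiguous bases (A, C, G, T); 0.0 if none."""
    counts = Counter(seq.upper())
    gc = counts["G"] + counts["C"]
    total = gc + counts["A"] + counts["T"]
    return gc / total if total else 0.0


def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


# Hypothetical toy input; in practice you would iterate over an open file.
example = """>read1
ACGTACGT
>read2
GGGCCCAT
>read3
ATATATNN
"""

gc_by_read = {
    name: gc_fraction(seq) for name, seq in read_fasta(example.splitlines())
}
for name, gc in gc_by_read.items():
    print(name, gc)
```

Seeing an odd distribution this way tells you that something is off, not why. That part is still the detective work.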
Many tools will require huge amounts of work to install and run. If you are used to the type of programs that one encounters as a casual OS X or Windows user, then you are used to downloading software, clicking install and having things work out. However, many of those tools are vastly more polished than the software you will encounter as someone new to bioinformatics. Now installing software might involve multiple steps, including installing other software. So you will need to accept the possibility that installing the software you need to do an analysis might take as long as, or longer than, the analysis itself.
Things will often take a long time to run. Find things to do while your software is running. Read journal articles, answer emails, discuss things with colleagues. Accept that your life probably involves a lot of waiting now.
You will have to be proactive about software support. If the software you are using doesn't work then you might have to email authors, join mailing lists, ask questions on biostars and dig through forums to get the answers. I once had a bit of software that took me more than a year to figure out how to use. I emailed the authors several times and asked questions on biostars and still it took a while.
Sometimes you have to get answers in weird places. I never feel very happy about telling my non-computational colleagues that I found some information on how to use a tool in an online forum like 'biostars.org' or 'seqanswers.com'. It doesn't feel like a very solid position to take. It's even worse if the person who wrote the answer is somebody with a name like 'burt5000'. You have to take these moments in stride and test what you have been told. You can only do your best.
- the fundamentals of Biology
- command line
- the NCBI E-Utils/Biomart
- microsoft excel
- microsoft excel
- did I mention microsoft excel ?
- a scripted language
- a compiled language
- an RCS (revision control system)
- put all your new knowledge in a blog
- Use a naming convention for your files.
- Never assume anything works 100% correctly out of the box. Always spot check after you run a script/software package. I am still learning this...
- Don't get too caught up with the methods and forget the question.
- Know your file formats.
- Start a blog. You'll find describing what you are doing to an internet audience will allow you to see holes in your work. Also a great way to store code.
- There is no perfect data. Sometimes you just have to accept that no amount of massaging the data will make your analysis that much better.
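As a rough illustration of the spot-check advice in the list above, here is a small Python sketch that compares the records going into a filtering step with the records coming out. It assumes FASTA input and output. The `spot_check` helper, the sequences and the three checks are my own invention, so adapt them to whatever your tool is supposed to guarantee.

```python
def fasta_ids(lines):
    """Return the first word of every FASTA header line, in order."""
    ids = []
    for line in lines:
        if line.startswith(">"):
            words = line[1:].split()
            ids.append(words[0] if words else "")
    return ids


def spot_check(input_lines, output_lines):
    """Return a list of problems found; an empty list means nothing obvious."""
    in_ids = fasta_ids(input_lines)
    out_ids = fasta_ids(output_lines)
    problems = []
    if not out_ids:
        problems.append("output contains no records")
    if len(set(out_ids)) != len(out_ids):
        problems.append("output contains duplicate record IDs")
    unexpected = set(out_ids) - set(in_ids)
    if unexpected:
        problems.append(
            "output has IDs not in input: " + ", ".join(sorted(unexpected))
        )
    return problems


# Hypothetical input and two hypothetical outputs of a filtering tool.
input_fasta = [">s1 sample one", "ACGT", ">s2", "GGCC"]
good_output = [">s1", "ACGT"]
bad_output = [">s1", "ACGT", ">s1", "ACGT", ">s3", "TT"]

print(spot_check(input_fasta, good_output))
print(spot_check(input_fasta, bad_output))
```

Checks like these only catch gross failures. You still need to open the output and look at a few records by eye.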
Learn how to do things in unix (if you can't already). Becoming familiar with the filesystem, shell scripts, downloading, unzipping, installing, compiling. All these things seem obvious once you've been in the field a while, but they are a barrier to people - once overcome, many things are so much easier!
Another point that I think was forgotten (even if it's in fact related to the blog idea):
- be social, talk to people, talk to biologists, talk to computer scientists; the more points of view you get, the better your own understanding will be
I recently gave a talk at my former university titled "How to be a bioinformatician". You might find some useful slides in here:
In addition to the great answers already given:
- Set up an RSS reader with some relevant pubmed search terms/authors, some journal/preprint feeds, relevant blogs (Google Reader we hardly knew ye)
- Version control everything, not just production scripts but also LaTeX documents, SVGs ..., try to use meaningful commit messages even in private repos
- Just because you learnt (e.g.) Python first, doesn't mean you should automatically use matplotlib rather than taking the time to learn R and
- Master a text editor (preferably Emacs)
I would like to thank you all for this valuable information. After working in this field, I would like to add: "please try to watch some MOOCs, but make sure you also do the work yourself". After about 3 months I was able to write a program myself, which I shared for the public benefit.
Good Luck all
Maybe you can have your answer here: