Today's interview ties into one of the recurring questions on Biostars: What level of training is needed to be successful in this field? Is it even possible to transition from the wet lab to purely computational work? Should one stay in school for another degree or seek employment? Read on to find out more.
Brian Haas is a Senior Computational Biologist at the Broad Institute and author of Trinity, a de novo RNA-Seq transcript assembler.
What fewer people may know is that he is among the most accomplished bioinformaticians in the world. No, really: let's just count his contributions to research published in the most selective journals (Nature, Science, PNAS, ...). It adds up to 19 such papers since 2000. This is not counting his bioinformatics-oriented papers, of which there is no shortage either. So what type of training, computational coursework, and preparation was necessary to achieve this? Beyond answering the standard uses-this questions, Brian was willing to share his experience of getting started in the field:
Brian Haas of Trinity
"I started doing wet-lab research during my junior year of undergrad at SUNYA (1993) and decided to continue on in the graduate program. I enjoyed learning biology and doing experiments - when they worked, but found I had very little patience for when my experiments didn't quite go as planned (many war stories of considerable entertainment value). I eventually realized that what I enjoyed doing most was computational analysis and using bioinformatics software.
In my younger years (1980's), I enjoyed writing programs in computer classes, etc, but never seriously considered it as a career option for whatever reason, and during my undergrad years, I was focused on a pre-med curriculum, and simply hadn't considered the value of pursuing computer science. After realizing that doing hands-on molecular biology and biochemistry was really not for me, and my growing interests in computational biology, my graduate advisor allowed me to take an undergraduate 'intro to programming' course, and once I was enrolled, I was completely hooked.
Shortly thereafter (1999), I earned an MS in molecular biology and obtained a job at The Institute for Genomic Research (TIGR). It was at TIGR where I had phenomenal opportunities to learn bioinformatics and develop programming skills, and work with some of the world's renowned bioinformaticians. Across the street from TIGR was a Johns Hopkins satellite university where many of my peers were taking programming courses and working towards an MS in computer science, and I couldn't resist that as well - particularly given that TIGR had a great benefits package and handled the costs for most everything. So, I picked up another MS degree (2005), but this time in computer science - which in my mind, helped formalize my transition from the bench to bioinformatics and coding.
In 2007, I moved to the Broad Institute, and my experiences here have been very similar, working in a very rewarding environment with brilliant researchers, and having extraordinary opportunities to continue to develop bioinformatics software and related computational research of molecular biology, as well as to continue my education in various ways.
For those who are interested in making the switch from the bench to doing computer programming, or interested in developing bioinformatics skills to further supplement their lab skills, I figure that the resources available today must make it so much easier and more tangible. Freely available MOOCs, such as those offered through Coursera, place instructional tools at everyone's fingertips. Computer hardware is now very cheap and ubiquitous (as compared to when I started doing this in the 90's, when not everyone owned their own computer), and Cloud Computing is now a readily available option. And, finally, at many universities, bioinformatics and computation have become well entwined into the life sciences curriculum, and so hopefully it's not too difficult to identify colleagues with both the skills and research interests to help guide you in the right direction."
What hardware do you use?
Macbook Pro, and ssh-ing into our linux servers at the Broad. Most all my development is targeted to linux, but I absolutely love my macbook pro for everything else.
What is your text editor?
Emacs for everything (perl, python, C++, R) except for coding in Java, where I use Eclipse.
What software do you use for your work?
My core bioinformatics toolkit consists of: blast, gmap, gsnap, blat, clustalw, muscle, cdhit, the Tuxedo suite (bowtie, tophat, cufflinks), mummer, Bill Pearson's FASTA suite of alignment tools, the AAT package, BWA, samtools, IGV, genomeview, cap3, fasttree, hmmer, meme, dotter, FigTree, repeatscout, RSEM, Jalview, and some of the tools that I've helped to develop: Trinity, PASA, evidencemodeler, transdecoder, and trinotate. Other more general programming and development tools I use include GCC, perl, python, R, java, mysql, sqlite, apache web server, and graphviz.
What do you use to create plots and charts?
Mostly R, but I will occasionally use MS Excel.
What do you consider the best language to do bioinformatics with?
It used to be that Perl use dominated bioinformatics, but Python's popularity seems to have skyrocketed.
I still code mostly in Perl, but have been slowly transitioning to python. C/C++ is the way to go for extremely efficient software, but debugging C/C++ code, at least for me, is extremely unpleasant as compared to almost every other language I've worked with.
For statistical analysis and plotting data, R is a great choice, though python is also quite powerful for this (another reason I'm inclined to start doing more python programming).
What bioinformatics tools/software do not get enough recognition?
- GenomeView is a very flexible alternative to IGV and very easy to launch and view data right from the command-line, and I expect is likely under-utilized.
- Some of the earlier-developed alignment tools that are rigorous (albeit often slow) are under-utilized, such as Bill Pearson's FASTA suite for general sequence alignment solutions, which should probably be in everyone's bioinformatics toolkit, and Xiaoqiu Huang's AAT (annotation and analysis tool) suite, which provides some of the most sensitive protein and transcript spliced alignment software ever developed.
See all posts in this series: https://www.biostars.org/tag/uses-this/
To be notified of a new post in the series, follow the first post: Jim Robinson of the Integrative Genomics Viewer (IGV) uses this
There should be a disclaimer next to my name in that I largely owe most of any perceived or actual successes to those brilliant folks that I've had the great pleasure of working with over the years. I would not have gotten very far without them! I've just been fairly lucky in certain respects. Thanks for reading! ~brian haas
And here's our excuse to allow the use of MS Excel! :-)