Patrick Schloss is an Associate Professor in the Department of Microbiology and Immunology of the University of Michigan Medical School. He is the author of tools such as DOTUR, SONS and mothur - each an increasingly sophisticated program for analyzing microbial data. At present, mothur offers a complete package covering just about all the data analysis needs of the microbial ecology community.
As the names of the tools indicate, Dr. Schloss is a family man. We once enrolled a bioinformatics staff member in the mothur training program run by Dr. Schloss - amusingly, the check had to be made out to Five Kids Farm. Since then the farm has been renamed first to Six Kids Farm, then to Seven Kids Farm, with the motto: "Growing adults since 2000". As it turns out, when not programming and running a scientific lab, Dr. Schloss runs a bona fide farm where he lives with his family, which now includes seven children. Thus, when not running make from the command line, Dr. Schloss helps make real food, rides a tractor, operates farm equipment, and raises sheep and cows.
Before I interviewed Dr. Schloss, mothur was a mystery to me. It is a radically different take on bioinformatics - it is a single executable with no other dependencies that runs from the command line on every platform (Linux, Mac and Windows), yet still offers a vast range of functionality. Now I get it - mothur is the tool for people who can't afford to waste time - it dares to forgo the accumulated crud of "modern" software engineering and dependency management by being completely self-reliant and independently functional. Make no mistake, that is a gigantic undertaking - it requires reimplementing all algorithms in simple, portable C++ code. But with that it becomes a tool like no other.
Dr. Schloss regularly offers his mothur workshop; the next will be held December 17-19 in Detroit.
Pat Schloss of mothur
How did you get started in bioinformatics?
My BS and PhD are both in biological engineering where my only course in programming was Pascal [stop laughing]. But that one language really helped me learn others. I think it's frequently seen that if you learn one programming/foreign language, it becomes easier to learn additional languages. My progression was Pascal -> Perl -> C++ -> R. Despite learning Pascal as an undergrad, I'm really a self-taught programmer.
When I was a postdoc I had about 500 16S rRNA gene sequences from a single 0.5-g sample of soil. We were working on a poster and my advisor said that it would be nice to have one of those "curvy thingies" (a rarefaction curve). I said, "sure…" Well, that simple problem required us to figure out how to cluster like sequences together and then how to do rarefaction. In hindsight, these are pretty simple problems. I had a problem in my mind and that was the critical piece. With that problem in mind, I could pick up "Learning Perl" and go through it thinking of the problem and where various parts of the language might fit in. I would literally do a chapter of the book and then do all of the exercises. I was pretty committed and got through the book in a few weeks. Aside from having the problem in mind, I also benefitted from actually coding as I was learning it. This served to reinforce the concepts and help me memorize what was going on. To learn C++, someone took a different Perl script I was working on and wrote it in C++. It was light years faster, and I used my Perl and their C++ to start to learn C++.
When I talk to biologists about learning to code I encourage them to follow that process:
- have a problem in mind;
- get a good book and go through it a chapter at a time;
- use the language as part of your daily life;
- find other people to help you learn.
Oh yeah, that "curvy thing"? Well, that turned into DOTUR, which has now been cited by more than 1,567 publications.
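For readers who have never made one of those "curvy thingies", here is a minimal Python sketch of a rarefaction curve. It is not DOTUR's or mothur's implementation. It assumes the sequences have already been clustered into OTUs, so the input is just a list of how many sequences fell into each OTU, and it uses the standard analytical expectation (the chance that an OTU is missed in a random subsample drawn without replacement) rather than repeated random resampling. The function name `rarefaction_curve` and the `step` parameter are made up for this illustration.

```python
from math import comb


def rarefaction_curve(otu_counts, step=1):
    """Return (subsample_size, expected_number_of_OTUs) pairs.

    otu_counts: number of sequences assigned to each OTU (hypothetical input;
                clustering is assumed to have been done already).
    step:       spacing between subsample sizes along the curve.
    """
    total = sum(otu_counts)
    if total == 0:
        return []

    # Subsample sizes step, 2*step, ...; always finish at the full sample.
    sizes = list(range(step, total + 1, step))
    if not sizes or sizes[-1] != total:
        sizes.append(total)

    curve = []
    for n in sizes:
        all_subsamples = comb(total, n)
        # An OTU with c sequences is missed when all n draws come from the
        # other (total - c) sequences; comb() returns 0 when that is impossible.
        expected = sum(
            1 - comb(total - c, n) / all_subsamples for c in otu_counts
        )
        curve.append((n, expected))
    return curve


if __name__ == "__main__":
    # Toy data: three OTUs containing 3, 1 and 1 sequences.
    for size, otus in rarefaction_curve([3, 1, 1]):
        print(size, round(otus, 3))
```

Plotting the second column against the first gives the curve: it rises quickly while new OTUs are still turning up and flattens as sampling approaches saturation.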
What hardware do you use?
I use a MacBook Pro laptop with an extra monitor. My lab also has a computer cluster that we use for heavy lifting.
What is your text editor?
I mainly use TextWrangler and have been slowly moving to Atom as I experiment with using markdown/R markdown for my writing. If I were a cool kid I suppose I'd learn emacs or vi. For our mothur work, we use Xcode.
What software do you use for your work?
Running a lab, there's a lot of boring stuff that I seem to use more than I'd like: Word, iCal, Mail, Adium, Safari... When I get to actually do science I use mothur, R, git, and the text editors.
What do you use to create plots and charts?
R. Four years ago I enforced a moratorium on the use of Excel to generate plots in the lab. I'm starting to enforce a moratorium on Prism. You can tell that someone generated plots in Excel/Prism by how awful they look, and generally you can tell they were made in R by how good they look.
What do you consider the best language to do bioinformatics with?
That's unfair! It depends. If the code is going out to the masses or needs to be fast, we do it in C++. If it's something I just need done, I'm doing it more and more in R and less and less in Perl.
I think waaay too much is made of programmer time and not enough is made of user time/frustration. Users do not want to hack your Perl or Python scripts or download a gazillion dependencies. It's just a pain - I don't even want to do it! If you multiply the time savings for the user by the number of papers citing mothur (>2000), you will more than make up for the developer time. I tell people that it doesn't really matter what language you use as long as you learn a language.
What bioinformatics tools/software do not get enough recognition?
I'm always thinking from the biologist's perspective and not from the computer scientist's perspective. With that in mind, I think biologists either don't know or don't appreciate the power of tools like git or literate programming tools like IPython notebooks or the knitr and slidify R packages.
Bioinformaticists could do a great amount of good by helping those developers and making these tools more accessible to biologists.
See all posts in this series: https://www.biostars.org/t/uses-this/
To be notified of a new post in the series, follow the first post: Jim Robinson of the Integrative Genomics Viewer (IGV) uses this