Aaron Quinlan is an Assistant Professor in the Center for Public Health Genomics in the Department of Public Health Sciences at The University of Virginia. His contributions to science are detailed on his lab page and include several highly regarded software packages that are among the most impactful tools of modern bioinformatics:
- BedTools: a Swiss Army knife of tools for a wide range of genomics analysis tasks.
- Gemini: a flexible framework for exploring genetic variation in the context of the wealth of genome annotations available for the human genome.
- lumpy: a general probabilistic framework for structural variant discovery.
Importantly, each of these tools follows the highest standards of modern, agile software engineering and comes with extensive documentation, test suites, issue trackers, and public source code repositories.
What hardware do you use?
I use a MacBook Pro for development, a university cluster for analysis, and glasses for astigmatism.
What is your text editor?
I toggle between MacVim and Sublime Text for code. For tutorials and documentation, I write Markdown in Sublime Text and use pandoc to convert it to other formats. Pandoc is an absolute gem. In addition, I rely on my family and colleagues to improve my writing. Most of them are far better writers than I.
What software do you use for your work?
All the things, really. But mostly:
- General programming: Python, C++, and to a lesser degree, C.
- Basic data analysis: Python, but primarily UNIX tools. Never underestimate what can be done on the command line. With simplicity comes great power.
- Complex data analysis: Python and R. Crucial Python libraries for my work include numpy, sqlalchemy, scikit-learn, pysam, and pybedtools. While my use of R is admittedly limited compared with others', essential R packages for my work include ggplot2 and dplyr.
- Graphic work: Adobe Illustrator and pen (Sanford Uniball 0.5mm) and paper.
- Manuscript and grant writing: LaTeX, http://writelatex.com, Google Docs, and to an ever-decreasing degree, Microsoft Word.
- The other 98% of my time: Gmail.
What do you use to create plots and charts?
- R: ggplot2 and base graphics.
- The whiteboard with far too many dry markers.
What do you consider the best language to do bioinformatics with?
For general-purpose work, I would have to choose Python because of the flexibility and readability of the language and the quality and extent of the packages available. A close second would be R, primarily owing to the staggering breadth of packages in Bioconductor. I personally find R to be an arcane and inconsistent programming language, but the heroic efforts of Hadley Wickham and the entire Bioconductor team make it worth the trouble.
What bioinformatics tools/software do not get enough recognition?
- IGV and the UCSC Genome Browser: In genomics research, artifacts abound; as such, data visualization is essential for letting the human brain tease out fishy patterns and inconsistencies. These tools are certainly appreciated, but in my view we often forget how many experimental and analytical artifacts have been resolved (and therefore omitted from the literature) with their help.
- Freebayes: a sophisticated and flexible Bayesian tool for identifying genetic variation. Unlike most other tools, it allows full control over expected ploidy, enables pooled variant calling, and employs a haplotype-based discovery model. (Disclaimer: I admit a bias here, as I worked in the Marth lab on some of the early prototypes that eventually became the excellent tool Erik Garrison has developed.)
- Ensembl BioMart: BioMart makes the retrieval of fundamental genomics data so trivial that it is often overlooked. This is a gem of a resource, and I think that sometimes tools become so intuitive and routine that we forget the effort and thought that went into them.
- UNIX: We should never forget that much of bioinformatics research would never have been possible without UNIX.
See all posts in this series: https://www.biostars.org/t/uses-this/
To be notified of new posts in the series, follow the first post: Jim Robinson of the Integrative Genomics Viewer (IGV) uses this