Heng Li is a research scientist at the Broad Institute, working with David Reich and David Altshuler. Dr. Li received his PhD from the Institute of Theoretical Physics at the Chinese Academy of Sciences in 2006. His current interests include the analysis of new sequencing data, population genetics and phylogenetics. He is the principal developer of several projects including SAMtools, BWA, Fermi, MAQ, TreeSoft and TreeFam. His tools are among the most widely used bioinformatics tools of all time and are characterized by meticulous attention to performance, usability and precision.
Beyond their primary utility, his tools have championed a novel way of interacting with scientific software: multiple disparate functions are packaged into a single program and invoked via a "program command" syntax. In this approach programs become self-documenting, discoverable and easy to remember. All users need to recall is the original program name, say bwa, which, when run on its own, informs users of all capabilities of the software and lets them drill down to more information. Contrast this with typical scientific software packages, which often contain several arcane program names, each with its own peculiar call syntax that requires consulting the manual. More software packages have adopted this new standard, and we can only hope that the trend will continue.
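The "program command" pattern described above can be sketched in a few lines. This is a minimal illustration only, not how bwa or samtools are implemented: the tool name mytool and its index and align subcommands are hypothetical, and the subcommands return messages instead of doing real work.

```python
import sys

def cmd_index(args):
    """Hypothetical subcommand: pretend to index a reference file."""
    if not args:
        # Run without arguments, a subcommand explains itself.
        return "Usage: mytool index <ref.fa>"
    return f"indexing {args[0]}"

def cmd_align(args):
    """Hypothetical subcommand: pretend to align reads to a reference."""
    if len(args) < 2:
        return "Usage: mytool align <ref.fa> <reads.fq>"
    return f"aligning {args[1]} against {args[0]}"

# One table maps each subcommand name to its function and a one-line summary.
COMMANDS = {
    "index": (cmd_index, "build an index for a reference"),
    "align": (cmd_align, "align reads to an indexed reference"),
}

def usage():
    """Top-level help: list every available subcommand."""
    lines = ["Usage: mytool <command> [options]", "", "Commands:"]
    for name, (_, summary) in COMMANDS.items():
        lines.append(f"  {name:<8}{summary}")
    return "\n".join(lines)

def main(argv):
    # No command, or an unknown one: show what the program can do.
    if not argv or argv[0] not in COMMANDS:
        return usage()
    func, _ = COMMANDS[argv[0]]
    return func(argv[1:])

if __name__ == "__main__":
    print(main(sys.argv[1:]))
```

Running the script with no arguments lists the commands, and running it with only a command name prints that command's usage line, so the user only ever has to remember the program name.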
Heng Li often participates in online communities, including Biostars under the username lh3. He also maintains and contributes to a large number of open source software repositories: https://github.com/lh3
Heng Li of BWA and Samtools
What hardware do you use?
MacBook for local things. Two crowded clusters with thousands of CPU cores each for batch jobs.
What is your text editor?
vi
What software do you use for your work?
Mostly unix tools and my own tools.
What do you use to create plots and charts?
gnuplot for plotting. OmniGraffle and occasionally inkscape for charting and diagrams.
What do you consider the best language to do bioinformatics with?
Generally, the combination of a fast language and a slower scripting language would be ideal. My current combination is C and Javascript, though I wouldn't recommend this to others.
What bioinformatics tools/software do not get enough recognition?
Actually a lot. Many alternatives to the mainstream tools/pipelines are better choices for specific tasks. For example,
- SNAP if you really need fast turnaround,
- GEM/yara if you want to get hits within an edit distance threshold,
- gap5 for alignment editing,
- nvbio for a lot of GPU-specific algorithms,
- several alignment-free tools (e.g. sailfish, though I have never used it) when you need to avoid mapping bias, and also de novo assemblers for large genomes.
In addition, there are "xargs" for constructing batch jobs and "less -S" for viewing text files. For me, the majority of bugs are found with "less".
See all posts in this series https://www.biostars.org/tag/uses-this/
To be notified of new posts in the series, follow the first post: Jim Robinson of the Integrative Genomics Viewer (IGV) uses this