Question: Which Bioinformatics Tools Are Written In Python
3.2 years ago by
Chen Sun260
United States
Chen Sun260 wrote:

Which bioinformatics tools are written in Python?

I ask because new bioinformatics programmers and Python newcomers like me can read the source code to see how Python is used to deal with complex bioinformatics problems, beyond the problems solved in related books such as "Beginning Python for Bioinformatics".

Thank you

python tool • 6.2k views
ADD COMMENTlink modified 14 months ago by Chris Evelo9.6k • written 3.2 years ago by Chen Sun260
3.2 years ago by
Jeroen Van Goey2.0k
Ghent, Belgium
Jeroen Van Goey2.0k wrote:

There are so many! To get you started:

  • Biopython: set of freely available tools for biological computation
  • PyMOL: molecular visualization system
  • PyCogent is a software library for genomic biology
  • Galaxy: an open, web-based platform for data intensive biomedical research
  • pygr: sequence and comparative genomics analyses, even with extremely large multi-genome data sets
  • Biskit: facilitates the manipulation and analysis of macromolecular structures, protein complexes, and molecular dynamics trajectories
  • Ruffus: a lightweight python module for running computational pipelines
  • Pysam: for reading and manipulating SAM files
  • msatcommander: locates microsatellite (SSR, VNTR, &c) repeats within fasta-formatted sequence or consensus files
  • glu-genetics: tools to store, clean, and analyze data generated by whole-genome or candidate gene association scans
  • PySCeS provides a variety of tools for the analysis of cellular systems
  • OpenAlea: modules to analyse, visualize and model the functioning and growth of plant architecture
  • ETE assists in the automated manipulation, analysis and visualization of phylogenetic and other types of trees
  • bx-python: allows for rapid implementation of genome scale analyses
  • RSeQC: comprehensively evaluate high throughput sequence data especially RNA-seq data
  • incf-omni: analysis and simulation construction of the nervous system
  • genetrack: storing, querying and visualizing genomic interval oriented data
  • chimerascan: detection of chimeric transcripts in high-throughput sequencing data

Since you're new to the field of bioinformatics, you might also be interested in:

  • ANGUS, a site built around the 2010 course on Analyzing Next-Generation Sequencing Data. It contains a number of detailed tutorials on mapping, assembly, mRNAseq, ChIP-seq, and resequencing analysis using Python.
  • this article by Peter Norvig on species barcoding

To give another example of the very valid point that Dk made: the company I work for (Applied Maths) sells a bioinformatics software suite called BioNumerics. The core of the program is written in C++, but Python is used to customize the software to specific clients' needs:

  • to create custom reports,
  • to import and export non-standard formats,
  • to automate series of actions that are executed repeatedly,
  • to perform custom calculations, etc.
ADD COMMENTlink modified 3.2 years ago • written 3.2 years ago by Jeroen Van Goey2.0k
3.2 years ago by
Damian Kao12k
Damian Kao12k wrote:

Most commonly used tools are written in compiled languages like C or Java, simply because they run faster and because the ability to access low-level memory resources is crucial when analyzing large amounts of data. When Python is used in these packages, it is usually in the form of 'pipeline glue'.

TopHat is a perfect example of that. It consists of several smaller programs written in C. Python is then used to interpret user parameters and run the smaller programs in sequence.
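To make the 'pipeline glue' idea concrete, here is a minimal sketch of such a wrapper. It is not how any particular tool is implemented: the `--reference` and `--reads` options and both stages are hypothetical, and each stage is a tiny Python one-liner standing in for a compiled program.

```python
import argparse
import subprocess
import sys


def build_steps(args):
    """Return the command line for each pipeline stage, in order.

    Both stages are hypothetical stand-ins. A real wrapper would list
    the compiled binaries and their flags here instead.
    """
    return [
        [sys.executable, "-c",
         "import sys; print('index', sys.argv[1])", args.reference],
        [sys.executable, "-c",
         "import sys; print('align', sys.argv[1], sys.argv[2])",
         args.reads, args.reference],
    ]


def run_pipeline(argv):
    """Interpret user parameters, then run every stage in sequence."""
    parser = argparse.ArgumentParser(description="toy pipeline wrapper")
    parser.add_argument("--reference", required=True)
    parser.add_argument("--reads", required=True)
    args = parser.parse_args(argv)

    outputs = []
    for cmd in build_steps(args):
        # check=True aborts the pipeline as soon as one stage fails.
        done = subprocess.run(cmd, check=True, capture_output=True, text=True)
        outputs.append(done.stdout.strip())
    return outputs
```

Calling `run_pipeline(["--reference", "ref.fa", "--reads", "reads.fq"])` runs the two stand-in stages one after the other and returns what each one printed. The heavy lifting stays in the external programs; Python only parses options and sequences the calls.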

Interpreted languages like Python or Perl are usually used for format conversions or statistics reporting.

A good place to start for real examples is to read up on Biopython. Their tutorials have tons of real-life examples. You can come up with small projects for yourself, like writing a script that analyzes the GC content of a FASTA file, or a script that parses a BLAST output file and filters on various criteria.
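As a starting point for the first of those small projects, here is a rough sketch of a GC-content script in plain Python. The function names `read_fasta` and `gc_content` are made up for this example, and the parser only handles simple FASTA input; Biopython's own parsers are the sturdier choice for real data.

```python
import io


def read_fasta(handle):
    """Yield (name, sequence) pairs from a FASTA-formatted file handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first word of the header as the record name.
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def gc_content(seq):
    """Return the fraction of G and C bases in seq (0.0 if it is empty)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


# Small in-memory example; replace io.StringIO(...) with open("file.fasta").
demo = io.StringIO(">seq1 test record\nATGC\nGGCC\n>seq2\nATAT\n")
results = {name: gc_content(seq) for name, seq in read_fasta(demo)}
for name, gc in results.items():
    print(f"{name}\t{gc:.2f}")
```

Reading records one at a time through a generator means the whole file never has to sit in memory, which matters once the FASTA files get large.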

ADD COMMENTlink modified 3.2 years ago • written 3.2 years ago by Damian Kao12k

I will chime in to say QIIME is another example.

ADD REPLYlink written 3.2 years ago by Cliff Beall370

I believe you're right about the speed consideration: the ability of C or C++ to access low-level memory lets you tune a program as close to the hardware as possible (one can also try assembler). But I'm sure that the way you code a specific task is more critical. Look at, for instance, this Biostar discussion (Leszek's answer). To stay on topic for this thread, I would say: Python is good, but use dict() and set() types instead of lists whenever you can.
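A small sketch of why that advice matters, using made-up read IDs: both functions below give the same result, but the list version scans the whole list for every lookup, while the set version does a hash lookup. The function names and the data are invented for this illustration.

```python
import timeit


def filter_with_list(ids, wanted):
    """Keep the IDs found in wanted, using a list for membership tests."""
    wanted_list = list(wanted)
    return [i for i in ids if i in wanted_list]  # linear scan per lookup


def filter_with_set(ids, wanted):
    """Same result, but membership tests go through a hash-based set."""
    wanted_set = set(wanted)
    return [i for i in ids if i in wanted_set]  # constant time on average


def compare_timings(n=2000, repeats=3):
    """Time both versions on n synthetic read IDs; returns (list_s, set_s)."""
    ids = [f"read{i}" for i in range(n)]
    wanted = ids[::2]
    t_list = timeit.timeit(lambda: filter_with_list(ids, wanted), number=repeats)
    t_set = timeit.timeit(lambda: filter_with_set(ids, wanted), number=repeats)
    return t_list, t_set
```

Running `compare_timings()` on your own machine shows the gap, and it widens as `n` grows. The same reasoning applies to dict(): looking up a value by key avoids searching through a list of pairs.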

ADD REPLYlink written 3.2 years ago by Manu Prestat3.4k

My answer is malformatted due to the transition of the website. If you read my answer together with the reformatted table in a separate answer, you will see that a proper C/C++ implementation is 4-fold faster than Leszek's script. The C++ one is slow due to a stdio synchronization issue that I only learned about recently. Also, each data structure has its own use; it is just that in that example dict() is better.

ADD REPLYlink written 3.2 years ago by lh325k

OK, I see. When I last read that answer properly, the race for the best implementation was not over yet :-) However, that still supports the point that the way of coding is very critical, whatever the programming language. That is a very good post; I like Biostar especially for that kind of thing. BTW, I'll go back and compile Pierre's code.

ADD REPLYlink modified 3.2 years ago • written 3.2 years ago by Manu Prestat3.4k

On formatting: a new fix is incoming and will most likely be applied over the weekend.

ADD REPLYlink written 3.2 years ago by Istvan Albert ♦♦ 57k
3.2 years ago by
United States
Adam810 wrote:

The short-read mapper, Stampy, is written in Python.

ADD COMMENTlink written 3.2 years ago by Adam810
3.2 years ago by
United States
Woa2.4k wrote:

I would suggest searching Google or Google Scholar for your topic of interest plus something like "python script" or "python code", e.g.

protein structure superposition + "python script"

ADD COMMENTlink written 3.2 years ago by Woa2.4k
3.2 years ago by
Manu Prestat3.4k
Marseille, France
Manu Prestat3.4k wrote:

The Biopieces suite is made of Python and Ruby.

ADD COMMENTlink written 3.2 years ago by Manu Prestat3.4k

Very little is in Python, yet. Most is Perl and Ruby.

ADD REPLYlink written 2.8 years ago by Martin A Hansen2.9k
14 months ago by
United States/San Diego/Diagnomics
sgruenwald10 wrote:

A good way to find interesting modules is to search the pip python libraries:

ADD COMMENTlink written 14 months ago by sgruenwald10
14 months ago by
Chris Evelo9.6k
Maastricht, The Netherlands
Chris Evelo9.6k wrote:

Our Go-Elite Gene Ontology and general gene set overrepresentation analysis tool is written in Python. It was described here.

ADD COMMENTlink written 14 months ago by Chris Evelo9.6k