Question: Which Bioinformatics Tools Are Written In Python
20 months ago by
Chen Sun180
United States
Chen Sun180 wrote:

Which bioinformatics tools are written in Python?

I ask because new bioinformatics programmers and Python newcomers like me can read the source code to see how Python is used to tackle complex bioinformatics problems, beyond the problems covered in books such as "Beginning Python for Bioinformatics".

Thank you

modified 20 months ago by Manu Prestat2.8k • written 20 months ago by Chen Sun180
20 months ago by
Jeroen Van Goey1.8k
Antwerp, Belgium
Jeroen Van Goey1.8k wrote:

There are so many! To get you started:

  • Biopython: set of freely available tools for biological computation
  • PyMOL: molecular visualization system
  • PyCogent is a software library for genomic biology
  • Galaxy: an open, web-based platform for data intensive biomedical research
  • pygr: sequence and comparative genomics analyses, even with extremely large multi-genome data sets
  • Biskit: facilitates the manipulation and analysis of macromolecular structures, protein complexes, and molecular dynamics trajectories
  • Ruffus: a lightweight python module for running computational pipelines
  • Pysam: for reading and manipulating SAM files
  • msatcommander: locates microsatellite (SSR, VNTR, &c) repeats within fasta-formatted sequence or consensus files
  • glu-genetics: tools to store, clean, and analyze data generated by whole-genome or candidate gene association scans
  • PySCeS provides a variety of tools for the analysis of cellular systems
  • OpenAlea: modules to analyse, visualize and model the functioning and growth of plant architecture
  • ETE assists in the automated manipulation, analysis and visualization of phylogenetic and other types of trees
  • bx-python: allows for rapid implementation of genome scale analyses
  • RSeQC: comprehensively evaluate high throughput sequence data especially RNA-seq data
  • incf-omni: analysis and simulation construction of the nervous system
  • genetrack: storing, querying and visualizing genomic interval oriented data
  • chimerascan: detection of chimeric transcripts in high-throughput sequencing data

Since you're new to the field of bioinformatics, you might also be interested in:

  • ANGUS, a site built around the 2010 course on Analyzing Next-Generation Sequencing Data. It contains a number of detailed tutorials on mapping, assembly, mRNAseq, ChIP-seq, and resequencing analysis using Python.
  • this article by Peter Norvig on species barcoding

To give another example of the very valid point that Dk made: the company I work for (Applied Maths) sells a bioinformatics software suite called BioNumerics. The core of the program is written in C++, but Python is used to customize the software to specific clients' needs:

  • to create custom reports,
  • to import and export non-standard formats,
  • to automate series of actions that are executed repeatedly,
  • to perform custom calculations, etc.
modified 20 months ago • written 20 months ago by Jeroen Van Goey1.8k
20 months ago by
Damian Kao10k
Damian Kao10k wrote:

Most commonly used tools are written in compiled languages like C or Java, simply because they run faster and because the ability to access low-level memory resources is crucial when analyzing large amounts of data. When Python is used in these packages, it is usually in the form of 'pipeline glue'.

TopHat is a perfect example of that. It consists of several smaller programs written in C++. Python is then used to interpret user parameters and run the smaller programs in sequence.
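To make the 'pipeline glue' idea concrete, here is a minimal sketch of such a wrapper. It is not how TopHat itself is implemented: the option names and the three stage names are hypothetical, and each stage simply asks the Python interpreter to echo its arguments where a real wrapper would call a compiled binary.

```python
import argparse
import subprocess
import sys


def parse_args(argv):
    """Interpret user parameters (the option names are hypothetical)."""
    parser = argparse.ArgumentParser(description="Toy pipeline wrapper")
    parser.add_argument("reads", help="input reads file")
    parser.add_argument("--threads", type=int, default=1)
    return parser.parse_args(argv)


def build_steps(args):
    """Build one command line per pipeline stage.

    Each stage is a stand-in: it runs the Python interpreter and echoes its
    arguments. A real wrapper would put the path of a compiled tool here.
    """
    echo = "import sys; print(*sys.argv[1:])"
    return [
        [sys.executable, "-c", echo, stage, args.reads, str(args.threads)]
        for stage in ("prepare", "align", "report")
    ]


def run_pipeline(steps):
    """Run the stages in order; check=True aborts at the first failure."""
    outputs = []
    for cmd in steps:
        result = subprocess.run(cmd, check=True, capture_output=True, text=True)
        outputs.append(result.stdout.strip())
    return outputs


# Example run with an explicit argument list instead of sys.argv.
for line in run_pipeline(build_steps(parse_args(["sample.fastq", "--threads", "4"]))):
    print(line)
```

The point of the design is that the heavy lifting stays in the external programs, while Python only validates options, assembles command lines, and stops the run when one stage fails.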

Interpreted languages like Python or Perl are usually used for format conversions or statistics reporting.

A good place to start for real examples is Biopython. Its tutorials have plenty of real-life examples. You can also come up with small projects for yourself, such as writing a script that computes the GC content of a FASTA file, or a script that parses a BLAST output file and filters the hits on various criteria.
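As an illustration of the first suggested project, here is a minimal sketch of a GC-content script in plain Python. The function names and the in-memory example records are made up for this sketch; with Biopython installed, its sequence parser could take the place of the hand-written FASTA reader.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def gc_fraction(seq):
    """Return the fraction of G and C bases (case-insensitive); 0.0 if empty."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


# Example data held in memory; a real script would open a file instead.
example = io.StringIO(">seq1\nACGT\nGGCC\n>seq2\nATAT\n")
for name, seq in read_fasta(example):
    print(f"{name}\t{gc_fraction(seq):.2f}")
```

To run it on a real file, replace the StringIO object with something like `open("my_sequences.fasta")` (a hypothetical file name).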

modified 20 months ago • written 20 months ago by Damian Kao10k

I will chime in to say QIIME is another example.

written 20 months ago by Cliff Beall250

I believe you're right about the speed consideration: the ability of C or C++ to access low-level memory lets one tune a program as close to the hardware as possible (one can also try assembler). But I'm sure the way a specific task is coded is more critical. Look at, for instance, this biostar discussion (Leszek's answer). For the purposes of this thread I would say: Python is good, but use dict() and set() types instead of lists whenever you can.
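A small sketch of why the choice of container matters for lookups: testing membership in a list scans the elements one by one, whereas a set (like a dict) uses hashing. The read IDs and sizes below are invented for illustration, and the timings will vary from machine to machine.

```python
import timeit

# Hypothetical read identifiers; 5000 known IDs, 1000 queries, half of them known.
ids_list = [f"read_{i}" for i in range(5000)]
ids_set = set(ids_list)
queries = [f"read_{i}" for i in range(4500, 5500)]


def count_hits(queries, collection):
    """Count how many query IDs are present in the collection."""
    return sum(1 for q in queries if q in collection)


# Same answer from both containers, but the list is scanned linearly per query.
list_time = timeit.timeit(lambda: count_hits(queries, ids_list), number=1)
set_time = timeit.timeit(lambda: count_hits(queries, ids_set), number=1)

print(f"hits: {count_hits(queries, ids_set)}")
print(f"list lookup: {list_time:.4f} s, set lookup: {set_time:.4f} s")
```

As noted elsewhere in this thread, each data structure has its own use: lists remain the natural choice when order or duplicates matter.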

written 20 months ago by Manu Prestat2.8k

My answer is malformatted due to the transition of the website. If you read my answer together with the reformatted table in a separate answer, you will see that a proper C/C++ implementation is 4-fold faster than Leszek's script. The C++ one is slow due to a stdio synchronization issue that I only learned about recently. Also, each data structure has its own use; it is just that in that example dict() is better.

written 20 months ago by lh320k

OK, I see. When I last read that answer properly, the race for the best implementation was not over yet :-) However, that still supports the point that the way of coding is critical, whatever the programming language. That is a very good post; I like biostar especially for that kind of thing. BTW, I'll go right back and compile Pierre's code.

modified 20 months ago • written 20 months ago by Manu Prestat2.8k

On formatting: a new fix is incoming and will most likely be applied over the weekend.

written 20 months ago by Istvan Albert ♦♦ 39k
20 months ago by
United States
Adam590 wrote:

The short-read mapper, Stampy, is written in Python.

written 20 months ago by Adam590
20 months ago by
United States
Woa2.1k wrote:

I would suggest searching Google or Google Scholar for your topic of interest plus something like "python script" or "python code", e.g.:

protein structure superposition + "python script"

written 20 months ago by Woa2.1k
20 months ago by
Manu Prestat2.8k
Manu Prestat2.8k wrote:

The Biopieces suite is written in Python and Ruby.

written 20 months ago by Manu Prestat2.8k

Very little is in Python, yet. Most is Perl and Ruby.

written 16 months ago by Martin A Hansen2.7k