I will chime in to say QIIME (http://www.qiime.org) is another example.
Which bioinformatics tools are written in Python?
I ask because new bioinformatics programmers and new Python users like me can read the source code to see how Python is used to deal with complex bioinformatics problems, beyond the problems solved in related books such as "Beginning Python for Bioinformatics".
Thank you
There are so many! To get you started:
Since you're new to the field of bioinformatics, you might also be interested in:
To give another example of the very valid point that Dk made: the company I work for (Applied Maths) sells a bioinformatics software suite called BioNumerics. The core of the program is written in C++, but Python is used to customize the software to specific clients' needs:
Most commonly used tools are written in compiled languages like C or Java, simply because they run faster and because the ability to access low-level memory resources is crucial when analyzing large amounts of data. When Python is used in these packages, it is usually in the form of 'pipeline glue'.
Tophat (http://tophat.cbcb.umd.edu/) is a perfect example of that. It consists of several smaller programs written in C. Python is then used to interpret user parameters and run the smaller programs in sequence.
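To make the 'pipeline glue' idea concrete, here is a minimal sketch of that pattern. It is not Tophat's actual code: the option names and the helper programs `toy_align` and `toy_report` are hypothetical placeholders for whatever compiled tools a real pipeline would call.

```python
import argparse
import subprocess


def parse_args(argv):
    """Interpret user parameters (all option names here are hypothetical)."""
    parser = argparse.ArgumentParser(description="Toy pipeline wrapper")
    parser.add_argument("reads", help="input reads file")
    parser.add_argument("--threads", type=int, default=1)
    parser.add_argument("--out", default="out")
    return parser.parse_args(argv)


def build_commands(args):
    """Translate the parsed options into one command line per step.

    'toy_align' and 'toy_report' are placeholder names standing in for
    compiled helper programs; they are not real tools.
    """
    aligned = args.out + ".aln"
    return [
        ["toy_align", "-p", str(args.threads), args.reads, "-o", aligned],
        ["toy_report", aligned, "-o", args.out + ".txt"],
    ]


def run_pipeline(commands):
    """Run each step in order; check=True stops at the first failing step."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

A wrapper script would then call `run_pipeline(build_commands(parse_args(sys.argv[1:])))`. The heavy lifting stays in the compiled programs; Python only decides what to run and in which order.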
Interpreted languages like Python or Perl are usually used for format conversions or statistics reporting.
A good place to start for real examples is BioPython (http://biopython.org/wiki/Biopython). Their tutorials have tons of real-life examples. You can also come up with small projects for yourself, like writing a script that computes the GC content of a FASTA file, or one that parses a BLAST output file and filters the hits on various criteria.
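As an illustration of the first small project, here is a rough sketch of a GC content script in plain Python (no BioPython needed). The function names and the example records are made up for this post, and it assumes a simple, well-formed FASTA file. Note that every character, including N, counts toward the sequence length.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def gc_content(seq):
    """Return the fraction of G and C bases in seq (0.0 if seq is empty)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


# Made-up records, used only to exercise the functions.
example = io.StringIO(">seq1\nATGC\nGGCC\n>seq2\nATATAT\n")
for name, seq in read_fasta(example):
    print(f"{name}\t{gc_content(seq):.2f}")
```

For a real file, replace the `io.StringIO` object with `open("your_file.fasta")`. Once this works, comparing it against the BioPython equivalent is a good way to learn the library.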
I believe you're right about the speed consideration: the ability of C or C++ to access low-level memory and so on makes it possible to tune a program as close to the hardware as possible (one can also try assembler). But I'm sure the way the code is written to achieve a specific task matters more. Look, for instance, at this Biostar discussion: http://www.biostars.org/post/show/10353/how-to-efficiently-parse-a-huge-fastq-file/ (Leszek's answer). For the purposes of this thread I would say: Python is good, but use dict() and set() types instead of lists whenever you can.
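To show what I mean about dict() and set(), here is a small sketch. It is not Leszek's script; the function names and the toy records are my own. Looking up an ID in a set or dict takes roughly constant time on average, while `rid in some_list` scans the list each time, which adds up quickly with millions of reads.

```python
def filter_records(records, wanted_ids):
    """Keep (read_id, sequence) pairs whose ID is in wanted_ids.

    Converting wanted_ids to a set makes each membership test
    constant time on average, instead of scanning a list.
    """
    wanted = set(wanted_ids)
    return [(rid, seq) for rid, seq in records if rid in wanted]


def count_by_id(records):
    """Count how often each read ID occurs, using a dict keyed by ID."""
    counts = {}
    for rid, _seq in records:
        counts[rid] = counts.get(rid, 0) + 1
    return counts
```

The results are the same as with lists; only the lookup cost changes, so the difference shows up on large inputs rather than on toy data.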
My answer is malformatted due to the transition of the website. If you read my answer together with the reformatted table in a separate answer, you will see that a proper C/C++ implementation is 4-fold faster than Leszek's script. The C++ one is slow due to a stdio synchronization issue which I only learned about recently. Also, each data structure has its own use; it just happens that in that example dict() is better.
OK, I see. When I last read that answer properly, the race for the best implementation was not over yet :-) However, that still supports the point that the way of coding is critical, whatever the programming language. That is a very good post; I like Biostar especially for this kind of thing. BTW, I'll go right back and compile Pierre's code.