Tool: Pyteomics - An Open-Source Python Framework for Mass Spectrometry (MS) Based Proteomics

11.1 years ago

lev.levitsky ▴ 90

Overview

Pyteomics is a collection of tools for the low-level routine operations common in MS-based proteomics data analysis. It started as reusable code we wrote in the lab for our own needs, and it has been publicly available since early 2012.

What Pyteomics is:

- A framework: It is a collection of building blocks for easy software development or prototyping.
- A well-documented library: The documentation includes the complete API reference, a tutorial and examples.
- An open-source project: The source code repository is open for review and contributions. We recently merged the first pull request. Yay!
- An indexed Python package: This means one-command installation from PyPI.

What Pyteomics is NOT:

- A format converter;
- A proteomic search engine;
- Any kind of ready-to-use software. It's a library for developers (but it's very easy to become one).

References

Goloborodko, A.A.; Levitsky, L.I.; Ivanov, M.V.; and Gorshkov, M.V. (2013) "Pyteomics - a Python Framework for Exploratory Data Analysis and Rapid Software Prototyping in Proteomics", Journal of The American Society for Mass Spectrometry, 24(2), 301-304. DOI: 10.1007/s13361-012-0516-6
Levitsky, L.I.; Klein, J.; Ivanov, M.V.; and Gorshkov, M.V. (2018) "Pyteomics 4.0: five years of development of a Python proteomics framework", Journal of Proteome Research. DOI: 10.1021/acs.jproteome.8b00717

Functionality

Calculation of peptide properties
- molecular mass calculation, with support for a specific isotopic composition, ion type and charge; chemical composition arithmetic; access to the Unimod database; and finding the most probable isotopic composition. This part is not limited to peptides or proteins.
- peptide charge at a given pH and the isoelectric point.
- retention time prediction. Pyteomics comes with a built-in implementation of the retention coefficient theory and many predefined sets of retention coefficients from the literature. It also has a routine for fitting user data to find the best set of retention coefficients (with an optional logarithmic length correction). Another option is installing pyteomics.biolccc, which implements a statistical sequence- and condition-specific model of peptide retention (see this paper for details).
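
To make the mass and charge items above concrete, here is a plain-Python sketch of the underlying arithmetic. It is not Pyteomics code and does not call the library (which covers these tasks in its `mass` and `electrochem` modules). The residue masses are standard monoisotopic values rounded to five decimals, the pK values are textbook-style figures assumed for illustration, and all function names are hypothetical.

```python
# Illustrative sketch only: names and pK values are assumptions, not Pyteomics API.

# Monoisotopic residue masses (Da), rounded to five decimals.
RESIDUE_MASS = {
    'G': 57.02146, 'A': 71.03711, 'S': 87.03203, 'P': 97.05276,
    'V': 99.06841, 'T': 101.04768, 'C': 103.00919, 'L': 113.08406,
    'I': 113.08406, 'N': 114.04293, 'D': 115.02694, 'Q': 128.05858,
    'K': 128.09496, 'E': 129.04259, 'M': 131.04049, 'H': 137.05891,
    'F': 147.06841, 'R': 156.10111, 'Y': 163.06333, 'W': 186.07931,
}
WATER = 18.010565   # H2O accounts for the free N- and C-termini
PROTON = 1.007276   # proton mass, used for [M+nH]n+ ions

# Assumed pK values as (pK, sign of the charged form).
PK_SIDE_CHAIN = {
    'D': (3.65, -1), 'E': (4.25, -1), 'C': (8.18, -1), 'Y': (10.07, -1),
    'H': (6.00, +1), 'K': (10.53, +1), 'R': (12.48, +1),
}
PK_N_TERM = (9.69, +1)
PK_C_TERM = (2.34, -1)


def monoisotopic_mass(sequence):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def mz(sequence, charge):
    """m/z of the protonated peptide at the given positive charge state."""
    return (monoisotopic_mass(sequence) + charge * PROTON) / charge


def net_charge(sequence, pH):
    """Net charge at a given pH from independent ionizable groups."""
    groups = [PK_N_TERM, PK_C_TERM]
    groups += [PK_SIDE_CHAIN[aa] for aa in sequence if aa in PK_SIDE_CHAIN]
    total = 0.0
    for pK, sign in groups:
        # Henderson-Hasselbalch: fraction of the group in its charged form.
        total += sign / (1.0 + 10 ** (sign * (pH - pK)))
    return total


def isoelectric_point(sequence, low=0.0, high=14.0, tolerance=1e-4):
    """pH at which the net charge crosses zero, found by bisection."""
    # The net charge decreases monotonically with pH, so bisection is safe.
    while high - low > tolerance:
        middle = (low + high) / 2
        if net_charge(sequence, middle) > 0:
            low = middle
        else:
            high = middle
    return (low + high) / 2


peptide = 'PEPTIDE'
print(round(monoisotopic_mass(peptide), 4))
print(round(mz(peptide, 2), 4))
print(round(net_charge(peptide, 7.0), 2))
print(round(isoelectric_point(peptide), 2))
```

The sketch ignores modifications, non-standard residues and isotopic distributions, which is exactly where a library is more convenient than a hand-written table.
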
Working with data
- FASTA files. Pyteomics has a parser for FASTA files and also allows creation of your own FASTA databases. A routine for decoy database creation is included. The new PEFF format is supported as well.
- pepXML, mzIdentML and TandemXML. The two most common formats for proteomic search engine output, as well as the X!Tandem output format, can be easily parsed with Pyteomics to facilitate programmatic data extraction and analysis. Easily create pandas DataFrames to analyze search results! Target-decoy FDR filtering included!
- MGF, mzXML and mzML. Parsers for three common formats for mass spectrometry data are also included. For MGF there's even a write function.
- Other formats. OpenMS formats (trafoXML, featureXML, TraML), ProteinProphet files, ms1 files, etc.
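
As a rough illustration of the FASTA, decoy and FDR items above, the following plain-Python sketch reads FASTA records, appends reversed-sequence decoys and filters a toy list of PSMs by an estimated FDR. It does not use Pyteomics. The function names, the `DECOY_` prefix, the example scores and the decoys/targets FDR estimate are all assumptions chosen for the example.

```python
import io

# Illustrative sketch only: hypothetical helpers, not Pyteomics API.

FASTA_TEXT = """\
>sp|P1|First protein
MKWVTFISLL
LLFSSAYSR
>sp|P2|Second protein
PEPTIDEK
"""


def read_fasta(handle):
    """Yield (description, sequence) pairs from a FASTA file-like object."""
    description, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith('>'):
            if description is not None:
                yield description, ''.join(chunks)
            description, chunks = line[1:], []
        else:
            chunks.append(line)
    if description is not None:
        yield description, ''.join(chunks)


def with_reversed_decoys(entries, prefix='DECOY_'):
    """Yield every target entry followed by its reversed-sequence decoy."""
    for description, sequence in entries:
        yield description, sequence
        yield prefix + description, sequence[::-1]


def filter_by_fdr(psms, fdr, prefix='DECOY_'):
    """Keep target PSMs above the lowest score cutoff that satisfies the FDR.

    psms is a list of (score, protein) pairs where a higher score is better.
    FDR at a cutoff is estimated as decoys / targets; score ties are ignored.
    """
    ranked = sorted(psms, key=lambda psm: psm[0], reverse=True)
    targets = decoys = cutoff = 0
    for index, (score, protein) in enumerate(ranked, start=1):
        if protein.startswith(prefix):
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= fdr:
            cutoff = index
    return [psm for psm in ranked[:cutoff] if not psm[1].startswith(prefix)]


entries = list(read_fasta(io.StringIO(FASTA_TEXT)))
database = list(with_reversed_decoys(entries))

# Toy search results: (score, matched protein).
psms = [(50, 'P1'), (45, 'P2'), (40, 'DECOY_P3'), (35, 'P1'),
        (30, 'P2'), (20, 'DECOY_P1'), (10, 'DECOY_P2')]
accepted = filter_by_fdr(psms, fdr=0.25)
print(len(database), len(accepted))
```

Real search output carries much more per-PSM information than a score and a protein name, so in practice the format parsers do most of the work before a filter like this is applied.
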
Miscellaneous
- Pyteomics supports various operations with peptide/protein sequences, including modified amino acid residues: parsing, validation, in silico cleavage with pre-set or user-defined enzymes, generation of possible forms of sequences with variable modifications.
- It has some handy routines for data analysis, such as a linear regression helper.
- Visualization is facilitated by the included matplotlib-based functions.
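
The in silico cleavage mentioned above can be sketched in a few lines of plain Python. This is a simplified illustration rather than the Pyteomics implementation: the `digest` helper is hypothetical, and the trypsin rule (cut after K or R unless the next residue is P) is a common simplification written as a regular expression.

```python
import re

# Illustrative sketch only: hypothetical helper, not Pyteomics API.

# Zero-width pattern: a position preceded by K or R and not followed by P.
TRYPSIN = r'(?<=[KR])(?!P)'


def digest(sequence, rule=TRYPSIN, missed_cleavages=0):
    """Return the set of peptides produced by cleaving at every rule match,
    allowing up to `missed_cleavages` skipped sites per peptide."""
    inner_sites = [match.start() for match in re.finditer(rule, sequence)
                   if 0 < match.start() < len(sequence)]
    sites = [0] + inner_sites + [len(sequence)]
    peptides = set()
    for i in range(len(sites) - 1):
        # Join the fragment starting at site i with up to N following fragments.
        last = min(i + missed_cleavages + 2, len(sites))
        for j in range(i + 1, last):
            peptides.add(sequence[sites[i]:sites[j]])
    return peptides


protein = 'AAKRPGKCDER'
print(sorted(digest(protein)))
print(sorted(digest(protein, missed_cleavages=1)))
```

Passing a different regular expression as `rule` is how a user-defined enzyme would be expressed in this sketch.
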

License

Apache License 2.0.

Links & Feedback

See also: TheorChromo Online, an on-line tool implementing the BioLCCC retention time model.
