I am looking for a command-line tool to compute a consensus sequence, with a proper usage example: not something from a paper where everything is theoretical, and not something buried inside a larger software suite. I have reviewed the options, but the quality of the answers and documentation is poor. For example, UGENE does not provide a command line, and the Wikipedia page does not list any software or source code. In the thread "Consensus Sequence Algorithms And Notation", the question was about consensus sequences, and then people started suggesting sequence logos. I don't want a sequence logo :)
Some people link to a program called USEARCH, but there is no command-line usage example, and it seems it is not open source or the code is not available (the documentation looks confusing too). The same happens with a program called CD-HIT. Biopython offers two algorithms: one is "dumb" and the other is based on Position-Specific Score Matrices. However, I don't want to go through Python, Perl, Ruby, etc. Another one, Consensus Maker from the HIV sequence database, is web-based and not open source.
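For reference, the "dumb" majority-rule idea is small enough to fit in one stand-alone script. The following is a minimal sketch using only the Python standard library (not Biopython), written in the spirit of Biopython's description rather than as a copy of its code. Assumptions: the input is an already aligned FASTA file, `-` and `.` are treated as gaps and ignored when counting, a tie for the top residue counts as ambiguous, and the defaults `threshold=0.7` and `ambiguous="X"` are illustrative choices.

```python
import argparse
from collections import Counter

# Characters treated as gaps and skipped when counting residues (assumption).
GAP_CHARS = {"-", "."}


def read_fasta(path):
    """Read an aligned FASTA file and return its sequences (headers are dropped)."""
    seqs, current = [], []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if current:
                    seqs.append("".join(current))
                current = []
            else:
                current.append(line)
    if current:
        seqs.append("".join(current))
    return seqs


def dumb_consensus(seqs, threshold=0.7, ambiguous="X"):
    """Per column, emit the single most common non-gap residue if its share of
    the non-gap residues is >= threshold; otherwise (tie, too rare, or an
    all-gap column) emit the ambiguous character."""
    if not seqs:
        return ""
    out = []
    for i in range(max(len(s) for s in seqs)):
        counts = Counter(s[i] for s in seqs if i < len(s) and s[i] not in GAP_CHARS)
        total = sum(counts.values())
        ranked = counts.most_common(2)
        if total == 0 or (len(ranked) == 2 and ranked[0][1] == ranked[1][1]):
            out.append(ambiguous)
        elif ranked[0][1] / total >= threshold:
            out.append(ranked[0][0])
        else:
            out.append(ambiguous)
    return "".join(out)


def main(argv=None):
    parser = argparse.ArgumentParser(description="Majority-rule consensus of an aligned FASTA file.")
    parser.add_argument("alignment", nargs="?", help="aligned FASTA file")
    parser.add_argument("--threshold", type=float, default=0.7)
    parser.add_argument("--ambiguous", default="X")
    args = parser.parse_args(argv)
    if args.alignment is None:
        # No file given: show usage instead of failing.
        parser.print_usage()
        return
    print(dumb_consensus(read_fasta(args.alignment), args.threshold, args.ambiguous))


if __name__ == "__main__":
    main()
```

Saved as a file (the names `consensus.py` and `aligned.fasta` are hypothetical), it would run as `python consensus.py aligned.fasta --threshold 0.7`. For the three aligned sequences `ACGT`, `ACGA`, `ACTT`, the last two columns only reach 2/3 agreement, so the default threshold of 0.7 marks them `X`, while `--threshold 0.6` gives `ACGT`.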
The situation is worse after reading the UGENE manual. It describes several algorithms:
- JalView (Default): based on the JalView algorithm. Returns '+' if there are 2 characters with high frequency. Returns the symbol in lower case if the symbol content in a row is lower than the specified threshold.
- ClustalW: emulates the behavior of the ClustalW program and file format.
- Levitsky: this algorithm, proposed by Victor Levitsky, calculates the consensus of DNA alignments. First, it collects global alignment frequencies for every symbol using the extended (15-symbol) DNA alphabet. Then, for every column, it selects the symbol that is rarest in the whole alignment among those whose percentage in the column is greater than or equal to the threshold value.
- Strict: returns the gap character ('-') if the symbol frequency in a column is lower than the specified threshold.
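Two of those descriptions, Strict and Levitsky, are concrete enough to sketch. The code below is one possible reading of the UGENE text, not UGENE's actual implementation, and several details are assumptions: all sequences have the same length; Strict counts gaps as ordinary symbols and divides by the number of sequences; Levitsky ignores gaps, uses the 15 IUPAC nucleotide codes as the "extended DNA alphabet", defines a code's global frequency as the number of bases in the whole alignment that it matches, and measures a code's column percentage over the non-gap bases of that column.

```python
from collections import Counter

GAP = "-"

# The 4 bases plus 11 IUPAC ambiguity codes: 15 symbols (assumed to be
# the "extended DNA alphabet" in the UGENE description).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def strict_consensus(seqs, threshold=0.5):
    """Per column: the most frequent symbol (gaps included) if its frequency
    over all sequences is >= threshold, otherwise a gap."""
    out = []
    for column in zip(*seqs):
        symbol, count = Counter(column).most_common(1)[0]
        out.append(symbol if count / len(column) >= threshold else GAP)
    return "".join(out)


def levitsky_consensus(seqs, threshold=0.5):
    """Levitsky-style consensus under the assumptions stated above."""
    # Global base counts over the whole alignment, gaps ignored.
    base_counts = Counter(ch for s in seqs for ch in s.upper() if ch in "ACGT")
    # Global frequency of each IUPAC code = how many bases it matches overall.
    global_freq = {code: sum(base_counts[b] for b in members)
                   for code, members in IUPAC.items()}
    out = []
    for column in zip(*seqs):
        bases = [ch.upper() for ch in column if ch.upper() in "ACGT"]
        if not bases:
            out.append(GAP)  # all-gap column
            continue
        col_counts = Counter(bases)
        # Codes whose matched bases cover at least `threshold` of the column.
        candidates = [
            code for code, members in IUPAC.items()
            if sum(col_counts[b] for b in members) / len(bases) >= threshold
        ]
        # N matches every base, so candidates is never empty here.
        # Pick the globally rarest qualifying code.
        out.append(min(candidates, key=lambda code: global_freq[code]))
    return "".join(out)
```

On the alignment `ACGT`, `ACGA`, `ACTT` with a threshold of 0.7, Strict gives `AC--` because the last two columns only reach 2/3, whereas the Levitsky-style rule falls back to the rarest ambiguity code that still covers the column, giving `ACKW` (K = G/T, W = A/T). That contrast is the main practical difference between the two descriptions.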
So there are a lot of links and explanations, but no command line?