Question: Consensus sequence software?
gravatar for andreagarcia871
6.4 years ago by
andreagarcia87160 wrote:

I am looking for a command line tool to get a consensus sequence, with a proper example usage not from a paper where everything is theoretical or from a software suite. I have reviewed options but quality of answers and documentation is bad. For example there is UGENE but they don't provide a command line. Also the Wikipedia page does not list any software or source code. A Consensus Sequence Algorithms And Notation where the question was about Consensus Sequence and then people started suggesting Sequence Logos? I don't want sequence logo :)

Some people links to a program called USEARCH, but there is no command line usage example and it seems is not open source or the code is not available (documentation looks confusing too). Same happens with a software called CD-HIT. The biopython considers two algorithms: One is "dumb" and another is based in "Position Specific Score Matrices". However I don't want to go through python, or perl, ruby, etc. Another one called Consensus Maker from the HIV sequence database is web-based and not open source.

Situation is worst after reading the UGENE manual. They describe several algorithms:

  • JalView (Default) — it is based on the JalView algorithm. Returns ’+’ if there are 2 characters with high frequency. Returns symbol in lower case if the symbol content in a row is lower than the specified threshold.
  • ClustalW — emulates the ClustalW program and file format behavior.
  • Levitsky — this algorithm is proposed by Victor Levitsky to calculate consensus of DNA alignments. At first, it collects global alignment frequencies for every symbol using extended (15 symbols) DNA alphabet. Then, for every column it selects the rarest symbol in the whole alignment with percentage in the column greater or equals to the threshold value.
  • Strict — the algorithm returns gap character (’—’) if symbol frequency in a column is lower than the threshold specified.

So there are lot of links and explanations but no command line?


sequence consensus • 8.6k views
ADD COMMENTlink modified 6.4 years ago by Daniel3.8k • written 6.4 years ago by andreagarcia87160

Wow! What an exhaustive research!!! Wouldn't it have been just faster to code it yourself?
I compute the consensus sequence from the Position-Weight Matrices (PWM). Biopython can do it (yes I know you don't want it -that'a why it's a comment and not an answer).

ADD REPLYlink written 6.4 years ago by Hugues250
gravatar for vaskin90
6.4 years ago by
Milan, Italy
vaskin90290 wrote:

To extract consensus in UGENE with the commandline interface you need to use UGENE Workflow Designer. There is an element that extracts consensus from each input alignment. You can create a workflow that reads alignment, extracts consensus and output consensus sequences to files. Since any workflow can be run fromcommandlin  that's exactly what you need. Actually, you can find the pre-built workflow in the Samples bar of the Workflow Designer.

Workflow Designer docs:

Running workflows from commandline

So it will be something like: ugenecl --task=extract_consensus --in=in.aln --out=out.fa

P.S. These functions will be released only in UGENE 1.14, that it why you did not find it in the documentation. But to get them right now, you can use the development snapshot from here (

ADD COMMENTlink modified 6.4 years ago • written 6.4 years ago by vaskin90290
gravatar for Asaf
6.4 years ago by
Asaf8.4k wrote:

The program phrophecy of EMBOSS package should do it.

An example of an EMBOSS server can be found here, you can download and install it locally using command line.

ADD COMMENTlink written 6.4 years ago by Asaf8.4k

If the OP just want a consensus sequence from a multiple sequence alignment, then they could try the EMBOSS "Alignment Consensus" tools:

A quick Google for these will find many public services offering access to these tools:

ADD REPLYlink modified 13 months ago by RamRS30k • written 6.4 years ago by hpmcwill1.1k
gravatar for Daniel
6.4 years ago by
Cardiff University
Daniel3.8k wrote:

I wrote this perl script to take in a fasta alignment and output a full IUPAC consensus and also list 50 probes of X size which are weighted for A/C/T/G. It might help:

Usage is: ./degenerate_code_generator alignment_file.fasta window_size outfile.txt

use strict;
use warnings;

use Bio::Seq;
use Bio::AlignIO;
use Bio::Tools::IUPAC;

my $usage = "USAGE:\t./degenerate_code_generator REFERENCE_FILE.fasta Probe_size OUTFILE.txt\n";

my $infile = $ARGV[0] or die $usage . "\n";
my $probesize = $ARGV[1] or die $usage . "\n";
my $outfile = $ARGV[2] or die $usage . "\n";

my $alnio = Bio::AlignIO->new(-file => $infile, -format => 'fasta');

open OUTFILE, ">$outfile";
select OUTFILE;

my $cs;

while (my $aln = $alnio->next_aln) {
    $cs = $aln->consensus_iupac;
    my $seq = Bio::Seq->new(-seq => $cs, -alphabet => 'dna');

$cs =~ s/-//g;
print "<FULL>$cs\n";

my $score;

my $winsize = $probesize;
my $i = 1;
my $cs_len = length($cs);
my %hash;

while ($i < $cs_len) {
        my $window = substr($cs, $i,$winsize);

        foreach (split(//,$window)){
                        $score += "5";
                        $score += "1";
        $hash{$window} = $score;
        $score = 0;

my @keys = sort { $hash{$b} <=> $hash{$a} } keys(%hash);
my @vals = @hash{@keys};

my $c = 0;

#print "Score\tProbe\n";

while ($c < 51){
        print "$vals[$c]\t$keys[$c]\n";
ADD COMMENTlink modified 5.0 years ago • written 6.4 years ago by Daniel3.8k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2053 users visited in the last hour