I'm looking for an aligner that preserves the case (lower/upper) of the input sequences. So far, I've tried ClustalW, Muscle, MAFFT, and TCoffee, though I could be missing a switch for one of them somewhere.
Reasoning and background:
I'm writing a quick script to find regions of interest, align the coding sequence they fall into, and output a nicely formatted text file.
My thought was to show the short regions in capitals and the remaining sequence in lower-case. I've created the sequences with the correct upper/lower-case characters, but when I run them through ClustalW, the alignment comes out all upper-case.
I'd prefer an option that has a ready-made BioPerl module (as the rest of my script is in Perl), but command-line-only options are also okay.
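To illustrate what I mean by "correct upper/lower case", here is a minimal sketch of how such input can be built. The `mark_region` helper and the coordinates are made up for this example (they are not taken from my actual script); it assumes 1-based, inclusive region coordinates.

```perl
use strict;
use warnings;

# Hypothetical helper: lower-case the whole sequence, then upper-case
# the region of interest (1-based, inclusive coordinates assumed).
sub mark_region {
    my ($seq, $start, $end) = @_;
    my $marked = lc $seq;
    my $len    = $end - $start + 1;
    # Replace the region in place with its upper-case version.
    substr($marked, $start - 1, $len) = uc(substr($marked, $start - 1, $len));
    return $marked;
}

# Upper-case positions 9..12 of a short example sequence.
print mark_region('atgaaaaagaattt', 9, 12), "\n";   # prints atgaaaaaGAATtt
```

Sequences marked like this are what I write to the FASTA file before handing it to the aligner.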
Sample script (but the same happens from the command-line):
    use Bio::Tools::Run::Alignment::Clustalw;
    use Bio::AlignIO;

    my $aligner   = Bio::Tools::Run::Alignment::Clustalw->new;
    my $alignment = $aligner->align('test.fsa');

    my $out = Bio::AlignIO->newFh(-format => 'phylip');
    print $out $alignment;
Sample input file:
    >test
    atgaaaaagaattttattgggaaatcaattttaagcatagctgctattagtttaacggta
    tcaacatttgccggtgaatctcatgcacaaactaaggCTGAAAAATATAACGAGTatc
    >test_2
    atgaaaaaGAATTTATTGGGAAATCaattttaagcatagctgctattagtttaacggtat
    caacatttgccggtgaatctcatgcacaaactaaggctgaaaaatataacgagtatca
Output of script (no lowercase):
     2 119
    test         ATGAAAAAGA ATTTTATTGG GAAATCAATT TTAAGCATAG CTGCTATTAG TTTAACGGTA
    test_2       ATGAAAAAGA ATTT-ATTGG GAAATCAATT TTAAGCATAG CTGCTATTAG TTTAACGGTA

                 TCAACATTTG CCGGTGAATC TCATGCACAA ACTAAGGCTG AAAAATATAA CGAGTATC-
                 TCAACATTTG CCGGTGAATC TCATGCACAA ACTAAGGCTG AAAAATATAA CGAGTATCA