Aligner That Preserves Case?
10.0 years ago
Fwip

Summary:

I'm looking for an aligner that preserves the case (lower/upper) of the input sequences. So far I've tried ClustalW, MUSCLE, MAFFT, and T-Coffee, though I may have missed a switch in one of them.

Reasoning and background:

I'm writing a quick script to find regions of interest, align the coding sequence they fall into, and output a nicely formatted text file.

My idea was to show the short regions in capitals and the remaining sequence in lowercase. I've created the sequences with the correct upper/lowercase characters, but when I run them through ClustalW, everything comes out in uppercase.
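For reference, building such a masked sequence from region coordinates takes only a few lines of Perl. This is a minimal sketch: `mark_regions` is a hypothetical helper, and it assumes 1-based, inclusive `[start, end]` coordinates for each region of interest.

```perl
use strict;
use warnings;

# Hypothetical helper: return $seq in lowercase, with each 1-based, inclusive
# [start, end] region converted to uppercase.
sub mark_regions {
    my ($seq, @regions) = @_;
    my $marked = lc $seq;
    for my $region (@regions) {
        my ($start, $end) = @$region;
        die "invalid region $start..$end for a sequence of length "
            . length($marked) . "\n"
            unless 1 <= $start && $start <= $end && $end <= length($marked);
        my $len = $end - $start + 1;
        # lvalue substr: replace the region with its uppercased copy
        substr($marked, $start - 1, $len) = uc substr($marked, $start - 1, $len);
    }
    return $marked;
}

print mark_regions('ATGAAAAAGAATTTATTGGG', [9, 14]), "\n";
# prints atgaaaaaGAATTTattggg
```

Several regions can be passed at once, e.g. `mark_regions($seq, [1, 3], [18, 20])`.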

I'd prefer an option that has a ready-made BioPerl module (the rest of my script is in Perl), but command-line-only tools are also fine.

Sample script (the same thing happens on the command line):

use strict;
use warnings;
use Bio::Tools::Run::Alignment::Clustalw;
use Bio::AlignIO;

my $aligner   = Bio::Tools::Run::Alignment::Clustalw->new;
my $alignment = $aligner->align('test.fsa');
my $out       = Bio::AlignIO->newFh(-format => 'phylip');
print $out $alignment;


Sample input file:

>test
atgaaaaagaattttattgggaaatcaattttaagcatagctgctattagtttaacggta
tcaacatttgccggtgaatctcatgcacaaactaaggCTGAAAAATATAACGAGTatc
>test_2
atgaaaaaGAATTTATTGGGAAATCaattttaagcatagctgctattagtttaacggtat
caacatttgccggtgaatctcatgcacaaactaaggctgaaaaatataacgagtatca


Output of script (no lowercase):

 2 119
test         ATGAAAAAGA ATTTTATTGG GAAATCAATT TTAAGCATAG CTGCTATTAG TTTAACGGTA
test_2       ATGAAAAAGA ATTT-ATTGG GAAATCAATT TTAAGCATAG CTGCTATTAG TTTAACGGTA

TCAACATTTG CCGGTGAATC TCATGCACAA ACTAAGGCTG AAAAATATAA CGAGTATC-
TCAACATTTG CCGGTGAATC TCATGCACAA ACTAAGGCTG AAAAATATAA CGAGTATCA


Why not just take the output and reformat it with a script?


I could, but gaps may be inserted during the alignment (some of my real data show this, both before and inside the "capital" region, but it is too long to post here), so I would have to track them and adjust for them. I've done that before, but the code was ugly and fragile.

Since people sometimes use case to encode mask data, I would have expected at least one of the popular aligners to have an option to preserve it.
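If no aligner option were available, the post-processing route can stay fairly robust: an aligner only inserts gaps (and here changes case), so the non-gap characters of an aligned row correspond, in order, to the residues of the original input. The sketch below uses that to copy the original case back. `restore_case` is a hypothetical helper, not a BioPerl function, and it assumes gaps are written as `-` or `.` and that no residues were altered.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): copy the case of each residue in
# the original, ungapped sequence onto the matching residue of the aligned,
# all-uppercase sequence. Gap characters ('-' or '.') are passed through.
# Assumes the aligner changed nothing but case and inserted gaps.
sub restore_case {
    my ($original, $aligned) = @_;
    my @orig   = split //, $original;
    my $i      = 0;    # index of the next residue in the original
    my $result = '';

    for my $c (split //, $aligned) {
        if ($c eq '-' || $c eq '.') {
            $result .= $c;    # gaps are copied unchanged
            next;
        }
        die "aligned sequence has more residues than the original\n"
            if $i >= @orig;
        my $o = $orig[ $i++ ];
        die "residue mismatch at original position $i ('$o' vs '$c')\n"
            if lc($o) ne lc($c);
        $result .= $o;        # keep the residue with its original case
    }
    die "aligned sequence has fewer residues than the original\n"
        if $i != @orig;
    return $result;
}

# Start of test_2 from the question, with the gap ClustalW inserted
my $original = 'atgaaaaaGAATTTATTGGGAAATCaatt';
my $aligned  = 'ATGAAAAAGAATTT-ATTGGGAAATCAATT';
print restore_case($original, $aligned), "\n";
# prints atgaaaaaGAATTT-ATTGGGAAATCaatt
```

With BioPerl, the gapped strings could be taken from `$alignment->each_seq` (the `seq` of each returned sequence includes gaps) and paired with the input sequences by ID; the die checks catch a wrong pairing early.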

10.0 years ago
Whetting

Check MAFFT. According to its manual (http://mafft.cbrc.jp/alignment/software/anysymbol.html), it has an option to preserve case.


Thank you! This looks like it will work perfectly with the --preservecase option.