Aligner That Preserves Case?
11.2 years ago
Fwip ▴ 500

Summary:

I'm looking for an aligner that preserves the case (lower/upper) of the input sequences. So far I've tried ClustalW, MUSCLE, MAFFT, and T-Coffee without success, though I could be missing a switch for one of them.

Reasoning and background:

I'm writing a quick script to find regions of interest, align the coding sequence they fall into, and output a nicely formatted text file.

My thought was to show the short regions in capitals and the remaining sequence in lower case. I've created the sequences with the correct upper/lower-case characters, but when I run them through ClustalW, the output is all upper case.

I'd prefer an option that has a ready-made BioPerl module (as the rest of my script is in Perl), but command-line-only options are also okay.

Sample script (but the same happens from the command-line):

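use strict;
use warnings;
use Bio::Tools::Run::Alignment::Clustalw;
use Bio::AlignIO;

# Align the sequences in test.fsa with ClustalW (default parameters)
my $aligner   = Bio::Tools::Run::Alignment::Clustalw->new;
my $alignment = $aligner->align('test.fsa');

# Write the alignment to STDOUT in PHYLIP format
my $out = Bio::AlignIO->newFh(-format => 'phylip');
print $out $alignment;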

Sample input file:

>test
atgaaaaagaattttattgggaaatcaattttaagcatagctgctattagtttaacggta
tcaacatttgccggtgaatctcatgcacaaactaaggCTGAAAAATATAACGAGTatc
>test_2
atgaaaaaGAATTTATTGGGAAATCaattttaagcatagctgctattagtttaacggtat
caacatttgccggtgaatctcatgcacaaactaaggctgaaaaatataacgagtatca

Output of script (no lowercase):

 2 119
test         ATGAAAAAGA ATTTTATTGG GAAATCAATT TTAAGCATAG CTGCTATTAG TTTAACGGTA 
test_2       ATGAAAAAGA ATTT-ATTGG GAAATCAATT TTAAGCATAG CTGCTATTAG TTTAACGGTA 

             TCAACATTTG CCGGTGAATC TCATGCACAA ACTAAGGCTG AAAAATATAA CGAGTATC- 
             TCAACATTTG CCGGTGAATC TCATGCACAA ACTAAGGCTG AAAAATATAA CGAGTATCA
bioperl aligner msa • 3.0k views

Why not just take the output and use a script to restore the case?


I can do that, but gaps may be inserted during the alignment, and I would have to track them and adjust for them. Some of my real sample data (too long to post here) gets gaps both before and inside the "capital" region. I've done it before, but the result is ugly, fragile code.

Since people sometimes use case to encode masking data, I would have expected at least one of the popular aligners to have an option to preserve it.
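For reference, the post-processing approach discussed above can be sketched roughly as follows. This is only a minimal illustration, not anything BioPerl provides: `restore_case` is a hypothetical helper name, and it assumes the aligner changed nothing except the case of the residues and the insertion of `-` or `.` gap characters.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): copy the case of each residue in
# $original onto the matching residue of $aligned, leaving gaps untouched.
# Assumes the aligner only changed case and inserted '-' or '.' gap characters.
sub restore_case {
    my ($aligned, $original) = @_;
    my @source = split //, $original;
    my $pos    = 0;    # index of the next unconsumed residue in @source
    my $result = '';

    for my $char (split //, $aligned) {
        if ($char eq '-' || $char eq '.') {
            $result .= $char;    # gaps consume no residue from the original
            next;
        }
        die "aligned sequence has more residues than the original\n"
            if $pos >= @source;
        my $orig_char = $source[$pos++];
        die "residue mismatch at ungapped position $pos\n"
            if lc($char) ne lc($orig_char);
        $result .= $orig_char;
    }

    die "aligned sequence has fewer residues than the original\n"
        if $pos != @source;
    return $result;
}

print restore_case('ATG-AAC', 'atGAac'), "\n";    # prints "atG-Aac"
```

Because the gaps are skipped rather than counted, no coordinate bookkeeping is needed; the sanity checks die early if the aligned and original sequences do not describe the same residues.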

11.2 years ago
Whetting ★ 1.6k

Check MAFFT. According to its manual (http://mafft.cbrc.jp/alignment/software/anysymbol.html), it can preserve case.


Thank you! This looks like it will work perfectly with the --preservecase option.
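A rough sketch of how this might be called from a Perl script is given below. It assumes the `mafft` executable is on the PATH and that its aligned FASTA output is written to standard output; `mafft_command` and `align_preserving_case` are illustrative names rather than part of any module, and `--quiet` is included only to suppress progress messages.

```perl
use strict;
use warnings;

# Build the argument list for MAFFT. --preservecase is the option named in
# this thread; --quiet only silences progress messages on STDERR.
sub mafft_command {
    my ($fasta_file) = @_;
    return ('mafft', '--preservecase', '--quiet', $fasta_file);
}

# Run MAFFT (assumed to be on the PATH) and return its output as one string.
sub align_preserving_case {
    my ($fasta_file) = @_;
    # List-form pipe open: no shell is involved, so the file name needs no quoting.
    open(my $mafft, '-|', mafft_command($fasta_file))
        or die "Could not start mafft: $!\n";
    my $aligned = do { local $/; <$mafft> };    # slurp the whole output
    close($mafft) or die "mafft exited with an error (status $?)\n";
    return $aligned;
}

# Only runs when a FASTA file is given on the command line.
print align_preserving_case($ARGV[0]) if @ARGV;
```

The equivalent shell invocation would be `mafft --preservecase test.fsa > aligned.fsa`. Whether a downstream parser keeps the case when reading the result is worth checking separately.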
