Question: Is There An Easy Way To Properly Orient Sequences For Clustalw Alignment?
David M wrote (8.7 years ago):

I have about 500 sequences of 1-2 kbp each that I'd like to align using ClustalW; in this case the sequences come from transposable elements. The problem is that the sequences have a distinct directionality, and the program I use to mine them from a genomic contig set doesn't guarantee any particular orientation. My concern is that for some sequences I should be aligning the reverse complement rather than the original orientation.

Is there a way to handle this within ClustalW? Is there some other program that can put my sequences in the proper orientation before I align them with ClustalW?

A quick example:

ATCGCGATATCG and CGATATCGCGAT can clearly be aligned, since the second sequence is the reverse complement of the first. If I ran ClustalW on them as is, however, the alignment would be far from ideal.
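To make the example concrete, here is a minimal Python sketch confirming that the second sequence is the reverse complement of the first. It handles plain A/C/G/T only, and the function name is illustrative.

```python
# Complement table for plain DNA (upper- and lowercase A/C/G/T only).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the result."""
    return seq.translate(COMPLEMENT)[::-1]


first = "ATCGCGATATCG"
second = "CGATATCGCGAT"
print(reverse_complement(first))            # CGATATCGCGAT
print(reverse_complement(first) == second)  # True
```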

Tags: alignment, clustalw, multiple
Ahdf-Lell-Kocks wrote (8.7 years ago):

You can use PAGAN, which considers both orientations of each sequence when run with the --compare-reverse option:

./pagan --compare-reverse --readsfile sequences.fasta

Hi David,

PAGAN is a new program and is still under active development. The feature you were using is very recent and indeed didn't support DNA ambiguity codes. I've now pushed an updated version that fixes this issue; the latest version can be obtained with git.

The new version should do the reverse-complement alignment with ambiguity characters and also supports translated alignment and translated alignment using the best ORF in the read sequences. Unfortunately I haven't got time to document all the new features. Please contact me if you find them interesting.

Regards, Ari

(Reply written 8.7 years ago by Ari)
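Ari mentions translated alignment using the best ORF in each read sequence. PAGAN's own criterion isn't described in this thread; the Python sketch below assumes, purely for illustration, that "best" means the longest ATG-to-stop ORF on either strand. For elements that carry a coding region, the strand on which that ORF lies is one hint about orientation. The function names are hypothetical.

```python
# Illustrative only: "best ORF" is assumed to mean the longest ATG...stop ORF on either strand.
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_orf(seq: str):
    """Return (strand, frame, start, orf) for the longest ATG-to-stop ORF.

    strand is "+" or "-"; frame and the 0-based start refer to that strand's own
    sequence (the reverse complement for "-"). Returns ("+", 0, 0, "") if none.
    """
    seq = seq.upper()
    best = ("+", 0, 0, "")
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # position of the first ATG in the current open stretch
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    orf = s[start:i + 3]  # include the stop codon
                    if len(orf) > len(best[3]):
                        best = (strand, frame, start, orf)
                    start = None
    return best


print(longest_orf("TTATTTCAT"))  # ('-', 0, 0, 'ATGAAATAA')
```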

Does PAGAN allow unknown ('N') characters? I'm getting an error that says: "Unexpected characters found. Reverse-complement failed".

(Reply written 8.7 years ago by David M)
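The error above comes from a reverse-complement step meeting characters outside A/C/G/T. As an illustration only (this is not PAGAN's code), a reverse complement that accepts the standard IUPAC nucleotide ambiguity codes, including N, and still rejects anything else might look like this; the function name is hypothetical.

```python
# Illustrative only, not PAGAN's implementation.
# IUPAC nucleotide codes and their complements (U is complemented to A); "-" maps to itself.
_CODES = "ACGTURYSWKMBDHVN"
_COMPS = "TGCAAYRSWMKVHDBN"
IUPAC_COMPLEMENT = str.maketrans(
    _CODES + _CODES.lower() + "-",
    _COMPS + _COMPS.lower() + "-",
)
VALID_CHARS = set(_CODES + _CODES.lower() + "-")


def reverse_complement_iupac(seq: str) -> str:
    """Reverse-complement seq, accepting IUPAC ambiguity codes such as N.

    Raises ValueError on any character outside the IUPAC nucleotide alphabet.
    """
    unexpected = set(seq) - VALID_CHARS
    if unexpected:
        raise ValueError("unexpected characters: " + "".join(sorted(unexpected)))
    return seq.translate(IUPAC_COMPLEMENT)[::-1]


print(reverse_complement_iupac("ATGNNR"))  # YNNCAT
```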
Gww wrote (8.7 years ago):

Perhaps you could build a small database of known sequences and then align your query sequences against it to determine their orientation. You would need to choose an aligner suited to the length of your queries, but that shouldn't be hard to find. If a query turns out to be in the wrong orientation, you could reverse-complement it before the multiple sequence alignment.
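A minimal Python sketch of this idea, with one substitution: instead of a real aligner, it scores each orientation by the number of k-mers it shares with the known reference sequences. The function names, the default k = 11 and the k-mer scoring itself are assumptions for illustration; for real data, a proper aligner such as BLAST is the more robust choice.

```python
# Sketch: orient queries by shared k-mers with known references
# (a stand-in for a real alignment score; k=11 is an arbitrary default).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> set:
    """All distinct length-k substrings of seq (uppercased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def orient(query: str, reference_kmers: set, k: int = 11) -> str:
    """Return query or its reverse complement, whichever shares more k-mers.

    Ties keep the original orientation.
    """
    rc = reverse_complement(query)
    forward_hits = len(kmers(query, k) & reference_kmers)
    reverse_hits = len(kmers(rc, k) & reference_kmers)
    return rc if reverse_hits > forward_hits else query


def orient_all(queries: dict, references: list, k: int = 11) -> dict:
    """Orient every query (name -> sequence) against a list of reference sequences."""
    reference_kmers = set()
    for ref in references:
        reference_kmers |= kmers(ref, k)
    return {name: orient(seq, reference_kmers, k) for name, seq in queries.items()}


print(orient_all({"q1": "AAAAACCCCC", "q2": "GGGGGTTTTT"}, ["AAAAACCCCC"], k=4))
# {'q1': 'AAAAACCCCC', 'q2': 'AAAAACCCCC'}
```

The references should themselves all be in one known orientation; otherwise their k-mer set matches both strands and the comparison loses its meaning.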

Jorge Amigo (Santiago de Compostela, Spain) wrote (8.7 years ago):

I haven't used any aligner that does this, although I've read that the GUIDANCE server has a Heads-or-Tails (HoT) algorithm that works with reversed sequences. What I would do is forget about ClustalW (it has certainly done its job, but there are better aligners out there now) and use MAFFT to align both your set of ~500 sequences and a second set of ~500 sequences made by reversing the first. Whichever set gives the higher alignment score is the hint you need about which orientation to focus on, and once you know the orientation you can study that set in more depth.
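The second input set described above can be prepared with a short Python sketch that parses a FASTA file and produces a copy with every sequence reverse-complemented. It assumes that "reversing" here means reverse-complementing, as in the question. The function names and the `_rc` suffix are hypothetical, and the sketch only prepares input; it does not compute or compare alignment scores.

```python
# Sketch: build the reverse-complemented copy of a FASTA set for a second MAFFT run.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_fasta(text: str) -> list:
    """Parse FASTA text into a list of (name, sequence) tuples."""
    records, name, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def to_fasta(records: list, width: int = 60) -> str:
    """Format (name, sequence) tuples as FASTA with fixed-width sequence lines."""
    lines = []
    for name, seq in records:
        lines.append(">" + name)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


def reverse_complemented_set(records: list) -> list:
    """Reverse-complement every sequence; the "_rc" name suffix is arbitrary."""
    return [(name + "_rc", seq.translate(COMPLEMENT)[::-1]) for name, seq in records]
```

For example, with `records = read_fasta(open("sequences.fasta").read())`, write `to_fasta(reverse_complemented_set(records))` to a second file and align each file separately with MAFFT (for instance `mafft sequences_rc.fasta > sequences_rc.aln`). The file names are placeholders.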
