Is There An Easy Way To Properly Orient Sequences For Clustalw Alignment?
12.3 years ago
David M ▴ 580

I have a number (~500) of 1-2 kbp sequences which I'd like to align using clustalw; in this case the sequences are from transposable elements. The problem is that the sequences have a distinct directionality, and the program I use to mine them from a genomic contig set doesn't guarantee any particular orientation. My concern is that for some sequences I should be aligning the reverse complement rather than the original orientation.

Is there a way in clustalw to get around this problem? Is there some other program that can put my sequences in the proper orientation before I align them with clustalw?

A quick example:

ATCGCGATATCG and CGATATCGCGAT can clearly be aligned, since the second sequence is the reverse complement of the first. If I ran clustalw with them as is, however, the alignment would be far from ideal.
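
To make the example concrete, here is a minimal Python sketch that checks the two sequences above really are reverse complements of each other. The function name `reverse_complement` is just an illustrative choice, and the sketch assumes plain A/C/G/T input with no ambiguity codes.

```python
# Complement table for the four unambiguous DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a plain A/C/G/T sequence."""
    # Complement every base, then reverse the string.
    return seq.translate(_COMPLEMENT)[::-1]


first = "ATCGCGATATCG"
second = "CGATATCGCGAT"
print(reverse_complement(first) == second)  # True
```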

clustalw alignment multiple • 11k views
12.3 years ago
Ahdf-Lell-Kocks ★ 1.6k

You can use PAGAN, which considers both orientations when run with the --compare-reverse option:

./pagan --compare-reverse --readsfile sequences.fasta

Hi David,

PAGAN is a new program and is still actively developed. The feature you were using is very recent and indeed didn't support DNA ambiguity codes. I've now pushed an updated version that fixes this issue. The latest version can be obtained with 'git'.

The new version should do the reverse-complement alignment with ambiguity characters. It also supports translated alignment, as well as translated alignment using the best ORF in the read sequences. Unfortunately I haven't had time to document all the new features. Please contact me if you find them interesting.

Regards, Ari


Does PAGAN allow for the presence of unknown ('N') characters? I'm getting an error that says: "Unexpected characters found. Reverse-complement failed".
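
For anyone reverse-complementing sequences by hand and running into the same kind of problem, here is a minimal Python sketch that handles 'N' and the other IUPAC ambiguity codes. This is not PAGAN's implementation; the function name `reverse_complement_iupac` and the error message are assumptions made for illustration.

```python
# IUPAC nucleotide codes and their complements, position by position:
# R (A/G) <-> Y (C/T), K (G/T) <-> M (A/C), B (C/G/T) <-> V (A/C/G),
# D (A/G/T) <-> H (A/C/T); S, W and N are their own complements.
_IUPAC = "ACGTRYSWKMBDHVN"
_IUPAC_COMP = "TGCAYRSWMKVHDBN"
_TABLE = str.maketrans(_IUPAC + _IUPAC.lower(),
                       _IUPAC_COMP + _IUPAC_COMP.lower())
_ALLOWED = set(_IUPAC + _IUPAC.lower())


def reverse_complement_iupac(seq: str) -> str:
    """Reverse-complement a DNA sequence that may contain ambiguity codes."""
    unexpected = set(seq) - _ALLOWED
    if unexpected:
        # Fail loudly instead of silently passing unknown symbols through.
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    return seq.translate(_TABLE)[::-1]


print(reverse_complement_iupac("ATNGR"))  # YCNAT
```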

12.3 years ago
Gww ★ 2.7k

Perhaps you could build a small database of known sequences and then align your query sequences against it to determine their orientation. You would need to choose an aligner suited to the length of your queries, but that shouldn't be hard to find. If a query turns out to be in the wrong orientation, you could then reverse complement it prior to the multiple sequence alignment.
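
As a rough illustration of this idea, the Python sketch below orients each query against a set of reference sequences. Note that it swaps in a much cruder technique than a real aligner: it simply counts shared k-mers in each orientation. The function names, the default `k=8`, and the example sequences are all hypothetical, and for divergent elements a proper aligner would be the safer choice.

```python
# Complement table; N is kept as N so unknown bases do not break the flip.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Reverse complement of an A/C/G/T/N sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmers(seq, k):
    """Set of all overlapping k-mers in seq (case-insensitive)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def orient_to_references(query, references, k=8):
    """Return (sequence, was_flipped), choosing the orientation of `query`
    that shares more k-mers with the reference sequences.
    Ties keep the original orientation."""
    ref_kmers = set()
    for ref in references:
        ref_kmers |= kmers(ref, k)
    forward_hits = len(kmers(query, k) & ref_kmers)
    reverse_hits = len(kmers(reverse_complement(query), k) & ref_kmers)
    if reverse_hits > forward_hits:
        return reverse_complement(query), True
    return query, False


# Hypothetical example: the query is a reverse-complemented piece of the reference.
reference = "ATGGCGTACGTTAGCCTAGGATCCGATTACA"
query = reverse_complement(reference[5:25])
oriented, flipped = orient_to_references(query, [reference], k=6)
print(flipped, oriented)
```

The oriented sequences can then be written back to a FASTA file and passed to the multiple aligner as usual.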

12.3 years ago

I haven't used any aligner that does this, although I've read that the Guidance server has a Heads-or-Tails (HoT) algorithm that works with reversed sequences. What I would do is forget about ClustalW (it has certainly done its job, but there are better aligners out there now) and use MAFFT to align two sets: your ~500 sequences, and another ~500 made by reversing them. Whichever set gives the higher alignment scores is the one to focus on, and once you know the sequence orientation you can study that set in more depth.
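
As a side note for readers trying the MAFFT route: MAFFT (version 7) itself provides `--adjustdirection` and `--adjustdirectionaccurately` options that reverse-complement sequences where needed before aligning. The Python sketch below just wraps that call. It assumes a `mafft` executable is on the PATH, and the helper names and file names are hypothetical.

```python
import subprocess


def mafft_command(fasta_path, accurate=False):
    """Build the MAFFT command line as a list of arguments."""
    # --adjustdirectionaccurately is slower; it is meant for divergent sequences.
    flag = "--adjustdirectionaccurately" if accurate else "--adjustdirection"
    return ["mafft", flag, fasta_path]


def run_mafft(fasta_path, out_path, accurate=False):
    """Run MAFFT and write the alignment (which it prints to stdout) to out_path."""
    with open(out_path, "w") as out:
        subprocess.run(mafft_command(fasta_path, accurate), stdout=out, check=True)


# Example call (needs MAFFT installed and an existing input file):
# run_mafft("sequences.fasta", "aligned.fasta")
```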

2.3 years ago
onestop_data ▴ 330

This blog post shows an easy way to create a multiple sequence alignment (MSA) that is aware of forward and reverse-complement orientations.

https://onestopdataanalysis.com/multiple-sequence-alignment-msa-reverse-complement/
