Retrieving Multiple Sequences For Protein Alignments
12.3 years ago
Eric ▴ 90

I would like to run multiple alignments for proteins in ~20 mammalian species. At the moment, I am retrieving the sequences manually from Ensembl and entering them in ClustalW2. Is there a more efficient way to retrieve and align the sequences? Any help is appreciated.

Thanks, Eric

multiple protein sequence ensembl clustalw • 2.9k views

An aside: if you're aligning proteins, use Clustal Omega instead of ClustalW2. It's faster and produces higher-quality alignments.

12.3 years ago

If you only need to retrieve existing alignments from Ensembl, you can use the data dumps or the Perl API. If you need to add extra sequences to an existing alignment, you can do that with PAGAN.

12.2 years ago
Biojl ★ 1.7k

The easiest way is to download the FASTA sequences for all the species you need from the Ensembl FTP server. You can then load them into a Python dictionary, a Perl hash, or a similar structure, and feed the alignment program only the sequences you want to align in each run. It's fast and efficient. I do this all the time with Python + mafft (or prank-F), but it can be implemented with other programming languages and/or alignment programs.
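
For anyone who wants a starting point, the approach above might look roughly like the sketch below. It is not the exact script I use, just a minimal illustration using only the standard library. The function names, the `species|id` header format in the output, and the choice of keying each sequence by the first word of its FASTA header are all assumptions for the example; check the headers in your own downloaded files. The `mafft --auto` call assumes mafft is installed and on your PATH.

```python
import gzip
import subprocess


def read_fasta(path):
    """Return {sequence_id: sequence} for one FASTA file (plain or .gz).

    Assumption: the sequence ID is the first whitespace-delimited
    token of each header line.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    records = {}
    seq_id = None
    with opener(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                seq_id = line[1:].split()[0]
                records[seq_id] = []
            elif seq_id is not None:
                records[seq_id].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


def load_proteomes(fasta_by_species):
    """Map species name -> {sequence_id: sequence}."""
    return {species: read_fasta(path) for species, path in fasta_by_species.items()}


def write_alignment_input(proteomes, wanted, out_path):
    """Write the requested sequences to one FASTA file.

    `wanted` maps species name -> sequence ID (hypothetical layout).
    Missing species or IDs are skipped. Returns the number written.
    """
    written = 0
    with open(out_path, "w") as out:
        for species, seq_id in wanted.items():
            seq = proteomes.get(species, {}).get(seq_id)
            if seq is None:
                continue
            out.write(f">{species}|{seq_id}\n{seq}\n")
            written += 1
    return written


def run_mafft(in_path, out_path):
    """Align in_path with mafft and save the aligned FASTA to out_path."""
    with open(out_path, "w") as out:
        subprocess.run(["mafft", "--auto", str(in_path)], stdout=out, check=True)
```

Typical use would be to call `load_proteomes` once with your ~20 files, then loop over your proteins of interest, calling `write_alignment_input` and `run_mafft` for each one. Loading each proteome once and reusing the dictionaries is what makes the repeated runs fast.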
