Multiple sequence alignment for sequences in more than 4000 files
2
1
5.4 years ago
utkarsh.sood ▴ 40

Hi,

I have more than 4000 files from a bacterial community, each containing one orthologous gene cluster. How can I perform multiple sequence alignment on all of the files in a fast and efficient manner? Should I use ClustalW? If yes, how?

Thanks!

sequence orthologs clustalW • 2.6k views
0

As long as the individual files are not large, you should be able to use any MSA program. So you are essentially looking to run 4000 separate MSAs? A cluster would be the way to go for a job this large.

This paper discusses acceleration of MSAs, but it requires special hardware that you may not have.

0

I have access to a server with 24 cores and 128 GB of RAM. Can I create multiple ClustalW alignments for thousands of FASTA files in a directory? The input would be 1.fasta, 2.fasta ... 6405.fasta, where a given file commonly contains 14 or more proteins.

The output would be 1.aln, 2.aln ... 6405.aln.

1
5.4 years ago
Medhat 9.0k

If your data is big, you can use Kalign, a very fast MSA tool that concentrates on local regions. It is suitable for large alignments.

Kalign was about 10 times faster than ClustalW and, depending on the alignment size, up to 50 times faster than popular iterative methods

0

Thanks for your help! Do I have to upload the 4000 files separately (as each file contains multiple sequences for one protein)?

0

If you need to align each file separately, yes. If you want to align them all at the same time, you can concatenate them into one file. For example, you can run this command in the directory that contains all the files:

cat *.fasta > files_concat.fasta

0

Web tool owners may not like you uploading 4000 files (if that is possible in the first place). As indicated by @medhat, you should download the tool and run it locally. You should be able to do that, given the hardware you described in your reply to my post above.
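
For running a tool locally on every file, a plain shell loop is usually enough. The sketch below is only an illustration, not a tested pipeline: `align_dir` and `ALIGN_CMD` are hypothetical names, and it assumes the aligner takes the FASTA path as its last argument and writes the alignment to standard output (MAFFT works this way; other programs such as ClustalW use their own input/output options and would need a different command line).

```shell
#!/bin/sh
# ALIGN_CMD is an assumption: any command that accepts a FASTA path as its
# last argument and prints the alignment to stdout. Override it as needed.
ALIGN_CMD="${ALIGN_CMD:-mafft --auto --quiet}"

# Hypothetical helper: one alignment per *.fasta file in the given directory,
# written next to the input as <name>.aln (1.fasta -> 1.aln, and so on).
align_dir() {
    dir=$1
    for f in "$dir"/*.fasta; do
        [ -e "$f" ] || continue          # nothing matched the glob
        out="${f%.fasta}.aln"
        # ALIGN_CMD is left unquoted on purpose so its options are split.
        $ALIGN_CMD "$f" > "$out" || echo "alignment failed: $f" >&2
    done
}
```

Calling `align_dir /path/to/clusters` would then process the files one after another. A failed run leaves an empty or partial `.aln` file behind, so the messages on standard error are worth checking.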

1
5.4 years ago
Rob ▴ 120

If you need really fast alignment, I suggest you try MAFFT (http://mafft.cbrc.jp/alignment/software/); it has some really fast and accurate methods for that. But if your files are small, you can use ClustalW, Clustal Omega, MUSCLE, T-Coffee or whatever you prefer.
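
Since each file is small, a multi-core server may be better used by aligning several files at once than by speeding up a single alignment. Here is a rough sketch of that idea with `find` and `xargs -P`. The names `align_parallel` and `ALIGN_CMD` are hypothetical, the `-print0`, `-0`, `-maxdepth` and `-P` options assume GNU or BSD `find`/`xargs`, and the default command assumes MAFFT is installed and on the `PATH`. MAFFT writes FASTA-formatted alignments by default; adding its `--clustalout` option should give Clustal-format output if that is what the `.aln` files need to be.

```shell
#!/bin/sh
# ALIGN_CMD is an assumption: a command that takes a FASTA path as its last
# argument and prints the alignment to stdout. It is exported so that the
# child shells started by xargs can see it.
ALIGN_CMD="${ALIGN_CMD:-mafft --auto --quiet}"
export ALIGN_CMD

# Hypothetical helper: align every *.fasta file in a directory,
# running up to NJOBS aligner processes at the same time.
align_parallel() {
    dir=$1
    njobs=${2:-8}
    find "$dir" -maxdepth 1 -type f -name '*.fasta' -print0 |
        xargs -0 -n 1 -P "$njobs" sh -c '
            # $1 is one input file; the output is written as <name>.aln
            $ALIGN_CMD "$1" > "${1%.fasta}.aln"
        ' sh
}
```

For example, `align_parallel /path/to/clusters 20` would keep up to 20 alignments running at a time. The best job count depends on the machine and on whether the aligner itself is multi-threaded.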