Question: Multiple sequence alignment for sequences in more than 4000 files
2.9 years ago by
utkarsh.sood30
India
utkarsh.sood30 wrote:

Hi,

I have more than 4000 files from a bacterial community, each containing one orthologous gene cluster. How can I perform multiple sequence alignment on all of the files in a fast and efficient manner? Should I use ClustalW, and if so, how?

Thanks!

clustalw sequence orthologs • 1.3k views
modified 2.9 years ago by Rob90 • written 2.9 years ago by utkarsh.sood30

As long as the individual files are not large, you should be able to use any MSA program. Basically, you are looking to do 4000 separate MSAs? A cluster would be the way to go for something this large.

This paper discusses acceleration of MSAs, but it requires special hardware that you may not have.

written 2.9 years ago by genomax68k

I have access to a server with 24 cores and 128 GB of RAM. Can I create multiple ClustalW alignments for thousands of FASTA files in a directory? The input would be: 1.fasta, 2.fasta ... 6405.fasta, where a given file commonly contains 14 or more proteins.

The output would be: 1.aln, 2.aln ... 6405.aln
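A minimal sketch of how such a batch run could look, mapping each N.fasta to N.aln and running several alignments at once. The executable name `clustalw2`, the directory names `clusters` and `alignments`, and the helper names are assumptions for illustration; the `-INFILE`, `-ALIGN` and `-OUTFILE` options are ClustalW's command-line flags, but check them against your installed version.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
import subprocess


def build_command(fasta, out_dir, aligner="clustalw2"):
    """Return the ClustalW command that aligns one FASTA file into out_dir/<stem>.aln."""
    out_file = Path(out_dir) / (Path(fasta).stem + ".aln")
    # Assumption: the ClustalW 2.x binary is on PATH under the name given in `aligner`.
    return [aligner, f"-INFILE={fasta}", "-ALIGN", f"-OUTFILE={out_file}"]


def plan_jobs(in_dir, out_dir):
    """Build one command per *.fasta file, skipping inputs that already have an .aln file."""
    jobs = []
    for fasta in sorted(Path(in_dir).glob("*.fasta")):
        if not (Path(out_dir) / (fasta.stem + ".aln")).exists():
            jobs.append(build_command(fasta, out_dir))
    return jobs


def run_all(jobs, workers=24):
    """Run the commands concurrently and return those that exited with a non-zero status."""
    def run(cmd):
        result = subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        return cmd, result.returncode

    # Threads are enough here: each worker only waits on an external process.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return [cmd for cmd, code in pool.map(run, jobs) if code != 0]


if __name__ == "__main__":
    in_dir, out_dir = Path("clusters"), Path("alignments")  # hypothetical directories
    out_dir.mkdir(exist_ok=True)
    failed = run_all(plan_jobs(in_dir, out_dir))
    print(f"{len(failed)} alignments failed")
```

Because finished outputs are skipped, the script can simply be rerun after an interruption. The default of 24 workers matches the core count mentioned above; lower it if the server is shared.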

ADD REPLYlink written 2.9 years ago by utkarsh.sood30
2.9 years ago by
Medhat8.3k
Texas
Medhat8.3k wrote:

If your data is big, you can use Kalign, a very fast MSA tool that concentrates on local regions and is suitable for large alignments.

Kalign was about 10 times faster than ClustalW and, depending on the alignment size, up to 50 times faster than popular iterative methods

modified 2.9 years ago • written 2.9 years ago by Medhat8.3k

Thanks for your help! Do I have to upload the 4000 files separately (as each file contains multiple sequences for one protein)?

written 2.9 years ago by utkarsh.sood30

If you need to align each file separately, yes. If you want to align them all at the same time, you can concatenate them into one file. For example, you can run this command in the directory that contains all the files:

cat *.fasta > files_concat.fasta

You can also download the tool and use it locally.
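For the concatenation step above, a small Python sketch may be safer than plain `cat` in two cases: when an input file lacks a trailing newline (which would glue the last sequence onto the next header), and when the command is rerun with the merged file already sitting in the same directory. The function name `concat_fasta` is hypothetical.

```python
from pathlib import Path


def concat_fasta(in_dir, out_path):
    """Concatenate every *.fasta file in in_dir into out_path; return how many files were merged."""
    out_path = Path(out_path)
    # Collect the inputs before opening the output, and never feed the output back into itself.
    inputs = [
        p for p in sorted(Path(in_dir).glob("*.fasta"))
        if p.resolve() != out_path.resolve()
    ]
    with out_path.open("w") as out:
        for fasta in inputs:
            text = fasta.read_text()
            out.write(text)
            # Make sure the next file's header starts on its own line.
            if text and not text.endswith("\n"):
                out.write("\n")
    return len(inputs)


if __name__ == "__main__":
    merged = concat_fasta(".", "files_concat.fasta")
    print(f"merged {merged} files")
```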

modified 2.9 years ago • written 2.9 years ago by Medhat8.3k

Web tool owners may not like you uploading 4000 files (if that is possible in the first place). As indicated by @medhat, you should download the tool and use it locally. You should be able to do so, given the hardware you described in your response to my post above.

modified 2.9 years ago • written 2.9 years ago by genomax68k
2.9 years ago by
Rob90
Rob90 wrote:

If you need really fast alignment, I suggest you try MAFFT (http://mafft.cbrc.jp/alignment/software/); it has some really fast and accurate methods for that. But if your files are small, you can use ClustalW, Clustal Omega, MUSCLE, T-Coffee, or whatever you prefer.

modified 2.9 years ago • written 2.9 years ago by Rob90