Multiple sequence alignment
5
0
6.1 years ago

I have a set of 120 sequence files. I need to align each of them against the wild-type protein sequence, and the results must all end up together in a single file.

sequence alignment blast • 2.6k views
1

You need to clarify whether all the sequence files in question are protein.
It sounds like your sequence files may be DNA. If so, you will need to do some additional work (translation) before any of the packages mentioned in the various answers can be used.
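A quick way to check what the files contain, as a rough sketch only: the helper below (`guess_seq_type` is a made-up name, and the 90% cutoff is an arbitrary assumption) counts how many residue letters are A, C, G, T, U or N. A short or low-complexity protein could be misclassified, so treat the answer as a hint rather than a verdict.

```shell
# Hypothetical helper: prints "dna" when at least 90% of the residue
# letters in a FASTA file are A, C, G, T, U or N, otherwise "protein".
guess_seq_type() {
    awk '
        /^>/ { next }                      # skip FASTA header lines
        {
            seq = toupper($0)
            gsub(/[^A-Z]/, "", seq)        # drop gaps, digits, whitespace, "*"
            total += length(seq)
            gsub(/[ACGTUN]/, "", seq)      # whatever remains is not a nucleotide code
            other += length(seq)
        }
        END {
            if (total == 0) { print "empty"; exit }
            if ((total - other) / total >= 0.9) print "dna"
            else print "protein"
        }
    ' "$1"
}

# Example use over a directory of files (file names are assumptions):
#   for f in *.fasta; do printf '%s\t%s\n' "$f" "$(guess_seq_type "$f")"; done
```

Anything reported as "dna" would need translating before a protein alignment.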

1
6.1 years ago
gearoid ▴ 200

If you have 120 FASTA files with one sequence each, and another with your wild type (and you're using Linux/OS X), first use cat to concatenate all the sequences into one file, e.g.

cat seq1.fasta seq2.fasta ... seqN.fasta > all_my_sequences.fa


or

cat *.fasta > all_my_sequences.fa


Then go to the EBI Clustal Omega server and upload all_my_sequences.fa, or paste the contents of the file in the box. Change the output format to whatever you want (Clustal format is probably better for humans and Pearson/FASTA for computers), then just click submit.
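The concatenation step can be sketched a little more defensively. The version below is only an illustration with toy data: the directory layout, the file names and the choice to put the wild type first are all assumptions. It uses `awk 1` instead of `cat` so that a file missing its final newline cannot glue its last sequence line onto the next header, and it writes the output outside the input directory.

```shell
set -eu

# Toy layout (hypothetical names): one wild type plus a directory of variants.
work=$(mktemp -d)
mkdir "$work/seqs"
printf '>wild_type\nMKTAYIAKQR\n' > "$work/wild_type.fasta"
printf '>mutant1\nMKTAYIAKQK\n'   > "$work/seqs/mutant1.fasta"
printf '>mutant2\nMKTAYLAKQR'     > "$work/seqs/mutant2.fasta"   # no trailing newline
printf '>mutant3\nMKSAYIAKQR\n'   > "$work/seqs/mutant3.fasta"

# "awk 1" prints every input line followed by a newline, so each header
# is guaranteed to start on its own line in the combined file.
awk 1 "$work/wild_type.fasta" "$work"/seqs/*.fasta > "$work/all_my_sequences.fa"

# Sanity check: one header per input file, wild type first.
n_records=$(grep -c '^>' "$work/all_my_sequences.fa")
first_header=$(head -n 1 "$work/all_my_sequences.fa")
echo "$n_records records, first is $first_header"
```

With 120 real files plus the wild type, the record count should come out as 121 if every file holds exactly one sequence.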

1

I'm afraid this line

cat *.fa > all_my_sequences.fa

is dangerous. It's better to do something like

cat *.fa > all_my_sequences.txt

or

cat *.fa > all_my_sequences.fasta

I also like to use MAFFT for the multiple alignment:

http://mafft.cbrc.jp/alignment/software/

1

Can you elaborate on why it's dangerous? I guess you can only run that command once, is that what you mean?

I like MAFFT, but in this case I would probably want to use MAFFT L-INS-i or G-INS-i, rather than the default MAFFT parameters, and I just tried to give the simplest option I could think of (no software installation or changing parameters on the web server).

T-Coffee might also be a good option for this number of sequences.

1

I've made this mistake several times: cat picks up every *.fa file, including the output file from an earlier run, which is why the output file's extension should be different.
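A small demonstration of the problem, using throwaway files in a temporary directory (all names here are made up for the demo). In a POSIX shell the glob is expanded before the redirection creates the output file, so the first run reads only the inputs, but afterwards the output itself matches `*.fa`.

```shell
set -eu

demo_dir=$(mktemp -d)
cd "$demo_dir"
printf '>seq1\nMKTAYIAK\n' > seq1.fa
printf '>seq2\nMKTAYLAK\n' > seq2.fa

# First run: *.fa expands to the two inputs only.
cat *.fa > all_my_sequences.fa

# The output now matches the same glob, so a second run would try to read it too.
n_matching=$(ls *.fa | wc -l | tr -d ' ')
echo "files matching *.fa after the first run: $n_matching"

# Using a different extension keeps the output out of the glob.
rm all_my_sequences.fa
cat *.fa > all_my_sequences.fasta
cat *.fa > all_my_sequences.fasta      # safe to repeat
n_records=$(grep -c '^>' all_my_sequences.fasta)
echo "records in combined file: $n_records"
```

Writing the output to a different directory would work just as well as changing the extension.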

The full MAFFT command is a long string of parameters. It supports many refinement iterations, which is sometimes useful.

It would look like:

mafft-7.215-with-extensions/bin/mafft --localpair --maxiterate 1000 --ep 0.123 --legacygappenalty initial_file.fasta > align.fa
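To make the long parameter string easier to read: `--localpair --maxiterate 1000` is the combination the MAFFT documentation calls L-INS-i, and `--globalpair --maxiterate 1000` is G-INS-i. The helper below is only a sketch (`mafft_flags` and the strategy names are made up here); it maps a short name to those flags and does not run MAFFT itself.

```shell
# Hypothetical helper: translate a short strategy name into MAFFT flags.
mafft_flags() {
    case "$1" in
        linsi) echo "--localpair --maxiterate 1000" ;;   # L-INS-i
        ginsi) echo "--globalpair --maxiterate 1000" ;;  # G-INS-i
        auto)  echo "--auto" ;;                          # let MAFFT choose a strategy
        *)     echo "unknown strategy: $1" >&2; return 1 ;;
    esac
}

# Example use, assuming mafft is on PATH and the input file exists.
# The $(...) is deliberately unquoted so the flags split into separate words:
#   mafft $(mafft_flags linsi) initial_file.fasta > align.fa
echo "mafft $(mafft_flags linsi) initial_file.fasta > align.fa"
```

The extra `--ep` and `--legacygappenalty` options from the command above are left out of the sketch; add them back if you need the same gap behaviour.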

1

I think it works as long as the file that you're writing to doesn't exist already, but you're right, it's sloppy, so I updated my answer.

The long string of parameters is why I didn't recommend MAFFT for this question; I was just trying to keep it simple. It's a great option, though.

1

Or perhaps:

find /the/dir/where/the/seqs/are/ -maxdepth 1 -type f -iname "*.fa" | xargs cat | muscle -in - -out aligned.fa
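If any file names contain spaces, the plain `find | xargs` pipe splits them into separate words. A variant using `-print0` and `xargs -0` (extensions available in GNU and BSD findutils, not strict POSIX) avoids that. The sketch below only does the concatenation step on toy files with made-up names; the combined file can then be handed to whichever aligner you prefer.

```shell
set -eu

seq_dir=$(mktemp -d)
out_dir=$(mktemp -d)
printf '>a\nMKT\n' > "$seq_dir/sample one.fa"      # name with a space
printf '>b\nMKV\n' > "$seq_dir/sample_two.FA"      # upper-case extension, matched by -iname
printf 'not a sequence\n' > "$seq_dir/notes.txt"   # ignored by the pattern

# NUL-separated names survive spaces; the output goes to a different
# directory so it can never be picked up as an input.
find "$seq_dir" -maxdepth 1 -type f -iname '*.fa' -print0 \
    | xargs -0 cat > "$out_dir/combined.fasta"

n_records=$(grep -c '^>' "$out_dir/combined.fasta")
echo "records found: $n_records"
```

Note that `find` returns files in no particular order, so the wild type will not necessarily be the first sequence in the combined file.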

1
6.1 years ago

If you're working on Windows, you can try the BioEdit tool.

0
6.1 years ago
Benn 8.3k

Did you try ClustalW? http://www.clustal.org/clustal2/

0
6.1 years ago
agata88 ▴ 840

For protein alignments they recommend Clustal Omega.

0
6.0 years ago
Suzanne ▴ 80

Jalview (www.jalview.org.uk) is a versatile, free tool for MSA that can run all the main MSA algorithms. See their Jalview Online Training videos on YouTube for more information. It also has integrated structure, annotation, PCA and tree windows.