Question: Multiple sequence alignment
0
3.2 years ago by
saravanakumar99250 wrote:

I have a set of 120 sequence files. I need to align each of these 120 sequences against the wild-type protein sequence, and the results must all end up together in a single file.

Please help me.

blast alignment sequence • 1.4k views
ADD COMMENTlink modified 3.1 years ago by Suzanne60 • written 3.2 years ago by saravanakumar99250
1

You need to clarify if all sequence files in question are protein.
It sounds like your sequence files may be DNA. If so, you will need to do some additional work (translation) before any of the packages mentioned in the various answers can be used.
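
A quick way to check is to look at the residue composition of each file. The sketch below is only a heuristic and not part of any package mentioned in this thread: the function name `looks_like_dna`, the demo files, and the 10% cutoff are all assumptions made for illustration.

```shell
# Heuristic guess at whether a FASTA file holds nucleotide or protein sequence.
# looks_like_dna is a hypothetical helper; the 10% cutoff is an arbitrary assumption.
looks_like_dna() {
    # Drop header lines, strip whitespace, and upper-case the residues.
    residues=$(grep -v '^>' "$1" | tr -d ' \t\r\n' | tr '[:lower:]' '[:upper:]')
    total=${#residues}
    # Whatever is left after deleting nucleotide letters is "other".
    other=$(printf '%s' "$residues" | tr -d 'ACGTUN')
    if [ "$total" -gt 0 ] && [ $(( ${#other} * 10 )) -lt "$total" ]; then
        echo dna
    else
        echo protein
    fi
}

# Made-up demo files, just to exercise the function.
demo_dir=$(mktemp -d)
printf '>demo_protein\nMKTAYIAKQR\n' > "$demo_dir/protein.fa"
printf '>demo_dna\nATGGCCAAGT\n' > "$demo_dir/dna.fa"

protein_guess=$(looks_like_dna "$demo_dir/protein.fa")
dna_guess=$(looks_like_dna "$demo_dir/dna.fa")
echo "protein.fa: $protein_guess, dna.fa: $dna_guess"
```

Short or low-complexity protein sequences can fool a check like this, so treat the result as a hint rather than an answer.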

ADD REPLYlink modified 3.2 years ago • written 3.2 years ago by genomax68k
1
3.2 years ago by
gearoid200
gearoid200 wrote:

If you have 120 FASTA files with one sequence each, and another with your wild type (and you're using Linux/OS X), first use cat to concatenate all the sequences into one file, e.g.

cat seq1.fasta seq2.fasta ... seqN.fasta > all_my_sequences.fa

or

cat *.fasta > all_my_sequences.fa

Then go to the EBI Clustal Omega server and upload all_my_sequences.fa, or paste the contents of the file in the box. Change the output format to whatever you want (Clustal format is probably better for humans and Pearson/FASTA for computers), then just click submit.
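
If you would rather run Clustal Omega locally than use the web server, a minimal sketch looks like the following. It assumes the `clustalo` command-line program is installed and on your PATH; the demo sequences and file names are made up, and the script simply skips the alignment when `clustalo` is not found.

```shell
# Made-up demo input; replace with your own concatenated FASTA file.
demo_dir=$(mktemp -d)
printf '>wild_type\nMKTAYIAKQR\n>mutant_1\nMKTAYLAKQR\n>mutant_2\nMKTAYIAKQK\n' \
    > "$demo_dir/all_my_sequences.fa"

if command -v clustalo >/dev/null 2>&1; then
    # -i: input file, -o: output file, --outfmt=clu: Clustal format,
    # --force: overwrite the output file if it already exists.
    clustalo -i "$demo_dir/all_my_sequences.fa" -o "$demo_dir/aligned.aln" \
        --outfmt=clu --force
    clustalo_status=ran
else
    clustalo_status=skipped
fi
echo "clustalo: $clustalo_status"
```

Use `--outfmt=fa` instead if you want Pearson/FASTA output for downstream tools.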

ADD COMMENTlink modified 3.2 years ago • written 3.2 years ago by gearoid200
1

I am afraid this line

cat *.fa > all_my_sequences.fa

is dangerous. It's better to do something like

cat *.fa > all_my_sequences.txt

or

cat *.fa > all_my_sequences.fasta

And I like to use Mafft for the multiple alignment:

http://mafft.cbrc.jp/alignment/software/

ADD REPLYlink modified 3.2 years ago • written 3.2 years ago by natasha.sernova3.5k
1

Can you elaborate on why it's dangerous? I guess you can only run that command once, is that what you mean?

I like MAFFT, but in this case I would probably want to use MAFFT L-INS-i or G-INS-i, rather than the default MAFFT parameters, and I just tried to give the simplest option I could think of (no software installation or changing parameters on the web server).

T-Coffee might also be a good option for this number of sequences.

ADD REPLYlink written 3.2 years ago by gearoid200
1

I've made this mistake several times: cat will read all *.fa files, including the output file, which is why the output file's extension should be different.
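
Here is a small sketch of that safer pattern. The demo sequences and file names are made up; the point is only that the output name cannot be matched by the input glob, so the command can be rerun without the output being fed back in as input.

```shell
# Made-up demo files; replace with your own FASTA files.
work_dir=$(mktemp -d)
cd "$work_dir" || exit 1
printf '>wild_type\nMKTAYIAKQR\n' > wild_type.fa
printf '>mutant_1\nMKTAYLAKQR\n' > mutant_1.fa

# The output ends in .fasta, so the *.fa glob never matches it.
cat ./*.fa > all_my_sequences.fasta
# Running the same command a second time gives the same result.
cat ./*.fa > all_my_sequences.fasta

# Sanity check: count the FASTA header lines in the combined file.
n_seqs=$(grep -c '^>' all_my_sequences.fasta)
echo "combined sequences: $n_seqs"
```

It is also worth checking that every input file ends with a newline; otherwise the next file's header line gets glued onto the end of the previous sequence.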

The full MAFFT command is a long string with different parameters. It allows many iterations, which is sometimes useful.

It would look like:

mafft-7.215-with-extensions/bin/mafft --localpair --maxiterate 1000 --ep 0.123 --legacygappenalty initial_file.fasta > align.fa
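
The `--localpair --maxiterate 1000` pair in that command is what MAFFT documents as the L-INS-i strategy mentioned earlier in this thread; G-INS-i is `--globalpair --maxiterate 1000`. The sketch below wraps that mapping in a small helper. The function name `mafft_flags` and the file names are hypothetical, and the script only prints the command when `mafft` or the input file is missing.

```shell
# Map a strategy name to the flags MAFFT documents for it.
# mafft_flags is a hypothetical helper, not part of MAFFT itself.
mafft_flags() {
    case "$1" in
        linsi) echo "--localpair --maxiterate 1000" ;;
        ginsi) echo "--globalpair --maxiterate 1000" ;;
        *)     echo "--auto" ;;   # let MAFFT choose a strategy itself
    esac
}

# Hypothetical file names; adjust to your own data.
input=all_my_sequences.fasta
output=aligned.fasta

if command -v mafft >/dev/null 2>&1 && [ -f "$input" ]; then
    # The substitution is deliberately unquoted so it splits into separate flags.
    mafft $(mafft_flags linsi) "$input" > "$output"
else
    echo "mafft or $input not found; would run: mafft $(mafft_flags linsi) $input > $output"
fi
```

MAFFT installations usually also provide `mafft-linsi` and `mafft-ginsi` wrapper commands that set the same flags.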

ADD REPLYlink modified 3.2 years ago • written 3.2 years ago by natasha.sernova3.5k
1

I think it works as long as the file that you're writing to doesn't exist already, but you're right, it's sloppy--I updated my answer.

The long string of parameters is why I didn't recommend MAFFT for this question, I was just trying to keep it simple. It's a great option, though.

ADD REPLYlink modified 3.2 years ago • written 3.2 years ago by gearoid200
1

Or perhaps:

find /the/dir/where/the/seqs/are/ -maxdepth 1 -type f -iname "*.fa" | xargs cat | muscle -in - -out aligned.fa
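
A variant of the same idea that also copes with spaces in file names is sketched below. It assumes a find and xargs that support `-print0` and `-0` (GNU and BSD versions do); the directories and demo files are made up. The alignment step is left out so the snippet runs without MUSCLE installed.

```shell
# Made-up demo files, including names with spaces and an upper-case extension.
seq_dir=$(mktemp -d)
printf '>wild_type\nMKTAYIAKQR\n' > "$seq_dir/wild type.fa"
printf '>mutant_1\nMKTAYLAKQR\n' > "$seq_dir/mutant 1.FA"

# Write the combined file outside the searched directory so it is never an input.
out_dir=$(mktemp -d)

# -print0 / -0 pass file names NUL-separated, so spaces do not split them.
find "$seq_dir" -maxdepth 1 -type f -iname '*.fa' -print0 \
    | xargs -0 cat > "$out_dir/combined.fasta"

n_combined=$(grep -c '^>' "$out_dir/combined.fasta")
echo "combined sequences: $n_combined"
```

The combined file can then be passed to the aligner. Note that the `-in`/`-out` options in the one-liner above are MUSCLE v3 syntax; as far as I know, MUSCLE v5 uses `-align` and `-output` instead.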
ADD REPLYlink modified 3.2 years ago • written 3.2 years ago by 5heikki8.4k
1
3.2 years ago by
Student ,School of life sciences, Manipal University, Manipal, India
kapil.joshi03680 wrote:

If you're working on Windows, you can try the BioEdit tool.

ADD COMMENTlink written 3.2 years ago by kapil.joshi03680
0
3.2 years ago by
Benn6.9k
Netherlands
Benn6.9k wrote:

Did you try ClustalW? http://www.clustal.org/clustal2/

ADD COMMENTlink written 3.2 years ago by Benn6.9k
0
3.2 years ago by
agata88790
Poland
agata88790 wrote:

For protein alignments they recommend Clustal Omega.

ADD COMMENTlink written 3.2 years ago by agata88790

http://www.clustal.org/omega/

ADD REPLYlink written 3.2 years ago by Benn6.9k
0
3.1 years ago by
Suzanne60
Dundee, Scotland
Suzanne60 wrote:

Jalview (www.jalview.org.uk) is a versatile free tool for MSA that can run all the main MSA algorithms. Look at their YouTube Jalview Online Training videos for more information. It also has integrated structure, annotation, PCA and tree windows.

ADD COMMENTlink written 3.1 years ago by Suzanne60