Question: Difficulty with Muscle Sequence Alignment
2.5 years ago, rc16955 wrote:

Hi all,

I am trying to perform an alignment of 11 very long (whole chromosome) DNA sequences using Muscle 3.8.31 for Linux, but am encountering a problem.

This is my input:

muscle-3.8.1 -diags -in S_ratti_Chr2.fa -out S_ratti_Chr2.afa

The code seems to run for a while, but then I get the following:

    S_ratti_Chr2 11 seqs, max length 16759905, avg  length 16759518
00:48:36  1053 MB(65%)  Iter   1    1.52%  K-mer dist pass 1
00:48:47  1053 MB(65%)  Iter   1  100.00%  K-mer dist pass 1
00:48:47  1053 MB(65%)  Iter   1    1.52%  K-mer dist pass 2
00:48:47  1053 MB(65%)  Iter   1  100.00%  K-mer dist pass 2
00:49:07  24533 MB(100%)  Iter   1   10.00%  Align node       
/cm/local/apps/torque/4.2.4.1/spool/mom_priv/jobs/4974615.master.cm.cluster.SC: line 15: 95045 Killed                  muscle-3.8.1 -diags -in S_ratti_Chr2.fa -out S_ratti_Chr2_2.afa

No output file is generated. I am very new to bioinformatics and this is the first time I have attempted to use Muscle (or any aligning method); could anyone help me understand why the command is failing? Lay terminology would be greatly appreciated as I am also not very familiar with coding. I'd also like to hear of any other aligning methods if there are any generally considered better than Muscle.

Many thanks in advance.


See the Muscle manual; it may be useful. There is no limit on sequence length.

http://www.drive5.com/muscle/muscle.html

I would suspect some atypical symbol in the sequence (anything other than A, C, G, or T).

It did start running, though, and was killed during the first iteration...

Another, more recent alignment program is MAFFT. Its website is below.

http://mafft.cbrc.jp/alignment/software/

modified 2.5 years ago • written 2.5 years ago by natasha.sernova

Many thanks for your reply. I am not sure that the problem is an atypical character, as it seems to stop at different places each time I run it; for example, another attempt produced the report "line 15: 103937 Killed". Nevertheless, would you be able to recommend a way of identifying atypical characters in the fasta?

And thank you for linking to Mafft; I will look into it.

modified 2.5 years ago • written 2.5 years ago by rc16955
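
On the question of identifying atypical characters in the FASTA file, one option is a short Python script. The following is a minimal sketch, not something taken from the Muscle manual: the function name `count_atypical` is hypothetical, and treating lowercase bases as ordinary bases is an assumption made for illustration. It reports, per sequence, how often each character other than A, C, G, or T occurs.

```python
from collections import Counter
import io


def count_atypical(handle, allowed="ACGT"):
    """Count characters outside `allowed` in each record of a FASTA stream.

    Assumption: case is ignored, so lowercase a/c/g/t count as normal bases.
    Returns a dict mapping each sequence name to a Counter of odd characters.
    """
    allowed_set = set(allowed.upper())
    counts = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Header line: use the first word after '>' as the record name.
            fields = line[1:].split()
            name = fields[0] if fields else ""
            counts[name] = Counter()
            continue
        if name is None:
            raise ValueError("sequence data found before the first FASTA header")
        # Tally the whole line at once, then discard the expected bases.
        tally = Counter(line.upper())
        for base in allowed_set:
            tally.pop(base, None)
        counts[name].update(tally)
    return counts


# Small in-memory example; a real file would be opened with open("S_ratti_Chr2.fa").
example = io.StringIO(">seq1 demo\nACGTNN\nacgt-\n>seq2\nACGT\n")
for seq_name, odd in count_atypical(example).items():
    print(seq_name, dict(odd))
```

Note that `N` is common in assembled chromosomes, so finding it would not by itself explain the crash.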

Are you running it locally on your computer or on a cluster?

written 2.5 years ago by microfuge

Looks like this is being run on a cluster.

The first thing to check is how much memory is being consumed. I suspect that you are running out of it or hitting a quota.

written 2.5 years ago by genomax
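
One way to check memory use is to read it from the output already posted. In the log in the question, the reported figure rises from 1053 MB to 24533 MB(100%) on the last progress line before "Killed". The sketch below pulls the largest reported value out of such a log. The regular expression is inferred from the lines shown in the question rather than from any documented format, and `peak_memory` is a hypothetical helper name.

```python
import re

# Pattern inferred from the progress lines in the question (an assumption),
# e.g. "00:49:07  24533 MB(100%)  Iter   1   10.00%  Align node"
PROGRESS = re.compile(
    r"^\s*(\d+:\d{2}:\d{2})\s+(\d+)\s+MB\((\d+)%\)"
    r"\s+Iter\s+(\d+)\s+([\d.]+)%\s+(.*?)\s*$"
)


def peak_memory(log_lines):
    """Return the progress entry with the highest reported MB, or None."""
    peak = None
    for line in log_lines:
        match = PROGRESS.match(line)
        if not match:
            continue  # not a progress line (e.g. the "Killed" message)
        entry = {
            "time": match.group(1),
            "mb": int(match.group(2)),
            "percent": int(match.group(3)),  # as printed; meaning not assumed here
            "step": match.group(6),
        }
        if peak is None or entry["mb"] > peak["mb"]:
            peak = entry
    return peak


log = """\
00:48:36  1053 MB(65%)  Iter   1    1.52%  K-mer dist pass 1
00:48:47  1053 MB(65%)  Iter   1  100.00%  K-mer dist pass 2
00:49:07  24533 MB(100%)  Iter   1   10.00%  Align node
"""
print(peak_memory(log.splitlines()))
```

The peak value can then be compared with the memory requested for the cluster job. A bare "Killed" message usually means the process was terminated from outside, which would be consistent with a memory limit being hit.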