Question

Problem with MAFFT Multiple Sequence Alignment

0

3.3 years ago

pfee418 ▴ 10

Hi guys, I ran an MSA with MAFFT (FFT-NS-i method with two cycles) on my university server, which has 31.3 GB of memory and 31.9 GB of swap. My input file has 8579 sequences (all human coronaviruses) [~253 MB]. The job had been running for 5 days when it was suddenly killed not long ago (a few minutes after I logged out of the server). Here is the message that appeared:

/usr/local/bin/mafft: line 2604: 25726 Killed "$prefix/dvdr" -W $minimumweight $bunkatsuopt -E $fixthreshold -s $unalignlevel $legacygapopt $mergearg $outnum -C $numthreadsit -t $randomseed $rnaoptit $memopt $scorecalcopt $localparam -z 50 $seqtype $model -f "-"$gop -Q $spfactor -h $aof -I $iterate $weightopt $treeinopt $algoptit $treealg -p $parallelizationstrategy $scoreoutarg -K $nadd < pre > /dev/null 2>> "$progressfile"

I watched the server's status while the MSA job was running and realised that MAFFT takes a lot of memory for MSA (particularly with this iterative refinement method). Memory usage went as high as 97% (30.9G/31.3G). Today I saw swap usage go high as well; the highest was about 29-30+G/31.9G (I can't remember the exact number, but I think it was in this range).

I thought I might have accidentally killed the job before I detached the screen, but I clearly remember that I didn't kill it. If I had killed the job, the lines above would not have appeared. So I searched for the cause and saw on a forum that people think it could be a memory problem. I am not sure whether it is a memory issue. Hence, can anyone explain to me why my job/process got killed?

Thank you in advance for all the suggestions and explanations; I will appreciate all the responses. :)))

msa genome mafft alignment • 4.3k views

updated 11 months ago by Ram 43k • written 3.3 years ago by pfee418 ▴ 10

1

The MAFFT online server (https://mafft.cbrc.jp/alignment/server/) is a very good alternative.

3.3 years ago by MSRS ▴ 580

0

Hi there, I have tried the MAFFT online server, but my sequence data are too big to run there. My job got killed because of the time restriction: the MAFFT online server only allows a user's MSA job to run for 24 hours, after which the job is terminated automatically. This is why I've been using my university's server instead.

3.3 years ago by pfee418 ▴ 10

0

Hi, why not try this version (https://mafft.cbrc.jp/alignment/software/closelyrelatedviralgenomes.html)? You can find some information there. Thank you.

3.3 years ago by MSRS ▴ 580

score 2 · Answer 1 · 2020-12-22

You already asked about it before running this task, but I can't blame you for wanting to find out on your own. There can't be more than one or two multiple sequence alignment programs that can handle the number and length of sequences you are trying to align. MAFFT is not one of them, and it certainly doesn't help that you are working with relatively modest memory.

The bottom line is that your combination of RAM and swap is not enough for this task, so once the job reaches the limit there is nothing else for the system to do but kill it. Regardless of which program you choose in the end, I think you should plan for 128 GB of RAM and the same or double that in swap space.
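One way to confirm that the kernel killed the job for lack of memory is to look for out-of-memory (OOM) killer messages in the kernel log. A minimal sketch, assuming a Linux server whose kernel log you can read via `dmesg` or `journalctl -k` (this may need elevated permissions on a shared machine); `find_oom_kills` is a hypothetical helper name, not part of MAFFT:

```shell
# Hypothetical helper: keep only lines that look like OOM-killer messages.
# Reads kernel log text on stdin; the exact wording varies between kernel versions.
find_oom_kills() {
    grep -Ei 'out of memory|oom-killer|killed process'
}

# Typical use (assumption: you are allowed to read the kernel log):
#   dmesg -T | find_oom_kills
#   journalctl -k | find_oom_kills
```

If a matching line names the PID from the MAFFT message (25726 in the question), that is strong evidence the process was killed by the system because memory ran out.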

Maybe one of these programs can do what you want:

score 1 · Answer 2 · 2021-01-24

I recently ran into the same problem (line 2604: XXXX Killed). I found that it was because I was using an old version of MAFFT. MAFFT added a feature for "Rapid calculation of full-length MSA of closely-related viral genomes" (https://mafft.cbrc.jp/alignment/software/closelyrelatedviralgenomes.html), and that page also mentions: "On command line, use version 7.467 or later. Earlier versions (≤7.458) had the same options but were inefficient for this purpose." Therefore I think you should check your MAFFT version first.
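The version check can be scripted. This is only a sketch: `version_at_least` is a hypothetical helper that relies on GNU `sort -V`, and the usage lines assume that `mafft --version` prints a string such as `v7.475 (2020/Nov/23)`, possibly on stderr (hence the `2>&1`).

```shell
# Hypothetical helper: succeed if version $1 is at least version $2.
# Relies on GNU sort's -V (version sort): the smaller of the two versions sorts first.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Usage (assumption about the output format of `mafft --version`):
#   installed=$(mafft --version 2>&1 | grep -oE '[0-9]+\.[0-9]+' | head -n 1)
#   if version_at_least "$installed" 7.467; then
#       echo "MAFFT $installed is recent enough"
#   else
#       echo "MAFFT $installed is older than 7.467; consider upgrading"
#   fi
```

If the installed version is recent enough, the linked page describes the options to use for closely related viral genomes; that mode is a different approach from the FFT-NS-i iterative refinement used in the question.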

score 1 · Answer 3 · 2021-10-04

In your circumstances, I would recommend giving Nextalign a try: https://docs.nextstrain.org/projects/nextclade/en/stable/user/nextalign-cli.html

nextalign \
 --sequences=data/sars-cov-2/sequences.fasta \
 --reference=data/sars-cov-2/reference.fasta \
 --genemap=data/sars-cov-2/genemap.gff \
 --genes=E,M,N,ORF1a,ORF1b,ORF3a,ORF6,ORF7a,ORF7b,ORF8,ORF9b,S \
 --output-dir=output/ \
 --output-basename=nextalign