Alignment of ~23 Mb sequences
6.7 years ago
l.souza ▴ 80

I've tried to build an MSA of three sequences, each from a different species of the same genus and each ~23 million bases long, to find motifs related to the virulence of these organisms. I used MAFFT with its fastest algorithm, FFT-NS-1, but it broke my computer. Then I tried running it on the CIPRES server, but the process was killed when it hit the time limit (72 h).

Is there a faster way to align these huge sequences?

alignment • 2.1k views
What is your end goal?

Since OP is using MAFFT, I am going to guess an MSA.

You are aligning what to what? Please be as informative as possible. You have already asked a couple of questions on Biostars; by now you should know that we need all the details you can provide.

I edited the post. Now, I think this is enough to understand the problem!

Do you need them to be multiply aligned, or would 3 pairwise alignments suffice? If pairwise is enough, you can use MUMmer for whole genomes, though 23 Mb might still be pushing it.
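
To make the "3 pairwise alignments" idea concrete, here is a minimal sketch that lists the three pairs and builds one `nucmer` command per pair. The FASTA file names are hypothetical placeholders, and the only MUMmer option used is `-p` (output prefix); the script only prints the commands rather than running them.

```python
from itertools import combinations
from pathlib import Path

# Hypothetical file names; replace them with your own genome FASTA files.
GENOMES = ["species_a.fasta", "species_b.fasta", "species_c.fasta"]


def pairwise_nucmer_commands(genomes):
    """Return one nucmer command (as an argument list) per unordered pair."""
    commands = []
    for ref, qry in combinations(genomes, 2):
        # Name the output after both inputs, e.g. species_a_vs_species_b.delta
        prefix = f"{Path(ref).stem}_vs_{Path(qry).stem}"
        commands.append(["nucmer", "-p", prefix, ref, qry])
    return commands


if __name__ == "__main__":
    for cmd in pairwise_nucmer_commands(GENOMES):
        # Print only; to execute, pass cmd to subprocess.run(cmd, check=True)
        # on a machine where MUMmer is installed and on PATH.
        print(" ".join(cmd))
```

Three genomes give three pairs, so each run only ever has two sequences in memory at a time.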

Are you trying to do this on a personal computer or do you have compute access?

I've tried on my personal computer, on a private server, and on CIPRES. Does MUMmer work only for pairwise alignment?

Yeah I believe so. Kalign is another option for large sequences that can manage an MSA.

I'm not surprised your own PC couldn't handle it. You'll probably need to get access to a server of some kind, as the process may take a long time and will almost certainly need more resources than you have.
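
For a rough sense of scale, the back-of-the-envelope sketch below counts the cells in a naive full dynamic-programming matrix for a single pair of 23 Mb sequences. The 1 byte per cell is an assumption chosen purely for illustration. Heuristic aligners avoid filling the whole matrix, so this is not a measurement of MAFFT itself, only an indication of why whole-genome tools rely on seeds and anchors instead.

```python
def full_dp_cells(length_a: int, length_b: int) -> int:
    """Number of cells in a full pairwise dynamic-programming matrix."""
    return length_a * length_b


def bytes_to_terabytes(n_bytes: int) -> float:
    """Convert bytes to decimal terabytes (1 TB = 10**12 bytes)."""
    return n_bytes / 10**12


if __name__ == "__main__":
    genome_length = 23_000_000  # ~23 Mb, as stated in the question
    bytes_per_cell = 1          # assumption for illustration only

    cells = full_dp_cells(genome_length, genome_length)
    terabytes = bytes_to_terabytes(cells * bytes_per_cell)
    print(f"{cells:.3e} cells, about {terabytes:.0f} TB at {bytes_per_cell} byte per cell")
```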

There is also LASTZ. According to its author:

LASTZ can perform full [human] chromosome-to-chromosome alignments in 2G of memory.

What do you mean by "three sequences of different species from the same genus"? Three genes per species? Are you aligning each one of the genes / sequences separately? Or are you concatenating and aligning them all at once? How many species do you have? Have you tried to remove identical sequences?

Three whole-genome sequences, each from a different species. I've tried to align them all at once.

Try LASTZ as suggested, or Mauve (which has a GUI), or LAST. There are innumerable programs suitable for this task. MAFFT may be unsuitable if there are rearrangements between the species. Are the genomes circular? Do the contigs start/stop at the same position?

I'm gonna try these. They're linear.
