Importance of Multiple Sequence Alignment
5 weeks ago

I am planning to design a hardware accelerator for the Multiple Sequence Alignment (MSA) tool MAFFT. I chose the MAFFT algorithms as my research target because I found in publications that some of the MAFFT algorithms are more accurate compared to other solutions for MSA but their adoption in Industry and Academia has been limited due to high time-consumption. Since my background is in technology rather than bioinformatics, I would like to know whether my targets are worth hardware acceleration effort from Bioinformaticians' point of view for Industry/ Research. Please give the main reasons behind your suggestions.


I would like to know whether my targets are worth hardware acceleration effort from Bioinformaticians' point of view for Industry/ Research.

Anything that saves time is valuable to scientists. Whether they would be willing to pay for it is hard to say. Running MSAs is not as integral a part of routine workflows as aligning NGS data is.

5 weeks ago
Mensur Dlakic ★ 15k

I found in publications that some of the MAFFT algorithms are more accurate compared to other solutions for MSA but their adoption in Industry and Academia has been limited due to high time-consumption.

I don't think this is accurate, so I would verify that source. MAFFT is one of the faster and more widely used aligners, and it has several different modes. Some of them are fast, and others are slower and more accurate, but in general I don't think there is any problem with speed when it comes to MAFFT.

I would like to know whether my targets are worth hardware acceleration effort from Bioinformaticians' point of view for Industry/ Research.

MAFFT is multi-threaded, and there are some speed and accuracy benchmarks here. Generally speaking, it is faster and more accurate than most popular programs, and the only weakness I am aware of is that it is very slow with long sequences.
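To get a feel for the trade-off between the fast and the accurate modes on your own data, something like the following rough sketch could be used. It only builds MAFFT command lines and, if `mafft` is found on the PATH, times one run. The helper names `mafft_command` and `time_mode` and the file name are hypothetical; the flags (`--auto`, `--retree`, `--localpair`, `--maxiterate`, `--thread`) are the ones listed in the MAFFT manual, but please check them against your installed version.

```python
import shutil
import subprocess
import time

# Flag sets as listed in the MAFFT manual; verify against your version.
MODES = {
    "auto": ["--auto"],                                  # let MAFFT pick a strategy
    "fft-ns-2": ["--retree", "2"],                       # fast progressive method
    "l-ins-i": ["--localpair", "--maxiterate", "1000"],  # slower, usually more accurate
}

def mafft_command(fasta_path, mode="auto", threads=1):
    """Return the argument list for one MAFFT run (nothing is executed)."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    return ["mafft", *MODES[mode], "--thread", str(threads), fasta_path]

def time_mode(fasta_path, mode="auto", threads=1):
    """Run MAFFT once and return wall-clock seconds, or None if mafft is missing."""
    if shutil.which("mafft") is None:
        return None
    start = time.perf_counter()
    subprocess.run(
        mafft_command(fasta_path, mode, threads),
        stdout=subprocess.DEVNULL,  # the alignment goes to stdout; discarded here
        stderr=subprocess.DEVNULL,  # progress messages go to stderr
        check=True,
    )
    return time.perf_counter() - start
```

Calling `time_mode("seqs.fasta", "fft-ns-2", 4)` and `time_mode("seqs.fasta", "l-ins-i", 4)` on a representative input would show how large the gap is for your sequence counts and lengths, which is probably the first number to have before committing to an accelerator design.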

As they say, you can never have too much speed or memory when it comes to computers, and faster MSA programs are always welcome. That said, the gain from hardware acceleration may not be large enough to merit the effort. I am pretty sure that the MAFFT authors made a GPU version of the program, but it turned out to be similar in speed to, or slower than, the multi-threaded CPU version. I interpret that to mean that there is limited room to parallelize the MAFFT algorithm, but it could certainly be that other reasons were in play.

5 weeks ago

Performing sequence alignments faster is, in general, extremely valuable, and multiple sequence alignments even more so, since these tasks are more resource-intensive and their complexity goes up quickly.
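To make "goes up quickly" concrete, here is a small sketch that counts the work for an exact dynamic-programming alignment of N sequences of equal length L: the DP lattice has (L+1)^N cells, and each cell looks at up to 2^N - 1 predecessor cells. This is the textbook exact formulation, not what MAFFT does; practical tools use progressive and iterative heuristics precisely because this cost is prohibitive. The function name is made up for illustration.

```python
def exact_msa_dp_work(num_seqs, seq_len):
    """Rough count of cell updates for exact DP over num_seqs sequences.

    Assumes all sequences have the same length and ignores the cost of
    scoring each column, so this is a lower bound on the real work.
    """
    cells = (seq_len + 1) ** num_seqs   # size of the N-dimensional lattice
    predecessors = 2 ** num_seqs - 1    # gap/no-gap choices per sequence, minus all-gap
    return cells * predecessors

if __name__ == "__main__":
    for n in (2, 3, 4, 5):
        print(n, exact_msa_dp_work(n, 100))
```

For sequences of length 100, going from two to three sequences already raises the count from about 3 x 10^4 to about 7 x 10^6, and every extra sequence multiplies it by roughly another factor of 200.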

In the past, assemblies were hard to come by, so most bioinformatics work focused on short-read alignment. With the advent of long-read sequencing, aligning multiple long reads will become the next frontier and is bound to be of great interest.

So I think speeding up the process will probably be of great value.