Question

Reducing runtime of low-score alignments

0

8 months ago

benjlishani7 • 0

Hello all :)

I am running a Python script that aligns transcripts of certain genes to the transcripts of their paralogs (using pairwise2.align.localxx). Approximately 1500 alignments are carried out, and most of them finish quickly, but some take a disproportionately long time (20 minutes or longer). I noticed that the results of these slow alignments all have very low alignment scores, so I assume the aligner is having a hard time finding a way to fit two very different sequences together.

Currently, because of the long alignment times, the script runs for many hours without finishing (if it doesn't crash, which also happens). So my question is: how can I make the script more efficient, so that it does not linger so long on sequences that have very little in common? For my purposes, it is fine if all alignments that score below a certain minimum are skipped or dropped.

Thank you in advance!

python pairwise-alignment alignment • 893 views

updated 8 months ago by Istvan Albert 103k • written 8 months ago by benjlishani7 • 0

1

Python is not very efficient for that type of work. Many existing tools are much faster; BLAST, FASTA and DIAMOND are probably the best known among them. While these are local aligners and you may need a global aligner, they can be adjusted to produce global alignments. Unless you need something peculiar that these programs can't do, I suggest you switch from your Python scripts to one of them.

8 months ago by Mensur Dlakic ★ 30k

0

Seconding Mensur: use a dedicated program. Have you looked at a program designed for aligning transcripts?

8 months ago by Mark ★ 1.7k

score 0 · Answer 1 · 2025-01-23

As others have mentioned, you should probably use a dedicated tool, such as exonerate, instead of Python.

If you must use Python, you could first attempt a local alignment using shorter subsequences of your query. That way, you may be able to assess whether the final alignment would be above a similarity cutoff.

With that, you'd be adding a small heuristic decision-making layer.
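
To make the prescreening idea concrete, here is a minimal pure-Python sketch. Everything in it is an assumption for illustration rather than something from the original script: the function names `local_score` and `passes_prescreen`, the scoring values (match 1, mismatch -1, gap -1), the probe length, the number of probes, and the 0.6 cutoff are all hypothetical and would need tuning on your own data.

```python
def local_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best Smith-Waterman local alignment score (score only, no traceback).

    Keeps only two rows of the dynamic-programming matrix in memory.
    The scoring values are illustrative defaults, not those of localxx.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            if score > best:
                best = score
        prev = curr
    return best


def passes_prescreen(query, target, probe_len=40, n_probes=3, min_fraction=0.6):
    """Return True if at least one short probe taken from `query` aligns
    locally to `target` with a score of at least min_fraction * probe length.

    Hypothetical helper: probes are evenly spaced windows of the query.
    """
    if not query or not target:
        return False
    if len(query) <= probe_len:
        probes = [query]
    else:
        span = len(query) - probe_len
        if n_probes <= 1:
            starts = [span // 2]
        else:
            starts = [i * span // (n_probes - 1) for i in range(n_probes)]
        probes = [query[s:s + probe_len] for s in starts]
    return any(local_score(p, target) >= min_fraction * len(p) for p in probes)


if __name__ == "__main__":
    query = "ATGGCGTACGTTAGCCTAGGATCCGATTACAGGCTTAACGGATCGATTGCA" * 2
    related = "GGG" + query + "CCC"
    unrelated = "A" * 200
    for name, target in [("related", related), ("unrelated", unrelated)]:
        if passes_prescreen(query, target):
            print(name, "-> run the full alignment")
        else:
            print(name, "-> skip")
```

Only pairs that pass the prescreen would go on to the full alignment. If you stay with Biopython, the probe scoring could probably be delegated to `pairwise2.align.localxx(probe, target, score_only=True)`, which returns only the best score without recovering alignments. A prescreen like this can miss true homologs whose similarity falls outside the sampled windows, so the cutoff is worth checking against a few known paralog pairs.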