Retrieving Best N Local Alignments
4
1
Entering edit mode
13.2 years ago
Guidobot ▴ 20

Can anyone point me to a program [C/Python] that runs a Smith-Waterman pairwise local alignment but (unlike EMBOSS/water) has options for one or both of these:

  1. Return the best N hits of query sequence.

  2. Extend the/each local alignment to produce the best full local alignment for the (much smaller) query sequence.

pairwise sequence alignment • 4.1k views
ADD COMMENT
2
Entering edit mode

I would consider #2 to be a global alignment

ADD REPLY
0
Entering edit mode

True, true, especially because for poorer matches the full best match can be different. However, the full alignment (w/ EMBOSS/needle) can take much longer and produces much bigger files. I guess question should have been: 2a. A Needleman-Wunsch implementation that just produces alignment and score over shorter sequence matching. 2b. A Needleman-Wunsch implementation that runs faster because an option to limit the DA window (matrix size), e.g. to 2x query length.

But I will also look to FASTA to see if it answers this too. Cheers.

ADD REPLY
0
Entering edit mode

Yep, exonerate is what I wanted. Love being able to format the output with --ryo.

Unfortunately there seems to be a bug with -bestn (-n) [for query vs. multiple targets] but the --percent option is closer to what I wanted. Also seems relatively slow in --exhaustive mode and (possibly because) I can't seem to disable rev.comp. query alignment.

ADD REPLY
1
Entering edit mode

hello Guidobot, this website works differently than a normal forum. If you want to reply to a specific answer, use the comment option. To indicate the best answer to your question, you should use the 'check this answer as the best one' option.

ADD REPLY
0
Entering edit mode

Thanks. I indeed added this in an attempt to mark the thread 'answered'.

ADD REPLY
5
Entering edit mode
13.2 years ago

I like exonerate, which can be run with the affine:local and -n options to get what you want. It is a nice tool that runs from the command line and does not need to format databases or further configurations

$: exonerate -q querysequence.txt -t targetsequences.txt -m affine:local -n 3 # n=3 to get the 3 best alignments.
ADD COMMENT
0
Entering edit mode

Wow. Thanks! I checked out the man page and this looks perfect, and then some.

ADD REPLY
0
Entering edit mode

The affine:best (N) alignments was what I needed. In fact affine:global appears to be of no use for small query sequences searches. Have to be real careful with parameters, e.g. runs appear to go on indefinitely if you use +ve gapopen values. The -bestn option does not appear to be working as advertised.

ADD REPLY
5
Entering edit mode
13.2 years ago

If I understand you correctly, point 1 in your question is just asking for a tool that can search a sequence against a sequence database using the Smith-Waterman local alignment algorithm and return be N best hits. The ssearch program from Bill Pearson's famous FASTA package can do that.

I am not sure I understand your second point? Is that not merely a matter of subsequently running a Needleman-Wunsch global alignment for the best N hits?

ADD COMMENT
2
Entering edit mode

Sorry, but in your question you explicitly asked for a program (C/Python), not for an online resource. I do not get why you ask for that if you are not happy about having to download and build the software. Also you now ask for N best global alignments whereas in your question you asked for N best local alignments.

ADD REPLY
0
Entering edit mode

Thanks again Jeremy. I'll look at this and see if it's what I need.

ADD REPLY
0
Entering edit mode

I should have mentioned that I'm working with nucleotides. The FASTA site does not allow retrieval of the best N hits (which is a pity as my PC is a bit of a clunker). It appears I would have to download/build the FASTA-36 version to solve (1). This also has ggsearch for best N global scores but this is only for protein seqs.

If I build FASTA-36 locally do you know if any executable can handle N best global DNA sequence alignments? (Sorry, doc on site is quite limited.)

ADD REPLY
0
Entering edit mode

Apologies if I've appeared to change the question. I am looking for a python/C program so I can test larger files, although a on-line resource would also be handy right now and I'm grateful for the information. The objective of the both (1)+(2) of my question is to get the best N global alignments. My original thinking was I could do (2) by using the best N local alignments. Cheers.

ADD REPLY
1
Entering edit mode
13.2 years ago

It sounds like perhaps BFAST bfast.sourceforge.net) will fill your needs. Read the paper here: http://dx.doi.org/10.1371/journal.pone.0007767

ADD COMMENT
0
Entering edit mode

Thanks Michael. This looks great for what I need further down the line. Right now I'm looking to pairwise compare sets of sequences where small sequences (45bp) are expected to hit targets multiple times, but with relatively high numbers of short INDEL and mismatch read errors. A small program that uses NW to return N best alignments would be ideal.

ADD REPLY
0
Entering edit mode

Ah sorry, I guess I misunderstood. Looks like exonerate might be just what you need then! Good luck!

ADD REPLY
1
Entering edit mode
12.1 years ago
Bill Pearson ★ 1.0k

I am also confused by the question. The ssearch36 program, available at fasta.bioch.virginia.edu/ will compare two sequences and give you local alignments a "query" sequence with a "target" sequence; you should be able to see an arbitrary number of alignments by modifying the expect threshold. ssearch36 will make certain that no part of the "target" sequence is "re-used". If you select the "compare two sequences" button from the ssearch library search page, you can compare two DNA sequences (you can only search protein libraries).

Alternatively, the lalign36 program can also show multiple non-overlapping alignments, but lalign is capable of reusing the target sequence. Thus, if a protein (or DNA sequence) contains four homologous domains A1 A2 A3 A4, then all possible significant alignments, e.g. A1 A2 A3 : A2 A3 A4, A1 A2 : A3 A4, and A1:A4 are shown (A1 A2 A3 A4 : A1 A2 A3 A4 is not shown, because that is the identity alignment). lalign36 also works with either protein or DNA sequences.

I do not understand the question about extending the alignment outside the local region.

ADD COMMENT

Login before adding your answer.

Traffic: 2004 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6