Gap Problems in Needleman-Wunsch Pairwise Sequence Alignment
4.3 years ago

I want to align two protein sequences. For example,
sequence 1: AYGEC
sequence 2: A(GG)C

Note that sequence 2 is always shorter than sequence 1, so gaps inevitably appear in sequence 2 in the final alignment.

My goal is to align them 1) without end gaps and 2) without any gap inside the (GG) segment of sequence 2. So the possible final alignments would be
AYGEC
A-(GG)C
or
AYGEC
A(GG)-C

Can anyone suggest an available command-line tool to do this? I know many tools allow setting a high end-gap open penalty to prevent end gaps. But I have not found any tools to prevent gaps between assigned residues.

alignment sequence • 1.8k views

You can set the gap open and gap extend penalties high.

Your question doesn't fully match your example, though, since you wrote:

I have not found any tools to prevent gaps between assigned residues.

This isn't the challenge. You aren't trying to prevent gaps (other than inside known patterns) but instead to coerce the alignment to match some a priori idea of which bits should and shouldn't match. Indeed, your two example alignments differ only in where the gap is inserted.

This doesn't really sound like a good alignment approach to me, and would probably lead to you manually editing the alignments one way or another anyway.

I personally am not aware of such a tool. Depending on the actual objective, a regex approach to find matches to known subsequences might be more appropriate.
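
As a minimal sketch of the regex idea, assuming Python's standard `re` module: the helper name `find_motif` and the pattern `G[A-Z]C` are hypothetical illustrations, not something taken from the question. It reports where a known subsequence pattern occurs in a protein sequence (non-overlapping matches only).

```python
import re


def find_motif(sequence, pattern):
    """Return (start, end, matched_text) for each non-overlapping match.

    Coordinates are 0-based and half-open, as in Python slicing.
    """
    return [(m.start(), m.end(), m.group()) for m in re.finditer(pattern, sequence)]


# Hypothetical motif: G, then any residue letter, then C.
hits = find_motif("AYGEC", r"G[A-Z]C")
print(hits)
```

This only locates motifs; it does not score or align the residues around them, so it would suit cases where the protected segments are the main thing of interest.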

Sorry, I gave a bad example. I think my problem is a constrained global alignment. If I have a sequence 1 to be aligned to a reference protein sequence 2, I already know that some residue segments in the reference sequence must not have gaps inserted into them. The remaining residues are free, and the final alignment depends on the dynamic programming and traceback matrices. So I don't think I can just use a simple regular expression to do the alignment myself.
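
The constrained global alignment described above could be sketched as a small modification of the Needleman-Wunsch recurrence. This is not an existing tool and not a definitive implementation: the function names, the linear gap penalty, and the match/mismatch/gap scores are all assumptions chosen for illustration. The parenthesis notation for protected segments is borrowed from the original question. The only changes to the standard algorithm are that a "gap in sequence 2" move is skipped when it would fall inside a protected segment, and that gap moves at either end are skipped to forbid end gaps.

```python
NEG_INF = float("-inf")


def parse_protected(pattern):
    """Turn 'A(GG)C' into ('AGGC', [(1, 3)]); intervals are half-open."""
    residues, segments, start = [], [], None
    for ch in pattern:
        if ch == "(":
            start = len(residues)
        elif ch == ")":
            segments.append((start, len(residues)))
        else:
            residues.append(ch)
    return "".join(residues), segments


def constrained_align(a, b, segments, match=1, mismatch=-1, gap=-2):
    """Global alignment of a and b with no end gaps and no gap inserted
    into b between two residues of the same protected segment."""
    n, m = len(a), len(b)
    # blocked[j] is True when the boundary between b[j-1] and b[j]
    # lies strictly inside a protected segment.
    blocked = [any(s < j < e for s, e in segments) for j in range(m + 1)]
    score = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    move = [[None] * (m + 1) for _ in range(n + 1)]
    score[0][0] = 0
    for i in range(n + 1):
        for j in range(m + 1):
            options = []
            if i > 0 and j > 0:  # align a[i-1] with b[j-1]
                sub = match if a[i - 1] == b[j - 1] else mismatch
                options.append((score[i - 1][j - 1] + sub, "diag"))
            if i > 0 and 0 < j < m and not blocked[j]:  # gap in b
                options.append((score[i - 1][j] + gap, "up"))
            if j > 0 and 0 < i < n:  # gap in a
                options.append((score[i][j - 1] + gap, "left"))
            if options:
                score[i][j], move[i][j] = max(options)
    if score[n][m] == NEG_INF:
        raise ValueError("no alignment satisfies the constraints")
    # Trace back from the bottom-right corner to (0, 0).
    row_a, row_b, i, j = [], [], n, m
    while i > 0 or j > 0:
        step = move[i][j]
        if step == "diag":
            row_a.append(a[i - 1])
            row_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif step == "up":
            row_a.append(a[i - 1])
            row_b.append("-")
            i -= 1
        else:
            row_a.append("-")
            row_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(row_a)), "".join(reversed(row_b))


seq2, protected = parse_protected("A(GG)C")
total, row1, row2 = constrained_align("AYGEC", seq2, protected)
print(row1)
print(row2)
```

With real proteins the match/mismatch scores would normally come from a substitution matrix such as BLOSUM62, and an affine gap penalty would need extra matrices. Both are left out to keep the sketch short.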

I think you need something like a glocal alignment. It's easy enough to remove gaps from the ends of alignments after the fact, but if you particularly care about preserving small motifs, you still need a local-alignment-based approach to some degree.

If you are aligning to a reference, there shouldn't really be gaps appearing in the reference (don't do multiple sequence alignment in this case). You are assuming the reference is already correct, so you really just want to align the query sequences, no?
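
Removing end gaps after the fact, as mentioned above, can be sketched in a few lines of Python. This assumes the alignment is available as two equal-length strings with `-` as the gap character; `trim_end_gaps` is a hypothetical helper name. It drops leading and trailing columns until both rows start and end on a residue.

```python
def trim_end_gaps(row_a, row_b, gap="-"):
    """Drop leading/trailing alignment columns that contain a gap in either row."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    # Columns where both rows carry a residue.
    paired = [k for k in range(len(row_a)) if row_a[k] != gap and row_b[k] != gap]
    if not paired:
        return "", ""
    first, last = paired[0], paired[-1]
    return row_a[first:last + 1], row_b[first:last + 1]


print(trim_end_gaps("AYGEC", "--GE-"))
```

Internal gaps are left untouched, so this only addresses the end-gap part of the problem, not gaps inside a motif.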

Yes, I actually did a multiple sequence alignment. I wanted to check whether the loops are aligned correctly. To do this, I treated the sequences that have the corresponding secondary structures as reference sequences. I can fix a misaligned sequence by manual correction, but I had too many sequences for that, so I wanted to write a script to do such corrections automatically. So yes, as you said, I just want to "correct" the query sequences.
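
A first step toward such a script might be to flag which reference segments are disrupted by gaps in an existing alignment, so that only those rows need correcting. The sketch below assumes each row is a gapped string of equal length with `-` as the gap character, and that the segments are given as half-open intervals in ungapped reference coordinates. The name `gapped_segments` is hypothetical.

```python
def gapped_segments(ref_row, query_row, segments, gap="-"):
    """Return the reference segments whose alignment columns contain a gap
    in either the reference row or the query row."""
    if len(ref_row) != len(query_row):
        raise ValueError("aligned rows must have equal length")
    # Column index of every ungapped reference residue.
    columns = [c for c, ch in enumerate(ref_row) if ch != gap]
    flagged = []
    for start, end in segments:
        first, last = columns[start], columns[end - 1]
        if gap in ref_row[first:last + 1] or gap in query_row[first:last + 1]:
            flagged.append((start, end))
    return flagged


# Reference AGGC with residues 1..2 (the two G's) protected.
print(gapped_segments("AG-GC", "AGYGC", [(1, 3)]))
print(gapped_segments("A-GGC", "AYGEC", [(1, 3)]))
```

Rows flagged this way could then be realigned pairwise against the reference instead of being edited by hand.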

I think you need semi-local or local pairwise alignments of each sequence to your reference. If you do multiple sequence alignment, no sequence is 'privileged' as the reference sequence, so they will all be subject to the addition of gaps etc.

I think I can bypass this problem with other tricks. Thank you for helping me, though.
