Question: Gap Problems in Needleman-Wunsch Pairwise Sequence Alignment
3 months ago, charlieyu.bt99 wrote:

I want to align two protein sequences. For example,
sequence 1: AYGEC
sequence 2: A(GG)C

Note that sequence 2 is always shorter than sequence 1, so gaps inevitably appear in sequence 2 in the final alignment.

My goal is to align them 1) without end gaps and 2) without any gap inside (GG) of sequence 2. So the possible final alignments would be
AYGEC
A-(GG)C
or
AYGEC
A(GG)-C

Can anyone suggest an available command-line tool to do this? I know many tools let you set a high end-gap open penalty to prevent end gaps. But I have not found any tools to prevent gaps between assigned residues.


You can set the gap open and gap extend penalties high.

Your question doesn't fully match your example, though. You wrote:

"I have not found any tools to prevent gaps between assigned residues."

This isn't the challenge. You aren't trying to prevent gaps (other than between known patterns) but instead to coerce the alignment to match some a priori idea of which bits should and shouldn't match. Indeed, your two example alignments differ only in where a gap is inserted.

This doesn't really sound like a good alignment approach to me, and would probably lead to you manually editing the alignments one way or another anyway.

I personally am not aware of such a tool. Depending on the actual objective, a regex approach to find matches to known subsequences might be more appropriate.
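To illustrate the regex idea, here is a minimal Python sketch that locates every occurrence of a known subsequence pattern in an ungapped sequence. The function name `find_motif` and the example patterns are assumptions made up for illustration; only the standard `re` module is used.

```python
import re

def find_motif(sequence, pattern):
    """Return (start, end, matched_text) for every hit of pattern.

    The pattern is wrapped in a zero-width lookahead so that
    overlapping hits are reported as well.
    """
    lookahead = re.compile(f"(?=({pattern}))")
    return [(m.start(1), m.end(1), m.group(1))
            for m in lookahead.finditer(sequence)]

# Hypothetical usage on the sequences from the question.
print(find_motif("AYGEC", "G[GE]"))
print(find_motif("AGGC", "GG"))
```

This only reports where a motif sits in each sequence; it does not produce an alignment, so whether it is enough depends on the actual objective.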

Reply modified 3 months ago • written 3 months ago by Joe

Sorry, I gave a bad example. I think my problem is a constrained global alignment. When I align sequence 1 to a reference protein sequence 2, I already know that certain residue segments in the reference must not have gaps inserted into them. The remaining residues are free, and the final alignment depends on the dynamic programming and traceback matrices. So I don't think I can do the alignment myself with a simple regular expression.
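One way to express this kind of constraint is to forbid the "gap in the reference" move of the Needleman-Wunsch recurrence at selected positions. The following is a minimal plain-Python sketch of that idea, not an existing tool. The function name `constrained_nw`, the `protected` argument, and the match/mismatch/linear-gap scores are assumptions chosen for illustration; a real protein alignment would more likely use a substitution matrix and affine gap penalties.

```python
NEG_INF = float("-inf")

def constrained_nw(a, b, protected=(), match=1, mismatch=-1, gap=-2):
    """Globally align query a to reference b.

    protected holds (start, end) half-open index ranges of b that must
    stay contiguous. Gaps in b are also forbidden at both ends of b.
    Returns (score, aligned_a, aligned_b).
    """
    n, m = len(a), len(b)
    # blocked[j] is True when no gap may be placed in b right after b[:j].
    blocked = [False] * (m + 1)
    blocked[0] = blocked[m] = True           # no end gaps in b
    for start, end in protected:
        for j in range(start + 1, end):      # boundaries inside the segment
            blocked[j] = True

    score = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    move = [[None] * (m + 1) for _ in range(n + 1)]
    score[0][0] = 0
    for i in range(n + 1):
        for j in range(m + 1):
            if i and j:                      # align a[i-1] with b[j-1]
                s = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
                if s > score[i][j]:
                    score[i][j], move[i][j] = s, "diag"
            if i and not blocked[j]:         # gap in b, only where allowed
                s = score[i - 1][j] + gap
                if s > score[i][j]:
                    score[i][j], move[i][j] = s, "up"
            if j:                            # gap in a, always allowed
                s = score[i][j - 1] + gap
                if s > score[i][j]:
                    score[i][j], move[i][j] = s, "left"

    if score[n][m] == NEG_INF:
        raise ValueError("no alignment satisfies the constraints")

    # Trace back from the bottom-right cell to the origin.
    out_a, out_b = [], []
    i, j = n, m
    while i or j:
        step = move[i][j]
        if step == "diag":
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif step == "up":
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))

# Sequences from the question; b[1:3] == "GG" is the protected segment.
print(constrained_nw("AYGEC", "AGGC", protected=[(1, 3)]))
```

With these toy scores the two placements of the gap (before or after GG) tie, and the tie is broken by the order in which the moves are tried. Gaps in sequence 1 are left unrestricted here, which is a design choice of the sketch rather than something stated in the question.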

Reply written 3 months ago by charlieyu.bt99

I think you need something like a glocal alignment. It's easy enough to remove gaps from the ends of alignments after the fact, but if you particularly care about preserving small motifs, you still need a local alignment based approach to some degree.
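Removing end gaps after the fact can be sketched in a few lines of Python. This assumes the pairwise alignment is available as two equal-length gapped strings; the function name `trim_end_gaps` is made up for illustration.

```python
def trim_end_gaps(aligned_query, aligned_ref, gap="-"):
    """Drop leading and trailing columns in which either row holds a gap."""
    if len(aligned_query) != len(aligned_ref):
        raise ValueError("aligned rows must have equal length")
    columns = list(zip(aligned_query, aligned_ref))
    start, end = 0, len(columns)
    # Skip gapped columns from the left, then from the right.
    while start < end and gap in columns[start]:
        start += 1
    while end > start and gap in columns[end - 1]:
        end -= 1
    return aligned_query[start:end], aligned_ref[start:end]

# Hypothetical alignment with overhangs on both sides.
print(trim_end_gaps("--AYGEC-", "MKA-GGCL"))
```

Note that this discards the overhanging residues too, which may or may not be what you want for a global comparison.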

If you are aligning to a reference, gaps really shouldn't appear in the reference (so don't do multiple sequence alignment in this case). You are assuming the reference is already correct, so you really just want to align the query sequences, no?

Reply written 3 months ago by Joe

Yes, I actually did a multiple sequence alignment. I wanted to check whether the loops are aligned correctly. To do this, I treat the sequences that have the corresponding secondary structures as reference sequences. I can fix a misaligned sequence by hand, but I have too many sequences to correct manually, so I wanted to write a script to make these corrections automatically. So yes, as you said, I just want to "correct" the query sequences.
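The checking step of such a script could look roughly like the Python sketch below. It flags reference segments that were split by gaps in an existing alignment; it does not perform the correction itself. The function name `broken_segments` and the coordinate convention (half-open ranges in ungapped reference coordinates) are assumptions for illustration.

```python
def broken_segments(aligned_ref, segments, gap="-"):
    """Return the segments that contain at least one gap in aligned_ref.

    segments are (start, end) half-open ranges given in ungapped
    reference coordinates.
    """
    # Map each ungapped residue index to its alignment column.
    columns = [col for col, ch in enumerate(aligned_ref) if ch != gap]
    broken = []
    for start, end in segments:
        first, last = columns[start], columns[end - 1]
        # An intact segment occupies exactly end - start columns.
        if last - first + 1 != end - start:
            broken.append((start, end))
    return broken

# Reference AGGC with GG (residues 1..2) as the segment to protect.
print(broken_segments("AG-GC", [(1, 3)]))
print(broken_segments("A-GGC", [(1, 3)]))
```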

Reply written 3 months ago by charlieyu.bt99

I think you need a semi-local or local multiple pairwise alignment to your reference. If you do multiple sequence alignment, no sequence is 'privileged' as the reference sequence, so they will all be subject to the addition of gaps etc.

Reply written 3 months ago by Joe

I think I have worked around this problem with other tricks. Thank you for your help, though.

Reply written 3 months ago by charlieyu.bt99