I have a set of 10-12 very closely related chromosome sequences (from different strains), each aligned to a single reference chromosome. Now I need to generate a multiple sequence alignment of these without affecting the individual alignments to the reference. All I need is to add the relative inserts at the respective sequence positions, so that I get a global alignment with respect to the reference.
Note: I am NOT looking for sequence similarities here.
Are there scripts already available to do this? I don't want to reinvent the wheel. Otherwise, any suggestions for scripting this (preferably in Perl) would be welcome.
Adding more information:
What I need to do is add dashes at the relative insert positions in the other sequences, so that I get a global alignment with respect to the reference.
For example, with 10 sequences: if seq1 has an insert from position 30 to 55 that the other 9 sequences lack, then in the final expanded alignment I will insert 26 dashes (-) at positions 30-55 in sequences 2 to 10 and in the reference.
In another situation like the one above, if seq1 has an insert at 30-55, seq3 has an insert from 40-45, and seq6 has an insert from 42-49, then I need to insert dashes as above, except at the positions where those sequences carry their own inserted bases (in seq3: 40-45 and in seq6: 42-49).
Sample input:

Original reference: CGACAATGCACGACAGAGGAAGCAGAACAGATATTTAGATTGCCTCTCATTTTCTCTCCC

Ref1: CGACAAT--GCACGACAGAGGAAGCAGAACAGATATTTAGATTGCCTCTCATTTTCTCTCCC
Seq1: CGACAATAAGCACGACAGAGGAAGCAGAACAGATA-----ATTGCCTCTCATTTTC-CTCCC

Ref1: CGACAATGCACGACAGAGGAAGC--AGAACAGATATTTAGATTGCCTCTCATTTTCTCTCCC
Seq2: CGACAAT-CACGACAGAGGAAGCTTAGAACAGATATTTAG---GCCTCTCATTTTCTCTCCC

Ref1: CGACAATGCACGACAGAGGAAG----CAGAACAGATATTTAGATTGCCTCTCA----TTTTCTCTCCC
Seq3: CGACAATGCACGACAGAGGAAGTTTTCAGAACAGATATTTAGATTGCCTCTCAAAAATTTTCTCTCCC
Sample output (final multiple sequence alignment):

Ref1: CGACAAT--GCACGACAGAGGAAG----C--AGAACAGATATTTAGATTGCCTCTCA----TTTTCTCTCCC
Seq1: CGACAATAAGCACGACAGAGGAAG----C--AGAACAGATA-----ATTGCCTCTCA----TTTTC-CTCCC
Seq2: CGACAAT---CACGACAGAGGAAG----CTTAGAACAGATATTTAG---GCCTCTCA----TTTTCTCTCCC
Seq3: CGACAAT--GCACGACAGAGGAAGTTTTC--AGAACAGATATTTAGATTGCCTCTCAAAAATTTTCTCTCCC
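
To make the request concrete, here is a minimal sketch of the merge described above, written in Python rather than Perl (the logic is plain string handling and should port directly). It is not an existing tool: the function name `merge_pairwise` and its `ref_name` parameter are made up for illustration. It assumes every pairwise alignment uses exactly the same reference sequence, and that inserted bases are simply left-justified inside their gap block and padded with dashes, so inserts from different strains at the same position are not aligned to each other.

```python
def merge_pairwise(pairs, ref_name="Ref1"):
    """Merge reference-anchored pairwise alignments into one alignment.

    pairs: list of (name, gapped_reference, gapped_sequence) tuples.
    Returns a list of (name, row) tuples, reference row first.
    (Hypothetical helper, written for this example.)
    """
    if not pairs:
        raise ValueError("no pairwise alignments given")
    ref_plain = pairs[0][1].replace("-", "")
    n = len(ref_plain)
    parsed = []
    for name, ref_aln, seq_aln in pairs:
        if len(ref_aln) != len(seq_aln):
            raise ValueError(f"{name}: rows differ in length")
        if ref_aln.replace("-", "") != ref_plain:
            raise ValueError(f"{name}: reference differs from the first pair")
        aligned = []               # aligned[i]: character opposite reference base i + 1
        inserts = [""] * (n + 1)   # inserts[i]: bases inserted after reference base i
        for r, s in zip(ref_aln, seq_aln):
            if r != "-":
                aligned.append(s)
            elif s != "-":         # reference gap: an inserted base in this strain
                inserts[len(aligned)] += s
        parsed.append((name, aligned, inserts))

    # Each insert slot is as wide as the longest insert any strain has there.
    width = [max(len(ins[i]) for _, _, ins in parsed) for i in range(n + 1)]

    def build(aligned, inserts):
        # Interleave reference-anchored columns with dash-padded insert slots.
        out = [inserts[0].ljust(width[0], "-")]
        for i in range(n):
            out.append(aligned[i])
            out.append(inserts[i + 1].ljust(width[i + 1], "-"))
        return "".join(out)

    rows = [(ref_name, build(ref_plain, [""] * (n + 1)))]
    rows.extend((name, build(al, ins)) for name, al, ins in parsed)
    return rows


# The three pairwise alignments from the sample input.
pairs = [
    ("Seq1",
     "CGACAAT--GCACGACAGAGGAAGCAGAACAGATATTTAGATTGCCTCTCATTTTCTCTCCC",
     "CGACAATAAGCACGACAGAGGAAGCAGAACAGATA-----ATTGCCTCTCATTTTC-CTCCC"),
    ("Seq2",
     "CGACAATGCACGACAGAGGAAGC--AGAACAGATATTTAGATTGCCTCTCATTTTCTCTCCC",
     "CGACAAT-CACGACAGAGGAAGCTTAGAACAGATATTTAG---GCCTCTCATTTTCTCTCCC"),
    ("Seq3",
     "CGACAATGCACGACAGAGGAAG----CAGAACAGATATTTAGATTGCCTCTCA----TTTTCTCTCCC",
     "CGACAATGCACGACAGAGGAAGTTTTCAGAACAGATATTTAGATTGCCTCTCAAAAATTTTCTCTCCC"),
]

for name, row in merge_pairwise(pairs):
    print(f"{name}: {row}")
```

On the sample input this produces the same four rows as the sample output. Existing deletions relative to the reference (dashes in a strain's row) pass through unchanged, so each individual alignment to the reference is preserved.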