I have a set of 10-12 very closely related chromosome sequences (from different strains), each aligned to a single reference chromosome. Now I need to generate a multiple sequence alignment from these without affecting the individual alignments to the reference. All I need is to add the relative inserts at the respective sequence positions, so that I get a global alignment with respect to the reference.
Note: I am NOT looking for sequence similarities here.
I hope some scripts are already available to do this; I don't want to reinvent the wheel. Otherwise, any suggestions for scripting this (preferably in Perl) would be welcome.
Adding more information:
What I need to do is add dashes at the relative insert positions in the other sequences, so that I get a global alignment with respect to the reference.
For example, with 10 sequences: if seq1 has an insert from position 30 to 55 that is absent from the other 9 sequences, then in the final expanded alignment I will insert 26 dashes (-) at positions 30-55 in sequences 2 to 10 and in the reference.
In another situation like the one above, suppose seq1 has an insert at 30-55, seq3 has an insert at 40-45, and seq6 has an insert at 42-49. Then I need to insert dashes as above, except at the positions where a sequence carries its own insert (seq3: 40-45 and seq6: 42-49).
Sample Input: Original reference: CGACAATGCACGACAGAGGAAGCAGAACAGATATTTAGATTGCCTCTCATTTTCTCTCCC
Pairwise alignments:
Ref1: CGACAAT--GCACGACAGAGGAAGCAGAACAGATATTTAGATTGCCTCTCATTTTCTCTCCC
Seq1: CGACAATAAGCACGACAGAGGAAGCAGAACAGATA-----ATTGCCTCTCATTTTC-CTCCC
Ref1: CGACAATGCACGACAGAGGAAGC--AGAACAGATATTTAGATTGCCTCTCATTTTCTCTCCC
Seq2: CGACAAT-CACGACAGAGGAAGCTTAGAACAGATATTTAG---GCCTCTCATTTTCTCTCCC
Ref1: CGACAATGCACGACAGAGGAAG----CAGAACAGATATTTAGATTGCCTCTCA----TTTTCTCTCCC
Seq3: CGACAATGCACGACAGAGGAAGTTTTCAGAACAGATATTTAGATTGCCTCTCAAAAATTTTCTCTCCC
Sample output: Final multiple sequence alignment:
Ref1: CGACAAT--GCACGACAGAGGAAG----C--AGAACAGATATTTAGATTGCCTCTCA----TTTTCTCTCCC
Seq1: CGACAATAAGCACGACAGAGGAAG----C--AGAACAGATA-----ATTGCCTCTCA----TTTTC-CTCCC
Seq2: CGACAAT---CACGACAGAGGAAG----CTTAGAACAGATATTTAG---GCCTCTCA----TTTTCTCTCCC
Seq3: CGACAAT--GCACGACAGAGGAAGTTTTC--AGAACAGATATTTAGATTGCCTCTCAAAAATTTTCTCTCCC
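For what it's worth, here is a minimal Python sketch of the expansion described above (Python rather than Perl, but the logic ports directly). It is not an existing tool. The function names `split_pairwise`, `build_row` and `merge_on_reference` are made up for illustration, and it assumes every pairwise alignment covers the whole reference. Inserts from different strains that fall at the same reference position are simply left-justified and padded with dashes, with no attempt to align them to each other.

```python
def split_pairwise(ref_aln, seq_aln):
    """Split one pairwise alignment into reference-anchored pieces.

    Returns (inserts, aligned):
      inserts[i] = bases the strain carries just before reference base i
                   (inserts[-1] holds anything after the last reference base)
      aligned[i] = strain character (base or '-') opposite reference base i
    """
    if len(ref_aln) != len(seq_aln):
        raise ValueError("aligned rows must have equal length")
    inserts, aligned = [""], []
    for r, s in zip(ref_aln, seq_aln):
        if r == "-":
            if s != "-":          # ignore gap-vs-gap columns
                inserts[-1] += s  # strain-specific insert
        else:
            aligned.append(s)
            inserts.append("")
    return inserts, aligned


def build_row(inserts, aligned, widths):
    """Rebuild one row, padding every insert slot to the common width."""
    parts = []
    for i, width in enumerate(widths):
        parts.append(inserts[i].ljust(width, "-"))
        if i < len(aligned):
            parts.append(aligned[i])
    return "".join(parts)


def merge_on_reference(reference, pairwise, ref_name="Ref1"):
    """pairwise maps strain name -> (gapped reference row, gapped strain row)."""
    parsed = {}
    for name, (ref_aln, seq_aln) in pairwise.items():
        if ref_aln.replace("-", "") != reference:
            raise ValueError(f"{name}: reference row does not match the reference")
        parsed[name] = split_pairwise(ref_aln, seq_aln)
    # Widest insert seen at each slot across all strains.
    widths = [max((len(ins[i]) for ins, _ in parsed.values()), default=0)
              for i in range(len(reference) + 1)]
    rows = {ref_name: build_row([""] * (len(reference) + 1), list(reference), widths)}
    for name, (inserts, aligned) in parsed.items():
        rows[name] = build_row(inserts, aligned, widths)
    return rows


reference = "CGACAATGCACGACAGAGGAAGCAGAACAGATATTTAGATTGCCTCTCATTTTCTCTCCC"
pairwise = {
    "Seq1": ("CGACAAT--GCACGACAGAGGAAGCAGAACAGATATTTAGATTGCCTCTCATTTTCTCTCCC",
             "CGACAATAAGCACGACAGAGGAAGCAGAACAGATA-----ATTGCCTCTCATTTTC-CTCCC"),
    "Seq2": ("CGACAATGCACGACAGAGGAAGC--AGAACAGATATTTAGATTGCCTCTCATTTTCTCTCCC",
             "CGACAAT-CACGACAGAGGAAGCTTAGAACAGATATTTAG---GCCTCTCATTTTCTCTCCC"),
    "Seq3": ("CGACAATGCACGACAGAGGAAG----CAGAACAGATATTTAGATTGCCTCTCA----TTTTCTCTCCC",
             "CGACAATGCACGACAGAGGAAGTTTTCAGAACAGATATTTAGATTGCCTCTCAAAAATTTTCTCTCCC"),
}
msa = merge_on_reference(reference, pairwise)
for name, row in msa.items():
    print(f"{name}: {row}")
```

Because the column widths are a per-position maximum over all strains, the result does not depend on the order in which the pairwise alignments are supplied, and each strain keeps exactly the gaps it had against the reference.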
Is there any tool available to convert FASTA or Clustal to MAF format?
Thanks @Haibao Tang, let me give it a try. I was wondering, though, how strictly it preserves "once a gap, always a gap". If it does, then this is what I wanted.
I am not aware of any such tool, but I would write a script to do the conversion, based on the specification of the MAF format.
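As a rough idea of what such a script could look like, here is a small Python sketch that writes already-aligned rows as a single MAF block. The function name `alignment_to_maf` is hypothetical. It assumes each row is a complete sequence on the forward strand, so the start is 0 and the source size equals the ungapped length, and it writes a bare `a` line without a score. Those assumptions would need adjusting for partial or reverse-strand alignments.

```python
def alignment_to_maf(rows):
    """rows maps sequence name -> gapped alignment text (all the same length)."""
    if len({len(text) for text in rows.values()}) != 1:
        raise ValueError("all aligned rows must have the same length")
    lines = ["##maf version=1", "", "a"]
    for name, text in rows.items():
        size = len(text) - text.count("-")  # ungapped length of this row
        # s <src> <start> <size> <strand> <srcSize> <text>
        lines.append(f"s {name} 0 {size} + {size} {text}")
    lines.append("")  # a blank line terminates the block
    return "\n".join(lines) + "\n"


maf_text = alignment_to_maf({"Ref1": "ACG-T", "Seq1": "ACGTT"})
print(maf_text, end="")
```

Reading the FASTA or Clustal input is left out on purpose; any parser that yields name and gapped sequence pairs could feed this function.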
Thanks @Haibao Tang, I tried it. Even though it says "once a gap, always a gap", when I aligned a chromosome from multiple strains I saw slight shifts in the indels in the MSA if I changed the input order.
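One way to detect such shifts, sketched here in Python with made-up helper names (`project_pair`, `preserves_pairwise`), is to project the MSA back onto each reference/strain pair by dropping the columns where both rows are gaps, and then compare the result with the original pairwise alignment. This assumes the MSA rows and the pairwise rows are available as plain gapped strings.

```python
def project_pair(msa_ref, msa_seq):
    """Drop columns where both rows are gaps; return the implied pairwise alignment."""
    kept = [(r, s) for r, s in zip(msa_ref, msa_seq)
            if not (r == "-" and s == "-")]
    return "".join(r for r, _ in kept), "".join(s for _, s in kept)


def preserves_pairwise(msa_ref, msa_seq, pair_ref, pair_seq):
    """True if the MSA leaves this pairwise alignment exactly as it was."""
    return project_pair(msa_ref, msa_seq) == (pair_ref, pair_seq)


# Ref1/Seq1 rows from the sample output, checked against the Ref1/Seq1 pair.
ok = preserves_pairwise(
    "CGACAAT--GCACGACAGAGGAAG----C--AGAACAGATATTTAGATTGCCTCTCA----TTTTCTCTCCC",
    "CGACAATAAGCACGACAGAGGAAG----C--AGAACAGATA-----ATTGCCTCTCA----TTTTC-CTCCC",
    "CGACAAT--GCACGACAGAGGAAGCAGAACAGATATTTAGATTGCCTCTCATTTTCTCTCCC",
    "CGACAATAAGCACGACAGAGGAAGCAGAACAGATA-----ATTGCCTCTCATTTTC-CTCCC",
)
print(ok)
```

Running this for every strain after each change of input order would show which pairwise alignments the tool has altered.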