This question is cross-posted at SeqAnswers with the alignments. I have 5 related EST sequences that I want to assemble into contigs, using the sequence assembly program CAP3. Here is my data:
>ES981358
GTTACTGTCCAGAGATTCGAGTGGGTCCAAAGGGACACAAAGTGAAGATGTGTAAAGCGACAAAACATCAGATGCGAGACGGAATGCACGCGTGGCAGGAAGCGACTATTGACGACGTGGTCGGTCCAAACTATGTTTGGCATGTTCGGGATCCAGATTCTTCTGCTCTTGACAATAGCTTGCAGAGGTTTTATGGGAAAGCTCCTGCGGTTGTGGAGTTGTGTGTGCAAGGTGGTGCACCGGTGCCTGACCAGTACAAGAGTATGATGAGACTCGATGTGGTTTACCCTCAACGTGATGAAGTCGATCTAGTCGCTTGACCTTTCAATGTGATTCAGCATCAAGAAACAATTGATATGAATAAACGAAAGCTCCATATACATATTAGGAAACATAGTTAGATGTTTGATTCTTCTGCTCCGTGCAGCTGTGATCTCACTTCGCAGTTTCGCAGCTTCTCAATAGCTTTGAGGTTAATGCAATTTCAACATTCCCGTTGAGGTTTTGTACCATTAAGGCAAAAATGTTATATAACTTCATAAAAAGTTATAGGAGATGTTTTCTTCCTTTCATATAAACAAAACAAGATTGTGGAATCCATTTTAGTCCAA
>EV115773
ACTGNGGTCGCCTGCAGGTACCGGTCTCGGATTCCCGGGTCGACCCACGCGTCCGACCGTTGCCGCGCCTCGTTACTCTCTGGCCTCTCTGCAATCATCCATCACCATGTTCATGTTCATCGCTGCAGATTCTGCTCTGAGGTTCACATTGGTAAGGAAGGTCATGGAATAAGGACTTGCATTGGCCCCGGGAGCGGTTCAAGAAGTGCGACTCATGTATGGAAAAGAGGAAGAGCTAGCGACGTCGTATTGTTCCCAAAATGTTTCCACCTCTATGACCGTGCTGTCAAACCGCGAGTTATCCACGACGAGAGGTTCGCCGTCCCTAAGATCTCTGCGGTTCTTGAGCTATGCATACAGGCAGGCGTGGACATCGATAAGTTCCCAGCGAAGCGAAGATCCAAACCTGTTTACAGCATCGAAGGACGGATCGTAGATTTCGAGGAAGTCAACAACGGGAACTCAGAAACCGCAGTTACTAATTCTACTACTACGGCTACACCTTTACAAGAAGATTATCATTACAGTACAGAGAAGAAGAGCTTGAAGGAGTTAAGCGTTGAGACGATGGACTCATGGTTTGAAATGGTCACAGGGGTCAAGAAACTGATGGAGAGATACAAAGTGTGGACATGTGGTTACTGTCCAGAGATTCAAGTGGGTCCAAAGGGACACAAAGTGAAGATGTGTAAAGCGACAAAACATCAGATGCGAGACGGAATGCACGCGTGGCAAGAAGCGACTATTGACGACGTGGTCGGTCCAAACTATGTTTGGCATGTTCGGGATCCAGATTCTTCTGCTCTTGACATAGCTTGCAGAGGTTTTATGGGAAAGCTCCTGCNGTTGTGGAGTTGTGTGTGCAAGGTGGGTGCACCGTGCCTGACCAGTC
>EV115686
GGTATATGAAACCTTTGAACAAACTATTGGACTAAAATGGATTCCACAATCTTGTTTTGTTTATATGAAAGGAAGAAAACATCTCCTATAACTTTTTATGAAGTTATATAACATTTTTGCCTTAATGGTACAAAACCTCAACGGGAATGTTGAAATTGCATTAACCTCAAAGCTATTGAGAAGCTGCGAAACTGCGAAGTGAGATCACAGCTGCACGGAGCAGAAGAATCAAACATCTAACTATGTTTCCTAATATGTATATGGAGCTTTCGTTTATTCATATCAATTGTTTCTTGATGCTGAATCACATTGAAAGGTCAAGCGACTAGATCGACTTCATCACGTTGAGGGTAAACCACATCGAGTCTCATCATACTCTTGTACTGGTCAGGCACCGGTGCACCACCTTGCACACACAACTCCACAACCGCAGGAGCTTTCCCATAAAACCTCTGCAAGCTATTGTCAAGAGCAGAAGAATCTGGATCCCGAACATGCCAAACATAGTTTGGACCGACCACGTCGTCAATAGTCGCTTCTTGCCACGCGTGCATTCCGTCTCGCATCTGATGTTTTGTCGCTTTACACATCTTCACTTTGTGTCCCTTTGGACCCACTTGAATCTCTGGACAGTAACCACATGTCCACACTTTGTATCTCTCCATCAGTTTCTTGACCCCTGTGACCATTTCAAACCATGAGTCCATCGTCTCAACGCTTAACTCCTTCAAGCTCTTCTTCTCTGTACTGTAATGA
>EV005953
GAACTCAGAAACCGCAGTTACTAGTGTTAGCACGGCTACACCTTTACAAGAAGATGATCATTACAGTACAGAGAAGAAGAGCTTGAAGGAGTTAAGCGTTAAGACAATGGATTCATGGTTTGAAATGGTCACAGGGGTCAAGGAACTGATGGAGAAATACAAAGTGTGGACTTGTGGTTACTGTCCAGAGGTTCAAGTGGGTCCCAAGGGACACAAAGTGAAGATGTGTAAAGCGACAAAGCATCAGATGCGAGACGGAATGCACGCGTGGCAAGAAGCGACTATTGACGACGTGGTCGGTCCAAACTATGTGTGGCATGTTCGGGATCCAGATTCTTCTGCTCTTGACAATAGCTTGCAGAGGTTTTATGGGAAAGCTCCTGCGGTCGTGGAGTTGTGTGTGCAAGGTGGTGCACCGGTACCTGACCAGTACAAGAGTATGATGAGACTCGATGTGGTTTACCCTCAACGTGATGAAGTCGATCTAGTCGCGTGACCTTTCAAGGTGATTCAGCATCAAGAAACAATTGATATGAAAAAACGAAAGCTCCATATACATATTAGGAAACATAGTTAGATGTTTGGTTCTTCTGCTCCGTGCAGCTTCTCACTTCGCAGCTTCTCAATAGCTTTGAGGTTAATGCAATTTCAACATTCCCGTTGAGGTTTTGTACCA
>EE435034
TTATATTGAGGAGATCCCCCGGTGGAGGATGTTAACCGAGATATCTCGGAGAAATATTCGTGCGGGATTTAGCATGATAACGAGGCTCAAAAAATGAAAAGTAGTCTAGCAATTGTAAGCTGAGGACTCTTTCACACAGGAAATAGCTATGATCATGATCATCGCTGGAGATTCTGCTCTGAGGATCACATTGTTTAAGGAAGGTCATGGAATAAGGACTTGCATTGGCCCCGGGAGCGGTTCAAGAAGTGCGACTCATGTATGGAAAAGAGGAAGAGCTAGCGACGTCGTATTGTTCCCAAAATGTTTCCACCTCTATGACCGTGCTGTCAAACCGCGAGTTATCCACGACGAGAGGTTCGCCGTCCCTAAGATCTCTGCGGTTCTTGAGCTATGCATACAGGCAGGCGTGGACATCGATAAGTTCCCAGCGAAGCGAAGATCCAAACCTGTTTACAGCATCGAAGGACGGATCGGTAGATTTCGAGGAAGTCAACAAACGGGAACTCAGAAACCGCAGTTACTAATTCTACTACTACGGCTACACCTTTACAAGAAGATTATCATTACAGTACAGAGAAGAAGAGCTTGAAGGAGTTAAGCGTTGAGACGATGGACTCATGGTTTGAAATGTTCACAGGGGTCA
When I run this through CAP3, I get a contig of EV005953+, EV115686-, and ES981358+. However, I saw elsewhere that EV115773 should also align with the other sequences. To test this, I gave just the 4 relevant sequences as input:
>ES981358
GTTACTGTCCAGAGATTCGAGTGGGTCCAAAGGGACACAAAGTGAAGATGTGTAAAGCGACAAAACATCAGATGCGAGACGGAATGCACGCGTGGCAGGAAGCGACTATTGACGACGTGGTCGGTCCAAACTATGTTTGGCATGTTCGGGATCCAGATTCTTCTGCTCTTGACAATAGCTTGCAGAGGTTTTATGGGAAAGCTCCTGCGGTTGTGGAGTTGTGTGTGCAAGGTGGTGCACCGGTGCCTGACCAGTACAAGAGTATGATGAGACTCGATGTGGTTTACCCTCAACGTGATGAAGTCGATCTAGTCGCTTGACCTTTCAATGTGATTCAGCATCAAGAAACAATTGATATGAATAAACGAAAGCTCCATATACATATTAGGAAACATAGTTAGATGTTTGATTCTTCTGCTCCGTGCAGCTGTGATCTCACTTCGCAGTTTCGCAGCTTCTCAATAGCTTTGAGGTTAATGCAATTTCAACATTCCCGTTGAGGTTTTGTACCATTAAGGCAAAAATGTTATATAACTTCATAAAAAGTTATAGGAGATGTTTTCTTCCTTTCATATAAACAAAACAAGATTGTGGAATCCATTTTAGTCCAA
>EV115773
ACTGNGGTCGCCTGCAGGTACCGGTCTCGGATTCCCGGGTCGACCCACGCGTCCGACCGTTGCCGCGCCTCGTTACTCTCTGGCCTCTCTGCAATCATCCATCACCATGTTCATGTTCATCGCTGCAGATTCTGCTCTGAGGTTCACATTGGTAAGGAAGGTCATGGAATAAGGACTTGCATTGGCCCCGGGAGCGGTTCAAGAAGTGCGACTCATGTATGGAAAAGAGGAAGAGCTAGCGACGTCGTATTGTTCCCAAAATGTTTCCACCTCTATGACCGTGCTGTCAAACCGCGAGTTATCCACGACGAGAGGTTCGCCGTCCCTAAGATCTCTGCGGTTCTTGAGCTATGCATACAGGCAGGCGTGGACATCGATAAGTTCCCAGCGAAGCGAAGATCCAAACCTGTTTACAGCATCGAAGGACGGATCGTAGATTTCGAGGAAGTCAACAACGGGAACTCAGAAACCGCAGTTACTAATTCTACTACTACGGCTACACCTTTACAAGAAGATTATCATTACAGTACAGAGAAGAAGAGCTTGAAGGAGTTAAGCGTTGAGACGATGGACTCATGGTTTGAAATGGTCACAGGGGTCAAGAAACTGATGGAGAGATACAAAGTGTGGACATGTGGTTACTGTCCAGAGATTCAAGTGGGTCCAAAGGGACACAAAGTGAAGATGTGTAAAGCGACAAAACATCAGATGCGAGACGGAATGCACGCGTGGCAAGAAGCGACTATTGACGACGTGGTCGGTCCAAACTATGTTTGGCATGTTCGGGATCCAGATTCTTCTGCTCTTGACATAGCTTGCAGAGGTTTTATGGGAAAGCTCCTGCNGTTGTGGAGTTGTGTGTGCAAGGTGGGTGCACCGTGCCTGACCAGTC
>EV115686
GGTATATGAAACCTTTGAACAAACTATTGGACTAAAATGGATTCCACAATCTTGTTTTGTTTATATGAAAGGAAGAAAACATCTCCTATAACTTTTTATGAAGTTATATAACATTTTTGCCTTAATGGTACAAAACCTCAACGGGAATGTTGAAATTGCATTAACCTCAAAGCTATTGAGAAGCTGCGAAACTGCGAAGTGAGATCACAGCTGCACGGAGCAGAAGAATCAAACATCTAACTATGTTTCCTAATATGTATATGGAGCTTTCGTTTATTCATATCAATTGTTTCTTGATGCTGAATCACATTGAAAGGTCAAGCGACTAGATCGACTTCATCACGTTGAGGGTAAACCACATCGAGTCTCATCATACTCTTGTACTGGTCAGGCACCGGTGCACCACCTTGCACACACAACTCCACAACCGCAGGAGCTTTCCCATAAAACCTCTGCAAGCTATTGTCAAGAGCAGAAGAATCTGGATCCCGAACATGCCAAACATAGTTTGGACCGACCACGTCGTCAATAGTCGCTTCTTGCCACGCGTGCATTCCGTCTCGCATCTGATGTTTTGTCGCTTTACACATCTTCACTTTGTGTCCCTTTGGACCCACTTGAATCTCTGGACAGTAACCACATGTCCACACTTTGTATCTCTCCATCAGTTTCTTGACCCCTGTGACCATTTCAAACCATGAGTCCATCGTCTCAACGCTTAACTCCTTCAAGCTCTTCTTCTCTGTACTGTAATGA
>EV005953
GAACTCAGAAACCGCAGTTACTAGTGTTAGCACGGCTACACCTTTACAAGAAGATGATCATTACAGTACAGAGAAGAAGAGCTTGAAGGAGTTAAGCGTTAAGACAATGGATTCATGGTTTGAAATGGTCACAGGGGTCAAGGAACTGATGGAGAAATACAAAGTGTGGACTTGTGGTTACTGTCCAGAGGTTCAAGTGGGTCCCAAGGGACACAAAGTGAAGATGTGTAAAGCGACAAAGCATCAGATGCGAGACGGAATGCACGCGTGGCAAGAAGCGACTATTGACGACGTGGTCGGTCCAAACTATGTGTGGCATGTTCGGGATCCAGATTCTTCTGCTCTTGACAATAGCTTGCAGAGGTTTTATGGGAAAGCTCCTGCGGTCGTGGAGTTGTGTGTGCAAGGTGGTGCACCGGTACCTGACCAGTACAAGAGTATGATGAGACTCGATGTGGTTTACCCTCAACGTGATGAAGTCGATCTAGTCGCGTGACCTTTCAAGGTGATTCAGCATCAAGAAACAATTGATATGAAAAAACGAAAGCTCCATATACATATTAGGAAACATAGTTAGATGTTTGGTTCTTCTGCTCCGTGCAGCTTCTCACTTCGCAGCTTCTCAATAGCTTTGAGGTTAATGCAATTTCAACATTCCCGTTGAGGTTTTGTACCA
Now I get an alignment in which all 4 sequences align, which seems to be correct. So why did the first run not align EV115773, but instead report it as a singlet? How can CAP3 be run so that it gives consistent results? All of this can be checked using the online CAP3 page at http://pbil.univ-lyon1.fr/cap3.php or by downloading CAP3 from http://seq.cs.iastate.edu/cap3.html
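To make the difference between two runs easy to see, one option is to compare which reads each run reports as singlets. The sketch below assumes the singlets are available as FASTA text (CAP3 is expected to write them to a file ending in `.cap.singlets` next to the input, but check your own output). The helper names `fasta_ids` and `singlet_changes` are made up for this illustration.

```python
def fasta_ids(text):
    """Return the set of record IDs (first word after '>') found in FASTA text."""
    ids = set()
    for line in text.splitlines():
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:  # skip a bare '>' with no ID
                ids.add(fields[0])
    return ids


def singlet_changes(singlets_run_a, singlets_run_b):
    """Report reads that are singlets in one run but not in the other."""
    a = fasta_ids(singlets_run_a)
    b = fasta_ids(singlets_run_b)
    return {"only_in_a": sorted(a - b), "only_in_b": sorted(b - a)}


if __name__ == "__main__":
    # Hypothetical file names: replace with the singlets files of your two runs.
    with open("five_ests.fa.cap.singlets") as fa, open("four_ests.fa.cap.singlets") as fb:
        print(singlet_changes(fa.read(), fb.read()))
```

A read that shows up under `only_in_a` was left unassembled in the first run but placed in a contig (or absent from the input) in the second.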
I think some parameter tuning may be needed here. You could try different combinations of the parameters, such as the overlap percent identity cutoff and the maximum overhang length.
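As a rough sketch of what such a sweep could look like, the script below builds one CAP3 command line per parameter combination. It assumes a `cap3` executable on the PATH, and that `-p` (overlap percent identity cutoff), `-h` (maximum overhang percent length) and `-x` (output file prefix) behave as listed in CAP3's own usage message. The value grids and the `cap3_commands` / `run_sweep` names are arbitrary choices for illustration, not recommended settings.

```python
import itertools
import shutil
import subprocess


def cap3_commands(fasta_path, identities=(80, 85, 90), overhangs=(20, 40, 60)):
    """Build one cap3 command per (identity, overhang) combination."""
    commands = []
    for p, h in itertools.product(identities, overhangs):
        prefix = f"p{p}_h{h}"  # assumed to keep each run's output files apart
        commands.append(
            ["cap3", fasta_path, "-p", str(p), "-h", str(h), "-x", prefix]
        )
    return commands


def run_sweep(fasta_path):
    """Run every combination, saving the alignment printed on stdout to a log."""
    if shutil.which("cap3") is None:
        raise RuntimeError("cap3 executable not found on PATH")
    for cmd in cap3_commands(fasta_path):
        log_name = f"{fasta_path}.{cmd[-1]}.log"
        with open(log_name, "w") as log:
            subprocess.run(cmd, stdout=log, check=True)


if __name__ == "__main__":
    run_sweep("ests.fa")  # hypothetical input file name
```

Comparing the contigs and singlets across the runs should show at which settings, if any, EV115773 joins the other reads.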