Inconsistent Cap3 Assembly
10.8 years ago
hbw ▴ 90

This question is cross-posted at SeqAnswers with the alignments. I have 5 related EST sequences and I want to assemble contigs. I am using the sequence assembly program CAP3. Here is my data:

>ES981358
GTTACTGTCCAGAGATTCGAGTGGGTCCAAAGGGACACAAAGTGAAGATGTGTAAAGCGACAAAACATCAGATGCGAGACGGAATGCACGCGTGGCAGGAAGCGACTATTGACGACGTGGTCGGTCCAAACTATGTTTGGCATGTTCGGGATCCAGATTCTTCTGCTCTTGACAATAGCTTGCAGAGGTTTTATGGGAAAGCTCCTGCGGTTGTGGAGTTGTGTGTGCAAGGTGGTGCACCGGTGCCTGACCAGTACAAGAGTATGATGAGACTCGATGTGGTTTACCCTCAACGTGATGAAGTCGATCTAGTCGCTTGACCTTTCAATGTGATTCAGCATCAAGAAACAATTGATATGAATAAACGAAAGCTCCATATACATATTAGGAAACATAGTTAGATGTTTGATTCTTCTGCTCCGTGCAGCTGTGATCTCACTTCGCAGTTTCGCAGCTTCTCAATAGCTTTGAGGTTAATGCAATTTCAACATTCCCGTTGAGGTTTTGTACCATTAAGGCAAAAATGTTATATAACTTCATAAAAAGTTATAGGAGATGTTTTCTTCCTTTCATATAAACAAAACAAGATTGTGGAATCCATTTTAGTCCAA
>EV115773
ACTGNGGTCGCCTGCAGGTACCGGTCTCGGATTCCCGGGTCGACCCACGCGTCCGACCGTTGCCGCGCCTCGTTACTCTCTGGCCTCTCTGCAATCATCCATCACCATGTTCATGTTCATCGCTGCAGATTCTGCTCTGAGGTTCACATTGGTAAGGAAGGTCATGGAATAAGGACTTGCATTGGCCCCGGGAGCGGTTCAAGAAGTGCGACTCATGTATGGAAAAGAGGAAGAGCTAGCGACGTCGTATTGTTCCCAAAATGTTTCCACCTCTATGACCGTGCTGTCAAACCGCGAGTTATCCACGACGAGAGGTTCGCCGTCCCTAAGATCTCTGCGGTTCTTGAGCTATGCATACAGGCAGGCGTGGACATCGATAAGTTCCCAGCGAAGCGAAGATCCAAACCTGTTTACAGCATCGAAGGACGGATCGTAGATTTCGAGGAAGTCAACAACGGGAACTCAGAAACCGCAGTTACTAATTCTACTACTACGGCTACACCTTTACAAGAAGATTATCATTACAGTACAGAGAAGAAGAGCTTGAAGGAGTTAAGCGTTGAGACGATGGACTCATGGTTTGAAATGGTCACAGGGGTCAAGAAACTGATGGAGAGATACAAAGTGTGGACATGTGGTTACTGTCCAGAGATTCAAGTGGGTCCAAAGGGACACAAAGTGAAGATGTGTAAAGCGACAAAACATCAGATGCGAGACGGAATGCACGCGTGGCAAGAAGCGACTATTGACGACGTGGTCGGTCCAAACTATGTTTGGCATGTTCGGGATCCAGATTCTTCTGCTCTTGACATAGCTTGCAGAGGTTTTATGGGAAAGCTCCTGCNGTTGTGGAGTTGTGTGTGCAAGGTGGGTGCACCGTGCCTGACCAGTC
>EV115686
GGTATATGAAACCTTTGAACAAACTATTGGACTAAAATGGATTCCACAATCTTGTTTTGTTTATATGAAAGGAAGAAAACATCTCCTATAACTTTTTATGAAGTTATATAACATTTTTGCCTTAATGGTACAAAACCTCAACGGGAATGTTGAAATTGCATTAACCTCAAAGCTATTGAGAAGCTGCGAAACTGCGAAGTGAGATCACAGCTGCACGGAGCAGAAGAATCAAACATCTAACTATGTTTCCTAATATGTATATGGAGCTTTCGTTTATTCATATCAATTGTTTCTTGATGCTGAATCACATTGAAAGGTCAAGCGACTAGATCGACTTCATCACGTTGAGGGTAAACCACATCGAGTCTCATCATACTCTTGTACTGGTCAGGCACCGGTGCACCACCTTGCACACACAACTCCACAACCGCAGGAGCTTTCCCATAAAACCTCTGCAAGCTATTGTCAAGAGCAGAAGAATCTGGATCCCGAACATGCCAAACATAGTTTGGACCGACCACGTCGTCAATAGTCGCTTCTTGCCACGCGTGCATTCCGTCTCGCATCTGATGTTTTGTCGCTTTACACATCTTCACTTTGTGTCCCTTTGGACCCACTTGAATCTCTGGACAGTAACCACATGTCCACACTTTGTATCTCTCCATCAGTTTCTTGACCCCTGTGACCATTTCAAACCATGAGTCCATCGTCTCAACGCTTAACTCCTTCAAGCTCTTCTTCTCTGTACTGTAATGA
>EV005953
GAACTCAGAAACCGCAGTTACTAGTGTTAGCACGGCTACACCTTTACAAGAAGATGATCATTACAGTACAGAGAAGAAGAGCTTGAAGGAGTTAAGCGTTAAGACAATGGATTCATGGTTTGAAATGGTCACAGGGGTCAAGGAACTGATGGAGAAATACAAAGTGTGGACTTGTGGTTACTGTCCAGAGGTTCAAGTGGGTCCCAAGGGACACAAAGTGAAGATGTGTAAAGCGACAAAGCATCAGATGCGAGACGGAATGCACGCGTGGCAAGAAGCGACTATTGACGACGTGGTCGGTCCAAACTATGTGTGGCATGTTCGGGATCCAGATTCTTCTGCTCTTGACAATAGCTTGCAGAGGTTTTATGGGAAAGCTCCTGCGGTCGTGGAGTTGTGTGTGCAAGGTGGTGCACCGGTACCTGACCAGTACAAGAGTATGATGAGACTCGATGTGGTTTACCCTCAACGTGATGAAGTCGATCTAGTCGCGTGACCTTTCAAGGTGATTCAGCATCAAGAAACAATTGATATGAAAAAACGAAAGCTCCATATACATATTAGGAAACATAGTTAGATGTTTGGTTCTTCTGCTCCGTGCAGCTTCTCACTTCGCAGCTTCTCAATAGCTTTGAGGTTAATGCAATTTCAACATTCCCGTTGAGGTTTTGTACCA
>EE435034
TTATATTGAGGAGATCCCCCGGTGGAGGATGTTAACCGAGATATCTCGGAGAAATATTCGTGCGGGATTTAGCATGATAACGAGGCTCAAAAAATGAAAAGTAGTCTAGCAATTGTAAGCTGAGGACTCTTTCACACAGGAAATAGCTATGATCATGATCATCGCTGGAGATTCTGCTCTGAGGATCACATTGTTTAAGGAAGGTCATGGAATAAGGACTTGCATTGGCCCCGGGAGCGGTTCAAGAAGTGCGACTCATGTATGGAAAAGAGGAAGAGCTAGCGACGTCGTATTGTTCCCAAAATGTTTCCACCTCTATGACCGTGCTGTCAAACCGCGAGTTATCCACGACGAGAGGTTCGCCGTCCCTAAGATCTCTGCGGTTCTTGAGCTATGCATACAGGCAGGCGTGGACATCGATAAGTTCCCAGCGAAGCGAAGATCCAAACCTGTTTACAGCATCGAAGGACGGATCGGTAGATTTCGAGGAAGTCAACAAACGGGAACTCAGAAACCGCAGTTACTAATTCTACTACTACGGCTACACCTTTACAAGAAGATTATCATTACAGTACAGAGAAGAAGAGCTTGAAGGAGTTAAGCGTTGAGACGATGGACTCATGGTTTGAAATGTTCACAGGGGTCA

When I run this through CAP3, I get a contig of EV005953+, EV115686-, and ES981358+. However, I saw somewhere else that actually EV115773 should also align with the other sequences. To test this I gave just the relevant 4 sequences as the input:

>ES981358
GTTACTGTCCAGAGATTCGAGTGGGTCCAAAGGGACACAAAGTGAAGATGTGTAAAGCGACAAAACATCAGATGCGAGACGGAATGCACGCGTGGCAGGAAGCGACTATTGACGACGTGGTCGGTCCAAACTATGTTTGGCATGTTCGGGATCCAGATTCTTCTGCTCTTGACAATAGCTTGCAGAGGTTTTATGGGAAAGCTCCTGCGGTTGTGGAGTTGTGTGTGCAAGGTGGTGCACCGGTGCCTGACCAGTACAAGAGTATGATGAGACTCGATGTGGTTTACCCTCAACGTGATGAAGTCGATCTAGTCGCTTGACCTTTCAATGTGATTCAGCATCAAGAAACAATTGATATGAATAAACGAAAGCTCCATATACATATTAGGAAACATAGTTAGATGTTTGATTCTTCTGCTCCGTGCAGCTGTGATCTCACTTCGCAGTTTCGCAGCTTCTCAATAGCTTTGAGGTTAATGCAATTTCAACATTCCCGTTGAGGTTTTGTACCATTAAGGCAAAAATGTTATATAACTTCATAAAAAGTTATAGGAGATGTTTTCTTCCTTTCATATAAACAAAACAAGATTGTGGAATCCATTTTAGTCCAA
>EV115773
ACTGNGGTCGCCTGCAGGTACCGGTCTCGGATTCCCGGGTCGACCCACGCGTCCGACCGTTGCCGCGCCTCGTTACTCTCTGGCCTCTCTGCAATCATCCATCACCATGTTCATGTTCATCGCTGCAGATTCTGCTCTGAGGTTCACATTGGTAAGGAAGGTCATGGAATAAGGACTTGCATTGGCCCCGGGAGCGGTTCAAGAAGTGCGACTCATGTATGGAAAAGAGGAAGAGCTAGCGACGTCGTATTGTTCCCAAAATGTTTCCACCTCTATGACCGTGCTGTCAAACCGCGAGTTATCCACGACGAGAGGTTCGCCGTCCCTAAGATCTCTGCGGTTCTTGAGCTATGCATACAGGCAGGCGTGGACATCGATAAGTTCCCAGCGAAGCGAAGATCCAAACCTGTTTACAGCATCGAAGGACGGATCGTAGATTTCGAGGAAGTCAACAACGGGAACTCAGAAACCGCAGTTACTAATTCTACTACTACGGCTACACCTTTACAAGAAGATTATCATTACAGTACAGAGAAGAAGAGCTTGAAGGAGTTAAGCGTTGAGACGATGGACTCATGGTTTGAAATGGTCACAGGGGTCAAGAAACTGATGGAGAGATACAAAGTGTGGACATGTGGTTACTGTCCAGAGATTCAAGTGGGTCCAAAGGGACACAAAGTGAAGATGTGTAAAGCGACAAAACATCAGATGCGAGACGGAATGCACGCGTGGCAAGAAGCGACTATTGACGACGTGGTCGGTCCAAACTATGTTTGGCATGTTCGGGATCCAGATTCTTCTGCTCTTGACATAGCTTGCAGAGGTTTTATGGGAAAGCTCCTGCNGTTGTGGAGTTGTGTGTGCAAGGTGGGTGCACCGTGCCTGACCAGTC
>EV115686
GGTATATGAAACCTTTGAACAAACTATTGGACTAAAATGGATTCCACAATCTTGTTTTGTTTATATGAAAGGAAGAAAACATCTCCTATAACTTTTTATGAAGTTATATAACATTTTTGCCTTAATGGTACAAAACCTCAACGGGAATGTTGAAATTGCATTAACCTCAAAGCTATTGAGAAGCTGCGAAACTGCGAAGTGAGATCACAGCTGCACGGAGCAGAAGAATCAAACATCTAACTATGTTTCCTAATATGTATATGGAGCTTTCGTTTATTCATATCAATTGTTTCTTGATGCTGAATCACATTGAAAGGTCAAGCGACTAGATCGACTTCATCACGTTGAGGGTAAACCACATCGAGTCTCATCATACTCTTGTACTGGTCAGGCACCGGTGCACCACCTTGCACACACAACTCCACAACCGCAGGAGCTTTCCCATAAAACCTCTGCAAGCTATTGTCAAGAGCAGAAGAATCTGGATCCCGAACATGCCAAACATAGTTTGGACCGACCACGTCGTCAATAGTCGCTTCTTGCCACGCGTGCATTCCGTCTCGCATCTGATGTTTTGTCGCTTTACACATCTTCACTTTGTGTCCCTTTGGACCCACTTGAATCTCTGGACAGTAACCACATGTCCACACTTTGTATCTCTCCATCAGTTTCTTGACCCCTGTGACCATTTCAAACCATGAGTCCATCGTCTCAACGCTTAACTCCTTCAAGCTCTTCTTCTCTGTACTGTAATGA
>EV005953
GAACTCAGAAACCGCAGTTACTAGTGTTAGCACGGCTACACCTTTACAAGAAGATGATCATTACAGTACAGAGAAGAAGAGCTTGAAGGAGTTAAGCGTTAAGACAATGGATTCATGGTTTGAAATGGTCACAGGGGTCAAGGAACTGATGGAGAAATACAAAGTGTGGACTTGTGGTTACTGTCCAGAGGTTCAAGTGGGTCCCAAGGGACACAAAGTGAAGATGTGTAAAGCGACAAAGCATCAGATGCGAGACGGAATGCACGCGTGGCAAGAAGCGACTATTGACGACGTGGTCGGTCCAAACTATGTGTGGCATGTTCGGGATCCAGATTCTTCTGCTCTTGACAATAGCTTGCAGAGGTTTTATGGGAAAGCTCCTGCGGTCGTGGAGTTGTGTGTGCAAGGTGGTGCACCGGTACCTGACCAGTACAAGAGTATGATGAGACTCGATGTGGTTTACCCTCAACGTGATGAAGTCGATCTAGTCGCGTGACCTTTCAAGGTGATTCAGCATCAAGAAACAATTGATATGAAAAAACGAAAGCTCCATATACATATTAGGAAACATAGTTAGATGTTTGGTTCTTCTGCTCCGTGCAGCTTCTCACTTCGCAGCTTCTCAATAGCTTTGAGGTTAATGCAATTTCAACATTCCCGTTGAGGTTTTGTACCA

Now I get an alignment in which all 4 sequences align, which seems to be correct. So why did the first run not align EV115773, but instead report it as a singlet? How can CAP3 be run so that it gives consistent results? All of this can be checked using the online CAP3 page at http://pbil.univ-lyon1.fr/cap3.php or by downloading CAP3 from http://seq.cs.iastate.edu/cap3.html
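For anyone who wants to reproduce the two runs, the 4-sequence input can be derived from the 5-sequence file instead of being copied by hand. The following is a minimal sketch in plain Python; the function names `read_fasta` and `subset_fasta` and the toy records are made up for illustration and are not part of CAP3.

```python
def read_fasta(text):
    """Parse FASTA text into a dict mapping record ID -> sequence string."""
    records = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the ID.
            current = line[1:].split()[0]
            records[current] = []
        elif current is not None:
            records[current].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


def subset_fasta(text, keep_ids):
    """Return FASTA text containing only keep_ids, in the order given."""
    records = read_fasta(text)
    missing = [name for name in keep_ids if name not in records]
    if missing:
        raise KeyError("IDs not found in input: " + ", ".join(missing))
    return "".join(">{}\n{}\n".format(name, records[name]) for name in keep_ids)


# Toy records (hypothetical, not the ESTs from this post).
toy = ">a\nACGT\n>b\nGGGG\nCCCC\n>c\nTTTT\n"
print(subset_fasta(toy, ["a", "c"]))
```

With the real data, the text of the 5-sequence file and the list ES981358, EV115773, EV115686, EV005953 would be passed in, and the result written to a new file to give to CAP3.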

est assembly • 3.0k views
I think some parameter tuning may be needed here. You could try different combinations of the CAP3 parameters, such as the overlap percent identity cutoff and the maximum overhang length.
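One way to try such combinations systematically is to generate one CAP3 command line per setting. The sketch below only builds and prints the commands. It assumes a `cap3` executable on the PATH, and that `-p` sets the overlap percent identity cutoff and `-h` the maximum overhang percent length, so check the usage message of your CAP3 version first. The file name `ests.fasta` and the value grids are placeholders, not recommendations.

```python
from itertools import product


def cap3_commands(fasta_path, identities=(80, 85, 90), overhangs=(20, 40, 60)):
    """Build one cap3 argument list per (identity, overhang) combination.

    Assumes -p is the overlap percent identity cutoff and -h the maximum
    overhang percent length; verify against your CAP3 usage message.
    """
    commands = []
    for identity, overhang in product(identities, overhangs):
        commands.append(
            ["cap3", fasta_path, "-p", str(identity), "-h", str(overhang)]
        )
    return commands


# Print the commands rather than running them, so they can be inspected first.
for cmd in cap3_commands("ests.fasta"):
    print(" ".join(cmd))
```

Each argument list could then be passed to `subprocess.run`. Since CAP3 names its output files after the input file, the results of one run would need to be moved or renamed before starting the next, so that it is still possible to see which setting first places EV115773 in the contig.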
