What program do I use to generate a single contig given a set of short contigs and the template?
8.8 years ago
lwc628 ▴ 230

Let's say I have a set of short contigs (5 sequences) that I want to assemble into one contig.

I do not want to use multiple sequence alignment; instead, I want to use a template sequence as a sort of hint.

If I align those 5 sequences against the template, I get the following template coordinates (start and end) for each sequence:

blastn -task 'megablast' -query evidence.fa -subject template.fa -outfmt 6 | cut -f1,9,10
piece1  1       234
piece2  56      704
piece3  220     592
piece4  385     704
piece5  510     704

How do I generate a single contig from this? In other words, I want to stitch the contigs by borrowing the information from the template sequence. Is there software for this?

Here are the sequences

>template
CTGACCTCCTGGGAGAGACTATGGATATGATGCTGGGAGAGCCCTGTCAGAGGTAACTATTTGGACATCGGGGAAAAGGGAGACCTCAGGCCTCCATTCAATTTGTATTCAGACTGCTTTCTGGGTAAACTGGACGTTTAGGATTTAGGTGGAAGAACATTTgggatttattaatttattttatataagaaatatacatgtagcatttactatgttcccagttttatttctaagcatttcaAAGATATTAATTCAGTTGAgctaaatgaaagaatactataaaaatatgataagatCAAAATAATCGTAGTCCCAGCAAGCTACTGCTGCTTGCTCAAGTTTGTGAGCCCTGCAGCTGTTCTGTCTGCTGAAGGGAAATGAGCCAACAGCCATGGGGGTGTTAAAGATGTAGCACAAAGCCCAGACTCTCTCAACAACAATCAAGAGgcatgaaaggaaagagaaatggctTTCTCAGATGCCCATCACTGCCATTTGAAAATGTGTCAGTGCATCCTCTAAAATTAACTTCTGTAGAAGCAGAGAAACAAGGCACAGACCTTGGGAAATAAGGGACAACTATGCAGGTACCAGGGGCCAAAGAAGCTTTTTAGCTGAGATTGTGTTTCTTAGAACCAGTAGCGCTGATGAGAAATACCTCTTGTCTGTCTGATTCACCCCCTTCTTGGAGAACCTCCCATTAC
>piece1
CTGACCTCCTGGGAGGGACTATATGGATATGATGCTGGGAGAGCCCTGTCAAGGGACTATTTGGACATCTGGGAAAAGGGAGACCTCAGGCCTCCATTCAATTTGTATTCAGACTGCTTTCTGCGTAAGCTGGGTGTTTAGGATTTAGGTGGAAGAACATTTGGGATTTATTAAATTATTTTATATCAAAAATATACATGTAGCATTTACTATGTTCCCAGTTTTAATTCTAAG
>piece2
AATCAACCAGCATTAATACAGGACTATTTGGACATCTGGGAAAAGGGAGACCTCAGGCCTCCATTCAATTTGTATTCAGACTGCTTTCTGCGTAAGCTGGGTGTTTAGGATTTAGGTGGAAGAACATTTGGGATTTATTAAATTATTTTCTATCAAAAATACACATGTAGCATTTACTATGTTCCCAGTTTTAATTCTAAGCATTTCAAAGATATTAGTTCACTTGGGCTAAATGAAAGAATACTATAAAAGTATGATTGGATCAAAATAATCGTGGTCCCAGCAAGCTTGTGCTGCCTGCTCAAGTTTGTGAGCCCTGCAGCTGTTCTGTCTGCTGAAGGGAAGTGAGCCAACAGCCATGGGGGTGTTAAAGATGTAGCACAGAGCCCAGACACTCTCAACGACAATCAAGAGGCAGGAAAGGAAAGAGAAACAGCTTTCTCAGATGCCCATCACTGCCATTTGAAAATGTGTCTGTGCATCCTCTAAAATTAACTCCTGTAGAAGCAGAGAAGCAAGGCACAGGCATTGGGAAATAAGGGACAACTATGCAGGTACCAGGGGCCAAAGAAGCTTTTTAGCTGAGATTGTGTTTCTTAGAACCAGTAGCACTGATGACAAATTCCCTCTTGTCTGTCTGATTCACCCCTTTCTTGAAGAACCTCTCATTACCCACAGGTCCTGCTAACTGGGCTGTGGGGACATTTGATCCAATCAAGTTCAATTAATCAGTTTCCCTCTTATGGGAATCTGGAATAAAGACATCTTG
>piece3
AGTTTTAATTCTAAGCATTTCAAAGATATTAGTTCACTTGGGCTAAATGAAAGAATACTATAAAAGTATGATTGGATCAAAATAATCGTGGTCCCAGCAAGCTTGTGCTGCCTGCTCAAGTTTGTGAGCCCTGCAGCTGTTCTGTCTGCTGAAGGGAAGTGAGCCAACAGCCATGGGGGTGTTAAAGATGTAGCACAGAGCCCAGACACTCTCAACGACAATCAAGAGGCAGGAAAGGAAAGAGAAACAGCTTTCTCAGATGCCCGTCACTGCCATTTGAAAATGTGTCCGTGTATCCTCTAAAATTAATTCCTGTAGAAGAAGAGAAAGAAGGCACAGGCCTTGGGAAATAAGGGACAACTATGCAGGTACC
>piece4
TGGGGTGTTAAAGAACAGCCATGGGGGTGTTAAAGATGTAGCACAGAGCCCAGACACTCTCAACGACAATCAAGAGGCAGGAAAGGAAAGAGAAACAGCTTTCTCAGATGCCCGTCACTGCCATTTGAAAATGTGTCCGTGTATCCTCTAAAATTAATTCCTGTAGAAGAAGAGAAAGAAGGCACAGGCCTTGGGAAATAAGGGACAACTATGCAGGTACCAGGGGCCAAAGAAGCTTTTTAGCTGAGATTGTGTTTCTTAGAACCAGTAGCACTGATGACAAATTCCCTCTTGTCTGTCTGATTCACCCCTTTCTTGAAGAACCTCTCATTACCCACAGGTCCTGCTAACTGGGCTGTGGGGACATTTGATCCAATCAAGTTCAATTAATCAGTTTCCCTCTTATGGGAATCTGGAATAAAGACATCTTG
>piece5
CGTGTATCCTCTAAAATTAATTCCTGTAGAAGAAGAGAAAGAAGGCACAGGCCTTGGGAAATAAGGGACAACTATGCAGGTACCAGGGGCCAAAGAAGCTTTTTAGCTGAGATTGTGTTTCTTAGAACCAGTAGCACTGATGACAAATACCTTCTTGTCTGATTCACCTCTTTCTTGAAGAACCTCCCATTACCCACAGGTCCTGCTAACTGGGCTGTGGGGACATTTGACCCAATCAAGTTCAATTAATC
Assembly sequence alignment • 4.5k views
8.8 years ago

If the sequences are error-free (meaning they all agree with each other), you could do this with a k-mer-based assembler like Tadpole (part of the BBMap package) by setting the minimum depth to 1, or by concatenating the sequence file to itself 20 times so that it works with the default settings of most assemblers. However, judging by the presence of lowercase letters, perhaps these are not error-free, in which case a string-graph-based assembler makes more sense; Falcon and Omega are two such programs. None of these options will use the template, but since the sequences overlap by quite a large amount and the consensus is so short, that shouldn't be a problem.
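To put a number on "overlap by quite a large amount", here is a minimal Python sketch that takes the template coordinates from the blastn output in the question and reports how many template positions each pair of neighbouring pieces shares. The function names are made up for illustration, and the check only looks at template coordinates; it says nothing about whether the bases in the overlapping regions actually agree.

```python
# Template coordinates (start, end), copied from the blastn output in the question.
# Coordinates are 1-based and inclusive.
hits = {
    "piece1": (1, 234),
    "piece2": (56, 704),
    "piece3": (220, 592),
    "piece4": (385, 704),
    "piece5": (510, 704),
}


def neighbour_overlaps(hits):
    """Overlap, in template positions, between pieces adjacent by start coordinate."""
    ordered = sorted(hits.items(), key=lambda item: item[1][0])
    overlaps = []
    for (name_a, (start_a, end_a)), (name_b, (start_b, end_b)) in zip(ordered, ordered[1:]):
        # Inclusive coordinates, hence the +1; clamp at 0 when the pieces do not touch.
        shared = max(0, min(end_a, end_b) - max(start_a, start_b) + 1)
        overlaps.append((name_a, name_b, shared))
    return overlaps


def uncovered_positions(hits):
    """Template positions between the first start and the last end hit by no piece."""
    first = min(start for start, _ in hits.values())
    last = max(end for _, end in hits.values())
    covered = set()
    for start, end in hits.values():
        covered.update(range(start, end + 1))
    return [pos for pos in range(first, last + 1) if pos not in covered]


if __name__ == "__main__":
    for name_a, name_b, shared in neighbour_overlaps(hits):
        print(f"{name_a} / {name_b}: {shared} shared template positions")
    print("uncovered positions:", uncovered_positions(hits))
```

For these coordinates the smallest neighbouring overlap is 179 template positions (piece1 with piece2), and no position between 1 and 704 is left uncovered.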

There are programs for reference-guided assembly, but I've never used one. Possibly someone else will recommend one.

Also, Mothur can generate a consensus via multiple sequence alignment. I'm not really sure why you want to avoid that approach; in theory, it should work fine. It's worth a try, at least.


The reason I am trying to avoid multiple alignment is this possible scenario: suppose I have two contigs, and one contig shares the last 10% of its sequence with the first 10% of the other. Multiple alignment would fail to stitch the two contigs.
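The scenario described here, where the end of one contig matches the start of the next, is a suffix-prefix overlap. A minimal Python sketch of stitching on such an overlap follows. It assumes the overlapping bases match exactly, which is a simplification: for sequences that differ by a few percent, the comparison would have to tolerate mismatches and indels. The function names and the `min_overlap` default are illustrative choices, not taken from any tool mentioned in this thread.

```python
def suffix_prefix_overlap(left, right, min_overlap=10):
    """Length of the longest suffix of `left` that exactly equals a prefix of `right`.

    Returns 0 if no overlap of at least `min_overlap` bases exists.
    """
    longest_possible = min(len(left), len(right))
    for length in range(longest_possible, min_overlap - 1, -1):
        if left.endswith(right[:length]):
            return length
    return 0


def stitch(left, right, min_overlap=10):
    """Merge two contigs on their exact suffix-prefix overlap, or return None."""
    length = suffix_prefix_overlap(left, right, min_overlap)
    if length == 0:
        return None
    # Keep all of `left`, then append the part of `right` beyond the overlap.
    return left + right[length:]


if __name__ == "__main__":
    # Toy sequences (not from the question): the last 10 bases of `a`
    # are the first 10 bases of `b`.
    a = "ACGTACGTTTGACCA" + "GGATCCAAGT"
    b = "GGATCCAAGT" + "TTTCAGGCA"
    print(suffix_prefix_overlap(a, b))
    print(stitch(a, b))
```

The search starts from the longest possible overlap and works down, so the first hit is the longest exact overlap; `min_overlap` guards against stitching on a short match that occurs by chance.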

8.8 years ago
h.mon 35k

To answer specifically what you asked: MIRA has a mapped assembly mode, and you could pass your template as the reference. However, looking at your sequences, the similarity among them (and between them and the template) seems to be 95%, so you might need some parameter tweaking to assemble them. That makes me agree with Brian Bushnell: the best approach here seems to be the one you are trying to avoid.

P.S.: Did you notice your template is shorter than piece2? How did you choose the template?
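As a rough illustration of what "using the template as a reference" can mean, the Python sketch below picks a tiling path: a small set of pieces that covers the template from left to right, based only on the coordinates in the question. This is a simple greedy choice written for illustration; it is not how MIRA or any other assembler mentioned here works internally. Turning the path into an actual sequence would also need the query coordinates (columns 7 and 8 of the `-outfmt 6` output) and some handling of indels, which the sketch leaves out.

```python
# Template coordinates (start, end), copied from the blastn output in the question.
hits = {
    "piece1": (1, 234),
    "piece2": (56, 704),
    "piece3": (220, 592),
    "piece4": (385, 704),
    "piece5": (510, 704),
}


def tiling_path(hits):
    """Greedily choose pieces that cover the template span from left to right.

    Returns a list of (piece, template_start, template_end) segments, where each
    segment is the stretch of template that the piece is chosen to represent.
    """
    target_end = max(end for _, end in hits.values())
    covered_to = min(start for start, _ in hits.values()) - 1
    path = []
    while covered_to < target_end:
        # Pieces that reach the next uncovered position and extend beyond it.
        candidates = [
            (name, end)
            for name, (start, end) in hits.items()
            if start <= covered_to + 1 and end > covered_to
        ]
        if not candidates:
            raise ValueError(f"no piece covers template position {covered_to + 1}")
        # Prefer the piece that extends furthest to the right.
        name, end = max(candidates, key=lambda item: item[1])
        path.append((name, covered_to + 1, end))
        covered_to = end
    return path


if __name__ == "__main__":
    for name, start, end in tiling_path(hits):
        print(f"{name}: template {start}-{end}")
```

With the coordinates from the question, two pieces are enough: piece1 for template positions 1-234 and piece2 for 235-704. The other three pieces fall entirely inside the region piece2 already covers, so they would only matter for building a consensus, not for spanning the template.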

8.8 years ago

Try this old assembler, which is easy to use and does a nice job with sequences like yours. It uses CAP3. It will let you create contigs and singletons from your sequences, and even gives you some simple control over how that is done.

EGAssembler
