Question: What program do I use to generate a single contig given a set of short contigs and the template?
lwc628 (United States) wrote 5.2 years ago:

Let's say I have a set of short contigs (5 sequences) that I want to assemble into one contig.

I do not want to use multiple sequence alignment, but rather use a template sequence that will be used as a sort of hint.

If I align those 5 sequences against the template, I get the following template coordinates for each sequence:

blastn -task 'megablast' -query evidence.fa -subject template.fa -outfmt 6 | cut -f1,9,10
piece1  1       234
piece2  56      704
piece3  220     592
piece4  385     704
piece5  510     704

How do I generate a single contig from this? In other words, I want to stitch the contigs together using the information from the template sequence. Is there software for this?

Here are the sequences

>template
CTGACCTCCTGGGAGAGACTATGGATATGATGCTGGGAGAGCCCTGTCAGAGGTAACTATTTGGACATCGGGGAAAAGGGAGACCTCAGGCCTCCATTCAATTTGTATTCAGACTGCTTTCTGGGTAAACTGGACGTTTAGGATTTAGGTGGAAGAACATTTgggatttattaatttattttatataagaaatatacatgtagcatttactatgttcccagttttatttctaagcatttcaAAGATATTAATTCAGTTGAgctaaatgaaagaatactataaaaatatgataagatCAAAATAATCGTAGTCCCAGCAAGCTACTGCTGCTTGCTCAAGTTTGTGAGCCCTGCAGCTGTTCTGTCTGCTGAAGGGAAATGAGCCAACAGCCATGGGGGTGTTAAAGATGTAGCACAAAGCCCAGACTCTCTCAACAACAATCAAGAGgcatgaaaggaaagagaaatggctTTCTCAGATGCCCATCACTGCCATTTGAAAATGTGTCAGTGCATCCTCTAAAATTAACTTCTGTAGAAGCAGAGAAACAAGGCACAGACCTTGGGAAATAAGGGACAACTATGCAGGTACCAGGGGCCAAAGAAGCTTTTTAGCTGAGATTGTGTTTCTTAGAACCAGTAGCGCTGATGAGAAATACCTCTTGTCTGTCTGATTCACCCCCTTCTTGGAGAACCTCCCATTAC

>piece1
CTGACCTCCTGGGAGGGACTATATGGATATGATGCTGGGAGAGCCCTGTCAAGGGACTATTTGGACATCTGGGAAAAGGGAGACCTCAGGCCTCCATTCAATTTGTATTCAGACTGCTTTCTGCGTAAGCTGGGTGTTTAGGATTTAGGTGGAAGAACATTTGGGATTTATTAAATTATTTTATATCAAAAATATACATGTAGCATTTACTATGTTCCCAGTTTTAATTCTAAG
>piece2
AATCAACCAGCATTAATACAGGACTATTTGGACATCTGGGAAAAGGGAGACCTCAGGCCTCCATTCAATTTGTATTCAGACTGCTTTCTGCGTAAGCTGGGTGTTTAGGATTTAGGTGGAAGAACATTTGGGATTTATTAAATTATTTTCTATCAAAAATACACATGTAGCATTTACTATGTTCCCAGTTTTAATTCTAAGCATTTCAAAGATATTAGTTCACTTGGGCTAAATGAAAGAATACTATAAAAGTATGATTGGATCAAAATAATCGTGGTCCCAGCAAGCTTGTGCTGCCTGCTCAAGTTTGTGAGCCCTGCAGCTGTTCTGTCTGCTGAAGGGAAGTGAGCCAACAGCCATGGGGGTGTTAAAGATGTAGCACAGAGCCCAGACACTCTCAACGACAATCAAGAGGCAGGAAAGGAAAGAGAAACAGCTTTCTCAGATGCCCATCACTGCCATTTGAAAATGTGTCTGTGCATCCTCTAAAATTAACTCCTGTAGAAGCAGAGAAGCAAGGCACAGGCATTGGGAAATAAGGGACAACTATGCAGGTACCAGGGGCCAAAGAAGCTTTTTAGCTGAGATTGTGTTTCTTAGAACCAGTAGCACTGATGACAAATTCCCTCTTGTCTGTCTGATTCACCCCTTTCTTGAAGAACCTCTCATTACCCACAGGTCCTGCTAACTGGGCTGTGGGGACATTTGATCCAATCAAGTTCAATTAATCAGTTTCCCTCTTATGGGAATCTGGAATAAAGACATCTTG
>piece3
AGTTTTAATTCTAAGCATTTCAAAGATATTAGTTCACTTGGGCTAAATGAAAGAATACTATAAAAGTATGATTGGATCAAAATAATCGTGGTCCCAGCAAGCTTGTGCTGCCTGCTCAAGTTTGTGAGCCCTGCAGCTGTTCTGTCTGCTGAAGGGAAGTGAGCCAACAGCCATGGGGGTGTTAAAGATGTAGCACAGAGCCCAGACACTCTCAACGACAATCAAGAGGCAGGAAAGGAAAGAGAAACAGCTTTCTCAGATGCCCGTCACTGCCATTTGAAAATGTGTCCGTGTATCCTCTAAAATTAATTCCTGTAGAAGAAGAGAAAGAAGGCACAGGCCTTGGGAAATAAGGGACAACTATGCAGGTACC
>piece4
TGGGGTGTTAAAGAACAGCCATGGGGGTGTTAAAGATGTAGCACAGAGCCCAGACACTCTCAACGACAATCAAGAGGCAGGAAAGGAAAGAGAAACAGCTTTCTCAGATGCCCGTCACTGCCATTTGAAAATGTGTCCGTGTATCCTCTAAAATTAATTCCTGTAGAAGAAGAGAAAGAAGGCACAGGCCTTGGGAAATAAGGGACAACTATGCAGGTACCAGGGGCCAAAGAAGCTTTTTAGCTGAGATTGTGTTTCTTAGAACCAGTAGCACTGATGACAAATTCCCTCTTGTCTGTCTGATTCACCCCTTTCTTGAAGAACCTCTCATTACCCACAGGTCCTGCTAACTGGGCTGTGGGGACATTTGATCCAATCAAGTTCAATTAATCAGTTTCCCTCTTATGGGAATCTGGAATAAAGACATCTTG
>piece5
CGTGTATCCTCTAAAATTAATTCCTGTAGAAGAAGAGAAAGAAGGCACAGGCCTTGGGAAATAAGGGACAACTATGCAGGTACCAGGGGCCAAAGAAGCTTTTTAGCTGAGATTGTGTTTCTTAGAACCAGTAGCACTGATGACAAATACCTTCTTGTCTGATTCACCTCTTTCTTGAAGAACCTCCCATTACCCACAGGTCCTGCTAACTGGGCTGTGGGGACATTTGACCCAATCAAGTTCAATTAATC

 

 

 

Tags: alignment sequence assembly
Brian Bushnell (Walnut Creek, USA) wrote 5.2 years ago:

If the sequences are error-free (meaning they all agree with each other), you could do this with a k-mer-based assembler like Tadpole (part of the BBMap package) by setting the minimum depth to 1, or by concatenating the sequence file with itself 20 times so that the depth is high enough for the defaults of most assemblers. However, judging by the presence of lowercase letters, perhaps these are not error-free. So a string-graph-based assembler makes more sense; Falcon and Omega are two such programs. None of these options will use the template, but since the sequences overlap by quite a large amount and the consensus is so short, that shouldn't be a problem.
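To illustrate the depth point, here is a minimal Python sketch (not Tadpole's actual algorithm) that counts k-mers in two made-up fragments. The function name `kmer_depths` and the toy sequences are assumptions for illustration only. K-mers outside the overlap occur only once, so a minimum depth above 1 would discard them, while duplicating the input raises every count.

```python
from collections import Counter


def kmer_depths(sequences, k):
    """Count how many times each k-mer occurs across all sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()  # compare lowercase and uppercase bases as equal
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


# Hypothetical overlapping fragments, not taken from the question.
toy = ["ACGTTGCA", "TTGCAGGA"]

depths = kmer_depths(toy, k=4)
unique = [kmer for kmer, n in depths.items() if n == 1]
print(len(unique), "of", len(depths), "k-mers have depth 1")

# Feeding the same input 20 times multiplies every depth by 20.
boosted = kmer_depths(toy * 20, k=4)
print("lowest depth after duplication:", min(boosted.values()))
```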

There are programs for reference-guided assembly, but I've never used one.  Possibly someone else will recommend one.
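As a rough sketch of how the template coordinates from the question could guide stitching, the Python below treats each BLAST hit as an interval on the template and greedily picks the pieces that extend coverage furthest. The function name `tiling_path` is hypothetical, and this is not what any particular reference-guided assembler does. It only selects pieces; actually joining them would also need the query coordinates (columns 7 and 8 of BLAST tabular output), which the `cut` in the question drops.

```python
# Template (subject) coordinates copied from the question's BLAST output.
HITS = [
    ("piece1", 1, 234),
    ("piece2", 56, 704),
    ("piece3", 220, 592),
    ("piece4", 385, 704),
    ("piece5", 510, 704),
]


def tiling_path(hits):
    """Greedily choose pieces that cover the template from left to right.

    Returns the chosen (name, start, end) tuples in order.
    Raises ValueError if the pieces leave a gap on the template.
    """
    target_end = max(end for _, _, end in hits)
    covered_to = min(start for _, start, _ in hits) - 1  # nothing covered yet
    path = []
    while covered_to < target_end:
        # A candidate must start within the covered region (or right after it)
        # and must reach beyond what is already covered.
        candidates = [h for h in hits if h[1] <= covered_to + 1 and h[2] > covered_to]
        if not candidates:
            raise ValueError(f"gap on template after position {covered_to}")
        best = max(candidates, key=lambda h: h[2])
        path.append(best)
        covered_to = best[2]
    return path


if __name__ == "__main__":
    for name, start, end in tiling_path(HITS):
        print(name, start, end)
```

For the coordinates in the question this selects piece1 followed by piece2, which together span template positions 1 to 704.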

Also, Mothur can generate a consensus via multiple sequence alignment. I'm not really sure why you want to avoid that approach; theoretically, it should work fine. Worth a try, at least.


I am trying to avoid multiple alignment because of this possible scenario: suppose I have two contigs, and the last 10% of one contig is shared with the first 10% of the other. Multiple alignment would fail to stitch the two contigs.
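The end-overlap scenario described here can be sketched in Python as a suffix-prefix merge. The function names `longest_overlap` and `stitch` and the toy sequences are made up for illustration, and the matching is exact, so it is only a sketch: pieces that disagree at a few percent of positions, like the ones in this thread, would need an alignment-based overlap instead.

```python
def longest_overlap(left, right, min_len=3):
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    for size in range(min(len(left), len(right)), min_len - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def stitch(left, right, min_len=3):
    """Merge two sequences on their exact end overlap; None if there is none."""
    size = longest_overlap(left, right, min_len)
    if size == 0:
        return None
    # Keep all of `left`, then append the part of `right` past the overlap.
    return left + right[size:]


# Hypothetical contigs: the end of the first matches the start of the second.
contig_a = "ACGTACGTTTGACCA"
contig_b = "GACCATTGCAAGG"

print(longest_overlap(contig_a, contig_b))
print(stitch(contig_a, contig_b))
```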

Reply written 5.2 years ago by lwc628
h.mon (Brazil) wrote 5.2 years ago:

To answer specifically what you asked: MIRA has a mapped assembly mode, and you could pass your template as the reference. However, looking at your sequences, it seems the similarity among them (and between them and the template) is 95%, so you might need some parameter tweaking to assemble them. That makes me agree with Brian Bushnell: the best approach here seems to be the one you are trying to avoid.

P.S.: Did you notice that your template is shorter than piece2? How did you choose the template?
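A quick way to check this kind of thing is to print the length of every FASTA record. The Python below is a minimal sketch; the function name `fasta_lengths` and the demo records are assumptions for illustration, not the sequences from the question.

```python
def fasta_lengths(lines):
    """Return {record_name: sequence_length} for FASTA text given as lines."""
    lengths = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""  # first word of the header
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)  # sequences may span several lines
    return lengths


# Made-up records standing in for a template and one piece.
demo = [">template", "ACGTACGT", ">piece2", "ACGTAC", "GTACGT"]
demo_lengths = fasta_lengths(demo)
for record, length in demo_lengths.items():
    print(record, length)
```

With real data, the same function could be given open file handles, for example `fasta_lengths(open("template.fa"))` and `fasta_lengths(open("evidence.fa"))`, using the file names from the blastn command in the question.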

Antonio R. Franco (Spain. Universidad de Córdoba) wrote 5.2 years ago:

Try this old assembler, which is easy to use and does a nice job with sequences like yours. It uses CAP3. It will let you create contigs and singletons from your sequences, and even gives you some simple control over how it does so:

EGAssembler
