Question

Command-line alternative to Geneious assemble for Sanger sequencing data

4.2 years ago

bio23 • 0

I am doing Sanger sequencing of a ~2 kb construct using 4 primer pairs. I get back 4 .ab1 files, each generally containing around 1 kb of high-quality sequence, and given the relatively small size of the construct, these overlap significantly.

The goal is to assemble these 4 sequences into a single contig, in .fastq format (therefore retaining the per-base quality scores), and then downstream I will align this back to the reference construct using bwa mem.

I am trying to automate this procedure for hundreds of sequenced constructs. Previously this has been done manually in Geneious, using Geneious assemble (de novo assembly). The problem is that Geneious assemble cannot be run from the command line, and the other tools I have tried (cap3, tracy, tadpole) fail to generate a full-length contig (whereas Geneious succeeds), do not output per-base quality scores (.fastq), or both.

I would have thought that it would be a piece of cake to find an open-source tool that can match Geneious assemble, but this is not the case!

Can someone recommend a tool that I could try, or suggest how I could tune an existing tool to match Geneious assemble?

Any suggestions appreciated!

velvet cap3 geneious Sanger • 3.1k views

Updated 3.9 years ago by cfos4698 ★ 1.2k • written 4.2 years ago by bio23 • 0

Not answering your question, but thinking aloud. Unless your input data strictly conforms to a pattern, it may be difficult to find a tool that does this perfectly without manual intervention. With Sanger sequences, the read ends vary in length as the quality degrades, so this is not a simple problem. Since you have very specific requirements (e.g. FASTQ output), finding a command-line tool may be a tall order. tracy would have been my recommendation for a recent tool, but you seem to have tried it already.

4.2 years ago by GenoMax 154k
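
To make the read-end problem above concrete, here is a minimal sketch of quality trimming in plain Python. It keeps the contiguous stretch of a read that maximises the sum of (quality - threshold), which is in the spirit of Mott-style trimming but is not claimed to be what Geneious, cap3, or tracy actually do. The function name `trim_by_quality` and the default threshold of Q20 are assumptions made for illustration.

```python
def trim_by_quality(seq, quals, min_q=20):
    """Return the (sequence, qualities) of the best-scoring contiguous segment.

    Each base contributes (q - min_q); low-quality ends contribute negative
    values and are dropped. This is a maximum-subarray scan (hypothetical
    helper, not a published tool's exact algorithm).
    """
    best_sum, best_span = 0, (0, 0)
    cur_sum, cur_start = 0, 0
    for i, q in enumerate(quals):
        cur_sum += q - min_q
        if cur_sum <= 0:
            # Running score dropped to zero or below: restart after this base.
            cur_sum, cur_start = 0, i + 1
        elif cur_sum > best_sum:
            best_sum, best_span = cur_sum, (cur_start, i + 1)
    start, end = best_span
    return seq[start:end], quals[start:end]
```

Reads whose qualities never exceed the threshold come back empty, so a pipeline built on this would need to decide how to handle a construct where one of the four reads is discarded entirely.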

Re: the read ends, I would have thought that such tools would be able to generate a consensus based on the highest base quality or on agreement between multiple sequences. I will continue playing around with tracy. Thanks for the thoughts and info!

4.2 years ago by bio23 • 0
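
The quality-based consensus idea raised in this comment can be sketched as follows. This is only an illustration under stated assumptions: the reads are already aligned to each other and padded with `-` (quality 0) where they do not cover a column, and the consensus quality is taken as the summed quality of the winning base minus the summed quality of disagreeing bases, capped at Q60. The names `consensus_column` and `consensus_fastq` are hypothetical, and the scoring rule is a common heuristic rather than the method of any particular assembler.

```python
from collections import defaultdict


def consensus_column(bases, quals, max_q=60):
    """Pick the base with the highest summed quality in one alignment column."""
    support = defaultdict(int)
    for base, q in zip(bases, quals):
        if base != "-":  # '-' means this read does not cover the column
            support[base] += q
    if not support:
        return "N", 0
    # Sort by descending support, then alphabetically to break ties.
    ranked = sorted(support.items(), key=lambda kv: (-kv[1], kv[0]))
    best_base, best_q = ranked[0]
    conflicting = sum(q for _, q in ranked[1:])
    return best_base, max(0, min(max_q, best_q - conflicting))


def consensus_fastq(name, reads, max_q=60):
    """Build one FASTQ record (Phred+33) from pre-aligned (seq, quals) pairs."""
    seqs = [s for s, _ in reads]
    quals = [q for _, q in reads]
    out_bases, out_quals = [], []
    for col in range(len(seqs[0])):
        base, q = consensus_column(
            [s[col] for s in seqs], [q[col] for q in quals], max_q
        )
        out_bases.append(base)
        out_quals.append(q)
    qual_str = "".join(chr(q + 33) for q in out_quals)
    return f"@{name}\n{''.join(out_bases)}\n+\n{qual_str}\n"
```

Because the output is a plain FASTQ record, it could in principle be written to a file and passed to bwa mem as described in the question. The hard part that this sketch skips is producing the pairwise alignment of the overlapping reads in the first place.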

score 1 · Answer 1 · 2021-12-15

3.9 years ago

trausch ★ 2.0k

I try to maintain tracy as much as time permits. So if you have specific feature requests or run into potential bugs, please open an issue on the tracy GitHub repository.

score 0 · Answer 2 · 2021-12-15

3.9 years ago

cfos4698 ★ 1.2k

I haven't used it myself, so I don't know whether it does everything you want, but perhaps try sangeranalyseR? https://sangeranalyser.readthedocs.io/en/latest/ https://github.com/roblanf/sangeranalyseR
