Command-line alternative to Geneious assemble for Sanger sequencing data
7 days ago
bio23 • 0

I am Sanger sequencing a ~2 kb construct using 4 primer pairs. I get back 4 .ab1 files, each with roughly 1 kb of high-quality sequence, and because the construct is relatively small these reads overlap significantly.

The goal is to assemble these 4 sequences into a single contig in .fastq format (thereby retaining the per-base quality scores), which I will then align back to the reference construct using bwa mem.

I am trying to automate this procedure for hundreds of sequenced constructs. Previously this was done manually in Geneious, using Geneious assemble (de novo assembly). The problem is that Geneious assemble cannot be run from the command line, and the other tools I have tried (cap3, tracy, tadpole) fail to generate a full-length contig (whereas Geneious succeeds), do not output per-base quality scores (.fastq), or both.

I would have thought that it would be a piece of cake to find an open-source tool that can match Geneious assemble, but this is not the case!

Can someone recommend a tool that I could try, or suggest how I could tune an existing tool to match Geneious assemble?

Any suggestions appreciated!

velvet cap3 geneious Sanger • 114 views

Not answering your question, just thinking aloud. Unless your input data strictly conforms to a pattern, it may be difficult to find a tool that does this perfectly without manual intervention. With Sanger sequences, the usable ends of the reads vary from read to read because quality degrades towards the ends, so this is not a simple problem. Since you have very specific requirements (e.g. the output must be fastq), finding a command-line tool may be a tall order. tracy would have been my recommendation for a recent tool, but you seem to have tried it already.
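To illustrate the read-end problem, here is a minimal sketch of quality-based end trimming, loosely modeled on Mott-style trimming (keep the segment that maximises the sum of `cutoff - error_probability`). The function name `trim_by_quality` and the 0.05 cutoff are assumptions chosen for illustration, not what Geneious or tracy do internally, and the input is assumed to be a base string plus a list of Phred scores already extracted from the .ab1 file.

```python
def trim_by_quality(seq, quals, cutoff=0.05):
    """Return the (sequence, qualities) segment with the best quality score sum.

    Each base scores cutoff - 10**(-Q/10); the highest-scoring contiguous
    run is kept, which drops the noisy start and end of a Sanger read.
    """
    best_sum = 0.0
    best_start = best_end = 0
    run_sum = 0.0
    run_start = 0
    for i, q in enumerate(quals):
        run_sum += cutoff - 10 ** (-q / 10)
        if run_sum <= 0:
            # The current run is not worth keeping; restart after this base.
            run_sum = 0.0
            run_start = i + 1
        elif run_sum > best_sum:
            best_sum = run_sum
            best_start, best_end = run_start, i + 1
    return seq[best_start:best_end], quals[best_start:best_end]


# Toy read: two poor bases at each end, eight good bases in the middle.
trimmed_seq, trimmed_quals = trim_by_quality(
    "NNACGTACGTNN", [2, 3, 40, 40, 40, 40, 40, 40, 40, 40, 5, 4]
)
print(trimmed_seq)
```

How aggressive the cutoff is will change how much overlap survives between reads, which may be one reason different assemblers disagree on whether a full-length contig can be built.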


Re: the read ends, I would have thought that such tools could generate a consensus at each position based on the highest base quality, or on agreement between multiple reads. I will continue playing around with tracy. Thanks for the thoughts and info!
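For what it's worth, the "highest base quality wins" idea can be sketched in a few lines. This is only an illustration under strong assumptions: the reads are already oriented and placed on a common coordinate (the offsets below are hypothetical), the overlaps contain no indels, and the consensus quality is simply the winning base's quality. The helper names `quality_consensus` and `to_fastq` are made up for this example; a real assembler also has to find the overlaps and handle gaps.

```python
def quality_consensus(placed_reads):
    """Merge reads given as (offset, sequence, phred_qualities) tuples.

    At every column the base with the highest quality is kept.
    Uncovered columns become 'N' with quality 0.
    """
    length = max(off + len(seq) for off, seq, _ in placed_reads)
    cons = ["N"] * length
    cons_q = [-1] * length  # -1 marks "no read covers this column yet"
    for off, seq, quals in placed_reads:
        for i, (base, q) in enumerate(zip(seq, quals)):
            pos = off + i
            if q > cons_q[pos]:
                cons[pos] = base
                cons_q[pos] = q
    return "".join(cons), [max(q, 0) for q in cons_q]


def to_fastq(name, seq, quals):
    """Format one record as FASTQ with Phred+33 encoded qualities."""
    qual_str = "".join(chr(min(q, 93) + 33) for q in quals)
    return f"@{name}\n{seq}\n+\n{qual_str}\n"


# Two toy reads overlapping by three columns; the second read's
# higher-quality bases override the degraded end of the first read.
reads = [
    (0, "ACGTAC", [40, 40, 40, 30, 10, 5]),
    (3, "TTCGG", [10, 35, 38, 40, 40]),
]
cons_seq, cons_quals = quality_consensus(reads)
record = to_fastq("contig1", cons_seq, cons_quals)
print(record, end="")
```

The resulting record keeps per-base qualities, so it could in principle be passed on to bwa mem like any other fastq read.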