Dear all,

I got some 16S (forward and reverse) Sanger sequencing data (from an ABI 3730xl DNA Analyzer) from our on-campus sequencing facility. They came in ABI format but the quality scores have not been applied. When I asked for quality scores, the staff at the sequencing facility used KB Basecaller but also warned me that those scores are inflated. The staff recommended Staden, which gives quality scores close enough to true Phred scores, but it is not command-line based and therefore not good for batch processing. Phred is command-line based but it is not free. It seems like the Phred score calculation is patented (https://www.google.ch/patents/US6681186) and maybe that's why it is hard to find a tool?

Ideally I would like to have the quality scores applied to those trace files so I can start the trimming and merging process. I've only worked with next-generation sequencing data and was always given fastq files. Therefore I was also wondering whether it is normal to be provided with trace files without quality scores applied.

Phred is command-line based but it is not free.

Only if you are a commercial user.

Out of curiosity, how many files do you have, and is a command-line tool a must? You may already have access to one of the several commercial programs that can handle .ab1 files (e.g. DNASTAR, Vector NTI, Sequencher etc.) via your institution.

Thank you h.mon! Yes it's free for academic use. I should have read the Phred page more carefully (http://www.phrap.org/consed/consed.html#howToGet). Is there anything I could do to minimize future misunderstanding, like editing, or deleting my original post?

I don't have a lot of samples, 50-ish, but I prefer not to do things manually.

You can use Staden at the command line, with the -nowin flag. Scavenging through my old stuff, I found this:

pregap4 -nowin -config ~/seqs/pregap.conf -fofn "${NAME}.files" > "${NAME}.output"

If my memory is correct, I used pregap once with the Tk GUI, configured as needed and saved the conf. After that, I could run with the above command-line, editing the conf file as needed.
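For 50-ish traces, the file-of-filenames step can be scripted as well. Below is a minimal sketch, not a tested pipeline: the helper names make_fofn and pregap_cmd are made up for illustration, the .ab1 extension and the example paths are assumptions, and only the pregap4 flags already shown above are used. Whether pregap4 wants bare file names or full paths in that list is worth checking against the Staden documentation.

```shell
#!/bin/sh
# Sketch only: build a file-of-filenames for a directory of traces and
# print the pregap4 command from the post above so it can be inspected.
# make_fofn and pregap_cmd are hypothetical helper names.

# make_fofn DIR NAME: write one trace path per line into NAME.files
make_fofn() {
    dir=$1
    name=$2
    : > "${name}.files"                  # start with an empty list
    for f in "$dir"/*.ab1; do            # assumes traces end in .ab1
        [ -e "$f" ] || continue          # skip if the glob matched nothing
        printf '%s\n' "$f" >> "${name}.files"
    done
}

# pregap_cmd NAME CONF: print (do not run) the command line for this batch
pregap_cmd() {
    printf 'pregap4 -nowin -config %s -fofn %s.files > %s.output\n' "$2" "$1" "$1"
}

# Example with assumed paths:
#   make_fofn ~/seqs/traces run1
#   pregap_cmd run1 ~/seqs/pregap.conf
```

Printing the command first makes it easy to check the generated list before actually running pregap4 on the whole batch.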

Keep in mind that, while Staden base-calling is reasonable, Phred beats it by a somewhat large margin - and Phred is free for academic use.

Edit: besides, I do not think the claim that KB Basecaller produces inflated scores is correct, judging either from my experience or from the literature: "A direct comparison of the KB™ Basecaller and phred for identifying the bases from DNA sequencing using chain termination chemistry".

