Hi, I am giving a workshop on genome assembly and I would like the students to try assembling a genome themselves. However, it will not be feasible to have tens of students each assembling a genome on the order of megabases, because everything will likely run on either one server or on desktop computers, and there will be a time constraint. Is there a way to simulate an SFF for something smaller, like a plasmid? Or to simulate an SFF based on a neighborhood of a few operons? Thank you.
Maybe you could use real data from a trace archive such as the SRA database (let's say a virus, like this one)? You can download fastq files (not SFFs), but as far as I know Newbler can read fasta files with or without quality information (although you might need to rescale the quality scores if you do supply them).
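To make the fastq-to-fasta step concrete, here is a minimal Python sketch that splits a 4-line-per-record fastq file into a fasta file plus a matching quality file with space-separated integer scores. The function names, the default Phred offset of 33, and the idea that a separate quality file is the form Newbler wants are all assumptions on my part, so check them against your data and the Newbler manual.

```python
import io


def parse_fastq(handle):
    """Yield (name, sequence, quality_string) from 4-line FASTQ records."""
    while True:
        header = handle.readline()
        if not header.strip():
            return  # end of file (or a blank line) stops parsing
        seq = handle.readline().strip()
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().strip()
        yield header.strip()[1:], seq, qual


def fastq_to_fasta_qual(fastq_in, fasta_out, qual_out, offset=33):
    """Write FASTA and a space-separated quality file; return the read count.

    offset=33 is an assumption (Sanger-style encoding); older Illumina
    files may need offset=64.
    """
    count = 0
    for name, seq, qual in parse_fastq(fastq_in):
        scores = [ord(c) - offset for c in qual]
        fasta_out.write(f">{name}\n{seq}\n")
        qual_out.write(f">{name}\n{' '.join(map(str, scores))}\n")
        count += 1
    return count


if __name__ == "__main__":
    # Tiny in-memory demo; replace the StringIO objects with open() calls.
    demo = io.StringIO("@read1\nACGT\n+\nIIII\n")
    fasta, qual = io.StringIO(), io.StringIO()
    fastq_to_fasta_qual(demo, fasta, qual)
    print(fasta.getvalue(), qual.getvalue(), sep="")
```

For real files, pass handles from open("reads.fastq"), open("reads.fna", "w") and open("reads.qual", "w"); those file names are only placeholders.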
The new NCBI SRA format allows you to download their SRA archives and convert them to any of the more widely used vendor formats (SFF, FASTQ, Illumina) via their SRA Toolkit; see http://www.ncbi.nlm.nih.gov/books/NBK49294/ for the download and manual.
Note 1: the 1.0b10 toolkit has one "error" that current gcc complains about, but it is quickly fixed.
Note 2: using plasmid or virus libraries as examples for assembly may be counterproductive, as these things tend to be really nasty. Most of the time it is not one clean DNA that was sequenced but a mixture, and that can confuse assemblers quite a lot.
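If real small-genome data turn out to be too messy, the simulation route from the original question avoids the mixture problem, since you start from one clean reference. Below is a rough Python sketch, not a real 454 simulator: it samples error-free reads from both strands of a reference string and writes fastq rather than SFF. All names and defaults (coverage, read length, the constant quality character) are made up for illustration, the reference is treated as linear even for a plasmid, and no homopolymer or other sequencing errors are modeled.

```python
import io
import random


def reverse_complement(seq):
    """Return the reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def simulate_reads(reference, coverage=20, mean_len=400, sd_len=50, seed=1):
    """Sample error-free reads from a linear reference (illustrative only)."""
    rng = random.Random(seed)
    n_reads = max(1, coverage * len(reference) // mean_len)
    reads = []
    for i in range(n_reads):
        # Draw a read length, keeping it between 50 bp and the reference length.
        length = int(rng.gauss(mean_len, sd_len))
        length = min(max(length, 50), len(reference))
        start = rng.randrange(len(reference) - length + 1)
        fragment = reference[start:start + length]
        if rng.random() < 0.5:  # pick a strand at random
            fragment = reverse_complement(fragment)
        reads.append((f"sim_{i}", fragment))
    return reads


def write_fastq(reads, handle, quality_char="I"):
    """Write reads as FASTQ with one constant (assumed) quality character."""
    for name, seq in reads:
        handle.write(f"@{name}\n{seq}\n+\n{quality_char * len(seq)}\n")


if __name__ == "__main__":
    # Toy random "plasmid" so the demo runs without any input file.
    toy_rng = random.Random(0)
    toy_reference = "".join(toy_rng.choice("ACGT") for _ in range(5000))
    out = io.StringIO()
    write_fastq(simulate_reads(toy_reference), out)
    print(out.getvalue()[:200])
```

In a workshop you would read the reference from a fasta file of the plasmid or operon region instead of generating a toy sequence, and lower the coverage or read length to show the students how the assembly degrades.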