Question

Simulate SFF File

Lee Katz

Hi, I am giving a workshop on genome assembly and I would like the students to try assembling a genome themselves. However, it will not be feasible to have tens of students each assembling a genome on the order of megabases: the work will likely run on a single server or on desktop computers, and time will be limited. Is there a way to simulate an SFF for something smaller, like a plasmid? Or to simulate an SFF based on a neighborhood of a few operons? Thank you.


Answer 1 · 2010-12-04 · Istvan Albert

Rather than simulating an SFF (assuming you mean 454's Standard Flowgram Format), you might be better off simulating the sequences directly. There were some answers on that topic here: how-to-produce-simulated-synthetic-sequences
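As a rough illustration of simulating sequences instead of an SFF, here is a minimal, hypothetical Python sketch (it is not MetaSim, Flowsim, or any tool mentioned in this thread). It samples reads uniformly from both strands of a small reference such as a plasmid, applies random substitutions, insertions and deletions at a fixed per-base rate, and writes FASTQ with one constant quality value. The mean read length of 400 bp, the 1% error rate and the function names are all assumptions chosen for illustration. It does not model flowgrams or the homopolymer-length errors typical of 454 data, and its output is FASTQ, not SFF.

```python
import random

# Translation table for complementing DNA (case preserved).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def mutate(read: str, error_rate: float, rng: random.Random) -> str:
    """At each base, with probability error_rate, apply a substitution, insertion or deletion."""
    out = []
    for base in read:
        if rng.random() >= error_rate:
            out.append(base)  # base copied without error
            continue
        kind = rng.choice(("sub", "ins", "del"))
        if kind == "sub":
            out.append(rng.choice([b for b in "ACGT" if b != base.upper()]))
        elif kind == "ins":
            out.append(base)
            out.append(rng.choice("ACGT"))
        # kind == "del": the base is simply dropped
    return "".join(out)


def simulate_reads(reference: str, n_reads: int, mean_len: int = 400,
                   sd_len: int = 50, error_rate: float = 0.01, seed: int = 1):
    """Yield (name, sequence) pairs sampled uniformly from both strands of `reference`."""
    rng = random.Random(seed)
    for i in range(n_reads):
        # Normally distributed length, at least 50 bp and never longer than the reference.
        length = min(len(reference), max(50, int(rng.gauss(mean_len, sd_len))))
        start = rng.randint(0, len(reference) - length)
        read = reference[start:start + length]
        strand = "+"
        if rng.random() < 0.5:
            read = reverse_complement(read)
            strand = "-"
        yield f"read{i + 1}_{start}_{strand}", mutate(read, error_rate, rng)


def to_fastq(records, qual: int = 30) -> str:
    """Format (name, sequence) pairs as FASTQ with one constant Phred+33 quality value."""
    lines = []
    for name, seq in records:
        lines += [f"@{name}", seq, "+", chr(qual + 33) * len(seq)]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical 5 kb random "plasmid"; replace with a real sequence loaded from FASTA.
    demo_rng = random.Random(0)
    plasmid = "".join(demo_rng.choice("ACGT") for _ in range(5000))
    print(to_fastq(simulate_reads(plasmid, n_reads=250)), end="")
```

To choose `n_reads`, note that coverage is roughly n_reads × mean read length / genome length: 250 reads of about 400 bp over a 5 kb plasmid give about 250 × 400 / 5000 = 20x, which is small enough for a classroom assembly.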


Thank you. Following a link from that BioStar page, I found a page that shows how to simulate a genome: http://sourceforge.net/apps/mediawiki/dnaa/index.php?title=Whole_Genome_Simulation

(reply by Lee Katz)


Installation required many packages that were not listed in the documentation. Once I had installed everything, it produced a slew of C errors that I could not debug. I'm not sure this is the way to go.

(reply by Lee Katz)


MetaSim works.

(reply by Lee Katz)

Answer 2 · 2010-12-06 · lexnederbragt

Have you tried Google? A search turns up at least this one:

Flowsim, http://blog.malde.org/index.php/flowsim/, paper here: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2935434/

(http://google.com/search?q=454+sff+simulation)


Answer 3 · 2010-12-04

Maybe you could use real data from trace archives such as the SRA database (say, a virus, like this one)? You can download FASTQ files (not SFFs), but as far as I know Newbler can read FASTA files with or without quality information (although if you include the quality information, you might need to rescale the quality scores).
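To give Newbler FASTA plus quality information, the downloaded FASTQ has to be split into a FASTA file and a matching quality file. Below is a minimal, hypothetical Python sketch that writes the conventional 454-style QUAL layout (a FASTA-like header followed by space-separated integer scores). It assumes unwrapped four-line FASTQ records and a known ASCII offset: 33 for Sanger-style FASTQ, 64 for older Illumina exports. The `offset` parameter is where any rescaling happens. The function name is an illustration only; check the Newbler documentation for the exact input files it expects.

```python
def fastq_to_fasta_qual(fastq_text: str, offset: int = 33):
    """Split unwrapped 4-line FASTQ records into (fasta_text, qual_text) strings."""
    lines = fastq_text.strip().splitlines()
    if not lines:
        return "", ""
    if len(lines) % 4 != 0:
        raise ValueError("expected unwrapped 4-line FASTQ records")
    fasta, qual = [], []
    for i in range(0, len(lines), 4):
        header, seq, plus, quals = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record at line {i + 1}")
        if len(quals) != len(seq):
            raise ValueError(f"sequence/quality length mismatch at line {i + 1}")
        # Decode ASCII characters to integer Phred scores using the given offset.
        scores = [ord(c) - offset for c in quals]
        if min(scores, default=0) < 0:
            raise ValueError("negative quality score; is the offset correct?")
        name = header[1:]
        fasta += [f">{name}", seq]
        qual += [f">{name}", " ".join(str(s) for s in scores)]
    return "\n".join(fasta) + "\n", "\n".join(qual) + "\n"
```

For example, the record `@r1 / ACGT / + / II5!` becomes `>r1` / `ACGT` in the FASTA output and `>r1` / `40 40 20 0` in the QUAL output. If the wrong offset is given, negative scores are rejected instead of being written silently.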

Answer 4 · 2010-12-06

The new NCBI SRA format lets you download SRA archives and convert them to any of the more widely used vendor formats (SFF, FASTQ, Illumina) with the SRA Toolkit; see http://www.ncbi.nlm.nih.gov/books/NBK49294/ for the download and manual.

So, search the SRA for "virus" or "plasmid" (perhaps something like http://www.ncbi.nlm.nih.gov/sra/SRX025865?report=full), download the corresponding SRA archive, convert it to SFF, and you're done.

Note 1: the 1.0b10 toolkit triggers one "error" with current gcc, which is quickly fixed.

Note 2: using plasmid or virus libraries as assembly examples may be counterproductive. These samples tend to be really nasty: most of the time what was sequenced is not one clean DNA molecule but a mixture, and that can confuse assemblers quite a lot.