GemSIM, Grinder, and MetaSim to simulate 16S rRNA libraries
8.8 years ago
sym24 • 0

Hello everyone,

We are trying to detect microbial species of interest in patients' sequencing samples. To evaluate the specificity and sensitivity of our method, I am trying to simulate a test dataset.
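To make the evaluation concrete, here is a minimal sketch of how sensitivity and specificity could be computed per simulated sample, assuming the method reports a set of species names. The function name and the idea of a fixed candidate panel are my own assumptions for illustration, not part of any of the tools mentioned.

```python
def sensitivity_specificity(truth, detected, candidates):
    """Compare detected species against the known composition of a simulated sample.

    truth      -- species actually placed in the simulated sample
    detected   -- species the method reported
    candidates -- every species the method could report (hypothetical reference panel)
    """
    truth, detected, candidates = set(truth), set(detected), set(candidates)
    absent = candidates - truth

    true_pos = len(truth & detected)
    false_neg = len(truth - detected)
    false_pos = len(absent & detected)
    true_neg = len(absent - detected)

    # Guard against empty classes (e.g. a sample containing every candidate).
    sensitivity = true_pos / (true_pos + false_neg) if truth else float("nan")
    specificity = true_neg / (true_neg + false_pos) if absent else float("nan")
    return sensitivity, specificity
```

Specificity only makes sense here relative to a defined panel of candidate species, since the set of "true negatives" is otherwise unbounded.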

I have done some searching and narrowed it down to three tools: GemSIM, Grinder, and MetaSim.

We have reference 16S ITS rRNA sequence data as well as some clinical samples.

Ideally, the simulated dataset should contain the specified species at various abundances. It should also have a MiSeq error rate, which I have not been able to find anywhere.
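To illustrate what I mean, here is a rough sketch in plain Python, not tied to any of the three tools. It draws reads from reference sequences in proportion to a given abundance and adds substitution errors at a flat rate. The flat `error_rate` is only a placeholder assumption: it is not a MiSeq error model, which would need position-dependent errors and indels, and the function and parameter names are made up for illustration.

```python
import random

def simulate_reads(references, abundances, n_reads, read_len=250,
                   error_rate=0.01, seed=0):
    """Sample single-end reads from reference sequences by relative abundance.

    references -- dict of species name -> sequence (e.g. amplicon sequences)
    abundances -- dict of species name -> relative weight (need not sum to 1)
    error_rate -- per-base substitution probability (placeholder, NOT a MiSeq model)
    """
    rng = random.Random(seed)
    names = list(references)
    weights = [abundances[name] for name in names]

    reads = []
    for i in range(n_reads):
        # Pick the source species with probability proportional to its weight.
        name = rng.choices(names, weights=weights, k=1)[0]
        template = references[name][:read_len]

        bases = []
        for base in template:
            if rng.random() < error_rate:
                # Substitute with one of the other three nucleotides.
                base = rng.choice([b for b in "ACGT" if b != base])
            bases.append(base)
        reads.append((f"read{i}|{name}", "".join(bases)))
    return reads
```

Keeping the source species in the read name makes it easy to check afterwards which reads the method assigned correctly.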

So far I have tested Grinder. Grinder can simulate PCR amplicon sequences when the user provides amplicon sequences in FASTA format. I do not know where to obtain amplicon sequences.
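One possible way to get amplicon sequences would be to cut them out of the reference sequences with the primer pair used in the lab. Below is a minimal sketch of that idea; it assumes exact primer matches with no degenerate bases or mismatches, which real primers often violate, and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def extract_amplicon(reference, forward_primer, reverse_primer):
    """Return the region from the forward primer through the reverse primer site.

    Both primers are given 5'->3'. Only exact matches are found (assumption);
    returns None if either primer site is missing.
    """
    reference = reference.upper()
    forward_primer = forward_primer.upper()

    start = reference.find(forward_primer)
    if start == -1:
        return None

    # The reverse primer anneals to the opposite strand, so on the reference
    # strand its site appears as the reverse complement.
    reverse_site = reverse_complement(reverse_primer.upper())
    end = reference.find(reverse_site, start + len(forward_primer))
    if end == -1:
        return None

    return reference[start:end + len(reverse_site)]
```

The extracted amplicons could then be written out in FASTA format and used as simulator input.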

I am also running GemSIM, which seems extremely time-consuming when generating a large dataset (160000 reads, paired-end).

Does anyone here have experience with a similar project? What software have you used? It is really hard to find similar discussions online, so I thought it was time to start my own.

Thank you in advance for any input.

metagenomics gene simulation rRNA next-gen • 3.7k views
8.7 years ago
Josh Herr 5.8k

I'll comment since no one has responded yet. I've simulated reads to test a project, and I agree that there are not many resources on best practices out there. There are a fair number of tools (you mentioned GemSIM, Grinder, and MetaSim, but there are even more out there). Some take longer to run than others.

When I was running amplicon benchmarks I ended up using wgsim; it just worked for what I wanted to do. As with any tool, you'll have to read the fine print to see what it can and can't do, and then choose the best tool for your purpose.

Lastly, for all of these tools you will need to provide a reference dataset or sequences as input. You mention in your question that you have 16S ITS rRNA sequence data and some clinical samples, but also that you do not know where to obtain amplicon sequences. If you have sequence data, you can use that. There are also many 16S datasets available as public data; search SRA or MG-RAST, to name just two examples.

