Question: GemSIM, Grinder, and MetaSim to simulate 16S rRNA libraries
sym24 (Canada) wrote, 4.1 years ago:

Hello everyone,

We are trying to detect the presence of microbial species of interest in patients' sequencing samples. To evaluate the specificity and sensitivity of our method, I am trying to simulate a test data set.
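For a simulated sample where the true species composition is known, sensitivity and specificity can be computed per sample along these lines. This is only a minimal sketch: the function name, the species labels, and the idea of scoring at the presence/absence level over a fixed candidate list are assumptions, not something stated in the question.

```python
def sensitivity_specificity(truth, detected, candidates):
    """Score presence/absence calls for one simulated sample.

    truth      -- species actually placed in the simulated sample
    detected   -- species the detection method reported
    candidates -- every species the method could have reported
    """
    truth, detected, candidates = set(truth), set(detected), set(candidates)
    negatives = candidates - truth           # species truly absent

    tp = len(truth & detected)               # present and reported
    fn = len(truth - detected)               # present but missed
    fp = len(negatives & detected)           # absent but reported
    tn = len(negatives - detected)           # absent and not reported

    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical labels, for illustration only.
sens, spec = sensitivity_specificity(
    truth={"sp_A", "sp_B", "sp_C"},
    detected={"sp_A", "sp_B", "sp_D"},
    candidates={"sp_A", "sp_B", "sp_C", "sp_D", "sp_E"},
)
print(sens, spec)
```

Because specificity depends on how many truly absent species are in the candidate list, the choice of that list is part of the benchmark design.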

I have done some searching and narrowed it down to three tools: GemSIM, Grinder, and MetaSim.

We have reference 16S ITS rRNA sequence data as well as some clinical samples.

Ideally, the simulated data set should contain the specified species at various abundances. It should also have a MiSeq-like error rate, which I have not been able to find anywhere.
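The two requirements here (species at chosen abundances, plus sequencing errors) can be sketched independently of any particular simulator. The snippet below is an illustration under stated assumptions: the function names and species labels are hypothetical, and `error_rate` is a placeholder for a flat per-base substitution rate, not a measured MiSeq value. Real error profiles vary along the read, so a flat rate is a simplification.

```python
import random


def reads_per_species(abundances, total_reads, seed=0):
    """Draw how many reads each species contributes.

    abundances  -- dict of species -> relative abundance (need not sum to 1)
    total_reads -- number of reads to distribute
    """
    rng = random.Random(seed)
    species = list(abundances)
    weights = [abundances[s] for s in species]
    counts = {s: 0 for s in species}
    for s in rng.choices(species, weights=weights, k=total_reads):
        counts[s] += 1
    return counts


def add_substitutions(seq, error_rate, rng):
    """Replace each base with a different base with probability error_rate."""
    out = []
    for base in seq:
        if rng.random() < error_rate:
            out.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            out.append(base)
    return "".join(out)


# Hypothetical abundance profile and placeholder error rate.
counts = reads_per_species({"sp_A": 0.7, "sp_B": 0.2, "sp_C": 0.1}, 1000)
noisy = add_substitutions("ACGTACGTACGT", error_rate=0.01, rng=random.Random(1))
print(counts, noisy)
```

Fixing the seed makes a simulated data set reproducible, which helps when comparing detection methods on the same input.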

So far I have tested Grinder. Grinder can simulate PCR amplicon sequences when the user provides the amplicon sequences in FASTA format, but I do not know where to obtain amplicon sequences.

I am also running GemSIM, which seems extremely time-consuming when generating a large data set (160,000 paired-end reads).

Does anyone here have experience with a similar project? What software did you use? It is really hard to find similar discussions online, so I thought it was time to start my own.

Thank you in advance for any input.

 

Josh Herr (University of Nebraska) wrote, 4.1 years ago:

I'll comment as no one has responded yet. I've simulated reads to test a project and agree that there are not a lot of resources on best practices out there. There are a fair number of tools (you mentioned GemSIM, Grinder, and MetaSim, but there are even more out there...). Some take longer than others to run.

When I was running amplicon benchmarks I ended up using wgsim; it just worked for what I wanted to do. As with any tool, you'll have to read the fine print to see what it can and can't do, and then choose the best tool for what you want.
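As a rough idea of what a wgsim run looks like, the sketch below assembles a command line from Python. The helper name, file names, and all numeric values are placeholders rather than recommended settings; the flags used (`-N` read pairs, `-1`/`-2` read lengths, `-e` base error rate, `-r` mutation rate, `-S` seed) are from wgsim's usage message, so check them against the version you have installed.

```python
import shlex


def wgsim_command(ref_fasta, out_r1, out_r2, n_pairs, read_len,
                  error_rate, seed=None):
    """Build (but do not run) a wgsim command line as a list of arguments."""
    cmd = [
        "wgsim",
        "-N", str(n_pairs),     # number of read pairs
        "-1", str(read_len),    # length of the first read
        "-2", str(read_len),    # length of the second read
        "-e", str(error_rate),  # per-base sequencing error rate
        "-r", "0",              # mutation rate 0: only sequencing errors
    ]
    if seed is not None:
        cmd += ["-S", str(seed)]  # fixed seed for reproducibility
    cmd += [ref_fasta, out_r1, out_r2]
    return cmd


# Placeholder file names and values, for illustration only.
cmd = wgsim_command("amplicons.fasta", "sim_R1.fastq", "sim_R2.fastq",
                    n_pairs=80000, read_len=250, error_rate=0.01, seed=42)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once wgsim is on the PATH.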

Lastly, every one of these tools needs a reference data set or set of sequences as input. You mention in your question that you have reference 16S ITS rRNA sequence data as well as clinical samples, but that you do not know where to obtain amplicon sequences. You already have sequence data, so you can use that. There are also many, many 16S data sets available as public data; just search SRA or MG-RAST, to name two examples.
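One way to turn reference sequences into amplicon sequences is to cut out the region between the primer sites, i.e. a simple in silico PCR. The sketch below assumes exact primer matches on the forward strand and uses made-up toy sequences; the function names are hypothetical. Real 16S primers often contain degenerate IUPAC bases, which exact string matching does not handle.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[b] for b in reversed(seq.upper()))


def extract_amplicon(reference, fwd_primer, rev_primer):
    """Return the region from the forward primer through the reverse primer.

    The reverse primer is given 5'->3' as ordered, so its reverse complement
    is what appears on the forward strand of the reference.
    Returns None if either primer site is not found.
    """
    reference = reference.upper()
    fwd = fwd_primer.upper()
    rev_rc = reverse_complement(rev_primer)

    start = reference.find(fwd)
    if start == -1:
        return None
    end = reference.find(rev_rc, start + len(fwd))
    if end == -1:
        return None
    return reference[start:end + len(rev_rc)]


# Toy reference and primers, for illustration only.
ref = "TTTT" + "ACGTAC" + "GGGGCCCCAAAA" + "TTGCAT" + "CCCC"
print(extract_amplicon(ref, fwd_primer="ACGTAC", rev_primer="ATGCAA"))
```

Writing the extracted regions to a FASTA file gives one amplicon per reference sequence, which can then serve as simulator input.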

 
