How to pick simulation parameters for the RNASeqReadSimulator
6.7 years ago
iraun ★ 3.8k

I am using RNASeqReadSimulator (http://alumni.cs.ucr.edu/~liw/rnaseqreadsimulator.html) to simulate RNA-seq reads. The first of its three scripts lets you assign expression values to the genes:

python genexplvprofile.py -h
-e/--lognormal  mu,sigma        Specify the mean and variance of the lognormal distribution used to assign expression levels.  Default -4,4

I'm not very good at statistics, and I would like to know which values of mu and sigma would be appropriate if I want all the genes to be expressed in my simulated data.

When in doubt, the defaults are a good start.

Yes, sure. But I was thinking that if I set a HIGH mean and a LOW variance, maybe I could simulate data in which all the genes are expressed? I don't know which numbers count as "high" or "low", though, and maybe it depends on the number of genes...

6.7 years ago

Since the help text says that gene expression levels are drawn from a lognormal distribution, the best way to evaluate these parameters is to look at the shape of the curve for various parameter values:

http://en.wikipedia.org/wiki/Log-normal_distribution

In those plots, imagine that the horizontal axis corresponds to the gene expression level and the vertical axis to the fraction of genes expressed at that level. You can use an online calculator to display what the distributions would look like:

http://distributome.org/V3/calc/LogNormalCalculator.html
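
If you would rather check the numbers locally than with an online calculator, here is a minimal sketch using NumPy. It is not part of RNASeqReadSimulator, and it rests on assumptions: that the second value passed to -e acts as the standard deviation of the log-values (the help text calls it "variance", so check the script), and that "expressed" means "above some cutoff". The cutoff of 1e-3, the gene count of 20000, and the function names are all made up for illustration.

```python
import math
import numpy as np

def fraction_above(mu, sigma, threshold):
    """Theoretical fraction of a lognormal(mu, sigma) lying above `threshold`.

    Assumes sigma is the standard deviation of the log-values.
    """
    z = (math.log(threshold) - mu) / sigma
    # 1 - Phi(z), written with the complementary error function
    return 0.5 * math.erfc(z / math.sqrt(2))

def summarize(mu, sigma, n_genes=20000, threshold=1e-3, seed=0):
    """Draw hypothetical expression levels and summarize their spread."""
    rng = np.random.default_rng(seed)
    levels = rng.lognormal(mean=mu, sigma=sigma, size=n_genes)
    return {
        "median": float(np.median(levels)),
        "p01": float(np.percentile(levels, 1)),
        "p99": float(np.percentile(levels, 99)),
        "sampled_fraction_above": float(np.mean(levels > threshold)),
        "expected_fraction_above": fraction_above(mu, sigma, threshold),
    }

# Compare the tool's default (-4, 4) with a narrower and a shifted setting.
for mu, sigma in [(-4, 4), (-4, 1), (2, 0.5)]:
    print(mu, sigma, summarize(mu, sigma))
```

A lognormal draw is always strictly positive, so every gene gets a nonzero level whatever mu and sigma are. What changes is the spread: a large sigma stretches the values over many orders of magnitude, so the lowest genes may end up with very few or no reads at a given depth, whereas a small sigma keeps all genes within a narrow range of each other. How the simulator turns these levels into read counts is something to confirm in its own scripts.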