Question

rsem-simulate-reads: simulating different data

0

Entering edit mode

3.7 years ago

reganhayward • 0

I'm using RSEM to simulate reads from previous experiments (rsem-simulate-reads) - I have no problems getting the basics working, but having difficulties adjusting simulation based params.

Here's an example:

Use RSEM to prepare reference (bowtie2 or STAR) rsem-prepare-reference
Calculate expression rsem-calculate-expression
Then simulate reads: rsem-simulate-reads with something similar to this:

rsem-simulate-reads Human_ref/ref expression/results.stat/results.model \

expression/results.isoforms.results 0.1 500000 simulation/sim_data

The original data is PE, 150bp, non-stranded, bulk RNA-seq

From what I can gather, it is possible to adjust the model params (https://github.com/deweylab/RSEM/blob/master/model_file_description.txt) to simulate new reads with different library options. For example, say I wanted 50bp PE reads, or 50bp SE stranded reads.

Is this possible using RSEM? I know this is possible using Polyester (https://bioconductor.org/packages/release/bioc/vignettes/polyester/inst/doc/polyester.html), which I'm using, but was hoping to use RSEM as well.

RNA-Seq simulate RSEM • 1.6k views

ADD COMMENT • link updated 3.6 years ago by jiangminghao1001 • 0 • written 3.7 years ago by reganhayward • 0

0

Entering edit mode

hello, I’d like to simulate reads from the model file generated by rsem-calculate-expression function from a pair-end data. However, the rsem-simulate-reads always output one .fa file.

I replace the model file and the isoform_results file with other random files, but the simulate function still worked and one .fa file was generated.

Here is what I did:

rsem-calculate-expression --paired-end --alignments --no-bam-output -p 50 /path/to/mybam /path/to/reference /output/prefix

rsem-simulate-reads /path/to/reference /output/prefix/prefix.stat/prefix.model isoforms.results 0.1 50000000 simulation_out_put

Did I utilise the function wrongly? I’ll appreciate if you could give me some instructions. Thank you very much!

ADD REPLY • link 3.6 years ago by jiangminghao1001 • 0

score 2 · Accepted Answer · 2020-08-25

2

Entering edit mode

3.7 years ago

Rob 6.5k

The parameters in the RSEM model file are learned by the quantifier on an experimental data set. Since parameters will differ (possibly drastically) between a 150bp PE dataset and a 50bp SE dataset, its not possible to directly use one model's parameters to simulate a dataset with different characteristics. You could always spoof the parameters, or try to set them according to a parametric model, the resulting data may then not be a good match for a specific experimental dataset.