1
0
Entering edit mode
13 months ago

I'm using RSEM to simulate reads from previous experiments (rsem-simulate-reads) - I have no problems getting the basics working, but having difficulties adjusting simulation based params.

Here's an example:

• Use RSEM to prepare reference (bowtie2 or STAR) rsem-prepare-reference
• Calculate expression rsem-calculate-expression
• Then simulate reads: rsem-simulate-reads with something similar to this:

rsem-simulate-reads Human_ref/ref expression/results.stat/results.model \

expression/results.isoforms.results 0.1 500000 simulation/sim_data

The original data is PE, 150bp, non-stranded, bulk RNA-seq

From what I can gather, it is possible to adjust the model params (https://github.com/deweylab/RSEM/blob/master/model_file_description.txt) to simulate new reads with different library options. For example, say I wanted 50bp PE reads, or 50bp SE stranded reads.

Is this possible using RSEM? I know this is possible using Polyester (https://bioconductor.org/packages/release/bioc/vignettes/polyester/inst/doc/polyester.html), which I'm using, but was hoping to use RSEM as well.

RNA-Seq simulate RSEM • 574 views
0
Entering edit mode

hello, I’d like to simulate reads from the model file generated by rsem-calculate-expression function from a pair-end data. However, the rsem-simulate-reads always output one .fa file.

I replace the model file and the isoform_results file with other random files, but the simulate function still worked and one .fa file was generated.

Here is what I did:

rsem-calculate-expression --paired-end --alignments --no-bam-output -p 50 /path/to/mybam /path/to/reference /output/prefix

rsem-simulate-reads /path/to/reference /output/prefix/prefix.stat/prefix.model isoforms.results 0.1 50000000 simulation_out_put


Did I utilise the function wrongly? I’ll appreciate if you could give me some instructions. Thank you very much!

2
Entering edit mode
13 months ago
Rob 4.9k

The parameters in the RSEM model file are learned by the quantifier on an experimental data set. Since parameters will differ (possibly drastically) between a 150bp PE dataset and a 50bp SE dataset, its not possible to directly use one model's parameters to simulate a dataset with different characteristics. You could always spoof the parameters, or try to set them according to a parametric model, the resulting data may then not be a good match for a specific experimental dataset.

0
Entering edit mode

Cool - that's what I had kinda thought, thanks for replying Rob