rsem-simulate-reads: simulating different data
1
0
Entering edit mode
3.7 years ago

I'm using RSEM to simulate reads from previous experiments (rsem-simulate-reads) - I have no problems getting the basics working, but having difficulties adjusting simulation based params.

Here's an example:

  • Use RSEM to prepare reference (bowtie2 or STAR) rsem-prepare-reference
  • Calculate expression rsem-calculate-expression
  • Then simulate reads: rsem-simulate-reads with something similar to this:

rsem-simulate-reads Human_ref/ref expression/results.stat/results.model \

expression/results.isoforms.results 0.1 500000 simulation/sim_data

The original data is PE, 150bp, non-stranded, bulk RNA-seq

From what I can gather, it is possible to adjust the model params (https://github.com/deweylab/RSEM/blob/master/model_file_description.txt) to simulate new reads with different library options. For example, say I wanted 50bp PE reads, or 50bp SE stranded reads.

Is this possible using RSEM? I know this is possible using Polyester (https://bioconductor.org/packages/release/bioc/vignettes/polyester/inst/doc/polyester.html), which I'm using, but was hoping to use RSEM as well.

RNA-Seq simulate RSEM • 1.6k views
ADD COMMENT
0
Entering edit mode

hello, I’d like to simulate reads from the model file generated by rsem-calculate-expression function from a pair-end data. However, the rsem-simulate-reads always output one .fa file.

I replace the model file and the isoform_results file with other random files, but the simulate function still worked and one .fa file was generated.

Here is what I did:

rsem-calculate-expression --paired-end --alignments --no-bam-output -p 50 /path/to/mybam /path/to/reference /output/prefix

rsem-simulate-reads /path/to/reference /output/prefix/prefix.stat/prefix.model isoforms.results 0.1 50000000 simulation_out_put

Did I utilise the function wrongly? I’ll appreciate if you could give me some instructions. Thank you very much!

ADD REPLY
2
Entering edit mode
3.7 years ago
Rob 6.5k

The parameters in the RSEM model file are learned by the quantifier on an experimental data set. Since parameters will differ (possibly drastically) between a 150bp PE dataset and a 50bp SE dataset, its not possible to directly use one model's parameters to simulate a dataset with different characteristics. You could always spoof the parameters, or try to set them according to a parametric model, the resulting data may then not be a good match for a specific experimental dataset.

ADD COMMENT
0
Entering edit mode

Cool - that's what I had kinda thought, thanks for replying Rob

ADD REPLY

Login before adding your answer.

Traffic: 2539 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6