Question

Rnaseq Simulation: What To Set Number Of Fragments To When Using `Rlsim`

11.2 years ago

johnstantongeddes ▴ 410

I'm using rlsim followed by simNGS, as suggested in the rlsim examples, to generate simulated paired-end RNA-seq data for evaluating de novo transcriptome assembly. However, I'm confused about the number of requested fragments parameter (-n). Can someone explain it in more detail, or recommend what it should be set to? [I've skimmed the manual, but honestly it goes into more detail about fragmentation methods than I need.]

My naive thought is that this is simply based on sequence length. For example, if my test input sequences average 800 bp, I'd set the number of fragments to 4 * the number of sequences. Thoughts?

transcriptome • 2.3k views

updated 11.2 years ago by Botond Sipos ★ 1.7k • written 11.2 years ago by johnstantongeddes ▴ 410

I don't know anything about RNA simulation, but I think that you would want to find the total number of RNA fragments that are present in a normal cell. There might be a few sets of RNA-seq in the SRA database that you can base this estimate on.

11.2 years ago by Lee Katz ★ 3.2k

That might help, but I'm not sure knowing how many fragments there are in a cell gets at what I want. I'm only simulating RNAseq data for 10-100 transcripts, not all the transcripts in a cell.

11.2 years ago by johnstantongeddes ▴ 410

score 3 · Answer 1 · 2013-08-13

11.2 years ago

Botond Sipos ★ 1.7k

The number of fragments is essentially the number of read pairs simulated. So you should set this to match the number of read pairs in real runs/datasets.

The number of fragmented transcript molecules (the "expression levels") is specified in the input FASTA file. You might have to increase these values (or use the "expression level multiplier" flag, -m) in order to avoid missing fragments.
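
If you have no real dataset to copy the read-pair count from, one rough way to pick a value for -n is to work backwards from a target mean coverage. The sketch below is only an illustration of that arithmetic: sizing by coverage is my assumption rather than anything rlsim prescribes, and the function name, the 50x coverage, and the 100 bp read length are all hypothetical choices.

```python
import math


def read_pairs_for_coverage(transcript_lengths, coverage, read_length):
    """Estimate the read pairs needed for a target mean per-base coverage.

    Assumption (not from rlsim): every requested fragment yields one read
    pair, and each pair contributes 2 * read_length sequenced bases.
    """
    total_bases = sum(transcript_lengths)
    return math.ceil(coverage * total_bases / (2 * read_length))


# Hypothetical test set: 100 transcripts averaging 800 bp each.
lengths = [800] * 100
n = read_pairs_for_coverage(lengths, coverage=50, read_length=100)
print(n)  # 20000
```

The result would be the value passed to -n. Coverage will not be uniform across transcripts, since it depends on the expression levels in the input file, so treat this as a starting point and not as a guarantee for any single transcript.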