RNA-seq Simulation: What to Set the Number of Fragments to When Using `rlsim`
11.2 years ago

I'm using rlsim followed by simNGS, as suggested in the rlsim examples, to generate simulated paired-end RNA-seq data for evaluating de novo transcriptome assembly. However, I'm confused about the "number of requested fragments" parameter (-n). Can someone explain it in more detail, or give recommendations on what it should be set to? [I've skimmed the manual, but honestly it goes into more detail about fragmentation methods than I need.]

My naive thought is that this is simply based on sequence length. For example, if my test input sequences average 800 bp, I'd want the number of fragments to equal 4 times the number of sequences. Thoughts?


I don't know anything about RNA simulation, but I think you would want to find the total number of RNA fragments present in a normal cell. There might be a few RNA-seq datasets in the SRA that you could base this estimate on.


That might help, but I'm not sure that knowing how many fragments there are in a cell gets at what I want. I'm only simulating RNA-seq data for 10-100 transcripts, not all the transcripts in a cell.

11.2 years ago
Botond Sipos ★ 1.7k

The number of fragments is essentially the number of read pairs simulated. So you should set this to match the number of read pairs in real runs/datasets.
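Since each fragment corresponds to one read pair, one rough way to choose -n for a small set of transcripts is to work backwards from the mean per-base coverage you want. Below is a minimal sketch in Python, assuming uniform coverage across transcripts and non-overlapping mates (it ignores expression levels and the fragment-size distribution). `expected_mean_coverage` and `fragments_for_coverage` are hypothetical helpers, not part of rlsim or simNGS:

```python
import math


def expected_mean_coverage(n_fragments, read_length, transcript_lengths):
    """Approximate mean per-base coverage over all simulated transcripts.

    Assumption: each fragment yields one read pair, i.e. 2 * read_length
    sequenced bases, spread uniformly over the transcripts.
    """
    total_bases = 2 * read_length * n_fragments
    return total_bases / sum(transcript_lengths)


def fragments_for_coverage(target_coverage, read_length, transcript_lengths):
    """Number of read pairs (a candidate value for -n) needed to reach
    target_coverage under the same uniform-coverage assumption, rounded up."""
    total_length = sum(transcript_lengths)
    return math.ceil(target_coverage * total_length / (2 * read_length))


if __name__ == "__main__":
    lengths = [800] * 50  # hypothetical: 50 transcripts of 800 bp each
    print(fragments_for_coverage(50, 100, lengths))
    print(expected_mean_coverage(10_000, 100, lengths))
```

For example, 50 transcripts of 800 bp (40 kb in total) with 100 bp reads need about 10,000 read pairs for roughly 50x mean coverage. Real coverage will be uneven, because it also depends on the expression levels in the input file.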

The number of fragmented transcript molecules (the "expression levels") is specified in the input FASTA file. You might have to increase these values (or use the "expression level multiplier" -m flag) to avoid missing fragments.
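A crude back-of-the-envelope check for how much to scale the expression levels is sketched below. It rests on a loudly hypothetical assumption: that each molecule yields, on average, a fixed number of usable fragments after fragmentation and size selection (`usable_fragments_per_molecule`, which you would have to estimate yourself; rlsim does not expose a parameter by that name). `min_multiplier` is a hypothetical helper that does not reproduce rlsim's internal sampling, and rounding up to a whole number is a simplification:

```python
import math


def min_multiplier(expression_levels, n_fragments, usable_fragments_per_molecule):
    """Smallest whole-number multiplier m such that the hypothetical pool
    sum(levels) * m * usable_fragments_per_molecule is at least n_fragments.

    expression_levels: dict mapping transcript name -> number of molecules.
    """
    molecules = sum(expression_levels.values())
    if molecules <= 0 or usable_fragments_per_molecule <= 0:
        raise ValueError("need a positive number of molecules and fragments per molecule")
    needed = n_fragments / (molecules * usable_fragments_per_molecule)
    # Never suggest shrinking the pool: a multiplier below 1 is reported as 1.
    return max(1, math.ceil(needed))


if __name__ == "__main__":
    levels = {"tx1": 100, "tx2": 400}  # hypothetical expression levels
    print(min_multiplier(levels, 10_000, 2))
```

With 500 molecules, an assumed 2 usable fragments per molecule, and 10,000 requested fragments, the pool only reaches the request at a multiplier of 10. Treat the result as a starting point for -m and check the rlsim output, not as a guarantee.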


Thanks - I was starting to suspect so but now it's clear!
