I'm using rlsim followed by simNGS, as suggested in the rlsim examples, to generate simulated paired-end RNAseq data to evaluate de novo transcriptome assembly. However, I'm confused about the number of requested fragments parameter (-n). Can someone explain this in more detail, or give recommendations for what it should be set to? [I've skimmed the manual, but honestly it goes into more detail about fragmentation methods than I need.]
My naive thought is that this should simply be based on sequence length. For example, if my test input sequences average 800 bp, I'd want the number of fragments to be 4 * the number of sequences (about one fragment per 200 bp of transcript). Thoughts?
I don't know anything about RNA simulation, but I think you would want to find the total number of RNA fragments present in a normal cell. There might be a few RNA-seq datasets in the SRA database that you can base this estimate on.
That might help, but I'm not sure knowing how many fragments there are in a cell gets at what I want. I'm only simulating RNAseq data for 10-100 transcripts, not all the transcripts in a cell.
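For a small transcript set like this, one option is to work backwards from a target sequencing depth instead of from cell-level counts. The sketch below is only a rough illustration in Python: it compares the length-based estimate from the question with the standard coverage relation (coverage = sequenced bases / total transcript length). The function names, the 50-transcript input, the 100 bp read length and the 30x target are all hypothetical, and it assumes every transcript is equally expressed. How rlsim's -n relates to the number of fragments that survive to sequencing is not established in this thread, so treat the result as a starting point to adjust, not a recommended setting.

```python
import math


def fragments_for_tiling(transcript_lengths, fragment_length):
    """Naive estimate: tile each transcript once with non-overlapping fragments."""
    return sum(math.ceil(length / fragment_length) for length in transcript_lengths)


def fragments_for_coverage(transcript_lengths, target_coverage, read_length, paired=True):
    """Fragments needed so that sequenced bases ~= target_coverage * total length.

    Assumes each fragment yields two reads when paired, one otherwise.
    """
    total_bases = sum(transcript_lengths)
    bases_per_fragment = read_length * (2 if paired else 1)
    return math.ceil(target_coverage * total_bases / bases_per_fragment)


# Hypothetical input: 50 transcripts of 800 bp each.
lengths = [800] * 50

# 4 fragments per 800 bp transcript, as in the question.
print(fragments_for_tiling(lengths, fragment_length=200))

# Fragments for roughly 30x coverage with 2 x 100 bp reads.
print(fragments_for_coverage(lengths, target_coverage=30, read_length=100))
```

With these made-up numbers the tiling estimate gives 200 fragments (about 1x coverage), while a 30x target needs 6000, which suggests that 4 * the number of sequences is likely too low for assembly evaluation.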