I'm attempting to simulate reads from the human transcriptome to test an RNAseq pipeline using Heng Li's wgsim (https://github.com/lh3/wgsim). This seems to be the most widely used read simulator and is pretty easy to use, however I've run into a problem which may be related to some of the reference sequences (human transcripts in this case) being quite short.
The problem is that wgsim does not produce the number of reads specified by the -N flag. I'm using the following command line to produce single-end error and mutation free reads
wgsim -e0 -N 40000 -1 250 -d1 -s1 -r 0 -R 0 -X 0 in.fa out.fq /dev/null
However in wgsim does not produce the specified 40,000 reads. Sometimes it outputs a number close to the specified total, while others it outputs much fewer.
To work around this I've been overproducing reads then subsampling using seqtk, but I was wondering if anyone had seen this behavior before and had any kind of solution?