I am trying to create a dataset of simulated exome reads, including simulated base qualities. I am currently using simNGS from EBI to build a fragment library and generate reads from it. The main issue is that with a workflow like:
fasta genome reference -> fragment library -> simulated reads.
we get simulated genomic reads instead of exomic reads. A workflow such as:
fasta exome reference -> fragment library -> simulated reads
doesn't sound good either, because it leaves out many exome-specific behaviours (e.g., off-target reads).
Is there another approach I could follow to obtain a reliable simulated whole-exome read dataset?
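
To make the off-target concern concrete, here is a rough sketch of the kind of fragment sampling I have in mind: most fragments are drawn so that they overlap a capture target, and the rest are drawn uniformly from the whole chromosome. Everything in it is an assumption for illustration only: the function name, the single-chromosome coordinates, the fixed fragment length, and the 0.8 on-target fraction are placeholders, not values from simNGS or from any real capture kit.

```python
import random

def sample_fragments(chrom_len, targets, n_fragments, frag_len=200,
                     on_target_frac=0.8, seed=0):
    """Return (start, end, source) tuples on one chromosome.

    targets: list of half-open (start, end) intervals, e.g. parsed from a BED file.
    source:  'target' if the fragment was drawn to overlap a target,
             'background' if it was drawn uniformly from the chromosome.
    """
    rng = random.Random(seed)
    # Longer targets capture proportionally more fragments.
    weights = [end - start for start, end in targets]
    fragments = []
    for _ in range(n_fragments):
        if rng.random() < on_target_frac:
            t_start, t_end = rng.choices(targets, weights=weights, k=1)[0]
            # Any start for which the fragment overlaps the target by >= 1 bp,
            # clipped so the fragment stays inside the chromosome.
            lo = max(0, t_start - frag_len + 1)
            hi = min(chrom_len - frag_len, t_end - 1)
            start = rng.randint(lo, hi)
            source = "target"
        else:
            # Off-target fragment: uniform over the whole chromosome.
            start = rng.randint(0, chrom_len - frag_len)
            source = "background"
        fragments.append((start, start + frag_len, source))
    return fragments

# Hypothetical toy input: a 10 kb chromosome with two exon targets.
toy_targets = [(1000, 1200), (5000, 5300)]
toy_fragments = sample_fragments(10_000, toy_targets, n_fragments=1000)
```

The resulting coordinates would then be used to cut fragment sequences out of the genome FASTA before handing them to the read simulator. This ignores things like GC bias and uneven bait efficiency, which is why I am asking whether something more principled exists.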