Hi,
I don't have much experience with simulating sequencing data, but I'm trying to benchmark the performance of another tool on the following organisms:
- Eukaryota
  - _Saccharomyces cerevisiae_
- Bacteria
  - _Escherichia coli_
- Archaea
  - _Sulfolobus solfataricus_
To assess the performance of this tool, I need Nanopore R10.4.1 and PacBio HiFi reads. Ideally, I am looking for genome sequencing studies that used both Oxford Nanopore and PacBio HiFi sequencing on the same DNA sample or, at the very least, on different DNA samples derived from the same strain. This is challenging because most studies use a single sequencing method rather than both. Some benchmarking studies do exist, but the ones I have come across are a few years old and rely on outdated Nanopore chemistry and pre-HiFi PacBio reads. Comparing Nanopore R10.4.1 and PacBio HiFi reads taken from different studies is also not feasible, as differences between strains are likely to skew the results significantly.
So I'm wondering whether I could simulate the read data that I need instead. However, I don't know enough about the implementations and limitations of the current long-read simulators, so I have a couple of questions. Generally, how do these read simulators work? I'm sure it isn't too hard to simulate _E. coli_ or _S. cerevisiae_ reads because these are incredibly well-studied organisms, but how well does simulation work for _Sulfolobus solfataricus_? Is it possible to give a simulator an organism's genome file and have it generate reads for that organism? Alternatively, is it possible to train a read simulator on a particular organism and then have the retrained program output the simulated reads for me?
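To make the question more concrete, here is a naive sketch of what I imagine a simulator does at its simplest: pick random start positions and read lengths on a reference sequence, then introduce random errors. This is purely my own toy illustration and not how any real simulator is implemented. The function name `simulate_reads`, its parameters, the exponential length distribution, and the substitution-only error model are all assumptions I made up for the example.

```python
import random


def simulate_reads(genome, n_reads, mean_len=50, error_rate=0.01, seed=0):
    """Toy read simulator (illustrative assumption, not a real tool's model).

    Draws read lengths from an exponential distribution, picks uniform
    start positions on the forward strand, and applies substitution
    errors only (no insertions, deletions, or quality scores).
    """
    rng = random.Random(seed)
    reads = []
    for _ in range(n_reads):
        # Clamp the sampled length to the range [1, len(genome)].
        length = int(rng.expovariate(1.0 / mean_len))
        length = max(1, min(len(genome), length))
        start = rng.randint(0, len(genome) - length)
        fragment = genome[start:start + length]

        bases = []
        for base in fragment:
            # With probability error_rate, swap in a different base.
            if rng.random() < error_rate:
                base = rng.choice([b for b in "ACGT" if b != base])
            bases.append(base)
        reads.append("".join(bases))
    return reads


if __name__ == "__main__":
    reference = "ACGTTGCAAGCT" * 50  # stand-in for a real genome sequence
    for read in simulate_reads(reference, n_reads=3):
        print(len(read), read[:30])
```

My understanding is that real simulators go well beyond this, for example by modelling platform-specific error profiles, which is the part I'm unsure how to handle for a less-studied organism.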
Any advice on how to go about this problem would be appreciated.