I want to generate NGS data to run some tests and benchmarks for both germline and somatic variant calling. I've read a lot of papers about the different tools and their benchmarks, but I would like your feedback. After reading the papers, I have chosen two tools:
- BAMSurgeon uses pre-existing BAM files and adds new variants to them. It has been widely used in the DREAM challenge for testing variant calling algorithms, so I assume it works well. The advantage of starting from pre-existing BAM files is that you can use real data and then introduce new variants for the benchmarking.
- On the other hand, VarSim generates read files, taking a reference genome and a set of variants as input. All the data here is purely simulated (although the variants can be random or previously described ones), and the advantage is that you can control different types of error (such as sequencing errors). Also, having fastq files makes it possible to test a full Alignment + Variant calling workflow.
In the end, what I would like to have is a set of tumor/normal paired fastq files with a true.vcf dataset, and then be able to adjust different parameters such as _clonality, heterogeneity, contamination, sequencing error..._
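To make these parameters concrete, here is a minimal Python sketch of how they would be expected to shape the allele fraction of a spiked-in somatic SNV. It is only an illustration under stated assumptions: the variant is heterozygous in a diploid, copy-neutral region, sequencing errors are uniform across the three other bases, and the function names `expected_vaf` and `simulate_alt_reads` are hypothetical. It does not describe how BAMSurgeon or VarSim work internally.

```python
import random


def expected_vaf(purity, ccf, error_rate=0.0):
    """Expected alt-allele fraction of a heterozygous somatic SNV.

    purity     : fraction of tumor cells in the sample (1 - normal contamination)
    ccf        : fraction of tumor cells carrying the variant (clonality)
    error_rate : per-base sequencing error rate (assumed uniform over 3 bases)
    """
    for name, value in (("purity", purity), ("ccf", ccf), ("error_rate", error_rate)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    # One of two alleles carries the variant in the affected tumor cells.
    true_vaf = purity * ccf / 2.0
    # Alt bases survive with probability (1 - e); ref bases are misread
    # as this particular alt base with probability e / 3.
    return true_vaf * (1.0 - error_rate) + (1.0 - true_vaf) * error_rate / 3.0


def simulate_alt_reads(depth, vaf, rng):
    """Draw the number of alt-supporting reads at a site (binomial sampling)."""
    return sum(1 for _ in range(depth) if rng.random() < vaf)


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the simulation is reproducible
    vaf = expected_vaf(purity=0.6, ccf=0.5, error_rate=0.001)
    alt = simulate_alt_reads(depth=200, vaf=vaf, rng=rng)
    print(f"expected VAF = {vaf:.4f}, simulated alt reads = {alt}/200")
```

A calculation like this could be used to pick the allele fraction requested for each spiked-in variant, or to check whether the allele fractions observed in the simulated tumor match the intended purity and clonality.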
Sorry if the question is too open or broad. I'd like to receive suggestions and personal experiences about the best way to generate this kind of data. If it is specific to exome/targeted sequencing, even better.
Thank you in advance,