We have a few in-house genomic samples in which we have validated a couple of hundred variants. To test the new tools that are constantly being published, we could really use a control data set with a large number (thousands?) of validated variants. Ideally, the variants would be validated in a publicly available cell line, so that we could run the sample on our own machines and use the control data to calibrate any new tools we put into the pipeline.
Is anyone aware of a good sample with validated variants that we could use to calibrate machines and bioinformatics tools?
I realize we could use simulated data for this, but right now I'm only interested in real data that we could generate with our sequencers.