My question is: given several exome sequencing batches, with different samples in each batch, what parameters or figures would you use to present data quality at the different steps?
No single tool or metric will catch every problem, but a good high-level metric is the percentage of targeted bases with at least 30x coverage (treat 30x as a minimum). This number gives you a rough sense of how effective your variant calling will be. You can get it from Picard CalculateHsMetrics (renamed CollectHsMetrics in newer Picard releases), which reports it as PCT_TARGET_BASES_30X.
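As a rough illustration of pulling that number out programmatically, here is a minimal sketch that reads the text of a Picard metrics file and returns the PCT_TARGET_BASES_30X value. It assumes the usual Picard layout (a `## METRICS CLASS` line, then a tab-separated header row and value row). The function names and the example file contents are made up for illustration and are not part of Picard.

```python
import csv
import io


def parse_picard_metrics(text):
    """Return the first metrics row of a Picard metrics file as a dict.

    Assumes the usual layout: a '## METRICS CLASS' line, followed by a
    tab-separated header row and one or more value rows, ended by a blank
    line or another '#' line.
    """
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("## METRICS CLASS"):
            start = i + 1
            break
    else:
        raise ValueError("no '## METRICS CLASS' section found")

    # Collect the header and value rows of the metrics table.
    table = []
    for line in lines[start:]:
        if not line.strip() or line.startswith("#"):
            break
        table.append(line)

    rows = list(csv.DictReader(io.StringIO("\n".join(table)), delimiter="\t"))
    if not rows:
        raise ValueError("metrics table is empty")
    return rows[0]


def pct_target_bases_30x(text):
    """Fraction (0 to 1) of target bases covered at 30x or more."""
    return float(parse_picard_metrics(text)["PCT_TARGET_BASES_30X"])


# Made-up, heavily trimmed example; a real file has many more columns.
EXAMPLE = (
    "## htsjdk.samtools.metrics.StringHeader\n"
    "# CollectHsMetrics (arguments omitted in this example)\n"
    "\n"
    "## METRICS CLASS\tpicard.analysis.directed.HsMetrics\n"
    "BAIT_SET\tMEAN_TARGET_COVERAGE\tPCT_TARGET_BASES_30X\n"
    "exome_v1\t85.2\t0.91\n"
    "\n"
    "## HISTOGRAM\tjava.lang.Integer\n"
)

if __name__ == "__main__":
    print(pct_target_bases_30x(EXAMPLE))
```

For a real run you would pass in the contents of the metrics file Picard wrote for each sample, for example `pct_target_bases_30x(open(path).read())`.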
Other metrics are less important for gauging success on a per-sample basis, but can be tracked over time to detect systematic sequencing problems. These include:
- Insert sizes
- Error rate
- Duplication rate/effective library size
- % aligned
- Overrepresented sequences
- Adapter contamination
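
To compare batches, one option is to summarize a metric per batch and list the samples that fall below a cutoff. The sketch below does this for the fraction of target bases at 30x, but the same shape works for duplication rate, % aligned, and the other metrics above. The function name, the 0.80 cutoff, and the records are all assumptions chosen for illustration, not recommended values.

```python
from collections import defaultdict
from statistics import median


def summarize_batches(records, min_pct_30x=0.80):
    """Group (batch, sample, value) records by batch.

    Returns, per batch, the sample count, the median value, and the
    sorted names of samples below the (hypothetical) cutoff.
    """
    by_batch = defaultdict(list)
    for batch, sample, pct in records:
        by_batch[batch].append((sample, pct))

    summary = {}
    for batch, items in by_batch.items():
        values = [pct for _, pct in items]
        summary[batch] = {
            "n": len(values),
            "median": median(values),
            "failing": sorted(s for s, pct in items if pct < min_pct_30x),
        }
    return summary


# Made-up records: (batch, sample, fraction of target bases at >= 30x).
RECORDS = [
    ("batch1", "s1", 0.92),
    ("batch1", "s2", 0.88),
    ("batch1", "s3", 0.90),
    ("batch2", "s4", 0.70),
    ("batch2", "s5", 0.95),
    ("batch2", "s6", 0.85),
]

if __name__ == "__main__":
    for batch, stats in sorted(summarize_batches(RECORDS).items()):
        print(batch, stats)
```

A per-batch median that drifts over time points to a systematic problem, while a single failing sample in an otherwise good batch points to a sample-level issue.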