I am compiling a wish list of analyses that should be run on every data set coming out of a sequencing facility - regardless of whether the sequencing is for RNA-Seq, SNP calling, ChIP-Seq, or possibly de novo sequencing. The goal is to scan for red flags that would indicate something has gone awry, either in the lab or downstream. I want a list of "sanity checks" that covers both sequence quality analysis and what can be gleaned from alignments.
Sequence QA - basecalling bias, read quality, yield, throughput, GC bias, 5'/3' motifs?, restriction enzyme bias
Barcode distribution (if barcoded)
- chromosome bias
- annotational biases (whether experimentally induced or not)
- genes, repeats, CpG islands, epigenetic markers, expression
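To make a couple of the items above concrete, here is a minimal sketch of two of the checks: a per-file FASTQ summary (yield, GC fraction, N fraction, mean base quality) and a per-chromosome read-share ratio. The function names (`fastq_summary`, `chromosome_bias`) are hypothetical, the quality encoding is assumed to be Phred+33, and records are assumed to be plain 4-line FASTQ. The "expected" read share is taken to be proportional to chromosome length, which only makes sense for roughly uniform coverage (e.g. whole-genome resequencing); for RNA-Seq or ChIP-Seq the expectation would have to come from the experiment itself.

```python
from typing import Dict, Iterable, Iterator, Tuple


def parse_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (sequence, quality) pairs, assuming 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # skip blank lines between records
        seq = next(it).strip()
        next(it)  # '+' separator line
        qual = next(it).strip()
        yield seq, qual


def fastq_summary(lines: Iterable[str], phred_offset: int = 33) -> Dict[str, float]:
    """Summarise yield, GC fraction, N fraction and mean base quality."""
    n_reads = n_bases = gc = n_calls = qual_sum = 0
    for seq, qual in parse_fastq(lines):
        s = seq.upper()
        n_reads += 1
        n_bases += len(s)
        gc += s.count("G") + s.count("C")
        n_calls += s.count("N")
        # Assumption: qualities are ASCII-encoded with the given offset.
        qual_sum += sum(ord(c) - phred_offset for c in qual)
    denom = n_bases or 1  # avoid division by zero on empty input
    return {
        "reads": n_reads,
        "bases": n_bases,
        "gc_fraction": gc / denom,
        "n_fraction": n_calls / denom,
        "mean_quality": qual_sum / denom,
    }


def chromosome_bias(read_counts: Dict[str, int],
                    chrom_lengths: Dict[str, int]) -> Dict[str, float]:
    """Observed read share divided by the share expected from length alone.

    Values far from 1.0 flag a chromosome as over- or under-represented,
    under the (strong) assumption of uniform coverage.
    """
    total_reads = sum(read_counts.get(c, 0) for c in chrom_lengths)
    total_len = sum(chrom_lengths.values())
    if total_reads == 0 or total_len == 0:
        return {c: 0.0 for c in chrom_lengths}
    return {
        c: (read_counts.get(c, 0) / total_reads) / (length / total_len)
        for c, length in chrom_lengths.items()
    }
```

In practice the FASTQ lines would come from an open file handle, and the per-chromosome read counts from whatever the aligner or an alignment-statistics tool reports; both are left as plain Python inputs here so that the checks stay independent of any particular pipeline.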
I am sure this has already been implemented at a lot of the bigger sequencing cores - I just need a definitive list. Of course, many of these sanity checks will be triggered by the experiments themselves - the point is to develop a comprehensive checklist of analyses that covers both what we expect to see and what we don't.