I have an experimental setup with known global shifts in the levels of our histone mark of interest, caused by a histone mutation. We include spike-in chromatin in each sample, but we know that the spike-in levels are not truly identical across samples, given the technical difficulties of quantitation and spike-in addition. However, we have an input for each sample, and because of the point at which the spike-in chromatin is added, we can assume that each input has the same ratio of spike-in chromatin as its matched ChIP sample had before immunoprecipitation. This means we can calculate the percentage of spike-in reads for each sample, as spikein_input_read% and spikein_chip_read%, and derive a ratio between the two.
This ratio does not inherently account for the inevitable signal-to-noise differences between samples with and without the mutation.
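
To make the calculation concrete, here is a minimal sketch in Python of the ratio described above. The class and function names (`SampleCounts`, `chip_to_input_ratio`, `relative_scale_factors`) and all read counts are hypothetical and only for illustration. The final step, which turns the ratios into per-sample scale factors relative to a reference sample, is an assumption about how the ratio might be used and not an established method; it does nothing about the signal-to-noise concern.

```python
from dataclasses import dataclass


@dataclass
class SampleCounts:
    """Read counts for one sample (all values here are hypothetical)."""
    name: str
    input_spikein: int   # spike-in reads in the input library
    input_total: int     # total mapped reads in the input library
    chip_spikein: int    # spike-in reads in the ChIP library
    chip_total: int      # total mapped reads in the ChIP library


def spikein_fraction(spikein_reads: int, total_reads: int) -> float:
    """Fraction of reads mapping to the spike-in genome."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return spikein_reads / total_reads


def chip_to_input_ratio(sample: SampleCounts) -> float:
    """spikein_chip_read% divided by spikein_input_read% for one sample."""
    input_frac = spikein_fraction(sample.input_spikein, sample.input_total)
    chip_frac = spikein_fraction(sample.chip_spikein, sample.chip_total)
    if input_frac == 0:
        raise ValueError(f"{sample.name}: no spike-in reads in the input")
    return chip_frac / input_frac


def relative_scale_factors(samples, reference: str) -> dict:
    """Assumed convention: scale each sample by reference_ratio / sample_ratio.

    A sample whose ChIP is relatively richer in spike-in reads (suggesting
    less target signal) receives a factor below 1.
    """
    ratios = {s.name: chip_to_input_ratio(s) for s in samples}
    ref_ratio = ratios[reference]
    return {name: ref_ratio / ratio for name, ratio in ratios.items()}


# Hypothetical counts for a wild-type and a mutant sample.
samples = [
    SampleCounts("WT", 20_000, 1_000_000, 40_000, 1_000_000),
    SampleCounts("mutant", 30_000, 1_000_000, 120_000, 1_000_000),
]

for s in samples:
    print(s.name, round(chip_to_input_ratio(s), 3))

print(relative_scale_factors(samples, reference="WT"))
```

Dividing by the input fraction is what corrects for unequal spike-in addition: a sample that received more spike-in chromatin has a higher spike-in fraction in both its input and its ChIP, so the excess cancels in the ratio.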
So my question is: given sample-wise values of spikein_input_read% and spikein_chip_read%, what are the options for accounting for both library size and global composition differences during normalization?
I have read the relevant sections of both the DiffBind and csaw documentation thoroughly, but both assume identical spike-in levels across all samples. Is the approach above misguided, or is there a sensible way to normalize this dataset?
This question has also been cross-posted to the Bioconductor support site; relevant answers provided there will be linked or summarized here.