Background (don't need help on these sections, yet): I have read depth information on ~300 whole genomes. I am aware of many pitfalls of analyzing read depth as a proxy for CNV and have taken many steps to obtain quality-controlled read depth information that I am ready to analyze.
I want to test for associations between this standardized, QCed read depth and my phenotype of interest in a covariate-controlled analysis.
However, I have been looking at the distribution of read depth within each window. Pooled across all windows, read depth follows one overall distribution, but the per-window distributions across samples differ from one window to the next, sometimes very much so.
If every window had the same distribution, I could, for instance, run a Poisson regression 1.5M times and be done. They do not. So the generalized linear model I use should possibly change from window to window, to maximize power for each one.
Does anyone have experience automating the process of model fitting? Or is this inappropriate? Another method would of course be to use nonparametric analysis, but then I lose potentially very interesting information on the distribution of a given window.
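To make concrete what I mean by automating the model choice, here is a minimal sketch, not something I have validated. It assumes the read depth in a window is stored as non-negative integer counts (standardized values would need a different set of candidate families), it ignores covariates (intercept-only fits), and it estimates the negative binomial size parameter by method of moments instead of maximum likelihood, so the AIC values are approximate. The function names are made up for illustration.

```python
import math
from statistics import mean, variance


def poisson_loglik(counts, mu):
    """Log-likelihood of the counts under Poisson(mu)."""
    return sum(k * math.log(mu) - mu - math.lgamma(k + 1) for k in counts)


def negbin_loglik(counts, mu, size):
    """Log-likelihood under a negative binomial with mean mu and
    variance mu + mu**2 / size (the NB2 parameterization)."""
    total = 0.0
    for k in counts:
        total += (
            math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
            + size * math.log(size / (size + mu))
            + k * math.log(mu / (size + mu))
        )
    return total


def choose_family(counts):
    """Return (best_family, aic_by_family) for one window's counts."""
    mu = mean(counts)
    if mu <= 0:
        raise ValueError("window has no reads; nothing to model")
    var = variance(counts)  # sample variance across individuals

    # Poisson has one free parameter (the mean).
    aic = {"poisson": 2 * 1 - 2 * poisson_loglik(counts, mu)}

    # The negative binomial is only a candidate when the window is
    # overdispersed; otherwise the moment estimate of size is undefined.
    if var > mu:
        size = mu * mu / (var - mu)  # method-of-moments estimate
        aic["negative_binomial"] = 2 * 2 - 2 * negbin_loglik(counts, mu, size)

    best = min(aic, key=aic.get)
    return best, aic


if __name__ == "__main__":
    overdispersed_window = [0, 0, 1, 2, 30, 45, 3, 0, 60, 5]
    tight_window = [4, 5, 6, 5, 4, 6, 5, 5, 4, 6]
    print(choose_family(overdispersed_window)[0])
    print(choose_family(tight_window)[0])
```

In the real analysis I would presumably fit each candidate family with the phenotype and covariates in a proper GLM package and compare those fits, but the loop over windows would look the same: fit the candidates, score them, keep the best one per window.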