I am analyzing EPIC methylation array data and have done the necessary filtering: I removed cross-reactive probes and probes overlapping common SNPs, and excluded the X and Y chromosomes. About 10% of my samples cluster separately from the rest (I am calling them "outliers" for now). Since all samples were collected from human brain with similar pathological conditions, I did not expect such differences. I checked other quality measures, and nothing seems off (good detection p-values, bisulfite conversion rate, etc.). I also checked whether any of my other phenotypic data correlate with these samples; they don't. I looked at the overall beta distribution of all samples (top: raw; bottom: quantile normalized), but the outliers largely overlap with the other samples at the lower end, so I guess their distribution is not off from the rest.
I took a subset of the outlier samples, age- and sex-matched them with samples from the other cluster, and looked at their distributions; see below for raw (top) and quantile normalized (bottom). For the outlier samples, I see higher peaks for both the unmethylated and methylated signals.
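To put a number on the "higher bumps" rather than judging them by eye from the density plots, one option is to compute, per sample, the fraction of probes falling in the unmethylated, intermediate, and methylated ranges. The following is only a minimal sketch: the function name and the 0.2/0.8 cut-offs are arbitrary assumptions for illustration, not part of any package, and the input is assumed to be a plain list of beta values for one sample.

```python
def beta_range_fractions(betas, low=0.2, high=0.8):
    """Fraction of probes in each beta range for a single sample.

    `low` and `high` are illustrative cut-offs (assumptions), not
    standard thresholds; adjust them to where the peaks actually sit.
    """
    n = len(betas)
    if n == 0:
        raise ValueError("no beta values supplied")
    n_low = sum(1 for b in betas if b < low)    # unmethylated peak
    n_high = sum(1 for b in betas if b > high)  # methylated peak
    return {
        "unmethylated": n_low / n,
        "intermediate": (n - n_low - n_high) / n,
        "methylated": n_high / n,
    }


# Toy example with made-up values, only to show the call.
example = beta_range_fractions([0.05, 0.10, 0.50, 0.90, 0.95])
print(example)
```

If the outlier samples consistently show a smaller intermediate fraction than their matched counterparts, that would confirm the visual impression and give a single per-sample value to compare against other variables.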
This is where I am confused: what technical issues could lead to such patterns in the data? Can you suggest anything to check to find out what is going on here?
There are two possibilities: 1. it is all due to some technical error, or 2. there is some biological meaning. Testing #1 should be easier, but I am stuck on what to look for and where. I don't think it's worth diving into #2 until I have excluded every testable explanation for #1. Any help is appreciated!
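One testable version of #1 is whether outlier status is associated with a processing variable. The sketch below assumes such a variable exists in the sample sheet (for example a slide or batch label; this is an assumption, since none is described above) and runs a two-sided Fisher's exact test on the 2x2 table of outlier status versus membership in one batch. The function names are hypothetical and only the Python standard library is used.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are at least as unlikely as the observed one.
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))


def outlier_vs_batch(is_outlier, batch, batch_of_interest):
    """Build the 2x2 table (outlier x in-batch) and return (table, p-value)."""
    a = b = c = d = 0
    for out, lab in zip(is_outlier, batch):
        in_batch = lab == batch_of_interest
        if out and in_batch:
            a += 1
        elif out:
            b += 1
        elif in_batch:
            c += 1
        else:
            d += 1
    return (a, b, c, d), fisher_exact_two_sided(a, b, c, d)


# Toy data (made up): all outliers sit on hypothetical slide "A".
table, p = outlier_vs_batch(
    [True, True, True, False, False, False],
    ["A", "A", "A", "B", "B", "B"],
    "A",
)
print(table, p)
```

A small p-value for any one batch label would point towards a technical origin; the absence of one would not by itself rule it out.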