I'm writing here today because I'm looking for a strategy to analyse my dataset.
Briefly, we have a WGBS dataset with methylation scores. In a subset of it, we are looking at the methylation scores of 20 positions, which are low (between 0 and 5%) across more than 50 samples. I'd like to know whether it is always the same samples that are the "most" methylated.
I have a dataset with three columns: position, sample, methylation_score
I don't really know how to carry my analysis forward. This is where I stopped:
- The scores are low (but probably have a functional role in what we are looking for).
- The distribution may not be homogeneous among samples and positions.
- I thought about using a rank-based test (such as a Spearman rank correlation), but I'm blocked by the fact that I have two qualitative variables: position and sample.
- I thought about PCA, but I only have one quantitative dimension.
- I thought about Kruskal-Wallis, which gives me a significant p-value.
- Then I tried ranking all the scores and assigning each one a score based on the normalised rank of the methylation score, but I'm not really sure about this approach.
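To make the last point concrete, here is a minimal sketch of what I mean by a normalised-rank score. It rests on several assumptions that may not match what I should be doing: scores are ranked within each position (not over the whole table), ties get their average rank, ranks are rescaled to [0, 1], and each sample is summarised by its mean normalised rank across positions. The function name `normalised_rank_scores` and the toy values are made up for illustration only.

```python
from collections import defaultdict


def normalised_rank_scores(rows):
    """Return {sample: mean normalised rank across positions}.

    rows: iterable of (position, sample, methylation_score) tuples.
    Assumption: ranking is done separately within each position.
    """
    by_position = defaultdict(list)
    for position, sample, score in rows:
        by_position[position].append((sample, score))

    per_sample = defaultdict(list)
    for entries in by_position.values():
        n = len(entries)
        ordered = sorted(entries, key=lambda e: e[1])
        i = 0
        while i < n:
            # Find the end of the current group of tied scores.
            j = i
            while j + 1 < n and ordered[j + 1][1] == ordered[i][1]:
                j += 1
            avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
            # Rescale to [0, 1]; a position with a single sample gets 0.5.
            norm = (avg_rank - 1) / (n - 1) if n > 1 else 0.5
            for k in range(i, j + 1):
                per_sample[ordered[k][0]].append(norm)
            i = j + 1

    return {s: sum(v) / len(v) for s, v in per_sample.items()}


# Toy data (hypothetical values, methylation in %).
rows = [
    ("chr1:100", "S1", 0.5), ("chr1:100", "S2", 2.0), ("chr1:100", "S3", 4.5),
    ("chr1:200", "S1", 0.0), ("chr1:200", "S2", 1.0), ("chr1:200", "S3", 3.0),
]
scores = normalised_rank_scores(rows)
for sample, value in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(sample, round(value, 3))
```

A sample whose mean is close to 1 is near the top at most positions, and one close to 0 is near the bottom. This only describes the data; it does not tell me whether the consistency across positions is stronger than expected by chance, which is the part I'm unsure about.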
So, how would you set up a strategy to find out whether it is the same samples that tend to be among the "most methylated" across positions?
I hope I'm clear enough; otherwise, please tell me how I can better explain what I'd like to achieve.