I am working on a dataset containing 700 samples and 50,000 genes, which has been rank normalized. For my purposes, I calculated the Pearson residuals of each gene, including covariates and technical confounders in the model. The resulting residuals are not normally distributed, though. Bear in mind that I checked normality for each gene with the Shapiro test, which is very sensitive with large datasets (do you think 700 observations counts as a large dataset in this case?) and could detect deviations from normality that do not actually influence the results. Therefore the data might be fine after all. I was wondering whether it is advisable to normalize the data again, or whether that is unnecessary. I searched the internet for examples or an explanation of the use of a second normalization step, but I could not find anything useful.
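For concreteness, here is a minimal sketch of the kind of per-gene check described above. It is only an illustration on simulated data: the design matrix, the ordinary least-squares fit, and the helper `normality_summary` are assumptions made for the example, not the exact model or residual type used on the real dataset. Alongside the Shapiro p-value it reports the Q-Q correlation, which gives a rough sense of how large a deviation is rather than only whether it is detectable.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_samples, n_genes = 700, 20  # small simulated stand-in for 700 x 50,000

# Hypothetical design matrix: intercept plus three covariates/confounders.
covariates = rng.normal(size=(n_samples, 3))
X = np.column_stack([np.ones(n_samples), covariates])

# Simulated expression matrix (samples x genes) that depends on the covariates.
effects = rng.normal(size=(3, n_genes))
expr = covariates @ effects + rng.normal(size=(n_samples, n_genes))

# Fit all genes at once by least squares and keep the residuals.
beta, *_ = np.linalg.lstsq(X, expr, rcond=None)
residuals = expr - X @ beta


def normality_summary(values):
    """Return Shapiro W, Shapiro p-value and the normal Q-Q correlation."""
    w_stat, p_value = stats.shapiro(values)
    _, (_, _, qq_r) = stats.probplot(values, dist="norm")
    return w_stat, p_value, qq_r


# One row per gene: (W, p, Q-Q correlation).
summary = np.array([normality_summary(residuals[:, g]) for g in range(n_genes)])
print(summary[:5])
```

With 50,000 genes tested at a threshold of 0.05, some small p-values are expected by chance alone, so looking at W or the Q-Q correlation per gene may be more informative than counting rejections.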
I would really appreciate any answer and comment on this.
Rank normalization itself is very stringent, so it should take care of everything. (By rank normalization, I am assuming that every gene in a sample is forced to a value between 0 and 1.)
Thanks for your comment! Actually, the values are not between 0 and 1 but between -3 and +3. I am not sure exactly how the normalization was done, as I received the file as it is...
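A range of about -3 to +3 with 700 samples would be consistent with a rank-based inverse normal transform, where ranks are mapped to quantiles of a standard normal instead of being scaled to 0-1. That is only a guess about how the file was produced. A minimal sketch, assuming the Blom offset of 0.375 and a hypothetical helper named `rank_inverse_normal`:

```python
import numpy as np
from scipy import stats


def rank_inverse_normal(values, offset=0.375):
    """Map values to standard normal quantiles via their ranks.

    The offset of 0.375 (Blom) is an assumption; other offsets are also used.
    """
    values = np.asarray(values, dtype=float)
    n = values.size
    ranks = stats.rankdata(values, method="average")  # ties share a mean rank
    return stats.norm.ppf((ranks - offset) / (n - 2 * offset + 1))


rng = np.random.default_rng(1)
raw = rng.lognormal(size=700)  # simulated, strongly skewed values for one gene
transformed = rank_inverse_normal(raw)

# With 700 untied values the extremes land slightly beyond 3 in absolute value.
print(transformed.min(), transformed.max())
```

If the file was produced this way, each gene is already very close to normal by construction, although residuals from a model fitted on top of it need not be.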