I am working on a dataset containing 700 samples and 50,000 genes, which has been rank normalized. For my purposes, I calculated the Pearson residuals of each gene, including covariates and technical confounders in the model. The resulting residuals are not normally distributed, though. Bear in mind that I checked normality for each gene with the Shapiro-Wilk test, which is very sensitive on large datasets (do you think 700 observations counts as a large dataset in this case?) and can detect deviations from normality that do not actually influence the results. The data might therefore be fine after all. I was wondering whether it is advisable to normalize the residuals again, or whether that is unnecessary. I searched the internet for examples or an explanation of the use of a second normalization step, but I could not find anything useful.
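To make the check concrete, here is a minimal sketch of the kind of per-gene test I mean, run on simulated data rather than my real data. Everything in it is an assumption for illustration: the function name `residual_normality_summary` is hypothetical, the residuals come from an ordinary least-squares fit as a stand-in for my actual model, and the simulated noise is only mildly heavy-tailed. Skewness and excess kurtosis are reported next to the Shapiro-Wilk p-values so that the size of the deviation can be judged, not only its significance.

```python
import numpy as np
from scipy import stats


def residual_normality_summary(expr, covariates):
    """Fit gene ~ covariates by least squares and summarise the residuals.

    expr:       array of shape (n_samples, n_genes)
    covariates: array of shape (n_samples, n_covariates)
    Returns residuals, Shapiro-Wilk p-values, skewness and excess kurtosis.
    """
    n_samples = expr.shape[0]
    # Design matrix with an intercept column followed by the covariates.
    design = np.column_stack([np.ones(n_samples), covariates])
    # One least-squares fit for all genes at once (one column per gene).
    coef, *_ = np.linalg.lstsq(design, expr, rcond=None)
    resid = expr - design @ coef

    # Shapiro-Wilk p-value for each gene's residuals.
    pvals = np.array(
        [stats.shapiro(resid[:, j])[1] for j in range(resid.shape[1])]
    )
    # Effect-size style summaries: both are close to 0 for normal data.
    skew = stats.skew(resid, axis=0)
    excess_kurt = stats.kurtosis(resid, axis=0)
    return resid, pvals, skew, excess_kurt


# Simulated stand-in data (assumption): 700 samples, 20 genes, 3 covariates,
# with mildly heavy-tailed noise drawn from a t distribution with 10 df.
rng = np.random.default_rng(0)
n_samples, n_genes, n_cov = 700, 20, 3
covariates = rng.normal(size=(n_samples, n_cov))
effects = rng.normal(size=(n_cov, n_genes))
expr = covariates @ effects + rng.standard_t(df=10, size=(n_samples, n_genes))

resid, pvals, skew, excess_kurt = residual_normality_summary(expr, covariates)
print("genes with Shapiro-Wilk p < 0.05:", int((pvals < 0.05).sum()), "of", n_genes)
print("largest |skewness|:", float(np.abs(skew).max()))
print("largest |excess kurtosis|:", float(np.abs(excess_kurt).max()))
```

With real data, `expr` would be the rank-normalized expression matrix and `covariates` the covariates and technical confounders. A Q-Q plot of a few genes with small p-values would show whether the deviation is large enough to matter.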
I would really appreciate any answers or comments on this.