I have a data frame of SNPs with several other columns of information and values, such as the following:
    SNP ID      location   chromosome   values column A   values column B
    rs8662689   78654      1            0.6432            0.2458
    rs753279    1009753    7            -1.6434           1.9876
    rs4331780   2086433    22           4.521             -3.743
    and so on...
I would like to standardise the values in column B. As I understand it, I have to divide column B into frequency bins (I have 20 x 10^6 rows, so I guess I need to choose a number of bins, for instance 1000). Then I would like to calculate the mean and standard deviation of each bin, standardise each value in column B using the mean and standard deviation of its own bin (a per-bin z-score), and store the result in a new column. Does anybody know how to do this in R?
I have written something like this, but it does not seem to work and may well be wrong:
    library(dplyr)
    n_bins = 1000
    outscore = df %>%
      mutate(bin=ntile(mean(df$valuesB),n_bins)) %>%
      group_by(bin) %>%
      mutate(zscore=scale(mean()),outlier=abs(zscore)>1.7)
Any help is highly appreciated. Thanks.
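For reference, the steps described above could be sketched roughly as follows. This is only a minimal sketch under several assumptions: the column names `valuesA` and `valuesB` are made up, the toy data frame stands in for the real 20 million rows, and the bins are taken to be equal-sized groups of rows ranked by `valuesB`, as the attempt above seems to intend. If the bins should instead be defined by another variable (an allele frequency column, for example), that column would go inside `ntile()`.

```r
library(dplyr)

# Toy data standing in for the real data frame.
# Column names valuesA / valuesB are assumptions, not taken from the real data.
set.seed(1)
df <- data.frame(
  SNP        = paste0("rs", 1:2000),
  location   = sample.int(1e6, 2000),
  chromosome = sample(1:22, 2000, replace = TRUE),
  valuesA    = rnorm(2000),
  valuesB    = rnorm(2000)
)

n_bins <- 10  # 1000 in the question; kept small for the toy data

outscore <- df %>%
  # Assign each row to one of n_bins equal-sized bins, ranked by valuesB
  mutate(bin = ntile(valuesB, n_bins)) %>%
  group_by(bin) %>%
  mutate(
    # Per-bin z-score: subtract the bin mean, divide by the bin standard deviation
    zscore  = (valuesB - mean(valuesB)) / sd(valuesB),
    # Same 1.7 threshold as in the attempt above
    outlier = abs(zscore) > 1.7
  ) %>%
  ungroup()
```

Two things in the original attempt look like the likely cause of the failure: `mean(df$valuesB)` collapses the whole column to a single number, so `ntile()` has nothing to rank, and `mean()` is then called with no argument inside `scale()`. Writing the z-score out explicitly also sidesteps the fact that `scale()` returns a matrix rather than a plain vector.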