How to identify low- and high-variability rows and columns in a table?

4.5 years ago

star ▴ 350

I have a big table whose rows are genomic coordinates and whose columns are genomic features (see below). I would like to separate rows and columns by their variability. I have tried some basic statistics, as in the code below, but I would like to know whether this is the right approach or whether there is an alternative statistical method that would be more accurate.

DF:

         Feature_A   Feature_B   Feature_C   Feature_D
cord_1         0.9           1         0.8           1
cord_2         0.6         0.1         0.9         0.5
cord_3           0           0           0           0
cord_4         0.1           0           0         0.2

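Code:

library(matrixStats)  # rowVars, rowSds, rowIQRs

mat <- as.matrix(DF)  # compute from the original feature columns only
DF$skew <- rowSkewness(mat)  # not a matrixStats function; assumed to come from fBasics
DF$var <- rowVars(mat)
DF$sd <- rowSds(mat)
DF$IQR <- rowIQRs(mat)
DF$mean <- rowMeans(mat)
DF$coef.var <- DF$sd / DF$mean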

I would like to keep cord_2 (the more variable row) and ignore cord_1, cord_3 and cord_4 in my output. Given that, which statistic is the better choice?
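To make the comparison concrete, here is a minimal sketch in base R that computes the candidate statistics on the four example rows above and ranks the rows by variance. The variance cutoff of 0.05 and the names `mat`, `stats`, `ranked`, `high_var_rows` and `col_var` are hypothetical, chosen only for illustration. This is not a recommendation of one statistic over another.

```r
# Example table from the question (rows: coordinates, columns: features)
mat <- matrix(
  c(0.9, 1.0, 0.8, 1.0,
    0.6, 0.1, 0.9, 0.5,
    0.0, 0.0, 0.0, 0.0,
    0.1, 0.0, 0.0, 0.2),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("cord_", 1:4), paste0("Feature_", LETTERS[1:4]))
)

# Compute every statistic from the untouched matrix, then collect the results
stats <- data.frame(
  var  = apply(mat, 1, var),
  sd   = apply(mat, 1, sd),
  iqr  = apply(mat, 1, IQR),
  mean = rowMeans(mat),
  row.names = rownames(mat)
)
stats$coef.var <- stats$sd / stats$mean  # NaN when the row mean is 0

# Rows ordered from most to least variable by variance
ranked <- rownames(stats)[order(stats$var, decreasing = TRUE)]

# Hypothetical cutoff: keep rows whose variance exceeds 0.05
high_var_rows <- rownames(stats)[stats$var > 0.05]

# The same idea applied to columns instead of rows
col_var <- apply(mat, 2, var)

print(stats)
print(high_var_rows)
```

On this small table, variance and standard deviation single out cord_2, while the coefficient of variation is undefined (NaN) for the all-zero cord_3 and is larger for cord_4 than for cord_2, because cord_4 has a very small mean.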

biostatistics methematics basic_statistics • 912 views

updated 4.5 years ago by Biostar 20 • written 4.5 years ago by star ▴ 350