Question: Calculating ratios by group in R
Sam20 wrote:

Hi there,

I would like to calculate the ratio of NN (total markers / total NN) for different groups (here, 6 samples each in groups A and B) in R. It must be easy, but I couldn't find any example online.

I found a similar answer, but as a newbie to R I couldn't adapt the code: https://stackoverflow.com/questions/48555851/adding-a-row-for-the-ratio-of-two-variables

``````
    A1  A2  A3  A4  A5  A6  B1  B2  B3  B4  B5  B6
M1  CC  CC  AC  AA  CC  CC  CC  AA  AC  CC  CC  CC
M2  NN  AA  AA  AC  AA  AA  AA  AA  AA  AA  AA  AA
M3  AA  AA  NN  NN  AA  AA  GG  NN  GG  GG  NN  NN
M4  NN  NN  NN  AA  AA  NN  AA  AA  AA  AA  NN  NN
``````

expected output

``````
    A1  A2  A3  A4  A5  A6  B1  B2  B3  B4  B5  B6  A-ratio  B-ratio  A+B-ratio
M1  CC  CC  AC  AA  CC  CC  CC  AA  AC  CC  CC  CC  -        -        -
M2  NN  AA  AA  AC  AA  AA  AA  AA  AA  AA  AA  AA  0        -        11
M3  AA  AA  NN  NN  AA  AA  GG  NN  GG  GG  NN  NN  1.5      0.7      1.4
M4  NN  NN  NN  AA  AA  NN  AA  AA  AA  AA  NN  NN  0        4        1.0
``````

snp R • 421 views
modified 17 months ago by zx8754 • written 17 months ago by Sam20

I don't understand what A-ratio, B-ratio, A+B-ratio are supposed to represent. Can you spell out how you arrived at the values in the last three columns of the second row?

zx8754 wrote:

I am guessing we are trying to get the missingness (proportion of NN calls) per marker, within each group and overall. Try this example:

``````
# example data
df1 <- read.table(text = "        A1  A2  A3  A4  A5  A6  B1  B2  B3  B4  B5  B6
M1 CC  CC  AC  AA  CC  CC  CC  AA  AC  CC  CC  CC
M2 NN  AA  AA  AC  AA  AA  AA  AA  AA  AA  AA  AA
M3 AA  AA  NN  NN  AA  AA  GG  NN  GG  GG  NN  NN
M4 NN  NN  NN  AA  AA  NN  AA  AA  AA  AA  NN  NN", header = TRUE, stringsAsFactors = FALSE)

x <- colnames(df1)

# proportion of NN calls per marker: within group A, within group B, and over all samples
cbind(df1,
      sapply(c("A", "B"), function(i) {
        # keep only the columns whose name starts with the group letter
        d <- df1[grepl(paste0("^", i), x)]
        rowSums(d == "NN") / ncol(d)
      }),
      AB = rowSums(df1 == "NN") / ncol(df1)
)
#    A1 A2 A3 A4 A5 A6 B1 B2 B3 B4 B5 B6         A         B         AB
# M1 CC CC AC AA CC CC CC AA AC CC CC CC 0.0000000 0.0000000 0.00000000
# M2 NN AA AA AC AA AA AA AA AA AA AA AA 0.1666667 0.0000000 0.08333333
# M3 AA AA NN NN AA AA GG NN GG GG NN NN 0.3333333 0.5000000 0.41666667
# M4 NN NN NN AA AA NN AA AA AA AA NN NN 0.6666667 0.3333333 0.50000000
``````
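If there are more than two groups, the same calculation can be wrapped in a small function. This is only a sketch: `missing_by_group` is a hypothetical helper name, and it assumes the group label is whatever remains of the sample name after stripping the trailing digits (so `A1` belongs to group `A`). It returns only the ratio columns, using `rowMeans()` on a logical matrix instead of `rowSums()/ncol()`.

```r
# Example data from the question (first column becomes the row names)
df1 <- read.table(text = "A1 A2 A3 A4 A5 A6 B1 B2 B3 B4 B5 B6
M1 CC CC AC AA CC CC CC AA AC CC CC CC
M2 NN AA AA AC AA AA AA AA AA AA AA AA
M3 AA AA NN NN AA AA GG NN GG GG NN NN
M4 NN NN NN AA AA NN AA AA AA AA NN NN", header = TRUE, stringsAsFactors = FALSE)

# Hypothetical helper: proportion of `missing` calls per marker, by group.
# Assumption: group = sample name with its trailing digits removed.
missing_by_group <- function(df, missing = "NN") {
  is_missing <- as.matrix(df) == missing          # logical matrix, markers x samples
  groups <- sub("[0-9]+$", "", colnames(df))      # "A1" -> "A", "B6" -> "B"
  per_group <- sapply(unique(groups), function(g) {
    # mean of TRUE/FALSE = proportion of missing calls in this group
    rowMeans(is_missing[, groups == g, drop = FALSE])
  })
  cbind(per_group, all = rowMeans(is_missing))
}

res <- missing_by_group(df1)
round(res, 3)
```

To get the same layout as above, the result can be attached to the genotypes with `cbind(df1, res)`.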

That's what I was thinking also but the output of this doesn't match the presented expected output. Waiting for OP to clarify.

Awesome... thanks a lot @zx8754, it worked well. I will make sure to use dput format for future requests.
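For anyone unfamiliar with `dput`, here is a minimal sketch of how it is typically used to share example data. The object `df_small` is a made-up two-marker subset, not the real data set; the round trip through `capture.output()` and `parse()` only stands in for pasting the printed text into a post and running it again.

```r
# Hypothetical small example, not the real data
df_small <- data.frame(
  A1 = c("CC", "NN"),
  B1 = c("CC", "AA"),
  row.names = c("M1", "M2"),
  stringsAsFactors = FALSE
)

# Prints an R expression that recreates the object; paste this into the question
dput(df_small)

# Simulate copy/paste: capture the printed text, then evaluate it again
txt <- capture.output(dput(df_small))
df_copy <- eval(parse(text = txt))

# Check that the recreated object matches the original
all.equal(df_small, df_copy)
```

Sharing the `dput()` output keeps column types and row names intact, which whitespace-aligned tables do not guarantee.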

Friederike and Jean-Karim Heriche, sorry that the missingness ratios in my expected output were wrong. They came from my entire data set (180 x 35000): I subsampled it for this post but forgot to recalculate the ratios for the subsample.