Question: Calculating ratios by group in R
1
gravatar for Sam
11 months ago by
Sam20
canada
Sam20 wrote:

Hi there,

I would like to calculate the ratio of NN (total markers / total NN) for different groups (here 6 samples each in groups A and B) in R. It must be easy, but I couldn't find any example online.

I have found a similar answer, but as a newbie to R I couldn't make sense of the code: https://stackoverflow.com/questions/48555851/adding-a-row-for-the-ratio-of-two-variables

        A1  A2  A3  A4  A5  A6  B1  B2  B3  B4  B5  B6 
     M1 CC  CC  AC  AA  CC  CC  CC  AA  AC  CC  CC  CC                                          
     M2 NN  AA  AA  AC  AA  AA  AA  AA  AA  AA  AA  AA  
     M3 AA  AA  NN  NN  AA  AA  GG  NN  GG  GG  NN  NN 
     M4 NN  NN  NN  AA  AA  NN  AA  AA  AA  AA  NN  NN

Expected output:

    A1  A2  A3  A4  A5  A6  B1  B2  B3  B4  B5  B6   A-ratio B-ratio A+B-ratio
 M1 CC  CC  AC  AA  CC  CC  CC  AA  AC  CC  CC  CC  -   -   - 
 M2 NN  AA  AA  AC  AA  AA  AA  AA  AA  AA  AA  AA  0   -   11                                  
 M3 AA  AA  NN  NN  AA  AA  GG  NN  GG  GG  NN  NN  1.5 0.7 1.4                                
 M4 NN  NN  NN  AA  AA  NN  AA  AA  AA  AA  NN  NN  0   4   1.0

Thanks for your help.


I don't understand what A-ratio, B-ratio, A+B-ratio are supposed to represent. Can you spell out how you arrived at the values in the last three columns of the second row?

(comment by Friederike, 11 months ago)
Answer by zx8754 (London), 11 months ago:

I am guessing we are trying to get the missingness (proportion of NN calls) for each marker, within each sample group and overall. Try this example:

# example data
df1 <- read.table(text = "        A1  A2  A3  A4  A5  A6  B1  B2  B3  B4  B5  B6 
     M1 CC  CC  AC  AA  CC  CC  CC  AA  AC  CC  CC  CC                                          
     M2 NN  AA  AA  AC  AA  AA  AA  AA  AA  AA  AA  AA  
     M3 AA  AA  NN  NN  AA  AA  GG  NN  GG  GG  NN  NN 
     M4 NN  NN  NN  AA  AA  NN  AA  AA  AA  AA  NN  NN", header = TRUE, stringsAsFactors = FALSE)

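# column names, used to pick out each sample group by its prefix
x <- colnames(df1)
cbind(df1, 
      # per-group missingness: fraction of "NN" calls in each row (marker)
      sapply(c("A", "B"), function(i){
        d <- df1[ grepl(paste0("^", i), x) ]  # columns whose name starts with i
        rowSums(d == "NN")/ncol(d)
        }),
      # overall missingness across all samples
      AB = rowSums(df1 == "NN")/ncol(df1)
      )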
#    A1 A2 A3 A4 A5 A6 B1 B2 B3 B4 B5 B6         A         B         AB
# M1 CC CC AC AA CC CC CC AA AC CC CC CC 0.0000000 0.0000000 0.00000000
# M2 NN AA AA AC AA AA AA AA AA AA AA AA 0.1666667 0.0000000 0.08333333
# M3 AA AA NN NN AA AA GG NN GG GG NN NN 0.3333333 0.5000000 0.41666667
# M4 NN NN NN AA AA NN AA AA AA AA NN NN 0.6666667 0.3333333 0.50000000
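
The same calculation can be sketched without hard-coding the group letters. This is only an illustrative variant, not part of the original answer: it assumes that a sample's group is its column name with the trailing digits removed (so A1 belongs to group "A"), and the names is_missing, grp, miss_by_group and res are made up for the example. It relies on the fact that the mean of a logical vector is the proportion of TRUE values.

```r
# Same example data as in the answer above
df1 <- read.table(text = "   A1 A2 A3 A4 A5 A6 B1 B2 B3 B4 B5 B6
M1 CC CC AC AA CC CC CC AA AC CC CC CC
M2 NN AA AA AC AA AA AA AA AA AA AA AA
M3 AA AA NN NN AA AA GG NN GG GG NN NN
M4 NN NN NN AA AA NN AA AA AA AA NN NN", header = TRUE, stringsAsFactors = FALSE)

# Logical matrix: TRUE wherever the genotype call is missing ("NN")
is_missing <- df1 == "NN"

# Assumption: group label = column name without its trailing digits
grp <- sub("[0-9]+$", "", colnames(df1))

# Proportion of NN per marker (row) within each group
miss_by_group <- sapply(unique(grp), function(g) {
  rowMeans(is_missing[, grp == g, drop = FALSE])
})

# Append per-group and overall missingness to the genotype table
res <- cbind(df1, miss_by_group, AB = rowMeans(is_missing))
print(res)
```

As a quick check by hand, marker M3 has 2 NN calls among the 6 A samples (2/6) and 3 among the 6 B samples (3/6), so 5/12 overall, which agrees with the output shown above.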

That's what I was thinking also but the output of this doesn't match the presented expected output. Waiting for OP to clarify.

(comment by Jean-Karim Heriche, 11 months ago)

Awesome...thanks a lot @zx8754, it worked well. I will make sure to use dput format for future requests.

Friederike and Jean-Karim Heriche - sorry that the missingness ratios in my expected output were wrong. They came from my entire data set (180x35000): I subsampled it for the question but forgot to recalculate the ratios for the subsample.

(comment by Sam, 11 months ago)
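
For readers unfamiliar with the dput format mentioned above, here is a minimal sketch of how it is typically used to share example data. The small data frame df_small is invented for illustration and is not the poster's data.

```r
# A tiny made-up genotype table (hypothetical, for illustration only)
df_small <- data.frame(A1 = c("CC", "NN"),
                       B1 = c("CC", "AA"),
                       row.names = c("M1", "M2"),
                       stringsAsFactors = FALSE)

# dput() prints an R expression that recreates the object exactly;
# that printed text is what gets pasted into a question
dput(df_small)

# Round trip: capture the printed text and rebuild the object from it
txt <- capture.output(dput(df_small))
df_back <- eval(parse(text = txt))
```

Anyone answering can then paste the dput text into their own session and get the same object, including row names and column types, without guessing how the table should be parsed.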

If it was helpful, consider accepting it as the answer (the "tick").

(comment by zx8754, 11 months ago)