Question: Calculating ratios by group in R
Sam wrote, 4 weeks ago (canada):

Hi there,

I would like to calculate the ratio of NN (total markers / total NN) for different groups (here, six samples each in groups A and B) in R. It must be easy, but I couldn't find any example online.

I found a similar answer, but as a newbie to R I couldn't adapt the code. https://stackoverflow.com/questions/48555851/adding-a-row-for-the-ratio-of-two-variables

        A1  A2  A3  A4  A5  A6  B1  B2  B3  B4  B5  B6 
     M1 CC  CC  AC  AA  CC  CC  CC  AA  AC  CC  CC  CC                                          
     M2 NN  AA  AA  AC  AA  AA  AA  AA  AA  AA  AA  AA  
     M3 AA  AA  NN  NN  AA  AA  GG  NN  GG  GG  NN  NN 
     M4 NN  NN  NN  AA  AA  NN  AA  AA  AA  AA  NN  NN

expected output

         A1  A2  A3  A4  A5  A6  B1  B2  B3  B4  B5  B6  A-ratio  B-ratio  A+B-ratio
     M1  CC  CC  AC  AA  CC  CC  CC  AA  AC  CC  CC  CC  -        -        -
     M2  NN  AA  AA  AC  AA  AA  AA  AA  AA  AA  AA  AA  0        -        11
     M3  AA  AA  NN  NN  AA  AA  GG  NN  GG  GG  NN  NN  1.5      0.7      1.4
     M4  NN  NN  NN  AA  AA  NN  AA  AA  AA  AA  NN  NN  0        4        1.0

Thanks for your help.


I don't understand what A-ratio, B-ratio, A+B-ratio are supposed to represent. Can you spell out how you arrived at the values in the last three columns of the second row?

written 4 weeks ago by Friederike
zx8754 wrote, 29 days ago (London):

I am guessing we are trying to get missingness (the fraction of NN calls) per marker, within each sample group and overall. Try this example:

# example data
df1 <- read.table(text = "        A1  A2  A3  A4  A5  A6  B1  B2  B3  B4  B5  B6 
     M1 CC  CC  AC  AA  CC  CC  CC  AA  AC  CC  CC  CC                                          
     M2 NN  AA  AA  AC  AA  AA  AA  AA  AA  AA  AA  AA  
     M3 AA  AA  NN  NN  AA  AA  GG  NN  GG  GG  NN  NN 
     M4 NN  NN  NN  AA  AA  NN  AA  AA  AA  AA  NN  NN", header = TRUE, stringsAsFactors = FALSE)

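# column names, used to pick out each group's samples by prefix
x <- colnames(df1)
cbind(df1, 
      # per-marker fraction of NN calls within group A and within group B
      sapply(c("A", "B"), function(i){
        d <- df1[ grepl(paste0("^", i), x) ]
        rowSums(d == "NN")/ncol(d)
        }),
      # per-marker fraction of NN calls across all samples
      AB = rowSums(df1 == "NN")/ncol(df1)
      )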
#    A1 A2 A3 A4 A5 A6 B1 B2 B3 B4 B5 B6         A         B         AB
# M1 CC CC AC AA CC CC CC AA AC CC CC CC 0.0000000 0.0000000 0.00000000
# M2 NN AA AA AC AA AA AA AA AA AA AA AA 0.1666667 0.0000000 0.08333333
# M3 AA AA NN NN AA AA GG NN GG GG NN NN 0.3333333 0.5000000 0.41666667
# M4 NN NN NN AA AA NN AA AA AA AA NN NN 0.6666667 0.3333333 0.50000000
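
For comparison, here is a minimal sketch of an equivalent approach, not part of the original answer: since the mean of a logical vector is the fraction of TRUE values, `rowMeans()` on `df1 == "NN"` gives the same per-marker missingness. The helper name `nn_fraction` and its `prefix` argument are hypothetical, and the sketch assumes group membership can be read from the first letter of the column name, as in the example data.

```r
# example data (same values as in the answer above)
df1 <- read.table(text = "A1 A2 A3 A4 A5 A6 B1 B2 B3 B4 B5 B6
M1 CC CC AC AA CC CC CC AA AC CC CC CC
M2 NN AA AA AC AA AA AA AA AA AA AA AA
M3 AA AA NN NN AA AA GG NN GG GG NN NN
M4 NN NN NN AA AA NN AA AA AA AA NN NN", header = TRUE, stringsAsFactors = FALSE)

# logical matrix: TRUE where the genotype call is missing ("NN")
is_nn <- df1 == "NN"

# hypothetical helper: fraction of NN per marker among the columns
# whose names start with `prefix`
nn_fraction <- function(prefix) {
  keep <- startsWith(colnames(is_nn), prefix)
  rowMeans(is_nn[, keep, drop = FALSE])
}

res <- data.frame(A  = nn_fraction("A"),
                  B  = nn_fraction("B"),
                  AB = rowMeans(is_nn))

# append the three ratio columns to the genotype table
out <- cbind(df1, res)
print(out)
```

On the example data this should reproduce the A, B and AB columns shown above. `drop = FALSE` keeps the subset a matrix even if a group happens to contain a single sample.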

That's what I was thinking also but the output of this doesn't match the presented expected output. Waiting for OP to clarify.

written 29 days ago by Jean-Karim Heriche

Awesome...thanks a lot @zx8754, it worked well. I will make sure to use dput format for future requests.

Friederike and Jean-Karim Heriche - sorry that the missingness ratios in my expected output were wrong. They came from my entire data set (180x35000): I subsampled but forgot to recalculate the ratios for this subsample.

written 29 days ago by Sam
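
On the dput format mentioned above, a minimal sketch of how it could be used to share a reproducible subset of the data. The object names `small` and `rebuilt` and the values in them are hypothetical and only for illustration.

```r
# small hypothetical subset of genotype calls
small <- data.frame(A1 = c("CC", "NN"),
                    B1 = c("CC", "AA"),
                    row.names = c("M1", "M2"),
                    stringsAsFactors = FALSE)

# dput() prints R code that recreates the object;
# that printed text is what gets pasted into a question
dput(small)

# a reader can rebuild the object by evaluating the pasted text
txt <- paste(capture.output(dput(small)), collapse = "\n")
rebuilt <- eval(parse(text = txt))

# check that the round trip preserved the data
print(all.equal(small, rebuilt))
```

For a large table such as the 180x35000 one mentioned above, something like `dput(df[1:4, 1:12])` would keep the pasted text short, where `df` stands for the full data frame.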

If it was helpful, consider accepting it as an answer ("tick").

written 28 days ago by zx8754