Question

Divide the dataframe based on conditions in R

0

Entering edit mode

22 months ago

KABILAN ▴ 70

I have to divide my dataframe based on the group conditions. And then I have to extract the top 3 minimum values from particular columns.

Suppose for two groups of data I will get the result like,

structure(list(Type = c("knn_vsn", "knn_vsn", "knn_loess", "knn_loess", 
"knn_rlr", "knn_rlr", "lls_vsn", "lls_vsn", "lls_loess", "lls_loess", 
"lls_rlr", "lls_rlr", "svd_vsn", "svd_vsn", "svd_loess", "svd_loess", 
"svd_rlr", "svd_rlr"), PCV = c(0.00510741446572374, 0.00705765780896556, 
0.00509233659481246, 0.00696732302441824, 0.00509225712407119, 
0.00696173227550932, 0.00492983133396127, 0.00669466376079551, 
0.00491874477556813, 0.0066283342182998, 0.00493450413250135, 
0.00663684901164831, 0.00731828997356189, 0.0106867134410024, 
0.00729635842702563, 0.0105680795904369, 0.00730343601772899, 
0.0105334181341163)), class = "data.frame", row.names = c(NA, 
-18L))

Then I will divide the dataframe by using the following code,

row_odd <- seq_len(nrow(total_pcv))%%2
      data_row_odd <- total_pcv[row_odd == 1, ]
      data_row_even <- total_pcv[row_odd == 0, ]
      total_pcv_all <- cbind (data_row_odd, data_row_even)
      colnames(total_pcv_all) <- c("Group1", "PCV", "Group2", "PCV")
      rownames(total_pcv_all) <- NULL

#Extracting top 3 minimum values in particular column

total_pcv_all <- total_pcv_all%>%slice_min(PCV1, n=3)%>%slice_min(PCV2, n=3)%>%slice_min(PCV3, n=3)%>%slice_min(PCV4, n=3)

And the output will be like,

structure(list(Group1 = c("lls_loess", "lls_rlr", "lls_vsn"), 
    PCV1 = c(0.00491874477556813, 0.00493450413250135, 0.00492983133396127
    ), Group2 = c("lls_loess", "lls_rlr", "lls_vsn"), PCV2 = c(0.0066283342182998, 
    0.00663684901164831, 0.00669466376079551)), class = "data.frame", row.names = c(NA, 
-3L))

This is for two groups of data.

Suppose if the group value of the data will be increase more than two. How to modify the above codes or any other useful way is available for this problem.

For example I have attached the four groups dataframe below,

structure(list(Type = c("knn_vsn", "knn_vsn", "knn_vsn", "knn_vsn", 
"knn_loess", "knn_loess", "knn_loess", "knn_loess", "knn_rlr", 
"knn_rlr", "knn_rlr", "knn_rlr", "lls_vsn", "lls_vsn", "lls_vsn", 
"lls_vsn", "lls_loess", "lls_loess", "lls_loess", "lls_loess", 
"lls_rlr", "lls_rlr", "lls_rlr", "lls_rlr", "svd_vsn", "svd_vsn", 
"svd_vsn", "svd_vsn", "svd_loess", "svd_loess", "svd_loess", 
"svd_loess", "svd_rlr", "svd_rlr", "svd_rlr", "svd_rlr"), PCV = c(0.00318368971435714, 
0.0056588221783197, 0.00418838138878096, 0.0039811913527127, 
0.00317086486813191, 0.00560933517836751, 0.00417201215938804, 
0.00394649435912413, 0.00317086486813191, 0.00560933517836751, 
0.00417201215938804, 0.00394649435912413, 0.00312821095645019, 
0.00550114679857588, 0.00398819978362592, 0.00397059873107098, 
0.00311632537571597, 0.00548316209864631, 0.00397093259462351, 
0.00393840233766712, 0.00313568333628438, 0.00550230673346083, 
0.00398827962107259, 0.00396385071387178, 0.00394831935666465, 
0.00737865310351839, 0.00424157479553304, 0.0041077267588457, 
0.00393605637633005, 0.0073411154394253, 0.00422638750183658, 
0.00407577176849463, 0.00395599132474446, 0.00735748595511963, 
0.00424175886713471, 0.00410191492380459)), class = "data.frame", row.names = c(NA, 
-36L))

And for this I can modify the above code like,

row_odd <- seq_len(nrow(total_pcv))%%4
  data_row_odd0 <- total_pcv[row_odd == 1, ]
  data_row_odd1 <- total_pcv[row_odd == 3, ]
  data_row_even0 <- total_pcv[row_odd == 2, ]
  data_row_even1 <- total_pcv[row_odd == 0, ]

  total_pcv_all <- cbind (data_row_odd0, data_row_odd1, data_row_even0,data_row_even1)
  colnames(total_pcv_all) <- c("Group1", "PCV1", "Group2", "PCV2", "Group3", "PCV3", "Group4", "PCV4")
  rownames(total_pcv_all) <- NULL 
  total_pcv_all <- total_pcv_all%>%slice_min(PCV1, n=3)%>%slice_min(PCV2, n=3)%>%slice_min(PCV3, n=3)%>%slice_min(PCV4, n=3)

And the output will be like,

structure(list(Group1 = c("lls_loess", "lls_rlr", "lls_vsn"), 
    PCV1 = c(0.00311632537571597, 0.00313568333628438, 0.00312821095645019
    ), Group2 = c("lls_loess", "lls_rlr", "lls_vsn"), PCV2 = c(0.00548316209864631, 
    0.00550230673346083, 0.00550114679857588), Group3 = c("lls_loess", 
    "lls_rlr", "lls_vsn"), PCV3 = c(0.00397093259462351, 0.00398827962107259, 
    0.00398819978362592), Group4 = c("lls_loess", "lls_rlr", 
    "lls_vsn"), PCV4 = c(0.00393840233766712, 0.00396385071387178, 
    0.00397059873107098)), class = "data.frame", row.names = c(NA, 
-3L))

Kindly suggest some code to automate this operation based on 'n' number of group data.

R dividing data-frame function • 820 views

ADD COMMENT • link 22 months ago by KABILAN ▴ 70

0

Entering edit mode

Is there any biological context to this question?

ADD REPLY • link 22 months ago by Ram 43k

0

Entering edit mode

Yes. It is related to proteomics expression data analysis.

ADD REPLY • link 22 months ago by KABILAN ▴ 70

score 3 · Accepted Answer · 2022-07-13

Is this suitable? This will work any any "n" group number.

library(tidyverse)
total_pcv %>%
    group_by(Type) %>%
    mutate(Group = paste0("Group", 1:n())) %>%
    pivot_wider(names_from = Group, values_from = PCV)
    ungroup()

Type       Group1  Group2  Group3  Group4
  <chr>       <dbl>   <dbl>   <dbl>   <dbl>
1 knn_vsn   0.00318 0.00566 0.00419 0.00398
2 knn_loess 0.00317 0.00561 0.00417 0.00395
3 knn_rlr   0.00317 0.00561 0.00417 0.00395
4 lls_vsn   0.00313 0.00550 0.00399 0.00397
5 lls_loess 0.00312 0.00548 0.00397 0.00394
6 lls_rlr   0.00314 0.00550 0.00399 0.00396
7 svd_vsn   0.00395 0.00738 0.00424 0.00411
8 svd_loess 0.00394 0.00734 0.00423 0.00408
9 svd_rlr   0.00396 0.00736 0.00424 0.00410