How to create a new column with for and if/else
9 weeks ago

Hi guys,

I'd like to create a new column in a dataframe using a for loop and if/else. My dataframe ds1 has the following 4 columns:

Sample_Name   Sample_Well   Sentrix_ID    Sentrix_Position
pool_women    A01           2,04426E+11   R01C01
213141        B01           2,04426E+11   R02C01
pool_men      C01           2,04426E+11   R03C01
253141        D01           2,04426E+11   R04C01
202196        E01           2,04426E+11   R05C01
200569        F01           2,04426E+11   R06C01
242196        G01           2,04426E+11   R07C01

Now I want to create a new column, Sample_group, containing 0 for samples whose Sample_Name starts with "21" or "20", 1 for samples whose Sample_Name starts with "25" or "24", and 2 for the others (pool_women and pool_men), as follows:

Sample_Name   Sample_Well   Sentrix_ID    Sentrix_Position   Sample_group
pool_women    A01           2,04426E+11   R01C01             2
213141        B01           2,04426E+11   R02C01             0
pool_men      C01           2,04426E+11   R03C01             2
253141        D01           2,04426E+11   R04C01             1
202196        E01           2,04426E+11   R05C01             0
200569        F01           2,04426E+11   R06C01             0
242196        G01           2,04426E+11   R07C01             1

I wrote the following code:

variables <- colnames(ds1[,which(colnames(ds1)=="Sample_Name")])

for(i in variables){
if(gsub("(^\\d{2}).*", "\\1", i) == "21" | gsub("(^\\d{2}).*", "\\1", i) == "20") {ds1$Sample_group1 <- 0}

if(gsub("(^\\d{2}).*", "\\1", i) == "24" | gsub("(^\\d{2}).*", "\\1", i) == "25") {ds1$Sample_group1 <- 1}

else {ds1$Sample_group1 <- 2}

}

However, I get only 1 in the Sample_group column for all samples.

What's wrong with my code?

Thank you!

R loop for • 175 views

input:

> df

  Sample_Name
1  pool_women
2      213141
3    pool_men
4      253141
5      202196
6      200569
7      242196

output:

> df$group = with (df, 
+            ifelse (grepl("^20|^21", Sample_Name),0, 
+            ifelse(grepl("^25|^24",  Sample_Name),1,2 )))

> df
  Sample_Name group
1  pool_women     2
2      213141     0
3    pool_men     2
4      253141     1
5      202196     0
6      200569     0
7      242196     1

with dplyr:

library(dplyr)

df %>%
    mutate(across(
        .cols = Sample_Name,
        ~ ifelse(grepl("^20|^21", .), 0, ifelse(grepl("^25|^24", .), 1, 2)),
        .names = "group"
    ))
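
For completeness, a sketch of why the loop in the question puts the same value in every row. Three things go wrong there: `ds1$Sample_group1 <- 0` assigns to the whole column rather than to one row, the loop runs over column names instead of over the values in Sample_Name, and the first `if` is not chained to the second with `else if`, so the final `else` can overwrite its result. A row-wise loop that avoids these problems could look roughly like this (the small ds1 here is made-up example data, not the real table):

```r
# Made-up example data standing in for ds1
ds1 <- data.frame(Sample_Name = c("pool_women", "213141", "pool_men",
                                  "253141", "202196", "200569", "242196"),
                  stringsAsFactors = FALSE)

# Pre-allocate the new column, then fill it one row at a time
ds1$Sample_group <- NA_real_
for (i in seq_len(nrow(ds1))) {
  prefix <- substr(ds1$Sample_Name[i], 1, 2)   # first two characters
  if (prefix %in% c("20", "21")) {
    ds1$Sample_group[i] <- 0
  } else if (prefix %in% c("24", "25")) {
    ds1$Sample_group[i] <- 1
  } else {
    ds1$Sample_group[i] <- 2
  }
}
ds1
```

The vectorised ifelse() call above gives the same grouping without a loop and is the better choice for anything but tiny tables.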
9 weeks ago
ATpoint 52k

Don't use a for loop here. It has to visit every single row one by one, which will take ages on large data.frames.

#/ Example data:
df <- data.frame(Sample_Name=c("200", "211", 
                               "240", "251", 
                               "345", "456"))

#----------------------------------------------------------
# BASE R:
#----------------------------------------------------------
#/ new column with "2" (so other), stored as character:
df$Sample_Group <- rep("2", nrow(df))

#/ and now replace 20/21s with "0" and 24/25s with "1"
#/ (quoted, so the column stays consistently character):
df$Sample_Group[grep("^20|^21", df$Sample_Name)] <- "0"
df$Sample_Group[grep("^24|^25", df$Sample_Name)] <- "1"

> df
  Sample_Name Sample_Group
1         200            0
2         211            0
3         240            1
4         251            1
5         345            2
6         456            2

#----------------------------------------------------------
# TIDYVERSE
#----------------------------------------------------------
library(tidyverse)
df %>%
  mutate(Sample_Group = 
           case_when(str_detect(Sample_Name, "^20|^21") ~ "0",
                     str_detect(Sample_Name, "^24|^25") ~ "1",
                     !str_detect(Sample_Name, "^20|^21|^24|^25") ~ "2")
)

  Sample_Name Sample_Group
1         200            0
2         211            0
3         240            1
4         251            1
5         345            2
6         456            2

The ^ means "starts with". The first solution is with only base R functions, the second one uses the tidyverse packages.
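
To illustrate what the anchor does, here is a small sketch with made-up strings: without ^ the pattern matches anywhere in the string, with ^ it matches only at the beginning. Base R's startsWith() is an alternative when no regular expression is needed.

```r
# Made-up strings: only the first one begins with "20"
x <- c("200569", "132001", "pool_20")

anywhere <- grepl("20", x)        # "20" found anywhere in the string
at_start <- grepl("^20", x)       # "20" only at the start of the string
no_regex <- startsWith(x, "20")   # same test without a regular expression

data.frame(x, anywhere, at_start, no_regex)
```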

Next time, please provide example data, e.g. via dput. If you have a data.frame named df, you can run dput(df) and it will print a plain-text representation of the data that you can paste into your post. That makes it easy for others to copy/paste your data rather than typing it out by hand. Also try to provide a small but representative selection of the data, or dummy data, to keep the post short and readable.
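
A minimal sketch of that workflow, using a made-up three-row data.frame: dput() prints the text to paste into a post, and anyone reading it can rebuild the object by assigning that text to a name. The round trip is simulated here by capturing the dput() output and evaluating it again.

```r
# Made-up example data
df <- data.frame(Sample_Name = c("pool_women", "213141", "253141"),
                 Sample_Well = c("A01", "B01", "D01"))

# Print a text representation that can be pasted into a post
dput(df)

# Simulate a reader pasting that text back into R
df_copy <- eval(parse(text = capture.output(dput(df))))
all.equal(df, df_copy)
```

For a large table, something like dput(head(df, 10)) keeps the pasted text short.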
