dplyr filter and for loop
3.7 years ago
siu ▴ 160

Dear all,

I have a data frame (expression_data) having many rows and 16 columns.

          dark_11   dark_9   dark_7   dark_5
Gene1     0.41      0.58     1        0.91
Gene2     0.62      0.56     0.89     0.36
Gene3     0.89      0.41     0.67     0.76
Gene4     0.31      0.56     0.12     0.32

I want to subset the rows having values >= 0.5 in each column, one column at a time, using the dplyr filter function.

dark_11 <- filter (expression_data,  dark_11 >=  0.5 )

Which returns:

             dark_11
Gene2     0.62 
Gene3     0.89

I want to use it in a for loop and write 16 files, one per column. I have tried the following, but I am not getting the desired results.

for (i in names(expression_data)) {
  test <- filter(expression_data, i >= 0.5)
  write.csv(test, paste0(i, ".csv"))
}

Any help will be highly appreciated.

Thanks in advance


Please use the formatting bar (especially the code option) to present your post better. You can use backticks for inline code (`text` becomes text), or select a chunk of text and use the highlighted button to format it as a code block. I've done it for you this time.

3.7 years ago
Ram 43k

Scoping doesn't work that way in R. You're using i as both a string and a column name; there needs to be some enquo() or !! or something. Maybe rpolicastro can give you an imap solution.

Hang on, there is no way dplyr::filter reduces the number of columns you get. We are missing something. Are you giving us all the steps you're using?

Easier than using dplyr AND a loop, you can simply do this:

for (i in names(expression_data)) {
  test <- expression_data[expression_data[, i] >= 0.5, ]
  write.csv(test, paste0(i, ".csv"))
}

Thanks for your suggestion! I have used the filter function as given in "https://dplyr.tidyverse.org/reference/filter.html".

The "for" loop that you gave worked well.

Many thanks


No, you could not have used that, as the documentation specifically states that columns are not altered. There is no way you went down to a 1-column data.frame using just filter.


For posterity, functions like lapply and imap iterate column-wise over a data.frame. However, they pass each column to the function as a vector, which you would have to convert back to a data.frame if you wanted to save a table from within the function. So in this case, just looping over the column names with a for loop or any apply function of choice works perfectly fine.
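
A minimal sketch of that point, assuming the four-gene example from the question as input. The names `out_dir`, `col_classes` and `filtered` are illustrative only, and files go to a temporary directory rather than the working directory.

```r
# Example data copied from the question (4 of the 16 columns).
expression_data <- data.frame(
  dark_11 = c(0.41, 0.62, 0.89, 0.31),
  dark_9  = c(0.58, 0.56, 0.41, 0.56),
  dark_7  = c(1.00, 0.89, 0.67, 0.12),
  dark_5  = c(0.91, 0.36, 0.76, 0.32),
  row.names = c("Gene1", "Gene2", "Gene3", "Gene4")
)

# Iterating over the data.frame itself hands each column over as a bare vector.
col_classes <- lapply(expression_data, class)

# Iterating over the column names keeps the whole data.frame available.
out_dir <- tempdir()  # assumption: write to a temp dir for the demo
filtered <- lapply(names(expression_data), function(col) {
  keep <- expression_data[expression_data[[col]] >= 0.5, , drop = FALSE]
  write.csv(keep, file.path(out_dir, paste0(col, ".csv")))
  keep
})
names(filtered) <- names(expression_data)
```

Each element of `filtered` is still a full data.frame with gene names as row names, so it can be written out directly.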


Does OP's code make sense to you though? How'd they drop columns using filter?


Their code wouldn't work. If i is a variable storing the name of the column, then to use filter on that variable directly you would need filter(expression_data, !!as.name(i) >= 0.5). Without this, no error is returned, but no filter is applied either, because i is treated as a plain string rather than a column. This wouldn't return a one-column table, though, as the original poster stated.

Edit: I just wanted to add that the above is an illustrative example of using a variable directly. The same can be accomplished in dplyr 1.0.0 in a more tidy way (which I think is kind of funny looking too): filter(expression_data, across(all_of(i), ~ .x >= 0.5)).
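
As a hedged sketch of the above, assuming the example data from the question: the `.data[[i]]` form is not mentioned in this thread but is another standard way to refer to a column by a string inside dplyr verbs. The result names are illustrative.

```r
library(dplyr)

# Example data copied from the question.
expression_data <- data.frame(
  dark_11 = c(0.41, 0.62, 0.89, 0.31),
  dark_9  = c(0.58, 0.56, 0.41, 0.56),
  dark_7  = c(1.00, 0.89, 0.67, 0.12),
  dark_5  = c(0.91, 0.36, 0.76, 0.32),
  row.names = c("Gene1", "Gene2", "Gene3", "Gene4")
)

i <- "dark_11"

# Inject the column name as a symbol ...
by_symbol <- filter(expression_data, !!as.name(i) >= 0.5)

# ... or look the column up through the .data pronoun.
by_pronoun <- filter(expression_data, .data[[i]] >= 0.5)

# Using `i` directly compares the string "dark_11" with 0.5,
# which is a single TRUE, so every row is kept.
unfiltered <- filter(expression_data, i >= 0.5)
```

In both working variants all four columns are kept; only the rows change.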


That's what I am saying - OP is leaving some detail out. I understand as.name, but I cannot wrap my head around !!, quo(), etc.


as.name (or as.symbol) is a base R function that returns a symbol object for the character value saved in the variable. !! is an operator from the rlang package, used throughout the tidyverse, that lets you inject that symbol object into functions like filter and mutate. It's pretty ugly, so I usually try to avoid it when possible.
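
A small sketch of what these two pieces do, assuming the rlang package is installed. The names `sym_i`, `cond` and `result` are illustrative, and `rlang::expr()` is used here only to make the injected expression visible outside of filter.

```r
library(rlang)

i <- "dark_11"

# Base R: turn the string into a symbol.
sym_i <- as.name(i)

# `!!` injects the symbol into the captured expression,
# so `cond` is the call `dark_11 >= 0.5`, not a comparison on `i`.
cond <- expr(!!sym_i >= 0.5)

# Evaluate the expression against a list that has a `dark_11` element.
result <- eval(cond, list(dark_11 = c(0.41, 0.62, 0.89, 0.31)))
```

This is roughly what happens inside filter: the injected symbol is looked up among the columns of the data.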


This has also worked well.

Thanks

3.7 years ago

It's unclear from their post, but they may want a one-column data.frame. If so, the code just needs a slight modification.

for (i in names(expression_data)) {
  test <- expression_data[expression_data[,i] >= 0.5, i, drop=FALSE]
  write.csv(test, paste0(i,".csv"))
}

I am confused. How is the following a one-column output (copied from one of the outputs produced from OP's input data):

$ cat dark_11.csv 
"","dark_11","dark_9","dark_7","dark_5"
"Gene2",0.62,0.56,0.89,0.36
"Gene3",0.89,0.41,0.67,0.76

I think the script writes all the columns to each file, with the rows filtered on the column named in that loop iteration. If a single column should be written to each file, the 'group_walk' function prints the values in a single column.

Output from dplyr group_walk:

$ cat dark_11.csv 
"","rowname","value"
"1","Gene2",0.62
"2","Gene3",0.89

Edit: OP's code is working fine; I was mistaken. 'group_walk' does exactly what the loop above does. As per OP's code, the output is (which is correct):

$ cat dark_11.csv 
"","dark_11"
"Gene2",0.62
"Gene3",0.89

Yes, the filter function prints all the columns, with the rows filtered based on dark_11. Sorry for the confusion.

I will definitely try group_walk.

Thanks for your kind response.


RamRS's solution (and my slight modification) loops over the column names to subset and filter by that column. Adding drop=FALSE ensures that the one column data.frame isn't coerced to a vector.
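
To illustrate the effect of drop=FALSE, here is a minimal sketch assuming the example data from the question. The names `rows`, `as_vector` and `as_frame` are illustrative.

```r
# Example data copied from the question.
expression_data <- data.frame(
  dark_11 = c(0.41, 0.62, 0.89, 0.31),
  dark_9  = c(0.58, 0.56, 0.41, 0.56),
  dark_7  = c(1.00, 0.89, 0.67, 0.12),
  dark_5  = c(0.91, 0.36, 0.76, 0.32),
  row.names = c("Gene1", "Gene2", "Gene3", "Gene4")
)

rows <- expression_data[["dark_11"]] >= 0.5

# Default (drop = TRUE): a single selected column collapses to a plain
# numeric vector, and the gene names are lost.
as_vector <- expression_data[rows, "dark_11"]

# drop = FALSE: the result stays a one-column data.frame with row names.
as_frame <- expression_data[rows, "dark_11", drop = FALSE]
```

Writing `as_frame` with write.csv gives a file with the gene names and a single dark_11 column.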


Great! This is what I want.

Thanks


Please accept the answer(s) that worked for you.

3.7 years ago

A loop may not be necessary:

input:

> df
      dark_11 dark_9 dark_7 dark_5
Gene1    0.41   0.58   1.00   0.91
Gene2    0.62   0.56   0.89   0.36
Gene3    0.89   0.41   0.67   0.76
Gene4    0.31   0.56   0.12   0.32

output:

> library(tidyverse)
> df %>%
+     rownames_to_column() %>%
+     pivot_longer(-rowname, names_to = "condition", values_to = "value") %>%
+     filter(value >= 0.5) %>%
+     pivot_wider(names_from = "condition", values_from = "value", names_sort = TRUE) %>%
+     column_to_rownames()
      dark_11 dark_5 dark_7 dark_9
Gene1      NA   0.91   1.00   0.58
Gene2    0.62     NA   0.89   0.56
Gene3    0.89   0.76   0.67     NA
Gene4      NA     NA     NA   0.56

per group values:

> df %>%
+     rownames_to_column() %>%
+     pivot_longer(-rowname, names_to = "condition", values_to = "value") %>%
+     filter(value >= 0.5) %>%
+     group_by(condition) %>%
+     group_walk(~print(.x))
# A tibble: 2 x 2
  rowname value
  <chr>   <dbl>
1 Gene2    0.62
2 Gene3    0.89
# A tibble: 2 x 2
  rowname value
  <chr>   <dbl>
1 Gene1    0.91
2 Gene3    0.76
# A tibble: 3 x 2
  rowname value
  <chr>   <dbl>
1 Gene1    1   
2 Gene2    0.89
3 Gene3    0.67
# A tibble: 3 x 2
  rowname value
  <chr>   <dbl>
1 Gene1   0.580
2 Gene2   0.56 
3 Gene4   0.56
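
To get one file per column instead of printed tibbles, the print call can be swapped for write.csv. This is a sketch under assumptions: `out_dir` is an illustrative temporary directory, and the file naming simply follows the question's paste0(i, ".csv") pattern.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Example data copied from the input shown above.
df <- data.frame(
  dark_11 = c(0.41, 0.62, 0.89, 0.31),
  dark_9  = c(0.58, 0.56, 0.41, 0.56),
  dark_7  = c(1.00, 0.89, 0.67, 0.12),
  dark_5  = c(0.91, 0.36, 0.76, 0.32),
  row.names = c("Gene1", "Gene2", "Gene3", "Gene4")
)

out_dir <- tempdir()  # assumption: write to a temp dir for the demo

df %>%
  rownames_to_column() %>%
  pivot_longer(-rowname, names_to = "condition", values_to = "value") %>%
  filter(value >= 0.5) %>%
  group_by(condition) %>%
  # .x holds the rows of one group (without the grouping column);
  # .y is a one-row tibble holding the group key, here the column name.
  group_walk(~ write.csv(.x, file.path(out_dir, paste0(.y$condition, ".csv"))))
```

Each file then contains a rowname column and a value column for the genes that pass the threshold in that condition.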

This is awesome!

Thanks
