Question

dplyr filter and for loop

0

3.7 years ago

siu ▴ 160

Dear all,

I have a data frame (expression_data) having many rows and 16 columns.

          dark_11   dark_9   dark_7   dark_5
Gene1     0.41      0.58     1        0.91
Gene2     0.62      0.56     0.89     0.36
Gene3     0.89      0.41     0.67     0.76
Gene4     0.31      0.56     0.12     0.32

I want to subset rows having values >= 0.5 in different columns using the dplyr filter function.

dark_11 <- filter (expression_data,  dark_11 >=  0.5 )

Which returns:

             dark_11
Gene2     0.62 
Gene3     0.89

I want to use it in a for loop and write 16 files, one per column. I have tried the following, but I am not getting the desired results.

for (i in names(expression_data)) {
  test <- filter(expression_data, i >= 0.5)
  write.csv(test, paste0(i, ".csv"))
}

Any help will be highly appreciated.

Thanks in advance

R • 10k views

ADD COMMENT • link updated 3.7 years ago by cpad0112 21k • written 3.7 years ago by siu ▴ 160

0

Please use the formatting bar (especially the code option) to present your post better. You can use backticks for inline code (`text` becomes text), or select a chunk of text and use the highlighted button to format it as a code block. I've done it for you this time.

ADD REPLY • link 3.7 years ago by Ram 43k

2

3.7 years ago

cpad0112 21k

A loop may not be necessary:

input:

> df
      dark_11 dark_9 dark_7 dark_5
Gene1    0.41   0.58   1.00   0.91
Gene2    0.62   0.56   0.89   0.36
Gene3    0.89   0.41   0.67   0.76
Gene4    0.31   0.56   0.12   0.32

output:

> df %>%
+     rownames_to_column() %>%
+     pivot_longer(names_to = "condition", values_to = "value", -rowname) %>%
+     filter(value >= 0.5) %>%
+     pivot_wider(names_from = "condition", values_from = "value", names_sort = TRUE) %>%
+     column_to_rownames()
      dark_11 dark_5 dark_7 dark_9
Gene1      NA   0.91   1.00   0.58
Gene2    0.62     NA   0.89   0.56
Gene3    0.89   0.76   0.67     NA
Gene4      NA     NA     NA   0.56

per group values:

> df %>%
+     rownames_to_column() %>%
+     pivot_longer(names_to = "condition", values_to = "value", -rowname) %>%
+     filter(value >= 0.5) %>%
+     group_by(condition) %>%
+     group_walk(~print(.x))
# A tibble: 2 x 2
  rowname value
  <chr>   <dbl>
1 Gene2    0.62
2 Gene3    0.89
# A tibble: 2 x 2
  rowname value
  <chr>   <dbl>
1 Gene1    0.91
2 Gene3    0.76
# A tibble: 3 x 2
  rowname value
  <chr>   <dbl>
1 Gene1    1   
2 Gene2    0.89
3 Gene3    0.67
# A tibble: 3 x 2
  rowname value
  <chr>   <dbl>
1 Gene1   0.580
2 Gene2   0.56 
3 Gene4   0.56
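To write one file per column instead of printing, the same pipeline can hand each group to write.csv. This is only a sketch: it assumes dplyr, tidyr and tibble are installed, and the output directory (tempdir()) and row.names = FALSE are my choices for illustration, not part of the code above.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Example data copied from the question.
df <- data.frame(
  dark_11 = c(0.41, 0.62, 0.89, 0.31),
  dark_9  = c(0.58, 0.56, 0.41, 0.56),
  dark_7  = c(1.00, 0.89, 0.67, 0.12),
  dark_5  = c(0.91, 0.36, 0.76, 0.32),
  row.names = c("Gene1", "Gene2", "Gene3", "Gene4")
)

# Assumption: write into a temporary directory rather than the working directory.
out_dir <- tempdir()

df %>%
  rownames_to_column() %>%
  pivot_longer(-rowname, names_to = "condition", values_to = "value") %>%
  filter(value >= 0.5) %>%
  group_by(condition) %>%
  # .x holds the rows of one group, .y is a one-row tibble with the group key.
  group_walk(~ write.csv(.x,
                         file.path(out_dir, paste0(.y$condition, ".csv")),
                         row.names = FALSE))

# Read one file back to check what was written.
dark_9 <- read.csv(file.path(out_dir, "dark_9.csv"))
```

Each file then contains the gene names in a rowname column and the retained values in a value column.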

ADD COMMENT • link 3.7 years ago by cpad0112 21k

0

This is awesome!

Thanks

ADD REPLY • link 3.7 years ago by siu ▴ 160

score 5 · Accepted Answer · 2020-08-17

5

3.7 years ago

Ram 43k

Scoping doesn't work that way in R. You're using i as both a string and a name, so there needs to be some enquo() or !! or something similar. Maybe rpolicastro can give you an imap solution.

Hang on, there is no way dplyr::filter reduces the number of columns you get. We are missing something. Are you giving us all the steps you're using?

Easier than using dplyr AND a loop, you can simply do this:

for (i in names(expression_data)) {
  test <- expression_data[expression_data[, i] >= 0.5, ]
  write.csv(test, paste0(i, ".csv"))
}

ADD COMMENT • link 3.7 years ago by Ram 43k

0

Thanks for your suggestion! I have used the filter function as given in "https://dplyr.tidyverse.org/reference/filter.html".

"for" loop that you have given worked well.

Many thanks

ADD REPLY • link 3.7 years ago by siu ▴ 160

0

No, you could not have used that, as the documentation specifically states that columns are not altered. There is no way you went down to a 1-column data.frame using just filter.

ADD REPLY • link 3.7 years ago by Ram 43k

0

For posterity: functions like lapply and imap iterate column-wise over a data.frame. However, they hand each column to the function as a vector, which you would have to convert back to a data.frame if you wanted to save a table from within the function. So in this case, just looping over the column names with a for loop or any apply function of choice works perfectly fine.
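A small base R sketch of that difference, using the example data from the question (the variable names col_is_vector and subsets are made up for illustration):

```r
# Example data copied from the question.
expression_data <- data.frame(
  dark_11 = c(0.41, 0.62, 0.89, 0.31),
  dark_9  = c(0.58, 0.56, 0.41, 0.56),
  dark_7  = c(1.00, 0.89, 0.67, 0.12),
  dark_5  = c(0.91, 0.36, 0.76, 0.32),
  row.names = c("Gene1", "Gene2", "Gene3", "Gene4")
)

# Iterating over the data.frame itself: each column arrives as a plain
# numeric vector, so the gene names are not available inside the function.
col_is_vector <- vapply(expression_data,
                        function(x) is.numeric(x) && is.null(names(x)),
                        logical(1))

# Iterating over the column names instead keeps the whole table in reach.
subsets <- lapply(names(expression_data), function(i) {
  expression_data[expression_data[[i]] >= 0.5, , drop = FALSE]
})
names(subsets) <- names(expression_data)
```

Here subsets is a named list of filtered data.frames, one per column, each of which could be passed to write.csv.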

ADD REPLY • link 3.7 years ago by rpolicastro 13k

0

Does OP's code make sense to you though? How'd they drop columns using filter?

ADD REPLY • link 3.7 years ago by Ram 43k

2

Their code wouldn't work. If i is a variable storing the name of the column, then to use filter on that variable directly you would need to do this: filter(expression_data, !!as.name(i) >= 0.5). Without this, no error is returned, but no filter is applied either. This wouldn't return a one-column table though, as the original poster stated.

Edit: I just wanted to add that the above is an illustrative example of using a variable directly. The same can be accomplished in dplyr 1.0.0 in a more tidy way (which I think is kind of funny looking too): filter(expression_data, across(all_of(i), ~ .x >= 0.5)).
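A short sketch of why the original call keeps every row, next to two ways of referring to the column by name. It assumes dplyr is installed; the .data[[i]] form is an alternative I am adding for comparison, and the result names are made up.

```r
library(dplyr)

# Example data copied from the question.
expression_data <- data.frame(
  dark_11 = c(0.41, 0.62, 0.89, 0.31),
  dark_9  = c(0.58, 0.56, 0.41, 0.56),
  dark_7  = c(1.00, 0.89, 0.67, 0.12),
  dark_5  = c(0.91, 0.36, 0.76, 0.32),
  row.names = c("Gene1", "Gene2", "Gene3", "Gene4")
)

i <- "dark_11"

# Here i is just the string "dark_11". Comparing it with 0.5 yields a single
# logical value, which is recycled over all rows, so nothing is filtered out.
unfiltered <- filter(expression_data, i >= 0.5)

# Turn the string into a symbol and unquote it so filter sees the column.
by_symbol <- filter(expression_data, !!as.name(i) >= 0.5)

# Alternative: look the column up by name through the .data pronoun.
by_pronoun <- filter(expression_data, .data[[i]] >= 0.5)
```

All columns are still present in by_symbol and by_pronoun; only the rows differ from unfiltered.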

ADD REPLY • link 3.7 years ago by rpolicastro 13k

0

That's what I am saying - OP is leaving some detail out. I understand as.name, but I cannot wrap my head around !!, quo(), etc.

ADD REPLY • link 3.7 years ago by Ram 43k

1

as.name (or as.symbol) is a base R function that returns a symbol object for the character value saved in the variable. !! is an operator from the rlang package, specific to the tidyverse, that lets you use that symbol object in functions like filter and mutate. It's pretty ugly, so I usually try to avoid it when possible.
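To make the symbol part concrete without any tidyverse machinery, here is a base R sketch. It only illustrates what a symbol is and how it can be evaluated against a data.frame; it is not a description of how rlang implements !!, and the variable names are made up.

```r
# Example data copied from the question.
expression_data <- data.frame(
  dark_11 = c(0.41, 0.62, 0.89, 0.31),
  dark_9  = c(0.58, 0.56, 0.41, 0.56),
  dark_7  = c(1.00, 0.89, 0.67, 0.12),
  dark_5  = c(0.91, 0.36, 0.76, 0.32),
  row.names = c("Gene1", "Gene2", "Gene3", "Gene4")
)

i <- "dark_11"

# as.name() turns the string into a symbol, i.e. an unevaluated variable name.
sym <- as.name(i)

# Build the expression `dark_11 >= 0.5` from the symbol ...
condition <- call(">=", sym, 0.5)

# ... and evaluate it with the data.frame columns as the lookup environment.
keep <- eval(condition, expression_data)

filtered <- expression_data[keep, ]
```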

ADD REPLY • link 3.7 years ago by rpolicastro 13k

0

This has also worked well.

Thanks

ADD REPLY • link 3.7 years ago by siu ▴ 160

score 4 · Accepted Answer · 2020-08-17

4

3.7 years ago

rpolicastro 13k

It's unclear from their post, but they may want a one-column data.frame. If so, the code would just need a slight modification.

for (i in names(expression_data)) {
  test <- expression_data[expression_data[,i] >= 0.5, i, drop=FALSE]
  write.csv(test, paste0(i,".csv"))
}

ADD COMMENT • link 3.7 years ago by rpolicastro 13k

1

I am confused. How is the following a one-column output (copy/pasted from one of the outputs from OP's input data)?

$ cat dark_11.csv 
"","dark_11","dark_9","dark_7","dark_5"
"Gene2",0.62,0.56,0.89,0.36
"Gene3",0.89,0.41,0.67,0.76

I think the script is writing all the columns to each file, with the rows filtered on the column named in each loop iteration. If a single column is to be written to each file, the group_walk function prints the values in a single column.

output from dplyr group_walk:

$ cat dark_11.csv 
"","rowname","value"
"1","Gene2",0.62
"2","Gene3",0.89

Edit: OP's code is working fine. I am mistaken. 'group_walk' does exactly what the above loop does. As per OP's code, the output (which is correct) is:

$ cat dark_11.csv 
"","dark_11"
"Gene2",0.62
"Gene3",0.89

ADD REPLY • link 3.7 years ago by cpad0112 21k

0

Yes, the filter function prints all the columns, with the rows filtered based on dark_11. Sorry for the confusion.

I will definitely try group_walk.

Thanks for your kind response.

ADD REPLY • link 3.7 years ago by siu ▴ 160

0

RamRS's solution (and my slight modification) loops over the column names to subset and filter by that column. Adding drop=FALSE ensures that the one column data.frame isn't coerced to a vector.
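A quick base R illustration of what drop=FALSE changes, using the example data from the question (as_vector and as_frame are names made up for this sketch):

```r
# Example data copied from the question.
expression_data <- data.frame(
  dark_11 = c(0.41, 0.62, 0.89, 0.31),
  dark_9  = c(0.58, 0.56, 0.41, 0.56),
  dark_7  = c(1.00, 0.89, 0.67, 0.12),
  dark_5  = c(0.91, 0.36, 0.76, 0.32),
  row.names = c("Gene1", "Gene2", "Gene3", "Gene4")
)

keep <- expression_data[, "dark_11"] >= 0.5

# Selecting a single column without drop = FALSE returns a bare numeric
# vector, so the gene names are lost.
as_vector <- expression_data[keep, "dark_11"]

# With drop = FALSE the result stays a one-column data.frame and keeps
# its row names, which is what write.csv needs to label the rows.
as_frame <- expression_data[keep, "dark_11", drop = FALSE]
```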

ADD REPLY • link 3.7 years ago by rpolicastro 13k

0

Great! This is what I want.

Thanks

ADD REPLY • link 3.7 years ago by siu ▴ 160

1

Please accept the answer(s) that worked for you.

ADD REPLY • link 3.7 years ago by Ram 43k