R - cleanest way to calculate mean of a columnX in dataset1, columnZ in dataset2.... column20 in dataset20 and combine the results
20 months ago
Bianca ▴ 20

I have 20 data frames, each with 3 columns. For each data frame, I need to calculate the mean and the sum of the column named 'values', and then combine all of the results into one data frame. The new data frame will have 20 rows and three columns, c('file', 'mean_value', 'sum_value'). Each row will hold the name of a data frame I loaded in R, and the other two columns will hold the corresponding mean and sum.

For example:

data1 = data.frame(name = c('Mary', 'John'), location = c('north', 'south'), values = c(2,4))
data2 = data.frame(name = c('Joseph', 'Claire'), location = c('north', 'west'), values = c(20,40))
data3 = data.frame(name = c('Dan', 'Louis'), location = c('east', 'south'), values = c(12,4))

... until data20

Expected output:

result = data.frame(file = c('data1', 'data2', 'data3'), mean_value = c(3, 30, 8), sum_value = c(6, 60, 16))

Thank you very much!

mean R combine • 1.3k views

You might want to try posting on Stack Overflow or similar, since this is a general R question. This thread might help: https://stackoverflow.com/questions/57417483/loop-over-multiple-dataframes-in-a-for-loop-in-r

If you really want to keep the data.frames separate, I recommend putting them into one list to make it easier to iterate over. But best practice would be to combine them into a single large data.frame with a column indicating which dataset each row came from.
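The list approach could be sketched as follows in base R. This is only a minimal illustration using the three example data frames from the question; the names `dfs` and `result` are made up for this sketch.

```r
# Example data frames from the question
data1 <- data.frame(name = c("Mary", "John"), location = c("north", "south"), values = c(2, 4))
data2 <- data.frame(name = c("Joseph", "Claire"), location = c("north", "west"), values = c(20, 40))
data3 <- data.frame(name = c("Dan", "Louis"), location = c("east", "south"), values = c(12, 4))

# Put the data frames into one named list so they can be iterated over
dfs <- list(data1 = data1, data2 = data2, data3 = data3)

# Build one summary row per data frame, then stack the rows
result <- do.call(rbind, lapply(names(dfs), function(nm) {
  data.frame(
    file = nm,
    mean_value = mean(dfs[[nm]]$values),
    sum_value = sum(dfs[[nm]]$values)
  )
}))

print(result)
```

With 20 data frames already in the workspace, something like `mget(paste0("data", 1:20))` may be a shorter way to build the named list than typing each name.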


ok, I will try that, thanks

20 months ago
ATpoint 81k

One of many options:

data1 = data.frame(name = c('Mary', 'John'), location = c('north', 'south'), values = c(2,4))
data2 = data.frame(name = c('Joseph', 'Claire'), location = c('north', 'west'), values = c(20,40))
data3 = data.frame(name = c('Dan', 'Louis'), location = c('east', 'south'), values = c(12,4))

combined <- lapply(ls(pattern="^data[0-9]+$"), function(x) data.frame(file=x, get(x))) |>
  do.call(what="rbind")

library(dplyr)
combined %>%
  group_by(file) %>% 
  summarize(mean_value=mean(values), sum_value=sum(values))

Hello, that did not work. It returned the mean and the sum of all the data frames combined, instead of one row per data frame. It returned something like this:

df[1x2]

mean_value sum_value
    72/3       72

I found out why. I was loading plyr after dplyr, and that was my problem. I removed library(plyr) from my code and only loaded dplyr. Now it works :)

I found the answer here: https://stackoverflow.com/questions/27157137/dplyr-only-returning-one-row-when-using-summarize
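For anyone who needs both packages: when plyr is attached after dplyr, plyr's summarize masks dplyr's and ignores the grouping, which gives the single-row result described above. One way around it, sketched here on the example values from the question, is to call the dplyr functions with an explicit `dplyr::` prefix so that load order no longer matters. This assumes dplyr is installed; the small `combined` data frame is rebuilt by hand just for this illustration.

```r
# Assumes the dplyr package is installed; it is deliberately not attached here
combined <- data.frame(
  file = rep(c("data1", "data2", "data3"), each = 2),
  values = c(2, 4, 20, 40, 12, 4)
)

# The explicit namespace prefix guarantees dplyr's versions are used,
# even if plyr has been attached later in the session
grouped <- dplyr::group_by(combined, file)
result <- dplyr::summarise(grouped, mean_value = mean(values), sum_value = sum(values))

print(result)
```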

Thank you
