R - cleanest way to calculate mean of a columnX in dataset1, columnZ in dataset2.... column20 in dataset20 and combine the results
5 days ago
Bianca • 0

I have 20 data frames, each with 3 columns. For each data frame I need to calculate the mean and the sum of the column named 'values', and then combine all of the results into one dataset. The new dataset will have 20 rows and three columns, c('file', 'mean_value', 'sum_value'). Each row will hold the name of a file I loaded in R, and the corresponding columns will hold its mean and sum.

For example: data1 = data.frame(name = c('Mary', 'John'), location = c('north', 'south'), values = c(2,4))

data2 = data.frame(name = c('Joseph', 'Claire'), location = c('north', 'west'), values = c(20,40))

data3 = data.frame(name = c('Dan', 'Louis'), location = c('east', 'south'), values = c(12,4))

... until data20

Expected output:

result = data.frame(file = c('data1', 'data2', 'data3'), mean_value = c(3, 30, 8), sum_value = c(6, 60, 16))

Thank you very much!

mean R combine • 591 views

You might want to try posting on Stack Overflow or similar, since this is a general R question. This thread might help: https://stackoverflow.com/questions/57417483/loop-over-multiple-dataframes-in-a-for-loop-in-r

If you really want to keep the data.frames separate, I recommend putting them into one list to make it easier to iterate over. But best practice would be to combine them into a single large data.frame with a column that indicates which dataset a given row came from.
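The list approach mentioned above could look something like the sketch below. It uses only base R and the three example data frames from the question; the names `datasets` and `result` are made up for illustration, and with real files the list would be built from whatever was actually loaded.

```r
# Example data frames from the question (only three of the twenty are shown there)
data1 <- data.frame(name = c("Mary", "John"), location = c("north", "south"), values = c(2, 4))
data2 <- data.frame(name = c("Joseph", "Claire"), location = c("north", "west"), values = c(20, 40))
data3 <- data.frame(name = c("Dan", "Louis"), location = c("east", "south"), values = c(12, 4))

# Keep the data frames in one named list; the names become the 'file' column
# ('datasets' is a hypothetical name chosen for this sketch)
datasets <- list(data1 = data1, data2 = data2, data3 = data3)

# One row per data frame: mean and sum of its 'values' column
result <- data.frame(
  file = names(datasets),
  mean_value = vapply(datasets, function(d) mean(d$values), numeric(1), USE.NAMES = FALSE),
  sum_value = vapply(datasets, function(d) sum(d$values), numeric(1), USE.NAMES = FALSE)
)
result
```

With all twenty data frames in the list, the same code should give twenty rows without any further changes.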


ok, I will try that, thanks

5 days ago
ATpoint 64k

One of many options:

data1 = data.frame(name = c('Mary', 'John'), location = c('north', 'south'), values = c(2,4))
data2 = data.frame(name = c('Joseph', 'Claire'), location = c('north', 'west'), values = c(20,40))
data3 = data.frame(name = c('Dan', 'Louis'), location = c('east', 'south'), values = c(12,4))

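# Collect every object named data<number> and add its name as a 'file' column
combined <- lapply(ls(pattern="^data[0-9]+$"), function(x) data.frame(file=x, get(x))) |>
  do.call(what="rbind")

# Mean and sum of 'values' per source data frame
library(dplyr)
combined %>%
  group_by(file) %>%
  summarize(mean_value=mean(values), sum_value=sum(values))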


Hello, that did not work. It returned the mean and the sum over all the data frames combined, something like this:

df[1x2]

mean_value sum_value
72/3       72


I found out why. I was loading plyr after dplyr, so plyr's summarize masked dplyr's and ignored the grouping. I removed library(plyr) from my code and only loaded library(dplyr). Now it works :)

I found the answer to that here. https://stackoverflow.com/questions/27157137/dplyr-only-returning-one-row-when-using-summarize
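If plyr has to stay loaded for other parts of a script, one possible workaround is to call the dplyr functions with an explicit `dplyr::` prefix, so that load order no longer matters. The sketch below assumes dplyr is installed and uses a small `combined` data frame built by hand from the example values in this thread.

```r
library(dplyr)

# Hand-built stand-in for the combined data frame (values from the question's examples)
combined <- data.frame(
  file = rep(c("data1", "data2", "data3"), each = 2),
  values = c(2, 4, 20, 40, 12, 4)
)

# Namespace-qualified calls always resolve to dplyr's versions,
# even if another package that defines summarize() is attached later
result <- combined %>%
  dplyr::group_by(file) %>%
  dplyr::summarize(mean_value = mean(values), sum_value = sum(values))
result
```

This should return one row per file rather than a single row for the whole table.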

Thank you
