summarize consecutive columns over a data frame
2
0
Entering edit mode
11 months ago
Assa Yeroslaviz ★ 1.9k

I am looking for a way to summarise consecutive columns in a data frame, meaning summarise col2 + col1, col3+col2+col1, col4+col3+col2+col1, etc.

the output of each new column should be the sum of all previous columns.

I'm trying to combine the across command from tidyverse, but I can't get it right.

any help would be appreciated.

thanks

Assa

tidyverse r offtopic • 1.0k views
1
Entering edit mode

This post is more suitable at SO.

0
Entering edit mode

This post does not fit the theme of this forum.

1
Entering edit mode
11 months ago
ATpoint 84k

Define starting vector i <- c(1,2) and then use a loop that in each iteration adds 1 to the largest element if the vector and appends it to the vector. Use colSums or whatever makes sense for the summarization after subsetting df[,i,drop=FALSE]. An experienced user like you will for sure get this going.

Edit: Here is one way to do this with a two-liner using base R:

# example data
ncols <- 6
nrows <- 3
set.seed(1)
data <- data.frame(matrix(rnorm(ncols*nrows), ncol=ncols))

# actual code
iter <- lapply(2:ncols, function(x) 1:x)
lapply(iter, function(x) rowSums(data[,x,drop=FALSE]))

0
Entering edit mode

Well, i have sort of a solution, but I find it be very "clunky" (dirty).

\$test.df
X1 X2 X3
1  1  2  3
2  2  2  1
3  1  3  1
4  3  1  1


this is my solutiuon

test.df <- read.delim2(file = "test.df.txt")
i <- c(2:3)
tmp <- data.frame()
tmp2<- list()
tmp2[[1]]<- test.df[,1]
for (col_i in i) {
print(col_i)
tmp <- apply(test.df[1:col_i], 1, sum)
tmp2[[col_i]] <- tmp

}


But I was hoping to find something more elegant and slick, if possible.

0
Entering edit mode

There are many ways of how this can be done.

# example data
ncols <- 6
nrows <- 3
set.seed(1)
data <- data.frame(matrix(rnorm(ncols*nrows), ncol=ncols))

iter <- lapply(2:ncols, function(x) 1:x)
lapply(iter, function(x) rowSums(data[,x,drop=FALSE]))

1
Entering edit mode
11 months ago
bk11 ★ 2.8k

You can do like this-

df <- as.data.frame(matrix(round(runif(n=10, min=1, max=20), 0), nrow=5, ncol=4))
df
V1 V2 V3 V4
1 14  5 14  5
2 20  6 20  6
3 15 17 15 17
4 12 14 12 14
5 17  6 17  6

result=cbind(
df,
setNames(
as.data.frame(Reduce(+, df, accumulate = TRUE)),
Reduce(paste0, names(df), accumulate = TRUE)
)
)
result
V1 V2 V3 V4 V1 V1V2 V1V2V3 V1V2V3V4
1  2 15  2 15  2   17     19       34
2  4  1  4  1  4    5      9       10
3  5  8  5  8  5   13     18       26
4 10 11 10 11 10   21     31       42
5  5  1  5  1  5    6     11       12