summarize consecutive columns over a data frame
2
0
Entering edit mode
7 months ago
Assa Yeroslaviz ★ 1.8k

I am looking for a way to summarise consecutive columns in a data frame, meaning summarise col2 + col1, col3+col2+col1, col4+col3+col2+col1, etc.

the output of each new column should be the sum of all previous columns.

I'm trying to combine the across command from tidyverse, but I can't get it right.

any help would be appreciated.

thanks

Assa

tidyverse r offtopic • 855 views
ADD COMMENT
1
Entering edit mode

This post is more suitable at SO.

ADD REPLY
0
Entering edit mode

This post does not fit the theme of this forum.

ADD REPLY
1
Entering edit mode
7 months ago
ATpoint 82k

Define starting vector i <- c(1,2) and then use a loop that in each iteration adds 1 to the largest element if the vector and appends it to the vector. Use colSums or whatever makes sense for the summarization after subsetting df[,i,drop=FALSE]. An experienced user like you will for sure get this going.

Edit: Here is one way to do this with a two-liner using base R:

# example data
ncols <- 6
nrows <- 3
set.seed(1)
data <- data.frame(matrix(rnorm(ncols*nrows), ncol=ncols))

# actual code
iter <- lapply(2:ncols, function(x) 1:x)
lapply(iter, function(x) rowSums(data[,x,drop=FALSE]))
ADD COMMENT
0
Entering edit mode

Well, i have sort of a solution, but I find it be very "clunky" (dirty).

$test.df
   X1 X2 X3
1  1  2  3
2  2  2  1
3  1  3  1
4  3  1  1

this is my solutiuon

test.df <- read.delim2(file = "test.df.txt")
i <- c(2:3)
tmp <- data.frame()
tmp2<- list()
tmp2[[1]]<- test.df[,1]
for (col_i in i) {
  print(col_i)
  tmp <- apply(test.df[1:col_i], 1, sum)
  tmp2[[col_i]] <- tmp

}

But I was hoping to find something more elegant and slick, if possible.

ADD REPLY
0
Entering edit mode

There are many ways of how this can be done.

# example data
ncols <- 6
nrows <- 3
set.seed(1)
data <- data.frame(matrix(rnorm(ncols*nrows), ncol=ncols))

iter <- lapply(2:ncols, function(x) 1:x)
lapply(iter, function(x) rowSums(data[,x,drop=FALSE]))
ADD REPLY
1
Entering edit mode
7 months ago
bk11 ★ 2.4k

You can do like this-

df <- as.data.frame(matrix(round(runif(n=10, min=1, max=20), 0), nrow=5, ncol=4))
df
V1 V2 V3 V4
1 14  5 14  5
2 20  6 20  6
3 15 17 15 17
4 12 14 12 14
5 17  6 17  6

result=cbind(
  df,
  setNames(
    as.data.frame(Reduce(`+`, df, accumulate = TRUE)),
    Reduce(paste0, names(df), accumulate = TRUE)
  )
)
result
   V1 V2 V3 V4 V1 V1V2 V1V2V3 V1V2V3V4
1  2 15  2 15  2   17     19       34
2  4  1  4  1  4    5      9       10
3  5  8  5  8  5   13     18       26
4 10 11 10 11 10   21     31       42
5  5  1  5  1  5    6     11       12
ADD COMMENT

Login before adding your answer.

Traffic: 2716 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6