Question: (Closed) How to calculate the average number of purchases per distinct item in each session using dplyr
shameenkhan0750 wrote (2.0 years ago):

I have a data frame with 3 columns: session id, item id and class. In the class column, 0 represents 'not purchased' and 1 represents 'purchased'.

``````
> data
session id        item id   class
1                 1         0
1                 1         0
1                 1         0
2                 1         1
2                 2         0
3                 1         0
3                 0         1
3                 3         1
3                 2         0
``````

I would like to calculate, for each unique session id, the average number of purchases per distinct item. Session id 1 contains only one unique item and 0 purchases, so the average is 0/1 = 0. Session id 2 contains 1 purchase and 2 unique items, so the average is 1/2 = 0.5. Session id 3 contains 2 purchases and 4 unique items, so the average is 2/4 = 0.5. The result would look like this:

``````
> result
session id       avg
1             0/1=0
2             0.5
3             0.5
``````

This is what I have tried so far:

``````
data %>% group_by(session_id) %>% summarise(avg = ifelse(length(Class==1))/length(unique(item_id)))
``````

but I got this error:

``````
Error in summarise_impl(.data, dots) :
Evaluation error: argument "yes" is missing, with no default.
``````
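The error comes from `ifelse()`, which needs three arguments (`test`, `yes`, `no`) but received only one. Even without `ifelse()`, `length(Class == 1)` would return the number of rows in the session rather than the number of purchases, because `==` yields one TRUE/FALSE per row; `sum()` counts the TRUE values instead, and dplyr's `n_distinct()` counts distinct items. A minimal sketch of a corrected call, assuming the columns are named `session_id`, `item_id` and `class` (the printed table and the attempted code disagree on the names, so adjust them to the real data frame):

```r
library(dplyr)

# Example data re-typed from the table above; the column names are an assumption
data <- data.frame(
  session_id = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
  item_id    = c(1, 1, 1, 1, 2, 1, 0, 3, 2),
  class      = c(0, 0, 0, 1, 0, 0, 1, 1, 0)
)

result <- data %>%
  group_by(session_id) %>%
  # purchases (rows with class == 1) divided by the number of distinct items
  summarise(avg = sum(class == 1) / n_distinct(item_id))

print(result)
```

On this example `avg` should come out as 0, 0.5 and 0.5 for sessions 1, 2 and 3.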

I don't see what this has to do with bioinformatics; you would be better off asking at https://stackoverflow.com/

Hello shameenkhan075!

We believe that this post does not fit the main topic of this site.

Not a bioinformatics question. Better ask https://stackoverflow.com/

For this reason we have closed your question. This allows us to keep the site focused on the topics that the community can help with.

If you disagree please tell us why in a reply below, we'll be happy to talk about it.

Cheers!

:( ok

Kevin Blighe wrote (2.0 years ago):

You can do this with the base R function `aggregate()`, taking the mean of `class` within each session:

``````
data
session item class
1       1    1     0
2       1    1     0
3       1    1     0
4       2    1     1
5       2    2     0
6       3    1     0
7       3    0     1
8       3    3     1
9       3    2     0

aggregate(data[,3], by=data[1], FUN=mean)
session   x
1       1 0.0
2       2 0.5
3       3 0.5
``````
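Note that `mean(class)` is purchases divided by the number of rows, not by the number of distinct items. It reproduces the requested numbers on this example only because sessions 2 and 3 contain no repeated items and session 1 has no purchases. A base R sketch that divides by distinct items instead, assuming the column names `session`, `item` and `class` shown above:

```r
data <- data.frame(
  session = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
  item    = c(1, 1, 1, 1, 2, 1, 0, 3, 2),
  class   = c(0, 0, 0, 1, 0, 0, 1, 1, 0)
)

# Number of purchases (class == 1) per session
purchases <- aggregate(class ~ session, data = data, FUN = sum)

# Number of distinct items per session
distinct_items <- aggregate(item ~ session, data = data,
                            FUN = function(x) length(unique(x)))

# Join the two summaries and take the ratio
result <- merge(purchases, distinct_items, by = "session")
result$avg <- result$class / result$item
result
```

Here `result$item` holds the distinct-item counts (1, 2, 4), so a session that repeats a purchased item would no longer match `mean(class)`.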

...or indeed `dplyr`:

``````
library(dplyr)

data %>% group_by(session) %>% summarise(mean(class))
# A tibble: 3 x 2
session `mean(class)`
<dbl>         <dbl>
1       1           0
2       2           0.5
3       3           0.5
``````

Kevin

Sir, thanks a lot, but this is not what I meant. I have edited the question, so please see it above. Thanks!

'Pure' R questions are typically frowned upon here, unless you can show a relation to bioinformatics.

Perhaps this is what you wanted:

``````
itemsPerSession <- data.frame(data %>% group_by(session) %>% summarise(length(unique(item))))
itemsPerSession

session length.unique.item..
1       1                    1
2       2                    2
3       3                    4

countsPerSession <- data.frame(data %>% group_by(session) %>% summarise(sum(class)))
countsPerSession
session sum.class.
1       1          0
2       2          1
3       3          2

result <- data.frame(
itemsPerSession[,1],
ifelse(
countsPerSession[,2] / itemsPerSession[,2] == 0,
paste0(as.character(countsPerSession[,2]), "/", as.character(itemsPerSession[,2])),
countsPerSession[,2] / itemsPerSession[,2]
)
)

colnames(result) <- c("session id", "avg")

result
session id avg
1          1 0/1
2          2 0.5
3          3 0.5
``````
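One side effect of the `ifelse()` call above is that mixing the `paste0()` string with numbers turns the whole `avg` column into character, so it can no longer be summed or sorted numerically. If the "0/1" display is only cosmetic, a sketch that keeps `avg` numeric and puts the fraction in a separate column (`label` is a hypothetical name; the counts are copied from `itemsPerSession` and `countsPerSession` above):

```r
# Per-session counts copied from the outputs above
purchases      <- c(0, 1, 2)
distinct_items <- c(1, 2, 4)

# Mixing a string and numbers in ifelse() coerces everything to character
mixed <- ifelse(purchases / distinct_items == 0,
                paste0(purchases, "/", distinct_items),
                purchases / distinct_items)
print(class(mixed))  # prints "character"

# Keep the numeric value and the display string in separate columns
result <- data.frame(
  session_id = 1:3,
  avg        = purchases / distinct_items,
  label      = paste0(purchases, "/", distinct_items)
)
print(result)
```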