Question: (Closed) How to calculate the average number of purchases of distinct items per session using dplyr
16 months ago, shameenkhan0750 wrote:

I have a data frame with 3 columns: session id, item id and class. In the class column, 0 represents 'not purchased' and 1 represents 'purchased'.

``````
> data
session id        item id   class
1                 1         0
1                 1         0
1                 1         0
2                 1         1
2                 2         0
3                 1         0
3                 0         1
3                 3         1
3                 2         0
``````

I would like to calculate the average number of purchases per distinct item for each unique session id. Session 1 contains only one unique item and 0 purchases, so its average is 0/1 = 0; session 2 contains 1 purchase and 2 unique items, so its average is 1/2 = 0.5; and session 3 contains 2 purchases and 4 unique items, so its average is 2/4 = 0.5. The result would look like this:

``````
> result
session id       avg
1             0/1=0
2             0.5
3             0.5
``````
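The arithmetic above can be checked in base R. This is a minimal sketch, assuming the columns are named `session_id`, `item_id` and `class` (the printed headers contain spaces, so these names are an assumption). It counts purchases as the per-session sum of the 0/1 `class` column and divides by the number of distinct items, both computed with `tapply()`.

```r
# Example data from the question; the column names are assumed, since the
# printed headers ("session id", "item id") contain spaces
data <- data.frame(
  session_id = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
  item_id    = c(1, 1, 1, 1, 2, 1, 0, 3, 2),
  class      = c(0, 0, 0, 1, 0, 0, 1, 1, 0)
)

# class is 0/1, so its per-session sum is the number of purchases
purchases <- tapply(data$class, data$session_id, sum)

# Number of distinct item ids seen in each session
distinct_items <- tapply(data$item_id, data$session_id,
                         function(x) length(unique(x)))

# Purchases per distinct item: 0/1, 1/2 and 2/4 for sessions 1, 2 and 3
avg <- purchases / distinct_items
print(avg)
```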

So far I have tried this:

``````
data %>% group_by(session_id) %>% summarise(avg = ifelse(length(Class==1))/length(unique(item_id)))
``````

but I got this error:

``````
Error in summarise_impl(.data, dots) :
Evaluation error: argument "yes" is missing, with no default.
``````
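The error comes from `ifelse()`, which needs three arguments (`test`, `yes` and `no`), while the call above supplies only the first. The attempt has two further problems: `length(Class == 1)` counts every row in the group rather than the purchases, and R is case-sensitive, so `Class` and `class` are different columns. A minimal dplyr sketch of the intended calculation, assuming columns named `session_id`, `item_id` and `class`, drops `ifelse()` entirely and divides the number of purchases by `n_distinct(item_id)`:

```r
library(dplyr)

# Example data from the question (column names assumed)
data <- data.frame(
  session_id = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
  item_id    = c(1, 1, 1, 1, 2, 1, 0, 3, 2),
  class      = c(0, 0, 0, 1, 0, 0, 1, 1, 0)
)

result <- data %>%
  group_by(session_id) %>%
  # sum(class == 1) counts purchases; n_distinct() counts unique items
  summarise(avg = sum(class == 1) / n_distinct(item_id))

print(result)
```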

I don't see what this has to do with bioinformatics; you would be better off asking at https://stackoverflow.com/

Hello shameenkhan075!

We believe that this post does not fit the main topic of this site.

This is not a bioinformatics question. It is better asked at https://stackoverflow.com/

For this reason we have closed your question. This allows us to keep the site focused on the topics that the community can help with.

If you disagree please tell us why in a reply below, we'll be happy to talk about it.

Cheers!

:( OK.

16 months ago, Kevin Blighe wrote:

You can do this with the base R function, `aggregate()`:

``````
data
session item class
1       1    1     0
2       1    1     0
3       1    1     0
4       2    1     1
5       2    2     0
6       3    1     0
7       3    0     1
8       3    3     1
9       3    2     0

aggregate(data[,3], by=data[1], FUN=mean)
session   x
1       1 0.0
2       2 0.5
3       3 0.5
``````

...or indeed `dplyr`:

``````
library(dplyr)

data %>% group_by(session) %>% summarise(mean(class))
# A tibble: 3 x 2
session `mean(class)`
<dbl>         <dbl>
1       1           0
2       2           0.5
3       3           0.5
``````

Kevin
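Both `aggregate(..., FUN = mean)` and `summarise(mean(class))` divide the number of purchases by the number of rows in a session, not by the number of distinct items. On this data the two ratios happen to agree (session 1 has no purchases, and sessions 2 and 3 have no repeated items). The dplyr sketch below adds a hypothetical session 4, not part of the original data, in which one item appears twice and is bought once, to show where they diverge: 1/2 = 0.5 by row mean versus 1/1 = 1 per distinct item.

```r
library(dplyr)

data <- data.frame(
  session = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
  item    = c(1, 1, 1, 1, 2, 1, 0, 3, 2),
  class   = c(0, 0, 0, 1, 0, 0, 1, 1, 0)
)

# Hypothetical session 4: item 5 appears twice and is purchased once
data <- rbind(data, data.frame(session = c(4, 4), item = c(5, 5), class = c(0, 1)))

comparison <- data %>%
  group_by(session) %>%
  summarise(
    mean_class = mean(class),                   # purchases / rows
    per_item   = sum(class) / n_distinct(item)  # purchases / distinct items
  )

print(comparison)
```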

Sir, thanks a lot, but this is not what I meant. I have edited the question; please see it above. Thanks!

'Pure' R questions are typically frowned upon here, unless you can show a relation to bioinformatics.

Perhaps this is what you wanted:

``````
# Number of distinct items in each session
itemsPerSession <- data.frame(data %>% group_by(session) %>% summarise(length(unique(item))))
itemsPerSession

session length.unique.item..
1       1                    1
2       2                    2
3       3                    4

# Number of purchases (sum of the 0/1 class column) in each session
countsPerSession <- data.frame(data %>% group_by(session) %>% summarise(sum(class)))
countsPerSession
session sum.class.
1       1          0
2       2          1
3       3          2

# Purchases per distinct item; a zero ratio is shown as the literal fraction
# (e.g. "0/1"), which makes the whole avg column non-numeric
result <- data.frame(
  itemsPerSession[,1],
  ifelse(
    countsPerSession[,2] / itemsPerSession[,2] == 0,
    paste0(as.character(countsPerSession[,2]), "/", as.character(itemsPerSession[,2])),
    countsPerSession[,2] / itemsPerSession[,2]
  )
)

colnames(result) <- c("session id", "avg")

result
session id avg
1          1 0/1
2          2 0.5
3          3 0.5
``````
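The two intermediate data frames and the `ifelse()` can also be folded into a single `summarise()` call, because dplyr lets later arguments of `summarise()` refer to columns created earlier in the same call. In this sketch the column names `purchases`, `distinct_items` and `label` are illustrative choices, not from the original; it keeps `avg` numeric and puts the "0/1"-style fraction in a separate character column.

```r
library(dplyr)

data <- data.frame(
  session = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
  item    = c(1, 1, 1, 1, 2, 1, 0, 3, 2),
  class   = c(0, 0, 0, 1, 0, 0, 1, 1, 0)
)

result <- data %>%
  group_by(session) %>%
  summarise(
    purchases      = sum(class),                              # 0/1 column summed
    distinct_items = n_distinct(item),                        # unique items
    avg            = purchases / distinct_items,              # numeric ratio
    label          = paste0(purchases, "/", distinct_items)   # e.g. "0/1"
  )

print(result)
```

Keeping `avg` numeric means it can still be sorted, filtered or plotted, while `label` is only for display.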