Question: (Closed) How to calculate the average number of purchases of distinct items per session using dplyr
2.0 years ago, shameenkhan075 (0) wrote:

I have a data frame with 3 columns: session id, item id and class. In the class column, 0 represents 'not purchased' and 1 represents 'purchased'.

> data
    session id        item id   class
      1                 1         0
      1                 1         0
      1                 1         0
      2                 1         1
      2                 2         0
      3                 1         0
      3                 0         1
      3                 3         1
      3                 2         0

I would like to calculate the average number of purchases of distinct items for each unique session id. Session id 1 contains only one unique item and 0 purchases, so the avg is 0/1 = 0. Session id 2 contains 1 purchase and 2 unique items, so the avg is 1/2 = 0.5. Session id 3 contains 2 purchases and 4 unique items, so the avg is 2/4 = 0.5. The result would look like this:

>  result   
session id       avg
   1             0/1=0
   2             0.5
   3             0.5

I have tried this till now:

data %>% group_by(session_id) %>% summarise(avg = ifelse(length(Class==1))/length(unique(item_id)))

but got the error:

Error in summarise_impl(.data, dots) : 
  Evaluation error: argument "yes" is missing, with no default.
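
A note on the error: ifelse() takes three arguments (test, yes, no), and the call above supplies only the test, hence 'argument "yes" is missing'. Separately, length(Class==1) would return the number of rows rather than the number of purchases, because the logical vector has one element per row. A minimal base R sketch of the difference, using the class values of session 3 (the names class_col, n_rows, n_purchased and flags are only illustrative):

```r
# class values of session 3 from the example data
class_col <- c(0, 1, 1, 0)

# length() counts every element of the logical vector, TRUE or FALSE
n_rows <- length(class_col == 1)

# sum() counts only the TRUE values, i.e. the purchases
n_purchased <- sum(class_col == 1)

# ifelse() runs once test, yes and no are all supplied
flags <- ifelse(class_col == 1, "purchased", "not purchased")

print(n_rows)
print(n_purchased)
print(flags)
```

So counting purchases probably needs sum() rather than length(), and ifelse() is not required for it at all.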
dplyr R • 743 views
modified 2.0 years ago • written 2.0 years ago by shameenkhan075 (0)

I don't see what this has to do with bioinformatics; you are better off asking at https://stackoverflow.com/

modified 2.0 years ago • written 2.0 years ago by Benn (8.0k)

Hello shameenkhan075!

We believe that this post does not fit the main topic of this site.

Not a bioinformatics question. Better ask https://stackoverflow.com/

For this reason we have closed your question. This allows us to keep the site focused on the topics that the community can help with.

If you disagree please tell us why in a reply below, we'll be happy to talk about it.

Cheers!

written 2.0 years ago by Nicolas Rosewick (9.0k)

:( ok .......................

written 2.0 years ago by shameenkhan075 (0)

2.0 years ago, Kevin Blighe (65k) wrote:

You can do this with the base R function, aggregate():

data
  session item class
1       1    1     0
2       1    1     0
3       1    1     0
4       2    1     1
5       2    2     0
6       3    1     0
7       3    0     1
8       3    3     1
9       3    2     0

aggregate(data[,3], by=data[1], FUN=mean)
  session   x
1       1 0.0
2       2 0.5
3       3 0.5

...or indeed dplyr:

require(dplyr)

data %>% group_by(session) %>% summarise(mean(class))
# A tibble: 3 x 2
  session `mean(class)`
    <dbl>         <dbl>
1       1           0  
2       2           0.5
3       3           0.5

Kevin

modified 2.0 years ago by zx8754 (9.6k) • written 2.0 years ago by Kevin Blighe (65k)

Sir, thanks a lot, but this is not what I meant. I have edited the question; please see the question above. Thanks.

modified 2.0 years ago • written 2.0 years ago by shameenkhan075 (0)

'Pure' R questions are typically frowned upon here, unless you can show a relation to bioinformatics.

Perhaps this is what you wanted:

itemsPerSession <- data.frame(data %>% group_by(session) %>% summarise(length(unique(item))))
itemsPerSession

  session length.unique.item..
1       1                    1
2       2                    2
3       3                    4

countsPerSession <- data.frame(data %>% group_by(session) %>% summarise(sum(class)))
countsPerSession
  session sum.class.
1       1          0
2       2          1
3       3          2

result <- data.frame(
              itemsPerSession[,1],
              ifelse(
                 countsPerSession[,2] / itemsPerSession[,2] == 0,
                 paste0(as.character(countsPerSession[,2]), "/", as.character(itemsPerSession[,2])),
                 countsPerSession[,2] / itemsPerSession[,2]
              )
          )

colnames(result) <- c("session id", "avg")

result
  session id avg
1          1 0/1
2          2 0.5
3          3 0.5
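
The two intermediate summaries above could probably also be folded into a single summarise() call. This is only a sketch, assuming the column names session, item and class used in this answer; avg is an illustrative column name, and the result stays numeric rather than showing "0/1" as text:

```r
library(dplyr)

# Rebuild the example data from the question
data <- data.frame(
  session = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
  item    = c(1, 1, 1, 1, 2, 1, 0, 3, 2),
  class   = c(0, 0, 0, 1, 0, 0, 1, 1, 0)
)

# Purchases per session divided by the number of distinct items per session
result <- data %>%
  group_by(session) %>%
  summarise(avg = sum(class == 1) / n_distinct(item))

print(result)
```

n_distinct(item) is dplyr's equivalent of length(unique(item)), and sum(class == 1) counts the purchased rows within each session.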
modified 2.0 years ago • written 2.0 years ago by Kevin Blighe (65k)
The thread is closed. No new answers may be added.