Question: (Closed) How to calculate the average number of purchases per distinct item in each session using dplyr
16 months ago by
shameenkhan0750 wrote:

I have a data frame with 3 columns: session id, item id and class. In the class column, 0 represents 'not purchased' and 1 represents 'purchased'.

> data
    session id        item id   class
      1                 1         0
      1                 1         0
      1                 1         0
      2                 1         1
      2                 2         0
      3                 1         0
      3                 0         1
      3                 3         1
      3                 2         0

I would like to calculate the average number of purchases per distinct item for each unique session id. Session 1 contains only one unique item and 0 purchases, so the average is 0/1 = 0. Session 2 contains 1 purchase and 2 unique items, so the average is 1/2 = 0.5. Session 3 contains 2 purchases and 4 unique items, so the average is 2/4 = 0.5. The results would look like this:

>  result   
session id       avg
   1             0/1=0
   2             0.5
   3             0.5

This is what I have tried so far:

data %>% group_by(session_id) %>% summarise(avg = ifelse(length(Class==1))/length(unique(item_id)))

but got the error:

Error in summarise_impl(.data, dots) : 
  Evaluation error: argument "yes" is missing, with no default.
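
The error arises because ifelse() expects three arguments (test, yes, no) and only one is supplied. Separately, length(Class == 1) would return the number of rows rather than the number of purchases, since it measures the length of the logical vector; sum() counts only the TRUE values. A minimal base R sketch of the difference, assuming a made-up vector named purchased_flag that mirrors the class column for session 3:

```r
# Hypothetical vector mirroring the class column for session 3 (assumed name)
purchased_flag <- c(0, 1, 1, 0)

# length() measures the logical vector itself, so it counts every row
n_rows <- length(purchased_flag == 1)

# sum() treats TRUE as 1 and FALSE as 0, so it counts only the purchases
n_purchased <- sum(purchased_flag == 1)

print(c(n_rows, n_purchased))
```

So the numerator of the ratio probably wants sum() rather than ifelse() or length().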
dplyr R • 509 views
ADD COMMENTlink modified 16 months ago • written 16 months ago by shameenkhan0750

I don't see what this has to do with bioinformatics; you are better off asking at https://stackoverflow.com/

ADD REPLYlink modified 16 months ago • written 16 months ago by Benn7.9k

Hello shameenkhan075!

We believe that this post does not fit the main topic of this site.

Not a bioinformatics question. Better ask https://stackoverflow.com/

For this reason we have closed your question. This allows us to keep the site focused on the topics that the community can help with.

If you disagree please tell us why in a reply below, we'll be happy to talk about it.

Cheers!

ADD REPLYlink written 16 months ago by Nicolas Rosewick8.6k

:( ok .......................

ADD REPLYlink written 16 months ago by shameenkhan0750
16 months ago by
Kevin Blighe53k wrote:

You can do this with the base R function, aggregate():

data
  session item class
1       1    1     0
2       1    1     0
3       1    1     0
4       2    1     1
5       2    2     0
6       3    1     0
7       3    0     1
8       3    3     1
9       3    2     0

aggregate(data[,3], by=data[1], FUN=mean)
  session   x
1       1 0.0
2       2 0.5
3       3 0.5

...or indeed dplyr:

library(dplyr)

data %>% group_by(session) %>% summarise(mean(class))
# A tibble: 3 x 2
  session `mean(class)`
    <dbl>         <dbl>
1       1           0  
2       2           0.5
3       3           0.5

Kevin

ADD COMMENTlink modified 16 months ago by zx87548.9k • written 16 months ago by Kevin Blighe53k

Sir, thanks a lot, but this is not what I meant. I have edited the question; please see the question above. Thanks.

ADD REPLYlink modified 16 months ago • written 16 months ago by shameenkhan0750

'Pure' R questions are typically frowned upon here, unless you can show a relation to bioinformatics.

Perhaps this is what you wanted:

itemsPerSession <- data.frame(data %>% group_by(session) %>% summarise(length(unique(item))))
itemsPerSession

  session length.unique.item..
1       1                    1
2       2                    2
3       3                    4


countsPerSession <- data.frame(data %>% group_by(session) %>% summarise(sum(class)))
countsPerSession
  session sum.class.
1       1          0
2       2          1
3       3          2

result <- data.frame(
              itemsPerSession[,1],
              ifelse(
                 countsPerSession[,2] / itemsPerSession[,2] == 0,
                 paste0(as.character(countsPerSession[,2]), "/", as.character(itemsPerSession[,2])),
                 countsPerSession[,2] / itemsPerSession[,2]
              )
          )

colnames(result) <- c("session id", "avg")

result
  session id avg
1          1 0/1
2          2 0.5
3          3 0.5
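
If a numeric avg column is preferred over the character one above, the two summaries can probably be folded into a single summarise() call. This is only a sketch: it assumes the column names session, item and class used in this answer (the question's own frame uses different names), and it rebuilds the example data so that it runs on its own:

```r
library(dplyr)

# Example data copied from the thread; column names are an assumption
data <- data.frame(
  session = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
  item    = c(1, 1, 1, 1, 2, 1, 0, 3, 2),
  class   = c(0, 0, 0, 1, 0, 0, 1, 1, 0)
)

# Purchases per session divided by the number of distinct items per session
result <- data %>%
  group_by(session) %>%
  summarise(avg = sum(class == 1) / n_distinct(item))

print(result)
```

Here n_distinct(item) is dplyr's shorthand for length(unique(item)), and avg stays numeric, so session 1 shows 0 rather than the string "0/1".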
ADD REPLYlink modified 16 months ago • written 16 months ago by Kevin Blighe53k
The thread is closed. No new answers may be added.
