Question: (Closed) How to calculate the average number of purchases per distinct item in each session using dplyr
19 months ago, shameenkhan075 wrote:

I have a data frame with 3 columns: session id, item id and class. In the class column, 0 represents 'not purchased' and 1 represents 'purchased'.

> data
    session id        item id   class
      1                 1         0
      1                 1         0
      1                 1         0
      2                 1         1
      2                 2         0
      3                 1         0
      3                 0         1
      3                 3         1
      3                 2         0

I would like to calculate the average number of purchases per distinct item for each unique session id. Session id 1 contains only one unique item and 0 purchases, so the avg is 0/1 = 0. Session id 2 contains 1 purchase and 2 unique items, so the avg is 1/2 = 0.5. Session id 3 contains 2 purchases and 4 unique items, so the avg is 2/4 = 0.5. The results would look like this:

>  result   
session id       avg
   1             0/1=0
   2             0.5
   3             0.5

This is what I have tried so far:

data %>% group_by(session_id) %>% summarise(avg = ifelse(length(Class==1))/length(unique(item_id)))

but got the error:

Error in summarise_impl(.data, dots) : 
  Evaluation error: argument "yes" is missing, with no default.
dplyr R • 576 views
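
A note on the error message in the question: ifelse() takes three arguments (test, yes, no), and the attempt passes only one, which is why R reports that "yes" is missing. Separately, length() applied to a comparison counts every row, not just the matching ones, so sum() is the usual way to count matches. The base R sketch below illustrates both points using the class values of session 3 from the example data; the variable names (cls, n_all, n_purchased, flags) are made up for illustration.

```r
# class values for session 3 in the example data
cls <- c(0, 1, 1, 0)

# length() of a logical vector counts every element, TRUE or FALSE
n_all <- length(cls == 1)

# sum() of a logical vector counts only the TRUE values
n_purchased <- sum(cls == 1)

# ifelse() needs all three arguments: test, yes, no
flags <- ifelse(cls == 1, "purchased", "not purchased")

print(n_all)
print(n_purchased)
print(flags)
```

For counting purchases, ifelse() is not needed at all; sum(cls == 1) is enough.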

I don't see what this has to do with bioinformatics; you are better off asking at https://stackoverflow.com/

modified 19 months ago • written 19 months ago by Benn

Hello shameenkhan075!

We believe that this post does not fit the main topic of this site.

Not a bioinformatics question. Better to ask at https://stackoverflow.com/

For this reason we have closed your question. This allows us to keep the site focused on the topics that the community can help with.

If you disagree please tell us why in a reply below, we'll be happy to talk about it.

Cheers!

written 19 months ago by Nicolas Rosewick

:( ok .......................

written 19 months ago by shameenkhan075

19 months ago, Kevin Blighe wrote:

You can do this with the base R function, aggregate():

data
  session item class
1       1    1     0
2       1    1     0
3       1    1     0
4       2    1     1
5       2    2     0
6       3    1     0
7       3    0     1
8       3    3     1
9       3    2     0

aggregate(data[,3], by=data[1], FUN=mean)
  session   x
1       1 0.0
2       2 0.5
3       3 0.5

...or indeed dplyr:

require(dplyr)

data %>% group_by(session) %>% summarise(mean(class))
# A tibble: 3 x 2
  session `mean(class)`
    <dbl>         <dbl>
1       1           0  
2       2           0.5
3       3           0.5

Kevin

modified 19 months ago by zx8754 • written 19 months ago by Kevin Blighe

Sir, thanks a lot, but this is not what I meant. I have edited the question; please see the question above. Thanks.

modified 19 months ago • written 19 months ago by shameenkhan075

'Pure' R questions are typically frowned upon here, unless you can show a relation to bioinformatics.

Perhaps this is what you wanted:

itemsPerSession <- data.frame(data %>% group_by(session) %>% summarise(length(unique(item))))
itemsPerSession

  session length.unique.item..
1       1                    1
2       2                    2
3       3                    4

countsPerSession <- data.frame(data %>% group_by(session) %>% summarise(sum(class)))
countsPerSession
  session sum.class.
1       1          0
2       2          1
3       3          2

result <- data.frame(
              itemsPerSession[,1],
              ifelse(
                 countsPerSession[,2] / itemsPerSession[,2] == 0,
                 paste0(as.character(countsPerSession[,2]), "/", as.character(itemsPerSession[,2])),
                 countsPerSession[,2] / itemsPerSession[,2]
              )
          )

colnames(result) <- c("session id", "avg")

result
  session id avg
1          1 0/1
2          2 0.5
3          3 0.5

modified 19 months ago • written 19 months ago by Kevin Blighe
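
The two intermediate data frames above can probably be folded into a single summarise() call. The following is a minimal sketch, assuming the column names session, item and class used in the answer and a numeric avg column (so session 1 shows 0 rather than the string "0/1"). n_distinct() is dplyr's shorthand for length(unique(...)); the names purchases and items are chosen here for illustration.

```r
library(dplyr)

# example data from the question, with the column names used in the answer
data <- data.frame(
  session = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
  item    = c(1, 1, 1, 1, 2, 1, 0, 3, 2),
  class   = c(0, 0, 0, 1, 0, 0, 1, 1, 0)
)

result <- data %>%
  group_by(session) %>%
  summarise(
    purchases = sum(class == 1),   # rows marked as purchased
    items     = n_distinct(item),  # distinct items seen in the session
    avg       = purchases / items  # purchases per distinct item
  )

print(result)
```

Keeping purchases and items as columns makes it easy to check each ratio by hand; drop them with select(session, avg) if only the average is needed.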