Question

Group by column, summarise other columns, mean

4.7 years ago

bgraphit ▴ 20

Hi, I need some advice on how to go from the TEST_play data frame to a data frame that contains each individual name and the average of counts.1 and counts.2 for that peak.

Example:

 >head(TEST_play)
     name counts.1 counts.2
1 Peak160       97      487
2 Peak160      425      371
3 Peak328        0      104
4 Peak328       13       20
5 Peak344        2       39
6 Peak344        7       63

Desired output

> head(average_TEST_play)
     name counts.1 counts.2
1 Peak160    261.0      429
2 Peak328      6.5       62
...

> sapply(TEST_play, class)
     name  counts.1  counts.2
 "factor" "numeric" "numeric"

R • 964 views

ADD COMMENT • link updated 4.7 years ago by zx8754 11k • written 4.7 years ago by bgraphit ▴ 20

What have you tried? Look at dplyr's group_by() and summarise() functions. There are ways to do it in base R too, but this might be easier.

ADD REPLY • link 4.7 years ago by Ram 43k

score 2 · Answer 1 · 2019-08-05

4.7 years ago

Ram 43k

Here's a base R way to do this:

# Columns are whitespace-separated, so read.table's default sep handles tabs or spaces.
dummy_df <- read.table(text='"name"   "counts.1"  "counts.2"
"Peak160" 97  487
"Peak160" 425 371
"Peak328" 0   104
"Peak328" 13  20
"Peak344" 2   39
"Peak344" 7   63', header=TRUE)

aggregate(cbind(counts.1, counts.2) ~ name, data=dummy_df, FUN = mean)
     name counts.1 counts.2
1 Peak160    261.0      429
2 Peak328      6.5       62
3 Peak344      4.5       51
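
If there are more count columns than the two shown here, listing each one inside cbind() gets tedious. As a minimal sketch, assuming every column other than name is numeric and should be averaged, the formula can use "." on the left-hand side instead. The data frame is rebuilt inline only so the snippet runs on its own, and avg_all is just an illustrative name.

```r
# Same example data as above, rebuilt inline so the snippet is self-contained.
dummy_df <- data.frame(
  name     = c("Peak160", "Peak160", "Peak328", "Peak328", "Peak344", "Peak344"),
  counts.1 = c(97, 425, 0, 13, 2, 7),
  counts.2 = c(487, 371, 104, 20, 39, 63)
)

# "." on the left-hand side stands for every column not named on the right,
# so this averages counts.1 and counts.2 (and any further columns) per name.
avg_all <- aggregate(. ~ name, data = dummy_df, FUN = mean)
print(avg_all)
```

Note that the formula interface of aggregate() drops rows containing NA by default (na.action = na.omit), which may or may not be what you want for real count data.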

Using dplyr, that'd be:

library(dplyr)
dummy_df %>% group_by(name) %>% summarise(counts.1 = mean(counts.1), counts.2 = mean(counts.2))

# A tibble: 3 x 3
  name    counts.1 counts.2
  <fct>      <dbl>    <dbl>
1 Peak160    261        429
2 Peak328      6.5       62
3 Peak344      4.5       51

ADD COMMENT • link 4.7 years ago by Ram 43k

> library(dplyr)
> test %>%
+   group_by(name) %>%
+   summarise_all(mean)
# A tibble: 3 x 3
  name    counts.1 counts.2
  <chr>      <dbl>    <dbl>
1 Peak160    261        429
2 Peak328      6.5       62
3 Peak344      4.5       51

ADD REPLY • link 4.7 years ago by cpad0112 21k
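
For readers on a newer dplyr: summarise_all() still works but is marked as superseded since dplyr 1.0.0 in favour of across(). A rough equivalent, assuming dplyr >= 1.0.0 is installed and that all numeric columns should be averaged, might look like the sketch below. The data frame is rebuilt inline so the snippet runs on its own, and avg is just an illustrative name.

```r
library(dplyr)

# Same example data as in the thread, rebuilt inline so the snippet is self-contained.
test <- data.frame(
  name     = c("Peak160", "Peak160", "Peak328", "Peak328", "Peak344", "Peak344"),
  counts.1 = c(97, 425, 0, 13, 2, 7),
  counts.2 = c(487, 371, 104, 20, 39, 63),
  stringsAsFactors = FALSE
)

# across() applies mean() to every numeric column within each name group;
# .groups = "drop" returns an ungrouped tibble.
avg <- test %>%
  group_by(name) %>%
  summarise(across(where(is.numeric), mean), .groups = "drop")

print(avg)
```

Selecting with where(is.numeric) means any extra non-numeric columns are simply left out of the summary rather than producing warnings from mean().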

Thank you, TIL summarise_all.

ADD REPLY • link 4.7 years ago by Ram 43k

score 1 · Answer 2 · 2019-08-06

With the doBy package:

> test <- read.csv("test.txt", sep = "\t", stringsAsFactors = FALSE, header = TRUE)
> test
     name counts.1 counts.2
1 Peak160       97      487
2 Peak160      425      371
3 Peak328        0      104
4 Peak328       13       20
5 Peak344        2       39
6 Peak344        7       63
> library(doBy)
> summaryBy(test[,-1] ~ name, test, FUN = mean, keep.names = TRUE)
     name counts.1 counts.2
1 Peak160    261.0      429
2 Peak328      6.5       62
3 Peak344      4.5       51