R, group data based on names and sum each group
4 weeks ago
mthm ▴ 40

I am trying to sum the bp length of each category per genomic region in R. This is the data frame:

col1    col2      col3      col4      col5    col6
chr2    33739     34739     exon      SINE    69
chr2    111204    112204    exon      SINE    78
chr2    508422    509422    exon      L1      152
chr3    701525    702525    intron    LINE    84
chr3    701525    702525    intron    LINE    112
chr3    863200    864200    UTR       LINE    32

I want to sum the length (col6) of each category in col5 per genomic region in col4. So there are two grouping columns, col4 and col5, and the function to apply is the sum of col6.

I have tried this code using the dplyr package:

sum = df1 %>%
  group_by(col4, col5) %>% summarise(df1, sum(col6))

The error is:

Error in quickdf(.data[names(cols)]) : length(rows) == 1 is not TRUE

I have also tried grouping by only one column, but I get the same error.

As an alternative, can I use a combination of rowsums() and unique() for this? If yes, what would the code be?

manipulation R data • 386 views

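With R, either with dplyr (mutate() keeps every row and adds the group total as a new column) or with base aggregate() (one row per group):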

library(dplyr)
df %>% 
  group_by(col4,col5) %>% 
  mutate(sum=sum(col6)) %>% 
  ungroup()

> aggregate(col6 ~ col5+col4, df, sum)
  col5   col4 col6
1   L1   exon  152
2 SINE   exon  147
3 LINE intron  196
4 LINE    UTR   32

with datamash:

$ datamash -sH  -g 4,5 sum 6 < test.txt

GroupBy(col4)   GroupBy(col5)   sum(col6)
UTR LINE    32
exon    L1  152
exon    SINE    147
intron  LINE    196
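
On the base R alternative asked about in the question: there is a base function rowsum() (singular) that sums rows sharing a group label. Below is a minimal sketch, assuming the example table from the question is loaded as df1; the pasted `key` label and the `totals` name are just illustrative choices, not anything from the original post.

```r
# Example data copied from the question (df1 is the name used there).
df1 <- data.frame(
  col1 = c("chr2", "chr2", "chr2", "chr3", "chr3", "chr3"),
  col2 = c(33739, 111204, 508422, 701525, 701525, 863200),
  col3 = c(34739, 112204, 509422, 702525, 702525, 864200),
  col4 = c("exon", "exon", "exon", "intron", "intron", "UTR"),
  col5 = c("SINE", "SINE", "L1", "LINE", "LINE", "LINE"),
  col6 = c(69, 78, 152, 84, 112, 32),
  stringsAsFactors = FALSE
)

# rowsum() adds up the values that share a group label.
# Here the label is the col4/col5 pair pasted into a single string.
key <- paste(df1$col4, df1$col5, sep = "_")
totals <- rowsum(df1$col6, group = key)

# totals is a one-column matrix whose row names are the group labels.
totals
```

There is no need for unique() here, since rowsum() collapses repeated labels itself. The aggregate() call above gives the same totals with the two grouping columns kept separate, which is usually easier to work with afterwards.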

I get this error:

Error in UseMethod("group_by") : 
  no applicable method for 'group_by' applied to an object of class "character"
4 weeks ago
Trivas ▴ 370

Your issue is with piping into the summarise function. The pipe already passes the grouped data frame as the first argument, so naming df1 again inside summarise() passes it a second time. If you do:

df1 %>% group_by(col4, col5) %>% summarise(sum(col6))

it should work perfectly.
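
For reference, a slightly fuller sketch of the same fix, assuming the question's table is in df1 (only the three columns needed are rebuilt here). The column name `total_bp` is made up for illustration. One further guess, not confirmed anywhere in this thread: quickdf() in the original error is a plyr function, so plyr's summarise() may have been masking dplyr's in that session. Writing dplyr::summarise explicitly sidesteps that possibility.

```r
library(dplyr)

# The three columns from the question that matter for the grouping.
df1 <- data.frame(
  col4 = c("exon", "exon", "exon", "intron", "intron", "UTR"),
  col5 = c("SINE", "SINE", "L1", "LINE", "LINE", "LINE"),
  col6 = c(69, 78, 152, 84, 112, 32),
  stringsAsFactors = FALSE
)

totals <- df1 %>%
  dplyr::group_by(col4, col5) %>%
  # The data frame is NOT repeated here; the pipe already supplies it.
  dplyr::summarise(total_bp = sum(col6)) %>%
  dplyr::ungroup()

totals
```

Naming the summary column (total_bp here) avoids ending up with a column literally called `sum(col6)`.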

Error in UseMethod("group_by") : 
  no applicable method for 'group_by' applied to an object of class "character"

My guess is that your df1 isn't actually a dataframe or tibble object. It might have gotten converted while troubleshooting. Can you post your exact code that you used that gave you this error? Have you tried restarting R and repeating your pipeline?
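
A quick way to check this is sketched below. The slip shown (assigning the file name to df1 instead of the table) is hypothetical, just one way an object can end up with class "character"; the inline text stands in for a real input file.

```r
# Hypothetical slip: df1 holds a file name, not the table itself.
df1 <- "test.txt"
class(df1)           # "character"
is.data.frame(df1)   # FALSE, so group_by() has no method for it

# Re-create the data frame. Inline text is used so the sketch is
# self-contained; with a real file this would be something like
# read.table("test.txt", header = TRUE).
df1 <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
col1 col2 col3 col4 col5 col6
chr2 33739 34739 exon SINE 69
chr3 863200 864200 UTR LINE 32
")

# Confirm the object type before piping it into group_by().
is.data.frame(df1)
```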


It worked, thanks.
