I am trying to sum the bp length of each category per genomic region in R. This is the data frame:
col1    col2      col3      col4      col5    col6
chr2    33739     34739     exon      SINE    69
chr2    111204    112204    exon      SINE    78
chr2    508422    509422    exon      L1      152
chr3    701525    702525    intron    LINE    84
chr3    701525    702525    intron    LINE    112
chr3    863200    864200    UTR       LINE    32
I want to sum the lengths (col6) for each category in col5 within each genomic region in col4. So there are two grouping columns, col4 and col5, and the function to apply is the sum of col6.
I have tried this code using the "dplyr" package:
sum = df1 %>%
  group_by(col4, col5) %>%
  summarise(df1, sum(col6))
The error is:
Error in quickdf(.data[names(cols)]) : length(rows) == 1 is not TRUE
I have also tried grouping by only one column, but I get the same error.
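The `quickdf` call in the error message belongs to plyr rather than dplyr, so one likely cause is that plyr was attached after dplyr and its `summarise` is masking dplyr's. This is an assumption, since the post does not show which packages are loaded. Separately, `df1` does not need to be passed to `summarise` again, because the pipe already supplies the grouped data. A minimal sketch that rebuilds the sample data and calls the dplyr functions explicitly (`totals` and `total_bp` are names made up for illustration):

```r
library(dplyr)

# Sample data rebuilt from the table in the post
df1 <- data.frame(
  col1 = c("chr2", "chr2", "chr2", "chr3", "chr3", "chr3"),
  col2 = c(33739, 111204, 508422, 701525, 701525, 863200),
  col3 = c(34739, 112204, 509422, 702525, 702525, 864200),
  col4 = c("exon", "exon", "exon", "intron", "intron", "UTR"),
  col5 = c("SINE", "SINE", "L1", "LINE", "LINE", "LINE"),
  col6 = c(69, 78, 152, 84, 112, 32),
  stringsAsFactors = FALSE
)

# Namespace-qualified calls avoid picking up plyr::summarise by accident
# (assumption: plyr may be attached in the session)
totals <- df1 %>%
  dplyr::group_by(col4, col5) %>%
  dplyr::summarise(total_bp = sum(col6)) %>%
  dplyr::ungroup()

print(totals)
```

With the sample rows this gives one row per col4/col5 combination, for example 69 + 78 = 147 for exon/SINE.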
As an alternative, can I use a combination of rowsums() and unique() for this? If so, what would the code be?
with R
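A base-R sketch that needs no extra packages, assuming the data frame is called `df1` as in the question (only the three relevant columns are rebuilt here). `aggregate()` sums col6 for each col4/col5 combination. `rowsum()` (lowercase, the grouped-sum function, not `rowSums()`) can do the same with a pasted key, which is one possible reading of the `rowsums()` idea in the question; `unique()` is not needed, because both calls already return one row per combination. The names `by_group` and `by_key` are illustrative.

```r
# Relevant columns rebuilt from the table in the question
df1 <- data.frame(
  col4 = c("exon", "exon", "exon", "intron", "intron", "UTR"),
  col5 = c("SINE", "SINE", "L1", "LINE", "LINE", "LINE"),
  col6 = c(69, 78, 152, 84, 112, 32),
  stringsAsFactors = FALSE
)

# Formula interface: sum col6 for every col4/col5 combination
by_group <- aggregate(col6 ~ col4 + col5, data = df1, FUN = sum)

# rowsum() takes a single grouping vector, so the two columns are
# combined into one key; the result is a one-column matrix whose
# row names are the keys
by_key <- rowsum(df1$col6, group = paste(df1$col4, df1$col5, sep = "_"))

print(by_group)
print(by_key)
```

`aggregate()` keeps col4 and col5 as separate columns, which is usually easier to work with afterwards than the pasted row names from `rowsum()`.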
with datamash:
I have this error