Question: How do you collapse multiple rows based on multiple columns in r?
0
gravatar for ngcatung0
10 months ago by
ngcatung020
ngcatung020 wrote:

So basically I have a dataframe that kinda looks like this:

Community Pop_Total Median_Age Under_5 5-9 10-14 15-19 20-24 
Akutan city   NA    NA         NA      NA  NA    NA    71
Alcan Border  NA    NA         2       NA  NA    NA    NA               
Alcan Border  NA    NA         NA      NA  NA    2     NA            
Alcan Border  NA    NA         NA      NA  5     NA    NA
Ambler City   224   NA         NA      NA  NA    NA    NA
Ambler City   NA    NA         NA      17  NA    NA    NA

Is there a simple way to combine multiple rows based on multiple column data? I've seen a few scripts that say you can combine one duplicate variable in a column based on one or two data columns but I need to do it more large scale (I have ~400 rows with duplicates and ~30 columns (and each column has a large name). Ideally it would look like:

Community Pop_Total Median_Age Under_5 5-9 10-14 15-19 20-24 
Akutan city   NA    NA         NA      NA  NA    NA    71              
Alcan Border  NA    NA         2       NA  5     2     NA            
Ambler City   224   NA         NA      17  NA    NA    NA

I tried using the following code but im getting an error:

setDT(df)[,lapply(.SD, function(x) ifelseallis.na(x)), NA_integer_, sum("x", na.rm = T))), 
    by = sample_id]
data2 <- column_to_rownames(data2, 'Community')

Error in sum("x", na.rm = T) : invalid 'type' (character) of argument
R • 581 views
ADD COMMENTlink modified 10 months ago by Kevin Blighe69k • written 10 months ago by ngcatung020

Hi, After using df %>% group_by(Community) %>% summarise_if( is.numeric, sum, na.rm = TRUE )

It gave me the following output:

Community Akutan city
Alcan Border
Alcan Border
Alcan Border Ambler City
Ambler City

Please assist!

ADD REPLYlink written 10 months ago by ngcatung020

Help me help you. Could you use dput on a subset of your data and put the result into your original question: so that I have some example data to work with

ADD REPLYlink written 10 months ago by russhh5.5k
0
gravatar for russhh
10 months ago by
russhh5.5k
UK, U. Glasgow
russhh5.5k wrote:

I believe you should be able to do the following:

library(dplyr)
df %>%
  group_by(Community) %>%
  summarise_if(
    is.numeric,
    sum,
    na.rm = TRUE
  )

But, I'm not sure I understand why you would have multiple rows for the same Community in this data-frame, and would recommend you remove the median age column from the output (because there's no sensible way to summarise the median ages unless they're identical in all rows for a given Community).

ADD COMMENTlink modified 10 months ago • written 10 months ago by russhh5.5k

Hi, After using df %>% group_by(Community) %>% summarise_if( is.numeric, sum, na.rm = TRUE ) It gave me the following output: Community Akutan city Alcan Border Alcan Border Alcan Border Ambler City Ambler City Please assist!

ADD REPLYlink written 10 months ago by ngcatung020
0
gravatar for Kevin Blighe
10 months ago by
Kevin Blighe69k
Republic of Ireland
Kevin Blighe69k wrote:

You can use aggregate() to indirectly get what you want, I think:

x
     Community Pop_Total Median_Age Under_5 X5.9 X10.14 X15.19 X20.24
1  Akutan city        NA         NA      NA   NA     NA     NA     71
2 Alcan Border        NA         NA       2   NA     NA     NA     NA
3 Alcan Border        NA         NA      NA   NA     NA      2     NA
4 Alcan Border        NA         NA      NA   NA      5     NA     NA
5  Ambler City       224         NA      NA   NA     NA     NA     NA
6  Ambler City        NA         NA      NA   17     NA     NA     NA

aggregate(x[,2:ncol(x)], by = x[1], function(x) sum(x, na.rm = TRUE))
     Community Pop_Total Median_Age Under_5 X5.9 X10.14 X15.19 X20.24
1  Akutan city         0          0       0    0      0      0     71
2 Alcan Border         0          0       2    0      5      2      0
3  Ambler City       224          0       0   17      0      0      0

----------

To change those zeros to NA values:

x <- aggregate(x[,2:ncol(x)], by = x[1], function(x) sum(x, na.rm = TRUE))

x[x == 0] <- NA
ADD COMMENTlink modified 10 months ago • written 10 months ago by Kevin Blighe69k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2424 users visited in the last hour
_