How do you collapse multiple rows based on multiple columns in r?
2
0
Entering edit mode
4.1 years ago
ngcatung0 ▴ 20

So basically I have a dataframe that kinda looks like this:

Community Pop_Total Median_Age Under_5 5-9 10-14 15-19 20-24 
Akutan city   NA    NA         NA      NA  NA    NA    71
Alcan Border  NA    NA         2       NA  NA    NA    NA               
Alcan Border  NA    NA         NA      NA  NA    2     NA            
Alcan Border  NA    NA         NA      NA  5     NA    NA
Ambler City   224   NA         NA      NA  NA    NA    NA
Ambler City   NA    NA         NA      17  NA    NA    NA

Is there a simple way to combine multiple rows based on multiple column data? I've seen a few scripts that say you can combine one duplicate variable in a column based on one or two data columns but I need to do it more large scale (I have ~400 rows with duplicates and ~30 columns (and each column has a large name). Ideally it would look like:

Community Pop_Total Median_Age Under_5 5-9 10-14 15-19 20-24 
Akutan city   NA    NA         NA      NA  NA    NA    71              
Alcan Border  NA    NA         2       NA  5     2     NA            
Ambler City   224   NA         NA      17  NA    NA    NA

I tried using the following code but im getting an error:

setDT(df)[,lapply(.SD, function(x) ifelseallis.na(x)), NA_integer_, sum("x", na.rm = T))), 
    by = sample_id]
data2 <- column_to_rownames(data2, 'Community')

Error in sum("x", na.rm = T) : invalid 'type' (character) of argument
R • 10k views
ADD COMMENT
0
Entering edit mode

Hi, After using df %>% group_by(Community) %>% summarise_if( is.numeric, sum, na.rm = TRUE )

It gave me the following output:

Community Akutan city
Alcan Border
Alcan Border
Alcan Border Ambler City
Ambler City

Please assist!

ADD REPLY
0
Entering edit mode

Help me help you. Could you use dput on a subset of your data and put the result into your original question: so that I have some example data to work with

ADD REPLY
0
Entering edit mode
4.1 years ago
russhh 5.7k

I believe you should be able to do the following:

library(dplyr)
df %>%
  group_by(Community) %>%
  summarise_if(
    is.numeric,
    sum,
    na.rm = TRUE
  )

But, I'm not sure I understand why you would have multiple rows for the same Community in this data-frame, and would recommend you remove the median age column from the output (because there's no sensible way to summarise the median ages unless they're identical in all rows for a given Community).

ADD COMMENT
0
Entering edit mode

Hi, After using df %>% group_by(Community) %>% summarise_if( is.numeric, sum, na.rm = TRUE ) It gave me the following output: Community Akutan city Alcan Border Alcan Border Alcan Border Ambler City Ambler City Please assist!

ADD REPLY
0
Entering edit mode
4.1 years ago

You can use aggregate() to indirectly get what you want, I think:

x
     Community Pop_Total Median_Age Under_5 X5.9 X10.14 X15.19 X20.24
1  Akutan city        NA         NA      NA   NA     NA     NA     71
2 Alcan Border        NA         NA       2   NA     NA     NA     NA
3 Alcan Border        NA         NA      NA   NA     NA      2     NA
4 Alcan Border        NA         NA      NA   NA      5     NA     NA
5  Ambler City       224         NA      NA   NA     NA     NA     NA
6  Ambler City        NA         NA      NA   17     NA     NA     NA

aggregate(x[,2:ncol(x)], by = x[1], function(x) sum(x, na.rm = TRUE))
     Community Pop_Total Median_Age Under_5 X5.9 X10.14 X15.19 X20.24
1  Akutan city         0          0       0    0      0      0     71
2 Alcan Border         0          0       2    0      5      2      0
3  Ambler City       224          0       0   17      0      0      0

----------

To change those zeros to NA values:

x <- aggregate(x[,2:ncol(x)], by = x[1], function(x) sum(x, na.rm = TRUE))

x[x == 0] <- NA
ADD COMMENT

Login before adding your answer.

Traffic: 1850 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6