Question

Off topic: How do you collapse multiple rows based on multiple columns in R?


4.1 years ago

ngcatung0 ▴ 20

I have a sample data frame that looks roughly like this:

Community     Pop_Total  Median_Age  Under_5  5-9  10-14  15-19  20-24
Akutan city   NA         NA          NA       NA   NA     NA     71
Alcan Border  NA         NA          2        NA   NA     NA     NA
Alcan Border  NA         NA          NA       NA   NA     2      NA
Alcan Border  NA         NA          NA       NA   5      NA     NA
Ambler City   224        NA          NA       NA   NA     NA     NA
Ambler City   NA         NA          NA       17   NA     NA     NA

But with gene names rather than numbers.

Is there a simple way to combine multiple rows based on the data in multiple columns? I've seen a few scripts that combine the duplicates of one variable based on one or two data columns, but I need to do it on a larger scale: I have ~400 rows with duplicates and ~30 columns, each with a long name. Ideally the result would look like:

Community     Pop_Total  Median_Age  Under_5  5-9  10-14  15-19  20-24
Akutan city   NA         NA          NA       NA   NA     NA     71
Alcan Border  NA         NA          2        NA   5      2      NA
Ambler City   224        NA          NA       17   NA     NA     NA

Again, with gene names rather than numbers.

The following is my code:

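library(dplyr)
library(tidyr)

# Reshape to wide format: one column per gene, holding its consequence
df <- Good1_Poor3 %>% spread(key = Gene, value = consequences)

# Collapse the rows of each sample into one row
df <- df %>%
  group_by(sample_id) %>%
  summarise_if(
    is.character,
    # sum() is not defined for character data, so join the distinct
    # non-NA values instead (a group with no values becomes "")
    function(x) paste(unique(na.omit(x)), collapse = ";")
  )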
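One way to sanity-check this kind of collapse is to try it on a small made-up table first. The sketch below is only an illustration under a few assumptions: the `toy` data, the gene column names, and the `collapse_values` helper are all hypothetical, and `across()` needs dplyr 1.0 or later. It keeps `NA` where a sample has no value for a gene and joins distinct values with `;` if a sample happens to have more than one.

```r
library(dplyr)

# Hypothetical toy data: after spreading, each sample's values are
# scattered across several mostly-NA rows.
toy <- data.frame(
  sample_id = c("S1", "S2", "S2", "S2", "S3", "S3"),
  GeneA = c(NA, "missense", NA, NA, NA, NA),
  GeneB = c("frameshift", NA, NA, "stop_gained", NA, NA),
  GeneC = c(NA, NA, "missense", NA, NA, "synonymous"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: join the distinct non-NA values of a column,
# or return NA when the group has no value for that column.
collapse_values <- function(x) {
  vals <- unique(x[!is.na(x)])
  if (length(vals) == 0) NA_character_ else paste(vals, collapse = ";")
}

# One row per sample; every non-grouping column is collapsed.
collapsed <- toy %>%
  group_by(sample_id) %>%
  summarise(across(everything(), collapse_values), .groups = "drop")

print(collapsed)
```

Because `across(everything(), ...)` applies the helper to every non-grouping column, the ~30 long column names never have to be typed out. If some columns are not character, `across(where(is.character), ...)` would restrict the collapse to the character ones.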

R Genomics • 2.6k views

Updated 4.1 years ago by Kevin Blighe 87k • written 4.1 years ago by ngcatung0 ▴ 20