Entering edit mode
4.1 years ago
ngcatung0
▴
20
So basically I have a sample dataframe that kinda looks like this:
Community Pop_Total Median_Age Under_5 5-9 10-14 15-19 20-24
Akutan city NA NA NA NA NA NA 71
Alcan Border NA NA 2 NA NA NA NA
Alcan Border NA NA NA NA NA 2 NA
Alcan Border NA NA NA NA 5 NA NA
Ambler City 224 NA NA NA NA NA NA
Ambler City NA NA NA 17 NA NA NA
But with gene names rather than numbers.
Is there a simple way to combine multiple rows based on multiple column data? I've seen a few scripts that say you can combine one duplicate variable in a column based on one or two data columns but I need to do it more large scale (I have ~400 rows with duplicates and ~30 columns (and each column has a large name). Ideally it would look like:
Community Pop_Total Median_Age Under_5 5-9 10-14 15-19 20-24
Akutan city NA NA NA NA NA NA 71
Alcan Border NA NA 2 NA 5 2 NA
Ambler City 224 NA NA 17 NA NA NA
AGAIN with gene names rather than numbers1
The following is my code:
df <- Good1_Poor3 %>% spread(key = Gene, value = consequences)
df <- df %>%
group_by(sample_id) %>%
summarise_if(
is.character,
sum,
na.rm = TRUE
)