I have a very large dataframe with 396212 observations (rows) and 13 variables (columns), including organism name, antibiotic name, gene name and location.
I want to use the unique values of one variable (e.g. organism name) to populate another dataframe: essentially, create a new dataframe with one row per unique antibiotic and one column per unique organism, filled with yes/no according to whether the antibiotic covers that organism.
Example data frame:
```r
df <- data.frame(Organism = c("A", "B", "C", "A", "B", "C", "A", "B", "C", "A", "B", "C"),
                 Antibiotic = c("X", "Y", "Z", "X", "X", "X", "X", "Y", "Y", "Z", "X", "Y"))
```
I have made a new data frame with the unique antibiotics as rows and the organism names as columns, filled with NA, but I don't know how to extract information from the first data frame to populate the second:
```r
path_abx <- data.frame(Antibiotic = unique(df$Antibiotic))
path_abx$A <- NA
path_abx$B <- NA
path_abx$C <- NA
```
My question is which pathogen is affected/targeted by each antibiotic.
In the new dataframe (path_abx) I want to fill in each antibiotic/organism cell with either 'yes' or the organism name, based on whether that organism appears in the original dataframe (df) in the same row as the antibiotic name. The actual dataframe has over 300,000 observations (including 39 antibiotics and 11 organisms), so I can't just do it manually.
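To make the desired result concrete, here is a minimal sketch of one possible way to build it for the example data, assuming a base R cross-tabulation with `table()` is acceptable. It is only an illustration of the target shape, not necessarily the best approach for the full data.

```r
# Example data from the question
df <- data.frame(
  Organism   = c("A", "B", "C", "A", "B", "C", "A", "B", "C", "A", "B", "C"),
  Antibiotic = c("X", "Y", "Z", "X", "X", "X", "X", "Y", "Y", "Z", "X", "Y")
)

# Count how often each antibiotic/organism pair occurs;
# unclass() turns the table into a plain integer matrix
counts <- unclass(table(df$Antibiotic, df$Organism))

# Any pair seen at least once becomes "yes", everything else "no"
covered <- ifelse(counts > 0, "yes", "no")

# One row per antibiotic, one column per organism
path_abx <- data.frame(Antibiotic = rownames(covered), covered, row.names = NULL)
print(path_abx)
```

For the example data this marks antibiotic Y as not covering organism A, and antibiotic Z as not covering organism B; every other combination is "yes".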
I have tried using `unique`, `select`, `filter`, `n_distinct`, `if/then` and `for` loops but I can't get what I want and don't know what to do. I'm sure the if/then or for loop is the way to go but I don't really know where to start with this.
```r
library(dplyr)  # provides %>%, group_by() and filter()
# Attempt 1
test <- df %>% group_by(Organism) %>% filter(Antibiotic=="X" & Organism =="A", ignore.case = TRUE)
# Attempt 2
test <-
if (df$Organism(grepl("B", ignore.case = TRUE)))
{
print(df$Ecoli, "E.coli")
}
```
Etc. (I know the syntax is wrong.)
Once I've figured this out I then need to do the same thing (which pathogen is affected?) for each gene (1400 genes).
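Since the same question has to be asked for genes, the cross-tabulation idea could be wrapped in a small function. This is only a sketch: `coverage_table()` is a hypothetical helper name, and the `Gene` column and its values below are made up for illustration because the example data has no gene column.

```r
# Hypothetical helper: one row per unique value of `row_var`,
# one "yes"/"no" column per unique value of `col_var`
coverage_table <- function(data, row_var, col_var) {
  # Plain integer matrix of pair counts
  counts <- unclass(table(data[[row_var]], data[[col_var]]))
  covered <- ifelse(counts > 0, "yes", "no")
  # check.names = FALSE keeps column names such as "E. coli" unchanged
  out <- data.frame(rownames(covered), covered,
                    row.names = NULL, check.names = FALSE)
  names(out)[1] <- row_var
  out
}

# Toy data; the Gene column is an assumption, not part of the original example
df_genes <- data.frame(
  Organism = c("A", "B", "C", "A"),
  Gene     = c("g1", "g1", "g2", "g3")
)

path_gene <- coverage_table(df_genes, "Gene", "Organism")
print(path_gene)
```

The same function would give the antibiotic table with `coverage_table(df, "Antibiotic", "Organism")`, assuming the real columns are named as in the example.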
I'm really stumped so would appreciate any pointers!
Thank you :)