4.9 years ago
xxxxxxxx
My file looks like this:
Pcol: patient
Mcol: mutation
Pcol Mcol
P1 M1,M2,M5,M6
P2 M1,M2,M3,M5
P3 M4,M5,M7,M6
I want to find all pairwise combinations of the Mcol elements and their frequency, i.e. the number of patients (Pcol) in which each combination is present.
Expected output:
Mcol freq
M1,M2 2
M1,M5 2
M1,M6 1
M2,M5 2
M2,M6 1
M5,M6 2
M1,M3 1
M2,M3 1
M4,M5 1
M4,M7 1
M4,M6 1
M7,M6 1
I have tried this:
x <- read.csv("file.csv", header = TRUE, stringsAsFactors = FALSE)
# build one two-column table holding every mutation pair per patient
xx <- do.call(rbind.data.frame,
              lapply(x$Mcol, function(i) {
                n <- sort(unlist(strsplit(i, ",")))
                t(combn(n, 2))
              }))
# count how often each pair occurs
data.frame(table(paste(xx[, 1], xx[, 2], sep = ",")))
However, it doesn't give the expected output.
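One possible cause, assuming the file really is whitespace-delimited as in the example above: read.csv splits on commas, so the Mcol values would be broken across several columns before strsplit ever sees them. The following is a minimal sketch under that assumption, not a definitive fix. It reads the example rows inline with read.table (the inline text, and the names pairs and freq, are introduced here for illustration only) and counts each unordered pair once per patient.

```r
# Inline copy of the example data. Assumption: the two columns are separated
# by whitespace, so read.table is used; read.csv would split Mcol on its commas.
x <- read.table(text = "Pcol Mcol
P1 M1,M2,M5,M6
P2 M1,M2,M3,M5
P3 M4,M5,M7,M6", header = TRUE, stringsAsFactors = FALSE)

# For each patient, list every unordered pair of distinct mutations once.
pairs <- unlist(lapply(strsplit(x$Mcol, ","), function(m) {
  m <- sort(unique(m))                       # sort so "M6,M7" and "M7,M6" are the same pair
  if (length(m) < 2) return(character(0))    # combn() fails on fewer than 2 elements
  combn(m, 2, FUN = paste, collapse = ",")
}))

# Count the number of patients carrying each pair.
freq <- as.data.frame(table(Mcol = pairs), responseName = "freq",
                      stringsAsFactors = FALSE)
freq <- freq[order(-freq$freq, freq$Mcol), ]
print(freq, row.names = FALSE)
```

Because the mutations are sorted within each patient, a pair is always written in one order (M6,M7 rather than M7,M6). A full enumeration of the three example rows also includes M3,M5 and M5,M7, which the expected output above does not list. For a real file, replacing the text argument with the file name in read.table should be enough, provided the delimiter assumption holds.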