But it gives a Jaccard coefficient of 0 (meaning no similarity), even though I know there is some overlap between the two text files. I can't figure out what the problem is. Can anybody suggest a solution, or is there another way to compute the Jaccard coefficient between the two text files of gene symbols?
In the example you give, the file contents look white-space separated, but you're reading them as comma-separated. So first, check that md1 and md2 contain what you expect them to.
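A minimal sketch of why the separator matters, assuming a hypothetical file whose gene symbols are separated by spaces, several per line. With sep = "," each whole line becomes a single field, whereas scan() splits on any white space. The file and the names f1, md1_bad and genes1 are made up for illustration.

```r
# Hypothetical gene list written to a temporary file for the demo:
# symbols separated by spaces, more than one per line
f1 <- tempfile(fileext = ".txt")
writeLines(c("TP53 BRCA1", "EGFR"), f1)

# Reading with sep = "," keeps each whole line as one field,
# so V1 holds "TP53 BRCA1" instead of two separate symbols
md1_bad <- read.table(f1, sep = ",", stringsAsFactors = FALSE)
print(md1_bad$V1)

# scan() with what = character() splits on any white space
# (spaces, tabs, newlines) and returns one symbol per element
genes1 <- scan(f1, what = character(), quiet = TRUE)
print(genes1)
```

If the files hold exactly one symbol per line, read.table(f1, stringsAsFactors = FALSE)$V1 gives the same vector; scan() just doesn't depend on how the symbols are laid out.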
Second, you can check the intersection using the base R function intersect, e.g.:
intersect(md1$V1,md2$V1)
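A quick check on toy data, where g1 and g2 are hypothetical stand-ins for md1$V1 and md2$V1: intersect() returns each shared symbol once, so an empty result (character(0)) means the vectors really share nothing as they were read.

```r
# Hypothetical gene-symbol vectors (g1 contains a repeated "TP53")
g1 <- c("TP53", "BRCA1", "EGFR", "TP53")
g2 <- c("EGFR", "MYC", "TP53", "KRAS")

# intersect() returns the distinct elements found in both vectors
shared <- intersect(g1, g2)
print(shared)          # "TP53" "EGFR"
print(length(shared))  # 2; a length of 0 would mean no overlap
```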
Third, I suppose you're using the sets package. This package deals with sets of R objects, so set(md1) creates a set containing a single R object, md1 itself. What you probably meant is to create a set of gene names from md1, e.g.:
M1 <- as.set(md1$V1)
I think using a package here is overkill; you can easily compute the Jaccard index yourself from its definition:
g1 <- unique(md1$V1)  # count each gene symbol only once
g2 <- unique(md2$V1)
I <- length(intersect(g1, g2))
S <- I/(length(g1) + length(g2) - I)
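For reuse, the definition J(A, B) = |A ∩ B| / |A ∪ B| can be wrapped in a small function. This is a sketch, not part of any package: the name jaccard is hypothetical, and returning 0 when both inputs are empty is an assumed convention (some authors define it as 1). Note that |A ∪ B| = |A| + |B| - |A ∩ B| once duplicates are removed, so this matches the formula above.

```r
# Hypothetical helper: Jaccard index of two character vectors.
# Duplicates are dropped so each gene symbol counts once.
jaccard <- function(a, b) {
  a <- unique(a)
  b <- unique(b)
  n_union <- length(union(a, b))
  # Assumption: two empty inputs give an index of 0
  if (n_union == 0) return(0)
  length(intersect(a, b)) / n_union
}

g1 <- c("TP53", "BRCA1", "EGFR", "TP53")
g2 <- c("EGFR", "MYC", "TP53", "KRAS")
print(jaccard(g1, g2))  # 2 shared / 5 distinct = 0.4
```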
This looks good; can we promote this to an answer?
Sure, no problem.
Hi Jean,
Thanks for the nice solution, it worked :)
Nitin