Question

Jaccard similarity in R


9.6 years ago

Nitin ▴ 170

Hello,

I have the following two text files containing gene symbols:

Text file one: Cd5l Mcm6 Wdhd1 Serpina4-ps1 Nop58 Ugt2b38 Prim1 Rrm1 Mcm2 Fgl1

Text file two: Serpina4-ps1 Trib3 Alas1 Tsku Tnfaip2 Fgl1 Nop58 Socs2 Ppargc1b Per1 Inhba Nrep Irf1 Map3k5 Osgin1 Ugt2b37 Yod1

I want to compute the Jaccard similarity between them in R. For this purpose I used the sets package:

md1<-read.csv("T1.csv",sep=",",header = FALSE)
M1<-set(md1)

md2<-read.csv("T2.csv",sep=",",header = FALSE)
M2<-set(md2)

Sim1<-set_similarity(M1,M2, method="Jaccard")

But it gives a Jaccard coefficient of 0 (meaning no similarity), even though I know there is some overlap between the two text files. I am not able to figure out what the problem is. Can anybody suggest a solution, or is there another way to compute the Jaccard coefficient between two text files of gene symbols?

Thanks,

R • 17k views


score 5 · Accepted Answer · 2016-04-19


9.6 years ago

Jean-Karim Heriche 27k

In the example you give, the file contents look whitespace-separated, but you're reading them as comma-separated. So first, check that md1 and md2 contain what you expect them to. Second, you can check the intersection using the base R function intersect, e.g.:

intersect(md1$V1,md2$V1)
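As a rough sketch of that first check: the snippet below writes the two gene lists from the question to temporary files (stand-ins for T1.csv and T2.csv, assuming each file is a single whitespace-separated line), shows what a comma-separated read produces, and then reads the tokens with base R's scan(). The names f1, f2, bad, genes1, genes2 and shared are only for illustration.

```r
# Temporary stand-ins for T1.csv and T2.csv (assumed layout: one line,
# gene symbols separated by spaces).
f1 <- tempfile(fileext = ".txt")
f2 <- tempfile(fileext = ".txt")
writeLines("Cd5l Mcm6 Wdhd1 Serpina4-ps1 Nop58 Ugt2b38 Prim1 Rrm1 Mcm2 Fgl1", f1)
writeLines(paste("Serpina4-ps1 Trib3 Alas1 Tsku Tnfaip2 Fgl1 Nop58 Socs2",
                 "Ppargc1b Per1 Inhba Nrep Irf1 Map3k5 Osgin1 Ugt2b37 Yod1"), f2)

# Reading with sep = "," keeps the whole line as a single field.
bad <- read.csv(f1, sep = ",", header = FALSE, stringsAsFactors = FALSE)
dim(bad)

# scan() with what = character() splits on any whitespace, so it copes with
# symbols on one line or one per line.
genes1 <- scan(f1, what = character(), quiet = TRUE)
genes2 <- scan(f2, what = character(), quiet = TRUE)

length(genes1)
length(genes2)

# Gene symbols present in both files.
shared <- intersect(genes1, genes2)
shared
```

If your files really do have one symbol per line, read.csv with header = FALSE would also give a usable V1 column; the point is just to look at the object before building sets from it.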

Third, I suppose you're using the sets package. This package deals with sets of R objects, so set(md1) creates a set containing a single R object, the data frame md1 itself. What you probably meant is to create a set of gene names from md1, e.g.:

M1 <- as.set(md1$V1)
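To make the difference concrete, here is a small sketch using the gene lists from the question as plain character vectors. It assumes the sets package is installed (the code is skipped otherwise); df1, genes1, genes2 and sim are illustrative names, not anything from your script.

```r
genes1 <- c("Cd5l", "Mcm6", "Wdhd1", "Serpina4-ps1", "Nop58",
            "Ugt2b38", "Prim1", "Rrm1", "Mcm2", "Fgl1")
genes2 <- c("Serpina4-ps1", "Trib3", "Alas1", "Tsku", "Tnfaip2", "Fgl1",
            "Nop58", "Socs2", "Ppargc1b", "Per1", "Inhba", "Nrep", "Irf1",
            "Map3k5", "Osgin1", "Ugt2b37", "Yod1")

if (requireNamespace("sets", quietly = TRUE)) {
  # set() wraps its arguments as elements, so a data frame becomes ONE element.
  df1 <- data.frame(V1 = genes1, stringsAsFactors = FALSE)
  n_wrapped <- length(sets::set(df1))

  # as.set() turns the vector's values into the set's elements.
  M1 <- sets::as.set(genes1)
  M2 <- sets::as.set(genes2)
  n_elements <- length(M1)

  # Jaccard similarity between the two sets of gene symbols.
  sim <- sets::set_similarity(M1, M2, method = "Jaccard")
  print(c(n_wrapped, n_elements))
  print(sim)
}
```

With set(md1) and set(md2) you are comparing two one-element sets whose elements are different data frames, which would explain the 0 you got.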

I think using a package here is overkill; you can easily compute the Jaccard index yourself from its definition (size of the intersection divided by size of the union):

I <- length(intersect(md1$V1,md2$V1))
S <- I/(length(md1$V1)+length(md2$V1)-I)
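The two lines above count the union as length(A) + length(B) - I, which is only right if neither vector contains duplicates. A slightly more defensive sketch wraps the same definition in a helper; jaccard is a name I made up here, and the gene vectors are just the lists from the question typed in by hand.

```r
# Jaccard index of two vectors treated as sets: |A n B| / |A u B|.
# as.character() guards against factor columns; unique() drops repeated symbols.
jaccard <- function(a, b) {
  a <- unique(as.character(a))
  b <- unique(as.character(b))
  n_union <- length(union(a, b))
  if (n_union == 0) {
    return(NA_real_)  # undefined when both inputs are empty
  }
  length(intersect(a, b)) / n_union
}

genes1 <- c("Cd5l", "Mcm6", "Wdhd1", "Serpina4-ps1", "Nop58",
            "Ugt2b38", "Prim1", "Rrm1", "Mcm2", "Fgl1")
genes2 <- c("Serpina4-ps1", "Trib3", "Alas1", "Tsku", "Tnfaip2", "Fgl1",
            "Nop58", "Socs2", "Ppargc1b", "Per1", "Inhba", "Nrep", "Irf1",
            "Map3k5", "Osgin1", "Ugt2b37", "Yod1")

# 3 shared symbols out of 10 + 17 - 3 = 24 distinct ones.
jaccard(genes1, genes2)  # 0.125
```

Note that Ugt2b38 and Ugt2b37 are different symbols, so they do not count as overlap.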



This looks good, can we promote this to answer?

ADD REPLY • link 9.6 years ago by Michael 56k


Sure. No problem.

ADD REPLY • link 9.6 years ago by Jean-Karim Heriche 27k


Hi Jean,

Thanks for the nice solution, it worked out :)

Nitin

ADD REPLY • link 9.6 years ago by Nitin ▴ 170