Find mismatch in two columns in a data frame in R
6.6 years ago
dirranrak ▴ 20

Hi all,

I am quite new to R. I looked for an answer on many websites but did not find a clear way to solve my problem. I have a data frame with two columns, each holding a list of SNPs in more than 1000 rows, but the two columns do not have the same number of rows.

SNP1         SNP2
rs3094315    rs3094315
rs3131972    rs3131972
rs11240777   rs11240777
rs6681022    rs6681049
rs4970383    rs4970383
rs7537756    rs7537756
rs13302982

I ran match(df$SNP1, df$SNP2) and found the indices of the rows with an NA value, which are the mismatches. Now I want to get the rs# instead of the row indices. How can I do that?

Thank you

6.6 years ago
dan.shea ▴ 10

If I understand your question correctly, you want everything in df$SNP1 that is not in df$SNP2.

Small example using two vectors:

a <- c('a', 'b', 'c', 'd', 'e')
b <- c('a', 'b', 'd', 'e')

> a[a %in% b]
[1] "a" "b" "d" "e"
> a[!(a %in% b)]
[1] "c"


Read the R documentation on value matching, found at https://stat.ethz.ch/R-manual/R-devel/library/base/html/match.html

If you use %in%, you get back a logical vector of TRUE and FALSE values, one per element, that you can then use to subset the values in the column.
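
To make the link between %in% and the match() call from the question concrete, here is a small sketch on the same two vectors. The variable names in_b, idx, missing_via_in and missing_via_match are just illustrative. It shows that subsetting with the negated logical vector and subsetting where match() returned NA should select the same values.

```r
a <- c('a', 'b', 'c', 'd', 'e')
b <- c('a', 'b', 'd', 'e')

in_b <- a %in% b          # logical vector, one entry per element of a
print(in_b)
# [1]  TRUE  TRUE FALSE  TRUE  TRUE

idx <- match(a, b)        # position of each element of a in b, NA if absent
print(idx)
# [1]  1  2 NA  3  4

# Both routes pick out the values of a that have no match in b
missing_via_in    <- a[!in_b]
missing_via_match <- a[is.na(idx)]
print(missing_via_in)
# [1] "c"
```

So if you already have the NA positions from match(), indexing the original vector with is.na() on that result gives the values rather than the row numbers.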

Here is the same data as a data frame if it helps visualize what is going on:

> a <- c('a', 'b', 'c', 'd', 'e')
> b <- c('a', 'b', 'd', 'e', NA)
> a[!(a %in% b)]
[1] "c"
> ab <- data.frame(a,b)
> ab$a[!(ab$a %in% ab$b)]
[1] c
Levels: a b c d e
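
Applied to the SNP columns from the question, the same idea might look like the sketch below. It rests on two assumptions: that the unpaired rs13302982 belongs to SNP1, and that the shorter column is padded with NA. The names only_in_snp1, snp2 and only_in_snp2 are illustrative. Setting stringsAsFactors = FALSE keeps the IDs as plain character strings, so the result prints without the "Levels:" line seen above.

```r
# Example rows copied from the question. Padding SNP2 with NA and placing
# rs13302982 in SNP1 are assumptions about the original layout.
df <- data.frame(
  SNP1 = c("rs3094315", "rs3131972", "rs11240777", "rs6681022",
           "rs4970383", "rs7537756", "rs13302982"),
  SNP2 = c("rs3094315", "rs3131972", "rs11240777", "rs6681049",
           "rs4970383", "rs7537756", NA),
  stringsAsFactors = FALSE  # keep rs IDs as character, not factor
)

# rs IDs present in SNP1 but absent from SNP2
only_in_snp1 <- df$SNP1[!(df$SNP1 %in% df$SNP2)]
# only_in_snp1 is c("rs6681022", "rs13302982")

# The reverse direction: drop the NA padding first, then compare
snp2 <- df$SNP2[!is.na(df$SNP2)]
only_in_snp2 <- snp2[!(snp2 %in% df$SNP1)]
# only_in_snp2 is "rs6681049"
```

If you only need each unmatched ID once, setdiff(df$SNP1, df$SNP2) should give the same values with duplicates removed.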
Hi dan.shea, I have a similar problem. I want to extract the values that are unique to column A when compared with columns B, C, D, E and F, i.e. the gene names that are present in column A but not in any of the other five columns. Thanks.
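
One possible way to extend the %in% approach to several columns is to pool the comparison columns into a single vector first. This is only a sketch: the data frame genes, its column names A to F, and the gene names in it are made up for illustration.

```r
# Hypothetical data frame with gene names in six columns A..F,
# shorter columns padded with NA
genes <- data.frame(
  A = c("TP53", "BRCA1", "EGFR", "MYC"),
  B = c("TP53", "KRAS", NA, NA),
  C = c("PTEN", "TP53", NA, NA),
  D = c("MYC", NA, NA, NA),
  E = c("KRAS", "PTEN", NA, NA),
  F = c("TP53", NA, NA, NA),
  stringsAsFactors = FALSE
)

# Pool every value from the other five columns into one character vector
others <- unlist(genes[c("B", "C", "D", "E", "F")], use.names = FALSE)

# Genes in A that appear in none of B..F, each reported once
unique_to_A <- unique(genes$A[!(genes$A %in% others)])
# unique_to_A is c("BRCA1", "EGFR")
```
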

Hi dan.shea,

Thank you very much, you saved my day. I was using Excel and waited almost a day for the comparison because the data set is huge, so I tried to find out how to do it in R.

Thank you