Question: How to extract unique values between columns in R ?
0
11 months ago by
k.kathirvel93200
India
k.kathirvel93200 wrote:

I want to extract (as a file) only unique values from 6 columns( 1st column vs (2-6) column) in a data frame. The unique value from one column that does not present any other column. Thanks in advance.

``````    A         B         C         D          E       F

12         15        18        55         27      13
15         25        10        21         23      20
20         18        14        25         15      25
25         27        30        35         25      10
35                           15
``````

The output should looks like this...

``````   A          B         C         D         E          F
12          -         14        55        23        13
30        21
``````
R gene • 1.2k views
modified 11 months ago by jomo018480 • written 11 months ago by k.kathirvel93200

Could you please elaborate the expected output, because it is bit confusing.

None of the record in output is maintaining the order as input, neither rowise nor columnwise.

Sorry for the confusion i have modified the post now. Thanks.

Your output isn't a data-frame, it has different vector lengths for each entry. Could you confirm why 13 is in the output for `B` given that it was absent from the input; and why 35 was filtered out of column C given that it is absent from column A. You might be better to generate a data.frame with a column containinng the unique numbers, and a column indicating whether the number was observed in `A` .

1
11 months ago by
jomo018480
jomo018480 wrote:

Assuming the input structure is cast-ed into a data frame (`df`) by filling `NA`.

``````uu = unlist(df)[ ! duplicated(unlist(df),fromLast=T)  &  ! duplicated(unlist(df))]

apply(df,2,FUN=function(x){r=rep(NA,nrow(df)) ; d=duplicated(c(x,uu),fromLast=T)[1:nrow(df)] ; r[d]=x[d] ; return( r[order(r)] )})
``````

`uu` is a vector with unique values (exclusive unique as defined by OP). This requires `!duplicated` from left-to-right AND right-to-left.

`apply` returns items from each column which are "legal" (appear in `uu`). The final `order(r)` ensures `NA` are pushed down in each column of the resulting data frame.

0
11 months ago by
Alex Reynolds29k
Seattle, WA USA
Alex Reynolds29k wrote:

I think a problem is that the answer to this is not unique. For instance, this is also a "correct" answer based on your criteria:

``````A       B       C       D       E       F
12      15      10      55      27      20
25      14      21      23      13
18      35
30
``````

What gets filtered can depend on the order in which elements are added to a set (and subsequently tested for membership).

In any case, with that caveat, here is a Python script you could use to possibly generate this kind of result:

``````#!/usr/bin/env python

import sys

lidx = 0
allValues = set()
perColValues = None

for line in sys.stdin:
elems = line.rstrip().split('\t')
if lidx == 0:
perColValues = { x: set() for x in headers }
else:
for elemIdx, elem in enumerate(elems):
if len(elem) == 0:
continue
if elem not in allValues:
lidx += 1

maxCount = 0
if maxCount < l:
maxCount = l

# write output
for lidx in range(maxCount):
``````

Usage:

``````\$ python uniquify.py < in.mtx > out.mtx
``````
0
11 months ago by
India
``````df1 %>%
gather(k,v) %>%
mutate(k=as.factor(k))%>%
na.omit() %>%
group_by(v) %>%
filter(n() == 1) %>%
group_by(k)%>%
mutate(g = row_number()) %>%
select(-g) %>%
as.data.frame()

A B  C  D  E  F
1 12   14 55 23 13
2      30 21
``````