Compare data from two different table using R
2
1
Entering edit mode
5.5 years ago
ersan ▴ 10

I want to compare data two different table like that using R.

Table 1  
X1     
a;b
c
d;e

Table 2
Y1
a    uuu   uu
b    vvv   vv
c    xxx   xx
d    yyy   yy
 e    zzz   zz

 Ouptput Table 3
X1     Y2
a;b    uuu|uu<\br>vvv|vv
c      xxx|xx
d;e    yyy|yy<\br>zzz|zz
R merge • 3.5k views
ADD COMMENT
5
Entering edit mode
5.5 years ago
Chirag Parsania ★ 2.0k

Here is the dplyr way. Make sure that you provide column names as given in t1 and t2

    library(tidyverse)
    t1 <- tibble(X1 = c("a;b","c","d;e"))
    t2 <- tibble(Y1 =letters[1:5] , Y2 = c("uuu","vvv","xxx","yyy","zzz") ,Y3 =c("uu","vv","xx","yy","zz"))


    t3 <- t1 %>% 
      mutate(mm = map(X1, ~(strsplit(.,split = ";"))[[1]] %>% as_tibble)) %>% 
      mutate(nn  = map(mm , function(.){
        col_binds <- left_join(.,t2, by = c("value" = "Y1")) %>% 
          unite(col = "comb",sep = "|" , Y2:Y3)
        row_binds <- col_binds %>% 
          summarise(value  = paste0(value , collapse = ";")  , times = paste0(comb , collapse = "<\br>"))
        return(row_binds)
      })) %>% 
      select(nn) %>%
      unnest() %>% 
      dplyr::rename(X1 = value , Y2 = times) 

## output
t3
# A tibble: 3 x 2
  X1    Y2                 
  <chr> <chr>              
1 a;b   "uuu|uu<\br>vvv|vv"
2 c     xxx|xx             
3 d;e   "yyy|yy<\br>zzz|zz"
ADD COMMENT
0
Entering edit mode

Very thanks Chirag, i am receiving an error : Error in mutate_impl(.data, dots) : Evaluation error: NA/NaN argument.

ADD REPLY
0
Entering edit mode

I am receiving another error:

Evaluation error: Can't join on 'Y1' x 'value' because of incompatible types (integer / character).
ADD REPLY
1
Entering edit mode

What is the datatype of column Y1 in t2 ? In the sample data you provided, Y1 is character. Error suggest that your Y1 is integer. can you upload full data, if possible ? Without data its hard to predict.

ADD REPLY
0
Entering edit mode

I changed datatype of Y1 integer to character. I am now giving error

Error in mutate_impl(.data, dots) : 
  Evaluation error: `by` can't contain join column `Y1` which is missing from RHS.
ADD REPLY
0
Entering edit mode

Data X1 Data Y

ADD REPLY
1
Entering edit mode

Without any error, it is working perfectly fine in my computer

t1 <- read_delim("~/Downloads/X1.txt" ,delim = "\t") %>%filter(X1 !=".") ## remove the lines containing dots
t2 <- read_delim("~/Downloads/Y.txt" ,delim = "\t") %>% mutate_all(as.character) ## convert all cols to char

t3 <- t1 %>% 
  mutate(mm = map(X1, ~(strsplit(.,split = ";"))[[1]] %>% as_tibble)) %>% 
  mutate(nn  = map(mm , function(.){
    col_binds <- left_join(.,t2, by = c("value" = "Y1")) %>% 
      unite(col = "comb",sep = "|" , Y2:Y3)
    row_binds <- col_binds %>% 
      summarise(value  = paste0(value , collapse = ";")  , times = paste0(comb , collapse = "<\br>"))
    return(row_binds)
  })) %>% 
  select(nn) %>%
  unnest() %>% 
  dplyr::rename(X1 = value , Y2 = times) 

> t3
# A tibble: 8,300 x 2
   X1     Y2                                                       
   <chr>  <chr>                                                    
 1 616126 Immunodeficiency 38 with basal ganglia calcification|[AR]
 2 616126 Immunodeficiency 38 with basal ganglia calcification|[AR]
 3 616126 Immunodeficiency 38 with basal ganglia calcification|[AR]
 4 616126 Immunodeficiency 38 with basal ganglia calcification|[AR]
 5 616126 Immunodeficiency 38 with basal ganglia calcification|[AR]
 6 616126 Immunodeficiency 38 with basal ganglia calcification|[AR]
 7 615120 Myasthenic syndrome, congenital, 8|[AR]                  
 8 615120 Myasthenic syndrome, congenital, 8|[AR]                  
 9 615120 Myasthenic syndrome, congenital, 8|[AR]                  
10 615120 Myasthenic syndrome, congenital, 8|[AR]                  
# ... with 8,290 more rows
ADD REPLY
1
Entering edit mode

I notice many duplicated rows. adding t3 %>% unique() will give you unique rows

ADD REPLY
0
Entering edit mode

I is worked at me, too. It is wonderfull. Thank you very very much!..

ADD REPLY
0
Entering edit mode

Could I add per X1 to Y2? Output like that:

  X1    Y2                 
  <chr> <chr>              
1 a;b   aa|uuu|uu<\br>bb|vvv|vv
2 c     cc|xxx|xx             
3 d;e   dd|yyy|yy<\br>ee|zzz|zz
ADD REPLY
0
Entering edit mode

I found a way. It is working. Thanks for all.

summarise(value  = paste0(value , collapse = ";")  , times = paste0(str_split(value, ";") comb, collapse = '<br />'))
ADD REPLY
0
Entering edit mode

I have a same problem like that? Can you help me please? Merge two comma seperated column in one frame in R

ADD REPLY

Login before adding your answer.

Traffic: 2035 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6