Question: Compare data from two different table using R
1
gravatar for ersan
4 weeks ago by
ersan10
ersan10 wrote:

I want to compare data two different table like that using R.

Table 1  
X1     
a;b
c
d;e

Table 2
Y1
a    uuu   uu
b    vvv   vv
c    xxx   xx
d    yyy   yy
 e    zzz   zz

 Ouptput Table 3
X1     Y2
a;b    uuu|uu<\br>vvv|vv
c      xxx|xx
d;e    yyy|yy<\br>zzz|zz
merge R • 261 views
ADD COMMENTlink modified 26 days ago by zx87545.7k • written 4 weeks ago by ersan10
5
gravatar for Chirag Parsania
4 weeks ago by
Chirag Parsania1.2k
University of Macau
Chirag Parsania1.2k wrote:

Here is the dplyr way. Make sure that you provide column names as given in t1 and t2

    library(tidyverse)
    t1 <- tibble(X1 = c("a;b","c","d;e"))
    t2 <- tibble(Y1 =letters[1:5] , Y2 = c("uuu","vvv","xxx","yyy","zzz") ,Y3 =c("uu","vv","xx","yy","zz"))


    t3 <- t1 %>% 
      mutate(mm = map(X1, ~(strsplit(.,split = ";"))[[1]] %>% as_tibble)) %>% 
      mutate(nn  = map(mm , function(.){
        col_binds <- left_join(.,t2, by = c("value" = "Y1")) %>% 
          unite(col = "comb",sep = "|" , Y2:Y3)
        row_binds <- col_binds %>% 
          summarise(value  = paste0(value , collapse = ";")  , times = paste0(comb , collapse = "<\br>"))
        return(row_binds)
      })) %>% 
      select(nn) %>%
      unnest() %>% 
      dplyr::rename(X1 = value , Y2 = times) 

## output
t3
# A tibble: 3 x 2
  X1    Y2                 
  <chr> <chr>              
1 a;b   "uuu|uu<\br>vvv|vv"
2 c     xxx|xx             
3 d;e   "yyy|yy<\br>zzz|zz"
ADD COMMENTlink written 4 weeks ago by Chirag Parsania1.2k

Very thanks Chirag, i am receiving an error : Error in mutate_impl(.data, dots) : Evaluation error: NA/NaN argument.

ADD REPLYlink written 4 weeks ago by ersan10

I am receiving another error:

Evaluation error: Can't join on 'Y1' x 'value' because of incompatible types (integer / character).
ADD REPLYlink modified 4 weeks ago • written 4 weeks ago by ersan10
1

What is the datatype of column Y1 in t2 ? In the sample data you provided, Y1 is character. Error suggest that your Y1 is integer. can you upload full data, if possible ? Without data its hard to predict.

ADD REPLYlink written 4 weeks ago by Chirag Parsania1.2k

I changed datatype of Y1 integer to character. I am now giving error

Error in mutate_impl(.data, dots) : 
  Evaluation error: `by` can't contain join column `Y1` which is missing from RHS.
ADD REPLYlink written 4 weeks ago by ersan10

Data X1 Data Y

ADD REPLYlink written 4 weeks ago by ersan10
1

Without any error, it is working perfectly fine in my computer

t1 <- read_delim("~/Downloads/X1.txt" ,delim = "\t") %>%filter(X1 !=".") ## remove the lines containing dots
t2 <- read_delim("~/Downloads/Y.txt" ,delim = "\t") %>% mutate_all(as.character) ## convert all cols to char

t3 <- t1 %>% 
  mutate(mm = map(X1, ~(strsplit(.,split = ";"))[[1]] %>% as_tibble)) %>% 
  mutate(nn  = map(mm , function(.){
    col_binds <- left_join(.,t2, by = c("value" = "Y1")) %>% 
      unite(col = "comb",sep = "|" , Y2:Y3)
    row_binds <- col_binds %>% 
      summarise(value  = paste0(value , collapse = ";")  , times = paste0(comb , collapse = "<\br>"))
    return(row_binds)
  })) %>% 
  select(nn) %>%
  unnest() %>% 
  dplyr::rename(X1 = value , Y2 = times) 

> t3
# A tibble: 8,300 x 2
   X1     Y2                                                       
   <chr>  <chr>                                                    
 1 616126 Immunodeficiency 38 with basal ganglia calcification|[AR]
 2 616126 Immunodeficiency 38 with basal ganglia calcification|[AR]
 3 616126 Immunodeficiency 38 with basal ganglia calcification|[AR]
 4 616126 Immunodeficiency 38 with basal ganglia calcification|[AR]
 5 616126 Immunodeficiency 38 with basal ganglia calcification|[AR]
 6 616126 Immunodeficiency 38 with basal ganglia calcification|[AR]
 7 615120 Myasthenic syndrome, congenital, 8|[AR]                  
 8 615120 Myasthenic syndrome, congenital, 8|[AR]                  
 9 615120 Myasthenic syndrome, congenital, 8|[AR]                  
10 615120 Myasthenic syndrome, congenital, 8|[AR]                  
# ... with 8,290 more rows
ADD REPLYlink written 4 weeks ago by Chirag Parsania1.2k
1

I notice many duplicated rows. adding t3 %>% unique() will give you unique rows

ADD REPLYlink written 4 weeks ago by Chirag Parsania1.2k

I is worked at me, too. It is wonderfull. Thank you very very much!..

ADD REPLYlink written 4 weeks ago by ersan10

Could I add per X1 to Y2? Output like that:

  X1    Y2                 
  <chr> <chr>              
1 a;b   aa|uuu|uu<\br>bb|vvv|vv
2 c     cc|xxx|xx             
3 d;e   dd|yyy|yy<\br>ee|zzz|zz
ADD REPLYlink written 4 weeks ago by ersan10

I found a way. It is working. Thanks for all.

summarise(value  = paste0(value , collapse = ";")  , times = paste0(str_split(value, ";") comb, collapse = '<br />'))
ADD REPLYlink modified 4 weeks ago • written 4 weeks ago by ersan10

I have a same problem like that? Can you help me please? Merge two comma seperated column in one frame in R

ADD REPLYlink written 4 weeks ago by ersan10
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1628 users visited in the last hour