Finding unique and identical rows between multiple columns of a data frame in R
18 months ago
salman_96 ▴ 70

Hi,

I am working on a large data frame with 4 columns, each holding a different number of entries. I am trying to find the unique and the shared pathways between the 4 columns (each column represents a particular day of treatment with a drug). Below is a small example.

Pathways_1Day <- c("blood","kidney","testis","No","bone","liver","intestine","lungs","ABC","pancreas","Yes")

Pathways_2Day <- c("blood","kidney","testis","eyes","bone","cells","intestine","cervix","ABC","pancreas","None")

Pathways_3Day <- c("blood","kidney","vessels","lymph","t-cells","liver","intestine","lungs","ABC","epidermis","None")

df<-data.frame(Pathways_1Day,Pathways_2Day,Pathways_3Day) 

I want to get a summary of the number of pathways that are common between the different timepoints (1, 2 and 3 days).

Important: The number of pathways is not the same for each day.

I have tried this:

All_pathwayNames <- df%>%group_by_all%>%count

But this does not give the output I am after.

There may be different ways to address this. It would be great if I could get matching rows in front of each other across all columns.

Regards

What is the expected output for this data?

18 months ago
zx8754 11k

Use Reduce to intersect multiple vectors:

Reduce(intersect, list(Pathways_1Day, Pathways_2Day, Pathways_3Day))
# [1] "blood"     "kidney"    "intestine" "ABC" 

Related StackOverflow post: How to find common elements from multiple vectors?
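
Since the question asks for a summary of how many pathways are shared between timepoints, a pairwise count may also help. The following is a minimal base R sketch, assuming the three vectors from the question are collected in a named list. The names `pathways` and `overlap` are only illustrative. Because it works on a list, the vectors do not need to have the same length.

```r
Pathways_1Day <- c("blood","kidney","testis","No","bone","liver","intestine","lungs","ABC","pancreas","Yes")
Pathways_2Day <- c("blood","kidney","testis","eyes","bone","cells","intestine","cervix","ABC","pancreas","None")
Pathways_3Day <- c("blood","kidney","vessels","lymph","t-cells","liver","intestine","lungs","ABC","epidermis","None")

# Illustrative container: one vector per day (lengths may differ)
pathways <- list(Day1 = Pathways_1Day, Day2 = Pathways_2Day, Day3 = Pathways_3Day)

# Number of shared pathways for every pair of days;
# the diagonal holds the number of distinct pathways within each day
overlap <- sapply(pathways, function(a)
  sapply(pathways, function(b) length(intersect(a, b))))
overlap
```

For the example data, Day1 and Day2 share 7 pathways, Day1 and Day3 share 6, and Day2 and Day3 share 5.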

18 months ago
Basti ★ 2.0k

It seems you may need an UpSet plot: https://github.com/hms-dbmi/UpSetR
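
An UpSet plot is drawn from a presence/absence table of pathways by day. As a rough sketch, that table can be built in base R from the question's vectors. The names `pathways`, `all_names` and `membership` are illustrative, and the plotting call is left as a comment because it assumes the UpSetR package is installed.

```r
pathways <- list(
  Day1 = c("blood","kidney","testis","No","bone","liver","intestine","lungs","ABC","pancreas","Yes"),
  Day2 = c("blood","kidney","testis","eyes","bone","cells","intestine","cervix","ABC","pancreas","None"),
  Day3 = c("blood","kidney","vessels","lymph","t-cells","liver","intestine","lungs","ABC","epidermis","None")
)

# Every distinct pathway name seen on any day
all_names <- sort(unique(unlist(pathways)))

# 1 if the pathway is present on that day, 0 otherwise
membership <- as.data.frame(
  sapply(pathways, function(p) as.integer(all_names %in% p))
)
rownames(membership) <- all_names
membership

# With UpSetR installed, the plot itself could be drawn from the list:
# UpSetR::upset(UpSetR::fromList(pathways))
```

Rows of `membership` that sum to 3 are the pathways present on all three days.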

18 months ago
Trivas ★ 1.7k

Not the most elegant, but you can do something like this:

Pathways_1Day[Pathways_1Day %in% Pathways_2Day & Pathways_1Day %in% Pathways_3Day]

[1] "blood"     "kidney"    "intestine" "ABC"

18 months ago

As far as your example is concerned, this will produce a sparse data.frame with matching rows in front of each other across all columns.

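# All distinct pathway names across every column, sorted
pathways_combined <- sort(unique(unlist(df)))

# For each column, keep a name on its row if the column contains it, NA otherwise
df2 <-
  as.data.frame(apply(df, 2, function(x, y) {
    y <- factor(y,levels=c(y,NA))
    y[!is.element(y, x)] <- NA
    return(y)
  }, y = pathways_combined))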

But if your pathway vectors have different lengths in the first place, you probably cannot start from a data.frame; you will need to loop over a list with lapply instead.
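
A minimal sketch of that list-based variant, assuming the pathways are stored as a named list of character vectors. The input below is made up for illustration and deliberately uses different lengths; `pathways` and `aligned` are hypothetical names.

```r
# Hypothetical input: one vector per day, of different lengths
pathways <- list(
  Pathways_1Day = c("blood", "kidney", "testis", "liver"),
  Pathways_2Day = c("blood", "kidney", "eyes"),
  Pathways_3Day = c("blood", "vessels", "liver", "lymph", "kidney")
)

# All distinct pathway names across the list, sorted
pathways_combined <- sort(unique(unlist(pathways)))

# One column per day: the pathway name where present, NA where absent,
# so the same pathway always sits on the same row
aligned <- as.data.frame(
  lapply(pathways, function(x) {
    ifelse(pathways_combined %in% x, pathways_combined, NA)
  }),
  stringsAsFactors = FALSE
)
aligned
```

Rows without any NA are the pathways shared by all days, so `aligned[complete.cases(aligned), ]` would list them.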
