How do I find the genes that are common to, and those that differ between, multiple lists of DE genes?
3.4 years ago
peter.berry5

I have multiple dataframes, each representing a different experimental condition.

What I want is a list of:

a) the names of the genes which differ between the dataframes and b) the names of the genes which are common to the dataframes.

I know I can do this using the syntax described here: "http://www.cookbook-r.com/Manipulating_data/Comparing_data_frames/", but I am already using the VennDiagram package in R to illustrate visually how many genes are a) common to and b) different between the dataframes.

It occurred to me that what I want is probably calculated by the VennDiagram package in order to be able to draw the venn diagram.

Does anybody know whether it is, and if so, whether it can be extracted?

Thanks


There are a ton of ways to do this in base R, but the dplyr syntax for it is a bit simpler. You can use dplyr::anti_join to return the rows of one data.frame that have no match in a given column of another data.frame. To get the rows of a data.frame that do have matches in a column of another data.frame, use dplyr::semi_join.
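For a single shared gene column, these two filtering joins boil down to a membership test. The sketch below uses base R's %in% on two made-up data frames (cond_a, cond_b and the gene column are hypothetical) and notes in comments which dplyr call keeps the same rows; with dplyr installed, semi_join(cond_a, cond_b, by = "gene") and anti_join(cond_a, cond_b, by = "gene") are the direct equivalents.

```r
# Two hypothetical DE result tables; only the 'gene' column matters here.
cond_a <- data.frame(gene = c("TP53", "MYC", "EGFR"),
                     log2FC = c(1.2, -0.8, 2.1), stringsAsFactors = FALSE)
cond_b <- data.frame(gene = c("MYC", "EGFR", "KRAS"),
                     log2FC = c(-1.1, 1.9, 0.7), stringsAsFactors = FALSE)

# Rows of cond_a whose gene also appears in cond_b
# (the rows dplyr::semi_join(cond_a, cond_b, by = "gene") would keep)
shared_rows <- cond_a[cond_a$gene %in% cond_b$gene, ]

# Rows of cond_a whose gene does NOT appear in cond_b
# (the rows dplyr::anti_join(cond_a, cond_b, by = "gene") would keep)
only_a_rows <- cond_a[!(cond_a$gene %in% cond_b$gene), ]

print(shared_rows$gene)  # "MYC" "EGFR"
print(only_a_rows$gene)  # "TP53"
```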


@rpolicastro Thanks for the reply. That's exactly what I ended up doing, and it worked perfectly. However, unless I am missing something, anti_join only lets me compare two dataframes at a time. When I started analysing the full dataset, I ended up comparing 4 dataframes to each other, which I didn't anticipate; it involved quite a bit of code and is messy.

Hence the question about the VennDiagram package, as it must be performing these kinds of comparisons to draw the Venn diagram.
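Going beyond two dataframes does not necessarily need chained joins: once the gene names are pulled out as character vectors, base R's Reduce() can fold intersect() or union() across any number of them. A minimal sketch, assuming four hypothetical condition vectors (in practice these would be e.g. df$gene from each table):

```r
# Hypothetical gene vectors, one per experimental condition
gene_lists <- list(
  cond1 = c("TP53", "MYC", "EGFR", "KRAS"),
  cond2 = c("MYC", "EGFR", "BRCA1"),
  cond3 = c("EGFR", "MYC", "PTEN"),
  cond4 = c("MYC", "EGFR", "TP53", "PTEN")
)

# Genes present in every list: apply intersect() pairwise across all lists
common_all <- Reduce(intersect, gene_lists)   # "MYC" "EGFR"

# Every gene seen in at least one list
all_genes <- Reduce(union, gene_lists)

# Genes missing from at least one list, i.e. not shared by all of them
not_common <- setdiff(all_genes, common_all)  # "TP53" "KRAS" "BRCA1" "PTEN"
```

Note that Reduce(intersect, ...) only answers "present in every list"; genes shared by some but not all lists need a per-gene membership table.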


You can also check out this very nice solution implemented in R:

https://github.com/hms-dbmi/UpSetR

which enables complex comparisons and nice visual representations.
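UpSet plots are built from a gene-by-list presence/absence table, and that same table is a convenient way to inspect overlaps directly. As far as I know, UpSetR's fromList() builds such a 0/1 table from a named list of vectors; the base-R sketch below constructs an equivalent matrix by hand for three hypothetical lists and counts genes per membership pattern.

```r
# Hypothetical gene lists
gene_lists <- list(
  cond1 = c("TP53", "MYC", "EGFR"),
  cond2 = c("MYC", "EGFR", "BRCA1"),
  cond3 = c("EGFR", "PTEN")
)
all_genes <- unique(unlist(gene_lists, use.names = FALSE))

# One row per gene, one 0/1 column per list (1 = gene is in that list)
membership <- sapply(gene_lists, function(g) as.integer(all_genes %in% g))
rownames(membership) <- all_genes

# Label each gene by its pattern, e.g. "1-1-1" = present in all three lists
pattern <- apply(membership, 1, paste, collapse = "-")

# Number of genes per membership pattern
table(pattern)
```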


This is a visualization tool that comes after OP understands the basics of their own dataset. As such, I don't think this qualifies as an answer.

3.4 years ago
Michael

You can simply use the set operations from base R (generic versions also exist in BiocGenerics and some other packages).

see ?sets

union(x, y)
intersect(x, y)
setdiff(x, y)

Provide your genes as vectors of gene names taken from your data.frames. For genes that differ between x and y, use setdiff(x, y) (genes in x but not in y) or setdiff(y, x) (genes in y but not in x). For the common genes, use intersect(x, y).

> x <- c('a','b','c')
> y <- c('b','d','c')
> setdiff (x,y)
[1] "a"
> setdiff (y,x)
[1] "d"
> union (y,x)
[1] "b" "d" "c" "a"
> intersect (y,x)
[1] "b" "c"
> setdiff(union(x,y), intersect(x,y))
[1] "a" "d"

For Venn diagrams, also check out the vennr package (in addition to VennDiagram), which will do most of the work for you.
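To get the gene names behind each area of a Venn diagram, one option is to label every gene with the exact combination of lists it belongs to and then split by that label. This base-R sketch uses three hypothetical lists A, B and C; each resulting element is one exclusive Venn region (e.g. "A&B" holds genes in A and B but not in C). VennDiagram itself also has a calculate.overlap() helper that may return these groupings, but check ?calculate.overlap for how it labels the regions before relying on it.

```r
# Hypothetical gene lists
gene_lists <- list(
  A = c("TP53", "MYC", "EGFR", "KRAS"),
  B = c("MYC", "EGFR", "BRCA1"),
  C = c("EGFR", "PTEN", "KRAS")
)
all_genes <- unique(unlist(gene_lists, use.names = FALSE))

# For each gene, name the combination of lists it belongs to, e.g. "A&C"
region <- vapply(all_genes, function(g) {
  in_list <- vapply(gene_lists, function(s) g %in% s, logical(1))
  paste(names(gene_lists)[in_list], collapse = "&")
}, character(1))

# Each element is one exclusive Venn region: genes in exactly those lists
regions <- split(all_genes, region)
regions
```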


These are functions in base R, so BiocGenerics is unnecessary. If the lists are character vectors and the spelling matches, they work beautifully. If the user has lists of mismatched IDs (e.g. gene vs. protein identifiers) or factor levels, they won't work, and some situation-specific conversion is needed. I think it's a good solution in the long run, though it may need further bioinformatics work for the user to adapt it to his or her needs.
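A quick illustration of the mismatch problem: the same genes written with different case or trailing whitespace do not intersect until they are harmonised. The normalise_ids() helper below is hypothetical and assumes human gene symbols (upper-case by convention); the as.character() call is only a defensive step in case IDs arrive as a factor. Mapping between ID types (e.g. gene to protein) needs an annotation resource and is not shown.

```r
# Hypothetical lists naming overlapping genes, but spelled inconsistently
list1 <- factor(c("TP53", "myc ", "EGFR"))
list2 <- c("tp53", "MYC", "KRAS")

# Without clean-up nothing matches
raw_overlap <- intersect(list1, list2)

# Hypothetical helper: coerce to character, trim whitespace, upper-case
normalise_ids <- function(ids) toupper(trimws(as.character(ids)))

clean_overlap <- intersect(normalise_ids(list1), normalise_ids(list2))
clean_overlap  # "TP53" "MYC"
```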


I agree there can be problems with curation and mismatched identifiers; I assumed the user has curated the input data properly. How much of an issue this is depends on the processing pipeline. It is better to stick with unique IDs such as Ensembl gene or transcript IDs, not to mix different types of names and IDs, and not to transfer data back and forth between R and Excel sheets. Always good to raise awareness.
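Even with Ensembl IDs, one pitfall is that some pipelines keep the version suffix (e.g. a trailing ".17") while others drop it, so the same gene fails to match. A minimal sketch, assuming illustrative IDs (the version numbers are made up) and that comparing on the unversioned stem is acceptable for the analysis; strip_version() is a hypothetical helper:

```r
# Illustrative Ensembl gene IDs: one pipeline kept version suffixes
ids_pipeline1 <- c("ENSG00000141510.17", "ENSG00000136997.21", "ENSG00000146648.18")
ids_pipeline2 <- c("ENSG00000141510", "ENSG00000136997", "ENSG00000133703")

# Drop a trailing ".<digits>" version so IDs compare on the stable part only
strip_version <- function(ids) sub("\\.[0-9]+$", "", ids)

shared_ids <- intersect(strip_version(ids_pipeline1), strip_version(ids_pipeline2))
shared_ids
```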
