How to perform a statistical test on a 3-way Venn diagram
9 months ago
ivingan • 0

I would like to perform a statistical test on a 3-way Venn diagram. I have 3 RNA-seq data sets whose overlap I wish to examine. I generated a 3-way Venn diagram for these data with the software "VennPlex", which stratifies the diagram by up- and down-regulation: each sector shows one value for upregulated genes and one for downregulated genes, and overlapping sectors show a third value for contra-regulated genes (shared genes regulated in opposite directions in the overlapping tables).

My research has led me to think that a hypergeometric test is the best approach for performing statistics on a Venn diagram. The R package "GeneOverlap" is built to do just that, but it can only test the overlap between two tables. That means I can run multiple tests on the pairwise overlaps among the 3 tables, but each test only addresses the intersection of two tables at a time; it gives me no adequate way to test the 3-way intersection sector of the diagram.

My questions are:

  1. Is the hypergeometric test the appropriate test for assessing overlap in a Venn diagram?
     1a. If yes, what is the appropriate background value to use for this test? Is it the size of the whole mouse genome, or only the total number of significant genes identified for each comparison?
  2. If I am on the right track, how would I perform the test appropriately on all 4 possible overlapping sectors?
  3. After testing all these hypotheses, do I also need to adjust the p-values for multiple hypothesis testing?
Statistics Transcriptomics • 973 views
9 months ago
LChart 3.9k

What you call a "3-way Venn diagram" was referred to in the early literature as a "2x2x2" contingency table, and the hypergeometric test is the appropriate test in this case; it is implemented explicitly for 2x2x2 contingency tables by the R package hypergea. Today this falls under the general heading of "multi-way" or "n-way" contingency tables (if, for instance, you are interested in 2x2xK or 2x2x2x2 generalizations).
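To make the correspondence concrete, here is a minimal Python sketch of how three gene sets map onto the 8 cells of a 2x2x2 table. The gene identifiers, the set names and the helper `two_by_two_by_two` are invented for illustration, and the sketch assumes all three lists share one background; it only builds the table and does not reproduce what hypergea computes.

```python
from itertools import product


def two_by_two_by_two(set_a, set_b, set_c, background):
    """Count background genes in each of the 8 in/out membership combinations.

    Keys are (in_a, in_b, in_c) tuples of 0/1, so (1, 1, 1) is the central
    3-way sector of the Venn diagram and (0, 0, 0) is everything outside it.
    """
    counts = {combo: 0 for combo in product((0, 1), repeat=3)}
    for gene in background:
        key = (int(gene in set_a), int(gene in set_b), int(gene in set_c))
        counts[key] += 1
    return counts


# Toy data (hypothetical gene IDs, not from the question).
background = {f"g{i}" for i in range(1, 11)}
de_a = {"g1", "g2", "g3", "g4"}
de_b = {"g2", "g3", "g5"}
de_c = {"g3", "g4", "g5", "g6"}

table = two_by_two_by_two(de_a, de_b, de_c, background)
for combo, n in sorted(table.items()):
    print(combo, n)
```

The (0, 0, 0) cell is the part a Venn diagram does not draw, and it is exactly where the choice of background enters the test.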

9 months ago
Papyrus ★ 2.9k

Take a look at the R package SuperExactTest, which implements testing for multi-set intersections. It will let you test the intersection across N sets, and it also gives you pairwise intersection tests between individual sets.

Defining the background is sometimes non-trivial; here it would probably be the background set of genes from which the differential genes come. Sometimes there are mild differences between the backgrounds (e.g. if you have 3 different RNA-seq datasets and did not filter them to their common genes); you could filter your sets to a background common to all datasets.
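As a rough illustration of the common-background idea, the Python sketch below restricts two differential gene lists to the genes present in every dataset and then computes a one-sided hypergeometric p-value for their overlap. All gene IDs and names (`tested_by_dataset`, `hypergeom_upper_tail`) are hypothetical, and treating the shared genes as the universe is an assumption of this sketch, not something GeneOverlap or SuperExactTest does for you.

```python
from math import comb


def hypergeom_upper_tail(overlap, size_a, size_b, universe):
    """P(X >= overlap) for the overlap of two random gene sets.

    X is the number of genes from a set of size_a that appear in a random
    draw of size_b genes from a universe of `universe` genes.
    """
    total = comb(universe, size_b)
    upper = min(size_a, size_b)
    tail = sum(
        comb(size_a, k) * comb(universe - size_a, size_b - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Toy data: genes present in each of three datasets (hypothetical IDs).
tested_by_dataset = [
    {f"g{i}" for i in range(1, 13)},
    {f"g{i}" for i in range(1, 11)},
    {f"g{i}" for i in range(1, 12)},
]
# Assumption: the universe is the set of genes shared by all datasets.
common_background = set.intersection(*tested_by_dataset)

# Restrict each differential list to the common background before testing.
de_a = {"g1", "g2", "g3", "g4", "g11"} & common_background
de_b = {"g2", "g3", "g5"} & common_background

p_value = hypergeom_upper_tail(
    len(de_a & de_b), len(de_a), len(de_b), len(common_background)
)
print(len(common_background), len(de_a & de_b), p_value)
```

Using a larger universe (for example the whole genome) with the same lists would make the same overlap look more significant, which is why the background choice matters.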

Alternatively, you could consider an empirical solution based on permutation: randomly sample sets of genes of the same sizes as your differential gene sets and measure their 3-way intersection. Repeat this e.g. 1000 times to generate an empirical null distribution against which to assess the significance of your observed intersection. This permutation strategy also handles the problem of the genes coming from different backgrounds with different sizes.
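A minimal Python sketch of that permutation idea follows. The function name, the toy gene IDs and the uniform sampling are assumptions made for illustration: each random set is drawn from its own dataset's background, and every gene is treated as equally likely to be picked, which real RNA-seq data may not satisfy.

```python
import random


def permutation_pvalue(de_sets, backgrounds, n_perm=1000, seed=0):
    """Empirical p-value for the size of an N-way intersection.

    For each permutation, draw one random gene set per dataset (same size as
    the observed differential set, sampled from that dataset's background)
    and record whether the random intersection is at least as large as the
    observed one.
    """
    rng = random.Random(seed)
    observed = len(set.intersection(*de_sets))
    pools = [sorted(bg) for bg in backgrounds]  # sorted for reproducibility
    hits = 0
    for _ in range(n_perm):
        sampled = [
            set(rng.sample(pool, len(de))) for pool, de in zip(pools, de_sets)
        ]
        if len(set.intersection(*sampled)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Toy data (hypothetical): 200 background genes, three lists of 20 genes
# that share 10 genes.
background = {f"g{i}" for i in range(200)}
shared = {f"g{i}" for i in range(10)}
de_a = shared | {f"g{i}" for i in range(10, 20)}
de_b = shared | {f"g{i}" for i in range(20, 30)}
de_c = shared | {f"g{i}" for i in range(30, 40)}

observed, p_emp = permutation_pvalue(
    [de_a, de_b, de_c], [background, background, background]
)
print(observed, p_emp)
```

With 1000 permutations the smallest attainable p-value is 1/1001, so more permutations are needed if the p-values will later be adjusted across many tests.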

Regarding multiple testing correction: if you are going to run many pairwise comparisons, IMO it is probably better to perform the correction.
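As one possible way to do the correction, the Python sketch below applies the Benjamini-Hochberg procedure to a handful of p-values, such as one per overlapping sector. The choice of Benjamini-Hochberg and the example p-values are assumptions for illustration; Bonferroni or another method may suit your design better.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values for the three pairwise overlaps and the 3-way overlap.
raw = [0.01, 0.04, 0.03, 0.20]
adj = benjamini_hochberg(raw)
print(adj)
```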
