Question

Significance between two sets of DEGs from same species

0

6.1 years ago

sakuraazalea ▴ 20

I am analyzing honeybee RNA-seq data from two different studies.

Study 1 had 15,314 genes total with 118 DEGs. Study 2 had 11,825 genes total with 740 DEGs. There was an overlap of 67 between the two sets of DEGs.

I want to test whether this overlap is significant. One approach I have seen is Fisher's exact test (https://rdrr.io/bioc/GeneOverlap/man/GeneOverlap.html). I am fairly sure I need to set up a 2x2 table but am unclear on the values, especially the first value, Q, below. I believe Q should equal N-(740+118-67), but I am unsure which N to use, as the two studies have different total gene numbers (15,314 and 11,825).

fisher.test(matrix(c(Q, 740-67, 118-67, 67), nrow=2), alternative="greater")

What values should I use in this case? Thank you for any advice.

fisher.exact RNA-Seq • 1.4k views

updated 6.1 years ago by Nicolas Rosewick 11k • written 6.1 years ago by sakuraazalea ▴ 20

0

The link you provided doesn't work. Fisher's exact test is typically set up from a 2x2 contingency table. I would suggest making sure you understand that first, then looking at Fisher's exact test.

6.1 years ago by kpr ▴ 80

score 0 · Answer 1 · 2018-03-30

0

6.1 years ago

Carlo Yague 8.7k

You should first clean up each dataset by removing every gene that is not present in both studies. This can change the number of DEGs identified in each dataset. Then N is the number of genes tested in both studies.
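A minimal sketch of that clean-up in R, assuming you have the full list of tested gene IDs and the DEG IDs for each study. All gene IDs and variable names below are made up for illustration. Simply intersecting the lists is a simplification: re-running the differential expression analysis on the shared genes could change the DEG calls further.

```r
# Toy data: gene IDs and DEG calls are invented for illustration only.
tested1 <- paste0("gene", 1:100)            # genes tested in study 1
tested2 <- paste0("gene", 41:160)           # genes tested in study 2
deg1 <- paste0("gene", c(45:60, 5:10))      # DEGs reported by study 1
deg2 <- paste0("gene", c(50:75, 120:130))   # DEGs reported by study 2

# Keep only genes tested in both studies; this is the shared background.
background <- intersect(tested1, tested2)
N <- length(background)

# Drop DEGs that fall outside the shared background.
deg1 <- intersect(deg1, background)
deg2 <- intersect(deg2, background)

# Cross-tabulate DEG status in each study over the shared background.
tab <- table(
  study1_DEG = factor(background %in% deg1, levels = c(FALSE, TRUE)),
  study2_DEG = factor(background %in% deg2, levels = c(FALSE, TRUE))
)

# One-sided test: is the overlap larger than expected by chance?
res <- fisher.test(tab, alternative = "greater")
print(tab)
print(res$p.value)
```

Building the table with `table()` over the shared background means the four cells always sum to N, so Q does not have to be computed by hand.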

score 0 · Answer 2 · 2018-03-30

You should use the total number of genes in the annotation you used for the analysis. Did you rerun both studies through the same analysis workflow with the same annotation, or did you just take the results from the publications? In the first case, use the total number of genes in your annotation and perform a Fisher test as you described in your question.

fisher.test(matrix(c(Q, 740-67, 118-67, 67), nrow=2), alternative="greater")

In the second case, you could perhaps use the union of the 15,314 and 11,825 gene lists. Better still, rerun the analysis yourself so that both datasets are processed in the same way, which avoids analysis bias.
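Since the size of the union is not given in the thread, here is a rough sketch of how the result depends on N, using the counts from the question and Q = N-(740+118-67). The helper `overlap_table` is a hypothetical name. The two candidate values of N are only the bounds on the union size (15,314 if one gene list contains the other, 15,314 + 11,825 if they share nothing), not the real figure.

```r
# Counts taken from the question.
n_deg1  <- 118  # DEGs in study 1
n_deg2  <- 740  # DEGs in study 2
overlap <- 67   # DEGs shared by both lists

# Hypothetical helper: build the 2x2 table for a given background size N.
overlap_table <- function(N) {
  Q <- N - (n_deg1 + n_deg2 - overlap)  # genes that are DEGs in neither study
  matrix(c(Q, n_deg2 - overlap, n_deg1 - overlap, overlap), nrow = 2,
         dimnames = list(study2_DEG = c("no", "yes"),
                         study1_DEG = c("no", "yes")))
}

# Placeholder background sizes: lower and upper bounds on the union.
candidate_N <- c(union_lower_bound = 15314,
                 union_upper_bound = 15314 + 11825)

# One-sided Fisher test for each candidate background size.
p_values <- sapply(candidate_N, function(N) {
  fisher.test(overlap_table(N), alternative = "greater")$p.value
})
print(p_values)
```

Once the real union (or shared annotation) size is known, pass it to `overlap_table()` in place of the placeholder bounds.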