Question: Significance between two sets of DEGs from same species
11 months ago, sakuraazalea wrote:

I am analyzing honeybee RNA-seq data from two different studies.

Study 1 had 15,314 genes total with 118 DEGs. Study 2 had 11,825 genes total with 740 DEGs. There was an overlap of 67 between the two sets of DEGs.

I want to test whether this overlap is significant. One approach I have seen is Fisher's exact test (https://rdrr.io/bioc/GeneOverlap/man/GeneOverlap.html). I am pretty sure I need to set up a 2x2 table, but I am unclear on the values, especially the first value, Q, below. I believe Q should equal N-(740+118-67), but I am unsure which value of N to use, as there are two different total gene numbers (15,314 and 11,825).

fisher.test(matrix(c(Q, 740-67, 118-67, 67), nrow=2), alternative="greater")

What values should I use in this case? Thank you for sharing advice.

rna-seq fisher.exact • 328 views

The link you provided doesn't work. Fisher's exact test is normally set up from a 2x2 contingency table. I would suggest making sure you understand the contingency table first, then moving on to the test itself.

written 11 months ago by kpr
11 months ago, Carlo Yague (Belgium) wrote:

You should first clean up each dataset by removing every gene that is not present in both studies. This can change the number of DEGs identified in each dataset. Then N = the number of genes tested in both studies.
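A minimal sketch of that clean-up in R, assuming you have the full list of tested gene IDs and the DEG IDs for each study as character vectors. The helper name `overlap_test` and the toy gene IDs are made up for illustration. Note that it only filters the existing DEG lists to the shared genes; it does not re-run the differential expression analysis on the reduced gene set.

```r
# Hypothetical helper: test the DEG overlap within the genes tested in BOTH studies.
overlap_test <- function(tested1, tested2, degs1, degs2) {
  universe <- intersect(tested1, tested2)  # N = genes tested in both studies
  d1 <- intersect(degs1, universe)         # study 1 DEGs kept after clean-up
  d2 <- intersect(degs2, universe)         # study 2 DEGs kept after clean-up

  # Fixed factor levels guarantee a 2x2 table even if one cell is empty.
  in1 <- factor(universe %in% d1, levels = c(FALSE, TRUE))
  in2 <- factor(universe %in% d2, levels = c(FALSE, TRUE))
  tab <- table(study1_DEG = in1, study2_DEG = in2)

  list(N = length(universe),
       table = tab,
       test = fisher.test(tab, alternative = "greater"))
}

# Toy data (invented IDs, not the honeybee data from the question)
tested1 <- paste0("gene", 1:100)
tested2 <- paste0("gene", 21:120)
degs1   <- paste0("gene", 15:34)
degs2   <- paste0("gene", 25:44)

res <- overlap_test(tested1, tested2, degs1, degs2)
res$N             # size of the shared gene universe
res$table         # 2x2 contingency table
res$test$p.value  # one-sided p-value for enrichment of the overlap
```

With real data, `tested1` and `tested2` would be the 15,314 and 11,825 gene lists, and the DEG counts after filtering may be smaller than 118 and 740.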

11 months ago, Nicolas Rosewick (Belgium, Brussels) wrote:

You should use the total number of genes in the annotation used for the analysis. Did you rerun both studies through the same workflow with the same annotation, or did you just take the results from the publications? In the first case, use the total number of genes in your annotation as N and perform Fisher's test as you described in your question:

fisher.test(matrix(c(Q, 740-67, 118-67, 67), nrow=2), alternative="greater")

In the second case, you could perhaps use the union of the 15,314 and 11,825 gene lists. Or, better, rerun the analysis yourself so that both datasets are processed in the same manner, which avoids analysis bias.
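Whichever N you settle on, the table can be filled in as sketched below. The function name `deg_overlap_fisher` is made up, and `N_assumed` is only a placeholder: the real value (annotation size, union, or intersection of the gene lists) has to come from your own gene lists and is not known from the numbers in the question.

```r
# Hypothetical helper: build the 2x2 table from summary counts and run the test.
#   N  = size of the gene universe (background)
#   n1 = DEGs in study 1, n2 = DEGs in study 2, k = DEGs shared by both
deg_overlap_fisher <- function(N, n1, n2, k) {
  Q <- N - (n1 + n2 - k)  # genes that are DEGs in neither study
  stopifnot(Q >= 0)
  m <- matrix(c(Q, n2 - k, n1 - k, k), nrow = 2,
              dimnames = list(study2_DEG = c("no", "yes"),
                              study1_DEG = c("no", "yes")))
  fisher.test(m, alternative = "greater")
}

N_assumed <- 11825  # PLACEHOLDER only; replace with your actual universe size
res <- deg_overlap_fisher(N_assumed, n1 = 118, n2 = 740, k = 67)
res$p.value

# The same one-sided p-value from the hypergeometric distribution:
# P(overlap >= 67) when drawing 118 genes from N, of which 740 are study 2 DEGs.
p_hyper <- phyper(67 - 1, 740, N_assumed - 740, 118, lower.tail = FALSE)
```

A larger N makes Q larger and the p-value smaller, so the choice of background matters and should be reported alongside the result.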
