Confused about Fisher's test for GO enrichment
3.3 years ago
Biogeek

Hi, I'm running the fisher.test function in R.

My code is:

dfx <- input_fishers
res1 <- NULL
for (i in 1:nrow(dfx)) {
  table1 <- matrix(c(dfx[i,1], dfx[i,2], dfx[i,3], dfx[i,4]), ncol = 2, byrow = TRUE)
  p1 <- fisher.test(table1, alternative = "greater")$p.value
  res1 <- c(res1, p1)
}
dfx$fishers <- res1
x1 <- p.adjust(dfx$fishers, method = "BH", n = length(dfx$fishers))
dfx$p.adj <- x1
y1 <- dfx[dfx$p.adj < 0.05, ]
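
The loop above can also be written without growing the result vector on every iteration. This is a minimal sketch, not a replacement for the code as posted: the toy data frame reuses the first three rows of the input table in this post, and the name count_cols is introduced here for illustration only. It does not settle which counts belong in the 2x2 table.

```r
# Toy input: first three rows of the input table from this post.
dfx <- data.frame(
  GO.ID    = c("GO:0000003", "GO:0000041", "GO:0000122"),
  Test.Set = c(1, 1, 3),
  Test.Pop = c(274, 274, 274),
  Ref.Set  = c(16, 44, 265),
  Ref.Pop  = c(19634, 19634, 19634)
)

# Hypothetical helper name: the four count columns, selected by name
# rather than by position so the GO.ID column is never picked up.
count_cols <- c("Test.Set", "Test.Pop", "Ref.Set", "Ref.Pop")

# One p-value per row; vapply returns a numeric vector of fixed length.
dfx$fishers <- vapply(seq_len(nrow(dfx)), function(i) {
  tab <- matrix(unlist(dfx[i, count_cols]), ncol = 2, byrow = TRUE)
  fisher.test(tab, alternative = "greater")$p.value
}, numeric(1))

# Benjamini-Hochberg adjustment, then keep rows below the 0.05 cutoff.
dfx$p.adj <- p.adjust(dfx$fishers, method = "BH")
sig <- dfx[dfx$p.adj < 0.05, ]
```

Selecting columns by name avoids depending on whether GO.ID is stored as a column or as row names.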


My confusion mainly stems from the actual input.

My input matrix is set out as follows:

       GO.ID Test.Set Test.Pop Ref.Set Ref.Pop
1 GO:0000003        1      274      16   19634
2 GO:0000041        1      274      44   19634
3 GO:0000122        3      274     265   19634
4 GO:0000139       16      274     474   19634
5 GO:0000165        1      274     109   19634
6 GO:0000166       13      274    2654   19634


First column: number of differentially expressed (DE) genes that have the SPECIFIC GO term (row name: GO.ID).
Second column: total number of DE genes with ANY GO term.
Third column: number of genes in the entire transcriptome that have the SPECIFIC GO term (this includes the DE genes; row name: GO.ID).
Fourth column: total number of genes in the transcriptome with ANY GO term.

However, I'm having doubts. Should the matrix instead be:

       GO.ID DE.GO DE.NOTGO Exp.transcriptome.GO Exp.transcriptome.NOTGO
1 GO:0000003     1      273                   16                   19618
2 GO:0000041     1      273                   44                   19590
3 GO:0000122     3      271                  265                   19369
4 GO:0000139    16      258                  474                   19160
5 GO:0000165     1      273                  109                   19525
6 GO:0000166    13      261                 2654                   16980
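
For reference, the second layout follows from the first by subtraction. Below is a minimal sketch, assuming the column names shown in the two tables above; alt, disjoint and row_p are hypothetical names. The last table is an assumption rather than something established in this post: if the four cells of each 2x2 table are meant to be mutually exclusive, the DE genes would also have to be subtracted from the transcriptome counts.

```r
# First layout, as given in this post.
dfx <- data.frame(
  GO.ID    = c("GO:0000003", "GO:0000041", "GO:0000122",
               "GO:0000139", "GO:0000165", "GO:0000166"),
  Test.Set = c(1, 1, 3, 16, 1, 13),
  Test.Pop = rep(274, 6),
  Ref.Set  = c(16, 44, 265, 474, 109, 2654),
  Ref.Pop  = rep(19634, 6)
)

# Second layout: "with term" vs "without term" in each group.
alt <- data.frame(
  GO.ID                   = dfx$GO.ID,
  DE.GO                   = dfx$Test.Set,
  DE.NOTGO                = dfx$Test.Pop - dfx$Test.Set,
  Exp.transcriptome.GO    = dfx$Ref.Set,
  Exp.transcriptome.NOTGO = dfx$Ref.Pop - dfx$Ref.Set
)

# ASSUMPTION (not confirmed in the post): make the two groups disjoint
# by removing the DE genes from the transcriptome counts.
disjoint <- data.frame(
  GO.ID       = dfx$GO.ID,
  DE.GO       = alt$DE.GO,
  DE.NOTGO    = alt$DE.NOTGO,
  nonDE.GO    = alt$Exp.transcriptome.GO - alt$DE.GO,
  nonDE.NOTGO = alt$Exp.transcriptome.NOTGO - alt$DE.NOTGO
)

# Hypothetical helper: one-sided Fisher p-value for four cell counts.
row_p <- function(a, b, c, d) {
  fisher.test(matrix(c(a, b, c, d), ncol = 2, byrow = TRUE),
              alternative = "greater")$p.value
}

p_alt      <- mapply(row_p, alt$DE.GO, alt$DE.NOTGO,
                     alt$Exp.transcriptome.GO, alt$Exp.transcriptome.NOTGO)
p_disjoint <- mapply(row_p, disjoint$DE.GO, disjoint$DE.NOTGO,
                     disjoint$nonDE.GO, disjoint$nonDE.NOTGO)
```

In the disjoint table the four cells of each row sum to Ref.Pop, so every gene is counted exactly once; in the second layout the DE genes appear in both rows of the 2x2 table.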


Can someone also clarify whether I should indeed count the DE genes in the expressed transcriptome reference set?

Many thanks!

