I have two bacterial genomes, A and B, with 4,000 and 4,100 genes, respectively. After homologous clustering, I found 1,200 core genes present in both genomes. I am interested in which GO terms are enriched in the core genes of both genomes A and B. I am curious how I would calculate this manually using Fisher's exact test on a 2x2 contingency table. Let's say I want to test "Iron transport". To the best of my understanding of Fisher's exact test, the contingency table should look like this (for genome A):
Gene set        | Annotated to "Iron transport" | Not annotated to "Iron transport"
Core gene       | 50                            | 1150
Non-core gene   | 200                           | 2600
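A minimal sketch of computing Fisher's exact test by hand for the table above, assuming Python 3.8+ (for `math.comb`) and no SciPy. With the row and column totals fixed, the number of annotated core genes follows a hypergeometric distribution. The one-sided p-value sums that distribution at and above the observed count, and the two-sided p-value sums every table that is no more probable than the observed one. The function names are illustrative only; if SciPy is available, `scipy.stats.fisher_exact` should give matching numbers.

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    """P(X = k): k annotated genes among n genes drawn from N genes, K of which are annotated."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def fisher_exact_2x2(a, b, c, d):
    """Fisher's exact test for the table
                   annotated   not annotated
        core           a             b
        non-core       c             d
    Returns (odds ratio, one-sided p for enrichment in core, two-sided p)."""
    n = a + b            # core genes (row total)
    K = a + c            # annotated genes (column total)
    N = a + b + c + d    # all genes in the genome
    # With both margins fixed, the top-left cell can only take these values.
    lo, hi = max(0, n - (N - K)), min(n, K)
    probs = {k: hypergeom_pmf(k, N, K, n) for k in range(lo, hi + 1)}
    if b * c:
        odds_ratio = (a * d) / (b * c)
    else:
        odds_ratio = float("inf") if a * d else float("nan")
    # One-sided: probability of seeing a or more annotated genes among the core genes.
    p_greater = sum(p for k, p in probs.items() if k >= a)
    # Two-sided: add up every table that is no more likely than the observed one
    # (a small relative tolerance guards against floating-point ties).
    p_obs = probs[a]
    p_two_sided = sum(p for p in probs.values() if p <= p_obs * (1 + 1e-7))
    return odds_ratio, min(p_greater, 1.0), min(p_two_sided, 1.0)

odds_ratio, p_greater, p_two_sided = fisher_exact_2x2(50, 1150, 200, 2600)
print(f"odds ratio = {odds_ratio:.3f}, one-sided p = {p_greater:.3g}, two-sided p = {p_two_sided:.3g}")
```

With these counts, the number of "Iron transport" genes expected among 1,200 core genes is 1,200 × 250 / 4,000 = 75, so the observed 50 points toward depletion rather than enrichment, and the one-sided p-value for enrichment will be large. The same call would be repeated with genome B's own counts.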
I don't know whether the table above is the correct 2x2 contingency table for testing this GO term (dividing the genes into core vs. non-core), because other posts suggested using the whole gene set as the background (i.e., core genes vs. the whole gene set of a genome). I don't know which approach is the most correct and precise, and I am confused since I am new to statistics and gene enrichment analysis.
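The "whole gene set as background" framing can be checked with another minimal sketch, again plain Python with `math.comb`. It builds the table from three sets of gene IDs (every gene in genome A, the core genes, and the genes annotated to the term) and then asks the background question directly: if 1,200 genes are drawn at random from all 4,000 genes, 250 of which carry the annotation, how likely are 50 or more annotated genes? That hypergeometric upper tail is the one-sided Fisher's exact p-value for the core vs. non-core table, because the non-core row is simply the whole genome minus the core genes. A table with "core" and "whole genome" as its two rows would count every core gene twice, so its rows would not be disjoint as a 2x2 test requires. The gene IDs below are made-up placeholders arranged to reproduce the counts in the table, and `table_from_sets` and `upper_tail` are hypothetical helper names.

```python
from math import comb

def upper_tail(k_obs, N, K, n):
    """Hypergeometric P(X >= k_obs): n genes drawn without replacement from N genes, K annotated."""
    # Exact integer sum first, then a single division at the end.
    hits = sum(comb(K, k) * comb(N - K, n - k) for k in range(k_obs, min(K, n) + 1))
    return hits / comb(N, n)

def table_from_sets(genome, core, annotated):
    """Core vs. non-core 2x2 table, using every gene in `genome` as the background."""
    core = core & genome            # ignore IDs that are not in this genome
    annotated = annotated & genome
    non_core = genome - core        # whole genome minus core, so the rows do not overlap
    return (len(core & annotated), len(core - annotated),
            len(non_core & annotated), len(non_core - annotated))

# Placeholder gene IDs chosen to reproduce the genome A counts (not real data).
genome_a = {f"geneA_{i:04d}" for i in range(4000)}
core_a = {f"geneA_{i:04d}" for i in range(1200)}          # 1,200 core genes
iron_a = {f"geneA_{i:04d}" for i in range(1150, 1400)}    # 250 annotated, 50 of them core

a, b, c, d = table_from_sets(genome_a, core_a, iron_a)
N, K, n = a + b + c + d, a + c, a + b   # genome size, annotated genes, core genes
p_enriched = upper_tail(a, N, K, n)     # same value as the one-sided Fisher's exact p
print((a, b, c, d), p_enriched)
```

With real data, `genome_a`, `core_a`, and `iron_a` would come from the clustering output and the GO annotation file; genome B would need its own sets and its own background.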
I would appreciate it if anyone could enlighten me on this question. Thanks in advance.