I am new to proteomics and have a question about testing for overlap between protein lists. I have two protein lists I would like to compare: a subset (n) of the full list of proteins (N) identified in my experiment, and a list of proteins from the literature belonging to a specific category (R).
I would like to know whether my subset list of proteins (n) is enriched for the proteins in the list from literature (R) compared to other subsets in my experiment. I want to use the hypergeometric test for determining the significance of overlap between the 2 sets (n and R).
I am not sure what to use as the background list. From my reading, I thought of using the total set of proteins identified in the experiment (N) as the background. However, I realized that about 50% of the proteins in the literature list (R) were not identified in my experiment, so they cannot be present in my subset list (n), which is the list I want to test for overlap with R.
Under these conditions, would it be acceptable to filter the literature list (R) down to only those proteins that were identified in my experiment (N), compare my subset list (n) with this filtered literature list (r), and use my total identified proteins (N) as the background?
If not, what should my background list be?
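To make the filtering approach I am asking about concrete, here is a minimal sketch in Python of the calculation I have in mind. It is only an illustration under my own assumptions: the function name `hypergeom_enrichment_p` and the protein IDs are made up, the test is one-sided (over-representation only), and the tail probability is computed directly from binomial coefficients rather than with a statistics package.

```python
from math import comb


def hypergeom_enrichment_p(background, subset, category):
    """One-sided hypergeometric test for over-representation.

    background: all proteins identified in the experiment (N)
    subset:     the subset of interest (n), expected to lie within N
    category:   the literature list (R), which may contain undetected proteins
    Returns (k, p) where k is the observed overlap and p = P(X >= k).
    """
    background = set(background)
    subset = set(subset) & background
    # Assumption being asked about: restrict R to proteins detected in N (this is r).
    category_detected = set(category) & background

    N = len(background)
    K = len(category_detected)
    n = len(subset)
    k = len(subset & category_detected)

    # Upper tail: probability of drawing at least k category proteins
    # when n proteins are sampled from N without replacement.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return k, tail / comb(N, n)


# Toy, made-up identifiers purely for illustration (not real data).
identified = [f"P{i}" for i in range(1, 21)]          # N = 20
my_subset = ["P1", "P2", "P3", "P4", "P5"]            # n = 5
literature = ["P1", "P2", "P3", "P10", "Q1", "Q2", "Q3", "Q4"]  # half undetected

k, p_value = hypergeom_enrichment_p(identified, my_subset, literature)
print(k, p_value)
```

In this sketch the undetected literature proteins (the `Q` entries) are dropped before counting, so they affect neither the category size nor the background. Whether that filtering step is statistically appropriate is exactly what I am unsure about.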