If we want a list of genes related to some phenotypeX, there are plenty of gene sets to choose from, e.g. the BROCA set (more gene lists at macarthur-lab), etc.
BROCA tumour suppressor gene set designed by Walsh et al (2010), which comprises known high- and moderate-risk breast/ovarian cancer genes...
But if we wanted the opposite, a gene set that is not related to phenotypeX, how would we get it?
We could take the set difference between all genes and the known set, but the result would still contain genes whose link to phenotypeX has not been discovered yet. I would want a list of genes that are not related to phenotypeX based on the literature or some database. Is that possible?
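For illustration, the set-difference idea can be sketched in a few lines of Python. The gene lists here are toy stand-ins: `known_related` is not the real BROCA list, and `GENE_A` to `GENE_C` are hypothetical placeholders for "all other genes".

```python
# Toy stand-in for a curated phenotype gene set (NOT the real BROCA list).
known_related = {"BRCA1", "BRCA2", "TP53"}

# Toy stand-in for "all genes"; GENE_A..GENE_C are hypothetical placeholders.
all_genes = {"BRCA1", "BRCA2", "TP53", "GENE_A", "GENE_B", "GENE_C"}

# Set difference: every gene that is not in the curated set.
not_in_known_set = all_genes - known_related

print(sorted(not_in_known_set))
```

Note that the result only means "not in the curated set". A gene with an undiscovered link to the phenotype would still end up in it, which is exactly the limitation described above.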
From regression modelling, the genes not related to the phenotype of interest would have coefficients / estimates closest to zero, no? A strongly negative coefficient would still indicate an association, just in the opposite direction.
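One way to check this intuition is a small simulation. The sketch below uses made-up data and hypothetical gene names (`gene_up`, `gene_down`, `gene_unrelated`), and fits a separate single-predictor least-squares slope for each gene. The effect sizes of +2 and -2 are assumptions chosen purely for illustration.

```python
import random

random.seed(0)
n = 2000  # number of simulated samples

# Simulated expression for three hypothetical genes (independent standard normal).
gene_up = [random.gauss(0, 1) for _ in range(n)]
gene_down = [random.gauss(0, 1) for _ in range(n)]
gene_unrelated = [random.gauss(0, 1) for _ in range(n)]

# Simulated phenotype: depends on gene_up (+) and gene_down (-),
# but not on gene_unrelated.
phenotype = [2 * u - 2 * d + random.gauss(0, 1)
             for u, d in zip(gene_up, gene_down)]


def slope(x, y):
    """Ordinary least-squares slope of y on a single predictor x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var = sum((a - mean_x) ** 2 for a in x)
    return cov / var


slopes = {
    "gene_up": slope(gene_up, phenotype),
    "gene_down": slope(gene_down, phenotype),
    "gene_unrelated": slope(gene_unrelated, phenotype),
}

for name, value in slopes.items():
    print(f"{name}: {value:+.2f}")
```

In this toy setting the unrelated gene gets a slope near zero, while the negatively associated gene gets a clearly negative one. With real data, an estimate near zero is only a lack of evidence for an association, not proof that none exists.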
With regard to cancer, though, I suspect that virtually every protein-coding gene has been linked to cancer at some point!
If we take a particular cancer, let's say breast cancer, is there a list of genes that have been shown (in publications) to have no effect on developing that cancer? For example, publications of the type "geneXYZ, which determines hair colour, has nothing to do with breast cancer"?
I don't think so... Such a gene may be dysregulated and have higher or lower expression, or contain somatic mutations, but these may be indirect results of the cancer and passenger mutations, respectively. And even if a group found that a particular gene had no relation to cancer, would they publish it...?