I have experimental quantitative data for an immune phenotype (a continuous variable) in 7 inbred mouse strains. From the Immunological Genome Project (immgen.org), I have obtained raw immune cell gene expression data, from which I can calculate gene expression differences between any pair of mouse strains. From the 7 strains, I generate C(7,2) = 21 possible pairs. For each of the 21 pairs, I calculate the difference in phenotype (expressed as an absolute difference or a ratio). For the same 21 pairs, I also calculate the gene expression differences. How can I correlate the 21 'phenotype' measures with the 21 'gene-expression' measures to come up with a list of genes potentially associated with the phenotype? Is there a statistical method or gene set enrichment tool that can pick up quantitative differences across gene sets such as this?
I don't see where you're going with computing all these pairs.
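Identifying the genes contributing to the phenotype can be seen as a regression problem: you want to regress your phenotype Y on a linear combination of the gene expression levels Xi. See for example this paper. For this, you would use the strains as individuals, i.e. 7 observations (one per strain) rather than 21 pairs. The 21 pairwise differences are all derived from those same 7 values, so they are not independent observations.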
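
To make the "strains as individuals" idea concrete, here is a minimal sketch in plain Python. It is a simplification rather than the method from the paper: with only 7 strains and many genes, a joint regression on all Xi has more unknowns than observations, so this version fits one simple regression per gene and ranks genes by the strength of the correlation. The gene names and numbers are made-up toy values, not ImmGen data, and `simple_regression` and `rank_genes` are hypothetical helper names.

```python
import math

def simple_regression(x, y):
    """Least-squares fit of y = intercept + slope * x.

    Returns (slope, intercept, pearson_r).
    """
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        # A constant gene or constant phenotype carries no information.
        return 0.0, my, 0.0
    slope = sxy / sxx
    r = sxy / math.sqrt(sxx * syy)
    return slope, my - slope * mx, r

def rank_genes(expression, phenotype):
    """Rank genes by |r| between per-strain expression and phenotype."""
    results = []
    for gene, levels in expression.items():
        slope, _intercept, r = simple_regression(levels, phenotype)
        results.append((gene, slope, r))
    return sorted(results, key=lambda t: abs(t[2]), reverse=True)

# Hypothetical toy data: one value per strain (7 strains), in the same order.
phenotype = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
expression = {
    "GeneA": [2.1, 4.0, 6.2, 7.9, 10.1, 12.0, 13.8],  # rises with phenotype
    "GeneB": [5.0, 4.8, 5.1, 5.0, 4.9, 5.2, 5.0],     # roughly flat
    "GeneC": [9.0, 8.1, 6.9, 6.0, 5.1, 3.9, 3.0],     # falls with phenotype
}

ranked = rank_genes(expression, phenotype)
for gene, slope, r in ranked:
    print(f"{gene}: slope={slope:.2f}, r={r:.2f}")
```

With real data you would be testing thousands of genes on 7 points, so the ranking should be read as a list of candidates and would need some correction for multiple testing before drawing conclusions.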