I'm really embarrassed to be asking this question, since I haven't done statistics since I was at school (and I'm a female so I'm not saying how long ago that was).
I have SNPs for two species of cow that differ in phenotype. The SNPs were obtained from a pooled data set of 10 individuals from each species. My theory is that the difference in phenotype arises from a 'gain of function' in one of the species, and that the potentially functional SNPs (pfSNPs) are those that exist in one species and not the other, or those that are homozygous for one allele in one species and homozygous for a different allele in the other species.
So, I have a set of 'pfSNPs', I have narrowed it down to those SNPs that fall in genes, and I intend to do some pathway analysis. I am hoping to see that pathways involved in the phenotype response have a high number of pfSNPs compared to other pathways. My question, after all this preamble, is: is it possible to show that the pathway enrichment for SNPs is statistically significant and not just random?
I am thinking I would need some sort of chi-squared test, but I don't know how this works at the pathway level. Would I compare the number of SNPs in my enriched pathway to the number of SNPs in all of the other pathways, to show that such a high number of SNPs in this pathway has a very low probability of happening by chance? My confusion is that not all pathways are created equal: some have more genes than others and so are more likely to contain SNPs. Can you factor in the number of genes in the pathway? Or is that not necessary, because by that logic you could also say that genes with longer exons/introns are more likely to have SNPs as they have more 'DNA coverage'?
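To make the comparison I have in mind concrete, here is a minimal sketch of a gene-level enrichment test that does take pathway size into account. It uses a hypergeometric (one-sided Fisher-type) test rather than chi-squared, which is my assumption about what would be appropriate, not something I know to be the right choice. The function name and all of the counts are made up for illustration; they are not from my data. It also counts genes with at least one pfSNP rather than individual SNPs, so it does not address the gene-length issue.

```python
from math import comb


def pathway_enrichment_p(total_genes, genes_with_pfsnp,
                         pathway_genes, pathway_genes_with_pfsnp):
    """Upper-tail hypergeometric p-value.

    Probability of seeing at least `pathway_genes_with_pfsnp` pfSNP genes
    in a pathway of `pathway_genes` genes, if the pathway were a random
    draw from a background of `total_genes` genes, of which
    `genes_with_pfsnp` carry at least one pfSNP.
    """
    denominator = comb(total_genes, pathway_genes)
    # The overlap cannot exceed the pathway size or the number of pfSNP genes.
    largest_overlap = min(pathway_genes, genes_with_pfsnp)
    tail = sum(
        comb(genes_with_pfsnp, i)
        * comb(total_genes - genes_with_pfsnp, pathway_genes - i)
        for i in range(pathway_genes_with_pfsnp, largest_overlap + 1)
    )
    return tail / denominator


# Hypothetical counts, purely for illustration (not real data):
# 20000 annotated genes, 1500 with a pfSNP, a pathway of 80 genes,
# 15 of which carry a pfSNP.
p = pathway_enrichment_p(20000, 1500, 80, 15)
print(f"one-sided enrichment p-value: {p:.3g}")
```

Because the pathway size goes into the calculation, a big pathway is expected to contain more pfSNP genes by chance and is not favoured just for being big. If I ran this for every pathway, I assume I would also need a multiple-testing correction on the resulting p-values.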
I'm also aware that I don't have a large number of individuals in the initial samples.
Thanks a lot for your help