Question

Hypergeometric Distribution Of SNPs In Pathways

2

13.2 years ago

Andrea_Bio ★ 2.8k

Hello

I'm really embarrassed to be asking this question, since I haven't done statistics since I was at school (and I'm a female so I'm not saying how long ago that was).

I have SNPs for 2 species of cow which have different phenotypes. The SNPs were obtained from a pooled data set of 10 individuals from each species. My theory is that the difference in phenotype arises from a 'gain of function' in one of the species, and that the potentially functional SNPs are those that exist in one species and not the other, or those that are homozygous for one allele in one species and homozygous for a different allele in the other species.

So, I have a set of 'pfSNPs', I have narrowed it down to those SNPs in genes, and I intend to do some pathway analysis. I am hoping to see that pathways that feature in the phenotype response will have a high level of pfSNPs compared to other pathways. My question, after all this preamble, is: is it possible to show that the pathway enrichment for SNPs is statistically significant and not random?

I am thinking I would need some sort of chi-squared test, but I don't know how this works at the pathway level. Would I compare the number of SNPs in my enriched pathway to the number of SNPs in all of the other pathways, to show that a higher number of SNPs in this pathway has a very low probability of happening by chance? My confusion is that not all pathways are created equal, in that some will have more genes than others and are more likely to have SNPs. Can you factor in the number of genes in the pathway? Or is that not necessary, because using that logic you could say that genes with longer exons/introns are more likely to have SNPs as they have more 'DNA coverage'?

I'm also aware that I don't have a large number of individuals in the initial samples.

Thanks a lot for your help

snp pathway statistics • 3.7k views

updated 10.9 years ago by Biostar 20 • written 13.2 years ago by Andrea_Bio ★ 2.8k

Ram · Answer 1 · 2011-05-24

4

13.2 years ago

David Quigley 11k

You don't have to invent this analysis yourself (although it never hurts to try). Several groups have looked at using genotypes to do pathway analysis. Check the Nature Reviews Genetics paper by Wang et al. ("Analysing biological pathways in genome-wide association studies") for a review. The most common approach seems to be a variation on Gene Set Enrichment Analysis (GSEA) for SNPs: you convert your SNPs to candidate genes and then do a GSEA. There are a lot of details to consider: how do you assign a gene to a SNP (or do you avoid that entirely), do you work from raw genotypes or P values, what test do you use to identify enrichment, do you threshold the P values for your initial analysis, etc.

Unfortunately I suspect the data you describe are underpowered for this analysis, but that's usually the case.
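For readers who want to see what a gene-level enrichment test might look like once SNPs have been converted to candidate genes, here is a minimal sketch of a one-sided hypergeometric test in Python. This is not GSEA itself and not necessarily the method used in the review mentioned above; it is the simpler over-representation test that the question's title hints at. The function name, parameter names, and all numbers are made up for illustration. It also assumes each gene simply counts as "hit" or "not hit", so a gene with many pfSNPs counts once.

```python
from math import comb


def hypergeom_enrichment_p(total_genes, genes_with_pfsnp, pathway_size, pathway_hits):
    """Upper-tail hypergeometric P value: P(X >= pathway_hits).

    Models the pathway as `pathway_size` genes drawn without replacement
    from `total_genes` background genes, of which `genes_with_pfsnp`
    carry at least one pfSNP. All names here are hypothetical.
    """
    max_hits = min(genes_with_pfsnp, pathway_size)
    denom = comb(total_genes, pathway_size)
    tail = sum(
        comb(genes_with_pfsnp, k)
        * comb(total_genes - genes_with_pfsnp, pathway_size - k)
        for k in range(pathway_hits, max_hits + 1)
    )
    return tail / denom


# Made-up example: 20,000 background genes, 500 with a pfSNP,
# and a 40-gene pathway in which 6 genes carry a pfSNP.
p = hypergeom_enrichment_p(20000, 500, 40, 6)
print(f"one-sided P = {p:.3g}")
```

Because the pathway size enters the calculation directly, larger pathways are not favoured just for having more genes. If many pathways are tested, the resulting P values would still need a multiple-testing correction.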

updated 5.5 years ago by Ram 44k • written 13.2 years ago by David Quigley 11k

0

Thanks for your answer. I am not doing GWAS studies, so I don't have any P values or a problem of assigning a SNP to a gene (the SNP is either in the gene or very close to it). Is the GSEA enrichment technique still relevant, given that it doesn't take into account the fact that a gene with multiple SNPs enriches a pathway more than a gene with one SNP?
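One possible way to count every SNP rather than every gene, while still allowing for the 'DNA coverage' concern raised in the question, is to compare the pathway's SNP count with what its share of total gene length would predict. The sketch below uses a one-sided binomial test for this. It is only an illustration under strong assumptions: the function and parameter names are hypothetical, the numbers are invented, and it treats SNPs as independent, which is unlikely to hold for SNPs in the same gene (linkage), so the P values will tend to be too small.

```python
from math import exp, lgamma, log, log1p


def snp_count_tail_p(total_snps, pathway_snps, pathway_bp, total_bp):
    """Upper-tail binomial P value: P(X >= pathway_snps).

    Under the null, each of `total_snps` pfSNPs falls in the pathway with
    probability pathway_bp / total_bp (the pathway's share of gene length).
    Log-space terms avoid overflow for large SNP counts.
    """
    frac = pathway_bp / total_bp
    if not 0.0 < frac < 1.0:
        raise ValueError("pathway_bp must be strictly between 0 and total_bp")
    log_p, log_q = log(frac), log1p(-frac)
    tail = 0.0
    for k in range(pathway_snps, total_snps + 1):
        # log of C(total_snps, k) * frac^k * (1 - frac)^(total_snps - k)
        log_pmf = (
            lgamma(total_snps + 1) - lgamma(k + 1) - lgamma(total_snps - k + 1)
            + k * log_p + (total_snps - k) * log_q
        )
        tail += exp(log_pmf)
    return min(tail, 1.0)


# Made-up example: 3,000 pfSNPs across 60 Mb of genic sequence,
# with 25 of them in a pathway whose genes span 0.3 Mb.
p_snp = snp_count_tail_p(3000, 25, 300_000, 60_000_000)
print(f"one-sided P = {p_snp:.3g}")
```

Here a gene with five pfSNPs contributes five counts, and longer genes raise the expected count rather than the apparent enrichment. A permutation approach would be one way to relax the independence assumption.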