Hi. I am new to genetics research and have a basic question, so I would be very grateful for any advice you can provide. I am trying to investigate a genotype-phenotype correlation for CTH in a particular population. Going through the SNP database I came across multiple reference SNPs. I then went to the HapMap database to narrow it down to proxy SNPs. (My lab only allows me to run 10 SNPs.) The HapMap database still gives me more than 30 SNPs. Should I now select those SNPs with the highest reference allele frequency (i.e. 1.0), or take those that average about 0.5 or less? Thank you for your advice.
The allele frequency refers to the relative frequency of a given variant at a locus where more than one allele has been detected. A frequency of 1.0 implies that only one variant has been detected in a given population at that locus; such a SNP would not be informative for an association study. You are looking for loci where there is more than one variant, and where that variation is associated with your phenotype. In other words, you are looking for loci that are in linkage disequilibrium with the variant that drives your phenotype. My first piece of advice would be to do a little more background reading before you pick SNPs. If CTH refers to the gene cystathionase and you are trying to do an eQTL study, I would strongly recommend reading the Nature Reviews Genetics articles on expression QTL studies (e.g. Rockman et al., Cheung & Spielman, Cookson et al.).
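To make the frequency point concrete, here is a minimal Python sketch that drops SNPs with no (or almost no) variation in the population. The SNP names and frequencies are made up for illustration, and the 0.05 minor allele frequency cutoff is only an assumed, commonly used default, not a rule.

```python
def minor_allele_frequency(ref_freq):
    """Frequency of the rarer allele at a biallelic SNP."""
    return min(ref_freq, 1.0 - ref_freq)


def informative_snps(ref_freqs, min_maf=0.05):
    """Keep SNPs whose minor allele frequency is at least min_maf.

    ref_freqs maps a SNP name to its reference allele frequency.
    min_maf=0.05 is an assumed cutoff; adjust it to your sample size.
    """
    kept = {}
    for snp, ref_freq in ref_freqs.items():
        maf = minor_allele_frequency(ref_freq)
        if maf >= min_maf:
            kept[snp] = maf
    return kept


# Hypothetical reference allele frequencies, for illustration only.
example_freqs = {"snp1": 1.0, "snp2": 0.5, "snp3": 0.97, "snp4": 0.8}
example_kept = informative_snps(example_freqs)
print(sorted(example_kept))
```

With these made-up numbers, the SNP at frequency 1.0 is monomorphic and is dropped, as is the one at 0.97, while the SNPs at 0.5 and 0.8 are kept.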
In addition to David's excellent answer, you should also look into the LD structure of these 30 SNPs to optimise your chance of finding anything new. In other words, try to find the 10 out of x SNPs that are least in LD with each other (i.e. the ones with the lowest pairwise LD), so that each SNP you genotype gives you independent information.
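One simple way to do this kind of selection is a greedy pass over the SNPs, sketched below in Python. It assumes you already have pairwise r² values (for example exported from an LD tool); the SNP names, the r² values, the 0.8 threshold and the function name are all hypothetical, and pairs missing from the table are treated as unlinked, which is itself an assumption.

```python
def greedy_ld_prune(snps, r2, max_r2=0.8, limit=10):
    """Pick up to `limit` SNPs so that no selected pair has r^2 >= max_r2.

    snps: SNP names in priority order (earlier names are preferred).
    r2:   dict mapping frozenset({snp_a, snp_b}) to pairwise r^2.
          Pairs absent from the dict are assumed to have r^2 = 0.
    """
    selected = []
    for snp in snps:
        if len(selected) >= limit:
            break
        # Keep this SNP only if it is not in high LD with any SNP already kept.
        if all(r2.get(frozenset((snp, kept)), 0.0) < max_r2 for kept in selected):
            selected.append(snp)
    return selected


# Hypothetical pairwise r^2 values, for illustration only.
example_r2 = {
    frozenset(("snpA", "snpB")): 0.95,
    frozenset(("snpA", "snpC")): 0.10,
    frozenset(("snpB", "snpC")): 0.12,
    frozenset(("snpC", "snpD")): 0.30,
}
example_selected = greedy_ld_prune(["snpA", "snpB", "snpC", "snpD"], example_r2)
print(example_selected)
```

Here snpB is skipped because it is almost perfectly correlated with snpA, so genotyping both would spend one of the 10 slots on redundant information. A greedy pass is not guaranteed to find the best possible set, but it is easy to check by hand.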
How many SNPs are there with an allele frequency of < 1? And are they all in roughly the same region?
As you're looking into a particular phenotype, you should also check the effect size of each known associated SNP. If you have to narrow down the number of SNPs, I would pick the ones with the highest known effect size, as this most likely increases your chance of finding anything relevant.
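If the published effect sizes are reported as odds ratios, a quick way to rank them is by the absolute log odds ratio, so that a protective allele (odds ratio below 1) is not ranked as a weak effect. The Python sketch below assumes this; the SNP names and odds ratios are invented for illustration.

```python
import math


def rank_by_effect(odds_ratios):
    """Return SNP names ordered from strongest to weakest effect.

    Effect strength is taken as |ln(OR)|, so OR = 0.5 and OR = 2.0
    count as equally strong effects in opposite directions.
    """
    return sorted(
        odds_ratios,
        key=lambda snp: abs(math.log(odds_ratios[snp])),
        reverse=True,
    )


# Hypothetical odds ratios, for illustration only.
example_ors = {"snpA": 1.2, "snpB": 0.5, "snpC": 1.5}
example_ranking = rank_by_effect(example_ors)
print(example_ranking)
```

With these numbers snpB comes first even though its odds ratio is the smallest, because halving the odds is a larger effect than increasing them by half.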
Hello, when picking the best SNPs to analyze, two lines of inquiry come to mind when considering their possible phenotypic implications. One, does the SNP cause an amino acid change? If so, is this change likely to affect protein folding or occur near the binding site? Two, if the SNP is not in a coding part of the gene, could it affect splicing and so render the protein non-functional? These are questions that I would ask, coming from a structural biology background, to better understand possible changes in protein function that could lead to alterations in biological activity and thus phenotypic divergence. I hope this helps.
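For the first question, a rough check of whether a coding SNP changes the amino acid can be sketched in Python as below. It assumes the standard genetic code, that you already know the affected codon in the correct reading frame on the coding strand, and that the reference codon is not itself a stop codon; the function name is hypothetical. It says nothing about splicing or structural impact, which need dedicated annotation tools.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_coding_snp(ref_codon, position, alt_base):
    """Classify a single-base change within one codon.

    ref_codon: reference codon on the coding strand, e.g. "GAA".
    position:  0-based position of the SNP within the codon (0, 1 or 2).
    alt_base:  the alternative base at that position.
    """
    alt_codon = ref_codon[:position] + alt_base + ref_codon[position + 1:]
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    return "missense"


print(classify_coding_snp("GAA", 2, "G"))  # GAA (Glu) to GAG (Glu)
print(classify_coding_snp("GAA", 0, "A"))  # GAA (Glu) to AAA (Lys)
print(classify_coding_snp("TGG", 2, "A"))  # TGG (Trp) to TGA (stop)
```

A missense or nonsense result would be the cue to look at where the residue sits in the protein structure.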