I have just started learning about SNPs and related information. When I searched for a particular SNP, rs80334247, I got a lot of information, such as:
- Chromosome No.3
- Minor allele frequency/count: T=0.0038/19 (1000 Genomes)
- Gene ID: SCN5A (6331)
- Major and minor allele counts in different populations around the world
and a lot of other information.
Now my question is: where can I find information such as homozygous and heterozygous genotypes and dominant and recessive alleles, and how can this information be downloaded for all populations?
For example, if I want to test a particular SNP by calculating a chi-square statistic and p-value, I need to make a contingency table like the following:
                AA    Aa    aa
    Subjects
    Control
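To make the chi-square part concrete, here is a minimal sketch of what I think the calculation looks like in Python. The genotype counts are made up for illustration (they are not real data for rs80334247), and `chi_square_2x3` is just a name I chose. It relies on the fact that a 2x3 table has 2 degrees of freedom, for which the chi-square p-value is exp(-x/2).

```python
import math

# Hypothetical genotype counts (not real data): columns are AA, Aa, aa.
observed = {
    "Subjects": [30, 50, 20],
    "Control": [45, 45, 10],
}


def chi_square_2x3(table):
    """Pearson chi-square statistic and p-value for a 2x3 table (2 degrees of freedom)."""
    rows = list(table.values())
    row_totals = [sum(r) for r in rows]
    col_totals = [sum(c) for c in zip(*rows)]
    n = sum(row_totals)
    stat = 0.0
    for r, row in enumerate(rows):
        for c, obs in enumerate(row):
            # Expected count if genotype and case/control status were independent.
            expected = row_totals[r] * col_totals[c] / n
            stat += (obs - expected) ** 2 / expected
    # With 2 degrees of freedom the chi-square upper tail is exactly exp(-x/2).
    p_value = math.exp(-stat / 2.0)
    return stat, p_value


stat, p_value = chi_square_2x3(observed)
print(f"chi-square = {stat:.3f}, p-value = {p_value:.4f}")
```

If this is right, then the only thing I need per SNP is the number of subjects and controls carrying each genotype, which is exactly the information I do not know where to get.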
On the other hand, if I want to calculate p-values using logistic regression, where the predictors are SNPs and the response is 1 for subject and 0 for control, what values would each SNP column need to contain? For instance, what type of values would SNP1 have?
Take Y as the response variable, which takes 0 for control and 1 for subject, and let's say I have 3 SNPs (SNP1, SNP2 and SNP3) on 10 individuals:
    Y   SNP1  SNP2  SNP3
    1   ?     ?     ?
    1
    0
    0
    1
    0
    1
    0
    1
    0
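Here is a rough sketch of the logistic regression I have in mind, for one SNP at a time. It assumes that each SNP value is the number of copies of the minor allele a person carries (0, 1 or 2), which is only my guess at the right coding. The Y values are the ones from my table, the SNP1 values are invented, and `fit_logistic` is a hypothetical helper written with the standard library only; the p-value is a Wald test on the slope.

```python
import math

# Response from my example: 1 = subject, 0 = control.
y = [1, 1, 0, 0, 1, 0, 1, 0, 1, 0]
# Made-up SNP1 values: assumed number of minor alleles (0, 1 or 2) per person.
snp1 = [1, 2, 0, 0, 1, 1, 0, 0, 2, 1]


def fit_logistic(x, y, iterations=50, tol=1e-10):
    """Fit logit(P(y=1)) = b0 + b1*x by Newton-Raphson; return (b0, b1, se_b1)."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        # Score vector (g0, g1) and information matrix [[h00, h01], [h01, h11]].
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: inverse information matrix times score vector.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    # Standard error of b1 from the inverse information matrix
    # (evaluated at the last iterate, which equals the estimate once converged).
    se_b1 = math.sqrt(h00 / det)
    return b0, b1, se_b1


b0, b1, se_b1 = fit_logistic(snp1, y)
z = b1 / se_b1
# Two-sided p-value from the standard normal distribution.
p_value = math.erfc(abs(z) / math.sqrt(2.0))
print(f"b1 = {b1:.3f}, se = {se_b1:.3f}, z = {z:.3f}, p-value = {p_value:.4f}")
```

With only 10 individuals the estimate is very unstable, so this is only meant to show the shape of the input data I think is needed, not a meaningful analysis.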
I am confused about what the corresponding values in the SNP columns should be, since a single SNP comes with a lot of information such as MAF, major allele count, minor allele count, etc. Or can these SNPs be encoded as 0 and 1? For example, if our reference allele is A (by the way, how do I know which allele is the reference?), then each individual either has that allele or not, and we can encode it as 0 or 1.
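From what I have read so far, the encoding may start from each person's genotype rather than from summary values like MAF. The sketch below shows three codings I have seen mentioned (additive, dominant and recessive, all defined with respect to the minor allele). The genotypes are hypothetical, "a" is simply assumed to be the minor allele, and the function names are my own.

```python
# Hypothetical genotypes for five people at one SNP.
genotypes = ["AA", "Aa", "aa", "Aa", "AA"]
MINOR = "a"  # assumption: "a" is the minor allele


def zygosity(genotype):
    """Homozygous if both alleles are the same, otherwise heterozygous."""
    return "homozygous" if genotype[0] == genotype[1] else "heterozygous"


def additive(genotype):
    """Number of copies of the minor allele: 0, 1 or 2."""
    return genotype.count(MINOR)


def dominant(genotype):
    """1 if the person carries at least one copy of the minor allele."""
    return int(genotype.count(MINOR) >= 1)


def recessive(genotype):
    """1 only if the person carries two copies of the minor allele."""
    return int(genotype.count(MINOR) == 2)


print("genotype zygosity additive dominant recessive")
for g in genotypes:
    print(g, zygosity(g), additive(g), dominant(g), recessive(g))
```

If this is the right idea, then a SNP column in my table would hold one such number per person, and what I am really missing is the per-person genotype data.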
So I have a number of confusions related to SNP datasets and their usage. If somebody could explain this with a small example SNP dataset, it would relieve me of much pain in understanding SNP datasets and how to use them.