I am a total beginner in bioinformatics, so this question might be very, very trivial. Right now, I need to run a linear regression of the expression of some genes on genotype information. I have VCF files for all the chromosomes. I am having a hard time understanding how I should feed the genotype information (0s and 1s) to the regression model. Should I use the allele frequency, or should I just use the 0s and 1s? Also, regarding the expression of the genes, I have a list of gene IDs, their related snp_ids, r-values, and p-values. What kind of expression value should I feed into the linear model?
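For context, VCF genotypes are stored as allele indices in the GT field of each sample column (for example `0/0`, `0/1`, `1|1`, where 0 is the reference allele and 1 the alternate). A common way to turn them into one number per sample is to count alternate alleles, giving 0, 1 or 2. The sketch below assumes a diploid, uncompressed, tab-separated VCF data line with a GT entry in the FORMAT column. The function names are hypothetical, and collapsing every non-reference allele into "alternate" at multi-allelic sites is a simplifying assumption.

```python
def gt_to_dosage(gt):
    """Convert a VCF GT string such as '0/1' or '1|1' to an alternate-allele count.

    Returns None for missing calls such as './.' or '.|.'.
    """
    # Phased ('|') and unphased ('/') genotypes are treated the same way here.
    alleles = gt.replace("|", "/").split("/")
    if any(a == "." for a in alleles):
        return None
    # Assumption: any allele index other than 0 counts as an alternate allele.
    return sum(1 for a in alleles if a != "0")


def dosages_from_vcf_line(line):
    """Return (variant ID, list of per-sample dosages) for one VCF data line."""
    fields = line.rstrip("\n").split("\t")
    snp_id = fields[2]                 # ID column
    fmt = fields[8].split(":")         # FORMAT column, e.g. 'GT:DP'
    gt_index = fmt.index("GT")         # position of GT within each sample entry
    samples = fields[9:]               # one column per sample
    return snp_id, [gt_to_dosage(s.split(":")[gt_index]) for s in samples]


example = "1\t12345\trs123\tA\tG\t.\tPASS\t.\tGT:DP\t0/0:10\t0/1:12\t1|1:8\t./.:0"
print(dosages_from_vcf_line(example))  # ('rs123', [0, 1, 2, None])
```

With this coding, a heterozygous sample sits halfway between the two homozygotes, which is what lets a single slope describe the genotype effect.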
(I am having a hard time understanding this because in all my stats courses, we simply used continuous numeric values. But the genotype information here is only 0s and 1s, and I can't figure out how to run a regression on 0s and 1s and find an association.)
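An ordinary least-squares fit works with a discrete predictor just as it does with a continuous one: the slope is the average change in expression per additional alternate allele. The sketch below assumes one expression value per sample for a given gene (in the same sample order as the genotype columns) and genotypes already coded as 0/1/2, with `None` for missing calls. The function name `fit_dosage_regression` is hypothetical. It uses only the standard library and does not compute a p-value, which would additionally need the t distribution.

```python
import math


def fit_dosage_regression(dosages, expression):
    """Fit expression = intercept + slope * dosage by ordinary least squares.

    Samples with a missing genotype (None) are dropped.
    Returns (intercept, slope, pearson_r).
    """
    pairs = [(d, e) for d, e in zip(dosages, expression) if d is not None]
    x = [d for d, _ in pairs]
    y = [e for _, e in pairs]
    n = len(pairs)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0:
        # Every remaining sample has the same genotype, so no slope can be estimated.
        raise ValueError("genotype is monomorphic among non-missing samples")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return intercept, slope, r


# Made-up toy data for illustration only: two samples per genotype plus one missing call.
dosages = [0, 0, 1, 1, 2, 2, None]
expression = [1.0, 1.2, 2.0, 2.2, 3.0, 3.2, 5.0]
intercept, slope, r = fit_dosage_regression(dosages, expression)
print(intercept, slope, r)
```

Here the slope comes out close to 1.0, meaning each extra alternate allele is associated with roughly one unit higher expression in this toy data.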
Thank you so much for your help!