Regression using genotype on expression of genes
4.8 years ago
wstla27 ▴ 20

I am a total beginner in bioinformatics, so this question might be very trivial. Right now, I need to run a linear regression of gene expression on genotype information. I have VCF files for all the chromosomes. I am having a hard time understanding how I should feed the genotype information (0s and 1s) to the regression model. Do I use the allele frequency, or should I just use the 0s and 1s? Also, regarding the expression of genes, I have a list of gene IDs with their related snp_ids, r-values, and p-values. What kind of expression value should I feed into the linear model?

(I am having a hard time understanding this because in all my stats courses we simply used ordinary numeric values. The biological data, however, contain only 0s and 1s, and I can't figure out how to run a regression on 0s and 1s to find their association.)

Thank you so much for your help!

SNP gene linear regression • 806 views
4.8 years ago

I am not sure what you are aiming to do, exactly. However, you should attempt to get your VCF data into an 'analysis-ready' format. This will involve either summarising each genotype as an alternate-allele tally (0, 1, or 2 per sample, treated as numeric) or keeping it as a categorical variable (Homozygous Ref, Heterozygous, and Homozygous Alt).
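As a rough illustration of that encoding step, here is a minimal sketch in base R. It assumes the GT strings have already been pulled out of the VCF into a character vector and that every site is biallelic and diploid; `gt_to_dosage` is a hypothetical helper name, not a function from any package.

```r
# Hypothetical helper: convert VCF GT strings to alternate-allele counts (0, 1, 2).
# Assumes biallelic, diploid calls; anything else (e.g. "./.") becomes NA.
gt_to_dosage <- function(gt) {
  alleles <- strsplit(gt, "[/|]")  # split on unphased "/" or phased "|"
  vapply(alleles, function(a) {
    if (length(a) != 2 || !all(a %in% c("0", "1"))) return(NA_integer_)
    sum(a == "1")                  # number of alternate alleles
  }, integer(1))
}

# Made-up example genotypes, one per sample
gt <- c("0/0", "0/1", "1|0", "1/1", "./.")
dosage <- gt_to_dosage(gt)

# The same information as a categorical variable
genotype_class <- factor(dosage, levels = 0:2,
                         labels = c("HomRef", "Het", "HomAlt"))
```

The numeric `dosage` vector suits an additive model, while `genotype_class` lets each genotype group have its own mean.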

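After that, you can fit a logistic regression (genotype as the outcome) or a linear regression (expression as the outcome):

glm(Variant ~ GeneExpression, data = mydata, family = binomial(link = 'logit')) # logistic regression; Variant must be binary here (for three genotype classes, a multinomial model such as nnet::multinom is needed)
lm(GeneExpression ~ Variant, data = mydata) # linear regression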
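To make the linear model concrete, here is a small sketch on simulated data. The data frame, sample size, and effect size are all invented for illustration; in practice `GeneExpression` would be a per-sample expression measurement for one gene and `Variant` the 0/1/2 alternate-allele count for one SNP in the same samples.

```r
set.seed(1)

# Simulated data (an assumption for illustration only): 200 samples,
# genotype coded as alternate-allele count, expression rising by 0.8 per allele.
n <- 200
mydata <- data.frame(Variant = sample(0:2, n, replace = TRUE))
mydata$GeneExpression <- 5 + 0.8 * mydata$Variant + rnorm(n, sd = 1)

# Additive model: one slope per extra copy of the alternate allele
fit <- lm(GeneExpression ~ Variant, data = mydata)

# Effect size (slope) and its p-value for the genotype term
res <- coef(summary(fit))["Variant", c("Estimate", "Pr(>|t|)")]
print(res)
```

Regressing on 0/1/2 is an ordinary regression with a numeric predictor that happens to take three values; wrapping `Variant` in `factor()` would instead estimate a separate mean for each genotype group.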

Kevin
