Using Linear Regression on Genotype and Expression data
6 weeks ago

Hi all,

I have studied many sources, like this and this, that try to relate the expression of a gene to its variants (SNPs), but all of them leave one question unanswered. There are 3 genotype codes: "0" means 0 copies of the minor allele (ref/ref), "1" means 1 copy (ref/alt), and "2" means 2 copies (alt/alt). If we consider only SNPs within 100 kbp upstream and downstream of the TSS (transcription start site), we may have ~20 SNPs per gene, so there will be a lot of collinearity among the independent variables (the genotypes).

This is a sample of the table on which I will run linear regression (function "lm" in R):

            SNP1         SNP2           SNP3             SNP4    ...   Gene expression
   donor1    0            1              0                1                 3.5
   donor2    0            1              0                1                 4.5
   donor3    0            0              0                0                 3.0
   donor4    1            1              0                1                 5.5
   donor5    0            1              0                1                 1.5
   ...

I have ~400 donors, and many donors are like donor1 and donor5: their genotypes are the same across the SNPs. So when I run the linear regression, this warning arises: "prediction from a rank-deficient fit may be misleading".

So what should I do? Am I doing something wrong?

Thanks a lot.

Regression SNP Machine Learning Genotype • 290 views

Can you show the model that you are fitting?


I am fitting this:

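# Fit expression on all other columns (the SNP genotypes) of the training set
model <- lm(gene_expression ~ ., data = my_data_train)
# Predict expression for the held-out donors
pred_lm <- predict(model, newdata = my_data_test)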
5 weeks ago
PeterKW ▴ 40

This warning most likely appears because some of your covariates are collinear, e.g. SNP2 and SNP4 in the sample table you gave. Various other possible reasons are given here. I hope this helps; give the different answers a good thought.
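
To see which covariates are involved, one option is to look for coefficients that `lm` could not estimate. Here is a minimal sketch, assuming the genotypes are stored as numeric 0/1/2 columns. The data frame `my_data` is a hypothetical toy copy of the five donors in your sample table, not your real data. `lm` returns `NA` for any predictor that is an exact linear combination of earlier ones (or is constant), so those names can be collected and dropped before refitting:

```r
# Toy data copied from the five sample donors (hypothetical stand-in for the real table).
# SNP2 and SNP4 are identical here, and SNP3 is constant.
my_data <- data.frame(
  SNP1 = c(0, 0, 0, 1, 0),
  SNP2 = c(1, 1, 0, 1, 1),
  SNP3 = c(0, 0, 0, 0, 0),
  SNP4 = c(1, 1, 0, 1, 1),
  gene_expression = c(3.5, 4.5, 3.0, 5.5, 1.5)
)

model <- lm(gene_expression ~ ., data = my_data)

# Coefficients that lm() could not estimate are reported as NA
aliased <- names(coef(model))[is.na(coef(model))]
print(aliased)

# The fit is rank-deficient when the rank is below the number of coefficients
print(model$rank < length(coef(model)))

# Refit without the aliased predictors
reduced <- my_data[, !(names(my_data) %in% aliased), drop = FALSE]
model_reduced <- lm(gene_expression ~ ., data = reduced)
print(coef(model_reduced))
```

Note that a dropped SNP is not necessarily unrelated to expression: in this toy data SNP4 carries exactly the same information as SNP2, so the regression cannot tell the two apart. Base R's `alias(model)` gives a more detailed report of which predictors depend on which.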
