PLINK logistic regression algorithm: how can it compute the standard error for each coefficient with an extremely large SNP array (>500k)?

Hello Biostars community,

I'm a bit curious about what PLINK does under the hood. I'm quite surprised at how well it handles logistic regression for association analysis.

For a dataset with 500k SNPs, if I'm correct, computing the standard errors would require the Hessian matrix of the parameters. That would be a 500k × 500k matrix, far too large for the RAM of any cluster: assuming each element is a 4-byte float, the matrix alone would take 500,000² × 4 bytes ≈ 10¹² bytes, roughly 1000 GB of RAM.
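A quick back-of-the-envelope check of that figure (a hypothetical snippet, assuming 4-byte floats):

```python
n_snps = 500_000
bytes_per_float = 4                       # float32 (a float16 would be 2 bytes)
hessian_bytes = n_snps ** 2 * bytes_per_float
print(hessian_bytes / 1e9, "GB")          # -> 1000.0 GB for a 500k x 500k matrix
```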

Does anyone know what sort of algorithm/process is used to compute the Hessian and/or the standard errors without exhausting the RAM of a cluster or a personal computer?

Regards

plink logistic standard error association analysis

You seem to be misunderstanding the computation that PLINK is performing. The logistic regressions for each variant are completely independent of each other; there isn't a single giant regression performed with 500k predictors.
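
For intuition, here is a minimal per-variant sketch in plain NumPy (not PLINK's actual implementation; the function name, the Newton-Raphson details, and the toy data are all made up for illustration). Each test fits a model with just a couple of parameters, so the Hessian that gets inverted for the standard errors is only 2 × 2 here (a few rows larger with covariates), no matter how many SNPs are in the file:

```python
import numpy as np

def logistic_regression_se(X, y, n_iter=25, tol=1e-8):
    """Fit a logistic regression by Newton-Raphson; return (beta, se)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))   # fitted probabilities
        W = p * (1.0 - p)                     # IRLS weights
        hessian = X.T @ (X * W[:, None])      # X' W X, only n_params x n_params
        step = np.linalg.solve(hessian, X.T @ (y - p))
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    # Standard errors come from the diagonal of the inverse Hessian at the optimum
    se = np.sqrt(np.diag(np.linalg.inv(hessian)))
    return beta, se

# Toy stand-in for a genotype file: 1000 samples, 5 SNPs instead of 500k
rng = np.random.default_rng(0)
n, n_snps = 1000, 5
genotypes = rng.integers(0, 3, size=(n, n_snps)).astype(float)  # 0/1/2 allele counts
y = rng.integers(0, 2, size=n).astype(float)                    # case/control status

for j in range(n_snps):                       # one small regression per SNP
    X = np.column_stack([np.ones(n), genotypes[:, j]])
    beta, se = logistic_regression_se(X, y)
    print(f"SNP {j}: beta = {beta[1]:.4f}, SE = {se[1]:.4f}")
```

Because the regressions are independent, the memory cost is per-variant and essentially constant, so the 500k tests can be streamed through (and trivially parallelized) one SNP at a time.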

Oh, that makes sense then!

So basically, it takes one SNP at a time, performs a regression on it, and then returns the standard error?

Thank you for your answer, it helps a lot!
