PLINK logistic regression algorithm: how can it compute the standard error for each coefficient with an extremely large SNP array (>500k)?

Hello Biostars community,

I'm a bit curious about what PLINK does under the hood. I'm quite surprised at how well it handles logistic regression for association analysis.

For a dataset with 500k SNPs, if I'm correct, computing the standard errors would require the Hessian matrix of the parameters. That would be a 500k × 500k matrix, far too large for the RAM of any cluster: assuming each element is a 4-byte float, the matrix alone would take 500,000² × 4 bytes ≈ 10¹² bytes, roughly 1000 GB of RAM.
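A quick back-of-the-envelope check of that figure (a hypothetical snippet, assuming 4-byte floats):

```python
n_snps = 500_000
bytes_per_float = 4                       # float32 (a float16 would be 2 bytes)
hessian_bytes = n_snps ** 2 * bytes_per_float
print(hessian_bytes / 1e9, "GB")          # -> 1000.0 GB for a 500k x 500k matrix
```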

Does anyone know what sort of algorithm/process is used to compute the Hessian and/or the standard errors without exhausting the RAM of a cluster or a personal computer?

Regards

plink logistic standard error association analysis

You seem to be misunderstanding the computation that PLINK is performing. The logistic regressions for each variant are completely independent of each other; there isn't a single giant regression performed with 500k predictors.
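
For intuition, here is a minimal per-variant sketch in plain NumPy (not PLINK's actual implementation; the function name, the Newton-Raphson details, and the toy data are all made up for illustration). Each test fits a model with just a couple of parameters, so the Hessian that gets inverted for the standard errors is only 2 × 2 here (a few rows larger with covariates), no matter how many SNPs are in the file:

```python
import numpy as np

def logistic_regression_se(X, y, n_iter=25, tol=1e-8):
    """Fit a logistic regression by Newton-Raphson; return (beta, se)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))   # fitted probabilities
        W = p * (1.0 - p)                     # IRLS weights
        hessian = X.T @ (X * W[:, None])      # X' W X, only n_params x n_params
        step = np.linalg.solve(hessian, X.T @ (y - p))
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    # Standard errors come from the diagonal of the inverse Hessian at the optimum
    se = np.sqrt(np.diag(np.linalg.inv(hessian)))
    return beta, se

# Toy stand-in for a genotype file: 1000 samples, 5 SNPs instead of 500k
rng = np.random.default_rng(0)
n, n_snps = 1000, 5
genotypes = rng.integers(0, 3, size=(n, n_snps)).astype(float)  # 0/1/2 allele counts
y = rng.integers(0, 2, size=n).astype(float)                    # case/control status

for j in range(n_snps):                       # one small regression per SNP
    X = np.column_stack([np.ones(n), genotypes[:, j]])
    beta, se = logistic_regression_se(X, y)
    print(f"SNP {j}: beta = {beta[1]:.4f}, SE = {se[1]:.4f}")
```

Because the regressions are independent, the memory cost is per-variant and essentially constant, so the 500k tests can be streamed through (and trivially parallelized) one SNP at a time.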

Oh, that makes sense then!

So basically, it takes one SNP at a time, performs a regression on it, and then returns the standard error?

Thank you for your answer, it helps a lot!
