Hello Biostars community,
I'm curious about how PLINK works under the hood. I'm surprised at how well PLINK handles logistic regression for association analysis.
For a dataset with 500k SNPs, if I'm correct, we would need to compute the Hessian matrix of the parameters in order to get the standard errors. That would be a matrix of size 500k × 500k, which is far too large for the RAM of any cluster (assuming each element is a 4-byte float32, i.e. 0.000004 MB per number, the matrix would take 500k × 500k × 0.000004 MB = 1,000,000 MB, which is approximately 1000 GB of RAM).
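To make the arithmetic explicit, here is a quick back-of-the-envelope check. The 4 bytes per element is my assumption (float32), and the variable names are just for illustration:

```python
# Rough memory estimate for a dense Hessian over all SNPs at once.
n_snps = 500_000
bytes_per_element = 4  # assumption: float32 storage

hessian_bytes = n_snps ** 2 * bytes_per_element
hessian_gb = hessian_bytes / 1e9  # decimal gigabytes

print(hessian_gb)  # 1000.0
```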
Does anyone know what sort of algorithm or process is used to compute the Hessian and/or the standard errors without exhausting the RAM of a cluster or personal computer (if applicable)?
Regards
Oh, it makes sense then!
So basically, it takes one SNP at a time, performs a regression on it, and then returns the standard error?
Thank you for your previous answer. It helps a lot!
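For anyone reading along, here is a minimal sketch of what I understand by "one SNP at a time". It is not PLINK's actual code, only an illustration in NumPy: each SNP gets its own small logistic model (intercept + genotype, no other covariates, which is my simplification), fitted by Newton-Raphson, so the Hessian is only 2 × 2 per SNP. The function names `logistic_fit` and `per_snp_scan` are made up for this example.

```python
import numpy as np

def logistic_fit(X, y, n_iter=25, tol=1e-8):
    """Fit a logistic regression by Newton-Raphson.

    Returns the coefficients and their standard errors, taken from the
    inverse of the (small) Hessian of the negative log-likelihood.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        hessian = X.T @ (X * w[:, None])      # k x k, here k = 2
        step = np.linalg.solve(hessian, X.T @ (y - p))
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    # Recompute the Hessian at the final estimate before inverting it.
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    w = p * (1.0 - p)
    hessian = X.T @ (X * w[:, None])
    se = np.sqrt(np.diag(np.linalg.inv(hessian)))
    return beta, se

def per_snp_scan(genotypes, phenotype):
    """Run one independent logistic regression per SNP column.

    genotypes: (n_samples, n_snps) array of 0/1/2 allele counts
    phenotype: (n_samples,) array of 0/1 case-control labels
    Returns an (n_snps, 2) array of [effect estimate, standard error].
    """
    n_samples, n_snps = genotypes.shape
    intercept = np.ones(n_samples)
    results = np.empty((n_snps, 2))
    for j in range(n_snps):
        X = np.column_stack([intercept, genotypes[:, j]])
        beta, se = logistic_fit(X, phenotype)
        results[j] = beta[1], se[1]           # keep only the SNP term
    return results

# Toy data: 500 samples, 20 SNPs, only SNP 0 affects the phenotype.
rng = np.random.default_rng(0)
genotypes = rng.binomial(2, 0.3, size=(500, 20)).astype(float)
prob = 1.0 / (1.0 + np.exp(-(-0.5 + 0.8 * genotypes[:, 0])))
phenotype = (rng.random(500) < prob).astype(float)

results = per_snp_scan(genotypes, phenotype)
print(results[:3])  # rows are [beta, standard error] for the first 3 SNPs
```

With this layout the memory cost per test is a handful of numbers, and the loop over SNPs can be streamed or parallelised, so the 500k × 500k matrix never has to exist.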