Hi,

I have a gene expression microarray dataset with dimensionality 427 x ~40,000.

I wish to test whether this data follows a multivariate normal distribution. In R, the mshapiro.test() function from the mvnormtest library (a multivariate Shapiro-Wilk test) only permits vectors of up to 5000 entries.

I also tried the squared Mahalanobis distance (if the data are multivariate normal, the squared distances should follow a chi-squared distribution, so a QQ-plot against chi-squared quantiles should be close to a straight line). However, this requires calculating a covariance matrix, which is not feasible for a dataset this large (or wide).
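For anyone who wants to reproduce the Mahalanobis check on a smaller scale, here is a minimal sketch in Python with NumPy and SciPy. It assumes the rows are samples and the columns are genes, and it uses simulated data for 20 hypothetical genes rather than the real array. The function name `mahalanobis_qq` is made up for illustration. Note that with 427 samples the sample covariance of more than 426 variables is singular, so this only works on a subset of columns.

```python
import numpy as np
from scipy import stats


def mahalanobis_qq(X):
    """Return (chi-squared quantiles, sorted squared Mahalanobis distances)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    if n <= p:
        # The sample covariance has rank at most n - 1, so it cannot be inverted.
        raise ValueError("need more samples than variables")
    centered = X - X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    # Solve cov @ Z = centered.T instead of forming the inverse explicitly.
    solved = np.linalg.solve(cov, centered.T)
    # Row-wise quadratic form: d2[i] = centered[i] @ inv(cov) @ centered[i]
    d2 = np.einsum("ij,ji->i", centered, solved)
    probs = (np.arange(1, n + 1) - 0.5) / n
    theoretical = stats.chi2.ppf(probs, df=p)
    return theoretical, np.sort(d2)


# Simulated stand-in for 427 samples and a subset of 20 genes (an assumption).
rng = np.random.default_rng(0)
X = rng.standard_normal((427, 20))
theo, obs = mahalanobis_qq(X)

# Correlation between the two quantile sets; values near 1 mean the QQ-plot is
# close to a straight line.
qq_corr = np.corrcoef(theo, obs)[0, 1]
print(qq_corr)
```

Plotting `obs` against `theo` gives the QQ-plot itself. The correlation is only a rough summary of how straight that plot is, not a formal test.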

Do you guys have any suggestions for alternative tests of multivariate normality for a large dataset, preferably but not necessarily in R?

Regards, S ;-)

written 8.4 years ago by Darren J. Fitzpatrick

I doubt that calculating the Shapiro-Wilk statistic makes sense for the whole dataset. I will try to explain this in an answer later.