GWAS: Same workflow and inputs, different servers, very different output (EMMAX)
3.5 years ago
michael.nagle ▴ 100

I'm trying to replicate our collaborators' results on our own server, using the same workflow and dataset that they ran on theirs. This is the standard EMMAX workflow detailed here: https://genome.sph.umich.edu/wiki/EMMAX

For some reason, the results are very different. Our server seems to produce nothing but random noise in the Manhattan plots, while the collaborators' server shows clear peaks.

In the plots showing nothing but noise (two examples below), the most notable problem is the absence of the associations that are clearly seen in plots from the other server (not shown). Also, some don't have very low -log(p) (right), but others appear to show very strong associations across all loci (left), and there seems to be a memory issue or something similar that prevents all of the points from being plotted. No clear peaks or "skyscrapers" appear in any Manhattan plot from EMMAX runs on this server.

The plots made on the other server are all fine and show clear peaks – and there's no issue with strong associations being predicted everywhere.
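One way to put a number on "strong associations predicted everywhere" is the genomic inflation factor (lambda), which compares the median test statistic with its expectation under the null. The sketch below is only an illustration, not part of the EMMAX workflow: it assumes the p-values have already been read into a list, and the function name `genomic_inflation` is made up for this example. A value near 1 is what a well-behaved run would be expected to give; a value far above 1 would match the left-hand plot.

```python
from statistics import NormalDist, median

_NORMAL = NormalDist()
# Median of a chi-square distribution with 1 degree of freedom (about 0.455),
# derived from the normal quantile rather than hard-coded.
CHI2_1DF_MEDIAN = _NORMAL.inv_cdf(0.25) ** 2


def genomic_inflation(pvalues):
    """Return lambda = median(chi-square statistics) / expected median.

    Each two-sided p-value is converted to a 1-df chi-square statistic.
    Values outside (0, 1] are skipped, since they cannot be converted.
    """
    stats = []
    for p in pvalues:
        if not (0.0 < p <= 1.0):
            continue
        # inv_cdf(p / 2) is the lower-tail z-score; squaring removes the sign.
        z = _NORMAL.inv_cdf(p / 2.0)
        stats.append(z * z)
    if not stats:
        raise ValueError("no usable p-values")
    return median(stats) / CHI2_1DF_MEDIAN
```

Running this on the p-values from each server would give a single number per run to compare, which is easier to report than a description of the plots.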

I've made sure that the input data and the scripts that call EMMAX (below) are the same, and that the same version of EMMAX is being used.

emmax -v -d 10 -t [tped prefix] -p [pheno file] -k [kinship file] -o [output file]
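A simple way to double-check that the files really are byte-for-byte identical on both servers is to compare checksums. The following is a minimal sketch using only the Python standard library; the helper names `sha256_of` and `checksum_table` are hypothetical, and which files to hash (the tped/tfam files, the phenotype file, the kinship file, and possibly the `emmax` executable itself) is left to the reader.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_table(paths):
    """Map each file name to its digest; files that do not exist map to None."""
    return {
        Path(p).name: (sha256_of(p) if Path(p).is_file() else None)
        for p in paths
    }
```

Calling `checksum_table` with the same list of files on each server and comparing the two dictionaries would show whether any input differs. Identical digests would rule out the files themselves, but not differences in the environment, such as how the binary was built or which libraries it links against.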


At this point I don't know what else to do other than try a published R implementation of EMMAX (the original is written in C) and see whether the same problem occurs. I'd appreciate any tips, even just general troubleshooting strategies for a situation like this, where the problem doesn't appear to lie in the scripts or input files.
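One general strategy is to compare the raw association output from the two servers directly, rather than the plots, so that plotting problems can be separated from problems in EMMAX itself. The sketch below assumes the output is a whitespace-delimited text file with the SNP ID in the first column and the p-value in the last column and no header line; that layout is an assumption and should be checked against the actual files. The function names are hypothetical.

```python
import math


def read_pvalues(path):
    """Read {SNP ID: p-value} from a whitespace-delimited file.

    Assumes the SNP ID is the first column and the p-value is the last.
    Values that cannot be parsed are stored as NaN so they get flagged.
    """
    pvalues = {}
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) < 2:
                continue
            try:
                pvalues[fields[0]] = float(fields[-1])
            except ValueError:
                pvalues[fields[0]] = math.nan
    return pvalues


def is_valid(p):
    """A usable p-value is finite and lies in (0, 1]."""
    return math.isfinite(p) and 0.0 < p <= 1.0


def compare(pvals_a, pvals_b, tolerance=1e-6):
    """Summarise how two result sets differ, on the log10 scale."""
    shared = sorted(set(pvals_a) & set(pvals_b))
    differing = [
        snp for snp in shared
        if is_valid(pvals_a[snp]) and is_valid(pvals_b[snp])
        and abs(math.log10(pvals_a[snp]) - math.log10(pvals_b[snp])) > tolerance
    ]
    return {
        "shared": len(shared),
        "only_in_a": len(set(pvals_a) - set(pvals_b)),
        "only_in_b": len(set(pvals_b) - set(pvals_a)),
        "invalid_in_a": sum(not is_valid(p) for p in pvals_a.values()),
        "invalid_in_b": sum(not is_valid(p) for p in pvals_b.values()),
        "differing": differing,
    }
```

If one server's file contains many invalid values (NaN, zero, or values above 1), that would point at the computation on that server rather than at the plotting step. If the two files agree, the difference is more likely to come from the plotting code.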

GWAS C EMMAX Server Cluster