Question: GCTA GREML for WGS with an extremely large number of samples and SNPs
moldach (McGill, Douglas Mental Health University Institute) wrote, 6 months ago:

I've been trying to use GCTA (a tool for Genome-wide Complex Trait Analysis) for a larger job than usual, and our cluster keeps killing it with the message "killed" during the segment-based LD score step, which makes me think it's a memory issue. We don't currently have a job scheduler on that cluster, and I believe the job was killed because it was the largest process running when the machine ran out of memory.

Now I'm running the following script on the binary files (78G for the .bed, 1.1G for the .bim and 184K for the .fam) on another cluster with a SLURM scheduler. I've requested 250G of RAM and 48 threads, and it has already been running for two days, so I'm wondering whether there is a more efficient way to do this. (The maximum wall time I'm currently allowed is 5 days; I would need to request more.)

#SBATCH --time=5-00:00:00
#SBATCH --mem=250G
#SBATCH --cpus-per-task=48

gcta64 --bfile ./alspac_moms --ld-score-region 200 --thread-num 48 --out alspac_moms
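To put the 250G request in context, a rough sketch like the one below estimates the dataset's dimensions from the companion files. It relies on PLINK's SNP-major .bed layout (3 header bytes, then ceil(N/4) bytes per SNP for N samples). The function name `bed_bytes` is made up for illustration, and the result only describes the raw genotype data, not what GCTA actually allocates for the LD score step.

```shell
#!/bin/bash
# Estimate the size in bytes of a PLINK .bed file from its .fam and .bim files.
# Usage: bed_bytes PREFIX   (expects PREFIX.fam and PREFIX.bim to exist)
# bed_bytes is a hypothetical helper, not part of GCTA or PLINK.
bed_bytes() {
    local prefix=$1
    local n m
    n=$(wc -l < "${prefix}.fam")   # one line per sample
    m=$(wc -l < "${prefix}.bim")   # one line per SNP
    # SNP-major .bed: 3 header bytes, then ceil(n/4) bytes for each SNP
    echo $(( 3 + m * ( (n + 3) / 4 ) ))
}

# Only run on the real data when the input files are present
if [ -f ./alspac_moms.fam ] && [ -f ./alspac_moms.bim ]; then
    bed_bytes ./alspac_moms
fi
```

If the estimate matches the size of the .bed on disk, the sample and SNP counts from `wc -l` can be trusted when deciding how to split the job.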

The basic GREML tutorial suggests splitting the data by chromosome, like so:

gcta64 --bfile test --chr 1 --maf 0.01 --make-grm --out test_chr1 --thread-num 10
gcta64 --bfile test --chr 2 --maf 0.01 --make-grm --out test_chr2 --thread-num 10
gcta64 --bfile test --chr 22 --maf 0.01 --make-grm --out test_chr22 --thread-num 10
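The three tutorial lines above stand in for all 22 autosomes. As a sketch, the same thing can be written as a loop, followed by a merge of the per-chromosome GRMs with `--mgrm`. The helper name `write_grm_list` and the list file name `grm_chrs.txt` are my own choices for illustration; please check the merge step against the GCTA documentation before relying on it.

```shell
#!/bin/bash
# Write one GRM prefix per autosome (PREFIX_chr1 ... PREFIX_chr22) to a list file.
# write_grm_list is a hypothetical helper, not a GCTA command.
write_grm_list() {
    local prefix=$1 list=$2 chr
    : > "$list"                      # start from an empty list file
    for chr in $(seq 1 22); do
        echo "${prefix}_chr${chr}" >> "$list"
    done
}

# Only run the real analysis when gcta64 and the input data are available
if command -v gcta64 >/dev/null 2>&1 && [ -f test.bed ]; then
    for chr in $(seq 1 22); do
        gcta64 --bfile test --chr "$chr" --maf 0.01 --make-grm \
               --out "test_chr${chr}" --thread-num 10
    done
    write_grm_list test grm_chrs.txt
    # Merge the per-chromosome GRMs into a single GRM
    gcta64 --mgrm grm_chrs.txt --make-grm --out test
fi
```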

Is it possible to do something similar for GREML in WGS or imputed data? For example:

gcta64 --bfile test --chr 1 --ld-score-region 200 --out test_chr1
gcta64 --bfile test --chr 2 --ld-score-region 200 --out test_chr2

and then read each chromosome's LD scores in R:

lds_seg_chr1 = read.table("test_chr1.score.ld",header=TRUE,colClasses=c("character",rep("numeric",8)))
lds_seg_chr2 = read.table("test_chr2.score.ld",header=TRUE,colClasses=c("character",rep("numeric",8)))
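If the per-chromosome split is valid, the LD score step could be submitted as a SLURM job array so that each chromosome runs as its own task with its own memory and wall-time allocation. This is only a sketch: the resource values are copied from my original job and would probably need tuning per chromosome, and `ld_score_cmd` is a made-up helper name.

```shell
#!/bin/bash
#SBATCH --array=1-22
#SBATCH --time=5-00:00:00
#SBATCH --mem=250G
#SBATCH --cpus-per-task=48
# Resource values above are copied from the original single job (assumption:
# a single chromosome may need considerably less).

# Build the per-chromosome LD score command (hypothetical helper).
ld_score_cmd() {
    local chr=$1 threads=$2
    echo "gcta64 --bfile test --chr ${chr} --ld-score-region 200 --thread-num ${threads} --out test_chr${chr}"
}

# SLURM sets SLURM_ARRAY_TASK_ID for each array task; use it as the chromosome.
if [ -n "${SLURM_ARRAY_TASK_ID:-}" ]; then
    cmd=$(ld_score_cmd "$SLURM_ARRAY_TASK_ID" "${SLURM_CPUS_PER_TASK:-1}")
    echo "Running: $cmd"
    $cmd
fi
```

Submitting the script once with `sbatch` would queue 22 tasks, and a task that runs out of memory or time can be resubmitted on its own.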

The idea is to stratify SNPs by segment-based LD score within each chromosome, make a GRM for each of these groups, and then perform the REML analysis on the resulting 88 GRMs. Is that a valid approach, or is there some other way to deal with out-of-memory issues with large GWAS/imputed data?
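For the final step, a sketch of how the 88 GRMs could be listed and passed to REML with `--mgrm` is below. It assumes the 88 comes from 22 autosomes times four LD-score groups. The GRM naming scheme (`test_chr1_group1` and so on), the phenotype file `phen.txt`, the output prefix and the helper `write_multi_grm_list` are all hypothetical.

```shell
#!/bin/bash
# Write one GRM prefix per chromosome and LD-score group to a list file.
# Assumption: 22 autosomes x 4 LD-score groups = 88 GRMs, named
# PREFIX_chr<chr>_group<g> (a hypothetical naming scheme).
write_multi_grm_list() {
    local prefix=$1 list=$2 chr g
    : > "$list"                      # start from an empty list file
    for chr in $(seq 1 22); do
        for g in 1 2 3 4; do
            echo "${prefix}_chr${chr}_group${g}" >> "$list"
        done
    done
}

# Only run REML when gcta64 and the (hypothetical) phenotype file are available
if command -v gcta64 >/dev/null 2>&1 && [ -f phen.txt ]; then
    write_multi_grm_list test multi_GRMs.txt
    gcta64 --reml --mgrm multi_GRMs.txt --pheno phen.txt \
           --thread-num 10 --out test_ldms
fi
```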
