Any practical example/tutorial on how to simulate phenotypes from 1000 Genomes?
4.6 years ago
b.ambrozio ▴ 30

I'm looking for a way to simulate phenotypes against a real SNP data source, such as the 1000 Genomes. It must be free for commercial use (e.g. under an MIT license). Any recommendations? I'm trying to use GCTA (gcta64), but I couldn't get it working. The documentation doesn't help much, as it has no practical examples or tutorials. At the end of the day, I want to:

1 - Simulate a case/control and/or quantitative phenotype
2 - Link it with a real SNP dataset (e.g. 1000 Genomes)
3 - Conduct a GWAS analysis using Plink and/or Hail
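To make points 1 and 2 concrete, here is a minimal sketch of the kind of model I have in mind: a generic additive model with a target heritability, plus a liability threshold for case/control status. It is not taken from GCTA, Plink or Hail; the function names, the toy genotypes and the 1/2 control/case coding are my own assumptions for illustration.

```python
import random
import statistics


def simulate_quantitative(genotypes, effects, h2, rng):
    """Additive model y = g + e with var(g) / (var(g) + var(e)) close to h2.

    genotypes: one list per individual of allele counts (0/1/2) at the causal SNPs.
    effects:   per-allele effect size for each causal SNP.
    """
    # Genetic value: sum of allele count times per-allele effect.
    g = [sum(x * b for x, b in zip(row, effects)) for row in genotypes]
    var_g = statistics.pvariance(g)
    # Residual variance chosen so that the expected heritability equals h2.
    sd_e = (var_g * (1.0 / h2 - 1.0)) ** 0.5
    return [gi + rng.gauss(0.0, sd_e) for gi in g]


def to_case_control(liability, prevalence):
    """Liability threshold: the top `prevalence` fraction become cases.

    Returns 2 for cases and 1 for controls (assumed PLINK-style coding).
    """
    n_cases = max(1, round(len(liability) * prevalence))  # at least one case
    cutoff = sorted(liability, reverse=True)[n_cases - 1]
    return [2 if y >= cutoff else 1 for y in liability]


# Toy data standing in for real genotypes (hypothetical, not 1000 Genomes).
rng = random.Random(42)
n_ind, n_snp = 500, 10
freqs = [rng.uniform(0.1, 0.5) for _ in range(n_snp)]
genotypes = [[(rng.random() < p) + (rng.random() < p) for p in freqs]
             for _ in range(n_ind)]
effects = [rng.gauss(0.0, 1.0) for _ in range(n_snp)]

y = simulate_quantitative(genotypes, effects, h2=0.5, rng=rng)
status = to_case_control(y, prevalence=0.1)
```

With real data, the toy `genotypes` would be replaced by allele counts read from the SNP dataset at the chosen causal SNPs.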

Looking at Hail's and Plink's GWAS tutorials, I realised both use simulated phenotypes on top of real SNP data sources (1000 Genomes and HapMap, respectively), but how they created the datasets is beyond the scope of the tutorials and thus not reported.

As mentioned before, I've tried gcta64, but no success. Here's what I've tried:

1 - Downloaded the 1000 Genomes sample from the Plink page: Entire dataset as a single .tar.gz (1.12 GB) (A2 allele major, not ref, on chr3 before 15 Oct 2017)

2 - Tried to generate the simulated data with:
./gcta64 --bfile 1kGenomesP1/1kg_phase1_all/1kg_phase1_all --simu-qt --simu-causal-loci causal.snplist --simu-hsq 0.5 --simu-rep 3 --keep test.indi.list --out 1kg_phase1_all

Error: Error: --keep test.indi.list not found.

What should the files causal.snplist and test.indi.list contain? Is there any practical example or tutorial?
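From my reading of the GCTA documentation (so treat the formats as assumptions to verify): causal.snplist is a headerless list of causal SNP IDs, one per line, optionally with an effect size in a second column, and test.indi.list is a headerless list of family ID and individual ID pairs. Under that assumption, both files could be derived from the .bim and .fam files of the dataset. The helper name `write_gcta_simu_inputs` and the toy fileset below are hypothetical.

```python
import random
import tempfile
from pathlib import Path


def write_gcta_simu_inputs(bfile_prefix, out_dir, n_causal, n_keep=None, seed=1):
    """Write causal.snplist and test.indi.list for a PLINK binary fileset."""
    rng = random.Random(seed)
    out_dir = Path(out_dir)

    # .bim: one variant per line; the 2nd column is the SNP ID.
    with open(f"{bfile_prefix}.bim") as bim:
        snp_ids = [line.split()[1] for line in bim if line.strip()]
    # .fam: one sample per line; the first two columns are FID and IID.
    with open(f"{bfile_prefix}.fam") as fam:
        samples = [tuple(line.split()[:2]) for line in fam if line.strip()]

    # Pick causal SNPs at random; keep all samples unless n_keep is given.
    causal = rng.sample(snp_ids, n_causal)
    kept = samples if n_keep is None else rng.sample(samples, n_keep)

    # One SNP ID per line, no header, no effect-size column.
    (out_dir / "causal.snplist").write_text("".join(f"{s}\n" for s in causal))
    # FID and IID per line, no header.
    (out_dir / "test.indi.list").write_text("".join(f"{f} {i}\n" for f, i in kept))
    return causal, kept


# Toy fileset so the sketch runs without the real data (all IDs are made up).
demo_dir = Path(tempfile.mkdtemp())
(demo_dir / "toy.bim").write_text(
    "".join(f"1 rs{i} 0 {1000 + i} A G\n" for i in range(1, 6)))
(demo_dir / "toy.fam").write_text(
    "".join(f"FAM{i} IND{i} 0 0 0 -9\n" for i in range(1, 5)))
causal, kept = write_gcta_simu_inputs(demo_dir / "toy", demo_dir,
                                      n_causal=2, n_keep=3)
```

For the real data, the prefix would be 1kGenomesP1/1kg_phase1_all/1kg_phase1_all, and gcta64 would need to be run from the directory that holds the two generated files (or be given their full paths).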

Btw - I apologise in advance if this is too trivial a question. I'm quite new to this. I appreciate your patience and help :)

GCTA 1000Genomes • 1.4k views

Have you had a chance to figure out the steps? I am trying to do the same thing.

