Question: Simulate Associated Phenotype From Existing Genotype Data In R
7.1 years ago, jamespower371 wrote:

Hi,

I am looking for a way to simulate a phenotype associated with several loci, starting from existing genotype data in R. Is it possible to do this simulation directly in R? I am not a statistician, and I cannot find a good reference for the calculations needed to simulate an association with several loci and several causal SNPs that explain a given percentage of the variance, with a given heritability. If anybody has an idea of how to do this type of calculation directly in R, I would appreciate any help.

Thank you in advance. James

gwas simulation • 4.1k views
7.1 years ago, Josh Herr (University of Nebraska) wrote:

I'm by no means knowledgeable in this area, but one of my co-workers uses a few programs for modeling phenotypes on genetic map data estimated from SNPs.

She uses phenosim and simrare; I think simrare is compatible with R. You might also want to check out the hypred package in R, but I'm not sure whether it will meet your specific needs.


Thank you Josh. It looks like all of these programs require either that the genotype data be generated with the same software or that the genotype data be reformatted as if they had been. I thought there was a simpler way in R, with either a few commands or an R package. If there is indeed a way to do this with R commands or an R package, please let me know; otherwise it looks like I will need to use software like the programs Josh suggested.

Thank you

Reply written 7.0 years ago by jamespower371
7.0 years ago, Genotepes (Nantes, France) wrote:

Hi

I think GCTA can do what you want. It was designed for genome-wide data, but I think it can be used with a more restricted data set. The --simu-causal-loci causal.snplist option lets you choose which SNPs are causal.

You will need to have an idea of how your effects (additive, so equivalent to the +a of the biometrical model) relate to the total variance, to avoid a set of SNPs explaining more than 100% of the variance. Apart from that problem, the program looks easy to handle, and it can be wrapped in an R command, although this means you need to call it many times.
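To make the variance check concrete, here is a minimal sketch in R. It assumes Hardy-Weinberg equilibrium and uncorrelated causal SNPs (no LD), so that a SNP with allele frequency p and additive effect a contributes 2p(1-p)a^2 to the variance. The allele frequencies and target fractions are made-up illustration values, not taken from any data set.

```r
# Made-up allele frequencies of three causal SNPs (assumption for illustration).
p <- c(0.10, 0.30, 0.45)

# Made-up target fraction of phenotypic variance explained by each SNP.
target_r2 <- c(0.02, 0.05, 0.01)

# Total phenotypic variance, fixed at 1 for convenience.
var_y <- 1

# The causal SNPs together must explain less than 100% of the variance.
stopifnot(sum(target_r2) < 1)

# Additive effect per allele that yields the target fraction,
# from r2 = 2 * p * (1 - p) * a^2 / var_y (HWE, no LD assumed).
a <- sqrt(target_r2 * var_y / (2 * p * (1 - p)))

# Recompute the variance explained from the effects as a sanity check.
explained <- 2 * p * (1 - p) * a^2 / var_y
h2_snps <- sum(explained)

# Variance left over for the residual (environmental) term.
residual_var <- var_y * (1 - h2_snps)

print(data.frame(p = p, effect = a, explained = explained))
```

If the causal SNPs are in LD with each other, the contributions no longer add up this simply, so the realised total should be checked on the simulated data.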

Christian

http://www.complextraitgenomics.com/software/gcta/Simu.html
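A rough sketch of how the GCTA call could be wrapped from R, under several assumptions: the executable is named gcta64 and is on the PATH, the genotypes are in PLINK binary format under the placeholder prefix "mydata", and the SNP IDs are invented. The options are the ones described on the GCTA simulation page linked above; check that page for the exact input and output formats before relying on this.

```r
# Placeholder SNP IDs (hypothetical); replace with IDs from your own .bim file.
causal_snps <- c("rs0001", "rs0002", "rs0003")

# Write the list of causal SNPs, one ID per line.
snplist <- file.path(tempdir(), "causal.snplist")
writeLines(causal_snps, snplist)

# Assumed prefix of the PLINK files mydata.bed / mydata.bim / mydata.fam.
bfile <- "mydata"

# Arguments for a quantitative-trait simulation with heritability 0.5.
gcta_args <- c("--bfile", bfile,
               "--simu-qt",
               "--simu-causal-loci", snplist,
               "--simu-hsq", "0.5",
               "--simu-rep", "1",
               "--out", file.path(tempdir(), "sim"))

# Only run GCTA if both the executable and the input files are present.
gcta <- Sys.which("gcta64")  # assumed executable name
if (nzchar(gcta) && file.exists(paste0(bfile, ".bed"))) {
  status <- system2(gcta, gcta_args)
} else {
  message("gcta64 or input files not found; the command would be:\n",
          paste("gcta64", paste(gcta_args, collapse = " ")))
}
```

Calling this in a loop with a different --out prefix per run would be one way to produce many simulated phenotypes.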


Thank you Christian. I had actually seen this, but I would need to convert my input files to the MACH output format, since I am not using that software for imputation. Still, GCTA could help me find the calculations to do this in R. I guess GCTA works by drawing the effects of the causal variants from a standard normal distribution and the residuals from a normal distribution with mean 0 and variance var(g) * (1/h^2 - 1), where g is the simulated genetic value and h^2 the heritability, so maybe this is all that is needed?
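The model described above can be sketched directly in R. This is a minimal illustration under stated assumptions, not a reimplementation of GCTA: the genotype matrix here is randomly generated toy data (individuals in rows, SNPs in columns, coded 0/1/2 with no missing values), the function name simulate_phenotype is made up for this example, and genotypes are standardised by allele frequency before being multiplied by the effects.

```r
set.seed(1)

# Toy genotype matrix of 0/1/2 allele counts; replace with your own data.
n_ind <- 500
n_snp <- 200
maf <- runif(n_snp, 0.05, 0.5)
geno <- sapply(maf, function(p) rbinom(n_ind, 2, p))

# Hypothetical helper: simulate a quantitative trait with heritability h2.
simulate_phenotype <- function(geno, causal_idx, h2, effects = NULL) {
  stopifnot(h2 > 0, h2 < 1)
  x <- geno[, causal_idx, drop = FALSE]

  # Allele frequencies of the causal SNPs; monomorphic SNPs cannot be scaled.
  p <- colMeans(x) / 2
  stopifnot(all(p > 0 & p < 1))

  # Standardise each SNP: (x - 2p) / sqrt(2p(1 - p)).
  w <- sweep(sweep(x, 2, 2 * p, "-"), 2, sqrt(2 * p * (1 - p)), "/")

  # Effects drawn from a standard normal unless supplied by the caller.
  if (is.null(effects)) effects <- rnorm(length(causal_idx))

  # Genetic value, then residual with variance var(g) * (1 / h2 - 1).
  g <- as.vector(w %*% effects)
  e <- rnorm(nrow(geno), mean = 0, sd = sqrt(var(g) * (1 / h2 - 1)))

  list(y = g + e, g = g, effects = effects)
}

causal_idx <- sample(n_snp, 10)
sim <- simulate_phenotype(geno, causal_idx, h2 = 0.5)

# Realised heritability in this sample; it fluctuates around the target.
realised_h2 <- var(sim$g) / var(sim$y)
print(realised_h2)
```

With imputed data, geno could hold dosages instead of integer counts; the same arithmetic applies, although the variance of a dosage is usually a little below 2p(1-p).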

Reply written 7.0 years ago by jamespower371

So, does this mean you need to generate a model where imputed SNPs are causal? I think you'll need to apply a threshold and convert them to plain (hard-call) genotypes.
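One simple way to do such thresholding in R, as a sketch only: it assumes the imputed data are allele dosages between 0 and 2, and the function name hard_call and the tolerance of 0.1 are arbitrary choices for illustration. Dosages too far from an integer are set to missing rather than forced to a genotype.

```r
# Made-up dosage matrix: 2 individuals (rows) x 3 SNPs (columns).
dosage <- matrix(c(0.02, 0.97, 1.48,
                   1.95, 0.51, 1.05),
                 nrow = 2, byrow = TRUE)

# Hypothetical helper: round dosages to 0/1/2 and blank out uncertain calls.
hard_call <- function(d, tol = 0.1) {
  g <- round(d)
  g[abs(d - g) > tol] <- NA  # too far from any integer genotype
  g
}

hard <- hard_call(dosage)
print(hard)
```

Thresholding on the posterior genotype probabilities, when the imputation software provides them, is a common alternative to thresholding on the dosage.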

As for model generation, you are right.

Reply written 7.0 years ago by Genotepes

Thanks again Christian. Yes, that is correct, sorry if it wasn't clear: I am trying to generate a phenotype associated with several imputed causal SNPs.

Reply written 7.0 years ago by jamespower371