How Do I Compute The Proportion Of Cases/Controls For Each Genotype From The Penetrance Function?
12.8 years ago
Rossella

Hi, I have a 3x3 table of penetrance values, one for each combination of genotypes at two loci (AA-AA, AA-AB, AA-BB, AB-AA, ...). I also have a 3x3 table giving the number of samples for each genotype combination. How do I compute the proportion of cases for each genotype class from the penetrance function?

I am trying to build a simulated dataset of epistatic interactions, but once I have the penetrance values I do not know how to generate the proportions of cases and controls.

Any help would be greatly appreciated.

Thanks

Rossella

12.8 years ago
Frederic

=============== ADDED ==================

If your goal is to generate cases and controls, the easiest way is to use the penetrances and the genotype-combination frequencies. I suspect that is what they did in the paper. Suppose you want Naff affected and Nunaff unaffected individuals.

METHOD 1

At step i:

1) draw a genotype combination from the genotype-combination distribution P(G);

2) draw a phenotype for this genotype combination using its penetrance P(Aff/G);

3) if the phenotype drawn is "affected" and you already have Naff affected individuals in your sample (respectively "unaffected" and Nunaff unaffected individuals), drop the individual; otherwise store it;

4) while the affected sample size < Naff OR the unaffected sample size < Nunaff, go back to 1) for step i+1.

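The pitfall of METHOD 1 is that you may drop a lot of individuals for some penetrance values: when the prevalence implied by the penetrances and genotype frequencies is far from the case fraction you want (e.g. a rare disease simulated with as many cases as controls), most draws fall into the class that is already full and are discarded.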
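A minimal sketch of METHOD 1 as rejection sampling, assuming the genotype-combination frequencies P(G) are known. The function name `simulate_method1` and the toy model (B-allele frequency 0.3 at both loci, Hardy-Weinberg and linkage equilibrium, penetrance 0.30 when both loci carry a B allele and 0.05 otherwise) are hypothetical illustrations, not the parameters of the dataset discussed here.

```python
import random
from itertools import product

def simulate_method1(freqs, penetrance, n_aff, n_unaff, seed=None):
    """METHOD 1: draw G from P(G), then a phenotype from P(Aff/G); drop individuals
    whose phenotype class is already full. Returns (cases, controls, n_dropped)."""
    rng = random.Random(seed)
    genotypes = list(freqs)
    weights = [freqs[g] for g in genotypes]
    cases, controls, dropped = [], [], 0
    while len(cases) < n_aff or len(controls) < n_unaff:
        g = rng.choices(genotypes, weights=weights)[0]  # step 1: genotype combination
        affected = rng.random() < penetrance[g]         # step 2: affected with prob. P(Aff/G)
        if affected and len(cases) < n_aff:             # step 3: store or drop
            cases.append(g)
        elif not affected and len(controls) < n_unaff:
            controls.append(g)
        else:
            dropped += 1
    return cases, controls, dropped

# Hypothetical toy model (NOT the real parameters): q = 0.3 at both loci, HWE and
# linkage equilibrium; penetrance 0.30 if both loci carry a B allele, else 0.05.
single = {"AA": 0.49, "AB": 0.42, "BB": 0.09}
freqs = {f"{a}-{b}": single[a] * single[b] for a, b in product(single, single)}
pen = {f"{a}-{b}": 0.30 if "B" in a and "B" in b else 0.05
       for a, b in product(single, single)}

cases, controls, dropped = simulate_method1(freqs, pen, 400, 400, seed=1)
print(len(cases), len(controls), dropped)
```

With this toy model the implied prevalence is about 0.115, so on average roughly 2,700 draws are discarded to collect the 800 individuals, which is exactly the pitfall mentioned above.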

-

METHOD 2

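It requires P(G/D) for all values of G and D, i.e. the distribution of genotype combinations among affected (D=Aff) and among unaffected (D=Unaff) individuals.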

1) for Aff, draw a genotype combination from P(G/D=Aff);

2) while the Aff sample size < Naff, go back to 1);

3) do the same for Unaff (yes, I'm lazy).

I see no pitfall with this method; it only requires one more mathematical step to obtain P(G/D), and that step is worth it.
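A minimal sketch of METHOD 2, assuming the genotype-combination frequencies P(G) are known. Here P(G/Aff) and P(G/Unaff) are obtained by Bayes' rule, taking the prevalence as the value implied by the table, P(Aff) = sum over G of P(G)·P(Aff/G). The function names and the toy model are hypothetical.

```python
import random
from itertools import product

def genotype_given_status(freqs, penetrance):
    """Bayes' rule: P(G/Aff) and P(G/Unaff) from P(G) and the penetrances P(Aff/G)."""
    prevalence = sum(freqs[g] * penetrance[g] for g in freqs)  # P(Aff) implied by the table
    p_g_aff = {g: freqs[g] * penetrance[g] / prevalence for g in freqs}
    p_g_unaff = {g: freqs[g] * (1 - penetrance[g]) / (1 - prevalence) for g in freqs}
    return p_g_aff, p_g_unaff, prevalence

def simulate_method2(freqs, penetrance, n_aff, n_unaff, seed=None):
    """METHOD 2: draw cases from P(G/Aff) and controls from P(G/Unaff); nothing is dropped."""
    rng = random.Random(seed)
    p_g_aff, p_g_unaff, _ = genotype_given_status(freqs, penetrance)
    genotypes = list(freqs)
    cases = rng.choices(genotypes, weights=[p_g_aff[g] for g in genotypes], k=n_aff)
    controls = rng.choices(genotypes, weights=[p_g_unaff[g] for g in genotypes], k=n_unaff)
    return cases, controls

# Hypothetical toy model (NOT the real parameters): q = 0.3 at both loci, HWE and
# linkage equilibrium; penetrance 0.30 if both loci carry a B allele, else 0.05.
single = {"AA": 0.49, "AB": 0.42, "BB": 0.09}
freqs = {f"{a}-{b}": single[a] * single[b] for a, b in product(single, single)}
pen = {f"{a}-{b}": 0.30 if "B" in a and "B" in b else 0.05
       for a, b in product(single, single)}

p_g_aff, p_g_unaff, prevalence = genotype_given_status(freqs, pen)
cases, controls = simulate_method2(freqs, pen, 400, 400, seed=1)
print(round(prevalence, 4), len(cases), len(controls))  # prints: 0.115 400 400
```

Each group is drawn in one call of exactly the requested size, so unlike METHOD 1 no individual is discarded.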

In a case/control design, simple counting only lets you estimate P(G/D), for both D=Aff and D=Unaff. The classic Bayes relation then gives the penetrance: P(D/G) = P(G/D)*P(D)/P(G).

However, this requires the prevalence P(D) and the genotype-combination frequencies P(G).
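For simulating with METHOD 2, the same relation is used in the other direction. Writing f_G = P(G) for the population frequency of genotype combination G, K = P(Aff) for the prevalence, and P(A | B) for what the text writes as P(A/B), the prevalence is implied by the penetrance table and the frequencies, and Bayes' rule gives both conditional distributions:

```latex
\begin{aligned}
K &= P(\mathrm{Aff}) = \sum_{G} f_G \, P(\mathrm{Aff} \mid G), \\
P(G \mid \mathrm{Aff}) &= \frac{f_G \, P(\mathrm{Aff} \mid G)}{K}, \qquad
P(G \mid \mathrm{Unaff}) = \frac{f_G \, \bigl(1 - P(\mathrm{Aff} \mid G)\bigr)}{1 - K}.
\end{aligned}
```

Each of the two distributions sums to 1 over the nine genotype combinations, so cases and controls can be sampled from them directly.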

=============== OLD answer ==================

Hi,

Let G be a genotype combination. The penetrance of G is the probability of being affected given that an individual carries G, p = P(Affected/G). In other words, it is the proportion of affected individuals among all individuals in the population who carry G.

To simulate, for each individual in your sample with genotype combination G (let's call it "sample G"), draw a uniform random value between 0 and 1: if the value is less than or equal to the penetrance of G, the individual is affected; otherwise it is unaffected.
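A minimal sketch of this per-individual draw, assuming the size of each "sample G" is already fixed. The function name `assign_status` and the class sizes and penetrances in the usage lines are hypothetical.

```python
import random

def assign_status(sample_sizes, penetrance, seed=None):
    """For every genotype class G ("sample G") of size n_G, draw U ~ Uniform[0, 1) per
    individual and call it affected when U < P(Aff/G), i.e. with probability P(Aff/G).
    Returns a dict G -> (n_cases, n_controls)."""
    rng = random.Random(seed)
    result = {}
    for g, n in sample_sizes.items():
        n_cases = sum(1 for _ in range(n) if rng.random() < penetrance[g])
        result[g] = (n_cases, n - n_cases)
    return result

# Hypothetical class sizes and penetrances, for illustration only.
sample_sizes = {"AA-AA": 200, "AB-AB": 100, "BB-BB": 18}
penetrance = {"AA-AA": 0.05, "AB-AB": 0.10, "BB-BB": 0.30}
print(assign_status(sample_sizes, penetrance, seed=42))
```

On average each class ends up with about n_G × P(Aff/G) cases, so the overall number of cases is random rather than fixed in advance.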

However, depending on what you want to do, you may have to choose the size of "sample G" so that it matches the frequency of G relative to the other genotype combinations. If the frequency of AA-AA is 20% (in the general population, i.e. regardless of affection status), then "sample AA-AA" (affected and unaffected together) should make up 20% of the total simulated sample. These sizes can also be drawn at random.
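A minimal sketch of drawing the "sample G" sizes at random, using a multinomial draw with probabilities P(G). The function name `draw_sample_sizes` and the truncated three-entry frequency table are hypothetical; a real two-locus table has nine entries.

```python
import random
from collections import Counter

def draw_sample_sizes(freqs, n_total, seed=None):
    """Split n_total simulated individuals over genotype combinations by a multinomial
    draw with probabilities P(G), so each "sample G" matches its population frequency
    on average (about 20% AA-AA if P(AA-AA) = 0.20)."""
    rng = random.Random(seed)
    genotypes = list(freqs)
    draws = rng.choices(genotypes, weights=[freqs[g] for g in genotypes], k=n_total)
    counts = Counter(draws)
    return {g: counts[g] for g in genotypes}

# Hypothetical, truncated frequency table for illustration only.
freqs = {"AA-AA": 0.20, "AA-AB": 0.35, "AB-AB": 0.45}
sizes = draw_sample_sizes(freqs, 800, seed=7)
print(sizes)
```

If exact rather than random class sizes are preferred, round(n_total * P(G)) is the deterministic alternative, at the cost of small rounding errors in the total.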

Best wishes, Frederic.

Hi Fred!

I am a bit confused, because this definition does not fit the simulated data I have. Consider, for example, the genotype combination aa/bb. I have a penetrance of 0.095, 18 individuals with this genotype in the general population, and 10 cases with this genotype, out of a total of 800 samples (400 cases and 400 controls). I don't understand how the 10 was obtained. A clarification would really help me. Thanks again.

Let's just discuss aa/bb. If I understand correctly, you have 18 individuals with genotype aa/bb, and 10 of those 18 are cases. That is a case fraction of about 55% (10/18), far from the 0.095 penetrance. Maybe this is related to your 400/400 design. Could you describe how you generated the cases and controls?
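A hypothesis consistent with this remark, not a reconstruction of the actual generator behind the files: if the 400 cases and 400 controls were drawn as in METHOD 2 (from P(G/Aff) and P(G/Unaff)), cases are heavily over-represented relative to the population, so the case fraction within a genotype class is far above its penetrance. Under plain population sampling one would expect only about 18 × 0.095 ≈ 1.7 cases among 18 carriers. The sketch below computes the expected case fraction under such a design and, inversely, the prevalence that would make 10 of 18 the expected value. Both function names are hypothetical, and with only 18 carriers sampling noise is large, so the implied prevalence is a rough point value at best.

```python
def expected_case_fraction(p, prevalence, n_aff, n_unaff):
    """Expected share of cases among sampled carriers of G when n_aff cases are drawn
    from P(G/Aff) and n_unaff controls from P(G/Unaff). P(G) cancels out, so only the
    penetrance p = P(Aff/G) and the prevalence K = P(Aff) matter."""
    cases = n_aff * p / prevalence                # expected carriers among cases, / P(G)
    controls = n_unaff * (1 - p) / (1 - prevalence)  # expected carriers among controls, / P(G)
    return cases / (cases + controls)

def implied_prevalence(p, cases_g, controls_g, n_aff, n_unaff):
    """Prevalence K at which expected_case_fraction equals cases_g / (cases_g + controls_g).
    Ignores sampling noise entirely."""
    odds = (cases_g / controls_g) / (n_aff / n_unaff)  # = p/(1-p) * (1-K)/K
    ratio = odds * (1 - p) / p                         # = (1-K)/K
    return 1 / (1 + ratio)

# Numbers from the aa/bb example: penetrance 0.095, 10 cases and 8 controls, 400/400 design.
k = implied_prevalence(0.095, 10, 8, 400, 400)
print(round(k, 3), round(expected_case_fraction(0.095, k, 400, 400), 3))  # prints: 0.077 0.556
```

Note that when the prevalence equals the penetrance of G, a balanced 400/400 design gives an expected case fraction of exactly 50% for that class.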

That is the problem: I downloaded the files from the web and am trying to reverse-engineer the algorithm. All I have is the penetrance table and the number of cases and controls for each genotype class. The files were taken from: http://bioinformatics.ust.hk/BOOST.html

Hi Pierre! How is life going in Nantes? :-)
