Question: Odds ratio and gene mutations association
19 months ago by
lu.cappelli20 wrote:

Hi everyone,

I have a simple (yet, to me, still non-trivial) problem to put to you.

I have a dataset of a group of patients affected by a disease, in which the presence of mutations in several genes was inferred. Each gene is a binary variable, 0 for negative or 1 for positive. I need to assess the associations between these genes, to establish whether some tend to be co-mutated while others tend to be mutually exclusive. To do this, I first analyzed all possible gene pairs in 2x2 contingency tables such as:

[Image not included: 2x2 contingency table for gene 1 vs gene 2. The counts quoted in the text are 431 (0-0), 131 and 428 (discordant), and 69 (1-1).]

In this case, for example, the p-value is highly significant, so I thought it could be useful to compute the OR to characterize the relationship. Here the OR obtained from the formula OR = (A × D) / (B × C) is 0.53, which should mean that the two genes tend to be in opposite states (0-1 or 1-0) rather than in the same state (0-0 or 1-1). However, my concern is that this does not make clear whether the two genes have a positive or a negative correlation. Should I instead compare the double positives (1-1) against the total of discordant cases (0-1 and 1-0)? In this case it would be 69/(131+428) = 69/559 = 0.12. Is that useful?
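As a hedged illustration of the calculation described above, the sketch below recomputes the cross-product OR from the four counts quoted in the post. Which discordant count belongs to which gene is an assumption (it does not change the OR), and the variable names are illustrative only. For a co-occurrence table like this, an OR above 1 means the double positives exceed what independence would predict (co-mutation), and an OR below 1 means they fall short of it (mutual exclusivity), so the direction can be read from the OR itself.

```python
# Counts quoted in the post. Assigning 131 to "gene 1 only" and 428 to
# "gene 2 only" is an assumption; the odds ratio is the same either way.
n00 = 431  # neither gene mutated
n10 = 131  # gene 1 mutated only (assumed)
n01 = 428  # gene 2 mutated only (assumed)
n11 = 69   # both genes mutated


def odds_ratio(n00, n01, n10, n11):
    """Cross-product odds ratio of a 2x2 table: concordant over discordant."""
    return (n00 * n11) / (n01 * n10)


or_obs = odds_ratio(n00, n01, n10, n11)
direction = "co-mutation" if or_obs > 1 else "mutual exclusivity"
print(f"OR = {or_obs:.2f} -> tendency towards {direction}")
```

With these counts the OR comes out at about 0.53, matching the value in the post.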

However, each gene has a different mutation frequency within the population: for example, gene 1 here has a 0.18 probability of being mutated whilst gene 2 has a 0.46 probability. Should I take this into account? I played around and tried to see what these 4 combinations would look like if they were due only to the expected mutation frequencies of the two genes, and something like this came out:

[Image not included: 2x2 table of expected counts. The counts quoted in the text are 542 (0-0) and 172 (1-1).]

The totals are the same, but the numbers are redistributed according to the expected frequencies (i.e. the total number of cases with gene 1 mutated is 195/1059 = 0.18, which is the expected mutation frequency of the gene). I then computed another OR for these numbers (12.76) and compared it with the previous one using Tarone's test of homogeneity between the two tables (in this case, the p-value is significant). By simply dividing each category of the "real life" table by the same category of the "expected frequencies" table I obtained a ratio (e.g. 0/0 ratio = 431/542 = 0.79, so there are fewer double negatives than expected). Do you think this reasoning is correct? If so, should I use the 1/1 ratio to know whether the relationship is positive or negative (in this case 69/172 = 0.4, so there are fewer double positives than expected and the genes are inversely correlated)?
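For comparison, here is a minimal sketch of one common way to build an "expected" table, assuming that "expected" means the two genes are statistically independent given the observed marginal totals (expected count = row total × column total / N). This is not necessarily how the table in the post was built, and the cell-to-gene assignment is again an assumption. Under this construction the expected table has an OR of exactly 1, so the observed OR already compares the real data with the independence expectation.

```python
# Observed counts quoted in the post (cell-to-gene assignment is assumed).
n00, n10, n01, n11 = 431, 131, 428, 69
n = n00 + n10 + n01 + n11      # total number of patients

gene1_mut = n10 + n11          # patients with gene 1 mutated
gene2_mut = n01 + n11          # patients with gene 2 mutated

# Expected counts under independence: row total * column total / N
exp11 = gene1_mut * gene2_mut / n
exp10 = gene1_mut * (n - gene2_mut) / n
exp01 = (n - gene1_mut) * gene2_mut / n
exp00 = (n - gene1_mut) * (n - gene2_mut) / n

or_expected = (exp00 * exp11) / (exp01 * exp10)  # 1 by construction
ratio11 = n11 / exp11                            # observed / expected double positives

print(f"expected double positives: {exp11:.1f}")
print(f"OR of the expected table:  {or_expected:.2f}")
print(f"observed/expected (1-1):   {ratio11:.2f}")
```

An observed/expected ratio below 1 for the 1-1 cell goes together with an OR below 1, so both point in the same direction.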

I thank you in advance and look forward to your help! Best, Luca

sequencing odds-ratio R • 821 views
modified 19 months ago • written 19 months ago by lu.cappelli20

How to add images to a Biostars post


In this case, simple text would be better. @lu.cappelly please paste data as text, not images.

19 months ago by
Kevin Blighe53k wrote:

I'm not sure of the exact interpretation of the way that you're doing it. Indeed, these are not odds ratios (ORs) in the usual sense. Odds ratios in genetic studies are usually calculated by comparing the minor and major alleles of a given variant / SNP between cases and controls. If you want to compare the effect of 2 genes, you could just combine the total minor and major alleles for each gene. I recently made a post about calculating ORs, CIs, and a Z-score for a given variant: A: SNP dataset and Z Score

If you are looking at mutation numbers in just the disease patients, then I'm still not sure of the clinical relevance of the result.

Previously, as part of private work (unpublished), I built a Bayesian logistic regression model with multiple variants predicting case-control status. By using a Bayesian model, I was able to derive a combined 'score' for the effect of all variants combined. You may consider such an approach.

Feel free to respond and we can go back and forth here. If your ideas are based on a particular publication, then please do share.

Kevin

I'm sorry I did not specify, but this is a population of patients affected by AML (Acute Myeloid Leukemia), which I am studying for mutations in certain genes (somatic mutations in AML cancer cells). I am focusing on a particular entity of AML, NPM1-mutated AML, so this population is made up of NPM1-positive AML patients. I need to check which genes other than NPM1 are mutated at diagnosis in these patients and whether there are any associations between them. Then, as a secondary step, I would need to see whether these associations hold true, invert, or disappear in NPM1-negative AMLs. By associations I mean: positive (or direct correlation) when genes are usually mutated together, and negative (or inverse correlation) when they tend to exclude each other. The data I am analyzing so far have only a binary outcome (either 0 or 1): these are all mutations known to have a detrimental effect, so clinically speaking, as long as they are detected, it does not matter how much of the allele is found (at least in this first part of my analysis). For now I am therefore doing the analysis without considering the VAF.

How would you then calculate the association between these genes? I tried to use the OR because I was reading this paper (https://www.ncbi.nlm.nih.gov/pubmed/27276561) and many others in which they usually make heatmaps like this one: https://ibb.co/gx9Rk8 Specifically, on page 2216 of the cited article you can read: "Unlike IDH2R140 mutations, which show strong co-mutation with NPM1 (odds ratio for co-mutation, 3.6; P = 5×10^−10), IDH2R172 mutations are mutually exclusive with NPM1 (odds ratio for co-mutation, 0.06; P = 4×10^−5) and other class-defining lesions". I would like to arrive at the same kind of statement as in this article.
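A heatmap of this kind can be sketched by looping over all gene pairs and computing an odds ratio and a p-value for each pair. The example below is only a sketch under stated assumptions: the gene names and the 0/1 data are hypothetical, the p-value is a two-sided Fisher's exact test written out with the standard library, and 0.5 is added to every cell (Haldane-Anscombe correction) so that pairs with an empty cell do not give an OR of 0 or infinity. It is not confirmed to be the method used in the cited paper.

```python
from itertools import combinations
from math import comb


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    r1, c1, n = a + b, a + c, a + b + c + d

    def prob(x):  # hypergeometric probability of x in the top-left cell
        return comb(c1, x) * comb(n - c1, r1 - x) / comb(n, r1)

    lo, hi = max(0, r1 + c1 - n), min(r1, c1)
    p_obs = prob(a)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def pairwise_associations(mutations):
    """mutations: dict gene -> list of 0/1 per patient (same patient order)."""
    results = {}
    for g1, g2 in combinations(mutations, 2):
        pairs = list(zip(mutations[g1], mutations[g2]))
        n11, n10 = pairs.count((1, 1)), pairs.count((1, 0))
        n01, n00 = pairs.count((0, 1)), pairs.count((0, 0))
        # Haldane-Anscombe correction: add 0.5 to each cell before the ratio.
        odds = ((n11 + 0.5) * (n00 + 0.5)) / ((n10 + 0.5) * (n01 + 0.5))
        results[(g1, g2)] = (odds, fisher_two_sided(n11, n10, n01, n00))
    return results


# Hypothetical toy data: 12 patients, 3 made-up genes.
toy = {
    "geneA": [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    "geneB": [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1],
    "geneC": [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1],
}
res = pairwise_associations(toy)
for (g1, g2), (odds, p) in res.items():
    label = "co-mutated" if odds > 1 else "mutually exclusive"
    print(f"{g1}-{g2}: OR = {odds:.2f}, p = {p:.4f} ({label})")
```

With many gene pairs, the p-values would normally need a multiple-testing correction before any pair is called significant.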

Just another brief comment: it would be great for me to build a regression model, but I do not really know where to start (I am a medical student and my only bioinformatics knowledge comes from personal curiosity). I was trying to use the easiest and most intuitive options I had before diving into the complicated world of models :) Thanks again, Luca

Hey dude, thanks for sharing the information! Grazie mille!

That figure looks very nice - molto bello. I looked at the Supplementary Material for the manuscript and found the following for the odds ratios:

Figure S3: Genetic interactions across 55 loci in our dataset mutated in > 15 patients. (A) Lower triangle shows pairwise associations among genes mutations, cytogenetic alterations and CAs ordered by molecular class. The color of each tile reflects the odds ratio for each pair whereby brown indicates mutual exclusivity (observed, relative to the expected co-mutation based on each alteration's gene frequency) and green indicates pairs that are co-mutated (found together more frequently than would be expected by each genes individual frequency).

It's not entirely clear but it looks like they did:

```
(Number co-mutated Gene 1 + Number co-mutated Gene 2) / (Number mutually exclusive Gene 1 + Number mutually exclusive Gene 2)
```

You may want to contact the authors just to be 100% certain.

Oh, OK, so not the conventional OR formula, (a × d) / (b × c). So according to your suggestion I should just take the double positives from table 1 (n = 69) and divide them by the sum of the discordant cases: 69/(131+428) = 0.12. This surely makes sense, but I am afraid that this calculation could somehow flatten my results, so that almost all the other genes will show a "negative" correlation: double positives are by definition rarer, because usually one driver mutation is enough for AML clones to start proliferating. Maybe that is just how it is, and I should worry less. But from the heatmap in that article it seems that, in the genes section, some pairs are positively correlated, so I don't know whether my reasoning is sound. I may do some more research and eventually contact the authors... Grazie mille ancora!
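The concern about the ratio of double positives to discordant cases can be checked with a small calculation. The sketch below uses hypothetical mutation frequencies and a hypothetical cohort of 1000 patients, and assumes two genes that are fully independent. It shows that the ratio changes with the mutation frequencies even when there is no association at all, whereas the cross-product OR stays at 1.

```python
def expected_table(freq1, freq2, n):
    """Expected 2x2 counts for two independent genes with given mutation frequencies."""
    n11 = n * freq1 * freq2
    n10 = n * freq1 * (1 - freq2)
    n01 = n * (1 - freq1) * freq2
    n00 = n * (1 - freq1) * (1 - freq2)
    return n00, n01, n10, n11


# Hypothetical frequencies: two rare genes, then two common genes.
results = {}
for f1, f2 in [(0.1, 0.1), (0.5, 0.5)]:
    n00, n01, n10, n11 = expected_table(f1, f2, 1000)
    ratio = n11 / (n10 + n01)                  # double positives / discordant
    cross_product = (n00 * n11) / (n01 * n10)  # conventional odds ratio
    results[(f1, f2)] = (ratio, cross_product)
    print(f"freqs {f1}/{f2}: ratio = {ratio:.3f}, OR = {cross_product:.2f}")
```

For rare mutations the ratio is far below 1 even without any association, which is why a measure that accounts for each gene's frequency, such as the OR, is easier to interpret.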


I think that you should contact the authors to confirm what they did. Whilst that figure looks pretty, I am not sure that it adds that much, which is probably why it is in the Supplementary Material and not the main text. It is merely comparing variants on a pairwise basis and says nothing about variants that may occur in 3s, 4s, 5s, etc.


Ok, I think I will do that. Thanks again for your help, Kevin! Best Luca