Question: Identify Genes Harbouring More Mutations Than Expected And Their Significance
6.2 years ago, fo3c wrote:


I am looking at mutations in exome data and would like to identify which genes harbour more mutations than would be expected given the average mutation rate of my cohort.

Currently, I model the number of mutations in each gene as a Poisson random variable with parameter lambda = average mutation rate per Mb * gene length in Mb. However, the expected number of mutations is very low, so the observations appear significant (p < 0.05) in all cases. For example, I observe one mutation where I expect 0.019948561 mutations, giving a p-value of 1.963461e-04.
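
The Poisson tail for the example above can be checked by hand. The following is a minimal sketch in Python using only the standard library; `poisson_sf` is a helper name introduced here for illustration. It suggests that the quoted p-value corresponds to P(X > 1) rather than P(X >= 1). That is what R's `ppois(1, lambda, lower.tail = FALSE)` returns, although whether the value was actually computed that way is an assumption. The tail that includes the observed count is roughly 100 times larger.

```python
import math

def poisson_sf(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam) by subtracting the lower tail."""
    lower = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))
    return 1.0 - lower

lam = 0.019948561  # expected mutations in the gene, taken from the post
observed = 1       # observed mutations in the gene, taken from the post

p_at_least = poisson_sf(observed, lam)           # P(X >= 1), about 0.0198
p_strictly_more = poisson_sf(observed + 1, lam)  # P(X > 1), about 1.96e-04

print(f"P(X >= {observed}) = {p_at_least:.6g}")
print(f"P(X >  {observed}) = {p_strictly_more:.6g}")
```

In R, the tail that includes the observed count is `ppois(observed - 1, lambda, lower.tail = FALSE)`.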

Is there a better way to do this? Should I improve the model, or is there a clever way to correct the p-values? In R, p.adjust results in a very small change to the p-values.
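
For reference, R's `p.adjust` defaults to the Holm method; the Benjamini-Hochberg FDR adjustment is selected with `method = "BH"`. The sketch below shows what that adjustment does, in plain Python. The function name `benjamini_hochberg` and the example p-values are made up for illustration and are not taken from the cohort in question.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical per-gene p-values, for illustration only.
example_pvalues = [0.01, 0.04, 0.03, 0.5]
example_adjusted = benjamini_hochberg(example_pvalues)
print(example_adjusted)
```

Each adjusted value is the raw p-value scaled by (number of tests / rank), so the correction only has a visible effect when all tested genes are passed to it at once.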

6.2 years ago, Malachi Griffith (Washington University School of Medicine, St. Louis, USA) wrote:

The MuSiC tool and its accompanying paper, "MuSiC: identifying mutational significance in cancer genomes", include a section on the "significantly mutated gene" (SMG) test. That discussion seems relevant to this question. From the paper:

We use the concept of “significantly mutated genes” (SMG) to describe genes that show a significantly higher mutation rate than the background mutation rate (BMR) when multiple mutational mechanisms (coding indel and single nucleotide substitution, splice site mutation, etc.) are considered. Specialized measurements of the BMR may also be considered; BMRs in MuSiC are optionally calculated across the entire sample set, across particular subgroups of similarly mutated samples, or for each sample individually. For each BMR subgroup considered and for each category of mutational mechanism, the mutation rates are compared to the appropriate BMR, and a single P-value summarizing all considerations is generated for each gene. We refer to this summarization procedure as the significantly mutated gene (SMG) test.

We assessed multiple methods of calculating summarized P-values, including a convolution test (CT), a Fisher's combined P-value test (FCPT), and the likelihood ratio test (LRT), using a partially simulated data set (this data set and the associated test simulations are described in the Supplemental Material). By this approach, we determined that the P-value distribution obtained using the CT method most closely resembled the uniform distribution expected under the null (in this case, the null is such that no gene is truly significantly mutated), while the FCPT and LRT methods produced slightly inflated or deflated P-values, respectively (Supplemental Fig. S1). During the SMG test, a false discovery rate (FDR) also is calculated. We evaluate our SMG test results by establishing a P-value or FDR threshold (threshold typically 0.2 or less for FDR), and then appropriately filtering the test output.
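
To illustrate what "summarizing" per-category P-values into one P-value per gene can look like, here is a minimal sketch of Fisher's combined P-value method, one of the three approaches named in the quoted passage. This is a generic textbook version in Python, not MuSiC's implementation; the function name and the example P-values are assumptions made for illustration. It assumes the per-category P-values are independent and strictly greater than zero.

```python
import math

def fisher_combined_pvalue(pvalues):
    """Combine independent p-values (each > 0) with Fisher's method.

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom under the null. For even degrees of freedom
    the upper tail has the closed form exp(-X/2) * sum_{j<k} (X/2)^j / j!.
    """
    k = len(pvalues)
    half_stat = -sum(math.log(p) for p in pvalues)  # this is X / 2
    term = 1.0
    total = 1.0
    for j in range(1, k):
        term *= half_stat / j  # (X/2)^j / j!, built up incrementally
        total += term
    return math.exp(-half_stat) * total

# Hypothetical p-values for one gene, one per mutation category.
category_pvalues = [0.5, 0.5]
combined = fisher_combined_pvalue(category_pvalues)
print(f"combined p-value = {combined:.6g}")
```

Note that the quoted passage reports that Fisher's method produced slightly inflated P-values on the authors' simulated data, which is why they preferred the convolution test.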
