**200**wrote:

I begin with a matrix of genes (~30000 rows) vs cases (columns). I have binary - TRUE or FALSE - data in each cell indicating whether there is a copy number aberration spanning the gene for that case. Next, I sum each row to get the total number of cases from the sample set that have a copy number aberration in each gene. I will call this ** c** for each gene. Next, I retrieve a list of genes from http://cbio.mskcc.org/CancerGenes/Select.action (say the Entrez Query: Stability list, which yields 1023 genes). How can I test whether

**(the total number of cases with a copy number aberration for a given gene) among the Entrez Query: Stability list is significantly greater than**

*c***among the gene population in general? Furthermore, is there some way of calculating some uncertainty measure of my result based on the fact that not all genes actually underpinning stability are in the Entrez Query: Stability list? I am using R.**

*c*NB. My attempt (probably wrong and overly complicated):

- Bootstrap with replacement 100000 times to get 100000 samples each of size 1023 from the 30000 genes. The 1023 Stability genes are excluded from the population from which the genes for each sample are drawn.
- Use
`var.test`

acrossfor each sample*c**vs*the 1023 Stability gene list. If the test**does not**produce a significant value, that sample can be used in step 3. Samples that produce a significant value are discarded. This is done to satisfy the assumption of the Wilcoxon test that the variances between the two test samples are similar. - Do a Kolmogorov–Smirnov test (
`ks.test`

) for each sample brought forward*vs*the 1023 Stability gene list. If the test**does not**produce a significant value, that sample can be used in step 4. Samples that produce a significant value are discarded. This is done to satisfy the assumption of the Wilcoxon test that the distributions between the two test samples are similar. - Do a Wilcoxon test (
`wilcox.test`

) for each sample brought forward*vs*the 1023 Stability gene list. If the mean (or median?) p-value for these tests is significant, I can say thatis significantly greater for Stability genes than it is for non-Stability genes.*c*

**1.9k**• written 8.7 years ago by user1202664 •

**200**