Question: How to find p-values for pairwise genotype combinations
2.2 years ago by
n.osennij0 wrote:

I have a `SNPs` data.frame with the following structure:

``````
> str(SNPs)
'data.frame':   1703 obs. of  4 variables:
 $ group: Factor w/ 2 levels "A","B": 1 1 1 1 1 1 1 1 1 1 ...
 $ rs1  : Factor w/ 3 levels "D/D","I/D","I/I": 1 1 2 3 3 2 1 1 1 1 ...
 $ rs2  : Factor w/ 3 levels "a/a","a/b","b/b": 3 3 2 3 3 2 2 3 3 2 ...
 $ rs3  : Factor w/ 3 levels "G/G","G/T","T/T": 2 1 2 1 1 3 1 1 2 1 ...
 ...other rs

  group rs1 rs2 rs3 ...other rs
1     A D/D b/b G/T
2     A D/D b/b G/G
3     A I/D a/b G/T
4     A I/I b/b G/G
5     A I/I b/b G/G
6     A I/D a/b T/T
``````

For example, I noticed that `rs5 4b/4a` and `rs6 G/G` very often occur together in `group A` (see below). They also very often occur together in `group B`. So I want to know whether this is a statistically significant pattern or not.

I can create a table with all pairs in both groups:

``````
> SNPs$rs5_rs6 <- paste(SNPs$rs5, SNPs$rs6)
> tmp <- table(SNPs$rs5_rs6, SNPs$group)
> tmp

              A   B
  4a/4a G/G   1  20
  4b/4a G/G  31  83
  4b/4a G/T  14  51
  4b/4a T/T   1   0
  4b/4b G/G  37 106
  4b/4b G/T  35 119
  4b/4b T/T  11  31
``````

So now I need to compare group A and group B (find a p-value) for each pair: `4a/4a G/G` in group A vs `4a/4a G/G` in group B, `4b/4a G/G` in group A vs `4b/4a G/G` in group B, `4b/4a G/T` in group A vs `4b/4a G/T` in group B, and so on.

How can I do that?

I need to get a p-value for each row in `tmp`, using a chi-squared test or something else:

``````
              A   B
4a/4a G/G   1  20 - p-value?
4b/4a G/G  31  83 - p-value?
4b/4a G/T  14  51 - p-value?
4b/4a T/T   1   0 - p-value?
4b/4b G/G  37 106 - p-value?
4b/4b G/T  35 119 - p-value?
4b/4b T/T  11  31 - p-value?
``````
snp R gene • 560 views
2.2 years ago by
scientificb0 wrote:

Don't paste the SNPs together. Make the table directly, as in:

``````
tbl = table(rs5=SNPs$rs5, rs6=SNPs$rs6)
xsq = chisq.test(tbl)
xsq$p.value
``````

This is a chi-squared test for independence between `rs5` and `rs6`, which seems like it might be what you want.

A p-value per row is probably not what you want (although I can't say that with 100% certainty).
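To see the independence test run without the original data, here is a minimal sketch that rebuilds the `rs5` by `rs6` table from the counts posted in the question, pooled over groups A and B. The names `counts`, `min_expected` and `xsq_sim` are made up for this example, and using a simulated p-value when some expected counts are small is my own suggestion, not something established above.

```r
# Counts taken from `tmp` in the question, summed over groups A and B.
# Combinations that do not appear in `tmp` are entered as 0.
counts <- matrix(
  c( 21,   0,  0,
    114,  65,  1,
    143, 154, 42),
  nrow = 3, byrow = TRUE,
  dimnames = list(rs5 = c("4a/4a", "4b/4a", "4b/4b"),
                  rs6 = c("G/G", "G/T", "T/T"))
)
tbl <- as.table(counts)

# Chi-squared test of independence between rs5 and rs6.
# R warns here because some expected counts are small.
xsq <- suppressWarnings(chisq.test(tbl))
xsq$p.value

# Inspect the expected counts; values below about 5 make the
# chi-squared approximation doubtful.
min_expected <- min(xsq$expected)
min_expected

# One option in that case: a Monte Carlo p-value instead of the
# asymptotic one.
xsq_sim <- chisq.test(tbl, simulate.p.value = TRUE, B = 10000)
xsq_sim$p.value
```

Note that this asks whether the two SNPs are associated with each other in the pooled sample. It says nothing about a difference between group A and group B.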

@scientificb A genetic analysis of people was carried out. The samples are divided into two groups: sick and healthy. It is important to find combinations of polymorphisms that differ between the groups. The contribution of each gene to the disease individually is not very significant, but the contribution of several genes in combination is much stronger. Therefore, I need to find such combinations. In my example there are two polymorphisms: rs5 (4a/4a, 4b/4a and 4b/4b) and rs6 (G/G, G/T and T/T). I assume that the 4b/4b G/G combination is much more common in the group of patients than in the healthy group. I want to know whether the difference is statistically significant, that is, whether this combination is more common in patients than in healthy people. To check this I did the following: I choose two rs, then get all the combinations of genotypes and their counts in each group (`tmp <- table(SNPs$rs5_rs6, SNPs$group)`). Now I'm trying to find out whether the differences are statistically significant or not, that's all. I hope that explains it clearly.
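One possible reading of "a p-value per row" is to test each combination against all the other combinations, between the two groups. The sketch below does that with the counts from `tmp` in the question. It is only an illustration under assumptions: the choice of Fisher's exact test (picked because some cells are as small as 0 or 1), the Holm correction, and the names `totals`, `p_raw` and `p_adj` are mine and do not come from the thread.

```r
# Counts copied from `tmp` in the question
# (rows = rs5/rs6 combination, columns = group).
tmp <- matrix(
  c( 1,  20,
    31,  83,
    14,  51,
     1,   0,
    37, 106,
    35, 119,
    11,  31),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("4a/4a G/G", "4b/4a G/G", "4b/4a G/T", "4b/4a T/T",
                    "4b/4b G/G", "4b/4b G/T", "4b/4b T/T"),
                  c("A", "B"))
)

# Number of samples per group in this table.
totals <- colSums(tmp)

# For each combination, build a 2x2 table:
# (this combination vs. all other combinations) by (group A vs. group B),
# then run Fisher's exact test on it.
p_raw <- apply(tmp, 1, function(row) {
  fisher.test(rbind(this = row, others = totals - row))$p.value
})

# Seven tests are run on the same data, so adjust for multiple
# comparisons (Holm's method is one common choice).
p_adj <- p.adjust(p_raw, method = "holm")

data.frame(tmp, p_raw = signif(p_raw, 3), p_adj = signif(p_adj, 3))
```

The rows are not independent of each other, because every sample falls into exactly one combination. A single test on the whole 7 x 2 table, for example `chisq.test(tmp, simulate.p.value = TRUE)`, is a reasonable first check before looking at individual rows.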