I have a data.frame SNPs with the following structure:
> str(SNPs)
'data.frame': 1703 obs. of 4 variables:
$ group: Factor w/ 2 levels "A","B": 1 1 1 1 1 1 1 1 1 1 ...
$ rs1 : Factor w/ 3 levels "D/D","I/D","I/I": 1 1 2 3 3 2 1 1 1 1 ...
$ rs2 : Factor w/ 3 levels "a/a","a/b","b/b": 3 3 2 3 3 2 2 3 3 2 ...
$ rs3 : Factor w/ 3 levels "G/G","G/T","T/T": 2 1 2 1 1 3 1 1 2 1 ...
...other rs
> head(SNPs)
group rs1 rs2 rs3 ...other rs
1 A D/D b/b G/T
2 A D/D b/b G/G
3 A I/D a/b G/T
4 A I/I b/b G/G
5 A I/I b/b G/G
6 A I/D a/b T/T
For example, I noticed that rs5 4b/4a and rs6 G/G very often occur together in group A (see below). In group B they very often occur together too. So I want to know whether this is a statistically significant pattern or just chance.
I can create a table of all genotype pairs in both groups:
> SNPs$rs5_rs6 <- paste(SNPs$rs5, SNPs$rs6)
> tmp <- table(SNPs$rs5_rs6, SNPs$group)
> tmp
A B
4a/4a G/G 1 20
4b/4a G/G 31 83
4b/4a G/T 14 51
4b/4a T/T 1 0
4b/4b G/G 37 106
4b/4b G/T 35 119
4b/4b T/T 11 31
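The co-occurrence question within a single group could be sketched roughly as follows. This is only an illustration: the group A counts are copied from the table above, the pairs that do not appear there (4a/4a G/T and 4a/4a T/T) are assumed to be zero, and Fisher's exact test is an assumed choice because several cells are small.

```r
# Group A counts taken from the printed table (rows: rs5, columns: rs6).
# Assumption: pairs not listed in the table have a count of 0.
group_a <- matrix(
  c( 1,  0,  0,
    31, 14,  1,
    37, 35, 11),
  nrow = 3, byrow = TRUE,
  dimnames = list(rs5 = c("4a/4a", "4b/4a", "4b/4b"),
                  rs6 = c("G/G", "G/T", "T/T"))
)

# Test whether the rs5 and rs6 genotypes are independent within group A
res_a <- fisher.test(group_a)
res_a$p.value
```

With the full data, the same cross-tabulation could presumably be built directly with table(SNPs$rs5[SNPs$group == "A"], SNPs$rs6[SNPs$group == "A"]).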
So now I need to compare group A and group B (find a p-value) for each pair: 4a/4a G/G in group A vs 4a/4a G/G in group B, 4b/4a G/G in group A vs 4b/4a G/G in group B, 4b/4a G/T in group A vs 4b/4a G/T in group B, and so on.
How can I do that?
I need to get a p-value for each row in tmp, using a chi-squared test or something else:
A B
4a/4a G/G 1 20 - p-value?
4b/4a G/G 31 83 - p-value?
4b/4a G/T 14 51 - p-value?
4b/4a T/T 1 0 - p-value?
4b/4b G/G 37 106 - p-value?
4b/4b G/T 35 119 - p-value?
4b/4b T/T 11 31 - p-value?
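A minimal sketch of what per-row p-values might look like, under some assumptions: each genotype pair is compared against all other pairs pooled together (a 2x2 table per row), Fisher's exact test is used because some counts are very small (1 and 0), and the Holm correction is applied because several pairs are tested at once. All three choices are assumptions rather than an established method for this data; the counts are copied from the table above.

```r
# Counts copied from the printed table (rows: rs5 rs6 pair, columns: group)
tmp <- matrix(
  c( 1,  20,
    31,  83,
    14,  51,
     1,   0,
    37, 106,
    35, 119,
    11,  31),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    c("4a/4a G/G", "4b/4a G/G", "4b/4a G/T", "4b/4a T/T",
      "4b/4b G/G", "4b/4b G/T", "4b/4b T/T"),
    c("A", "B")
  )
)

# Column totals of the table, used to count "all other pairs" per group
totals <- colSums(tmp)

# For each row, test "this pair vs all other pairs" against group A vs B
p_raw <- apply(tmp, 1, function(counts) {
  fisher.test(rbind(counts, totals - counts))$p.value
})

# Adjust for multiple testing across the rows (Holm is an assumed choice)
p_adj <- p.adjust(p_raw, method = "holm")

cbind(tmp, p_raw = round(p_raw, 4), p_adj = round(p_adj, 4))
```

Applied to the real data, the matrix definition would be replaced by the tmp object created with table() above; the rest should work unchanged as long as tmp has exactly two group columns.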
@scientificb A genetic analysis of people was carried out. The samples are divided into two groups, patients and healthy controls. It is important to find combinations of polymorphisms that differ between the groups. The contribution of each gene individually to the disease is not very significant, but the contribution of several genes in combination is much stronger, so I need to find such combinations. In my example there are two polymorphisms: rs5 (4a/4a, 4b/4a and 4b/4b) and rs6 (G/G, G/T and T/T). I assume that the 4b/4b G/G combination is much more common in the patient group than in the healthy group, and I want to know whether that difference is statistically significant. To check this, I choose two rs, get all the combinations of genotypes and their counts in each group (tmp <- table(SNPs$rs5_rs6, SNPs$group)), and now I am trying to find out whether the differences are statistically significant. That's all. I hope this explains it clearly.
@scientificb If there is another solution, I would be extremely grateful.