How to find p-values for pairwise genotype combinations
5.8 years ago
n.osennij • 0

I have a SNPs data.frame with the following structure:

> str(SNPs)
'data.frame':   1703 obs. of  4 variables:
 $ group: Factor w/ 2 levels "A","B": 1 1 1 1 1 1 1 1 1 1 ...
 $ rs1  : Factor w/ 3 levels "D/D","I/D","I/I": 1 1 2 3 3 2 1 1 1 1 ...
 $ rs2  : Factor w/ 3 levels "a/a","a/b","b/b": 3 3 2 3 3 2 2 3 3 2 ...
 $ rs3  : Factor w/ 3 levels "G/G","G/T","T/T": 2 1 2 1 1 3 1 1 2 1 ...
 ...other rs

> head(SNPs)
  group rs1 rs2 rs3 ...other rs
1     A D/D b/b G/T
2     A D/D b/b G/G
3     A I/D a/b G/T
4     A I/I b/b G/G
5     A I/I b/b G/G
6     A I/D a/b T/T

For example, I noticed that rs5 4b/4a and rs6 G/G very often occur together in group A (see below), and they also very often occur together in group B. I want to know whether this is a statistically significant pattern or not.

I can create a table of all genotype pairs in both groups:

> SNPs$rs5_rs6 <- paste(SNPs$rs5, SNPs$rs6)
> tmp <- table(SNPs$rs5_rs6, SNPs$group)
> tmp

                  A   B
      4a/4a G/G   1  20
      4b/4a G/G  31  83
      4b/4a G/T  14  51
      4b/4a T/T   1   0
      4b/4b G/G  37 106
      4b/4b G/T  35 119
      4b/4b T/T  11  31

So now I need to compare group A with group B and get a p-value for each combination: 4a/4a G/G in group A vs 4a/4a G/G in group B, 4b/4a G/G in group A vs 4b/4a G/G in group B, 4b/4a G/T in group A vs 4b/4a G/T in group B, and so on.

How can I do that?

I need to use a chi-squared test, or something else, to get a p-value for each row in tmp:

              A   B
  4a/4a G/G   1  20 - p-value?
  4b/4a G/G  31  83 - p-value?
  4b/4a G/T  14  51 - p-value?
  4b/4a T/T   1   0 - p-value?
  4b/4b G/G  37 106 - p-value?
  4b/4b G/T  35 119 - p-value?
  4b/4b T/T  11  31 - p-value?
R snp gene • 1.2k views
5.8 years ago

Don't paste the SNPs together. Make a table directly, as in:

tbl = table(rs5=SNPs$rs5, rs6=SNPs$rs6)
xsq = chisq.test(tbl)
xsq$p.value

This is a chi-squared test of independence, which seems like it might be what you want.

A p-value per row is probably not what you want (although I can't say that with 100% certainty).
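
As a rough, self-contained sketch of the test above: the original `SNPs` data frame is not available here, so the rs5 by rs6 table below is rebuilt by summing the A and B counts printed in the question. Combinations that do not appear in that printout are assumed to be zero. Because some cells are very small, the sketch asks `chisq.test` for a Monte Carlo p-value instead of relying on the chi-squared approximation; that choice is an addition, not part of the answer above.

```r
# Pooled rs5 x rs6 counts (group A + group B) taken from the question's table.
# Assumption: combinations missing from the printout have a count of 0.
tbl <- matrix(
  c( 21,   0,  0,
    114,  65,  1,
    143, 154, 42),
  nrow = 3, byrow = TRUE,
  dimnames = list(rs5 = c("4a/4a", "4b/4a", "4b/4b"),
                  rs6 = c("G/G", "G/T", "T/T"))
)

set.seed(1)  # makes the simulated p-value reproducible

# Test of independence between rs5 and rs6 genotypes, with a Monte Carlo
# p-value because several expected counts are small.
xsq <- chisq.test(tbl, simulate.p.value = TRUE, B = 10000)

xsq$p.value   # p-value for the whole table
xsq$expected  # expected counts under independence
```

Note that this tests whether rs5 and rs6 genotypes are associated with each other in the pooled sample; it does not compare group A with group B.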

@scientificb A genetic analysis of people was carried out. The samples are divided into two groups, sick and healthy. It is important to find combinations of polymorphisms that differ between the groups. The contribution of each gene to the disease individually is not very large, but the contribution of several genes in combination is much stronger, so I need to find such combinations.

In my example there are two polymorphisms: rs5 (4a/4a, 4b/4a and 4b/4b) and rs6 (G/G, G/T and T/T). I assume that the 4b/4b G/G combination is much more common in the patient group than in the healthy group, and I want to know whether that difference is statistically significant. To check this, I choose two rs, get all combinations of their genotypes and count them in each group (tmp <- table(SNPs$rs5_rs6, SNPs$group)). Now I am trying to find out whether the differences between the groups are statistically significant, that's all. I hope I have explained it clearly.
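
One possible way to get a p-value per row, sketched here under assumptions rather than given as the accepted answer: for each genotype combination, build a 2x2 table of "this combination" vs "all other combinations" in groups A and B, run Fisher's exact test on it, and then correct the resulting p-values for multiple testing. The counts are copied from the `tmp` table in the question. The choice of `fisher.test` and of the Benjamini-Hochberg correction, and the names `totals`, `p_raw`, `p_adj` and `result`, are illustrative.

```r
# Counts copied from the tmp table in the question (rows: rs5 rs6 combination).
tmp <- matrix(
  c( 1,  20,
    31,  83,
    14,  51,
     1,   0,
    37, 106,
    35, 119,
    11,  31),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    combo = c("4a/4a G/G", "4b/4a G/G", "4b/4a G/T", "4b/4a T/T",
              "4b/4b G/G", "4b/4b G/T", "4b/4b T/T"),
    group = c("A", "B")
  )
)

totals <- colSums(tmp)  # number of samples in group A and in group B

# One 2x2 Fisher exact test per row: this combination vs all others, by group.
p_raw <- apply(tmp, 1, function(row) {
  fisher.test(rbind(row, totals - row))$p.value
})

# Seven tests are run, so adjust for multiple testing (Benjamini-Hochberg).
p_adj <- p.adjust(p_raw, method = "BH")

result <- data.frame(tmp, p_raw = p_raw, p_adj = p_adj)
print(result)
```

When this is repeated over many pairs of rs, the number of tests grows quickly, so the correction should arguably be applied across all pairs at once rather than within each pair.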

@scientificb If there is another solution, I would be extremely grateful.
