Question

Finding p-values for pairwise SNP genotype combinations


5.8 years ago

n.osennij • 0

I have a SNPs data.frame with the following structure:

> str(SNPs)
'data.frame':   1703 obs. of  4 variables:
 $ group: Factor w/ 2 levels "A","B": 1 1 1 1 1 1 1 1 1 1 ...
 $ rs1  : Factor w/ 3 levels "D/D","I/D","I/I": 1 1 2 3 3 2 1 1 1 1 ...
 $ rs2  : Factor w/ 3 levels "a/a","a/b","b/b": 3 3 2 3 3 2 2 3 3 2 ...
 $ rs3  : Factor w/ 3 levels "G/G","G/T","T/T": 2 1 2 1 1 3 1 1 2 1 ...
 ...other rs

> head(SNPs)
  group rs1 rs2 rs3 ...other rs
1     A D/D b/b G/T
2     A D/D b/b G/G
3     A I/D a/b G/T
4     A I/I b/b G/G
5     A I/I b/b G/G
6     A I/D a/b T/T

For example, I noticed that rs5 4b/4a and rs6 G/G very often occur together in group A (see below), and they very often occur together in group B too. I want to know whether this is a statistically significant pattern or not.

I can create a table of all genotype pairs in both groups:

> SNPs$rs5_rs6 <- paste(SNPs$rs5, SNPs$rs6)
> tmp <- table(SNPs$rs5_rs6, SNPs$group)
> tmp

                  A   B
      4a/4a G/G   1  20
      4b/4a G/G  31  83
      4b/4a G/T  14  51
      4b/4a T/T   1   0
      4b/4b G/G  37 106
      4b/4b G/T  35 119
      4b/4b T/T  11  31

So now I need to compare group A with group B (find a p-value) for each combination: 4a/4a G/G in group A vs 4a/4a G/G in group B, 4b/4a G/G in group A vs 4b/4a G/G in group B, 4b/4a G/T in group A vs 4b/4a G/T in group B, and so on.

How can I do that?

I need to get a p-value for each row in tmp, using a chi-squared test or something else:

              A   B
  4a/4a G/G   1  20 - p-value?
  4b/4a G/G  31  83 - p-value?
  4b/4a G/T  14  51 - p-value?
  4b/4a T/T   1   0 - p-value?
  4b/4b G/G  37 106 - p-value?
  4b/4b G/T  35 119 - p-value?
  4b/4b T/T  11  31 - p-value?

R snp gene • 1.2k views


score 0 · Answer 1 · 2018-07-23


5.8 years ago

scientificb • 0

Don't paste the SNPs together. Make a table directly, as in:

tbl = table(rs5=SNPs$rs5, rs6=SNPs$rs6)
xsq = chisq.test(tbl)
xsq$p.value

This is a chi-squared test of independence, which seems like it might be what you want.

A p-value per row is probably not what you want (although I can't say that with certainty).
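
If the goal is to compare the groups rather than to test rs5 against rs6, one possible alternative is a single omnibus test on the whole combination-by-group table. The sketch below is only an illustration: it re-enters the counts from the question's tmp table by hand, and the choice of a simulated p-value (because several cells are very small) is an assumption, not something the thread settles on.

```r
# Counts copied from the question's `tmp` table (genotype combination x group).
tmp <- matrix(
  c( 1,  20,
    31,  83,
    14,  51,
     1,   0,
    37, 106,
    35, 119,
    11,  31),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    combo = c("4a/4a G/G", "4b/4a G/G", "4b/4a G/T", "4b/4a T/T",
              "4b/4b G/G", "4b/4b G/T", "4b/4b T/T"),
    group = c("A", "B")
  )
)

# Omnibus test: does the distribution of combinations differ between A and B?
# Some cells are tiny (1 and 0), so the usual chi-squared approximation is
# doubtful; a Monte Carlo p-value is used here instead (assumption).
set.seed(1)  # makes the simulated p-value reproducible
omnibus <- chisq.test(tmp, simulate.p.value = TRUE, B = 10000)
omnibus$p.value
```

A small p-value here would only say that the groups differ somewhere in the table, not which combination is responsible.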



@scientificb A genetic analysis was carried out on human samples, which are divided into two groups: sick and healthy. The aim is to find combinations of polymorphisms whose frequencies differ between the groups. The contribution of each gene to the disease on its own is not very significant, but the contribution of several genes in combination is much stronger, so I need to find such combinations. In my example there are two polymorphisms: rs5 (4a/4a, 4b/4a and 4b/4b) and rs6 (G/G, G/T and T/T). I assume that the 4b/4b G/G combination is much more common in patients than in the healthy group, and I want to know whether that difference is statistically significant. To check this, I choose two rs, get all the genotype combinations and their counts in each group (tmp <- table(SNPs$rs5_rs6, SNPs$group)), and then try to find out whether the differences between groups are statistically significant. I hope that explains it clearly.
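
One way the per-combination comparison described here might be sketched is to collapse the table into a 2x2 table for each row (this combination vs. all other combinations, by group) and test each one. This is a sketch under assumptions, not an endorsed method: the counts are re-entered by hand from the tmp table in the question, Fisher's exact test is chosen because some counts are very small, and the Benjamini-Hochberg correction is one of several possible adjustments for running seven tests. The names group_totals, p_raw, p_adj and result are illustrative.

```r
# Counts copied from the question's `tmp` table (genotype combination x group).
tmp <- matrix(
  c( 1,  20,
    31,  83,
    14,  51,
     1,   0,
    37, 106,
    35, 119,
    11,  31),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    combo = c("4a/4a G/G", "4b/4a G/G", "4b/4a G/T", "4b/4a T/T",
              "4b/4b G/G", "4b/4b G/T", "4b/4b T/T"),
    group = c("A", "B")
  )
)

# Number of samples per group across all combinations.
group_totals <- colSums(tmp)

# For each combination, build a 2x2 table:
#   row 1 = samples carrying this combination (A, B)
#   row 2 = samples carrying any other combination (A, B)
# and run Fisher's exact test on it.
p_raw <- apply(tmp, 1, function(row_counts) {
  two_by_two <- rbind(this_combo = row_counts,
                      all_others = group_totals - row_counts)
  fisher.test(two_by_two)$p.value
})

# Seven tests are run, so adjust for multiple testing
# (Benjamini-Hochberg is an assumed choice here).
p_adj <- p.adjust(p_raw, method = "BH")

result <- data.frame(A = tmp[, "A"], B = tmp[, "B"],
                     p_raw = p_raw, p_adj = p_adj)
result
```

Each p-value answers "is this combination's share different between A and B?", so the rows are not independent of one another; scanning many rs pairs this way would make the multiple-testing correction matter even more.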