Question: How to find p-values for pairwise genotype combinations
2.2 years ago by n.osennij:

I have a SNPs data.frame with the following structure:

> str(SNPs)
'data.frame':   1703 obs. of  4 variables:
 $ group: Factor w/ 2 levels "A","B": 1 1 1 1 1 1 1 1 1 1 ...
 $ rs1  : Factor w/ 3 levels "D/D","I/D","I/I": 1 1 2 3 3 2 1 1 1 1 ...
 $ rs2  : Factor w/ 3 levels "a/a","a/b","b/b": 3 3 2 3 3 2 2 3 3 2 ...
 $ rs3  : Factor w/ 3 levels "G/G","G/T","T/T": 2 1 2 1 1 3 1 1 2 1 ...
 ...other rs

> head(SNPs)
  group rs1 rs2 rs3 ...other rs
1     A D/D b/b G/T
2     A D/D b/b G/G
3     A I/D a/b G/T
4     A I/I b/b G/G
5     A I/I b/b G/G
6     A I/D a/b T/T

For example, I noticed that rs5 4b/4a and rs6 G/G very often occur together in group A (see below). They also occur together very often in group B. I want to know whether this is a statistically significant pattern or not.

I can create a table of all genotype pairs in both groups:

> SNPs$rs5_rs6 <- paste(SNPs$rs5, SNPs$rs6)
> tmp <- table(SNPs$rs5_rs6, SNPs$group)
> tmp

                  A   B
      4a/4a G/G   1  20
      4b/4a G/G  31  83
      4b/4a G/T  14  51
      4b/4a T/T   1   0
      4b/4b G/G  37 106
      4b/4b G/T  35 119
      4b/4b T/T  11  31

So now I need to compare group A with group B and get a p-value for each combination: 4a/4a G/G in group A vs 4a/4a G/G in group B, 4b/4a G/G in group A vs 4b/4a G/G in group B, 4b/4a G/T in group A vs 4b/4a G/T in group B, and so on.

How can I do that?

I need to use a chi-squared test, or something else, to get a p-value for each row of tmp:

              A   B
  4a/4a G/G   1  20 - p-value?
  4b/4a G/G  31  83 - p-value?
  4b/4a G/T  14  51 - p-value?
  4b/4a T/T   1   0 - p-value?
  4b/4b G/G  37 106 - p-value?
  4b/4b G/T  35 119 - p-value?
  4b/4b T/T  11  31 - p-value?
snp R gene • 560 views
2.2 years ago by scientificb:

Don't paste the SNPs together. Make a table directly, as in:

tbl = table(rs5=SNPs$rs5, rs6=SNPs$rs6)
xsq = chisq.test(tbl)

This is a chi-squared test of independence, which seems like it might be what you want.

A p-value per row is probably not what you want (although I can't say that with 100% certainty).
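
As a rough illustration of the independence test, here is a minimal sketch that rebuilds the rs5 x rs6 table from the counts printed in the question, with groups A and B pooled. It assumes those printed counts are the whole data set, which may not be the case. Because some cells are very small, the p-value is simulated rather than taken from the chi-squared approximation; the seed and the number of replicates are arbitrary choices.

```r
# Pooled rs5 x rs6 counts (A + B), rebuilt from the table in the question.
tbl <- matrix(
  c( 21,   0,  0,   # 4a/4a: G/G, G/T, T/T
    114,  65,  1,   # 4b/4a
    143, 154, 42),  # 4b/4b
  nrow = 3, byrow = TRUE,
  dimnames = list(rs5 = c("4a/4a", "4b/4a", "4b/4b"),
                  rs6 = c("G/G", "G/T", "T/T"))
)

# Several expected counts are below 5, so use a Monte Carlo p-value
# instead of the asymptotic chi-squared approximation.
set.seed(1)  # arbitrary seed, only for reproducibility
xsq <- chisq.test(tbl, simulate.p.value = TRUE, B = 10000)

xsq$p.value   # overall test of rs5/rs6 independence
xsq$expected  # expected counts under independence
```

This tests whether the two genotypes are associated with each other overall. It does not by itself compare group A with group B.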


@scientificb A genetic analysis of people was carried out. The samples are divided into two groups, sick and healthy. It is important to find combinations of polymorphisms that differ between the groups. The contribution of each gene individually to the disease is not very large, but the contribution of several genes in combination is much stronger, so I need to find such combinations. In my example there are two polymorphisms: rs5 (4a/4a, 4b/4a and 4b/4b) and rs6 (G/G, G/T and T/T). I assume that the 4b/4b G/G combination is much more common in the patient group than in the healthy group, and I want to know whether that difference is statistically significant, i.e. whether this combination is more common in patients than in healthy people. To do this, I choose two rs, get all the combinations of their genotypes, and count them in each group (tmp <- table(SNPs$rs5_rs6, SNPs$group)). Now I'm trying to find out whether the differences are statistically significant, that's all. I hope that explains it clearly.

Reply written 2.2 years ago by n.osennij
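
One possible way to get a p-value per combination, sketched here under assumptions rather than as a recommended analysis: for each row of the table in the question, build a 2x2 table of "this combination vs. all other combinations" by group and run Fisher's exact test, then correct for multiple testing. The counts are copied from the question; the choice of Fisher's test and of the Holm correction are assumptions, and the variable names are illustrative.

```r
# Counts per genotype combination and group, copied from the question.
counts <- matrix(
  c( 1,  20,
    31,  83,
    14,  51,
     1,   0,
    37, 106,
    35, 119,
    11,  31),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    combo = c("4a/4a G/G", "4b/4a G/G", "4b/4a G/T", "4b/4a T/T",
              "4b/4b G/G", "4b/4b G/T", "4b/4b T/T"),
    group = c("A", "B"))
)

totals <- colSums(counts)  # group sizes, based on the rows shown

# For each combination: 2x2 table (this combination vs. the rest) by group.
p_raw <- apply(counts, 1, function(row) {
  two_by_two <- rbind(this = row, other = totals - row)
  fisher.test(two_by_two)$p.value
})

# Seven tests are run, so adjust for multiple comparisons (Holm here).
p_adj <- p.adjust(p_raw, method = "holm")

result <- data.frame(counts, p_raw = p_raw, p_adj = p_adj)
result
```

With many SNP pairs the number of tests grows quickly, so the adjustment would need to cover all pairs tested, not only the rows of one table.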

@scientificb If there is another solution, I would be extremely grateful.

Reply written 2.2 years ago by n.osennij