Statistical method to find the relationship between multiple variables across 50 samples
4.3 years ago
maria2019 ▴ 250

Hi,

I have 50 samples, and each sample has 3 variables (example data structure below):

     Samples   Var1  Var2  Var3
      S1       10     5     2
      S2       1      4     6
      S3       5      4     0
      .        .      .     .
      S50      2      1     10

What is the right statistical method to find the relationship between these variables and to find out whether there is a pattern in them?


Other than pairwise correlation?


Yes, I thought there might be some other, newer methods for this.


I am mostly interested in seeing, say, what happens to variable 2 when variable 1 increases, and so on. I know that PCA can show the separation between groups, but I need a more detailed method.


A simple general test, then: Spearman correlation. However, your distributions seem to be zero-inflated, with many ties. A GLM may help, but then you will need to plot the data.
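As a rough sketch of the Spearman suggestion, the snippet below runs `cor.test()` on two tied, count-like vectors. The data are simulated purely for illustration (the names `var1` and `var2` and the Poisson draws are assumptions, not the poster's values).

```r
# Hypothetical toy data: 50 observations simulated only for illustration
set.seed(1)
n <- 50
var1 <- rpois(n, lambda = 3)          # count-like values with many ties
var2 <- var1 + rpois(n, lambda = 2)   # loosely related to var1

# With ties an exact p-value cannot be computed, so request the
# asymptotic approximation explicitly via exact = FALSE
res <- cor.test(var1, var2, method = "spearman", exact = FALSE)

res$estimate  # Spearman's rho
res$p.value   # approximate p-value
```

Spearman's rho here is simply the Pearson correlation of the ranks, so ties are handled through average ranks.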


What exactly are these variables? Gene expression, maybe?


I am actually looking at RNA-seq (log2 FPKM), ATAC-seq (normalized RPKM), and methylation (%) data at 50 genes from one sample. So the "samples" in my table are genes, and the variables are the RNA-seq, ATAC-seq, and methylation values. I want to see the relationship between these three data types at these genes.


So you want to see whether a linear model such as gene ~ ATAC + methylation fits? Just put the data into a data.frame and use lm().
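A minimal sketch of that suggestion, assuming one row per gene. The column names `expression`, `atac`, and `methylation` are made up for illustration (`expression` plays the role of "gene" in the formula above), and the values are simulated, not real data.

```r
# Hypothetical data frame: one row per gene, values simulated for illustration
set.seed(42)
n <- 50
mydata <- data.frame(
  atac        = rnorm(n, mean = 5, sd = 1),     # stand-in for normalized RPKM
  methylation = runif(n, min = 0, max = 100)    # stand-in for methylation %
)
# Simulated response with a known dependence on both predictors
mydata$expression <- 1 + 0.8 * mydata$atac -
  0.02 * mydata$methylation + rnorm(n, sd = 0.5)

# Fit expression as a linear function of ATAC signal and methylation
fit <- lm(expression ~ atac + methylation, data = mydata)

# Estimates, standard errors, t values, and p-values per term
summary(fit)$coefficients
```

Each coefficient describes the change in expression per unit of that predictor while holding the other predictor fixed.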


Thank you very much! I just tried it.

One question: I also tried cor(mydata). I am going to read up on this in more depth, but should I expect the same result from cor and lm? (Mine differ on these data.)


See here for a comparison of lm and cor: https://lindeloev.github.io/tests-as-linear/
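To make the connection concrete, here is a small sketch on simulated data (the vectors `x` and `y` are assumptions for illustration). With a single predictor and both variables standardized, the `lm()` slope equals the Pearson correlation, and the two p-values agree.

```r
# Hypothetical data, simulated only to illustrate the equivalence
set.seed(7)
x <- rnorm(50)
y <- 0.5 * x + rnorm(50)

# Pearson correlation
r <- cor(x, y)

# Standardize both variables by hand, then fit a one-predictor model
zx <- (x - mean(x)) / sd(x)
zy <- (y - mean(y)) / sd(y)
fit <- lm(zy ~ zx)
slope <- unname(coef(fit)["zx"])

all.equal(r, slope)       # slope on standardized data equals r

# The test of the slope and the correlation test give the same p-value
p_cor <- cor.test(x, y)$p.value
p_lm  <- summary(fit)$coefficients["zx", "Pr(>|t|)"]
all.equal(p_cor, p_lm)
```

With two predictors, as in gene ~ ATAC + methylation, each lm() coefficient is adjusted for the other predictor and is on the original scale of the data, so it will generally not match the pairwise values from cor(mydata).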


The first method is plotting: each variable separately, and one against another. In R, use plot(density(var)) and plot(var1, var2).
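A short sketch of those two calls side by side, using simulated stand-in vectors (`var1` and `var2` here are hypothetical, not the poster's data):

```r
# Hypothetical vectors, simulated only to have something to plot
set.seed(3)
var1 <- rnorm(50, mean = 5)
var2 <- 0.6 * var1 + rnorm(50)

# Three panels in one row; remember the old settings so they can be restored
op <- par(mfrow = c(1, 3))
plot(density(var1), main = "Density of var1")   # shape of each variable
plot(density(var2), main = "Density of var2")
plot(var1, var2, main = "var1 vs var2")         # relationship between them
par(op)
```

For three or more columns in a data.frame, `pairs(mydata)` draws all pairwise scatterplots in one call.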


I will try it, thanks.


It may sound obvious, but plotting is 100 percent worth the time spent. Linear correlation is appropriate only for certain data distributions, and only plotting will give you an intuition for which test is the right choice (e.g. https://en.m.wikipedia.org/wiki/Anscombe%27s_quartet).
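Anscombe's quartet ships with base R as the `anscombe` data frame, so the point is easy to check. This sketch computes the Pearson correlation for each of the four x/y pairs and then plots them.

```r
# anscombe is a built-in data frame with columns x1..x4 and y1..y4
r <- sapply(1:4, function(i) {
  cor(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]])
})
round(r, 3)  # the four correlations are nearly identical

# The scatterplots show how different the four data sets really are
op <- par(mfrow = c(2, 2))
for (i in 1:4) {
  plot(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]],
       xlab = paste0("x", i), ylab = paste0("y", i),
       main = paste("Data set", i))
}
par(op)
```

The same correlation coefficient can come from a clean linear trend, a curve, or a single outlier, which is why the plot should come before the test.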
