Question: p value from multiple columns using R??
0
gravatar for Ankit
4 weeks ago by
Ankit140
Ankit140 wrote:

Hi all,

I have data.frame of 6 columns. I want to calculate p.value for each row for all 6 columns. How to do this?

It might be possible values at the same row is positive or negative depending upon the samples type. So what is the script for this in R???

my data.frame looks like this:

                                  S1      S2         S3         S4         S5      S6     
chr10%100011600%100011900 **0.96627103 0.96794047 0.96125479 0.96429540 0.97379428 0.96927928** example p.value for this?
chr10%100155000%100155300 0.91513668 0.93109224 0.93609273 0.89549083 0.92005647 0.91510770
chr10%100174800%100175100 0.02461070 0.02286753 0.02089150 0.03185857 0.03231618 0.02724574
chr10%100206600%100206900 0.07662103 0.08028024 0.08027826 0.07767043 0.08447515 0.07026636
chr10%100993500%100993800 0.06651217 0.05231399 0.04417633 0.13046962 0.09304180 0.06435386
chr10%100995600%100995900 0.13325158 0.10427350 0.10157604 0.15673318 0.14602494 0.13762823
p.value statistical • 155 views
ADD COMMENTlink modified 4 weeks ago by Sam3.0k • written 4 weeks ago by Ankit140

What is the statistical test you want to execute?

ADD REPLYlink written 4 weeks ago by Asaf8.3k

I would like to apply unpaired two tailed t.test

ADD REPLYlink written 4 weeks ago by Ankit140

If you have other suggestions please let me know for 6 or more samples?

ADD REPLYlink written 4 weeks ago by Ankit140

Testing what against what?

ADD REPLYlink written 4 weeks ago by Asaf8.3k

Compare S1,S2,S3 (control) with respect to S4,S5,S6 (cases).

Basically before getting this matrix.

  1. I took average of all controls (S1 - S3).

  2. Subtracted all control individually from average.eg Cntrlavg -S1, Cntrlavg -S2, Cntrlavg -S3 = Control Set

  3. Subtracted all cases individually from average.eg Cntrlavg -S4, Cntrlavg -S5, Cntrlavg -S6 = Case set

What you are seeing in the above matrix are the subtracted value and they can be negative too.

Now I want to compare all controls versus all cases .

How to do this?

Am I missing something after subtraction and before statistical analysis?

Please help

ADD REPLYlink modified 4 weeks ago • written 4 weeks ago by Ankit140

I think unpaired t-test or mannwhitney U test may help you in comparison.

ADD REPLYlink written 4 weeks ago by cpad011213k

OK can you share the R script for that? It will be very helpful.

I tried this:

myp.values <- apply(mymatrix[,1:6], 1, function(x) t.test(x[,1:3],x[,4:6], paired=FALSE))

But it does not seems to be working.

ADD REPLYlink written 4 weeks ago by Ankit140
mymatrix$p.value= apply(mymatrix, 1, function(x) t.test(x[1:3],x[4:6])$p.value)
ADD REPLYlink modified 4 weeks ago • written 4 weeks ago by cpad011213k

It seems it gives same p.values for all the regions? Why so? Something wrong I'm doing :

My script:

t.result <- apply(mymatrix, 1, function(x) t.test(mymatrix[,24:35],mymatrix[,36:39], paired=FALSE)$p.value)

I applied this on large matrix 24:35 are all controls , 36:39 are all cases.

Is the problem happening due to subtraction of all samples from average of controls?

ADD REPLYlink modified 4 weeks ago • written 4 weeks ago by Ankit140
apply(mymatrix, 1, function(x) t.test(x[24:35],x[36:39], paired=FALSE)$p.value
ADD REPLYlink written 4 weeks ago by cpad011213k

My first advise would be to use packages that do this sort of tests, for multiple reasons:

  1. Your normalization might be correct but it has some underlying assumptions, you should be aware to it.
  2. You have multiple hypotheses that you should correct for
  3. You don't have enough power in each row for a statistical test, there are packages that could better estimate the variance and have a more powerful test.

If your input is counts I would suggest edgeR or DESeq2. Otherwise I guess limma could be used but I'm not sure if it has assumptions about the input (unless yours is microarray)

ADD REPLYlink written 4 weeks ago by Asaf8.3k

Thanks for suggestion.

My data is a methylation array. But after the subtraction from control average, the values becomes positive or negative. Now these are neither the counts, or normalized expression values or beta values or M values. They are just integers values obtained after subtractions.

If you think sample size a limitation, I will any way increase the control to 40 samples and cases to 20 samples (using public and our old data after batch adjustment).

Since Deseq2 and edgeR require raw counts, my dataset is methylation array with beta values ranged from 0 - 1, So DEseq2 or edgeR might not be the right tool here.

You mentioned about limma package. Do you think if I can input mymatrix to limma for calculating significance region wise (row-wise)?

Some methylation array packages identufy differential methylation too but here I need to select the cases based on the particular phenotypes. So I will add or remove the case to find the most observed differentially methylated loci.

The one possibility is use the original batch corrected matrix (without any subtraction) and give the same group ID to selected cases (alternative combinations) and then search for differential methylation. But that is very tedious. So I already subtracted everything from control average and want to apply statistcal test between all control v/s all/selected cases to obtain significantly different regions.

I will appreciate any help or suggestions.

Thanks

ADD REPLYlink modified 4 weeks ago • written 4 weeks ago by Ankit140

Have you looked at this paper? I think limma is appropriate here, especially if you need to do batch effect correction which can be a part of the linear model.

ADD REPLYlink written 4 weeks ago by Asaf8.3k

Does that mean you want row 1 compare with all other rows, the repeat? Or are you comparing S1 vs S2, S1 vs S3, ..., S5 vs S6?

ADD REPLYlink written 4 weeks ago by Sam3.0k

Hi Sam,

Please see my above comment for comparison purposes. Basically I want to compare S1-S3 v/s S4-S6 for each row and add p.value in parallel column. Then apply benjamini hochberg for multiple correction

ADD REPLYlink modified 4 weeks ago • written 4 weeks ago by Ankit140
1
gravatar for Sam
4 weeks ago by
Sam3.0k
New York
Sam3.0k wrote:

Using data.table (assuming your data is called dt), you can do a t.test on each row with the following code. Though as others have suggested, you should really think about your experimental design and use the appropriate statistic model, otherwise you will just get meaningless p-values.

library(data.table)
dt$name <- row.names(dt)
dt <- as.data.table(dt)
dt[,{
    model <- t.test(x=c(S1,S2,S3),y=c(S4,S5,S6))
    list(p=model$p.value,
          z=model$statistic)
}, by="name"]
ADD COMMENTlink written 4 weeks ago by Sam3.0k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 668 users visited in the last hour