p value from multiple columns using R??
1
1
Entering edit mode
4.3 years ago
Ankit ▴ 500

Hi all,

I have data.frame of 6 columns. I want to calculate p.value for each row for all 6 columns. How to do this?

It might be possible values at the same row is positive or negative depending upon the samples type. So what is the script for this in R???

my data.frame looks like this:

                                  S1      S2         S3         S4         S5      S6     
chr10%100011600%100011900 **0.96627103 0.96794047 0.96125479 0.96429540 0.97379428 0.96927928** example p.value for this?
chr10%100155000%100155300 0.91513668 0.93109224 0.93609273 0.89549083 0.92005647 0.91510770
chr10%100174800%100175100 0.02461070 0.02286753 0.02089150 0.03185857 0.03231618 0.02724574
chr10%100206600%100206900 0.07662103 0.08028024 0.08027826 0.07767043 0.08447515 0.07026636
chr10%100993500%100993800 0.06651217 0.05231399 0.04417633 0.13046962 0.09304180 0.06435386
chr10%100995600%100995900 0.13325158 0.10427350 0.10157604 0.15673318 0.14602494 0.13762823
p.value statistical • 7.2k views
ADD COMMENT
0
Entering edit mode

What is the statistical test you want to execute?

ADD REPLY
0
Entering edit mode

I would like to apply unpaired two tailed t.test

ADD REPLY
0
Entering edit mode

If you have other suggestions please let me know for 6 or more samples?

ADD REPLY
0
Entering edit mode

Testing what against what?

ADD REPLY
0
Entering edit mode

Compare S1,S2,S3 (control) with respect to S4,S5,S6 (cases).

Basically before getting this matrix.

  1. I took average of all controls (S1 - S3).

  2. Subtracted all control individually from average.eg Cntrlavg -S1, Cntrlavg -S2, Cntrlavg -S3 = Control Set

  3. Subtracted all cases individually from average.eg Cntrlavg -S4, Cntrlavg -S5, Cntrlavg -S6 = Case set

What you are seeing in the above matrix are the subtracted value and they can be negative too.

Now I want to compare all controls versus all cases .

How to do this?

Am I missing something after subtraction and before statistical analysis?

Please help

ADD REPLY
0
Entering edit mode

I think unpaired t-test or mannwhitney U test may help you in comparison.

ADD REPLY
0
Entering edit mode

OK can you share the R script for that? It will be very helpful.

I tried this:

myp.values <- apply(mymatrix[,1:6], 1, function(x) t.test(x[,1:3],x[,4:6], paired=FALSE))

But it does not seems to be working.

ADD REPLY
0
Entering edit mode
mymatrix$p.value= apply(mymatrix, 1, function(x) t.test(x[1:3],x[4:6])$p.value)
ADD REPLY
0
Entering edit mode

It seems it gives same p.values for all the regions? Why so? Something wrong I'm doing :

My script:

t.result <- apply(mymatrix, 1, function(x) t.test(mymatrix[,24:35],mymatrix[,36:39], paired=FALSE)$p.value)

I applied this on large matrix 24:35 are all controls , 36:39 are all cases.

Is the problem happening due to subtraction of all samples from average of controls?

ADD REPLY
0
Entering edit mode
apply(mymatrix, 1, function(x) t.test(x[24:35],x[36:39], paired=FALSE)$p.value
ADD REPLY
0
Entering edit mode

My first advise would be to use packages that do this sort of tests, for multiple reasons:

  1. Your normalization might be correct but it has some underlying assumptions, you should be aware to it.
  2. You have multiple hypotheses that you should correct for
  3. You don't have enough power in each row for a statistical test, there are packages that could better estimate the variance and have a more powerful test.

If your input is counts I would suggest edgeR or DESeq2. Otherwise I guess limma could be used but I'm not sure if it has assumptions about the input (unless yours is microarray)

ADD REPLY
0
Entering edit mode

Thanks for suggestion.

My data is a methylation array. But after the subtraction from control average, the values becomes positive or negative. Now these are neither the counts, or normalized expression values or beta values or M values. They are just integers values obtained after subtractions.

If you think sample size a limitation, I will any way increase the control to 40 samples and cases to 20 samples (using public and our old data after batch adjustment).

Since Deseq2 and edgeR require raw counts, my dataset is methylation array with beta values ranged from 0 - 1, So DEseq2 or edgeR might not be the right tool here.

You mentioned about limma package. Do you think if I can input mymatrix to limma for calculating significance region wise (row-wise)?

Some methylation array packages identufy differential methylation too but here I need to select the cases based on the particular phenotypes. So I will add or remove the case to find the most observed differentially methylated loci.

The one possibility is use the original batch corrected matrix (without any subtraction) and give the same group ID to selected cases (alternative combinations) and then search for differential methylation. But that is very tedious. So I already subtracted everything from control average and want to apply statistcal test between all control v/s all/selected cases to obtain significantly different regions.

I will appreciate any help or suggestions.

Thanks

ADD REPLY
0
Entering edit mode

Have you looked at this paper? I think limma is appropriate here, especially if you need to do batch effect correction which can be a part of the linear model.

ADD REPLY
0
Entering edit mode

Does that mean you want row 1 compare with all other rows, the repeat? Or are you comparing S1 vs S2, S1 vs S3, ..., S5 vs S6?

ADD REPLY
0
Entering edit mode

Hi Sam,

Please see my above comment for comparison purposes. Basically I want to compare S1-S3 v/s S4-S6 for each row and add p.value in parallel column. Then apply benjamini hochberg for multiple correction

ADD REPLY
1
Entering edit mode
4.3 years ago
Sam ★ 4.8k

Using data.table (assuming your data is called dt), you can do a t.test on each row with the following code. Though as others have suggested, you should really think about your experimental design and use the appropriate statistic model, otherwise you will just get meaningless p-values.

library(data.table)
dt$name <- row.names(dt)
dt <- as.data.table(dt)
dt[,{
    model <- t.test(x=c(S1,S2,S3),y=c(S4,S5,S6))
    list(p=model$p.value,
          z=model$statistic)
}, by="name"]
ADD COMMENT

Login before adding your answer.

Traffic: 1452 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6