How to obtain chi-square statistics for overlaps of three GRanges objects, element-wise?
6.0 years ago

Dear all: I want to obtain chi-square statistics for the following data, element-wise. My apologies for asking a statistical question in this community. My data contains lists of overlap significance scores for three GRanges objects, and I want to get a global score for them element-wise. How can I do this in R?

# This is the data for which I want a global score, element-wise:

[[1]]
NumericList of length 7
[[1]] 1e-22
[[2]] 1e-19
[[3]] 1e-18
[[4]] 1e-16
[[5]] 1e-24
[[6]] 1e-20
[[7]] 1e-15

[[2]]
NumericList of length 7
[[1]] 1e-24
[[2]] 1e-24
[[3]] 1e-20
[[4]] 1e-25
[[5]] 0.1
[[6]] 1e-19
[[7]] 1e-18

[[3]]
NumericList of length 7
[[1]] 1e-11
[[2]] 1e-11
[[3]] 1e-10
[[4]] numeric(0)
[[5]] numeric(0)
[[6]] 1e-15
[[7]] numeric(0)


In case you are wondering, the third list element contains numeric(0) for regions with no overlap, so I can replace those with zero:

li.3 <- lapply(li.3, function(x) {
  # keep the score if there is one, otherwise use 0 for a non-overlapping region
  if (length(x) > 0) x else 0
})


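# this is a reproducible example:

library(S4Vectors)  # provides DataFrame
data <- DataFrame(
  v1 = c(1e-22, 1e-19, 1e-18, 1e-16, 1e-24, 1e-20, 1e-15),
  v2 = c(1e-24, 1e-24, 1e-20, 1e-25, 0.1, 1e-19, 1e-18),
  # numeric(0) entries are written as 0 here, because c() silently drops empty vectors
  v3 = c(1e-11, 1e-11, 1e-10, 0, 0, 1e-15, 0))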
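A small sketch of why the numeric(0) entries need handling before the vectors are put side by side: c() drops empty vectors, so the third column would end up shorter than the other two. The sketch fills the gaps with NA instead of 0; that choice is only a suggestion (a p-value of exactly 0 is not the same thing as "no overlap"), and the list below just copies the values shown above.

```r
# c() silently drops empty vectors, so this has 4 elements rather than 7
v3_raw <- c(1e-11, 1e-11, 1e-10, numeric(0), numeric(0), 1e-15, numeric(0))
length(v3_raw)

# Keeping the scores in a list preserves the positions of the empty entries
li.3 <- list(1e-11, 1e-11, 1e-10, numeric(0), numeric(0), 1e-15, numeric(0))

# Take the score if present, otherwise mark the position as missing
v3 <- vapply(li.3,
             function(x) if (length(x) > 0) x[1] else NA_real_,
             numeric(1))
length(v3)
```

Whether NA or 0 is the better placeholder depends on how the combining step treats it, so it is worth checking what happens to those rows afterwards.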


# my desired output is something like this (just an example, element-wise):

global fisher score of  (1e-22, 1e-24, 1e-11) = ?
global fisher score of  (1e-19, 1e-24, 1e-11) = ?
...
global fisher score of  (1e-24, 1e-01, numeric(0)) = ?


I want to get the global score element-wise. How can I do this in R? Alternatively, I would also like to see the Fisher exact test result for the above data. I would be grateful if anyone could give me an idea of how to do this. Thanks a lot.

R chi-square overlap DataFrame • 2.2k views

Where are the GRanges objects?


Dear Giovanni M Dall'Olio:

I'm afraid the thread would get rather long if I listed every step here (finding overlaps, conditional filtering, expanding the results as GRanges), so I did not show the reproducible steps. However, the data for which I want a global score is the result of element-wise filtering, so I have to make sure the shape of each vector is preserved. Specifically, all I want is to get the global Fisher scores element-wise. To clarify, v1 holds the significance scores of the query, while v2 and v3 hold the significance scores (a.k.a. pvalueLog) of the subjects (a.k.a. the overlapped GRanges objects). I need an element-wise operation to get the global score. I hope to get some ideas from this community.

6.0 years ago

You have a data frame with three columns:

> data
DataFrame with 7 rows and 3 columns
         v1        v2        v3
  <numeric> <numeric> <numeric>
1     1e-22     1e-24     1e-11
2     1e-19     1e-24     1e-11
3     1e-18     1e-20     1e-10
4     1e-16     1e-25     0e+00
5     1e-24     1e-01     0e+00
6     1e-20     1e-19     1e-15
7     1e-15     1e-18     0e+00


What confuses me is that this dataframe seems to contain p-values already. So what do you want to calculate exactly?

You may combine p-values, assuming they are independent, using different approaches. The simplest is to take their mean (see "When combining p-values, why not just averaging?"):

> data$global = apply(data[1:3], 1, mean)
> data
DataFrame with 7 rows and 4 columns
         v1        v2        v3       global
  <numeric> <numeric> <numeric>    <numeric>
1     1e-22     1e-24     1e-11 3.333333e-12
2     1e-19     1e-24     1e-11 3.333333e-12
3     1e-18     1e-20     1e-10 3.333333e-11
4     1e-16     1e-25     0e+00 3.333333e-17
5     1e-24     1e-01     0e+00 3.333333e-02
6     1e-20     1e-19     1e-15 3.333700e-16
7     1e-15     1e-18     0e+00 3.336667e-16

More accurate methods to combine p-values would include Fisher's method. See for example http://stats.stackexchange.com/questions/168181/r-package-for-combining-p-values-using-fishers-or-stouffers-method for some R packages to do it. For example:

> library(metap)
> data$global = apply(data[1:3], 1, function(df) sumlog(df)$p)
Warning messages:
1: In sumlog(df) : Some studies omitted
2: In sumlog(df) : Some studies omitted
3: In sumlog(df) : Some studies omitted
> data
DataFrame with 7 rows and 4 columns
         v1        v2        v3       global
  <numeric> <numeric> <numeric>    <numeric>
1     1e-22     1e-24     1e-11 8.745181e-54
2     1e-19     1e-24     1e-11 7.855507e-51
3     1e-18     1e-20     1e-10 6.219311e-45
4     1e-16     1e-25     0e+00 9.540599e-40
5     1e-24     1e-01     0e+00 5.856463e-24
6     1e-20     1e-19     1e-15 7.855507e-51
7     1e-15     1e-18     0e+00 7.698531e-32
> sumlog(c(1e-22, 1e-24, 1e-11))
chisq = 262.4947 with df = 6 p = 8.745181e-54

Thanks a lot for your quick response. Maybe I didn't state the problem clearly. I need to use Fisher's exact test on each row of my data to get its combined p-value. Your results are very close to my desired output, but I am not sure they are identical to what fisher.test gives. If I used fisher.test from the base packages instead, this is the code that might give me what I want:

fish.res <- apply(data,1, function(x) fisher.test(matrix(x,nr=2))$p.value,$odds.ratio)


but it gave me an error. Would the above code yield the same result as yours if the error were fixed? Thank you very much.


I don't think you can calculate a Fisher test of Fisher p-values. Moreover, you would need a 2x2 contingency matrix to calculate a Fisher test (e.g. see this tool for an example of the input you would expect: http://graphpad.com/quickcalcs/contingency1.cfm). Maybe you meant to use Fisher's method to combine p-values, which is a different thing from Fisher's exact test?
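To make the distinction concrete, here is a minimal base R sketch of both procedures side by side. The 2x2 counts are made up purely for illustration; the p-values are the first row of the data above. Fisher's method is written out by hand from its usual definition (minus twice the sum of the log p-values, compared against a chi-square distribution with two degrees of freedom per p-value), so it does not depend on any package.

```r
# Fisher's exact test: the input is a 2x2 table of counts.
# These counts are invented for illustration only.
counts <- matrix(c(12, 5, 3, 10), nrow = 2,
                 dimnames = list(overlap = c("yes", "no"),
                                 group = c("A", "B")))
exact <- fisher.test(counts)
exact$p.value    # p-value of the test of independence
exact$estimate   # odds ratio estimate (there is no $odds.ratio element)

# Fisher's method: the input is a vector of p-values from independent tests.
p <- c(1e-22, 1e-24, 1e-11)
chisq <- -2 * sum(log(p))                           # test statistic
df <- 2 * length(p)                                 # 2 degrees of freedom per p-value
combined <- pchisq(chisq, df, lower.tail = FALSE)   # combined p-value
```

For this row, chisq and combined should agree with the sumlog output quoted earlier in the thread (chisq = 262.4947 with df = 6).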


Dear Giovanni M Dall'Olio:

I am very grateful for your correction; I had completely misunderstood the difference between Fisher's method and Fisher's exact test. Indeed, what I need is the combined p-value from Fisher's method. Does your solution apply Fisher's method to obtain the combined p-value element-wise? Thanks again for your great help here.

Jurat


You are welcome, Jurat. Yes, you can use the solution with sumlog from the metap library.


Dear Giovanni M Dall'Olio:

How can I add chisq as a new column of data? I mean, I would like data to have both global and chisq columns. Thanks a lot.
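One possible way to do this, sketched in base R under a few assumptions: fisher_combine is a hypothetical helper (not part of metap) that recomputes Fisher's method by hand, and it drops values outside (0, 1], which is my reading of the "Some studies omitted" warnings above. A plain data.frame is used so the sketch runs without Bioconductor; the same `$` assignments should also work on a DataFrame.

```r
# Hypothetical helper: Fisher's method for one row of p-values.
# Values that are NA, <= 0 or > 1 are dropped before combining.
fisher_combine <- function(p) {
  p <- p[!is.na(p) & p > 0 & p <= 1]
  chisq <- -2 * sum(log(p))
  df <- 2 * length(p)
  c(chisq = chisq, df = df, p = pchisq(chisq, df, lower.tail = FALSE))
}

data <- data.frame(
  v1 = c(1e-22, 1e-19, 1e-18, 1e-16, 1e-24, 1e-20, 1e-15),
  v2 = c(1e-24, 1e-24, 1e-20, 1e-25, 0.1, 1e-19, 1e-18),
  v3 = c(1e-11, 1e-11, 1e-10, 0, 0, 1e-15, 0)
)

# One row of results per row of data: columns chisq, df, p
res <- t(apply(as.matrix(data[, c("v1", "v2", "v3")]), 1, fisher_combine))

data$chisq  <- res[, "chisq"]   # chi-square statistic per row
data$global <- res[, "p"]       # combined p-value per row
data
```

I believe the object returned by sumlog also carries the statistic, so something like sumlog(df)$chisq inside the same apply call may work too, but treat that element name as an assumption and check str(sumlog(...)) first.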