Question

fisher's method for p-value combination in meta-analysis

0

Entering edit mode

9 months ago

Nelo ▴ 20

My aim is to conduct a meta-analysis of RNA seq data from different plant species that belong to the same family. I did individual analysis for DEGs using DESeq2 on each dataset separately. Now I wanted to combine their p-value for MetaDEG analysis, for which I used Fisher' method.

For some reason, I am getting exactly the same gene number I got in individual differential expression analysis in metaDEG as well. To expand this statement:

Suppose I have three csv file of different plants with their p-values. csv file 1 got 10DEGs, csv file 2 got 15DEGs, csv file 3 got 12DEGs in individual analysis.

Now after combining their p-value in fisher, I am getting exactly a total of 37 DEGs (which is the total combining all the individual result). Btw, I tally every single gene Id as well.

one possible reason I find is I am using wrong code in R(metap, a package for fisher'method analysis in R).

SO my request is to please provide with me genuine code for the analysis

Differential p-value fisher expression study • 748 views

ADD COMMENT • link updated 9 months ago by dariober 14k • written 9 months ago by Nelo ▴ 20

dariober · Answer 1 · 2023-07-18

2

Entering edit mode

9 months ago

LChart 3.9k

Fisher's method uses the fact that, under the null distribution, p-values are uniformly distributed, so (negative) log p-values are exponentially distributed. This means the sum of the (negative) log p-values are Gamma distributed (in fact, a scaled Chi-square distribution).

in code:

2 * sum(-log(p)) is distributed as dchisq(x, 2*length(p))

so to test if a specific value of p is significant would be:

p.meta = pchisq(2*sum(-log(p)), 2*length(p), lower.tail=F)

ADD COMMENT • link 9 months ago by LChart 3.9k

0

Entering edit mode

Hi LChart

In the code, what is the input file here. It's getting more confusion now. Can you help me provide a code in R( as I am using metap package for fisher'method).

I used this code in R:

library(readr)
p_values_list <- list()
gene_ids_list <- list()
csv_file_names <- list()
csv_files <- c("file1.csv", "file2.csv", "file3.csv")
for (file in csv_files) {
data <- read_csv(file, col_types = cols_only(geneID = col_character(), pvalue = col_double(), log2FoldChange = col_double()))
if ("pvalue" %in% colnames(data) && "geneID" %in% colnames(data) && "log2FoldChange" %in% colnames(data)) {
p_values <- data[!is.na(data$pvalue), "pvalue"]
gene_ids <- data[!is.na(data$pvalue), "geneID"]
p_values_list[[file]] <- p_values
gene_ids_list[[file]] <- gene_ids
csv_file_names[[file]] <- file
} else {
warning("Missing or incorrect column names in ", file)
}
}
combined_p_values <- -2 * sum(log(na.omit(unlist(p_values_list))))
meta_p_value <- 1 - pchisq(combined_p_values, df = 2 * length(unlist(p_values_list)))
significance_threshold <- 0.05
metaDEGs <- unlist(gene_ids_list)[meta_p_value < significance_threshold]
metaDEGs_csv <- rep(csv_files, length.out = length(metaDEGs))
result <- data.frame(GeneID = metaDEGs, CSV_File = metaDEGs_csv)
print(result)

The output was:

GeneID  CSV_File    
file1.csv.geneID1   CmV2_2M004200   file1.csv
file1.csv.geneID2   CmV2_1M008990   file2.csv
file1.csv.geneID3   CmV2_6M054550   file3.csv

You can see here. the output was a mess, giving wrong information about csv file to which particular genes expressed belongs. Also, I need information about the direction of which expressed genes (whether they un-regulated or downregulated), which is missing here.

So, I really need some help here!!!

ADD REPLY • link updated 9 months ago by dariober 14k • written 9 months ago by Nelo ▴ 20