fisher's method for p-value combination in meta-analysis
1
0
Entering edit mode
9 months ago
Nelo ▴ 20

My aim is to conduct a meta-analysis of RNA seq data from different plant species that belong to the same family. I did individual analysis for DEGs using DESeq2 on each dataset separately. Now I wanted to combine their p-value for MetaDEG analysis, for which I used Fisher' method.

For some reason, I am getting exactly the same gene number I got in individual differential expression analysis in metaDEG as well. To expand this statement:

Suppose I have three csv file of different plants with their p-values. csv file 1 got 10DEGs, csv file 2 got 15DEGs, csv file 3 got 12DEGs in individual analysis.

Now after combining their p-value in fisher, I am getting exactly a total of 37 DEGs (which is the total combining all the individual result). Btw, I tally every single gene Id as well.

one possible reason I find is I am using wrong code in R(metap, a package for fisher'method analysis in R).

SO my request is to please provide with me genuine code for the analysis

Differential p-value fisher expression study • 748 views
ADD COMMENT
2
Entering edit mode
9 months ago
LChart 3.9k

Fisher's method uses the fact that, under the null distribution, p-values are uniformly distributed, so (negative) log p-values are exponentially distributed. This means the sum of the (negative) log p-values are Gamma distributed (in fact, a scaled Chi-square distribution).

in code:

2 * sum(-log(p)) is distributed as dchisq(x, 2*length(p))

so to test if a specific value of p is significant would be:

p.meta = pchisq(2*sum(-log(p)), 2*length(p), lower.tail=F)

ADD COMMENT
0
Entering edit mode

Hi LChart

In the code, what is the input file here. It's getting more confusion now. Can you help me provide a code in R( as I am using metap package for fisher'method).

I used this code in R:

library(readr)
p_values_list <- list()
gene_ids_list <- list()
csv_file_names <- list()
csv_files <- c("file1.csv", "file2.csv", "file3.csv")
for (file in csv_files) {
data <- read_csv(file, col_types = cols_only(geneID = col_character(), pvalue = col_double(), log2FoldChange = col_double()))
if ("pvalue" %in% colnames(data) && "geneID" %in% colnames(data) && "log2FoldChange" %in% colnames(data)) {
p_values <- data[!is.na(data$pvalue), "pvalue"]
gene_ids <- data[!is.na(data$pvalue), "geneID"]
p_values_list[[file]] <- p_values
gene_ids_list[[file]] <- gene_ids
csv_file_names[[file]] <- file
} else {
warning("Missing or incorrect column names in ", file)
}
}
combined_p_values <- -2 * sum(log(na.omit(unlist(p_values_list))))
meta_p_value <- 1 - pchisq(combined_p_values, df = 2 * length(unlist(p_values_list)))
significance_threshold <- 0.05
metaDEGs <- unlist(gene_ids_list)[meta_p_value < significance_threshold]
metaDEGs_csv <- rep(csv_files, length.out = length(metaDEGs))
result <- data.frame(GeneID = metaDEGs, CSV_File = metaDEGs_csv)
print(result)

The output was:

GeneID  CSV_File    
file1.csv.geneID1   CmV2_2M004200   file1.csv
file1.csv.geneID2   CmV2_1M008990   file2.csv
file1.csv.geneID3   CmV2_6M054550   file3.csv

You can see here. the output was a mess, giving wrong information about csv file to which particular genes expressed belongs. Also, I need information about the direction of which expressed genes (whether they un-regulated or downregulated), which is missing here.

So, I really need some help here!!!

ADD REPLY

Login before adding your answer.

Traffic: 1475 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6