Combining p-values from different datasets is a central problem in meta-analysis. This is especially true in high-dimensional fields such as genomics. Here is a summary of how to apply a negative binomial generalized linear model (NB-GLM) to your RNA-seq data in this setting:
First, individual analysis: Typically, you should fit the NB-GLM separately to each of your 100 datasets. Each dataset may have its own library sizes, dispersion and batch effects, so modelling each one separately gives valid per-dataset p-values. This step is a differential expression (DE) analysis: for each gene it tests whether expression differs between your conditions, and so identifies the genes that are differentially expressed in each dataset.
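To make the per-dataset step concrete, here is a toy Python sketch of one kind of test an NB-GLM performs for a single gene in a single dataset. It is a likelihood-ratio test of a two-group effect under a log link. This is a simplification under stated assumptions and does not reproduce what DESeq2 or edgeR do internally:

- The dispersion `alpha` is assumed known. Those packages estimate it and shrink it across genes.
- Library sizes are assumed equal, so there are no size factors or offsets.
- `nb_logpmf` and `nb_glm_lrt` are hypothetical helper names.

```python
import math


def nb_logpmf(y, mu, alpha):
    """Log-probability of count y under a negative binomial with mean mu and
    dispersion alpha, parameterized so that Var = mu + alpha * mu**2."""
    r = 1.0 / alpha  # "size" parameter
    log_p = math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
    log_p += r * math.log(r / (r + mu))
    if y > 0:  # the y * log(...) term vanishes when y == 0 (mu may then be 0)
        log_p += y * math.log(mu / (r + mu))
    return log_p


def nb_glm_lrt(counts_a, counts_b, alpha):
    """Likelihood-ratio test of log(mu) = b0 + b1 * group with known dispersion.

    Hypothetical helper: assumes equal library sizes and a fixed alpha.
    Returns the p-value for H0: b1 = 0.
    """
    # With alpha fixed and no offsets, the MLE of each group mean is its sample mean.
    mu_a = sum(counts_a) / len(counts_a)
    mu_b = sum(counts_b) / len(counts_b)
    pooled = counts_a + counts_b
    mu_0 = sum(pooled) / len(pooled)  # null model: one common mean

    ll_full = (sum(nb_logpmf(y, mu_a, alpha) for y in counts_a)
               + sum(nb_logpmf(y, mu_b, alpha) for y in counts_b))
    ll_null = sum(nb_logpmf(y, mu_0, alpha) for y in pooled)

    stat = max(0.0, 2.0 * (ll_full - ll_null))
    # Chi-square with 1 df: P(X > stat) = P(|Z| > sqrt(stat)) = erfc(sqrt(stat / 2)).
    return math.erfc(math.sqrt(stat / 2.0))


# Illustrative counts for one gene: four control and four treated samples, alpha assumed 0.1.
p = nb_glm_lrt([5, 6, 4, 5], [50, 60, 45, 55], alpha=0.1)
print(f"LRT p-value: {p:.3g}")
```

Running a test like this gene by gene within each dataset produces the per-dataset p-values that are later combined across datasets.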
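Combining p-values: Once the individual analyses are complete, combine the per-gene p-values across datasets. This is less straightforward than it sounds. A p-value on its own says nothing about the size or direction of an effect, and p-values from small and large datasets are not equally informative. Several statistical techniques are available for combining them, including: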
- Fisher's Combined Probability Test: This sums the logarithms of the k p-values into the statistic X² = -2 Σ ln(p_i). Under the null hypothesis, that statistic follows a chi-square distribution with 2k degrees of freedom.
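- Stouffer's Method: This converts each p-value to a Z-score and adds up the Z-scores, optionally with weights such as the square root of each dataset's sample size. It then divides the sum by the square root of the number of datasets (or, in the weighted version, of the sum of squared weights) and converts the resulting Z-score back to a p-value.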
- Effect-size meta-analysis: More advanced techniques, such as fixed-effects or random-effects meta-analysis, combine the estimated effects themselves (e.g., log2 fold changes and their standard errors) rather than p-values.
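The two p-value combination methods above can be sketched in a few lines of standard-library Python. `fisher_combine` and `stouffer_combine` are hypothetical names. The sketch assumes each gene has one p-value per dataset, strictly between 0 and 1. For Stouffer's method, the p-values are assumed to be one-sided. Two-sided DE p-values would first need to be converted using the sign of each dataset's log fold change, otherwise opposite effects in different datasets would reinforce rather than cancel.

```python
import math
from statistics import NormalDist


def fisher_combine(pvalues):
    """Fisher's method: X = -2 * sum(ln p_i) ~ chi-square with 2k df under H0.

    Assumes every p-value lies in (0, 1].
    """
    k = len(pvalues)
    half = -sum(math.log(p) for p in pvalues)  # X / 2
    if half == 0.0:
        return 1.0  # all p-values equal 1
    # For even df = 2k: P(X > x) = exp(-x/2) * sum_{i<k} (x/2)^i / i!, computed in log space.
    log_terms = [i * math.log(half) - math.lgamma(i + 1) for i in range(k)]
    m = max(log_terms)
    log_sum = m + math.log(sum(math.exp(t - m) for t in log_terms))
    return min(1.0, math.exp(log_sum - half))


def stouffer_combine(pvalues, weights=None):
    """Stouffer's method for one-sided p-values in (0, 1).

    Z_i = Phi^-1(1 - p_i); Z = sum(w_i * Z_i) / sqrt(sum(w_i^2)); p = 1 - Phi(Z).
    """
    if weights is None:
        weights = [1.0] * len(pvalues)
    nd = NormalDist()
    z = sum(w * nd.inv_cdf(1.0 - p) for w, p in zip(weights, pvalues))
    z /= math.sqrt(sum(w * w for w in weights))
    return nd.cdf(-z)  # equals 1 - Phi(z) without cancellation error


# Illustrative p-values for one gene in three datasets (made-up values).
pvals = [0.04, 0.10, 0.03]
print(fisher_combine(pvals), stouffer_combine(pvals))
```

Fisher's method reacts strongly to a single very small p-value, while Stouffer's method rewards consistent moderate evidence. Weighting Stouffer by the square root of sample size is one common choice, not a requirement.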
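The approach you choose depends on the hypothesis you want to test. For example, you might ask whether a gene is affected in at least one dataset, or consistently across datasets. It also depends on what you are willing to assume about the homogeneity of your datasets. Fixed-effects models assume one common true effect, while random-effects models allow the true effect to vary between datasets.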
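The fixed-effects versus random-effects choice can be illustrated for a single gene with inverse-variance pooling. The sketch assumes each dataset supplies a log2 fold change and its standard error; DESeq2's results, for instance, report these as `log2FoldChange` and `lfcSE`. For the random-effects model it uses the DerSimonian-Laird estimator of the between-dataset variance tau². That estimator is one common choice among several, swapped in here for illustration. Function names are hypothetical.

```python
import math
from statistics import NormalDist


def fixed_effect(estimates, std_errors):
    """Inverse-variance fixed-effect pooling: assumes one common true effect."""
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total
    return pooled, math.sqrt(1.0 / total)


def random_effects_dl(estimates, std_errors):
    """DerSimonian-Laird random-effects pooling: lets the true effect vary
    between datasets with between-dataset variance tau^2."""
    weights = [1.0 / se ** 2 for se in std_errors]
    fe, _ = fixed_effect(estimates, std_errors)
    q = sum(w * (b - fe) ** 2 for w, b in zip(weights, estimates))  # Cochran's Q
    df = len(estimates) - 1
    c = sum(weights) - sum(w * w for w in weights) / sum(weights)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    re_weights = [1.0 / (se ** 2 + tau2) for se in std_errors]
    total = sum(re_weights)
    pooled = sum(w * b for w, b in zip(re_weights, estimates)) / total
    return pooled, math.sqrt(1.0 / total), tau2


def two_sided_p(estimate, se):
    """Wald test p-value for H0: pooled effect = 0."""
    return 2.0 * NormalDist().cdf(-abs(estimate) / se)


# Illustrative per-dataset log2 fold changes and standard errors for one gene.
lfc = [1.2, 0.8, 1.5]
lfc_se = [0.4, 0.3, 0.5]
est, se, tau2 = random_effects_dl(lfc, lfc_se)
print(est, se, tau2, two_sided_p(est, se))
```

When tau² is estimated as 0, the random-effects result coincides with the fixed-effect one. When the datasets disagree, tau² inflates the pooled standard error.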
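Considerations for RNA-seq data: The negative binomial distribution is usually employed because RNA-seq counts are over-dispersed. Their variance across biological replicates exceeds the mean, which a Poisson model cannot accommodate. The data also often contain many zeros (genes that are not expressed in many samples). Before combining p-values, check three things:
- Use the raw (unadjusted) per-dataset p-values.
- Handle consistently the genes whose p-values are missing in some datasets (for example, genes with all-zero counts).
- Apply the multiple-testing correction once, to the combined p-values.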
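A quick way to see the over-dispersion that motivates the negative binomial is to compare the variance of a gene's counts across biological replicates with their mean. The sketch below uses a simple method-of-moments estimate of the dispersion alpha. It assumes Var = mu + alpha * mu² and ignores library-size differences. It is a diagnostic illustration, not the dispersion estimator used by DESeq2 or edgeR, and `moment_dispersion` is a hypothetical name.

```python
import math
from statistics import mean, variance


def moment_dispersion(counts):
    """Method-of-moments NB dispersion for one gene within one condition.

    Under Var = mu + alpha * mu**2, alpha = (Var - mu) / mu**2.
    Values above 0 indicate over-dispersion relative to a Poisson (alpha = 0).
    """
    mu = mean(counts)
    if mu == 0:
        return math.nan  # not estimable for an all-zero gene
    var = variance(counts)  # sample variance, n - 1 denominator
    return max(0.0, (var - mu) / mu ** 2)


# Illustrative replicate counts for one gene in one condition (made-up values).
replicates = [2, 8, 30, 0, 15]
print(variance(replicates) / mean(replicates), moment_dispersion(replicates))
```

A variance-to-mean ratio well above 1 is the over-dispersion a Poisson model would ignore. A Poisson model would then understate the variance and produce overly small p-values.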
Implementation: Many statistical environments, such as R, provide built-in functions or packages for these steps. For instance, in R you might use:
- DESeq2 or edgeR for the individual NB-GLM analyses.
- The metap package to combine p-values, e.g., its sumlog function for Fisher's method and sumz for Stouffer's method.
- p.adjust(method = "BH") for the multiple-testing correction.
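Whichever combination method you use, each gene ends up with one combined p-value. With thousands of genes, these still need a multiple-testing correction. A minimal Benjamini-Hochberg sketch in Python follows. It is intended to match the adjustment R's p.adjust performs with method = "BH". The function name `benjamini_hochberg` is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (FDR control), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so adjusted values stay monotone in rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Illustrative combined p-values for four genes (made-up values).
combined = [0.01, 0.04, 0.03, 0.20]
print(benjamini_hochberg(combined))
```

Genes whose adjusted value falls below your chosen false discovery rate (for example 0.05) would be reported as significant in the meta-analysis.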
Advice from a statistician: Given the complexity of the analysis and your deadline, it may be prudent to consult a statistician or bioinformatician experienced in RNA-seq data analysis. They can offer guidance tailored to your data and research questions.