p-value combination methods
1
0
12 weeks ago

Hi everyone, I have a question about p-value combination methods. I have to use a negative binomial generalized linear model (NB-GLM) for my RNA-seq data, and I have 100 different datasets. I have run DE analysis on each of them, and now I want to combine the p-values across those datasets. My question is: should I apply the NB-GLM to each dataset one by one, or should I combine the p-values of the 100 datasets first and then apply the NB-GLM? If the second scenario is the right one, how can I do that? Any help would be life-saving, because my deadline is the end of this week and I still have no idea how to approach it.

Thanks.

rnaseq NB-GLM p-value meta-Analysis • 748 views
1
12 weeks ago
sure ▴ 100

Combining p-values from different datasets is a common problem in meta-analysis, particularly in fields like genomics where high-dimensional data are routine. Here is a summary of how to approach it for your RNA-seq data with a negative binomial generalized linear model (NB-GLM):

First, individual analysis: Run the NB-GLM on each of your 100 datasets separately. Every dataset may have its own characteristics (library sizes, dispersion, batch structure), so each should be modelled on its own to obtain reliable p-values. This is the differential expression (DE) step, which identifies the genes that are significantly differentially expressed in each dataset.

Combining p-values: Once the individual analyses are complete, the p-values are combined gene by gene across datasets. This needs some care: the standard methods assume the p-values are independent, and a p-value alone says nothing about the direction of the effect. Several statistical techniques are available, such as:

1. Fisher's combined probability test: sums the natural logarithms of the p-values, multiplies the sum by -2, and compares the result to a chi-square distribution with 2k degrees of freedom, where k is the number of p-values.
2. Stouffer's method: converts each p-value to a Z-score, sums the Z-scores, divides by the square root of k, and converts the combined Z-score back to a p-value.
3. Meta-analysis methods: fixed-effects or random-effects meta-analysis, which combines effect sizes (for example, log fold changes) rather than p-values, may also be considered.

The approach you choose will depend on the hypothesis you want to test and on the assumptions you are willing to make about the homogeneity of your datasets.
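The first two methods in the list above can be sketched in a few lines. This is a minimal illustration in plain Python (standard library only), not the implementation used by any particular package; in R, the metap package provides sumlog (Fisher) and sumz (Stouffer). The function names and example p-values below are hypothetical, and the sketch assumes independent p-values strictly between 0 and 1, with Stouffer's method applied to one-sided p-values and equal weights.

```python
import math
from statistics import NormalDist


def fisher_combine(pvalues):
    """Fisher's method: X = -2 * sum(ln p) is chi-square with 2k df under H0."""
    k = len(pvalues)
    x = -2.0 * sum(math.log(p) for p in pvalues)
    # For even degrees of freedom (2k) the chi-square survival function
    # has a closed form: P(X > x) = exp(-x/2) * sum_{i<k} (x/2)^i / i!
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total


def stouffer_combine(pvalues):
    """Stouffer's method (unweighted): Z = sum(z_i) / sqrt(k)."""
    nd = NormalDist()
    # Convert each one-sided p-value to a Z-score, then pool.
    z = sum(nd.inv_cdf(1.0 - p) for p in pvalues) / math.sqrt(len(pvalues))
    # Convert the pooled Z-score back to a p-value.
    return 1.0 - nd.cdf(z)


# Hypothetical p-values for one gene across three datasets.
example = [0.01, 0.20, 0.03]
print("Fisher:  ", fisher_combine(example))
print("Stouffer:", stouffer_combine(example))
```

In practice this would be applied once per gene, using that gene's p-value from each dataset in which it was tested.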

Considerations for RNA-seq data: The negative binomial distribution is usually used because RNA-seq read counts are over-dispersed: their variance across replicates exceeds their mean, which a Poisson model cannot accommodate. The dispersion parameter of the negative binomial handles this. RNA-seq data also frequently contain many zeros (genes that are not expressed in many samples). Before combining p-values, make sure the method that produced them can deal with these features of RNA-seq data.
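To make over-dispersion concrete, here is a small sketch in plain Python. The counts are made up for illustration, and the dispersion estimate is a crude method-of-moments calculation, not what edgeR or DESeq2 do internally (they use more elaborate estimators that share information across genes).

```python
from statistics import mean, variance

# Hypothetical read counts for one gene across six replicate samples.
counts = [12, 30, 7, 45, 19, 3]

m = mean(counts)
v = variance(counts)  # sample variance (n - 1 denominator)

# Under a Poisson model the variance would equal the mean.
# The negative binomial allows var = mu + phi * mu^2, with dispersion phi >= 0.
phi = max(0.0, (v - m) / m**2)  # crude method-of-moments estimate

print(f"mean={m:.2f} variance={v:.2f} dispersion={phi:.3f}")
```

A variance far above the mean, as here, is the pattern that motivates the negative binomial over the Poisson.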

Implementation: These steps can be carried out with existing packages in statistical languages such as R. For instance, in R you might use DESeq2 or edgeR for the individual NB-GLM analyses and the metap package (for example, its sumlog and sumz functions) to combine the p-values.

Advice from a statistician: Given the complexity of the analysis and what is at stake (your deadline), it could be prudent to seek advice from a statistician or bioinformatician experienced in RNA-seq data analysis. They can offer more specific guidance that takes the particulars of your data and research questions into account.

5

Was this response written by an LLM? It's a useful response, but it doesn't really answer the initial question...

In response to the OP's question: you should apply the model to each dataset one by one, then aggregate the p-values. You can't apply your NB model after aggregation because the NB model is what produces the individual p-values. It makes no sense to feed an NB model an aggregated p-value, because an NB model doesn't work on p-values; it works on raw counts from a sequencing experiment and gives you p-values.

2

singh.vijender, please don't delete posts that have additional content associated with them.

Getting assistance from ChatGPT is OK, as long as the poster clearly states it and adds some personal contribution to the information.

For example, here you could state that you asked ChatGPT, qualify the answer in some way to say whether it was good or bad, and expand on it.

0

Thank you so much for your time and explanation. Now I have another question. It may be very silly, but this is my first time doing this analysis. Which parameters should I use for the NB-GLM? These are the column names of my data:

colnames(df1): "gene_id" "logFC" "logCPM" "PValue" "FDR" "SRR20814361.bam" "SRR20814363.bam" "SRR20814354.bam" "SRR20814352.bam" "gene_type" "gene_name" "start" "end" "width" "strand"

Which of these should I use? I also found the MASS package in R. Is it suitable for this analysis? I looked online but could not figure it out.

Thanks a lot.

1

The columns you have here are the output of an NB-GLM (most likely edgeR?), not the input to one.

0

It's the output of edgeR. I'm asking about the input to the NB-GLM. What should the input be?

0

edgeR is an NB-GLM.

The input to NB-GLMs, including edgeR and DESeq, is the read counts from each sample.
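Applied to the column names posted above, the split between input and output might look like the sketch below. It is plain Python used only to illustrate the idea, and it assumes (this is not confirmed in the thread) that the per-sample ".bam" columns hold the raw read counts that were given to edgeR.

```python
# Column names copied from the data frame shown earlier in the thread.
columns = [
    "gene_id", "logFC", "logCPM", "PValue", "FDR",
    "SRR20814361.bam", "SRR20814363.bam", "SRR20814354.bam", "SRR20814352.bam",
    "gene_type", "gene_name", "start", "end", "width", "strand",
]

# Assumption: the per-sample ".bam" columns hold raw read counts.
count_columns = [c for c in columns if c.endswith(".bam")]

# Columns that edgeR computed from those counts (outputs, not inputs).
result_columns = ["logFC", "logCPM", "PValue", "FDR"]

print("NB-GLM input (counts):  ", count_columns)
print("NB-GLM output (results):", result_columns)
```

Under that assumption, the PValue column is what would be carried forward into the p-value combination step, one value per gene per dataset.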