Statistical significance in RNA-seq data analysis
7.9 years ago
hiteoh • 0

Hi everybody. This is my first time handling NGS data, and I have some confusion that I am hoping someone can clear up. I was given a set of differential expression data. To my understanding, the first step in identifying DE transcripts is normally to select those with a q-value smaller than a predetermined cut-off. However, if my objective is to collect all statistically confident expression data points, including those that are not significantly differentially expressed, how should I do that? When I looked at a volcano plot of the differential expression data, it seemed that the q-value does not hold much meaning as the differential expression value gets nearer to 0 (the q-value generally increases as the differential expression value shrinks toward 0). Expanding on that question, how do I decide the minimum DE cut-off that can safely be considered statistically confident by referring to its q-value? Thank you very much.

RNA-Seq • 2.9k views

I am not quite sure what you are asking here. Indeed using the adjusted p-values or q-values is the way to go, but you state:

However, if my objective is to collect all statistically confident expression data points, including those that are not significantly differentially expressed, how should I do that?

That is a contradiction in itself, quite apart from the fact that "statistically confident" is not a defined term, to my knowledge. If the only interpretation of having statistical confidence is that the data are significant, this can't be done.

Hope this helps to sort out the confusion.
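For readers unsure what an adjusted p-value is: one common way to obtain it is the Benjamini-Hochberg procedure. The thread does not say which method produced the q-values in question, so the sketch below is only an illustration of the idea, with made-up p-values and a hypothetical function name.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down to the smallest so that the
    # adjusted values never decrease with rank (monotonicity step).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up raw p-values for four hypothetical genes.
raw = [0.01, 0.04, 0.03, 0.20]
adj = benjamini_hochberg(raw)
for p, q in zip(raw, adj):
    print(f"raw p = {p:.3f}  adjusted p = {q:.3f}")
```

In this toy input, the raw p-values 0.03 and 0.04 fall below 0.05, but their adjusted values do not, which is why the cut-off is applied to the adjusted values.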


With "differential expression value getting nearer to 0" do you mean the (log) fold change? It's common to have both a cut-off for the adjusted p-value and a minimal abs(LFC); perhaps this gives rise to your confusion. But a small LFC can also be significant, depending on the number of samples and your data.
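A minimal sketch of the two kinds of cut-off, assuming a results table with one log2 fold change and one adjusted p-value per gene. The gene names, values, and cut-offs are invented for illustration and do not come from the data discussed in the thread.

```python
# Hypothetical DE results: gene name, log2 fold change, adjusted p-value.
results = [
    {"gene": "geneA", "log2fc": 3.1, "padj": 0.001},
    {"gene": "geneB", "log2fc": -0.4, "padj": 0.003},  # small LFC, still significant
    {"gene": "geneC", "log2fc": 2.5, "padj": 0.40},    # large LFC, not significant
    {"gene": "geneD", "log2fc": 0.1, "padj": 0.90},
]

PADJ_CUTOFF = 0.05  # assumed significance cut-off
LFC_CUTOFF = 1.0    # assumed minimal absolute log2 fold change


def is_significant(row, padj_cutoff=PADJ_CUTOFF):
    """Significance is decided by the adjusted p-value alone."""
    return row["padj"] < padj_cutoff


def passes_both(row, padj_cutoff=PADJ_CUTOFF, lfc_cutoff=LFC_CUTOFF):
    """Significant and with an effect size of at least lfc_cutoff."""
    return is_significant(row, padj_cutoff) and abs(row["log2fc"]) >= lfc_cutoff


significant = [r["gene"] for r in results if is_significant(r)]
significant_and_large = [r["gene"] for r in results if passes_both(r)]

print("significant:", significant)
print("significant and |LFC| >= cut-off:", significant_and_large)
```

The LFC cut-off is a separate, effect-size filter layered on top of the test: geneB is significant despite its small fold change, while geneC has a large fold change but is not significant.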


Apologies for causing some confusion here. Maybe I should state my question more clearly. As an example, let's say I have RNA-seq differential expression data from an experiment comparing a drug-treated sample with a control sample, and my objectives are: first, to identify genes (hence pathways) that are differentially regulated; and second, to identify genes (hence pathways) that are non-differentially regulated. To identify these two obviously very different gene sets, what criteria should I consider? For the first objective, a pre-defined fold-change and q-value cut-off (e.g. L2FC = 2, q-value = 0.05) should do. What about the second objective? Thank you very much.


A gene is either significantly differentially expressed (with an adj. P value below a certain cut-off) or it isn't. I have no idea what exactly you mean by "non-differentially regulated". Do you mean genes that stay the same in both conditions?


Yes. Besides identifying which genes were affected and significantly regulated under the treatment, I would also be interested to know which other genes were not affected. To do so, should I just take all the genes with L2FC lower than 2, or are there other criteria that need to be considered? Thank you very much.


To clarify, the test cannot tell you which genes are not regulated or 'affected'; it can only tell you about the significant genes. All the others could still be differentially regulated, and you are just not able to see it because of, e.g., a small effect size or high variance. In consequence, it is not valid to conclude anything about non-significant genes based on the test.
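One way to see why non-significance is not evidence of "no change" is to look at how wide the uncertainty around each estimate is. The sketch below assumes each gene comes with a log2 fold change and a standard error, and uses a normal approximation for the interval. The two genes and their numbers are invented, and this is an illustration of the point above rather than a method proposed in the thread.

```python
from statistics import NormalDist


def lfc_interval(log2fc, se, level=0.95):
    """Approximate confidence interval for a log2 fold change.

    Assumes the estimate is roughly normal with standard error `se`.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return (log2fc - z * se, log2fc + z * se)


# Two hypothetical genes, both non-significant (interval includes 0).
genes = {
    "tight": {"log2fc": 0.05, "se": 0.10},  # low variance, estimate near 0
    "noisy": {"log2fc": 0.80, "se": 1.20},  # high variance, estimate uncertain
}

intervals = {}
for name, g in genes.items():
    low, high = lfc_interval(g["log2fc"], g["se"])
    intervals[name] = (low, high)
    print(f"{name}: log2FC = {g['log2fc']:.2f}, "
          f"95% interval = ({low:.2f}, {high:.2f})")
```

Both intervals contain 0, so neither gene would be called significant, yet the interval for the "noisy" gene is also compatible with a log2 fold change above 2. Lumping both genes together as "not affected" would therefore be misleading.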


Thank you very much for the advice.
