Question: Confidence Interval in R
2.3 years ago, vinayjrao wrote:


I am working with RNA-seq data and studying whether the expression of a set of genes is subtype-specific. I have plotted the expression of these genes across the different subtypes, and to check whether the differences are significant, I want to run a t-test on the samples wherever I see a difference between two molecular subtypes.

I performed the t-test with the t.test() function in R and I have two questions regarding the output -

First, at times it gives me an exact p-value, for example 3.921e-14, but at other times it just says p-value < 2.2e-16. Is that the smallest p-value R displays, or can I get the exact value?

Secondly, the confidence level is 0.95 by default, which I changed to 0.99, 0.999 and so on. Yet the p-value never changes. To confirm this, I also tried confidence levels of 0.5 and 0.1.

Any help or advice on the two points would be greatly appreciated.


modified 2.3 years ago by Jean-Karim Heriche • written 2.3 years ago by vinayjrao

Look up "edgeR" or "DESeq2" for differential gene expression testing on RNA-seq data. A t-test is not appropriate here because of the way RNA-seq data are distributed (you also probably need to normalize for sequencing depth between samples).

You are correct that the p-value of a test is not a function of the confidence level you choose.

Google "2.2e-16".

written 2.3 years ago by Michael Kosicki

Thanks for the advice on edgeR and DESeq2. I will certainly look into them, but why is a t-test not appropriate in this case?

written 2.3 years ago by vinayjrao

Look here:

In short, every test works under certain assumptions, and breaking those assumptions breaks the test. RNA-seq data violate the t-test's assumption that the data are drawn from a normal distribution. In practice, you would fail to detect true differences and might "detect" false ones too.

Imagine a data set with a clump of points in one corner and an outlier in the other. A t-test, assuming the data are normally distributed, will place the mean somewhere between the outlier and the clump, completely misrepresenting the true distribution (which is likely centered on the clump). The t-test is really comparing this estimated distribution to another estimated distribution, so if the estimate is faulty, the result of the test will be too.

Furthermore, both edgeR and DESeq2 have good methods for normalizing for sequencing depth between samples (FPKM, i.e. dividing by the total number of reads and the gene length, is not good enough).
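To make the clump-and-outlier picture concrete, here is a small sketch in base R. The numbers are invented for illustration and are not RNA-seq data: two groups sit in clearly separated clumps, and one extreme value is added to the first group.

```r
# Hypothetical toy values: group x clumps around 10, group y around 20.
x_clean <- c(10, 11, 9, 10, 12, 10, 11, 9, 10)
y       <- c(20, 21, 19, 20, 22, 20, 21, 19, 20, 21)

# Add a single outlier to x.
x <- c(x_clean, 500)

# The mean is dragged far away from the clump; the median is not.
mean_x   <- mean(x)     # 59.2
median_x <- median(x)   # 10

# Welch t-test with and without the outlier.
p_outlier <- t.test(x, y)$p.value        # the inflated variance hides the difference
p_clean   <- t.test(x_clean, y)$p.value  # the difference is obvious without the outlier

print(c(mean = mean_x, median = median_x))
print(c(with_outlier = p_outlier, without_outlier = p_clean))
```

With the outlier included, the estimated mean and variance of x no longer describe the clump, so the test misses a difference that is plain to see in the raw values.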
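As a rough illustration of why scaling by the total number of reads can mislead, the sketch below computes median-of-ratios size factors in base R, which is the idea behind DESeq2's depth normalization. It is a simplified sketch, not the package's code, and the count matrix is made up: sample2 and sample3 were sequenced 2x and 4x as deeply as sample1, and gene4 is also strongly up-regulated in sample3.

```r
# Hypothetical counts (genes in rows, samples in columns).
counts <- matrix(c( 10,  20,  40,
                   100, 200, 400,
                    50, 100, 200,
                     5,  10, 500),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("gene", 1:4), paste0("sample", 1:3)))

# Scaling by total reads: the single highly expressed gene inflates sample3.
total_ratio <- colSums(counts) / colSums(counts)[1]

# Median-of-ratios: compare each sample to a per-gene geometric mean,
# then take the median ratio across genes (computed on the log scale).
log_geo_means <- rowMeans(log(counts))
log_ratios    <- log(counts) - log_geo_means   # subtracts the gene's value from each row
size_factors  <- apply(log_ratios, 2, function(r) exp(median(r)))

print(round(total_ratio, 2))
print(size_factors)
```

Here the total-read ratio for sample3 comes out near 6.9 instead of the true depth ratio of 4, while the size factors (0.5, 1, 2) preserve the 1:2:4 relationship because the median ignores the one gene that genuinely changed.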

written 2.3 years ago by Michael Kosicki

2.3 years ago, Jean-Karim Heriche (EMBL Heidelberg, Germany) wrote:

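On the first point, 2.2e-16 is the machine precision (the machine epsilon for double-precision numbers) on most computers, and R prints any smaller p-value as "< 2.2e-16". In your R terminal, try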

> .Machine$double.eps
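The "< 2.2e-16" only affects how the result is printed; the object returned by t.test() still holds the computed p-value. A minimal sketch with simulated data (the vectors x and y are invented for illustration):

```r
set.seed(1)
# Two simulated groups with a very large difference in means.
x <- rnorm(50, mean = 0)
y <- rnorm(50, mean = 5)

res <- t.test(x, y)

# The printed summary caps the p-value at the machine epsilon ...
shown <- format.pval(res$p.value)
print(shown)

# ... but the stored value is an ordinary double and can be read directly.
print(res$p.value)
print(res$p.value < .Machine$double.eps)
```

Values this small are usually reported as "< 2.2e-16" anyway, since digits that far out carry little practical meaning.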

On the second point, I think you may be confused about what p-values and confidence intervals are. The p-value is a probability that assesses the evidence against the null hypothesis, i.e. it is the probability of getting the observed value of the test statistic, or a more extreme one, if the null hypothesis is true. The confidence interval is a range of values that contains the true value of the parameter of interest with some level of confidence. Selecting a confidence level of 0.95 means that, if you repeated the experiment many times, about 95% of the intervals computed this way would contain the true value. Changing the confidence level therefore changes the width of the interval, not the p-value.
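A short sketch of this with simulated data (x and y are invented for illustration): the conf.level argument of t.test() changes the reported interval, while the p-value stays the same.

```r
set.seed(42)
# Two simulated groups with a modest difference in means.
x <- rnorm(20, mean = 10, sd = 2)
y <- rnorm(20, mean = 12, sd = 2)

res95 <- t.test(x, y, conf.level = 0.95)
res99 <- t.test(x, y, conf.level = 0.99)

# Same data, same test, same p-value.
same_p <- identical(res95$p.value, res99$p.value)

# A higher confidence level gives a wider interval.
width95 <- as.numeric(diff(res95$conf.int))
width99 <- as.numeric(diff(res99$conf.int))

# The two outputs are linked: p < 0.05 exactly when the 95% interval
# for the difference in means excludes 0.
excludes_zero <- res95$conf.int[1] > 0 || res95$conf.int[2] < 0

print(same_p)
print(c(width95 = width95, width99 = width99))
print(c(p_below_0.05 = res95$p.value < 0.05, ci95_excludes_0 = excludes_zero))
```

So the significance threshold is something you apply to the p-value yourself; raising conf.level only tells you whether the correspondingly wider interval still excludes zero.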

written 2.3 years ago by Jean-Karim Heriche

Thanks for the summary. Just to clarify I understand well, you are saying that my output may read non-significant at other confidence intervals if I increase the confidence to a higher stringency without changing the p-value?

written 2.3 years ago by vinayjrao

Maybe this blog post can help you.

written 2.3 years ago by Jean-Karim Heriche