Question: discrepancy between t-test results in Excel and R
1
gravatar for nazaninhoseinkhan
4.2 years ago by
Iran, Islamic Republic Of
nazaninhoseinkhan360 wrote:

Hi

I am trying to find differentially expressed genes in microarray data. I preferred to do this using excel because I did not know how to do it in R (performing large number of t-tests simultaneusly). After performing t-test in excel and obtaining p-values, I try t-test for a few number of rows in R, but the results of R and Excel are not the same.

Does anyone any idea about it?

Regards

Nazanin

R • 6.0k views
ADD COMMENTlink modified 4.2 years ago by biogirl160 • written 4.2 years ago by nazaninhoseinkhan360
9
gravatar for Michael Dondrup
4.2 years ago by
Bergen, Norway
Michael Dondrup46k wrote:

Well, Excel is wrong obviously!

Sincerely, now:

There is not a single  (student's) 't-test' but several variants:

  • single sample,
  • two-sample with equal or unequal variance
  • two-sample with paired values
  • alternative: greater?, two-sided?, less than?

Which one did you use? the tests you used to compare were not the same.

Further side notes:

  • Excel is not a Bioinformatics application
  • Do not use t-test for differential expression analysis
  •  ....
ADD COMMENTlink modified 4.2 years ago • written 4.2 years ago by Michael Dondrup46k
1

Other answers contain references for my first sentence. It sounds like a joke, but it is true most of the time, I should explain that maybe a bit better. Why to trust R over Excel?

  • R is open source, Excel is closed source, you can easily inspect the code for the t.test in R if you have doubts, while in Excel you cannot
  • R uses open-source libraries, these provide algorithms with very well-understood numerical properties, Excel uses whatever 
  • R's build process contains automated tests, we do not know which test model Excel developers use
  • R's main field of application is statistics, and possibly t.test is one of the most used functions, while Excel is used for all kinds of things and t-test and statistics is a niche application.
ADD REPLYlink modified 4.2 years ago • written 4.2 years ago by Michael Dondrup46k
4
gravatar for linus
4.2 years ago by
linus330
Germany
linus330 wrote:

Did you used the same input in both tests?

If yes did you check the parameters each t-test uses?  In R and  Excel the standard parameters may differ, which causes different results. 

In R some parameters you can set are for example:

alternative

a character string specifying the alternative hypothesis, must be one of "two.sided" (default), "greater" or "less". You can specify just the initial letter.

 

 

paired

a logical indicating whether you want a paired t-test.

var.equal

a logical variable indicating whether to treat the two variances as being equal. If TRUE then the pooled variance is used to estimate the variance otherwise the Welch (or Satterthwaite) approximation to the degrees of freedom is used.

ADD COMMENTlink modified 4.2 years ago • written 4.2 years ago by linus330

Thanks a lot for your comment,

when I added var.equal=TRUE in R in t.test function, the results became the same as excel.

Regards

ADD REPLYlink written 4.2 years ago by nazaninhoseinkhan360
2
gravatar for Chris S.
4.2 years ago by
Chris S.290
United States
Chris S.290 wrote:

You should also read a few papers that discuss problems with t-tests and other statistics in Excel, for example Heiser 2006. Statistical tests, tests of significance, and tests of a hypothesis using Excel. Journal of Applied Statistical Methods, 5:155–171  or McCullough and Heiser 2008. On the accuracy of statistical procedures in Microsoft Excel 2007. Computational Statistics & Data Analysis 52:4570–4578.  Here's a snippet from the second paper listing a few problems...

"Inaccurate  t-test results in the presence of missing values. Does not correctly calculate t test values when there are missing data cells in the range. Microsoft identified this fault (KBA 829252) in previous Excel versions, but did not fix it.

Inaccurate  p-values from a  t-test. Excel uses the Welch test method, which calculates a non-integer degrees-of-freedom value. The Excel algorithm uses the TTEST function, which truncates the degrees-of-freedom input to an integer. This gives an incorrect p value for the test."

 

 

ADD COMMENTlink written 4.2 years ago by Chris S.290
0
gravatar for biogirl
4.2 years ago by
biogirl160
European Union
biogirl160 wrote:

You say that you tested a few rows in R, suggesting that the input data for the t-test in R is not the same as the input for the t-test in Excel?

Another source of difference is that R will handle floating point numbers differently, which will influence the outcome of your statistic.  This is not a negative against R, however; R is better.

ADD COMMENTlink written 4.2 years ago by biogirl160

this is an interesting point actually - though personally I think the original poster simply is not using the test the same way.

back to the issue - internally Excel seems to use a much larger precision library to compute the probability distributions, and as consequence it will compute very tiny p-values well beyond what actually makes sense. This is reserved strictly to p-values for all other computation the representation double precision.

This  can be extremely annoying when communicating with some life scientists as they hate the idea of seeing 0 for p-value and much prefer something like 10^(-100)  as that looks more scientific. Educating someone on the spot in the intricacies of floating point representation is hopeless. 

Anyway just something popped into my mind - that feeling of having to do something in Excel because it is "more accurate" - thankfully I can mostly refuse doing that 

ADD REPLYlink modified 4.2 years ago • written 4.2 years ago by Istvan Albert ♦♦ 80k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 986 users visited in the last hour