Question

How to calculate a p-value when one sample has a single value and the other sample has many values


9.7 years ago

zero_hsy ▴ 110

Hello, I want to calculate a p-value. The two samples look like this. One: x1. The other: y1, y2, y3, y4, y5, y6, y7.

Since a mean cannot be calculated for the x sample, I assume I cannot use a t-test to get the p-value. How can I get a p-value in this situation?

p-value t-test • 8.6k views

Updated 2.4 years ago by Ram • written 9.7 years ago by zero_hsy


If you have access to Nature Methods, you may find the column "Points of significance" instructive, and this explanation of t-tests in particular:

http://www.nature.com/nmeth/journal/v10/n11/full/nmeth.2698.html

9.7 years ago by David Fredman

Ram · Answer 1 · 2014-08-01

Just a thought...

If you can assume your y-values come from a known distribution, e.g. normal, you could estimate the parameters of that distribution from the given values. Then check how likely your x-value is under that fitted distribution. In R it could be done with something like this:

Generate some example values:

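set.seed(1234)                        # make the example reproducible
y <- rnorm(n = 10, mean = 0, sd = 1)  # ten draws standing in for the y sample
x <- 3                                # the single observation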

Estimate the parameters of the normal distribution for the y-values:

install.packages('fitdistrplus')  # only needed once
library(fitdistrplus)

# Fit a normal distribution to y by maximum likelihood
fit <- fitdist(y, "norm")

fit
Fitting of the distribution ' norm ' by maximum likelihood
Parameters:
       estimate Std. Error
mean -0.3831574  0.2987363
sd    0.9446870  0.2112374

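Now map x to the fitted distribution. Note that dnorm() returns the density of the distribution at x, not a probability; a tail probability would need pnorm() instead:

dnorm(x = x, mean = fit$estimate[['mean']], sd = fit$estimate[['sd']])
[1] 0.0006928469 ## <- Density of the fitted distr at x=3 (not a probability)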
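
A density value is not a probability, so a tail area is closer to what a p-value expresses. The following is a minimal sketch in base R, assuming the same simulated y and x as in the example above and assuming normality holds; the names mu_hat, sd_hat, p_upper and p_two_sided are illustrative and not part of the original answer.

```r
set.seed(1234)
y <- rnorm(n = 10, mean = 0, sd = 1)
x <- 3

# Maximum-likelihood estimates for a normal distribution
# (the ML standard deviation divides by n, not n - 1)
mu_hat <- mean(y)
sd_hat <- sqrt(mean((y - mu_hat)^2))

# Upper-tail probability: chance of drawing a value >= x from the fitted normal
p_upper <- pnorm(x, mean = mu_hat, sd = sd_hat, lower.tail = FALSE)

# Two-sided version: chance of a value at least this far from the mean, in either direction
p_two_sided <- 2 * pnorm(abs(x - mu_hat), mean = 0, sd = sd_hat, lower.tail = FALSE)

print(c(upper = p_upper, two_sided = p_two_sided))
```

With only a handful of y values the fitted parameters are themselves uncertain, so these tail areas should be read as rough indications rather than exact p-values.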

Ram · Answer 2 · 2014-08-01

With questions like this it is useful to use mostly common sense and not just try to find tests or algorithms that would give you some numeric value but actually cannot deal with the problem at hand. I think in this case you are simply asking the wrong question. You won't be able to calculate a p-value at all, since you simply do not have enough data for the first sample.

People will likely point out that you should simply do more measurements, and although statisticians tend to always tell you that, in this case they would just be right.

Having said that, what you really want to know is whether that single measurement on your first sample might mean that it is different from the other one, right?

What you could do is look at the distribution of the measurements for the second sample and check whether it is approximately normal. If it is, you could check where the single value from the first sample falls in that distribution. You would then get an estimate of how likely it would have been, a priori, for a single measurement to fall as far from the average as yours did. If that likelihood is very low, you have an indication that your first sample might be different from the second. (I just saw that @dariober already described how to do that in R, which is great of course, as long as you also understand what you are doing.)
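
The two steps described here (check normality, then locate the single value) could be sketched as follows. The data are hypothetical, and the choice of the Shapiro-Wilk test via shapiro.test() is an assumption, since the answer does not name a specific normality check.

```r
# Hypothetical data, for illustration only: seven y values and one x value
y <- c(4.8, 5.1, 5.6, 4.9, 5.3, 5.0, 5.4)
x <- 6.9

# Step 1: Shapiro-Wilk normality test on y.
# A small p-value suggests y departs from a normal distribution;
# with only seven values the test has little power either way.
normality <- shapiro.test(y)

# Step 2: how many sample standard deviations is x from the mean of y?
z <- (x - mean(y)) / sd(y)

# Probability of a single draw landing at least this far from the mean,
# in either direction, if y really were normal with these parameters
p_far <- 2 * pnorm(abs(z), lower.tail = FALSE)

print(normality$p.value)
print(c(z = z, p_far = p_far))
```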

Note that even if you do just that you really need to be very careful. This is how they scare a lot of parents. They do some kind of measurement on a young child, like weight gain over time. Next they see that 10% of the children have a measurement outside of the 90% range for the normal distribution. Of course that is exactly what the normal distribution predicts. Yet they scare a lot of those parents telling them their child is "not normal".
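
The point about the 90% range can be checked with a short simulation. This is a sketch on simulated standard-normal values, not real growth data; the seed and sample size are arbitrary choices.

```r
set.seed(1)

# Simulate many "children" whose measurement follows a standard normal distribution
n <- 100000
values <- rnorm(n)

# Bounds of the central 90% range of that distribution
bounds <- qnorm(c(0.05, 0.95))

# Fraction of perfectly ordinary draws that fall outside the 90% range
frac_outside <- mean(values < bounds[1] | values > bounds[2])
print(frac_outside)
```

The fraction comes out close to 0.10 by construction, which is why a value outside the 90% range is, on its own, weak evidence that anything is unusual.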

Ram · Answer 3 · 2014-08-01

What are these values, and what is your hypothesis?

Are the values discrete or continuous? Counts or frequencies?

Are the two samples paired? (e.g. x is a single value after treatment to samples matched to those in y, but with poor survival)

Do you know the expected distribution of the values (e.g. can you assume a normal distribution)?

If the difference between x and y is great, you may just state the values and compare the samples (rather than do hypothesis testing for the underlying populations).

Alternatively, if you are testing whether the value x could come from the distribution of y, you might employ the t-statistic. Calculate the mean, standard deviation and standard error of y using all values. The t-statistic for the difference between your observation x and the mean of y is:

t-statistic = (valueX - meanY) / standard_error_Y

Use the Excel TDIST function with the arguments (absolute value of the t-statistic, DF, 1 for one tail) to get your p-value, where DF is the number of y values minus 1.

Alternatively, use a single sample t-test in R: t.test(y,mu=x).
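
As a sketch of how the manual formula and t.test(y, mu = x) relate, the following computes both on hypothetical data (the values of y and x are made up for illustration). Note that this test treats x as a fixed reference value and asks whether the mean of y differs from it; t.test() reports the statistic with the opposite sign, (meanY - valueX) / standard_error_Y, which does not affect the two-sided p-value.

```r
# Hypothetical data, for illustration only
y <- c(4.8, 5.1, 5.6, 4.9, 5.3, 5.0, 5.4)
x <- 6.9

n <- length(y)
se_y <- sd(y) / sqrt(n)          # standard error of the mean of y
t_stat <- (x - mean(y)) / se_y   # t-statistic as written in the answer
df <- n - 1                      # degrees of freedom for a one-sample t-test

# One-tailed and two-tailed p-values from the t distribution
p_one_tail <- pt(abs(t_stat), df = df, lower.tail = FALSE)
p_two_tail <- 2 * p_one_tail

# The built-in one-sample t-test (two-sided by default)
builtin <- t.test(y, mu = x)

print(c(t_stat = t_stat, p_one_tail = p_one_tail, p_two_tail = p_two_tail))
print(builtin$p.value)
```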

For background and basics, see e.g.