Question: simple statistics question
odoluca (Turkey/Izmir/Izmir University of Economics) wrote, 24 months ago:

Hello, I am trying to do some statistics and feel stuck under piles of statistical tests.

Simply put, I have a population of values with unknown mean and standard deviation. Using method A, I pulled one value, say 35, and I want to test whether 35 lies within the upper 5% (0.05) tail of the population distribution.

Now, I cannot check every single value in the population, so I randomly pulled 30 values from it; this sample has a mean of 23 and a std. dev. of 12. How can I test whether method A yielded a value in the upper tail or not? Which statistical test should I use?


I am not sure I understand the problem. If you can draw a random sample from your population then you can estimate the distribution and its parameters. With your example, draw a large sample and find out the proportion of values that are greater than 35. You can also use the estimated mean and standard deviation if you can assume the population has a normal distribution.
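As a rough sketch of both suggestions, assuming the summary statistics quoted in the question (mean 23, std. dev. 12) and a normally distributed population: the variable names below are illustrative, and the vector `x` is simulated stand-in data, not the asker's actual sample. The normal calculation also treats the estimated mean and standard deviation as if they were exact, which is optimistic for a sample of 30.

```r
# Summary statistics reported in the question
sample_mean <- 23
sample_sd <- 12
value <- 35

# Upper-tail probability of the value under an ASSUMED normal population
z <- (value - sample_mean) / sample_sd
upper_tail_prob <- pnorm(z, lower.tail = FALSE)

# Estimated 95th percentile under the same normality assumption
q95 <- qnorm(0.95, mean = sample_mean, sd = sample_sd)

# Empirical alternative: proportion of sampled values above the value.
# x is hypothetical simulated data standing in for a real random sample.
set.seed(1)
x <- rnorm(30, mean = sample_mean, sd = sample_sd)
empirical_prop <- mean(x > value)

print(upper_tail_prob)  # about 0.159
print(q95)              # about 42.7
print(empirical_prop)
```

Under these assumptions, 35 sits one standard deviation above the mean, which is below the estimated 95th percentile, so it would not fall in the upper 5% tail.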

Reply written 24 months ago by Jean-Karim Heriche (20k)

I don't think Biostars is the best place to find an answer. Statistics-related questions are more likely to get an answer on Cross Validated.

Reply written 24 months ago by Fabio Marroni (2.3k)
Alex Reynolds (28k; Seattle, WA USA) wrote, 24 months ago:

Bootstrapping is a nonparametric approach that might help if you do not know, or cannot assume, that your data are normally distributed.

Draw a sample of size n and get its k-th percentile via quantile() or similar. Then draw another sample of size n from that sample, sampling with replacement, and get the k-th percentile again. Repeat this B times, B being the number of bootstrap samples.

After B samples, you have a set of B measurements that estimate the k-th percentile. Get the 2.5th and 97.5th percentiles of this set of measurements. This is the 95% confidence interval around your original k-th percentile measurement.

Example code in R:

bootstrap_sample <- function(x, n, B, p) {
  bstrap <- vector(mode = "numeric", length = B)
  for (i in seq_len(B)) {
    # resample n values with replacement and record the p-th quantile
    s <- sample(x, n, replace = TRUE)
    bstrap[i] <- quantile(s, p)
  }
  bstrap
}

# x = some vector of signal
# n = length of vector
# B = number of bootstrap samples
# kth = k-th percentile of interest
x <- unlist(read.table(...))
n <- length(x)
B <- 10000
kth <- 0.95

# bootstrap samples
x.95thPercentile_95pctCI_samples <- bootstrap_sample(x, n, B, kth)

# k-th percentile (k = 95%)
x.95thPercentile = quantile(x, kth)

# 95th percentile's 95% confidence interval
x.95thPercentile_95pctCI <- quantile(x.95thPercentile_95pctCI_samples, c(0.025, 0.975))

# distances from the point estimate to each end of the interval (asymmetric)
x.95thPercentile_95pctCI_lower_bound = x.95thPercentile - x.95thPercentile_95pctCI[1]
x.95thPercentile_95pctCI_upper_bound = x.95thPercentile_95pctCI[2] - x.95thPercentile
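
One way this interval might be used for the original question, sketched under assumptions: compare the value from method A (35) with the bootstrap confidence interval of the 95th percentile. Here `x` is simulated stand-in data rather than a real sample, and `value`, `boot_q95`, `ci` and `verdict` are illustrative names. With only 30 observations the bootstrap estimate of an extreme percentile is unstable, so treat the result as a rough guide.

```r
set.seed(42)

# Hypothetical stand-in for the 30 values drawn from the population
x <- rnorm(30, mean = 23, sd = 12)
value <- 35   # value obtained with method A
B <- 2000     # number of bootstrap samples

# Bootstrap distribution of the 95th percentile
boot_q95 <- replicate(
  B,
  quantile(sample(x, length(x), replace = TRUE), 0.95, names = FALSE)
)

# 95% confidence interval around the 95th percentile
ci <- quantile(boot_q95, c(0.025, 0.975), names = FALSE)

# Above the whole interval: likely in the upper 5% tail.
# Below the whole interval: likely not. Otherwise: cannot tell.
verdict <- if (value > ci[2]) {
  "above"
} else if (value < ci[1]) {
  "below"
} else {
  "inconclusive"
}

print(ci)
print(verdict)
```

If the verdict is "inconclusive", drawing a larger sample from the population is the most direct way to narrow the interval.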