Question

How To Calculate A Sample Size

12.3 years ago

Rnda ▴ 10

This is my first time doing this, so my approach is a little primitive, but I want to know the following.

We are conducting an experiment. We will pick a sample of normal individuals, have them swallow a single pill, and then perform a urine analysis to determine the concentration of a specific substance. We will then sequence a specific gene for a metabolizing enzyme responsible for the clearance of that pill and correlate the two results. We are not assuming any allele frequency or particular haplotype.

First: how can I calculate a sample size? Is a power analysis for sample size applicable here, given that I have no prior hypothesis to start from and no previous data to rely on?

Second: what is the appropriate software for handling the sequencing results for this purpose?

statistics • 3.8k views

updated 12.3 years ago by Neilfws 49k • written 12.3 years ago by Rnda ▴ 10

score 5 · Answer 1 · 2012-01-22

Absolutely, you should do a "power analysis for sample size" before running the experiment.

In order to do so, however, you do need a hypothesis. It sounds like you do have one: you are hypothesizing that there will be a difference in the amount of urinary metabolite excreted by individuals who carry certain sequence variants in your gene of interest compared with individuals who do not. The questions are how large a difference you expect between the groups, and what the standard deviation of your measurements is. Together these determine the sample size you need to detect the difference between the groups. If the expected difference is small but the standard deviation is large, you will need more subjects to show that the observed differences are not due to chance. If the expected difference is large and the standard deviation is small, you will need fewer subjects.
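To make the dependence on the expected difference and the standard deviation concrete, here is a minimal Python sketch of the usual normal-approximation formula for comparing two group means: n per group = 2 (z(1 - alpha/2) + z(power))^2 * sd^2 / delta^2. It assumes two equal-sized groups (for example carriers vs. non-carriers of a variant), a common standard deviation, and a two-sided test; the function name `n_per_group` is hypothetical. An exact t-test calculation gives a slightly larger n.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(delta: float, sd: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate number of subjects per group needed to detect a difference
    in means `delta` (same units as `sd`) with a two-sided test, assuming two
    equal groups with common standard deviation `sd` (normal approximation)."""
    if delta <= 0 or sd <= 0:
        raise ValueError("delta and sd must be positive")
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided critical value (about 1.96)
    z_beta = z.inv_cdf(power)  # quantile matching the desired power (about 0.84)
    n = 2 * ((z_alpha + z_beta) * sd / delta) ** 2
    return ceil(n)  # round up: you cannot recruit a fraction of a subject
```

For example, with alpha = 0.05 and 80% power, `n_per_group(1.0, 1.0)` returns 16 and `n_per_group(0.5, 1.0)` returns 63: halving the expected difference roughly quadruples the required sample size.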

This is where you have to hazard a guess, and find a middle ground where you think you will be likely to detect real differences. Do you have any pilot data to draw from? Similar metabolic effects from similar compounds on other genes?
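When the expected difference and standard deviation are only guesses, one way to check a candidate design is to simulate it. The sketch below draws hypothetical normally distributed concentrations for two equal groups under an assumed difference and SD, computes a two-sample t statistic, and counts how often it exceeds the critical value. To stay within the Python standard library it uses the normal critical value (about 1.96) instead of the t quantile, which slightly overstates power for small groups; `simulated_power` is a hypothetical name.

```python
import random
from statistics import NormalDist, fmean


def simulated_power(delta, sd, n, alpha=0.05, reps=2000, seed=1):
    """Estimate power by simulation for two groups of size n (n >= 2) whose
    true means differ by delta, with common standard deviation sd (assumed normal)."""
    rng = random.Random(seed)  # seeded so the estimate is reproducible
    # Normal approximation to the two-sided critical value of the t-test.
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    hits = 0
    for _ in range(reps):
        a = [rng.gauss(0.0, sd) for _ in range(n)]
        b = [rng.gauss(delta, sd) for _ in range(n)]
        ma, mb = fmean(a), fmean(b)
        # Pooled variance for equal group sizes, then the two-sample t statistic.
        ss = sum((x - ma) ** 2 for x in a) + sum((x - mb) ** 2 for x in b)
        pooled_var = ss / (2 * n - 2)
        t = (mb - ma) / (pooled_var * 2 / n) ** 0.5
        if abs(t) > crit:
            hits += 1
    return hits / reps
```

Trying a few values of `n` under both a pessimistic and an optimistic guess for `delta` shows how strongly the design depends on those guesses.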

You can read a little more about this here, and there are other sites too. R has an easy way to calculate sample size (for example, the power.t.test function in the base stats package), and any introductory R book that discusses power analysis for sample size will show you how.

score 4 · Answer 2 · 2012-01-23

My advice, before doing anything else is: (1) think very hard about what your final data are going to look like, (2) try to determine which statistical tests are appropriate to your data and (3) if unsure, seek professional statistical advice from colleagues. Judging by your comment above, you would benefit from (3).

For example, calculating an appropriate sample size from statistical power may or may not be appropriate to your situation. Do you have a null hypothesis, such as: there is no difference in mean urinary metabolite concentration between groups A and B (where groups A and B are defined by a simple metric)? If so, then determining the appropriate sample size for a t-test is useful; otherwise it is not.
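If the subjects can be split into two such groups (for example, carriers and non-carriers of one variant), the comparison of mean metabolite concentration could use Welch's t-test, which does not assume equal variances in the two groups. A minimal standard-library sketch of the statistic and its Welch-Satterthwaite degrees of freedom follows; `welch_t_test` is a hypothetical name. The p-value still has to be read from a t distribution with `df` degrees of freedom (`scipy.stats.ttest_ind` with `equal_var=False` performs the whole test).

```python
from math import sqrt
from statistics import mean, variance


def welch_t_test(a, b):
    """Return (t, df) for Welch's two-sample t-test of equal means.
    Each group needs at least two observations."""
    # Squared standard error of each group mean (sample variance / n).
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```

For two groups with equal variances and sizes, `df` comes out at 2n - 2, the same as the ordinary pooled t-test.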

It seems to me that you have a multivariate problem, to which simple analyses such as t-tests or chi-squared tests are not applicable. There are 30+ haplotypes (or phenotypes), which you hypothesize will have an effect on metabolite concentration. So your data are going to look something like this:

    v1    v2   v3    v4   ..  v30    C
n1  nv11  nv12 nv13  nv14 ..  nv130  n1C
n2  nv21  nv22 nv23  nv24 ..  nv230  n2C
n3  nv31  nv32 nv33  nv34 ..  nv330  n3C
n4  nv41  nv42 nv43  nv44 ..  nv430  n4C
..

Where v1, v2...v30 are the haplotypes/phenotypes; n1, n2... are the subjects (people); nv is the observation of variable v for subject n and in column C, the concentration measurement for each subject. If you were using R, you would create a data frame or matrix to represent the data as shown above. You would then explore the data in various ways, with the aim of understanding how variables v1..v30 contribute to the outcome, C.
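In Python the same layout could be held as one record per subject. Below is a minimal sketch of a first exploratory pass: for each haplotype column, compare mean concentration in carriers with non-carriers. It assumes each haplotype column is coded 0/1 for absence/presence, which the layout above does not specify. `carrier_contrast` and the toy rows are hypothetical and are not real data. These marginal contrasts ignore correlation between haplotypes, so they are a starting point for exploration, not a substitute for a multivariable model.

```python
from statistics import fmean


def carrier_contrast(rows, haplotypes, outcome="C"):
    """For each haplotype column, return mean outcome in carriers minus mean
    outcome in non-carriers; None if either group is empty."""
    result = {}
    for h in haplotypes:
        carriers = [r[outcome] for r in rows if r[h]]
        others = [r[outcome] for r in rows if not r[h]]
        if carriers and others:
            result[h] = fmean(carriers) - fmean(others)
        else:
            result[h] = None  # no contrast possible with an empty group
    return result


# Hypothetical toy rows (not real data): subjects n1..n4, haplotypes v1..v3,
# concentration C. A real table would have ~30 haplotype columns.
rows = [
    {"subject": "n1", "v1": 1, "v2": 0, "v3": 0, "C": 5.0},
    {"subject": "n2", "v1": 1, "v2": 1, "v3": 0, "C": 7.0},
    {"subject": "n3", "v1": 0, "v2": 1, "v3": 0, "C": 3.0},
    {"subject": "n4", "v1": 0, "v2": 0, "v3": 0, "C": 1.0},
]
contrasts = carrier_contrast(rows, ["v1", "v2", "v3"])
```

Here `v3` is carried by nobody, so its contrast is `None`. With ~30 haplotypes, many such columns will have few or no carriers, which is another reason the sample size question matters.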

If these things mean little to you, again I strongly suggest that you seek advice from your local friendly statistician and spend time thinking about the structure of your data and appropriate methods.