I am performing an experiment with bacteria that involves 2 different conditions and I am trying to see if there is a statistically significant difference between the two. Now I think there is a good amount of stochasticity in the experiment that can affect the results in a major way. So, I personally don't feel confident with just three replicates, since even one replicate with way-off outcomes (probably due to the stochasticity) can affect the conclusions a lot. So, I thought of doing 5 replicates for this study. I have not performed power analysis to come up with this number, but I just feel a bit more confident with n=5 than n=3 in this situation.

My question is, do I need to justify why I am working with 5 replicates rather than 3 (which seems to be the standard in biology)? Three replicates is the minimum number you need to calculate variance at all - I am not sure how one can be confident with just 3, especially when there is a lot of variability in the data. But still, just like 3 is an arbitrary number for replicates, would anyone object if it is 5?

PS - every replicate involves taking a sample from the same bacterial freezer stock, which was generated from a single colony of that bacterium. In other words, I think these are technical rather than biological replicates.

I would personally absolutely not object if you went for 5. It's something I try to convince people of at my own research institute: the more the better. Especially since you apparently expect to see quite some differences in your samples.

Yes, 3 seems to be the 'standard', indeed because of what you mention (it is seen as the bare minimum for calculating variance) and because of cost as well. "Why would I pay more if I don't need to?" is something you will often hear. Nowadays sequencing has become so cheap that we should begin to consider doing more than 3 reps for a standard experiment. The question here is also: do you have enough funds to afford 5 reps? (Unfortunately, that's often the deal-breaker.)

On your biological vs technical question: what you describe still sounds like biological reps to me. OK, they will not be very informative, but you still sample new biological material (perhaps at a different time? warm/cold? ...). A technical rep is something like splitting your sequencing library into 2 parts and sequencing both.

Thanks Lieven, it does seem now to me that these are biological replicates since I work with a different batch of bacteria in each replicate, even if they are all originally coming from the same colony.

Also, nice to know that you also feel that increasing the number of replicates is not something that needs formal justification and is just the right thing to do if possible. Fortunately, in my case the experiments are pretty cheap to perform, so I didn't have any problem with that.

I agree that more replicates is always better and I always welcome n=5 instead of n=3. However it is not true that n=3 is a technical minimum. It is perfectly possible to compute SDs from n=2 observations. The limma package for example is perfectly capable of undertaking a differential expression analysis even for n=1 in one group vs n=2 in another. That's not a recommendation, nor any encouragement to use such small samples, just a statement of what is mathematically possible!
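As a small illustration of that point, here is a sketch using Python's standard `statistics` module. The two measurements are made-up values, not data from any experiment; the only claim is that a sample SD is defined for n=2 and undefined for n=1.

```python
import math
import statistics

# Two hypothetical replicate measurements (made-up values for illustration).
two_reps = [1.0, 3.0]

# The sample SD uses an n - 1 denominator, so n = 2 leaves one degree of freedom.
sd_two = statistics.stdev(two_reps)
print(sd_two)

# With a single replicate there are zero degrees of freedom, so no SD estimate exists.
try:
    statistics.stdev([1.0])
    single_ok = True
except statistics.StatisticsError:
    single_ok = False
print(single_ok)
```

An estimate from one degree of freedom is of course very imprecise, which is why this is a statement of what is possible rather than a recommendation.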

Thanks for that, Gordon. I was thinking of ANOVA, but you are right: for variance calculation, it can technically be done with two samples too.

What sort of experiment is this? Is it one where a power analysis is not possible? In those where no power analysis is possible, people have often done empirical studies into replicate number.

For example, for RNA-seq, Schurch et al. find that 6 replicates give you 80% power to detect differences of at least 1 logFoldChange, and that you need 12 to have 80% power to detect any logFoldChange greater than 0, although this was in yeast.

In terms of whether these are technical or biological reps: the source of the variation you are measuring is not purely measurement error. The samples really will have different numbers of molecules in them, and the source of that difference is biological, so these are biological reps. However, that doesn't mean that the variation you are measuring is the variation that is informative for your biological question. Only you can tell if that is true, by examining the claim you wish to make and whether the level at which you are measuring variation is suitable to answer that question.

Thanks for that answer, Ian. The experiment I am doing hasn't been done in our lab (or elsewhere) before, so I didn't have mean/SD values to do a power analysis. Regarding what experiment this is, it's basically comparing conjugation (mating) frequencies in a particular environmental condition (manure) between a wild-type and a deletion mutant. We want to see if the conjugation frequencies are statistically different between the two strains.

In experiments like this, people in microbiology just go for n=3, since it obviously is minimally sufficient and easiest. Now I personally am not confident with that since I know that the data can vary a lot due to confounding factors (conjugation in a liquid medium is a very random process) so I want to go for n=5. I am just curious if there's anything objectionable in that, or something that I need to justify.
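When no pilot mean/SD values exist, one way to get a rough feel for n=3 versus n=5 is to simulate under assumed values. The sketch below is only that: it assumes the (for example log-transformed) frequencies are roughly normal with equal variance in both strains, expresses the strain difference in units of that SD, and uses a pooled two-sample t-test at the 5% level. The function names `pooled_t` and `simulated_power` and the effect size of 2 SD are hypothetical choices for illustration, not values from this experiment.

```python
import math
import random
import statistics

# Two-sided 5% critical values of Student's t for df = 2n - 2 (standard table values),
# keyed by the number of replicates per group.
T_CRIT = {3: 2.776, 5: 2.306}

def pooled_t(a, b):
    """Two-sample t statistic assuming equal variances and equal group sizes."""
    n = len(a)
    pooled_var = (statistics.variance(a) + statistics.variance(b)) / 2
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled_var * 2 / n)

def simulated_power(n, effect_sd, n_sims=2000, seed=1):
    """Fraction of simulated experiments in which the t-test rejects at the 5% level."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        # Assumed model: unit-variance normal data, mutant shifted by effect_sd.
        wild_type = [rng.gauss(0.0, 1.0) for _ in range(n)]
        mutant = [rng.gauss(effect_sd, 1.0) for _ in range(n)]
        if abs(pooled_t(wild_type, mutant)) > T_CRIT[n]:
            hits += 1
    return hits / n_sims

# Compare the two replicate numbers for an assumed difference of 2 SD.
for n in (3, 5):
    print(n, simulated_power(n, effect_sd=2.0))
```

Rerunning this with a range of assumed effect sizes shows how sensitive the comparison is to the guess, which is all a simulation like this can offer without real pilot data.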

The only person you would ever have to justify doing more reps to, for any experiment, is the person providing the money. Since the experiment you are describing sounds cheap, I'd do as many replicates as you can handle at once or have time to do in batches.