Effect of different sample sizes on p-value
15 months ago
c_u ▴ 480


I have a highly significant p-value from comparing the two groups. The two samples are basically just lists of numbers that are genome sizes. I am sure the large sample sizes play a role in how significant the p-value is. I have the following two questions, the second being the main one:

  1. Should I perform any kind of multiple-testing correction here? I would think not, since I am doing just one test.
  2. Is it problematic that the two sample sizes are different? I know if one was 5 and the other 5000, that would not be a very powerful test, but in this case (or a case with similar numbers) would it lead to spurious p-values? If it would, I can take a random subsample from 'source 2' of 341 datapoints, to make them equal.

15 months ago

The larger your sample size, the more statistical power you will tend to have, meaning you can detect differences of smaller magnitude. The question you should ask alongside the calculation of a p-value is what magnitude of difference is biologically meaningful. For example, if you have a significant p-value but an average difference in genome size of 1 kb, I doubt that it's biologically interesting.
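
To make the power point concrete, here is a rough sketch on simulated data (not your genome sizes). Everything in it is an assumption for illustration: the means, the standard deviation, the sample sizes, and the helper names `mean_diff_pvalue`, `cohens_d` and `simulate`. It uses a large-sample normal approximation for the test of means and reports Cohen's d as a simple effect size. The true difference is the same tiny shift in both runs; only n changes.

```python
import math
import random
from statistics import mean, stdev


def mean_diff_pvalue(a, b):
    """Two-sided p-value for a difference in means.

    Uses unpooled (Welch-style) standard errors and a normal
    approximation, which is only reasonable for large samples.
    """
    se = math.sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    z = (mean(a) - mean(b)) / se
    # 2 * (1 - Phi(|z|)) written with erfc to avoid rounding to zero too early
    return math.erfc(abs(z) / math.sqrt(2))


def cohens_d(a, b):
    """Standardized mean difference (b minus a) using the pooled SD."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(b) - mean(a)) / math.sqrt(pooled_var)


def simulate(n, shift=1.0, sd=10.0, seed=0):
    """Draw two groups of size n whose true means differ by `shift` (made-up units)."""
    rng = random.Random(seed)
    a = [rng.gauss(5000.0, sd) for _ in range(n)]
    b = [rng.gauss(5000.0 + shift, sd) for _ in range(n)]
    return mean_diff_pvalue(a, b), cohens_d(a, b)


for n in (100, 20000):
    p, d = simulate(n)
    print(f"n={n:>6}  p={p:.3g}  Cohen's d={d:.3f}")
```

With the larger n the p-value becomes extremely small even though the standardized difference stays around 0.1, so the p-value alone says little about whether the difference matters biologically.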

Going back to your test choice, your test should reflect the parameter that interests you the most. For example, a KS test is sensitive to distribution shape, so you could have the case of identical means but a significant p-value, the conclusion of which might not be terribly interesting or relevant to your question. Just make sure the test used is in line with your parameter of interest.
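
As a rough illustration of that last point, the sketch below simulates two groups with the same true mean but different spread, then runs both a test of means and a two-sample KS test. The data, the parameters, and the helper names (`mean_diff_pvalue`, `ks_statistic`, `ks_pvalue`) are assumptions made up for this example, and the KS p-value uses the standard asymptotic series rather than an exact calculation.

```python
import math
import random
from bisect import bisect_right
from statistics import mean, stdev


def mean_diff_pvalue(a, b):
    """Two-sided p-value for a difference in means (large-sample normal approximation)."""
    se = math.sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    z = (mean(a) - mean(b)) / se
    return math.erfc(abs(z) / math.sqrt(2))


def ks_statistic(a, b):
    """Largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )


def ks_pvalue(d, n, m, terms=100):
    """Asymptotic two-sample KS p-value (Kolmogorov series)."""
    lam = d * math.sqrt(n * m / (n + m))
    if lam < 0.2:
        # The p-value is essentially 1 here, and the truncated series is unreliable.
        return 1.0
    s = 2 * sum((-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam)
                for k in range(1, terms + 1))
    return min(1.0, max(0.0, s))


# Simulated groups: same true mean (0), different spread (SD 1 vs SD 3).
rng = random.Random(1)
a = [rng.gauss(0.0, 1.0) for _ in range(2000)]
b = [rng.gauss(0.0, 3.0) for _ in range(2000)]

p_mean = mean_diff_pvalue(a, b)
d_ks = ks_statistic(a, b)
p_ks = ks_pvalue(d_ks, len(a), len(b))

print(f"difference in means: {mean(a) - mean(b):.3f}, p = {p_mean:.3g}")
print(f"KS statistic: {d_ks:.3f}, p = {p_ks:.3g}")
```

The KS test flags these groups as very different because their shapes differ, while the comparison of means has nothing real to detect. Which of the two answers is useful depends on whether you care about the mean genome size or the whole distribution.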

Thank you! The point about which test to use is very interesting. I'll probably try a t-test or one of its variants, which would tell me about differences in means too.

I am also interested in the second question from the original post. Do you have any thoughts on it?


