Hello,

I have created the following figure.

I get a highly significant p-value when comparing the two groups. Each sample is simply a list of genome sizes. I am sure the large sample sizes play a role in how significant the p-value is. I have the following two questions, the second being the main one:

- Should I apply any kind of multiple-testing correction here? I would think not, since I am running just one test.
- Is it problematic that the two sample sizes are different? I know that if one were 5 and the other 5000, the test would not be very powerful, but in this case (or a case with similar numbers), would it lead to spurious p-values?
**If it would**, I can take a random subsample of 341 data points from 'source 2' to make the two sample sizes equal.
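
One way to check the second question empirically is a small simulation under the null hypothesis. The sketch below is only illustrative: the 'source 2' size of 1500, the log-normal shape of the genome sizes, and the choice of Welch's t-test are all assumptions, not values from my data. It draws both groups from the same distribution many times and counts how often p < 0.05, once with unequal sizes and once with both groups at 341.

```python
import numpy as np
from scipy import stats


def false_positive_rate(n1, n2, n_sim=2000, alpha=0.05, seed=0):
    """Fraction of null simulations in which Welch's t-test gives p < alpha."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_sim):
        # Both groups come from the SAME distribution, so every
        # "significant" result here is a false positive.
        # The log-normal parameters are arbitrary placeholders.
        a = rng.lognormal(mean=1.5, sigma=0.4, size=n1)
        b = rng.lognormal(mean=1.5, sigma=0.4, size=n2)
        p = stats.ttest_ind(a, b, equal_var=False).pvalue
        hits += int(p < alpha)
    return hits / n_sim


# 341 matches the smaller group; 1500 is a hypothetical size for 'source 2'.
rate_unequal = false_positive_rate(341, 1500)
rate_equal = false_positive_rate(341, 341)
print(f"unequal sizes: {rate_unequal:.3f}, equal sizes: {rate_equal:.3f}")
```

If both rates stay close to the nominal 0.05, the size imbalance by itself is not what produces small p-values under these assumptions, and subsampling 'source 2' would mainly discard data.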

Thank you

Thank you! The point about which test to use is very interesting. I'll probably try a t-test or one of its variants, which could also tell me about the difference in means.
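
A minimal sketch of how that could look, assuming Welch's unequal-variance t-test is the variant used. The helper name `welch_summary` and the simulated arrays are hypothetical stand-ins for the two lists of genome sizes. It reports the difference in means with a confidence interval alongside the p-value, since with samples this large the effect size is more informative than the p-value alone.

```python
import numpy as np
from scipy import stats


def welch_summary(a, b, conf=0.95):
    """Mean difference (a - b), its Welch confidence interval, and the p-value."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Squared standard error of each group's mean.
    va = a.var(ddof=1) / a.size
    vb = b.var(ddof=1) / b.size
    diff = a.mean() - b.mean()
    se = np.sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (a.size - 1) + vb ** 2 / (b.size - 1))
    t_crit = stats.t.ppf(0.5 + conf / 2, df)
    p = stats.ttest_ind(a, b, equal_var=False).pvalue
    return diff, (diff - t_crit * se, diff + t_crit * se), p


# Simulated placeholder data, NOT the real genome sizes.
rng = np.random.default_rng(1)
source1 = rng.normal(loc=5.0, scale=1.0, size=341)
source2 = rng.normal(loc=4.5, scale=1.0, size=1500)

diff, (ci_low, ci_high), p_value = welch_summary(source1, source2)
print(f"mean difference = {diff:.3f}, 95% CI = ({ci_low:.3f}, {ci_high:.3f}), p = {p_value:.2e}")
```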

I am also still interested in the second question I asked in the original post. Do you have any thoughts on it?