I have created the following figure.
I get a highly significant p-value when comparing the two groups. Each sample is simply a list of numbers representing genome sizes. I am sure the large sample sizes play a role in how significant the p-value is. I have the following two questions, the second being the main one:
- Should I apply any kind of multiple-testing correction here? I would think not, since I am doing just one test.
- Is it problematic that the two sample sizes are different? I know that if one were 5 and the other 5000, the test would not be very powerful, but in this case (or a case with similar numbers), would the imbalance lead to spurious p-values? If so, I could take a random subsample of 341 data points from 'source 2' to make the two sizes equal.
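
For concreteness, the subsampling idea could look something like the sketch below. It is not the actual analysis: the data are synthetic placeholders, the size of 'source 2' (1500) and both distributions are made up, and a permutation test on the difference in means is an assumption standing in for whatever test produced the p-value in the figure. Only the 341 comes from the question.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Synthetic placeholders, NOT real genome sizes. Only the 341 comes from the
# question; the size of source 2 (1500) and both distributions are made up.
source1 = [rng.lognormvariate(1.0, 0.5) for _ in range(341)]
source2 = [rng.lognormvariate(1.1, 0.5) for _ in range(1500)]


def mean(xs):
    return sum(xs) / len(xs)


def permutation_pvalue(a, b, n_perm=500, rng=rng):
    """Two-sided permutation test on the absolute difference in means."""
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    hits = 0
    for _ in range(n_perm):
        # Reassign group labels at random, keeping the group sizes fixed.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return (hits + 1) / (n_perm + 1)


# Test on the full, unequal-sized samples.
p_full = permutation_pvalue(source1, source2)

# Test after drawing 341 points from source 2 without replacement.
source2_sub = rng.sample(source2, len(source1))
p_sub = permutation_pvalue(source1, source2_sub)

print(f"full samples:       n1={len(source1)}, n2={len(source2)}, p={p_full:.4f}")
print(f"equal-size samples: n1={len(source1)}, n2={len(source2_sub)}, p={p_sub:.4f}")
```

Rerunning this with several different seeds would show how much the subsampled p-value depends on which 341 points happen to be drawn.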