2.6 years ago
ramin.k2013
Can we use synthetically generated samples (derived from the distribution of the existing small dataset) for data analysis?
The dataset consists of quantities of various bacteria measured in donor samples, but it is small (around 20-30 samples). Can we generate synthetic samples from this small dataset using statistical methods?
You mean to increase sample size? Absolutely not!
Yes, I mean increasing the number of samples in the dataset; the features (bacterial quantities) would stay the same.
I assumed that, because the dataset has so few samples, any noisy sample would lead to generated samples that are far from realistic.
Could you elaborate more on your point of view?
Thank you.
ATpoint
Because inventing or "making up" samples is fraud. Either you have the data, and with it an honest assessment of the biological heterogeneity/variance, or you do not. The currently derived distribution (the "realistic situation") is based only on the current sample size. The outliers could well be true biological signal and indicate heterogeneity; you only find out by including more biological replicates. Sorry, but in my opinion creating data in silico is not acceptable for anything other than estimating the sample size required to reach statistical significance (power analysis).
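To illustrate the one use of simulated data mentioned above, here is a minimal sketch of a simulation-based power analysis for a two-group comparison. Everything in it is an assumption made for illustration: the function names `simulated_power` and `required_n` are hypothetical, the data are drawn from normal distributions with unit variance, the effect size is expressed in standard-deviation units, and the critical value uses a normal approximation to the t distribution, which is slightly optimistic for small groups. Real bacterial abundance data are often skewed or compositional, so treat this as a rough guide only.

```python
import random
from statistics import NormalDist


def _mean_var(xs):
    """Return the sample mean and the unbiased sample variance of xs."""
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    return m, v


def simulated_power(n_per_group, effect_size, alpha=0.05, n_sims=2000, seed=0):
    """Estimate power by simulating many two-group experiments.

    effect_size is the ASSUMED difference in means, in SD units.
    The simulated values are used only to count how often a test
    would reach significance; they are never analysed as real data.
    """
    rng = random.Random(seed)
    # Two-sided critical value (normal approximation, an assumption).
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    hits = 0
    for _ in range(n_sims):
        a = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        b = [rng.gauss(effect_size, 1.0) for _ in range(n_per_group)]
        mean_a, var_a = _mean_var(a)
        mean_b, var_b = _mean_var(b)
        # Welch-style standard error of the difference in means.
        se = (var_a / n_per_group + var_b / n_per_group) ** 0.5
        if abs((mean_b - mean_a) / se) > z_crit:
            hits += 1
    return hits / n_sims


def required_n(effect_size, target_power=0.8, max_n=200, **kwargs):
    """Smallest per-group n whose simulated power reaches target_power.

    Returns None if no n up to max_n is sufficient.
    """
    for n in range(3, max_n + 1):
        if simulated_power(n, effect_size, **kwargs) >= target_power:
            return n
    return None


if __name__ == "__main__":
    # Hypothetical effect size; in practice take it from pilot data
    # or the literature.
    print(required_n(effect_size=0.8))
```

The result tells you how many real biological replicates to collect under the assumed effect size; the simulated values themselves are discarded and never added to the dataset.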