synthetic samples for small datasets in bioinformatics

0

2.6 years ago

ramin.k2013 • 0

Can we use synthetically generated samples (derived from the distribution of the existing small dataset) for data analysis?

The dataset consists of quantities of various bacteria measured in donor samples, but it is small (around 20-30 samples). Can we generate synthetic samples from this small dataset using statistical methods?

statistics ML • 902 views

ADD COMMENT • link updated 14 months ago by Ram 43k • written 2.6 years ago by ramin.k2013 • 0

1

You mean to increase sample size? Absolutely not!

ADD REPLY • link 2.6 years ago by ATpoint 82k

0

Yes, I mean increasing the number of samples in the dataset; the features (bacterial quantities) would stay the same.

I assumed that, because the dataset has so few samples, any noisy sample would lead to generated samples that are far from the realistic situation.

Could you elaborate more on your point of view?

Thank you, ATpoint.

ADD REPLY • link 2.6 years ago by ramin.k2013 • 0

0

Because inventing or "making up" samples is fraud. Either you have the data, and with it an honest assessment of the biological heterogeneity/variance, or you do not. The currently derived distribution (the "realistic situation") is based only on the current sample size. The outliers could well be true biological signal and indicate heterogeneity; you only find out by including more biological replicates. Sorry, but in my opinion creating data in silico is not acceptable for anything but estimating the sample size required to reach statistical significance (power analysis).
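To illustrate the one use mentioned above, here is a minimal sketch of a simulation-based power analysis in Python. Everything in it is an assumption made for illustration and not taken from the dataset in question: a two-group comparison, normally distributed (e.g. log-transformed) abundances, a Welch t-test, and the hypothetical helper names `simulated_power` and `required_n`. The simulated values are used only to estimate how many real samples would be needed; they are never added to the dataset itself.

```python
import numpy as np
from scipy import stats


def simulated_power(n_per_group, effect_size, sd=1.0, alpha=0.05,
                    n_sim=2000, seed=0):
    """Estimate the power of a two-sample Welch t-test by simulation.

    effect_size and sd are hypothetical placeholders; in practice they
    would be informed by pilot data (e.g. log-transformed abundances).
    """
    rng = np.random.default_rng(seed)
    # Each row is one simulated experiment, each column one sample.
    control = rng.normal(0.0, sd, size=(n_sim, n_per_group))
    treated = rng.normal(effect_size, sd, size=(n_sim, n_per_group))
    # One test per simulated experiment (row-wise).
    _, p_values = stats.ttest_ind(control, treated, axis=1, equal_var=False)
    # Power is the fraction of experiments that reject the null at alpha.
    return float(np.mean(p_values < alpha))


def required_n(effect_size, target_power=0.8, n_grid=range(5, 101, 5),
               **kwargs):
    """Return the smallest group size in n_grid reaching target_power."""
    for n in n_grid:
        if simulated_power(n, effect_size, **kwargs) >= target_power:
            return n
    return None  # target power not reached within the grid


if __name__ == "__main__":
    for n in (10, 20, 30):
        print(n, simulated_power(n, effect_size=1.0))
    print("required n per group:", required_n(effect_size=1.0))
```

The estimate is only as good as the assumed effect size and variance, which with 20-30 samples are themselves uncertain, so it is worth repeating the calculation over a range of plausible values.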

ADD REPLY • link 2.6 years ago by ATpoint 82k
