synthetic samples for small datasets in bioinformatics
12 months ago

Can we use synthetically generated samples (derived from the distribution of the existing small dataset) for data analysis?

The dataset consists of quantities of various bacteria measured in donor samples, but it is small (around 20-30 samples). Can we generate synthetic samples from this small dataset using statistical methods?

synthetic-data ML small-dataset statistics bioinformatics • 515 views

You mean to increase sample size? Absolutely not!


I mean the number of samples in the dataset. But the features (bacterial quantities) would be the same.

My assumption was that, because the dataset has so few samples, any noisy sample would lead to generated samples that are far from realistic.

Could you elaborate more on your point of view?

Thank you, ATpoint.


Because inventing or "making up" samples is fraud. Either you have the data, and with it an honest assessment of the biological heterogeneity/variance, or you do not. The currently derived distribution (the "realistic situation") is based only on the current sample size. The outliers could well be true biological signal and indicate heterogeneity; you only find out by including more biological replicates. Sorry, but in my opinion creating data in silico is not acceptable for anything other than estimating the sample size required to reach statistical significance (power analysis).
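To illustrate the one use mentioned above, here is a minimal sketch of a simulation-based power analysis. Everything specific in it is an assumption for illustration: the function names are hypothetical, the two-group design and the normal model (for example, log-transformed quantities) are assumed, and `effect` and `sd` are placeholders that you would replace with values motivated by your pilot data. The p-value uses a normal approximation to the Welch statistic, which is somewhat optimistic for very small groups. The simulated values are used only to estimate power and are never added to the dataset.

```python
import math
import random
from statistics import NormalDist, mean, variance


def welch_p_value(a, b):
    """Two-sided p-value for a difference in means.

    Uses the Welch statistic with a normal approximation (an assumption
    of this sketch; a t-distribution is more accurate for small groups).
    """
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        return 1.0
    z = (mean(a) - mean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulated_power(n_per_group, effect, sd, alpha=0.05, n_sim=2000, seed=0):
    """Fraction of simulated experiments in which the test rejects at level alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        # Hypothetical two-group design: same spread, means shifted by `effect`.
        control = [rng.gauss(0.0, sd) for _ in range(n_per_group)]
        treated = [rng.gauss(effect, sd) for _ in range(n_per_group)]
        if welch_p_value(control, treated) < alpha:
            hits += 1
    return hits / n_sim


def required_n(effect, sd, target_power=0.8, candidates=range(5, 201, 5), **kwargs):
    """Smallest candidate group size whose simulated power reaches the target."""
    for n in candidates:
        if simulated_power(n, effect, sd, **kwargs) >= target_power:
            return n
    return None  # target not reached within the candidate range


if __name__ == "__main__":
    # Placeholder values: a shift of one standard deviation between groups.
    print(required_n(effect=1.0, sd=1.0))
```

The result tells you how many real biological replicates to collect for a given assumed effect; it does not replace collecting them.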
