Is it wrong to use only a random subset of a dataset?

2.3 years ago

Vitor1 ▴ 120

Hi guys,

I was wondering: is it wrong to use only a random subset of a dataset?

For example, take a dataset that contains 40 samples (say, 20 control and 20 treated). Is it a mistake to take 20 random samples from it (preserving the control/treated ratio) and run the analysis on those alone?
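To make the scenario concrete, here is a minimal sketch of that kind of subsetting in Python. The sample IDs, the `stratified_subset` name, and the seed value are all made up for illustration; the only thing taken from the example is 40 samples reduced to 20 with the 1:1 control/treated ratio kept.

```python
import random

def stratified_subset(samples, n_per_group, seed=0):
    """Draw n_per_group samples at random from each group, without replacement."""
    # A fixed seed (hypothetical value) makes the draw reproducible and reportable.
    rng = random.Random(seed)
    subset = {}
    for group, ids in samples.items():
        subset[group] = sorted(rng.sample(ids, n_per_group))
    return subset

# Hypothetical sample IDs: 20 control and 20 treated.
samples = {
    "control": [f"C{i:02d}" for i in range(1, 21)],
    "treated": [f"T{i:02d}" for i in range(1, 21)],
}

# 10 per group gives 20 samples in total and preserves the 1:1 ratio.
subset = stratified_subset(samples, n_per_group=10)
print(subset)
```

Drawing within each group separately is what keeps the ratio fixed. Sampling 20 from the pooled 40 would only preserve it on average.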

I know that this is far from ideal, and I tend to think that it should not be done, but I was wondering whether it is actually wrong from a research perspective.

Thanks

ethics research dataset • 831 views

ADD COMMENT • link updated 11 months ago by Ram 43k • written 2.3 years ago by Vitor1 ▴ 120

Nothing in life is a mistake once we learn from it; the 'mistake' just becomes experience. However, what is the justification for random sub-setting?

ADD REPLY • link 2.3 years ago by Kevin Blighe 87k

I was just wondering about the research ethics behind this; I don't really have a justification in mind. Maybe it would be for a faster analysis (alignment, etc.) or because of hard drive space limitations, something like that.

ADD REPLY • link 2.3 years ago by Vitor1 ▴ 120

I don't think that would be accepted as a valid reason for publication in a journal. Does your group lack funding?

ADD REPLY • link 2.3 years ago by Kevin Blighe 87k
