Is there a proper way to choose bin sizes in a bar plot?
3 months ago
QX ▴ 70

Hi all,

In my analysis I have found that the bin size of a bar plot can heavily affect the apparent shape of the distribution, for example:
[image]

Is there a proper way to choose a 'correct' bin size? Or are there criteria for choosing a bin size for biological datasets?

visualization plot bar • 619 views

This is commonly known as the "binning problem" ;)


Do you know if there is any review of this problem?

3 months ago
ATpoint 87k

These sorts of problems are everywhere in science and data analysis. What is a good cutoff, what is a good bin size, what is a good normalization? The general advice I'd give is to look at the raw data (here that could mean a very small bin size or even no bins at all) and then choose something that smooths the signal without deleting real bumps or introducing bumps that are not visible in the raw data. For example, here 0.5 and 2.5 give basically the same message, while 7.5 obviously deletes the bimodal pattern that apparently exists. There is no universal answer to this. Try to be fair and unbiased, look at the data by eye, and don't choose cutoffs to hide or make up signal to support your story.
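As a rough illustration of comparing bin sizes side by side, here is a minimal Python sketch. It assumes NumPy is available and uses a synthetic two-peaked sample in place of the real data; the helper name `histogram_with_width` and the sample parameters are made up for this example, and only the widths 0.5, 2.5 and 7.5 come from the plot discussed above.

```python
import numpy as np

# Synthetic two-peaked sample standing in for the real measurements
# (an assumption; replace `values` with your own data).
rng = np.random.default_rng(0)
values = np.concatenate([rng.normal(10, 2, 500), rng.normal(25, 3, 500)])


def histogram_with_width(data, width):
    """Return (counts, edges) for a histogram with a fixed bin width.

    Edges are aligned to multiples of `width` so that histograms built
    with different widths share a common origin.
    """
    start = np.floor(data.min() / width) * width
    stop = np.ceil(data.max() / width) * width
    n_bins = max(int(round((stop - start) / width)), 1)
    edges = start + width * np.arange(n_bins + 1)
    counts, edges = np.histogram(data, bins=edges)
    return counts, edges


# Same data, three bin widths: compare the shapes before picking one.
for width in (0.5, 2.5, 7.5):
    counts, edges = histogram_with_width(values, width)
    print(f"width={width}: {len(counts)} bins")
    print("  counts:", counts.tolist())
```

Printing (or plotting) the counts for each width makes it easier to see which features survive across bin sizes and which only appear at one particular choice.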


Hi ATpoint, that's good advice to look at the raw data and test different bin sizes. But you mentioned looking by eye; isn't that also quite a subjective way to make a decision?

In the example, let's say a bin size of 0.5 or 2.5 gives me a bimodal pattern that matches the story I want and fits my expectation. How can I know whether bin = 7.5 does or does not reflect the biological truth?


But you mentioned looking by eye; isn't that also quite a subjective way to make a decision?

Yes, it is, but in the end, if you blindly apply an automated method without checking back and validating that it performs well, then imho this is not proper, because the automated method could introduce completely wrong results. That is why I recommend as much automation as possible while always checking that the results make sense. That said, different analyses will produce different results, which is why key findings should be confirmed by independent experiments if possible and feasible.
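One possible way to combine automation with a sanity check, sketched in Python under some assumptions: NumPy's `histogram_bin_edges` offers several standard bin-selection rules (Freedman-Diaconis, Scott, Sturges), and comparing their suggestions gives a starting point that can then be checked against the raw data. The synthetic sample and the helper `suggested_widths` are hypothetical, and none of these rules is guaranteed to be appropriate for a given biological dataset.

```python
import numpy as np

# Synthetic two-peaked sample (an assumption; substitute real data).
rng = np.random.default_rng(1)
values = np.concatenate([rng.normal(10, 2, 500), rng.normal(25, 3, 500)])


def suggested_widths(data, rules=("fd", "scott", "sturges")):
    """Map each NumPy bin-selection rule to the bin width it suggests."""
    widths = {}
    for rule in rules:
        edges = np.histogram_bin_edges(data, bins=rule)
        widths[rule] = float(edges[1] - edges[0])
    return widths


widths = suggested_widths(values)
for rule, width in widths.items():
    print(f"{rule}: suggested bin width = {width:.2f}")

# A large ratio means the rules disagree, which is a cue to go back
# to the raw data instead of trusting any single automated choice.
spread = max(widths.values()) / min(widths.values())
print(f"largest / smallest suggested width = {spread:.2f}")
```

The rules only supply candidates; the final choice still needs the visual check against the raw data described above.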


True, I agree!
