Question

Counting the number of values within a range in a specific column

4.2 years ago

arsala521 ▴ 30

Hello everyone,

I want to use awk to count the number of values that fall within each range in a specific column, and to print each range with its count. For example, the 4th column of my input looks like this:

With a bin size of 5000, I want to know the number of values in the ranges 1-5000, 5001-10000, and so on. My values can be greater than 100,000. Each value should be counted as many times as it appears; for example, 4000 should be counted twice. The output should have two columns, the range and the count, like this:

1-5000 4
5001-10000 2

I found many relevant answers on different forums, but none of them does exactly what I need. I found one script, but it counts duplicate values only once.

Thanks

awk • 2.9k views

updated 4.2 years ago by zx8754 11k • written 4.2 years ago by arsala521 ▴ 30

score 4 · Accepted Answer · 2020-01-30

Using awk:

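# assuming temp.txt holds one integer per line; use $4 instead of $1 to read the 4th column of a full table
# the -1 keeps exact multiples of 5000 in the lower bin (5000 goes to 1-5000, not 5001-10000)
$ awk '{print (int(($1-1)/5000)+1)*5000}' temp.txt | sort -g | uniq -c
# 4 5000
# 2 10000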
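If you want the "range count" layout from the question directly, a single awk program can do the counting itself. This is only a minimal sketch: bin_counts is a hypothetical helper name, and it assumes positive integer values, whitespace-separated columns, and no header line. Empty bins are skipped.

```shell
# bin_counts (hypothetical helper): count the values of one column in fixed-width bins.
# Usage: bin_counts [FILE] [COLUMN] [BIN_SIZE]   (defaults: stdin, column 4, bin size 5000)
# Assumption: values are positive integers, so the bins are 1-5000, 5001-10000, ...
bin_counts() {
    awk -v col="${2:-4}" -v size="${3:-5000}" '
        NF >= col {
            # bin index of this value: 1..5000 -> 0, 5001..10000 -> 1, ...
            b = int(($col - 1) / size)
            count[b]++                 # every occurrence counts, duplicates included
            if (b > max) max = b
        }
        END {
            # print the non-empty bins as "start-end count", in ascending order
            for (b = 0; b <= max; b++)
                if (b in count)
                    printf "%d-%d %d\n", b * size + 1, (b + 1) * size, count[b]
        }
    ' "${1:--}"
}
```

Call it as bin_counts input.txt 4 5000 (input.txt is a placeholder for your file). For a single-column stream, pass - as the file and 1 as the column, for example: printf '%s\n' 1000 4000 5001 | bin_counts - 1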

Using R:

# example input
x <- c(1000, 4000, 5001, 7050, 4000, 3500)

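# using cut into 5K intervals; raise the upper limit of seq() to cover your largest value,
# otherwise values above it become NA and are dropped by table()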
table(cut(x, seq(0, 10000, 5000)))
# (0,5e+03] (5e+03,1e+04] 
#         4             2 

# using round; note this rounds to the nearest 10,000, so it only matches the 5K bins for values up to 10,000
table(round(x, -4))
# 0 10000 
# 4     2