Question: binning a vector in R
Assa Yeroslaviz wrote:

Hi,

I have a data.frame of 17.7M rows, which I would like to plot as a heatmap (heatmap.2).

I would first like to create a table to see how many elements I have between (0,1], (1,2], (2,3], ..., (9,10]. I know that I have ~3500 non-NA elements and ~700 below 1. How can I get a complete overview of the data? Can I do it with `table`?

I would then like to bin the same data into specific (factorial?) values, so that I can use my own color legend for each range.

This is my data structure:

`> head(filtered1000)`

|   | start | start.1 | Value |
|---|---|---|---|
| 1 | 876001 | 1403001 | 9.910803 |
| 2 | 3079001 | 3081001 | 9.891197 |
| 3 | 834001 | 836001 | 9.813543 |
| 4 | 1239001 | 1241001 | 9.794319 |
| 5 | 2777001 | 2780001 | 9.775171 |
| 6 | 1712001 | 1714001 | 9.727626 |

I know it is probably possible to manually adjust the `cut` command to the right breaks, but is there a more "subtle", automated way of counting the elements in each category and assigning them to specific bins, so that I can set a certain color for a specific bin?

```
> breaks <- c(0,1,2,3,4,5,6,7,8,9,10)
> bins <- cut(filtered1000[,3], breaks)
> summary(bins)
```

```
   (0,1]    (1,2]    (2,3]    (3,4]    (4,5]    (5,6]    (6,7]    (7,8]    (8,9]   (9,10]     NA's
    2798      366      172       26       45        5       11       18       25       31 17762728
```
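For the counting part of the question, a minimal sketch on made-up values (the vector `value` is a hypothetical stand-in for `filtered1000[,3]`) suggests that `table()` on the result of `cut()` gives the same overview, and that the breaks can be derived from the data range rather than typed out by hand:

```r
# Toy stand-in for filtered1000[, 3]; the values are made up for illustration.
value <- c(0.2, 0.7, 1.5, 2.5, 9.9, NA, NA)

# Build the breaks from the observed range instead of listing them manually.
breaks <- seq(floor(min(value, na.rm = TRUE)),
              ceiling(max(value, na.rm = TRUE)),
              by = 1)

# cut() assigns each value to a (lower, upper] interval; NA stays NA.
bins <- cut(value, breaks = breaks)

# table() drops NA by default; useNA = "ifany" adds an NA count when present.
counts <- table(bins, useNA = "ifany")
print(counts)
```

One caveat with this sketch: if the smallest value is exactly an integer, it sits on the lowest break and falls outside the first `(lower, upper]` interval unless `include.lowest = TRUE` is passed to `cut()`.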

How can I then incorporate these breaks into my data.frame, so that I can plot it (ggplot2) with specific colors?
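One possible way to do this, sketched on a made-up data.frame (`df`, `bin`, and `bin_colors` are hypothetical names, and the tile layout is only an illustration, not the asker's actual heatmap), is to store the `cut()` result as a factor column and hand a named color vector to `scale_fill_manual()`:

```r
# Toy stand-in for filtered1000; the values are made up for illustration.
df <- data.frame(start = c(1, 2, 3, 4),
                 Value = c(0.4, 1.7, 9.9, NA))

# Store the bin as a factor column so ggplot2 treats it as discrete.
df$bin <- cut(df$Value, breaks = 0:10)

# One color per bin level, named by level so the mapping is explicit.
bin_colors <- setNames(heat.colors(nlevels(df$bin)), levels(df$bin))

# Only build the plot if ggplot2 is installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(df, aes(x = start, y = 1, fill = bin)) +
    geom_tile() +
    # drop = FALSE keeps empty bins in the legend; NA rows get a neutral grey.
    scale_fill_manual(values = bin_colors, na.value = "grey80", drop = FALSE)
  print(p)
}
```

Replacing `heat.colors()` with a hand-written vector of ten color names would give full control over which color each range receives.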

Devon Ryan wrote:

It would be simpler to use `findInterval()`.
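As a rough illustration on made-up values, `findInterval()` returns integer bin indices rather than a labeled factor, and by default its intervals are closed on the left, `[lower, upper)`, whereas `cut()` defaults to `(lower, upper]`. The `left.open = TRUE` argument makes the two agree on values that fall exactly on a break:

```r
x <- c(0.5, 1, 1.5, 9.9, NA)  # made-up values
breaks <- 0:10

# Default: index i such that breaks[i] <= x < breaks[i + 1].
idx <- findInterval(x, breaks)
print(idx)       # 1 2 2 10 NA

# left.open = TRUE switches to breaks[i] < x <= breaks[i + 1].
idx_open <- findInterval(x, breaks, left.open = TRUE)
print(idx_open)  # 1 1 2 10 NA

# cut() uses (lower, upper] by default, so its level codes match idx_open.
lab <- cut(x, breaks)
print(as.integer(lab))
```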

Thanks. If I use `findInterval(filtered1000[,3], seq(0:10))`, I get a vector of values 0-9 and a lot of NAs. Can I then just plot them as if they were the third column?

How do I add it to my plot/data.frame, so that I can plot it with ggplot2?

Just add it to your data.frame using `filtered1000$bins = findInterval(filtered1000[,3], 0:10)` (use `0:10` for the breaks; `seq(0:10)` evaluates to `1:11`).
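To see how the choice of breaks affects the new column, here is a small sketch on a made-up data.frame (`df` and the column names `bins_shifted` and `bins` are hypothetical). Calling `seq()` on an 11-element vector behaves like `seq_along()`, so the breaks become 1 to 11 and every bin index is shifted down by one:

```r
# seq(0:10) is seq() applied to an 11-element vector, which yields 1..11.
stopifnot(identical(seq(0:10), 1:11))

df <- data.frame(Value = c(0.5, 1.5, 9.9, NA))  # toy stand-in for filtered1000

# Breaks 1..11: values below 1 land in bin 0.
df$bins_shifted <- findInterval(df$Value, seq(0:10))

# Breaks 0..10: values in [0, 1) land in bin 1.
df$bins <- findInterval(df$Value, 0:10)

print(df)
```

Wrapping the result in `factor()` before plotting would let ggplot2 treat the bins as discrete categories rather than a continuous scale.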

Oh man, sometimes it is sooooo easy.

thanks