Question: binning a vector in R
Assa Yeroslaviz (Munich) wrote, 5.2 years ago:

Hi,

 

I have a data.frame of 17.7M rows, which I would like to plot as a heatmap (heatmap.2).

I would first like to create a table showing how many elements fall into each of the intervals (0,1], (1,2], (2,3], ..., (9,10]. I know that I have ~3500 non-NA elements and ~700 below 1. How can I get a complete overview of the data? Can I do it with table?

I would then like to bin the same data into specific (factor?) values, so that I can use my own color legend for each range.

This is my data structure:

>head(filtered1000)

  start start.1 Value
1 876001 1403001 9.910803
2 3079001 3081001 9.891197
3 834001 836001 9.813543
4 1239001 1241001 9.794319
5 2777001 2780001 9.775171
6 1712001 1714001 9.727626

I know it is probably possible to manually adjust the cut command to the right breaks, but is there a more "subtle", automated way of counting the elements in each category and assigning them to specific bins, so that I can set a certain color for a specific bin?

> breaks <- c(0,1,2,3,4,5,6,7,8,9,10)
> bins <- cut(filtered1000[,3], breaks)
> summary(bins)

(0,1] (1,2] (2,3] (3,4] (4,5] (5,6] (6,7] (7,8] (8,9] (9,10]     NA's
 2798   366   172    26    45     5    11    18    25     31 17762728

How can I then incorporate these bins into my data.frame, so that I can plot it (ggplot2) with specific colors?

Thanks

Assa

 

ggplot binning R • 5.2k views
Devon Ryan (Freiburg, Germany) wrote, 5.2 years ago:

It would be simpler to use findInterval().
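As a rough illustration of that suggestion, here is a minimal sketch of how findInterval() assigns bin indices and how table() can then count them. The toy vector `value` is invented and only stands in for the Value column; it is not the poster's data. One thing to keep in mind: by default findInterval() uses intervals closed on the left, [a,b), whereas cut() defaults to (a,b]; passing left.open = TRUE makes the two agree.

```r
# Toy values standing in for filtered1000$Value (made up for illustration)
value  <- c(0.2, 0.999, 1, 2.5, 9.91, NA)
breaks <- 0:10

# Default: bin i holds values with breaks[i] <= x < breaks[i + 1]
idx <- findInterval(value, breaks)

# left.open = TRUE: bin i holds values with breaks[i] < x <= breaks[i + 1],
# which matches the (0,1], (1,2], ... intervals produced by cut()
idx_left_open <- findInterval(value, breaks, left.open = TRUE)

# Count the elements per bin, keeping NA as its own category
counts <- table(idx_left_open, useNA = "ifany")
print(counts)
```

The only value that lands in a different bin between the two calls is the one sitting exactly on a break (1 in this toy vector); NA inputs stay NA in both.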


Thanks. If I use findInterval(filtered1000[,3], seq(0:10)), I get a vector of values 0-9 and a lot of NAs. Can I then just plot them as if they were the third column?

How do I add it to my plot/data.frame, so that I can plot it with ggplot2?

Reply written 5.2 years ago by Assa Yeroslaviz

Just add it to your data.frame using filtered1000$bins = findInterval(filtered1000[,3], seq(0:10)).
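For the ggplot2 part of the question, one possible way to go from the new column to a plot with one fixed color per bin is sketched below. Several things here are assumptions rather than anything stated in the thread: the toy data.frame, the helper columns `bin_label` and the `bin_colours` palette, and the choice of geom_tile(). Note also that seq(0:10) evaluates to 1:11 (seq() on a single vector behaves like seq_along()), which is why the indices in the thread run from 0 to 9; the sketch uses 0:10 with left.open = TRUE instead, so that the bins line up with the (0,1] ... (9,10] intervals from cut().

```r
# Toy stand-in for filtered1000; the rows are invented for illustration
filtered1000 <- data.frame(
  start   = c(876001, 3079001, 834001, 1239001),
  start.1 = c(1403001, 3081001, 836001, 1241001),
  Value   = c(9.910803, 0.4, 3.2, NA)
)

breaks <- 0:10
filtered1000$bins <- findInterval(filtered1000$Value, breaks, left.open = TRUE)

# Turn the integer index into a labelled factor so ggplot2 treats it as discrete
bin_labels <- paste0("(", head(breaks, -1), ",", tail(breaks, -1), "]")
filtered1000$bin_label <- factor(filtered1000$bins,
                                 levels = seq_along(bin_labels),
                                 labels = bin_labels)

# One colour per bin; the palette is an arbitrary choice
bin_colours <- setNames(heat.colors(length(bin_labels)), bin_labels)

# Only build the plot if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(filtered1000,
                       ggplot2::aes(x = factor(start), y = factor(start.1),
                                    fill = bin_label)) +
    ggplot2::geom_tile() +
    ggplot2::scale_fill_manual(values = bin_colours,
                               na.value = "grey80", drop = FALSE)
}
```

Using a named color vector ties each color to a bin label, so the legend stays the same even when some bins are empty in a given subset of the data.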

Reply written 5.2 years ago by Devon Ryan

Oh man, sometimes it is sooooo easy.

thanks

Reply written 5.2 years ago by Assa Yeroslaviz