Hi,
I have a data.frame of 17.7M rows, which I would like to plot as a heatmap (heatmap.2).
I would first like to create a table showing how many elements fall into each of the intervals (0,1], (1,2], (2,3], ..., (9,10]. I know that I have ~3500 elements that are not NA, and ~700 above 1. How can I get a complete overview of the data? Can I do it with table()?
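`table()` can be applied directly to the factor returned by `cut()`. A minimal sketch on a small made-up vector standing in for `filtered1000[,3]`; `useNA = "ifany"` adds a separate count for the NA entries, and intervals with no values still show up with a count of 0:

```r
# Made-up values standing in for filtered1000[, 3]
x <- c(0.2, 0.7, 1.5, 2.2, 9.9, NA, NA)

# cut() assigns each value to a right-closed interval (0,1], (1,2], ..., (9,10]
bins <- cut(x, breaks = 0:10)

# Count per interval; useNA = "ifany" also reports how many values are NA
counts <- table(bins, useNA = "ifany")
print(counts)
```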
I would then like to bin the same data into specific (factor?) levels, so that I can use my own color legend for each range.
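One way to get a factor with your own ranges is `cut()` with a `labels` argument, combined with a named vector that maps each level to a color. A sketch, assuming hypothetical coarser ranges and arbitrary color names (neither comes from the original data):

```r
# Made-up values standing in for filtered1000[, 3]
x <- c(0.4, 1.2, 3.8, 7.5, 9.9)

# Hypothetical grouping: coarser ranges with readable labels
bins <- cut(x,
            breaks = c(0, 1, 3, 6, 10),
            labels = c("0-1", "1-3", "3-6", "6-10"))

# One color per factor level; the names tie each color to a range
bin_colours <- c("0-1"  = "grey90",
                 "1-3"  = "yellow",
                 "3-6"  = "orange",
                 "6-10" = "red")

# Look up the color of every value through its bin label
value_colours <- unname(bin_colours[as.character(bins)])
print(data.frame(x, bins, value_colours))
```

Keeping the colors in a named vector, rather than relying on level order, means the mapping stays correct even if some ranges end up empty.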
This is my data structure:
> head(filtered1000)
start start.1 Value
1 876001 1403001 9.910803
2 3079001 3081001 9.891197
3 834001 836001 9.813543
4 1239001 1241001 9.794319
5 2777001 2780001 9.775171
6 1712001 1714001 9.727626
I know it is probably possible to pass the right breaks to the cut command manually, but is there a more "subtle", automated way of counting the elements in each category and assigning them to specific bins, so that I can give each bin its own color?
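If the goal is to avoid typing the breaks by hand, they can be derived from the observed range of the data. A minimal sketch, assuming unit-width bins are wanted (the made-up vector stands in for `filtered1000[,3]`):

```r
x <- c(0.3, 0.8, 1.1, 2.6, 4.4, 9.7, NA)   # made-up values

# Build unit-width breaks that span the observed (non-NA) range
lo <- floor(min(x, na.rm = TRUE))
hi <- ceiling(max(x, na.rm = TRUE))
breaks <- seq(lo, hi, by = 1)

# include.lowest = TRUE so a value equal to the lowest break is not dropped
bins <- cut(x, breaks = breaks, include.lowest = TRUE)
counts <- table(bins)
print(counts)
```

Base R's `pretty()` is another way to pick "round" breaks automatically, although it does not guarantee a bin width of 1.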
> breaks <- c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
> bins <- cut(filtered1000[, 3], breaks)
> summary(bins)
(0,1] (1,2] (2,3] (3,4] (4,5] (5,6] (6,7] (7,8] (8,9] (9,10] NA's
2798 366 172 26 45 5 11 18 25 31 17762728
How can I then incorporate these bins into my data.frame, so that I can plot it with ggplot2 using specific colors?
Thanks
Assa
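A sketch of storing the bins as a factor column and giving each bin a fixed color in ggplot2 via `scale_fill_manual()`. The tiny data frame only mimics the column names of `filtered1000`, and the breaks and colors are arbitrary choices for illustration:

```r
library(ggplot2)

# Small made-up data frame with the same columns as filtered1000
df <- data.frame(start   = c(1, 1, 2, 2),
                 start.1 = c(1, 2, 1, 2),
                 Value   = c(0.5, 2.5, 5.5, 9.5))

# Store the bins as a factor column next to the values
df$bin <- cut(df$Value, breaks = c(0, 1, 3, 6, 10))

# One color per level, named by the level labels produced by cut()
bin_colours <- c("(0,1]" = "grey90", "(1,3]" = "yellow",
                 "(3,6]" = "orange", "(6,10]" = "red")

# Tiles positioned by start / start.1, filled by bin
p <- ggplot(df, aes(x = start, y = start.1, fill = bin)) +
  geom_tile() +
  scale_fill_manual(values = bin_colours, drop = FALSE)

print(p)
```

`drop = FALSE` keeps every bin in the legend even if some bins contain no values.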
Thanks. If I use
findInterval(filtered1000[,3], 0:10)
I get a vector of bin numbers and a lot of NAs. Can I then just plot them as if they were the third column? How do I add them to my plot/data.frame, so that I can plot it with ggplot2?
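Note that `findInterval()` and `cut()` do not close the intervals on the same side: `cut()` uses (a,b] by default, while `findInterval()` uses [a,b). Also, `seq(0:10)` evaluates to `1:11`, not `0:10`, which shifts every bin number by one. A small check on made-up values that sit exactly on the breakpoints:

```r
x <- c(0.5, 1, 1.5, 9.9, 10, NA)   # made-up values, including exact breakpoints

# cut() uses right-closed intervals (a,b] and returns a factor
by_cut <- cut(x, breaks = 0:10)

# findInterval() uses left-closed intervals [a,b) by default and returns integers
by_find_default <- findInterval(x, 0:10)

# left.open = TRUE switches findInterval() to (a,b], matching cut()
by_find_left_open <- findInterval(x, 0:10, left.open = TRUE)

print(data.frame(x, by_cut, by_find_default, by_find_left_open))
```

With `left.open = TRUE`, the integer bin numbers agree with the factor codes from `cut()`, so both approaches produce the same grouping.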
Just add it to your data.frame using
filtered1000$bins <- findInterval(filtered1000[, 3], 0:10)
Oh man, sometimes it is sooooo easy.
Thanks
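One caveat with adding the `findInterval()` result as a column: it is an integer vector, so ggplot2 maps it to a continuous color gradient rather than to one color per bin. A sketch of turning the bin numbers into a labelled factor first; the interval labels and the use of `left.open = TRUE` (to match `cut()`) are choices made here for illustration:

```r
x <- c(0.4, 2.7, 2.9, 8.1, NA)   # made-up values standing in for filtered1000[, 3]
df <- data.frame(Value = x)

# Integer bin numbers (1 to 10), as returned by findInterval()
df$bins <- findInterval(df$Value, 0:10, left.open = TRUE)

# Turn them into a factor with readable labels so that a plotting
# library treats them as discrete categories rather than a continuous scale
bin_labels <- paste0("(", 0:9, ",", 1:10, "]")
df$bins <- factor(df$bins, levels = 1:10, labels = bin_labels)

print(df)
print(table(df$bins))
```

Once `bins` is a factor, `scale_fill_manual()` or `scale_colour_manual()` can assign one color per level.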