Multiple Histograms In One Plot
10.8 years ago
Assa Yeroslaviz ★ 1.8k

Hi there,

I would like to combine several histograms into one plot, but keep the conditional coloring I am using in the single histograms.

This is how I create the single histograms:

tmp <- hist(temp$insertSize, breaks = 100, plot = FALSE)
hist(temp$insertSize, breaks = 100, col = ifelse(tmp$breaks <= 600, "blue", "red"), labels = TRUE, main = as.name(i))

This is what it looks like: [image not available]

I would like to plot two data sets in one plot, while somehow keeping the conditional coloring used above.

Is there something similar using barplot in R? Thanks for the help.

Assa


I prefer to use density plots. Try this:

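# Make dummy data
dat <- rnorm(1000)
extra_dat <- rnorm(1000)
# Estimate both densities first so the y-axis can be sized to fit both curves
d1 <- density(dat)
d2 <- density(extra_dat)
# Plot
plot(d1, col = "blue", ylim = range(0, d1$y, d2$y))
lines(d2, col = "red")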

This is not at all what I want to show. I don't need the distribution, but the actual counts.


The title of the question mentions histograms, so I assumed you wanted the distribution.


Note that taking the actual numbers from a histogram can be misleading. The counts depend heavily on the bin width, and if the bins are too wide you risk merging two or more different distributions into one.
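To see how strongly the counts depend on the binning, here is a small sketch that bins the same simulated data (made-up standard normal values, not the poster's insert sizes) once with few and once with many breaks. Note that a single number passed as `breaks` is only a suggestion; R picks "pretty" break points close to it.

```r
# Simulated data; any numeric vector would do
set.seed(1)
x <- rnorm(1000)

# Same data, two bin widths; plot = FALSE returns the bins without drawing
coarse <- hist(x, breaks = 5, plot = FALSE)
fine <- hist(x, breaks = 50, plot = FALSE)

# The total number of observations is identical...
sum(coarse$counts)
sum(fine$counts)

# ...but the number of bins and the height of the tallest bar differ a lot
length(coarse$counts)
length(fine$counts)
max(coarse$counts)
max(fine$counts)
```

So raw bar heights are only comparable between data sets that were binned with the same breaks.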


Passing add=T in the next call to hist() will add the second histogram to the same plotting area. But so far this does not look like a good way of displaying the distribution; I'd consider removing the 0-size inserts, using kernel density estimates, transforming your data, or some combination of the three.
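A minimal sketch of the add=T approach that keeps the conditional coloring from the question. It assumes two simulated data sets (`a` and `b`, made-up exponential "insert sizes") and the 600 cutoff from the question; the shared `brks` vector, the second color pair and the transparency level are illustrative choices, not something from the original posts.

```r
# Two hypothetical data sets standing in for the insert sizes of two samples
set.seed(42)
a <- rexp(2000, rate = 1 / 400)
b <- rexp(2000, rate = 1 / 900)

# Shared break points so both histograms use exactly the same bins
brks <- seq(0, max(a, b) + 100, by = 100)
ha <- hist(a, breaks = brks, plot = FALSE)
hb <- hist(b, breaks = brks, plot = FALSE)

# Conditional coloring as in the question (left bin edge <= 600 vs. above),
# made semi-transparent with adjustcolor() so overlapping bars stay visible
left <- head(brks, -1)
cols_a <- adjustcolor(ifelse(left <= 600, "blue", "red"), alpha.f = 0.4)
cols_b <- adjustcolor(ifelse(left <= 600, "darkgreen", "orange"), alpha.f = 0.4)

# Draw the first histogram, then overplot the second one with add = TRUE
plot(ha, col = cols_a, ylim = c(0, max(ha$counts, hb$counts)),
     main = "Two data sets, shared bins")
plot(hb, col = cols_b, add = TRUE)
```

Using the same breaks for both calls matters: otherwise the bars of the two histograms do not line up and the colors no longer refer to the same bins.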


No, I can't, as the 0s are important for the results (it's a long story) :-)


You can still make a useful plot though, e.g. with a broken y-axis that jumps from ~50 to ~2100, unless the only point of the plot is to emphasise that there's a lot of 0s.


OK, I can do that. But what about combining the histograms?


Use add=T in your subsequent call to hist().


With add=T it creates a stacked bar plot. I would like the bars of each group/data set to be next to each other.


No, it overplots a second histogram on the same axes (the bars aren't stacked, just drawn on top of each other). It seems what you're really asking for is a simple barplot (not a histogram) with beside=TRUE.
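A sketch of the barplot(beside=TRUE) route while keeping a conditional coloring per bin. Everything here is an assumption for illustration: `a` and `b` are simulated data sets, the bin width of 200 is arbitrary, the 600 cutoff comes from the question, and the light colors for the second data set are just one way of telling the two apart.

```r
# Hypothetical example data: insert sizes of two data sets
set.seed(7)
a <- rexp(1000, rate = 1 / 400)
b <- rexp(1000, rate = 1 / 900)

# Bin both data sets with the same breaks; plot = FALSE only returns the counts
brks <- seq(0, max(a, b) + 200, by = 200)
counts <- rbind(
  A = hist(a, breaks = brks, plot = FALSE)$counts,
  B = hist(b, breaks = brks, plot = FALSE)$counts
)
colnames(counts) <- head(brks, -1)  # label each group of bars by its left bin edge

# With a matrix and beside = TRUE, barplot() draws the rows (data sets) next to
# each other within every column (bin). Colors are used bar by bar in
# column-major order (A1, B1, A2, B2, ...), hence rbind() + as.vector() below.
below <- head(brks, -1) <= 600
bar_cols <- as.vector(rbind(ifelse(below, "blue", "red"),        # data set A
                            ifelse(below, "lightblue", "pink"))) # data set B
barplot(counts, beside = TRUE, col = bar_cols, las = 2,
        xlab = "left bin edge", ylab = "count")
legend("topright", fill = c("blue", "red", "lightblue", "pink"),
       legend = c("A <= 600", "A > 600", "B <= 600", "B > 600"))
```

Because the counts come from hist() with shared breaks, this is still a binned count per data set; only the drawing is done by barplot().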


Important or not, the plot in its current form doesn't say much. I suggest cutting the y-axis with ylim=c(0,100) and adding a text box that shows the number of zero values.
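A rough sketch of that suggestion in base R. The data are simulated (a block of zero-size inserts plus a made-up exponential tail), and the variable names `x`, `h` and `n_zero` are hypothetical; the 600 cutoff and the 100-count cap follow the thread.

```r
# Hypothetical data: many zero-size inserts plus a long tail of positive sizes
set.seed(3)
x <- c(rep(0, 2100), rexp(1500, rate = 1 / 3000))

h <- hist(x, breaks = 100, plot = FALSE)
n_zero <- sum(x == 0)

# Cap the y-axis at 100 so the very tall zero bin no longer squashes the other
# bars; anything taller is simply clipped at the top of the plotting region
plot(h, col = ifelse(head(h$breaks, -1) <= 600, "blue", "red"),
     ylim = c(0, 100), main = "Insert sizes (y-axis truncated at 100)")

# Report the hidden count in a text box inside the plot
legend("topright", legend = paste("zero-size inserts:", n_zero))
```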


Well, to be honest, it does! The idea is to show that some data sets have very few hits in the bigger bins (15000 onwards), while other data sets show many more hits on the right-hand side. This is why I would like to plot them together but keep the colors (or use completely different colors).


As it stands this is an R programming question. Please explain the relevance to a bioinformatics research problem.

10.8 years ago
Woa ★ 2.9k

You can use ggplot2 to make this kind of histogram: [image not available]

The data file needs two columns: one with the values to plot (say 'dat') and one with a category such as "A" or "B" that defines how many histograms are drawn (say 'catg').

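library("ggplot2")
my.df <- read.table("data_category.txt", header = TRUE)
# geom_histogram bins the continuous 'dat' values; "dodge" puts the bars of each category side by side
ggplot(my.df, aes(x = dat, color = catg, fill = catg)) + geom_histogram(position = "dodge")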
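If you don't have a file at hand, the same idea can be tried on an in-memory data frame. This is a sketch: the data frame below is a made-up stand-in for data_category.txt (two normal samples labelled "A" and "B"), and the binwidth of 0.5 is an arbitrary choice.

```r
library(ggplot2)

# Hypothetical stand-in for data_category.txt: a numeric column 'dat' and a
# category column 'catg' with one level per histogram
set.seed(5)
my.df <- data.frame(
  dat  = c(rnorm(300, mean = 4), rnorm(300, mean = 6)),
  catg = rep(c("A", "B"), each = 300)
)

# One histogram per category; position = "dodge" draws the bars of each bin side by side
p <- ggplot(my.df, aes(x = dat, fill = catg)) +
  geom_histogram(binwidth = 0.5, position = "dodge")
print(p)
```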
10.8 years ago

I would like to elaborate a bit on Woa's answer.

Let's imagine you have the following dataset:

> set.seed(2)
> d = data.frame("B1"=rnorm(100),"B2"=rnorm(100), "B3"=rnorm(100), "B4"=rnorm(100), "B5"=rnorm(100), "B6"=rnorm(100), "B7"=rnorm(100), "B8"=rnorm(100))
> d$id = row.names(d)
> head(d)
           B1         B2         B3         B4         B5         B6         B7          B8 id
1 -0.89691455  1.0744594  0.2979836 -0.3181198 -0.2140756 -0.4597894 -1.1150718  1.23874433  1
2  0.18484918  0.2605978 -1.0195522 -0.3154903 -2.7218162  0.6179261 -0.1142184  0.23189621  2
3  1.58784533 -0.3142720  2.8708974  0.8843223 -1.0142618 -0.7204224 -0.8946214 -0.31443788  3
4 -1.13037567 -0.7496301  0.2187100 -1.8854213 -0.8291451 -0.5835119 -0.6540889  1.49970370  4
5 -0.08025176 -0.8621983 -0.9665543  0.7321793  0.8577089  0.2163245  1.1787163  0.06957437  5
6  0.13242028  2.0480403  0.3838382  0.7905447 -0.2385101  1.2449912  0.9515165  1.33403372  6

To plot a histogram of a column using ggplot, you can use the qplot function:

> library(ggplot2)
> qplot(B1, data=d, geom='histogram')

To plot multiple histograms, you can add a geom_histogram for each property:

> qplot(B1, data=d, geom='histogram', fill=I('green')) + geom_histogram(aes(B2), data=d, fill='red')

Since it would be impractical to add a new geom_histogram for each column, you can melt the dataframe, transforming it to a long format:

> library(reshape2)
> d.long = melt(d, id.vars='id')
> head(d.long)
  id variable       value
1  1       B1 -0.89691455
2  2       B1  0.18484918
3  3       B1  1.58784533
4  4       B1 -1.13037567
5  5       B1 -0.08025176
6  6       B1  0.13242028

Note how the long format is structured. All the values are stored in the "value" column, the "variable" column keeps track of the original columns, and each data point is identified by a unique id.

Transforming your dataset to a long format is an essential step for plotting multiple distributions together. Many R tools, such as ggplot2, and functions like anova expect the data in long format. Now that you have a dataset in long format, you can plot all the histograms in a single statement:

> qplot(value, fill=variable, data=d.long, geom='histogram')
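If you would rather not depend on reshape2 for the reshaping step, base R's stack() produces a very similar long format. A minimal sketch, assuming a data frame with the same B1..B8 layout as above but without the id column (stack() would otherwise stack the character id column too); the renaming to "value"/"variable" is only there to match the melt() output.

```r
# Same layout as the example above: 8 columns B1..B8 of 100 random values
set.seed(2)
d <- as.data.frame(replicate(8, rnorm(100)))
names(d) <- paste0("B", 1:8)

# stack() turns the columns into two: 'values' (the numbers) and
# 'ind' (a factor recording which original column each value came from)
d.long <- stack(d)
names(d.long) <- c("value", "variable")  # match the column names used by melt()

head(d.long)
table(d.long$variable)  # 100 values per original column
```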

If you look at the documentation for geom_histogram, you will see that there are several ways to arrange the histograms. For example, position='dodge' places the bars of the different columns side by side within each bin:

> qplot(value, fill=variable, data=d.long, position='dodge')


In my opinion, if there are too many columns, it is better to use the density geom instead of the histogram, with some transparency:

> qplot(value, fill=variable, data=d.long, geom='density', alpha=I(0.3))

If there are too many columns, one alternative is to plot some histograms on the negative y axis:

> qplot(value, fill=variable, data=subset(d.long, variable %in% c("B1", "B2", "B3", "B4")), geom='density', alpha=I(0.2)) + geom_density(aes(y=-..density..), alpha=0.2, data=subset(d.long, variable %in% c("B5", "B6", "B7", "B8")))

# histogram version:
> qplot(value, fill=variable, data=subset(d.long, variable %in% c("B1", "B2", "B3", "B4")), position='dodge', geom='histogram', alpha=I(0.2)) + geom_histogram(aes(y=-..count..), position='dodge', alpha=0.2, data=subset(d.long, variable %in% c("B5", "B6", "B7", "B8")))
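The same mirrored layout can be written with ggplot() instead of qplot(). A sketch, assuming ggplot2 3.3.0 or later (for after_stat(), the newer spelling of ..density..) and a simulated d.long with the same value/variable columns as above; `top` and `bottom` are hypothetical helper names.

```r
library(ggplot2)

# Hypothetical long-format data with the same layout as d.long above
set.seed(2)
d.long <- data.frame(value = rnorm(800),
                     variable = rep(paste0("B", 1:8), each = 100))
top    <- subset(d.long, variable %in% c("B1", "B2", "B3", "B4"))
bottom <- subset(d.long, variable %in% c("B5", "B6", "B7", "B8"))

# Upper group drawn normally; lower group mirrored below zero by negating
# the computed density with after_stat()
p <- ggplot(mapping = aes(x = value, fill = variable)) +
  geom_density(data = top, alpha = 0.2) +
  geom_density(data = bottom, aes(y = after_stat(-density)), alpha = 0.2) +
  geom_hline(yintercept = 0)
print(p)
```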

Finally, another approach is to use faceting to plot each property in a different panel:

> qplot(value, fill=variable, facets=~variable, data=d.long)



I think this adds to the confusion: there's no such thing as a dodged/side-by-side histogram; it's a bar chart.


Very good comment. Thanks for sharing!


Your post was very useful. Thank you very much! Just to add a little info: to make the plot look transparent, I used the alpha argument:

qplot(value, fill=variable, alpha=I(.5), data=d.long, geom='density')
10.8 years ago
Woa ★ 2.9k

Alternatively, you can overplot histograms in base R using semi-transparent colors (the alpha channel of rgb()):

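p1 <- hist(rnorm(500, 4), plot = FALSE)   # centered at 4; plot = FALSE only computes the bins
p2 <- hist(rnorm(500, 6), plot = FALSE)   # centered at 6
ymax <- max(p1$counts, p2$counts)         # leave room for the taller of the two
plot(p1, col = rgb(0, 0, 1, 1/4), xlim = c(0, 10), ylim = c(0, ymax))  # first histogram, transparent blue
plot(p2, col = rgb(1, 0, 0, 1/4), xlim = c(0, 10), add = TRUE)         # second, transparent red, overplotted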

Taken from here: http://stackoverflow.com/questions/3541713/how-to-plot-two-histograms-together-in-r


You can experiment with different colors and transparency (alpha), which you probably already know:

rgb(red = 188, green = 143, blue = 143, alpha = 90, maxColorValue = 255)  # Rosy Brown
[1] "#BC8F8F5A"
rgb(red = 199, green = 21, blue = 133, alpha = 90, maxColorValue = 255)   # Medium Violet Red
[1] "#C715855A"
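If you start from R's named colors rather than RGB values, adjustcolor() from grDevices should give the same semi-transparent codes. A small sketch, assuming the colors meant above are R's built-in "rosybrown" and "mediumvioletred" (which have exactly these RGB values); alpha.f = 90/255 corresponds to alpha = 90 on the 0-255 scale.

```r
# adjustcolor() takes existing colors (names or hex codes) and scales their alpha
rosy <- adjustcolor("rosybrown", alpha.f = 90 / 255)
violet <- adjustcolor("mediumvioletred", alpha.f = 90 / 255)
rosy
violet

# It is vectorized, so a whole conditional color vector can be made transparent
adjustcolor(ifelse(c(100, 500, 700) <= 600, "blue", "red"), alpha.f = 0.25)
```

The vectorized form is handy for the question's case: build the ifelse() color vector first, then make it transparent in one call before overplotting.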
