
I realize there have been several posts from people asking how to plot two histograms together in R, either side by side (one plot with the bars next to each other) or overlaid, and also how to normalize the data. Following the advice I've found, I can do one or the other, but not both.

Here's the setup. I have two data frames of different lengths and would like to plot the volume of the objects in each data frame as a histogram, e.g. how many objects in data frame 1 are between 0.1 and 0.2 um^3 compared with how many in data frame 2 fall in the same range, and so on. Either overlaid or side by side would work.

Since there are more measurements in one data frame than the other, obviously I have to normalize, so I use:

read.csv(ctl)
read.csv(exp)
h1=hist(ctl$Volume....)
h2=hist(exp$Volume....)

#to normalize#

h1$density=h1$counts/sum(h1$counts)*100
plot(h1,freq=FALSE....)
h2$density=h2$counts/sum(h2$counts)*100
plot(h2,freq=FALSE....)

Now I've been successful overlaying the un-normalized data using this method: http://www.r-bloggers.com/overlapping-histogram-in-r/ and also with this method: plotting two histograms together

but I'm stuck when it comes to overlaying the normalized data.

  • What do you mean by "side by side"? Two different plots next to each other (par(mfrow=c(1,2))) or one plot with 2 different bars next to each other?
    – James
    Commented Mar 26, 2015 at 20:01
  • One plot with two different bars, sorry for being unclear.
    – Harry B
    Commented Mar 26, 2015 at 20:26
  • I don't know what $Volume is, but I assume it is the vector that you want to normalize. This is very janky, but make a new vector in your data frame with ctl$density <- ctl$Volume / max(ctl$Volume). Now make a histogram from that: h1 <- hist(ctl$density). Do the same for the other data set and follow the directions on the website you posted.
    – James
    Commented Mar 26, 2015 at 21:53

1 Answer

ggplot2 makes it relatively straightforward to plot normalized histograms of groups with unequal size. Here's an example with fake data:

library(ggplot2)

# Fake data (two normal distributions)
set.seed(20)
dat1 = data.frame(x=rnorm(1000, 100, 10), group="A")
dat2 = data.frame(x=rnorm(2000, 120, 20), group="B")
dat = rbind(dat1, dat2)

ggplot(dat, aes(x, fill=group, colour=group)) +
  geom_histogram(breaks=seq(0,200,5), alpha=0.6, 
                 position="identity", lwd=0.2) +
  ggtitle("Unnormalized")

ggplot(dat, aes(x, fill=group, colour=group)) +
  geom_histogram(aes(y=..density..), breaks=seq(0,200,5), alpha=0.6, 
                 position="identity", lwd=0.2) +
  ggtitle("Normalized")

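For comparison with the base-R approach in the question, here is a minimal sketch of the same normalization without ggplot2. It assumes both samples are binned with one shared set of breaks; the vectors a and b are fake stand-ins for ctl$Volume and exp$Volume, and the bin width of 5 is arbitrary.

```r
# Fake data (hypothetical stand-ins for ctl$Volume and exp$Volume)
set.seed(20)
a <- rnorm(1000, 100, 10)
b <- rnorm(2000, 120, 20)

# Shared breaks so the bins of both groups line up;
# the range is padded outwards to multiples of 5 so it covers all values
brks <- seq(floor(min(a, b) / 5) * 5, ceiling(max(a, b) / 5) * 5, by = 5)

# Bin both samples without drawing anything yet
h1 <- hist(a, breaks = brks, plot = FALSE)
h2 <- hist(b, breaks = brks, plot = FALSE)

# Percent of each group that falls in each bin (each vector sums to 100)
pct1 <- 100 * h1$counts / sum(h1$counts)
pct2 <- 100 * h2$counts / sum(h2$counts)

# One plot with the two groups' bars next to each other
barplot(rbind(A = pct1, B = pct2), beside = TRUE,
        names.arg = h1$mids, legend.text = TRUE,
        xlab = "x", ylab = "Percent of group")
```

Because each group is divided by its own total, the unequal sample sizes (1000 vs. 2000) no longer affect the bar heights.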

If you want to make overlaid density plots, you can do that as well. adjust controls the bandwidth: values below 1 give a less smooth curve, values above 1 a smoother one. Density plots are already normalized by default.

ggplot(dat, aes(x, fill=group, colour=group)) +
  geom_density(alpha=0.4, lwd=0.8, adjust=0.5) 

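To see what adjust does numerically, here is a small sketch using stats::density() directly. It assumes geom_density hands adjust to stats::density() unchanged, as the ggplot2 documentation describes; the data are fake.

```r
# Fake data, one group only
set.seed(20)
x <- rnorm(1000, 100, 10)

# Kernel density estimates with the default bandwidth and with half of it
d_default <- density(x)
d_half    <- density(x, adjust = 0.5)

# adjust is a multiplier on the automatically chosen bandwidth
bw_ratio <- d_half$bw / d_default$bw

# Approximate area under the curve (Riemann sum on the evaluation grid);
# it stays close to 1 whatever the bandwidth, which is why no extra
# normalization is needed for groups of different size
area <- sum(d_half$y) * diff(d_half$x[1:2])
```

Smaller values of adjust follow the data more closely, larger values smooth more, and the area under each curve stays approximately 1 either way.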

UPDATE: In answer to your comment, the following code should do it. (..density..)/sum(..density..) is computed over both groups at once, so the values for the two histograms together add up to 1 and, because all bins have the same width, the values for each individual group add up to 0.5. So you have to multiply by 2 for each group to be individually normalized to 1. In general, you have to multiply by n, where n is the number of groups. This seems kind of kludgy and there may be a more elegant approach.

library(scales) # For percent_format()

ggplot(dat, aes(x, fill=group, colour=group)) +
  geom_histogram(aes(y=2*(..density..)/sum(..density..)), breaks=seq(0,200,5), alpha=0.6, 
                 position="identity", lwd=0.2) +
  scale_y_continuous(labels=percent_format())

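A possible alternative to multiplying by the number of groups is to multiply each bin's density by its width, which gives the share of that group in the bin directly. The sketch below is not part of the original answer and rests on two assumptions: ggplot2 3.3.0 or later (for after_stat()), and the width variable that stat_bin computes alongside density. It also uses position="dodge" to draw the two groups' bars next to each other, as asked in the question's comments.

```r
library(ggplot2)
library(scales) # For percent_format()

# Same fake data as above
set.seed(20)
dat <- rbind(
  data.frame(x = rnorm(1000, 100, 10), group = "A"),
  data.frame(x = rnorm(2000, 120, 20), group = "B")
)

# density * width = fraction of the group's observations in each bin,
# so each group sums to 1 regardless of how many groups there are
p <- ggplot(dat, aes(x, fill = group)) +
  geom_histogram(aes(y = after_stat(density * width)),
                 breaks = seq(0, 200, 5), position = "dodge") +
  scale_y_continuous(labels = percent_format()) +
  ylab("Share of group")
p
```

Swapping position="dodge" for position="identity" (plus an alpha value) gives the overlaid version with the same percentage axis.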

  • Worked excellently, thank you very much. The only additional question I have is whether it would be possible to have the y axis of the normalized ggplot represent a percentage rather than a probability density?
    – Harry B
    Commented Mar 27, 2015 at 14:25
