
I am trying to show both cumulative and non-cumulative distributions on the same plot.

import matplotlib.pyplot as plt  # x and n_bins come from the sample data below

fig, ax = plt.subplots(figsize=(10, 5))

n, bins, patches = ax.hist(x, n_bins, density=True, stacked=True, histtype='step',
                           cumulative=True, label='Empirical cumulative')

# Overlay a non-cumulative histogram.
ax.hist(x, bins=bins, density=True, stacked=True, histtype='step', cumulative=False, label='Empirical non-cumulative')

plt.show()

The Empirical cumulative curve looks fine and its values do not exceed 1. However, the Empirical non-cumulative curve has y values greater than 1. How can I normalize it?

Update:

Sample data:

n_bins = 20
x = [
 0.0051055006412772065,
 0.09770815865459548,
 0.20666651037049322,
 0.5433266733820051,
 0.5717169069724539,
 0.5421114013759187,
 0.4994941193115986,
 0.4391978276380223,
 0.3673067648294034,
 0.3150259778098451,
 0.4072059689437963,
 0.5781929593356039,
 0.6494934859266276,
 0.620882081680377,
 0.5845829440637116,
 0.515705471234385] 

Please see the orange curve.

  • The y-axis of the non-cumulative plot shows a "density". Its height depends very strongly on the units of the x-axis. See "Normed histogram y-axis larger than 1" for an explanation.
    – JohanC
    Commented Jun 1, 2021 at 11:26
  • @JohanC: Thanks for your comment. I added the requested data to my post.
    – Fluxy
    Commented Jun 1, 2021 at 11:26
  • @JohanC: How can I switch to a PDF, so that the orange curve shows the normalized non-cumulative probability distribution?
    – Fluxy
    Commented Jun 1, 2021 at 11:28
  • @JohanC: Yeah, I would like to understand whether it is possible to normalize it. Otherwise, I find it difficult to interpret.
    – Fluxy
    Commented Jun 1, 2021 at 11:32
  • The easiest way to show 'probability' instead of 'probability density' would be to use seaborn with sns.histplot(..., stat='probability').
    – JohanC
    Commented Jun 1, 2021 at 11:36
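To see why the non-cumulative curve climbs above 1, here is a small sketch (not from the thread) that compares the two normalizations for the sample data above using np.histogram. With density=True each bin height is count / (N * bin_width), so the heights multiplied by the bin widths sum to 1; with bins about 0.032 wide, individual heights must be large. Dividing by N alone gives per-bin probabilities whose heights sum to 1.

```python
import numpy as np

# Sample data from the question.
x = np.array([
    0.0051055006412772065, 0.09770815865459548, 0.20666651037049322,
    0.5433266733820051, 0.5717169069724539, 0.5421114013759187,
    0.4994941193115986, 0.4391978276380223, 0.3673067648294034,
    0.3150259778098451, 0.4072059689437963, 0.5781929593356039,
    0.6494934859266276, 0.620882081680377, 0.5845829440637116,
    0.515705471234385,
])
n_bins = 20

counts, edges = np.histogram(x, bins=n_bins)
widths = np.diff(edges)

# What density=True plots: count / (N * bin width).
density = counts / (counts.sum() * widths)
# What a "probability" histogram plots: count / N.
probability = counts / counts.sum()

print(density.max())             # a density value, not bounded by 1
print((density * widths).sum())  # area under the density curve, 1 up to rounding
print(probability.sum())         # probability heights sum to 1 up to rounding
```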

1 Answer

The easiest way to create a histogram with probability instead of probability density is to use seaborn's sns.histplot(..., stat='probability').
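A related shortcut that stays within ax.hist (swapped in here, not part of the original answer) is to drop density=True and give every sample a weight of 1/len(x). Each bin height is then the fraction of samples in that bin, which for a single dataset is what stat='probability' reports, and the cumulative version ends at 1. A minimal sketch, assuming one dataset and placeholder random data:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0, 1, 1000)             # placeholder data; substitute your own
n_bins = 20
weights = np.full(len(x), 1 / len(x))  # each sample contributes 1/N

fig, ax = plt.subplots(figsize=(10, 5))
# Cumulative: the weighted running sum, so the last step reaches 1.
cum, bins, _ = ax.hist(x, n_bins, weights=weights, histtype='step',
                       cumulative=True, label='Empirical cumulative')
# Non-cumulative: per-bin probabilities on the same bins; they sum to 1.
prob, _, _ = ax.hist(x, bins=bins, weights=weights, histtype='step',
                     label='Empirical non-cumulative')
ax.set_ylabel('probability')
ax.legend()
plt.show()
```

This does not reproduce stacked=True across several datasets; for that case the manual approach is still needed.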

To mimic stat='probability' with standard matplotlib, you could compute all values manually with np.histogram and draw them with step(). For example, with three stacked distributions:

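import matplotlib.pyplot as plt
import numpy as np

n_bins = 20
x = np.random.normal(0, 1, (1000, 3))  # three distributions, one per column
bin_edges = np.linspace(x.min(), x.max(), n_bins + 1)  # bin edges shared by all
# per-distribution counts, shape (3, n_bins)
bin_values = np.array([np.histogram(x[:, i], bins=bin_edges)[0] for i in range(x.shape[1])])
# accumulate over the bins (axis=1), then stack the distributions (axis=0)
cum_values = bin_values.cumsum(axis=1).cumsum(axis=0)
cum_values = cum_values / cum_values.max()  # the top curve ends at 1

fig, ax = plt.subplots(figsize=(10, 5))
step_x = np.append(bin_edges, bin_edges[-1])  # extra point for the final vertical drop
prev = 0
for c in cum_values:
    # cumulative curves (solid), each dropping to the curve below it
    ax.step(step_x, np.concatenate([[0], c, [prev]]))
    prev = c[-1]

ax.set_prop_cycle(None)  # restart the colors so each dashed curve matches its solid one
prev = 0
for c in cum_values:
    # per-bin (non-cumulative) values; prepend=0 keeps the first bin
    c = np.diff(c, prepend=0)
    ax.step(step_x, np.concatenate([[0], c, [prev]]), ls='--')
    prev = c[-1]

plt.show()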

[Figure: histogram with probability instead of density]
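To make the stacking arithmetic in the snippet above concrete, here is a tiny hand-checkable sketch with two hypothetical distributions of three bins each. cumsum(axis=1) accumulates across bins, cumsum(axis=0) stacks each distribution on top of the previous ones, and dividing by the maximum scales the top curve to end at 1. Recovering the per-bin values needs np.diff(..., prepend=0); a plain np.diff returns one value fewer and loses the first bin.

```python
import numpy as np

# Hypothetical per-bin counts: 2 distributions x 3 bins (8 samples in total).
bin_values = np.array([[1, 2, 1],
                       [0, 3, 1]])

cum_values = bin_values.cumsum(axis=1).cumsum(axis=0)
# after cumsum(axis=1): [[1, 3, 4], [0, 3, 4]]
# after cumsum(axis=0): [[1, 3, 4], [1, 6, 8]]
cum_values = cum_values / cum_values.max()  # divide by 8, the total count

# Stacked per-bin probabilities; prepend=0 keeps the first bin.
per_bin = np.diff(cum_values, axis=1, prepend=0)
print(per_bin * 8)  # rows [1, 2, 1] and [1, 5, 2]
```

The second row equals bin_values[0] + bin_values[1], i.e. the stacked height of each bin.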

If you have just one distribution, stacked=True makes no difference, and the code becomes simpler:

import matplotlib.pyplot as plt
import numpy as np

n_bins = 20
x = np.random.normal(0, 1, 1000)
bin_edges = np.linspace(x.min(), x.max(), n_bins + 1)
bin_values = np.histogram(x, bins=bin_edges)[0]
cum_values = bin_values.cumsum()
cum_values = cum_values / cum_values.max()  # cumulative probability, ends at 1

fig, ax = plt.subplots(figsize=(10, 5))
step_x = np.append(bin_edges, bin_edges[-1])  # extra point for the final drop to 0
ax.step(step_x, np.concatenate([[0], cum_values, [0]]))

ax.set_prop_cycle(None)  # reuse the same color for the dashed curve
c = np.diff(cum_values, prepend=0)  # per-bin probability; prepend=0 keeps the first bin
ax.step(step_x, np.concatenate([[0], c, [0]]), ls='--')

plt.show()
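On matplotlib 3.4 or newer, ax.stairs(values, edges) draws a step outline directly from n_bins values and n_bins + 1 edges, so the padded step arrays do not have to be built by hand. A sketch of the single-distribution case under that version assumption, again with placeholder random data:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove for interactive use
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0, 1, 1000)  # placeholder data
n_bins = 20

counts, bin_edges = np.histogram(x, bins=n_bins)
prob = counts / counts.sum()  # per-bin probability, sums to 1
cum_prob = np.cumsum(prob)    # cumulative probability, ends at 1

fig, ax = plt.subplots(figsize=(10, 5))
# stairs draws an unfilled outline that closes down to baseline=0 at both ends
ax.stairs(cum_prob, bin_edges, label='Empirical cumulative')
ax.stairs(prob, bin_edges, ls='--', label='Empirical non-cumulative')
ax.legend()
plt.show()
```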
  • Your code works, but I get a weird result like the one shown in my post. Could you please tell me how to draw one cumulative and one non-cumulative curve on the same plot?
    – Fluxy
    Commented Jun 1, 2021 at 12:58
  • The code was meant for multiple distributions, as you were explicitly setting stacked=True. I'm updating with a simpler example.
    – JohanC
    Commented Jun 1, 2021 at 13:03
