2

I'm using pandas in Jupyter to draw the counts of one field (bar plot) and the average of another field (line plot) in one figure. My data is in one data frame, and it renders OK if I just plot the data frame directly. However, I want the line graph to use a secondary y-axis while sharing the x-axis, so I am using the following code:

mobs_by_cr = data_frame.groupby("cr").agg({'hp': np.mean, 'cr': np.size})
ax = mobs_by_cr["cr"].plot(kind="bar", colormap='Paired')
mobs_by_cr["hp"].plot(kind="line", ax=ax, secondary_y=True)

If I graph either of those columns by itself, it lines up correctly with the x-axis. But when I try to get them both on the same figure by passing in ax=ax, they are misaligned.

[image: mis-aligned plot]

Looking at the data, the dip in the line graph should be at 18 on the x-axis, not at 15.

                hp    cr
cr                      
0.000     3.848485  33.0
0.125     8.166667  24.0
0.250    14.522727  44.0
0.500    20.025000  40.0
1.000    28.710526  38.0
2.000    43.126984  63.0
3.000    59.205882  34.0
4.000    74.650000  20.0
5.000    96.114286  35.0
6.000   105.823529  17.0
7.000   111.090909  11.0
8.000   114.285714  14.0
9.000   149.700000  10.0
10.000  154.750000   8.0
11.000  178.700000  10.0
12.000  128.000000   5.0
13.000  173.333333   9.0
14.000  185.200000   5.0
15.000  175.166667   6.0
16.000  213.400000   5.0
17.000  252.428571   7.0
18.000   80.000000   1.0
19.000  262.000000   1.0
20.000  310.000000   3.0
21.000  273.750000   4.0
22.000  414.500000   2.0
23.000  438.250000   4.0
24.000  546.000000   2.0
30.000  676.000000   1.0

The data: 'cr,hp,cr\n0.0,3.8484848484848486,33.0\n0.125,8.166666666666666,24.0\n0.25,14.522727272727273,44.0\n0.5,20.025,40.0\n1.0,28.710526315789473,38.0\n2.0,43.12698412698413,63.0\n3.0,59.205882352941174,34.0\n4.0,74.65,20.0\n5.0,96.11428571428571,35.0\n6.0,105.82352941176471,17.0\n7.0,111.0909090909091,11.0\n8.0,114.28571428571429,14.0\n9.0,149.7,10.0\n10.0,154.75,8.0\n11.0,178.7,10.0\n12.0,128.0,5.0\n13.0,173.33333333333334,9.0\n14.0,185.2,5.0\n15.0,175.16666666666666,6.0\n16.0,213.4,5.0\n17.0,252.42857142857142,7.0\n18.0,80.0,1.0\n19.0,262.0,1.0\n20.0,310.0,3.0\n21.0,273.75,4.0\n22.0,414.5,2.0\n23.0,438.25,4.0\n24.0,546.0,2.0\n30.0,676.0,1.0\n'

4
  • Your xaxis data is not equally spaced numerically. How exactly do you want your plot to show? Commented Jul 8, 2018 at 12:22
  • @ImportanceOfBeingErnest Yeah, the x-axis contains "categories", and while numerical, they're not evenly spaced. So I'd want the plot to show the value for the column directly above the value for the row label. E.g. for the row where CR = 18, the bar should go to "1" and the line should be at "80".
    – Jim
    Commented Jul 8, 2018 at 12:27
  • I don't have time to give an answer right now, but you would probably want to plot the line plot against its index, i.e. 0,1,2, etc. Commented Jul 8, 2018 at 12:54
  • @ImportanceOfBeingErnest Thanks to your tip, I have found a work-around. I'll post it as an answer, but I would like to do this with pure pandas if possible. I'm not sure why the line graph interprets the index as a "number line" while bar graph sees it as "categories".
    – Jim
    Commented Jul 8, 2018 at 14:01

2 Answers

4

A pandas bar graph is a categorical plot. This means that the values are plotted against their integer position (0, 1, 2, ...), independent of what the x values would show numerically. Judging from the comments above, this is what you would like to have.

A line plot is not categorical. It plots against the numeric index values. Putting both kinds of plots on the same axes therefore misaligns them, because the two x coordinates no longer mean the same thing. Also, there is no "categorical line plot" available.
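To make the difference concrete, here is a minimal sketch that draws the two kinds of plot on separate axes and reads back the x positions each one used. The frame is the same small example used in this answer; the variable names (`bar_ax`, `line_ax`, `bar_positions`, `line_x`) and the `Agg` backend are assumptions made only so the snippet runs on its own.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame(
    {"y1": [1, 2, 3], "y2": [300, 100, 275]},
    index=pd.Index([1, 2.75, 100], name="x"),
)

# Bar plot on its own axes: read back where pandas put the ticks (one per bar).
bar_ax = df["y1"].plot(kind="bar", ax=plt.figure().add_subplot())
bar_positions = [float(t) for t in bar_ax.get_xticks()]

# Line plot on separate axes: read back the x data of the drawn line.
line_ax = df["y2"].plot(kind="line", ax=plt.figure().add_subplot())
line_x = [float(v) for v in line_ax.get_lines()[0].get_xdata()]

print("bar positions:", bar_positions)
print("line x values:", line_x)
```

The bar positions are plain integer positions, while the line's x values are the index values themselves, which is why the two do not line up when drawn on shared axes.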

But of course you can plot the line by plotting the values against their integer index as well.

Suppose you have the following dataframe

import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({"x" : [1, 2.75, 100], "y1" : [1,2,3], "y2" : [300,100,275]})
df.set_index("x", inplace=True)
print(df) 

#         y1   y2
# x              
# 1.00     1  300
# 2.75     2  100
# 100.00   3  275

You may plot the bar graph of y1 as in the question. For the line plot, make x a proper column first (reset_index() does this), so that instead of plotting y2 against the x values, you plot it against the newly established integer index.

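# Bar plot: pandas places the bars at integer positions 0, 1, 2, ...
ax = df["y1"].plot(kind="bar")
# reset_index() swaps in the default 0, 1, 2, ... index, so the line uses the same positions
df.reset_index()["y2"].plot(kind="line", ax=ax, secondary_y=True)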

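Applied to the question's data, the same idea might look like the sketch below. It uses only four rows copied from the question's table (hp rounded to two decimals), and the `cr` count column is renamed to `count` for readability. That renaming, the `Agg` backend, and the names `line` and `dip_position` are assumptions for illustration. `ax.right_ax` is the attribute pandas sets on the original axes when a secondary y-axis is created.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Four rows from the question's table; "count" stands in for the second "cr" column.
mobs_by_cr = pd.DataFrame(
    {"hp": [252.43, 80.0, 262.0, 310.0], "count": [7.0, 1.0, 1.0, 3.0]},
    index=pd.Index([17.0, 18.0, 19.0, 20.0], name="cr"),
)

# Bars sit at positions 0..3 and are labelled with the cr values.
ax = mobs_by_cr["count"].plot(kind="bar")

# After reset_index() the line is plotted against 0..3 as well.
mobs_by_cr.reset_index()["hp"].plot(kind="line", ax=ax, secondary_y=True)

# Check which bar the dip in the line sits above.
line = ax.right_ax.get_lines()[0]
dip_position = int(line.get_ydata().argmin())
print("line x positions:", list(line.get_xdata()))
print("dip is above cr =", mobs_by_cr.index[dip_position])
```

The dip now falls at the same position as the bar for cr = 18, which is the alignment the question asks for.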

0

The following seems to work, although it requires digging into matplotlib to force the alignment on the line graph.

mobs_by_cr = data_frame.groupby("cr").agg({'hp': np.mean, 'cr': np.size})
mobs_by_cr.rename(columns={"cr":"count"}, inplace=True)
fig, ax = plt.subplots()
mobs_by_cr["count"].plot(kind="bar", ax=ax, colormap='Paired')
ax2 = ax.twinx()
ax2.plot(ax.get_xticks(), mobs_by_cr["hp"])

The result:

[image: resulting plot]
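For readers who want to try this without the original `data_frame`, here is a self-contained sketch of the same approach. It starts from already aggregated values (four rows copied from the question's table, hp rounded), so the `groupby` step is skipped. The axis labels, colour, marker and the `Agg` backend are illustrative assumptions, not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Already aggregated values: mean hp and number of rows per cr.
mobs_by_cr = pd.DataFrame(
    {"hp": [252.43, 80.0, 262.0, 310.0], "count": [7.0, 1.0, 1.0, 3.0]},
    index=pd.Index([17.0, 18.0, 19.0, 20.0], name="cr"),
)

fig, ax = plt.subplots()
mobs_by_cr["count"].plot(kind="bar", ax=ax)

# Twin axes share the x-axis; reuse the bar tick positions as the line's x values.
ax2 = ax.twinx()
(line,) = ax2.plot(ax.get_xticks(), mobs_by_cr["hp"], color="tab:red", marker="o")

ax.set_ylabel("count")
ax2.set_ylabel("mean hp")
fig.savefig("mobs_by_cr.png")  # hypothetical output file name
```

Because the line's x values are taken from `ax.get_xticks()`, each point is drawn directly above the bar for the same row.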
