10

I have a Seaborn scatterplot and am trying to control the plotting order with 'hue_order', but it is not working as I would have expected (I can't get the blue dot to show on top of the gray).

import pandas as pd
import seaborn as sns

x = [1, 2, 3, 1, 2, 3]
cat = ['N','Y','N','N','N']
test = pd.DataFrame(list(zip(x,cat)), 
                  columns =['x','cat']
                 )
display(test)  # display() is available in Jupyter/IPython

colors = {'N': 'gray', 'Y': 'blue'}
sns.scatterplot(data=test, x='x', y='x', 
                hue='cat', hue_order=['Y', 'N', ],
                palette=colors,
               )

Flipping the 'hue_order' to hue_order=['N', 'Y', ] doesn't change the plot. How can I get the 'Y' category to plot on top of the 'N' category? My actual data has duplicate x, y coordinates that are differentiated by the category column.

2
  • Are you planning to show them both with different symbols/shapes/sizes in case of superposition ?
    – Trevis
    Commented Jun 25, 2021 at 18:44
  • No, I want the exact same shape and size, with blue plotting over gray. I don't want to see the gray if there is a blue.
    – a11
    Commented Jun 25, 2021 at 18:47

2 Answers

12

The reason this is happening is that, unlike most plotting functions, scatterplot doesn't (internally) iterate over the hue levels when it's constructing the plot. It draws a single scatterplot and then sets the color of the elements with a vector. It does this so that you don't end up with all of the points from the final hue level on top of all the points from the penultimate hue level on top of all the ... etc. But it means that the scatterplot z-ordering is insensitive to the hue ordering and reflects only the order in the input data.
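To make the "reflects only the order in the input data" point concrete, here is a toy model in plain Python. It is not seaborn or matplotlib code, just a sketch that assumes points are painted one after another so a later row covers an earlier row at the same position. The function name `visible_colors` is made up for illustration, and the rows are the five that `zip` produces from the question's lists.

```python
def visible_colors(points):
    """Toy painter's model: each (position, color) pair is drawn in order,
    so a later point at the same position hides an earlier one."""
    top = {}
    for position, color in points:
        top[position] = color  # later rows overwrite earlier ones
    return top


# Rows in the question's input order (N -> gray, Y -> blue).
rows = [(1, "gray"), (2, "blue"), (3, "gray"), (1, "gray"), (2, "gray")]
as_given = visible_colors(rows)

# Stable sort that moves the blue rows to the end, so they are drawn last.
blue_last = sorted(rows, key=lambda row: row[1] == "blue")
after_sort = visible_colors(blue_last)
```

In this model the blue point at position 2 is hidden by the gray row that follows it in the input, and it only ends up on top once the blue rows come last.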

So you could use your desired hue order to sort the input data:

import numpy as np

hue_order = ["N", "Y"]
colors = {'N': 'gray', 'Y': 'blue'}
sns.scatterplot(
    # Sort rows by each category's position in hue_order, so the last level is drawn last
    data=test.sort_values('cat', key=np.vectorize(hue_order.index)),
    x='x', y='x',
    hue='cat', hue_order=hue_order,
    palette=colors, s=100,  # Embiggen the points to see what's happening
)

There may be a more efficient way to do that "sort by list of unique values" built into pandas; I am not sure.
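One possible alternative, offered as a sketch rather than the canonical pandas idiom: build a level-to-rank dictionary once and let `Series.map` apply it inside the `key` function, which avoids calling `list.index` once per row. The helper name `sort_by_hue_order` is hypothetical, and `kind="stable"` is my own addition so that rows within one level keep their original relative order.

```python
import pandas as pd


def sort_by_hue_order(df, column, hue_order):
    """Return df with rows ordered by the position of df[column] in hue_order.

    Hypothetical helper: levels later in hue_order end up later in the frame,
    so they are drawn later by a single scatter call.
    """
    rank = {level: i for i, level in enumerate(hue_order)}
    # key receives the column as a Series and must return a same-length Series
    return df.sort_values(column, key=lambda s: s.map(rank), kind="stable")


# The five rows that zip() produces from the question's lists.
test = pd.DataFrame({"x": [1, 2, 3, 1, 2], "cat": ["N", "Y", "N", "N", "N"]})
sorted_test = sort_by_hue_order(test, "cat", ["N", "Y"])
```

The result can be passed as `data=sorted_test` to the same `sns.scatterplot` call as above. Values missing from `hue_order` map to NaN and are sorted to the end by default, which may or may not be what you want.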

7

TLDR: Before plotting, sort the data so that the dominant color appears last in the data. Here, because 'N' happens to sort before 'Y' alphabetically, it could just be:

test = test.sort_values('cat') # ascending=True by default, so the 'Y' rows come last

Then the blue 'Y' point is drawn on top of the gray 'N' point at the same location.


It seems that hue_order doesn't affect the order (or z-order) in which things are plotted. Rather, it affects how colors are assigned. E.g., if you don't specify an explicit mapping of categories to colors (i.e. you just use a list of colors or a color palette), this parameter determines whether 'N' or 'Y' gets the first color of the palette and which gets the second. There's an example showing this behavior here in the hue_order section. When you already have a dict linking categories to colors (colors = {'N': 'gray', 'Y': 'blue'}), it seems to affect only the order of the labels in the legend, as you have probably seen.
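As a rough illustration of that color-assignment role, here is a simplified model in plain Python. It is not seaborn's actual implementation; it only assumes that, with a list palette, levels are paired with colors position by position. The function name `assign_colors` is made up for this sketch.

```python
def assign_colors(levels, palette):
    """Simplified model: pair each hue level with the palette color
    at the same position."""
    return dict(zip(levels, palette))


palette = ["gray", "blue"]  # a list palette, with no explicit level-to-color mapping

n_first = assign_colors(["N", "Y"], palette)  # 'N' takes the first color
y_first = assign_colors(["Y", "N"], palette)  # 'Y' takes the first color
```

Under this model, flipping the order swaps which category is gray and which is blue, but says nothing about which point is drawn on top. With a dict palette the mapping is already fixed, so the order has nothing left to decide except the legend.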

So the key is to make sure the color you want on top is plotted last (and thus "on top"). I would have also assumed the hue_order parameter would do as you expected, but apparently not!
