Seaborn scatterplot can't get hue_order to work

Question

I have a Seaborn scatterplot and am trying to control the plotting order with 'hue_order', but it is not working as I would have expected (I can't get the blue dot to show on top of the gray).

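import pandas as pd
import seaborn as sns

x = [1, 2, 3, 1, 2, 3]
cat = ['N', 'Y', 'N', 'N', 'N']  # only 5 labels: zip() silently drops the last x value
test = pd.DataFrame(list(zip(x, cat)),
                    columns=['x', 'cat'])
display(test)  # Jupyter/IPython only; use print(test) elsewhere

colors = {'N': 'gray', 'Y': 'blue'}
sns.scatterplot(data=test, x='x', y='x',
                hue='cat', hue_order=['Y', 'N'],
                palette=colors)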

Flipping the 'hue_order' to hue_order=['N', 'Y'] doesn't change the plot. How can I get the 'Y' category to plot on top of the 'N' category? My actual data has duplicate x,y coordinates that are differentiated by the category column.

Are you planning to show them both with different symbols/shapes/sizes in case of superposition? · Trevis · Commented Jun 25, 2021 at 18:44
No, I want exactly the same shape and size, with blue plotting over gray. I don't want to see the gray if there is a blue. · a11 · Commented Jun 25, 2021 at 18:47

mwaskom · Accepted Answer · 2021-06-25 19:07:49Z

The reason this is happening is that, unlike most plotting functions, scatterplot doesn't (internally) iterate over the hue levels when it's constructing the plot. It draws a single scatterplot and then sets the color of the elements with a vector. It does this so that you don't end up with all of the points from the final hue level on top of all the points from the penultimate hue level on top of all the ... etc. But it means that the scatterplot z-ordering is insensitive to the hue ordering and reflects only the order in the input data.
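To make the mechanism concrete, here is a simplified sketch in plain matplotlib. It is not seaborn's actual source; it only mimics what the answer describes: one `ax.scatter` call for every row, with a per-row color vector. Inside a single collection, markers are painted in array order, so wherever points overlap, the later row wins. The inline data reproduces the question's five rows, and the `Agg` backend is an assumption so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.colors import to_rgba

# Same five rows as the question's DataFrame (zip() truncates x to 5 values)
x = [1, 2, 3, 1, 2]
cat = ["N", "Y", "N", "N", "N"]
colors = {"N": "gray", "Y": "blue"}

fig, ax = plt.subplots()
# One scatter call for every row; color is a per-row vector,
# so markers are painted in row order, independent of any hue ordering
points = ax.scatter(x, x, c=[colors[c] for c in cat], s=100)

# Rows 1 ('Y', blue) and 4 ('N', gray) both sit at (2, 2).
# Row 4 comes later in the array, so its gray marker covers the blue one.
facecolors = points.get_facecolors()
print(len(ax.collections))  # 1: a single collection holds all points
```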

So you could use your desired hue order to sort the input data:

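import numpy as np
import seaborn as sns

hue_order = ["N", "Y"]  # levels listed later end up later in the data, i.e. on top
colors = {'N': 'gray', 'Y': 'blue'}
sns.scatterplot(
    # Sort rows by each label's position in hue_order, so the "Y" rows come last
    data=test.sort_values('cat', key=np.vectorize(hue_order.index)),
    x='x', y='x',
    hue='cat', hue_order=hue_order,
    palette=colors, s=100,  # Embiggen the points to see what's happening
)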

There may be a more efficient way to do that "sort by list of unique values" built into pandas; I am not sure.
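One pandas-native option, offered as a sketch rather than a benchmarked answer, is an ordered `pd.Categorical` whose categories follow `hue_order`; a plain `sort_values` then sorts by that order instead of alphabetically. `kind="stable"` is an added choice that keeps the original row order within each level. The inline DataFrame just reproduces the question's five rows.

```python
import pandas as pd

hue_order = ["N", "Y"]  # levels listed later end up later in the data, i.e. on top
test = pd.DataFrame({"x": [1, 2, 3, 1, 2], "cat": ["N", "Y", "N", "N", "N"]})

# Ordered categorical: sorting now follows hue_order rather than the alphabet
test["cat"] = pd.Categorical(test["cat"], categories=hue_order, ordered=True)

# Stable sort keeps the original row order within each category
sorted_test = test.sort_values("cat", kind="stable")
print(sorted_test)
```

Because the column is now categorical, seaborn should also take its hue level order from the categories, which would make passing `hue_order` optional; that is worth confirming on your seaborn version.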

Tom · Accepted Answer · 2021-06-25 19:15:19Z

TLDR: Before plotting, sort the data so that the dominant color appears last in the data. Here, it could just be:

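test = test.sort_values('cat')  # ascending=True by default, so 'N' rows come before 'Y'

Then the blue 'Y' point is plotted last, so it sits on top of the gray 'N' point at x = 2.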

It seems that hue_order doesn't affect the order (or z-order) in which things are plotted. Rather, it affects how colors are assigned. For example, if you don't provide an explicit mapping of categories to colors (i.e., you just pass a list of colors or a color palette), this parameter determines which of 'N' and 'Y' gets the first color of the palette and which gets the second. There's an example showing this behavior here in the hue_order section. When you already have a dict linking categories to colors (colors = {'N': 'gray', 'Y': 'blue'}), it seems to affect only the order of labels in the legend, as you have probably seen.
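A quick way to check both claims is to pass a list palette together with hue_order and inspect the per-point colors seaborn sets on the scatter collection. This is a sketch under assumptions: it relies on `ax.collections[0]` being the main scatter on a fresh Axes and uses the `Agg` backend for headless use. With `hue_order=['Y', 'N']`, the first palette entry should go to 'Y', while the points stay in their original row order (N, Y, N, N, N).

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
from matplotlib.colors import to_rgba
import pandas as pd
import seaborn as sns

test = pd.DataFrame({"x": [1, 2, 3, 1, 2], "cat": ["N", "Y", "N", "N", "N"]})

fig, ax = plt.subplots()
# List palette (no dict): hue_order decides which level gets which list entry
sns.scatterplot(data=test, x="x", y="x", hue="cat",
                hue_order=["Y", "N"], palette=["blue", "gray"], ax=ax)

# Per-point face colors, still in the original row order (N, Y, N, N, N):
# hue_order changed the color assignment, not the drawing order
facecolors = ax.collections[0].get_facecolors()
```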

So the key is to make sure the color you want on top is plotted last (and thus "on top"). I would have also assumed the hue_order parameter would do as you expected, but apparently not!
