My DataFrame:

import numpy as np
import pandas as pd
df = pd.DataFrame(
    {
        'a': ['a', 'a', 'a', 'b', 'c', 'x', 'j', 'w'],
        'b': [1, 1, 1, 2, 2, 3, 3, 3],
    }
)

The expected output changes column a:

     a  b  
0    a  1  
1    a  1  
2    a  1  
3  NaN  2  
4  NaN  2  
5  NaN  3  
6  NaN  3  
7  NaN  3  

Logic:

The groups are based on b. If a group contains more than one distinct value in a (df.a.nunique() > 1), every a in that group should be set to np.nan.

This is my attempt. It works, but I wonder whether there is a one-liner or a more efficient way to do it:

df['x'] = df.groupby('b')['a'].transform('nunique')
df.loc[df.x > 1, 'a'] = np.nan
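
For reference, a minimal sketch of what the transform step yields on the sample data, and of the same assignment done without storing a helper column. The variable name counts is only illustrative:

```python
import pandas as pd

df = pd.DataFrame(
    {
        'a': ['a', 'a', 'a', 'b', 'c', 'x', 'j', 'w'],
        'b': [1, 1, 1, 2, 2, 3, 3, 3],
    }
)

# Number of distinct 'a' values in each row's 'b' group, aligned to df's index.
counts = df.groupby('b')['a'].transform('nunique')
print(counts.tolist())  # [1, 1, 1, 2, 2, 3, 3, 3]

# Use the counts directly as a boolean mask, so no extra column is added to df.
df.loc[counts > 1, 'a'] = float('nan')
print(df)
```

Because transform returns one value per original row, the result can be compared and used as a mask right away.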

4 Answers

A one-liner using .where on the "a" column: it keeps a value where the group's nunique is 1 and sets it to np.nan otherwise:

df["a"] = df["a"].where(df.groupby("b")["a"].transform("nunique") == 1, np.nan)

Output:

     a  b
0    a  1
1    a  1
2    a  1
3  NaN  2
4  NaN  2
5  NaN  3
6  NaN  3
7  NaN  3
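
One edge case worth knowing about: nunique ignores missing values by default, so a group that mixes one value with NaN still counts as having a single unique value. A small sketch on a made-up frame (demo is hypothetical, not the question's data), assuming you might want NaN to count as a distinct value:

```python
import numpy as np
import pandas as pd

# Hypothetical data: group 1 mixes a value with a missing entry.
demo = pd.DataFrame({'a': ['a', np.nan, 'c', 'c'], 'b': [1, 1, 2, 2]})

# Default behaviour: NaN is dropped before counting.
default_counts = demo.groupby('b')['a'].transform('nunique')
print(default_counts.tolist())  # [1, 1, 1, 1]

# Counting NaN as its own value via Series.nunique(dropna=False).
strict_counts = demo.groupby('b')['a'].transform(lambda s: s.nunique(dropna=False))
print([int(v) for v in strict_counts])  # [2, 2, 1, 1]
```

Whether the stricter count is wanted depends on how missing values in a should be treated; the question's data has none, so both variants agree there.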

A possible solution:

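g = df.groupby('b')

# Keep a group unchanged if every 'a' equals the group's first value;
# otherwise set 'a' to NaN for the whole group. groupby iterates in key
# order, so sort_index() restores the original row order afterwards.
pd.concat(
    [y if y['a'].eq(y['a'].iloc[0]).all()
     else y.assign(a = np.nan)
     for _, y in g]).sort_index()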

Output:

     a  b
0    a  1
1    a  1
2    a  1
3  NaN  2
4  NaN  2
5  NaN  3
6  NaN  3
7  NaN  3

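More efficient than groupby: use duplicated with keep=False and boolean indexing. This assumes that every row of a single-valued group is repeated and that no value repeats within a mixed group, as in the sample data: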

df.loc[~df[['a', 'b']].duplicated(keep=False), 'a'] = float('nan')
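
The duplicated check tests each (a, b) pair rather than each group, so it can diverge from the per-group nunique rule. A minimal sketch on a made-up frame (demo and the result names are hypothetical, not from the question):

```python
import pandas as pd

# Hypothetical data: group 1 is mixed but contains a repeated value,
# and group 2 has a single row.
demo = pd.DataFrame({'a': ['x', 'x', 'y', 'z'], 'b': [1, 1, 1, 2]})

# Per-group rule: keep 'a' only where the whole group has one unique value.
by_nunique = demo['a'].where(demo.groupby('b')['a'].transform('nunique') == 1)

# Pair-level rule: blank out rows whose (a, b) pair occurs exactly once.
by_duplicated = demo['a'].mask(~demo[['a', 'b']].duplicated(keep=False))

print(by_nunique.isna().tolist())     # [True, True, True, False]
print(by_duplicated.isna().tolist())  # [False, False, True, True]
```

On the question's data the two rules give the same result, so the shortcut is safe only if the real data has the same shape.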

If you really want to use groupby.transform:

df.loc[df.groupby('b')['a'].transform('nunique')>1, 'a'] = float('nan')

Output:

     a  b
0    a  1
1    a  1
2    a  1
3  NaN  2
4  NaN  2
5  NaN  3
6  NaN  3
7  NaN  3

I'd use a simple .loc assignment:

df.loc[df.groupby("b")["a"].transform("nunique").ne(1), "a"] = np.nan
print(df)

Prints:

     a  b
0    a  1
1    a  1
2    a  1
3  NaN  2
4  NaN  2
5  NaN  3
6  NaN  3
7  NaN  3