4

If I slice a dataframe with something like

>>> df = pd.DataFrame(data=[[x] for x in [1,2,3,5,1,3,2,1,1,4,5,6]], columns=['A'])

>>> df.loc[df['A'] == 1]
# or
>>> df[df['A'] == 1]

   A
0  1
4  1
7  1
8  1

how could I pad my selection by a buffer of 1 and get each of the indices 0, 1, 3, 4, 5, 6, 7, 8, 9? I want to select all rows for which the value in column 'A' is 1, but also the row before or after any such row.


edit: I'm hoping to find a solution that works for arbitrary pad sizes, rather than just a pad size of 1.


edit 2: here's another example illustrating what I'm going for

df = pd.DataFrame(data=[[x] for x in [1,2,3,5,3,2,1,1,4,5,6,0,0,3,1,2,4,5]], columns=['A'])

and we're looking for pad == 2. In this case I'd be trying to fetch rows 0, 1, 2, 4, 5, 6, 7, 8, 9, 12, 13, 14, 15, 16.
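To pin the spec down: keep row i whenever some row j with |i - j| <= pad has A equal to the target value. A brute-force sketch of that definition (the expected_rows helper is mine, purely illustrative):

```python
def expected_rows(values, target, pad):
    # keep index i if any index j within `pad` of i has values[j] == target
    hits = [j for j, v in enumerate(values) if v == target]
    return [i for i in range(len(values))
            if any(abs(i - j) <= pad for j in hits)]

# first example, pad == 1
print(expected_rows([1, 2, 3, 5, 1, 3, 2, 1, 1, 4, 5, 6], 1, 1))
# -> [0, 1, 3, 4, 5, 6, 7, 8, 9]

# second example, pad == 2
print(expected_rows([1, 2, 3, 5, 3, 2, 1, 1, 4, 5, 6, 0, 0, 3, 1, 2, 4, 5], 1, 2))
# -> [0, 1, 2, 4, 5, 6, 7, 8, 9, 12, 13, 14, 15, 16]
```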

4 Answers

6

You can use shift with bitwise OR (|):

c = df['A'] == 1
df[c | c.shift(fill_value=False) | c.shift(-1, fill_value=False)]

   A
0  1
1  2
3  5
4  1
5  3
6  2
7  1
8  1
9  4
  • I saw this and I didn't even try :) Commented Feb 16, 2021 at 16:36
  • That's a really slick way to incorporate the shift! If I wanted a pad larger than 1, though, this wouldn't work anymore, right? Commented Feb 16, 2021 at 16:45
  • @RagingRoosevelt I can try, would you be able to provide another example in addition to the one you currently have? :)
    – anky
    Commented Feb 16, 2021 at 16:46
  • @anky, sure! say df = pd.DataFrame(data=[[x] for x in [1,2,3,5,3,2,1,1,4,5,6,0,0,3,1,2,4,5]], columns=['A']) and we're looking for pad == 2, then I'd be trying to fetch rows 0, 1, 2, 4, 5, 6, 7, 8, 9, 12, 13, 14, 15, 16. Commented Feb 16, 2021 at 17:05
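The shift trick does generalize to arbitrary pads: OR together every shift from -pad to +pad. A sketch (my loop, not from the answer above), using the second example with pad == 2:

```python
import pandas as pd

df = pd.DataFrame(data=[[x] for x in [1, 2, 3, 5, 3, 2, 1, 1, 4, 5, 6, 0, 0, 3, 1, 2, 4, 5]],
                  columns=['A'])
pad = 2

c = df['A'] == 1
mask = c.copy()
for k in range(1, pad + 1):
    # fill_value=False keeps the dtype boolean at the series edges
    mask |= c.shift(k, fill_value=False) | c.shift(-k, fill_value=False)

print(df[mask].index.tolist())
# -> [0, 1, 2, 4, 5, 6, 7, 8, 9, 12, 13, 14, 15, 16]
```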
3

For arbitrary pad sizes, you may try where, interpolate, and notna to create the mask (working on the column as a Series, so that df[m] selects whole rows):

n = 2
c = df['A'].where(df['A'] == 1)
m = c.interpolate(limit=n, limit_direction='both').notna()
df[m]

Out[61]:
    A
0   1
1   2
2   3
4   3
5   2
6   1
7   1
8   4
9   5
12  0
13  3
14  1
15  2
16  4
  • Dang, that's really slick! Thanks =D Commented Feb 16, 2021 at 17:29
  • @RagingRoosevelt: you are welcome. Glad I could help :)
    – Andy L.
    Commented Feb 16, 2021 at 17:32
  • I played around with this a bit more and it seems like if filter = df['A'] == 1 then you can simplify even more and just do c = filter.where(filter) and then proceed with the interpolation statement to generate the mask. Commented Feb 17, 2021 at 17:34
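One caveat on that simplification: filter.where(filter) yields an object-dtype series of True/NaN, which newer pandas versions may refuse to interpolate. Casting to float first keeps the same idea dtype-safe; a sketch, reusing the second example with n == 2:

```python
import pandas as pd

df = pd.DataFrame(data=[[x] for x in [1, 2, 3, 5, 3, 2, 1, 1, 4, 5, 6, 0, 0, 3, 1, 2, 4, 5]],
                  columns=['A'])
n = 2

hit = df['A'] == 1
# 1.0 where A == 1, NaN elsewhere -- numeric, so interpolate is happy
m = hit.astype(float).where(hit).interpolate(limit=n, limit_direction='both').notna()

print(df[m].index.tolist())
# -> [0, 1, 2, 4, 5, 6, 7, 8, 9, 12, 13, 14, 15, 16]
```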
2

Here is an approach that allows for multiple pad levels. Use ffill and bfill on the boolean mask (df['A'] == 1), after converting the False values to np.nan:

import numpy as np

pad = 2
mask = (
    (df['A'] == 1)
    .replace(False, np.nan)
    .ffill(limit=pad)
    .bfill(limit=pad)
    .replace(np.nan, False)
    .astype(bool)
)
df[mask]

Here it is in action:

def padsearch(df, column, value, pad):
    return df[(df[column] == value).replace(False, np.nan).ffill(limit=pad).bfill(limit=pad).replace(np.nan,False).astype(bool)]

# your first example
df = pd.DataFrame(data=[[x] for x in [1,2,3,5,1,3,2,1,1,4,5,6]], columns=['A'])
print(padsearch(df=df, column='A', value=1, pad=1))

# your other example
df = pd.DataFrame(data=[[x] for x in [1,2,3,5,3,2,1,1,4,5,6,0,0,3,1,2,4,5]], columns=['A'])
print(padsearch(df=df, column='A', value=1, pad=2))

Result:

   A
0  1
1  2
3  5
4  1
5  3
6  2
7  1
8  1
9  4

    A
0   1
1   2
2   3
4   3
5   2
6   1
7   1
8   4
9   5
12  0
13  3
14  1
15  2
16  4

Granted, the command is far less nice, and it's a little clunky to convert the False values to and from null. But it still uses all pandas builtins, so it's still fairly quick.
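The False-to-NaN round trip can also be skipped entirely: a centered rolling maximum over a window of 2*pad + 1 performs the same boolean dilation. A sketch of that alternative (mine, not from the answer above), on the second example:

```python
import pandas as pd

df = pd.DataFrame(data=[[x] for x in [1, 2, 3, 5, 3, 2, 1, 1, 4, 5, 6, 0, 0, 3, 1, 2, 4, 5]],
                  columns=['A'])
pad = 2

hit = (df['A'] == 1).astype(int)
# a centered window of width 2*pad + 1 sees any match within `pad` rows;
# min_periods=1 keeps the edges from turning into NaN
m = hit.rolling(2 * pad + 1, center=True, min_periods=1).max().astype(bool)

print(df[m].index.tolist())
# -> [0, 1, 2, 4, 5, 6, 7, 8, 9, 12, 13, 14, 15, 16]
```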

0

I found another solution, but it's not nearly as slick as some of the ones already posted.

# setup
df = ...
pad = 2

# determine the set of in-bounds indices within `pad` of each match
indices = sorted({
    i + offset
    for i in df[df['A'] == 1].index
    for offset in range(-pad, pad + 1)
    if 0 <= i + offset < len(df)
})

# fetch rows
df.iloc[indices]
