4

If I slice a dataframe with something like

>>> df = pd.DataFrame(data=[[x] for x in [1,2,3,5,1,3,2,1,1,4,5,6]], columns=['A'])

>>> df.loc[df['A'] == 1]
# or
>>> df[df['A'] == 1]

   A
0  1
4  1
7  1
8  1

how could I pad my selection by a buffer of 1 and get each of the indices 0, 1, 3, 4, 5, 6, 7, 8, 9? I want to select all rows for which the value in column 'A' is 1, but also the row before or after any such row.


edit: I'm hoping to find a solution that works for arbitrary pad sizes, rather than just a pad size of 1.


edit 2: here's another example illustrating what I'm going for

df = pd.DataFrame(data=[[x] for x in [1,2,3,5,3,2,1,1,4,5,6,0,0,3,1,2,4,5]], columns=['A'])

and we're looking for pad == 2. In this case I'd be trying to fetch rows 0, 1, 2, 4, 5, 6, 7, 8, 9, 12, 13, 14, 15, 16.
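To pin the spec down: keep row i whenever some row j with |i - j| <= pad has A equal to the target value. A brute-force sketch of that definition (the expected_rows helper is mine, purely illustrative):

```python
def expected_rows(values, target, pad):
    # keep index i if any index j within `pad` of i has values[j] == target
    hits = [j for j, v in enumerate(values) if v == target]
    return [i for i in range(len(values))
            if any(abs(i - j) <= pad for j in hits)]

# first example, pad == 1
print(expected_rows([1, 2, 3, 5, 1, 3, 2, 1, 1, 4, 5, 6], 1, 1))
# -> [0, 1, 3, 4, 5, 6, 7, 8, 9]

# second example, pad == 2
print(expected_rows([1, 2, 3, 5, 3, 2, 1, 1, 4, 5, 6, 0, 0, 3, 1, 2, 4, 5], 1, 2))
# -> [0, 1, 2, 4, 5, 6, 7, 8, 9, 12, 13, 14, 15, 16]
```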

4 Answers

6

You can use shift with bitwise OR (|):

c = df['A'] == 1
df[c | c.shift(fill_value=False) | c.shift(-1, fill_value=False)]

   A
0  1
1  2
3  5
4  1
5  3
6  2
7  1
8  1
9  4
  • I saw this and I didn't even try :) Commented Feb 16, 2021 at 16:36
  • That's a really slick way to incorporate the shift! If I wanted a pad larger than 1, though, this wouldn't work anymore, right? Commented Feb 16, 2021 at 16:45
  • @RagingRoosevelt I can try, would you be able to provide another example in addition to the one you currently have? :)
    – anky
    Commented Feb 16, 2021 at 16:46
  • @anky, sure! say df = pd.DataFrame(data=[[x] for x in [1,2,3,5,3,2,1,1,4,5,6,0,0,3,1,2,4,5]], columns=['A']) and we're looking for pad == 2, then I'd be trying to fetch rows 0, 1, 2, 4, 5, 6, 7, 8, 9, 12, 13, 14, 15, 16. Commented Feb 16, 2021 at 17:05
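The shift trick does generalize to arbitrary pads: OR together every shift from -pad to +pad. A sketch (my loop, not from the answer above), using the second example with pad == 2:

```python
import pandas as pd

df = pd.DataFrame(data=[[x] for x in [1, 2, 3, 5, 3, 2, 1, 1, 4, 5, 6, 0, 0, 3, 1, 2, 4, 5]],
                  columns=['A'])
pad = 2

c = df['A'] == 1
mask = c.copy()
for k in range(1, pad + 1):
    # fill_value=False keeps the dtype boolean at the series edges
    mask |= c.shift(k, fill_value=False) | c.shift(-k, fill_value=False)

print(df[mask].index.tolist())
# -> [0, 1, 2, 4, 5, 6, 7, 8, 9, 12, 13, 14, 15, 16]
```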
3

For arbitrary pad sizes, you may try where, interpolate, and notna to create the mask (working on the column as a Series, so that df[m] selects whole rows):

n = 2
c = df['A'].where(df['A'] == 1)
m = c.interpolate(limit=n, limit_direction='both').notna()
df[m]

Out[61]:
    A
0   1
1   2
2   3
4   3
5   2
6   1
7   1
8   4
9   5
12  0
13  3
14  1
15  2
16  4
  • Dang, that's really slick! Thanks =D Commented Feb 16, 2021 at 17:29
  • @RagingRoosevelt: you are welcome. Glad I could help :)
    – Andy L.
    Commented Feb 16, 2021 at 17:32
  • I played around with this a bit more and it seems like if filter = df['A'] == 1 then you can simplify even more and just do c = filter.where(filter) and then proceed with the interpolation statement to generate the mask. Commented Feb 17, 2021 at 17:34
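One caveat on that simplification: filter.where(filter) yields an object-dtype series of True/NaN, which newer pandas versions may refuse to interpolate. Casting to float first keeps the same idea dtype-safe; a sketch, reusing the second example with n == 2:

```python
import pandas as pd

df = pd.DataFrame(data=[[x] for x in [1, 2, 3, 5, 3, 2, 1, 1, 4, 5, 6, 0, 0, 3, 1, 2, 4, 5]],
                  columns=['A'])
n = 2

hit = df['A'] == 1
# 1.0 where A == 1, NaN elsewhere -- numeric, so interpolate is happy
m = hit.astype(float).where(hit).interpolate(limit=n, limit_direction='both').notna()

print(df[m].index.tolist())
# -> [0, 1, 2, 4, 5, 6, 7, 8, 9, 12, 13, 14, 15, 16]
```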
2

Here is an approach that allows for multiple pad levels. Use ffill and bfill on the boolean mask (df['A'] == 1), after converting the False values to np.nan:

import numpy as np

pad = 2
mask = (
    (df['A'] == 1)
    .replace(False, np.nan)
    .ffill(limit=pad)
    .bfill(limit=pad)
    .replace(np.nan, False)
    .astype(bool)
)
df[mask]

Here it is in action:

def padsearch(df, column, value, pad):
    return df[(df[column] == value).replace(False, np.nan).ffill(limit=pad).bfill(limit=pad).replace(np.nan,False).astype(bool)]

# your first example
df = pd.DataFrame(data=[[x] for x in [1,2,3,5,1,3,2,1,1,4,5,6]], columns=['A'])
print(padsearch(df=df, column='A', value=1, pad=1))

# your other example
df = pd.DataFrame(data=[[x] for x in [1,2,3,5,3,2,1,1,4,5,6,0,0,3,1,2,4,5]], columns=['A'])
print(padsearch(df=df, column='A', value=1, pad=2))

Result:

   A
0  1
1  2
3  5
4  1
5  3
6  2
7  1
8  1
9  4

    A
0   1
1   2
2   3
4   3
5   2
6   1
7   1
8   4
9   5
12  0
13  3
14  1
15  2
16  4

Granted, the command is far less nice, and it's a little clunky to convert the False values to and from null. But it still uses all pandas builtins, so it's still fairly quick.
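The False-to-NaN round trip can also be skipped entirely: a centered rolling maximum over a window of 2*pad + 1 performs the same boolean dilation. A sketch of that alternative (mine, not from the answer above), on the second example:

```python
import pandas as pd

df = pd.DataFrame(data=[[x] for x in [1, 2, 3, 5, 3, 2, 1, 1, 4, 5, 6, 0, 0, 3, 1, 2, 4, 5]],
                  columns=['A'])
pad = 2

hit = (df['A'] == 1).astype(int)
# a centered window of width 2*pad + 1 sees any match within `pad` rows;
# min_periods=1 keeps the edges from turning into NaN
m = hit.rolling(2 * pad + 1, center=True, min_periods=1).max().astype(bool)

print(df[m].index.tolist())
# -> [0, 1, 2, 4, 5, 6, 7, 8, 9, 12, 13, 14, 15, 16]
```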

0

I found another solution, but it's not nearly as slick as some of the ones already posted.

# setup
df = ...
pad = 2

# determine the set of in-bounds indices within `pad` of each match
indices = sorted({
    i + offset
    for i in df[df['A'] == 1].index
    for offset in range(-pad, pad + 1)
    if 0 <= i + offset < len(df)
})

# fetch rows
df.iloc[indices]
