Use a list of values to select rows from a Pandas dataframe

Question

Let’s say I have the following Pandas dataframe:

import pandas as pd

df = pd.DataFrame({'A': [5,6,3,4], 'B': [1,2,3,5]})
df

     A   B
0    5   1
1    6   2
2    3   3
3    4   5

I can subset based on a specific value:

x = df[df['A'] == 3]
x

     A   B
2    3   3

But how can I subset based on a list of values? Something like this:

list_of_values = [3, 6]

y = df[df['A'] in list_of_values]

To get:

     A    B
1    6    2
2    3    3

Does this answer your question? How to filter Pandas dataframe using 'in' and 'not in' like in SQL — David Siret Marqués, Commented Jun 12, 2023 at 15:03

Peter Mortensen · Accepted Answer · 2021-06-06 19:23:23Z

2234

You can use the isin method:

In [1]: df = pd.DataFrame({'A': [5,6,3,4], 'B': [1,2,3,5]})

In [2]: df
Out[2]:
   A  B
0  5  1
1  6  2
2  3  3
3  4  5

In [3]: df[df['A'].isin([3, 6])]
Out[3]:
   A  B
1  6  2
2  3  3

To get the opposite, use ~:

In [4]: df[~df['A'].isin([3, 6])]
Out[4]:
   A  B
0  5  1
3  4  5

edited Jun 6, 2021 at 19:23

Peter Mortensen

answered Aug 23, 2012 at 19:20

Wouter Overmeire


Mykola Zotko · Accepted Answer · 2021-09-12 19:40:15Z

100

You can use the query method:

df.query('A in [6, 3]')
# df.query('A == [6, 3]')

or

lst = [6, 3]
df.query('A in @lst')
# df.query('A == @lst')
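
As a quick check of the forms above, here is a minimal sketch using the question's sample frame. It suggests that, inside query(), both `in` and `==` against a list select the same rows as isin(). The names by_in, by_eq and by_isin are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [5, 6, 3, 4], 'B': [1, 2, 3, 5]})
lst = [6, 3]

by_in = df.query('A in @lst')      # membership test against a local variable
by_eq = df.query('A == @lst')      # inside query(), == with a list acts like "in"
by_isin = df[df['A'].isin(lst)]    # plain boolean indexing

print(by_in)
```

The rows come back in the frame's own order, not in the order of lst.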

edited Sep 12, 2021 at 19:40

answered May 3, 2021 at 5:07

Mykola Zotko

7

I wonder whether query() is computationally faster than isin().
– Hammad
Commented Aug 29, 2021 at 14:51
7

@Hammad According to Pandas docs: "DataFrame.query() using numexpr is slightly faster than Python for large frames."
– Mykola Zotko
Commented Aug 30, 2021 at 8:21
5

@Hammad I did a little test and query is faster than boolean indexing for dataframes with >10k rows (check this answer for more info).
– cottontail
Commented Jan 28, 2023 at 20:45
How do you control for case sensitivity with query(), in case you are searching strings?
– sheth7
Commented Apr 13, 2023 at 20:16
@sheth7 You can use df.query('A.str.lower() in @lst')
– Mykola Zotko
Commented Sep 21, 2023 at 6:25
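
The same case-insensitive idea can be written with plain boolean indexing. This is only a sketch: the frame people, its columns and the list wanted are hypothetical and not part of the original question. Both sides are lower-cased before comparing.

```python
import pandas as pd

# Hypothetical frame with a string column; not part of the original question
people = pd.DataFrame({'name': ['Alice', 'BOB', 'carol'], 'age': [30, 25, 41]})
wanted = ['alice', 'Bob']

# Lower-case both sides so the comparison ignores case
wanted_lower = [w.lower() for w in wanted]
matches = people[people['name'].str.lower().isin(wanted_lower)]

print(matches)
```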

cottontail · Accepted Answer · 2023-11-16 21:22:39Z

list_of_values doesn't have to be a list; it can be a set, tuple, dictionary, numpy array, pandas Series, generator, range, etc., and isin() and query() will still work.
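
A small sketch of that claim for isin(), using the question's sample frame. The dictionary containers and its labels are made up for illustration; for a dictionary, only the keys are compared.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [5, 6, 3, 4], 'B': [1, 2, 3, 5]})

# Different containers that all hold the values 3 and 6
containers = {
    'list': [3, 6],
    'set': {3, 6},
    'tuple': (3, 6),
    'dict': {3: 'x', 6: 'y'},       # only the keys are used
    'ndarray': np.array([3, 6]),
    'Series': pd.Series([3, 6]),
    'range': range(3, 7, 3),        # yields 3 and 6
}

# Index labels selected by isin() for each container type
selected = {name: df[df['A'].isin(values)].index.tolist()
            for name, values in containers.items()}

print(selected)
```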

A note on query():

You can also call isin() inside query():

list_of_values = [3, 6]
df.query("A.isin(@list_of_values)")

You can pass the values to search over via the local_dict argument, which is useful if you don't want to create the filtering list beforehand in a chain of method calls:
```
df.query("A == @lst", local_dict={'lst': [3, 6]})
```
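
For illustration, here is how local_dict might sit in a longer method chain. This is a sketch only: the assign() and sort_values() steps and the column C are hypothetical additions, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'A': [5, 6, 3, 4], 'B': [1, 2, 3, 5]})

# The filter values are supplied inline, so no helper variable is needed
result = (
    df.assign(C=lambda d: d['A'] + d['B'])   # hypothetical extra step in the chain
      .query("A in @lst", local_dict={'lst': [3, 6]})
      .sort_values('A')
)

print(result)
```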

Some common problems with selecting rows

1. `list_of_values` is a range

If you need to filter within a range, you can use the between() method or query().

list_of_values = [3, 4, 5, 6] # a range of values

df[df['A'].between(3, 6)]  # or
df.query('3<=A<=6')
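
A short sketch of the endpoint behaviour, using a narrower range than above so that some rows are excluded. between() includes both endpoints by default; a chained comparison in query() lets you choose strict or non-strict bounds per side. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [5, 6, 3, 4], 'B': [1, 2, 3, 5]})

closed = df[df['A'].between(4, 6)]     # both endpoints included by default
chained = df.query('4 <= A <= 6')      # same condition as a chained comparison
half_open = df.query('4 <= A < 6')     # exclude the upper endpoint

print(closed)
print(half_open)
```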

2. Return `df` in the order of `list_of_values`

In the question, the values in list_of_values don't appear in the same order as in df. If you want the result in the order in which the values appear in list_of_values, i.e. "sorted" by list_of_values, set A as the index and use loc.

list_of_values = [3, 6]
df.set_index('A').loc[list_of_values].reset_index()

If you want to retain the old index, you can use the following.

list_of_values = [3, 6, 3]
df.reset_index().set_index('A').loc[list_of_values].reset_index().set_index('index').rename_axis(None)
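
One caveat with the loc approach: in current pandas versions, loc raises a KeyError when a label in the list is missing from the index. A minimal sketch of one way around that, assuming you simply want to skip absent values (the value 7 and the names present_values, present and ordered are hypothetical):

```python
import pandas as pd

df = pd.DataFrame({'A': [5, 6, 3, 4], 'B': [1, 2, 3, 5]})
list_of_values = [3, 6, 7]   # 7 does not occur in column A

# Keep only the values that actually occur, preserving the list's order
present_values = set(df['A'])
present = [v for v in list_of_values if v in present_values]

ordered = df.set_index('A').loc[present].reset_index()

print(ordered)
```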

3. Don't use `apply`

In general, isin() and query() are the best methods for this task; there's no need for apply(). For example, for function f(A) = 2*A - 5 on column A, both isin() and query() work much more efficiently:

df[(2*df['A']-5).isin(list_of_values)]         # or
df[df['A'].mul(2).sub(5).isin(list_of_values)] # or
df.query("A.mul(2).sub(5) in @list_of_values")
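
To see that the vectorized form and a row-wise apply() select the same rows, here is a small sketch on the question's sample frame. It checks only equivalence, not speed; the names vectorized and row_wise are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [5, 6, 3, 4], 'B': [1, 2, 3, 5]})
list_of_values = [3, 6]

# Vectorized: compute f(A) = 2*A - 5 for the whole column, then test membership
vectorized = df[(2 * df['A'] - 5).isin(list_of_values)]

# Row-wise: the same test evaluated once per row via apply()
row_wise = df.loc[df.apply(lambda row: 2 * row['A'] - 5 in list_of_values, axis=1)]

print(vectorized)
```

Only the row with A = 4 qualifies, since 2*4 - 5 = 3.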

4. Select rows not in `list_of_values`

To select rows not in list_of_values, negate isin()/in:

df[~df['A'].isin(list_of_values)]
df.query("A not in @list_of_values")  # df.query("A != @list_of_values")

5. Select rows where multiple columns are in `list_of_values`

If you want to filter using both (or multiple) columns, there's any() and all() to reduce columns (axis=1) depending on the need.

Select rows where at least one of A or B is in list_of_values:

df[df[['A','B']].isin(list_of_values).any(axis=1)]  # pass axis by keyword; any(1) fails in pandas 2.x
df.query("A in @list_of_values or B in @list_of_values")

Select rows where both of A and B are in list_of_values:

df[df[['A','B']].isin(list_of_values).all(axis=1)]
df.query("A in @list_of_values and B in @list_of_values")
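
Put together as a runnable sketch on the question's sample frame, with the axis passed by keyword (the names mask, either and both are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'A': [5, 6, 3, 4], 'B': [1, 2, 3, 5]})
list_of_values = [3, 6]

# Element-wise membership test over both columns
mask = df[['A', 'B']].isin(list_of_values)

either = df[mask.any(axis=1)]   # at least one of A or B is in the list
both = df[mask.all(axis=1)]     # A and B are both in the list

print(either)
print(both)
```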

score 14 · Accepted Answer · 2022-06-16 04:53:19Z

14

You can store your values in a list as:

lis = [3,6]

then

df1 = df[df['A'].isin(lis)]

edited Jun 16, 2022 at 4:53

answered May 26, 2022 at 4:34

user2110417

3

What's the difference between this and the top answer?
– Ynjxsjmh
Commented May 26, 2022 at 4:37
If you have more values to filter, it is better to store them in a list and filter the main dataframe with that list.
– user2110417
Commented May 26, 2022 at 4:42

Peter Mortensen · Accepted Answer · 2021-06-06 19:26:10Z

10

Another method:

df.loc[df.apply(lambda x: x.A in [3,6], axis=1)]

Unlike the isin method, this is particularly useful for checking whether the list contains a function of column A. For example, with f(A) = 2*A - 5 as the function:

df.loc[df.apply(lambda x: 2*x.A-5 in [3,6], axis=1)]

It should be noted that this approach is slower than the isin method.

edited Jun 6, 2021 at 19:26

Peter Mortensen

answered May 24, 2021 at 15:08

Achintha Ihalage


fuwiak · Accepted Answer · 2022-10-08 18:29:27Z

4

It's trickier with f-strings:

list_of_values = [3,6]


df.query(f'A in {list_of_values}')
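
What makes the f-string form work is that the list's repr is interpolated into the expression text before query() parses it. A minimal sketch (the frame with column C and the list names are hypothetical):

```python
import pandas as pd

df = pd.DataFrame({'A': [5, 6, 3, 4], 'C': ['x', 'y', 'z', 'w']})

list_of_values = [3, 6]
expr = f'A in {list_of_values}'    # the list is rendered as text: "A in [3, 6]"
numeric = df.query(expr)

names = ['x', 'z']                 # hypothetical string values
str_expr = f'C in {names}'         # repr adds the quotes: "C in ['x', 'z']"
textual = df.query(str_expr)

print(expr)
print(str_expr)
```

Because the values become part of the expression string, referencing the variable with @ is likely the safer choice when values contain quotes or come from untrusted input.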

answered Oct 8, 2022 at 18:29

fuwiak


bart-kosmala · Accepted Answer · 2022-10-19 08:28:37Z

3

The above answers are correct, but if you still are not able to filter out rows as expected, make sure both DataFrames' columns have the same dtype.

source = source.astype({1: 'int64'})
to_rem = to_rem.astype({'some col': 'int64'})

works = source[~source[1].isin(to_rem['some col'])]

Took me long enough.
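
A sketch of the kind of mismatch this guards against, assuming one frame stores the ids as strings and the other as integers. The column name id and the sample values are hypothetical.

```python
import pandas as pd

source = pd.DataFrame({'id': ['3', '5', '6']})    # ids stored as strings
to_rem = pd.DataFrame({'some col': [3, 6]})       # ids stored as integers

# Strings never equal integers, so nothing matches
before = source['id'].isin(to_rem['some col'])

# After aligning the dtypes, the membership test behaves as expected
fixed = source.astype({'id': 'int64'})
after = fixed['id'].isin(to_rem['some col'])
kept = fixed[~after]

print(before.tolist())
print(kept)
```

Checking source.dtypes and to_rem.dtypes before filtering is a quick way to spot this.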

answered Oct 19, 2022 at 8:28

bart-kosmala


KArrow'sBest · Accepted Answer · 2023-01-26 10:48:03Z

1

A non-pandas approach with comparable speed is a set difference. Note that it returns the distinct values of A that are not in list_of_values, not the matching rows:

filtered_column = set(df.A) - set(list_of_values)
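
If whole rows are needed rather than a set of values, one alternative outside the pandas filtering API is NumPy's np.isin, which builds a boolean mask from the underlying array. This is a swapped-in technique, not the author's method; the names mask, rows_in and rows_out are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [5, 6, 3, 4], 'B': [1, 2, 3, 5]})
list_of_values = [3, 6]

# Set difference: distinct values of A that are NOT in the list
filtered_column = set(df['A']) - set(list_of_values)

# Boolean mask over the raw NumPy array, usable for row selection
mask = np.isin(df['A'].to_numpy(), list_of_values)
rows_in = df[mask]
rows_out = df[~mask]

print(filtered_column)
print(rows_in)
```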

edited Jan 26, 2023 at 10:48

answered Oct 20, 2022 at 14:01

KArrow'sBest
