count number of rows given column names and values - pandas

Question

I am currently switching for R to Python so please be patient with me. Is the following a good way to count the number of rows given column names and values?

import pandas as pd

df = pd.DataFrame([["1", "2"], ["2", "4"], ["1", "4"]], columns=['A', 'B'])

cn1 = "A"
cn2 = "B"
cv1 = "1"
cv2 = "2"
no_rows = len(df[(df[cn1]==cv1) & (df[cn2]==cv2)].index)
print(no_rows)

Reinderien · Accepted Answer · 2021-12-06 17:34:04Z

First, it's a bad idea to input your numerics as strings in your dataframe. Use plain ints instead.

Your code currently forms a predicate, performs a slice on the frame and then finds the size of the frame. This is more work than necessary - the predicate itself is a series of booleans, and running a .sum() on it produces the number of matching values.

That, plus your current code is not general-purpose. A general-purpose implementation could look like

from typing import Dict, Any

import pandas as pd


def match_count(df: pd.DataFrame, **criteria: Any) -> int:
    pairs = iter(criteria.items())
    column, value = next(pairs)
    predicate = df[column] == value
    for column, value in pairs:
        predicate &= df[column] == value
    return predicate.sum()


def test() -> None:
    df = pd.DataFrame(
        [[1, 2],
         [2, 4],
         [1, 4]],
        columns=['A', 'B'],
    )

    print(match_count(df, A=1, B=2))


if __name__ == '__main__':
    test()

saiden · Accepted Answer · 2021-12-06 16:35:31Z

1

I usually use shape[0] because it's more readable, so in your case it would be:

no_rows = df[(df[cn1]==cv1) & (df[cn2]==cv2)].shape[0]

answered Dec 6, 2021 at 16:35

saiden

1112 bronze badges

Add a comment |

tdy · Accepted Answer · 2021-12-08 06:47:36Z

While this specific example can be completely refactored into Reinderien's top-notch functions, we don't always need something so elaborate (e.g., quick exploratory analysis).

Masking and counting come up very often in one form or another, so I think it's still worth reviewing how to do them idiomatically in pandas.

Revised code

Maintaining the spirit of the original code, I would use something like:

matches = df[cn1].eq(cv1) & df[cn2].eq(cv2)
len(df[matches])  # but remember that matches.sum() is faster

Comments on the original code

len(df[(df[cn1] == cv1) & (df[cn2] == cv2)].index)
^               ^       ^                  ^
3               2       4                  1

No need to use .index explicitly since DataFrame.__len__ does it automatically:

class DataFrame(NDFrame, OpsMixin):
    ...
    def __len__(self) -> int:
        return len(self.index)

DataFrame.eq can sometimes be useful over ==:
- supports axis / level broadcasting
- arguably more readable when joining multiple tests
```
df[cn1].eq(cv1) & df[cn2].eq(cv2) # (df[cn1] == cv1) & (df[cn2] == cv2)
```
- arguably more readable when chaining methods (e.g., when comparing shifted columns)
```
df[cn1].shift().eq(cv1).cumsum() # (df[cn1].shift() == cv1).cumsum()
```
If speed is important, len(df) is faster than df.shape[0] (h/t @root):
If you have a lot of conditions to join (e.g., generated via comprehension), consider np.logical_and.reduce:
```
df[np.logical_and.reduce([
    df[cn1] == cv1,
    # ...
    df[cn2] == cv2,
])]
```

Stack Exchange Network

count number of rows given column names and values - pandas

3 Answers 3

Revised code

Comments on the original code

Not the answer you're looking for? Browse other questions tagged
python
pandas
or ask your own question.

Linked

Hot Network Questions

count number of rows given column names and values - pandas

3 Answers 3

Revised code

Comments on the original code

Not the answer you're looking for? Browse other questions tagged pythonpandas or ask your own question.

Linked

Related

Hot Network Questions

Not the answer you're looking for? Browse other questions tagged
python
pandas
or ask your own question.