Count Points in Polygon and write result to (Geo)Dataframe

Question

I want to count how many points there are per polygon.

# Credits of this code go to: https://stackoverflow.com/questions/69642668/the-indices-of-the-two-geoseries-are-different-understanding-indices/69644010#69644010
import pandas as pd
import numpy as np
import geopandas as gpd
import shapely.geometry
import requests

# source some points and polygons
# fmt: off
dfp = pd.read_html("https://www.latlong.net/category/cities-235-15.html")[0]
dfp = gpd.GeoDataFrame(dfp, geometry=dfp.loc[:,["Longitude", "Latitude",]].apply(shapely.geometry.Point, axis=1))
res = requests.get("https://opendata.arcgis.com/datasets/69dc11c7386943b4ad8893c45648b1e1_0.geojson")
df_poly = gpd.GeoDataFrame.from_features(res.json())
# fmt: on

Now I sjoin the two. I put df_poly first in order to add the points from dfp to the GeoDataFrame df_poly.

df_poly.sjoin(dfp)

Now I want to count how many points there are per polygon. I thought this would work:

df_poly.sjoin(dfp).groupby('OBJECTID').count()

But that does not add a column with the count of each group to the GeoDataFrame df_poly.

Rob Raymond · Accepted Answer · 2021-10-20 12:36:51Z

This is a follow-on to the question "The indices of the two GeoSeries are different - Understanding Indices".

index_right in the spatial join output holds the index of the polygon, because the polygons were on the right side of the spatial join.
Hence the series gpd.sjoin(dfp, df_poly).groupby("index_right").size().rename("points") can simply be joined to the polygon GeoDataFrame to give how many points were found in each polygon.
Note how="left", which makes it a left join rather than an inner join. Any polygon with no points will have NaN; you may want to fillna(0) in this case.

import pandas as pd
import numpy as np
import geopandas as gpd
import shapely.geometry
import requests

# source some points and polygons
# fmt: off
dfp = pd.read_html("https://www.latlong.net/category/cities-235-15.html")[0]
dfp = pd.concat([dfp,dfp]).reset_index(drop=True)
dfp = gpd.GeoDataFrame(dfp, geometry=dfp.loc[:,["Longitude", "Latitude",]].apply(shapely.geometry.Point, axis=1))
res = requests.get("https://opendata.arcgis.com/datasets/69dc11c7386943b4ad8893c45648b1e1_0.geojson")
df_poly = gpd.GeoDataFrame.from_features(res.json())
# fmt: on

df_poly.join(
    gpd.sjoin(dfp, df_poly).groupby("index_right").size().rename("points"),
    how="left",
)
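The same pattern can be tried offline on toy data. This is only a minimal sketch: the squares, the points, and the names polys, pts and result are made up for illustration and are not part of the data above. It assumes a reasonably recent geopandas in which the join column is called index_right when the polygon index is unnamed.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Hypothetical toy data: three squares and four points (one point lies outside all squares).
polys = gpd.GeoDataFrame(
    {"name": ["A", "B", "C"]},
    geometry=[box(0, 0, 1, 1), box(2, 0, 3, 1), box(4, 0, 5, 1)],
)
pts = gpd.GeoDataFrame(
    geometry=[Point(0.5, 0.5), Point(0.2, 0.8), Point(2.5, 0.5), Point(9, 9)]
)

# Points on the left, polygons on the right: "index_right" holds the polygon index.
counts = gpd.sjoin(pts, polys).groupby("index_right").size().rename("points")

# The left join keeps polygon C, which has no points; its NaN is filled with 0
# and the column is cast back to int (NaN forces a float dtype).
result = polys.join(counts, how="left").fillna({"points": 0}).astype({"points": int})
print(result[["name", "points"]])
```

Polygon C survives with a count of 0 only because of how="left" plus fillna; an inner join would drop it.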

adrien.ludwig · Accepted Answer · 2023-08-17 18:41:57Z

Building on both your own answer and Rob Raymond's answer, I tried to create a more generic one as a function that:

keeps the polygons containing no points and sets their count to 0
has some safeguards on the index of the polygons dataframe
contains many (too many?) comments

Here it is:

def count_points_in_polygons(points, polygons, polygon_id, new_column="points_count"):

    # Save the index to restore it later
    original_index = polygons.index

    # Ensures polygon_id is not the index but a column
    if original_index.name == polygon_id:
        polygons = polygons.reset_index()

    # Count points in polygons
    points_in_polygon = (
        # Spatial join pairs each polygon with every point it intersects
        polygons.sjoin(
            points,
            how="inner",  # Only keep pairs where a point matches a polygon
        )
        .groupby(polygon_id)  # Group points by polygons
        .size()  # Get number of points
        .rename(new_column)  # Name your column as you want it to appear in polygons
    )

    # Add count series to the polygons dataframe
    polygons = (
        polygons.set_index(polygon_id)  # Ensures the index is the same as points_in_polygon
        .join(
            points_in_polygon,
            how="left",  # Keep polygons containing no points
        )
        .fillna({new_column: 0})  # Fill NaN with 0
    )

    if original_index.name != polygon_id:
        # Avoids duplicating polygon_id as column and index
        polygons = polygons.reset_index()

    polygons = polygons.set_index(original_index) # Restore the original index

    return polygons

In your specific case it could be called like this:

count_points_in_polygons(dfp, df_poly, "OBJECTID", new_column="n_points")

Fergus McClean · Accepted Answer · 2021-10-20 13:00:28Z

You need to add one of the columns from the output of count() back into the original DataFrame using merge. I have used the geometry column and renamed it to n_points:

df_poly.merge(
    df_poly.sjoin(dfp)
    .groupby('OBJECTID')
    .count()
    .geometry.rename('n_points')
    .reset_index()
)
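For readers wondering why a single column is picked out of count(): on a grouped frame, count() returns the number of non-null values for every column, while size() returns one number of rows per group. A small pandas-only sketch, where the joined frame and its values are hypothetical stand-ins for the sjoin output:

```python
import pandas as pd

# Hypothetical stand-in for the sjoin output: one row per polygon/point pair.
joined = pd.DataFrame(
    {
        "OBJECTID": [1, 1, 2],
        "geometry": ["g1", "g1", "g2"],  # never null, so safe to count
        "Place Name": ["x", None, "y"],  # contains a missing value
    }
)

# count(): one column per input column, counting non-null values only.
per_column = joined.groupby("OBJECTID").count()

# size(): a single Series with the number of rows in each group.
per_group = joined.groupby("OBJECTID").size()

print(per_column)
print(per_group)
```

Picking geometry works because that column is never null in the join output; a column with missing values (like the made-up "Place Name" here) would undercount.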

edited Oct 20, 2021 at 13:00

answered Oct 20, 2021 at 11:27

This answer works but could you explain it more to people who are looking to gain understanding?
– Paul Brennan
Commented Oct 20, 2021 at 12:25

four-eyes · Accepted Answer · 2021-10-20 15:52:02Z

Building on the answer Fergus McClean provided, this can be done with even less code:

df_poly.merge(df_poly.sjoin(dfp).groupby('OBJECTID').size().rename('n_points').reset_index())

However, the method (.join()) proposed by Rob Raymond to combine the two dataframes keeps the entries that have no count.
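The difference comes from the join type: merge() defaults to how="inner", so polygons without a count disappear. A pandas-only sketch of that behaviour, with made-up tables (polys and counts are hypothetical, not the data from the question):

```python
import pandas as pd

# Hypothetical stand-ins: a polygon table and per-polygon counts with no entry for OBJECTID 3.
polys = pd.DataFrame({"OBJECTID": [1, 2, 3]})
counts = pd.Series({1: 4, 2: 1}, name="n_points").rename_axis("OBJECTID")

# merge() defaults to how="inner": OBJECTID 3 is dropped.
inner = polys.merge(counts.reset_index())

# how="left" keeps every polygon; the missing count becomes NaN, then 0.
left = polys.merge(counts.reset_index(), how="left").fillna({"n_points": 0})
left["n_points"] = left["n_points"].astype(int)

print(inner)
print(left)
```

So passing how="left" to merge() (followed by fillna) should give the same result as the .join() approach.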

edited Oct 20, 2021 at 15:52

answered Oct 20, 2021 at 15:44
