
I'm attempting to update a numeric field using an arcpy.da.UpdateCursor. The update applies to a feature layer and a table that participate in a one-to-many relationship. The feature layer is quite large, with 1,219,495 polygon records, and the table is even larger at 4,735,679 rows.

The update should only be applied if certain conditions are met: if the child table's Provider field has a value of "Optimum", it needs to be updated, and if the parent feature's MaxDownload field is 1,000, it needs to be updated. I'm running the queries first via SearchCursor and saving the relevant IDs that satisfy them. Then, in an edit session, I'm attempting to update the relevant fields, using a where_clause to select the OIDs I need.

import datetime as dt

import arcpy

def log_it(msg):
    print(f"[{dt.datetime.now()}]: {msg}")
    return


def query_str_in_list(field_name, values):
    vals = "'" + "', '".join(values) + "'"
    return f"{field_name} IN ({vals})"


def query_int_in_list(field_name, values):
    vals = ", ".join([str(v) for v in values])
    return f"{field_name} IN ({vals})"


def main():
    start_time = dt.datetime.now()
    print(f"Starting script at {start_time}")

    arcpy.env.workspace = r"C:\dev\arcgis\NYS-DPS\NysDpsBroadbandMap\2022-06-01_Analysis\2022-06-01_Analysis\NysDpsBroadbandMap.gdb"

    global_ids = []
    provider_oids = []
    with arcpy.da.SearchCursor(
        "StandardProviders",
        ["ParentGlobalID", "OBJECTID"],
        where_clause="Provider = 'Optimum'",
    ) as search_cursor:
        for row in search_cursor:
            global_ids.append(row[0])
            provider_oids.append(row[1])

    polygon_oids = [
        row[0]
        for row in arcpy.da.SearchCursor(
            "StandardSummaryPolygons",
            ["OBJECTID"],
            where_clause=query_str_in_list("GlobalID", global_ids)
            + " AND MaxDownload = 1000",
        )
    ]

    log_it(
        f"Beginning to edit {len(provider_oids)} Provider Records and {len(polygon_oids)} Polygon Records"
    )

    with arcpy.da.Editor(arcpy.env.workspace) as edit_session:
        with arcpy.da.UpdateCursor(
            "StandardSummaryPolygons",
            ["OID@", "MaxDownload"],
            where_clause=query_int_in_list("OBJECTID", polygon_oids),
        ) as polygon_update_cursor:
            for idx, row in enumerate(polygon_update_cursor):
                log_it(f"{idx} Polygon Iterations")
                oid = row[0]
                if oid in polygon_oids:
                    polygon_update_cursor.updateRow([oid, 940])

        with arcpy.da.UpdateCursor(
            "StandardProviders",
            ["OID@", "Download"],
            where_clause=query_int_in_list("OBJECTID", provider_oids),
        ) as provider_update_cursor:
            for idx, row in enumerate(provider_update_cursor):
                log_it(f"{idx} Provider Iterations")
                oid = row[0]
                if oid in provider_oids:
                    provider_update_cursor.updateRow([oid, 940])

    end_time = dt.datetime.now()
    duration = end_time - start_time
    print(f"Finished at {end_time} after executing for {duration}")
    return


if __name__ == "__main__":
    main()

The script is running quite slowly, and it appears that the polygon feature class is the root cause. When iterating through that cursor, each iteration takes about 2 seconds. Iterating through the table gives the performance I would expect.

Here's a log snippet:

## Parent Table Log (Polygon Features)

Starting script at 2022-07-20 14:13:51.937132
[2022-07-20 14:13:59.535107]: Beginning to edit 276149 Provider Records and 212815 Polygon Records
[2022-07-20 14:13:59.801700]: 0 Polygon Iterations
[2022-07-20 14:14:01.015486]: 1 Polygon Iterations
[2022-07-20 14:14:02.363565]: 2 Polygon Iterations
[2022-07-20 14:14:03.780736]: 3 Polygon Iterations

## Child Table Log (Table Rows)

[2022-07-20 14:19:10.037031]: 6740 Provider Iterations
[2022-07-20 14:19:10.037031]: 6741 Provider Iterations
[2022-07-20 14:19:10.038029]: 6742 Provider Iterations
[2022-07-20 14:19:10.038029]: 6743 Provider Iterations

Am I simply running into physical limits due to the volume of the data? Are there any performance gains to be had here?

  • The IN list could be slowing down the query; the existing updateRow logic would let you go without it. You could see a 50% performance improvement if the source is switched from a network share to a local disk. If you're assigning a constant, using a selection set and a Calculate Field might be faster (see the sketch after these comments).
    – Vince
    Commented Jul 20, 2022 at 18:42
  • Interestingly, if I remove the where_clause from the UpdateCursor, I get the expected performance on the first 1,728 records. Then it starts to slow down to 2s again. Quite strange.
    – vitale232
    Commented Jul 20, 2022 at 18:58
  • Thanks for the tip. I think this led me to my answer.
    – vitale232
    Commented Jul 20, 2022 at 19:43
  • If the IN list is 6700+ features, the SQL query is likely to be really inefficient (Oracle won't let the number of terms exceed 2000).
    – Vince
    Commented Jul 20, 2022 at 20:33
  • The production data actually lives in an AGOL Hosted Feature Layer. In the script, I have everything batched into chunks of 1,000 to try and avoid REST issues, including the list comprehension SearchCursor that's in the example above. Would iterating over all of the rows in an UpdateCursor work?
    – vitale232
    Commented Jul 20, 2022 at 21:14
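
Building on Vince's selection set + Calculate Field suggestion, here is a minimal sketch of that approach under a few assumptions: polygon_oids has already been gathered by the SearchCursor queries shown in the question, the chunk size of 1,000 echoes the batching mentioned in the comments, and the chunked helper, the layer name "polygons_lyr", and the workspace path are illustrative rather than part of the original script.

import arcpy


def chunked(values, size=1000):
    # Yield successive slices so no single IN list exceeds `size` terms.
    for i in range(0, len(values), size):
        yield values[i:i + size]


def query_int_in_list(field_name, values):
    return f"{field_name} IN ({', '.join(str(v) for v in values)})"


arcpy.env.workspace = r"C:\path\to\NysDpsBroadbandMap.gdb"  # illustrative path

# In the real script, polygon_oids comes from the SearchCursor queries above.
polygon_oids = []

# Make a layer so a selection can be accumulated against it.
polygon_layer = arcpy.management.MakeFeatureLayer(
    "StandardSummaryPolygons", "polygons_lyr"
).getOutput(0)

# Add each chunk of OIDs to the selection; 1,000-term chunks keep each IN list
# within typical DBMS and REST limits.
for chunk in chunked(polygon_oids):
    arcpy.management.SelectLayerByAttribute(
        polygon_layer,
        "ADD_TO_SELECTION",
        query_int_in_list("OBJECTID", chunk),
    )

if polygon_oids:  # without a selection, Calculate Field would touch every row
    # One geoprocessing call assigns the constant to all selected features,
    # instead of round-tripping each row through a Python cursor.
    arcpy.management.CalculateField(
        polygon_layer, "MaxDownload", "940", expression_type="PYTHON3"
    )

The potential win is that the constant assignment happens in a single geoprocessing call against a selection, rather than one updateRow call per feature.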

1 Answer


I believe this was a case of a corrupted file geodatabase. While interacting with @Vince in the comments, we uncovered that the first ~1700 records were performing normally. After that, things went haywire.

I tried to copy/paste the layers into a new file geodatabase, which has helped in the past. That did not work here. I went back to the FGDB source and got a fresh copy of the data, and performance returned to normal.

FWIW, this FGDB has been "Compressed" then later "Uncompressed", which probably complicates things.

  • Compress/Uncompress shouldn't make a difference. Just Compress might make it faster (lower I/O load). Compact and/or Reindex might be able to correct this sort of issue (see the sketch after these comments).
    – Vince
    Commented Jul 20, 2022 at 20:25
  • I had to Uncompress since a compressed FGDB is read-only. I'll have to try Compact; that makes a lot of sense. A reindex did not seem to help.
    – vitale232
    Commented Jul 20, 2022 at 20:44
  • Compact won't help a brand-new FGDB downloaded from AGOL.
    – Vince
    Commented Jul 20, 2022 at 21:41
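
For completeness, here is a minimal sketch of the Compact and attribute-index maintenance Vince mentions, assuming a local copy of the FGDB. The path and index name are illustrative, and as noted above, a reindex alone did not help in this particular case.

import arcpy

fgdb = r"C:\path\to\NysDpsBroadbandMap.gdb"  # illustrative path
arcpy.env.workspace = fgdb

# Compact defragments the file geodatabase's internal storage, which can help
# after heavy editing or a Compress/Uncompress cycle.
arcpy.management.Compact(fgdb)

# An attribute index on the queried field can speed up the where_clause
# lookups; AddIndex errors if an index with this name already exists.
arcpy.management.AddIndex(
    "StandardProviders", ["Provider"], "provider_idx", "NON_UNIQUE", "NON_ASCENDING"
)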
