1

I have a dataset with couple million farm polygons (around 2.5 million and only expected to grow in the future), and I need to calculate the NDVI for these farm polygons, with some basic stats (Average, Min, Max), and store the results in a PostGIS table as a simplified JSON (rounding off the NDVI values to the nearest 0.05).

Currently I am using a combination of rasterio and gdal in python to clip these polygons with respect to preprocessed NDVI images, but the process is very slow. On average, I am getting 4 new NDVI images each month, the process is slow enough that I have not been able to clear a backlog to calculate historical NDVI values.

I have implemented Multi-Processing in Python to limited success. The current stack I'm working with has python, gdal and postgis setup, but I am willing to consider other frameworks or languages if the performance boost is significant enough.

2

1 Answer 1

3

I have similar workflows, and I recently did some testing to compare the performance of different python solutions to calculate "zonal statistics". Using exactextract or pyQGIS gives me the best performance. Depending on your exact needs one or the other has advantages.

The benchmark can be found here.

The following image shows the timings to calculate zonal statistics for 5.000 agricultural parcels for some libraries:

enter image description here

5
  • 3
    You might consider giving pypi.org/project/exactextract a try.
    – dbaston
    Commented May 14 at 17:16
  • @dbaston thanks for the suggestion. I thought exactextract always used weighted values depending on the overlap between the polygon and pixels, but it seems options to control this are on the way, which is very interesting for remote sensing applications. I added exactextract to the benchmark and the performance is indeed very good!
    – Pieter
    Commented May 14 at 23:14
  • Nice! Maybe it's getting off topic but I'd love to learn more about applications that benefit from not considering the fractional overlap.
    – dbaston
    Commented May 15 at 0:55
  • 2
    To avoid "mixels". For the case at hand here as an e.g.: the NDVI is a measure of vegetation density: from -1 till ~0.2 you can consider there are no plants on a pixel, from 0.2 till 1 increasingly more. If you calculate the average NDVI for an agricultural parcel (a polygon) using e.g. Sentinel2 satellite images of 10 meter resolution, at the borders there will be pixels partly inside and outside the parcel: mixels. For small parcels they can give a big impact, so they should be excluded. The min_coverage_frac option of exactextract sounds perfect for that.
    – Pieter
    Commented May 15 at 4:59
  • 1
    exactextract looks very promising, I'll give it a shot Commented May 16 at 5:32

Not the answer you're looking for? Browse other questions tagged or ask your own question.