EXPLAIN ANALYZE:
The EXPLAIN ANALYZE <your_query> command executes the query and returns only the Execution plan (no result table), as developed and applied by the Query planner, with details about cost (e.g. execution time) and strategy (e.g. usage of indexes).
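As a rough sketch of what that looks like, the statements below wrap a plain join between your two tables. The BUFFERS option is an optional extra that adds buffer usage to the output. The LIMIT 100 and the gid = 1 filter are arbitrary values I made up for illustration, not something from your data. Since EXPLAIN ANALYZE really executes the statement, a data-modifying query is best wrapped in a transaction that is rolled back.

```sql
-- Show the plan plus actual timings for a read-only query.
-- LIMIT 100 is an arbitrary cap (assumption) to keep the run short.
EXPLAIN (ANALYZE, BUFFERS)
SELECT g1.gid, g2.gid
FROM u_nomen_dom AS g1
JOIN u_nomen_via AS g2
  ON g1.codigo_mun = g2.codigo_mun
LIMIT 100;

-- EXPLAIN ANALYZE executes the statement, so protect writes with a rollback.
-- The gid = 1 filter is a hypothetical example value.
BEGIN;
EXPLAIN ANALYZE
UPDATE u_nomen_dom
SET codigo_mun = codigo_mun
WHERE gid = 1;
ROLLBACK;
```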
To allow the Query planner to work with up-to-date table statistics, run VACUUM ANALYZE [<table>] (<table> is optional; when omitted, the command runs on the whole database) after each table operation (INSERT, DELETE, UPDATE), and, if unsure, before you run possibly expensive queries.
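A minimal sketch for your two tables, followed by a look at the statistics views to see when the tables were last analyzed. The choice of columns from pg_stat_user_tables is just my selection. Note that VACUUM cannot run inside a transaction block.

```sql
-- Refresh planner statistics (and reclaim dead rows) for both tables.
VACUUM ANALYZE u_nomen_dom;
VACUUM ANALYZE u_nomen_via;

-- Check when statistics were last gathered, manually or by autovacuum.
SELECT relname,
       n_live_tup,
       n_dead_tup,
       last_analyze,
       last_autoanalyze
FROM pg_stat_user_tables
WHERE relname IN ('u_nomen_dom', 'u_nomen_via');
```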
The Execution plan is the main source of information about how your queries perform, and it is a good idea to include its output in your questions here for all inquiries concerning query performance.
KNN:
The (K) Nearest Neighbor search in PostGIS is best applied via the KNN operator <-> as the ORDER BY parameter, wrapped into a LATERAL JOIN; in your case:
SELECT g1.gid AS gref_gid,
       g2.gid AS gnn_gid,
       g2.code_mun,
       g1.codigo_mun,
       g2.text,
       g2.via AS igcvia1
FROM   u_nomen_dom AS g1
JOIN LATERAL (
    SELECT gid,
           code_mun,
           text,
           via
    FROM   u_nomen_via AS g
    WHERE  g1.codigo_mun = g.codigo_mun
    ORDER BY g1.geom <-> g.geom
    LIMIT  1
) AS g2
ON true;
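The same pattern extends to more than one neighbor. The variant below is only a sketch: it returns the three closest rows instead of one (the 3 is an arbitrary choice of mine), adds the distance as a column, and uses LEFT JOIN LATERAL so that rows of u_nomen_dom without any match in the same codigo_mun are kept with NULLs instead of being dropped.

```sql
SELECT g1.gid AS gref_gid,
       g2.gid AS gnn_gid,
       g2.via AS igcvia1,
       g2.dist
FROM u_nomen_dom AS g1
LEFT JOIN LATERAL (
    SELECT g.gid,
           g.via,
           g1.geom <-> g.geom AS dist   -- distance in CRS units
    FROM u_nomen_via AS g
    WHERE g1.codigo_mun = g.codigo_mun
    ORDER BY g1.geom <-> g.geom         -- KNN ordering, can use the GiST index
    LIMIT 3                             -- K = 3, arbitrary example value
) AS g2 ON true;
```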
Indexes:
This is mainly efficient due to the spatial index in place on your tables' geom column, as the <-> operator will use an index search (if used in ORDER BY); create a spatial index on both tables with:
CREATE INDEX u_nomen_dom_geom_idx
ON u_nomen_dom
USING gist (geom);
CREATE INDEX u_nomen_via_geom_idx
ON u_nomen_via
USING gist (geom);
In general (though not necessarily), a column used in a JOIN, in a filter (WHERE), or in ORDER BY may benefit from an index. Unused indexes, however, will rather decrease overall performance, since each one has to be maintained on every write, and indexes do take considerable space on disk. A primary key column should always have an index (PostgreSQL creates one automatically for a PRIMARY KEY constraint).
You might want to check whether the Query planner sees an index on codigo_mun as beneficial (if not, drop it):
CREATE INDEX u_nomen_dom_codigo_mun_idx
ON u_nomen_dom
USING btree (codigo_mun);
CREATE INDEX u_nomen_via_codigo_mun_idx
ON u_nomen_via
USING btree (codigo_mun);
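One way to judge this, sketched below, is to run your real queries for a while and then look at the index usage counters and sizes in pg_stat_user_indexes. An index whose idx_scan stays at 0 is a candidate for removal. The DROP statement is commented out on purpose.

```sql
-- How often has each index been used, and how much space does it take?
SELECT indexrelname,
       idx_scan,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE relname IN ('u_nomen_dom', 'u_nomen_via')
ORDER BY idx_scan;

-- If the planner never picks it up, remove it again:
-- DROP INDEX IF EXISTS u_nomen_dom_codigo_mun_idx;
```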
Indexes and their proper usage are crucial for performance, but they are a complex topic, well worth diving into when you have a bit of free time at hand.
CRS:
Most functions in PostGIS will have an explicit note on their doc pages concerning the CRS, e.g. results will be in CRS units.
With this in mind, consider your usage of ST_DWithin: you intend to find geometries within a radius of 150 meters, but your CRS is EPSG:4326, which has degrees as its unit. Thus you are searching for geometries within a radius of 150 degrees, and the result of e.g. ST_Distance is useless as well, since it is also in degrees.
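To make the unit problem tangible, here is a small self-contained sketch with two made-up point pairs, each one degree of longitude apart: one pair on the equator, one at 60° north. The geometry distance is 1 (degree) in both cases, while the geography distance comes back in meters, roughly 111 km on the equator and about half of that at 60° latitude.

```sql
-- Two hypothetical point pairs, each 1 degree of longitude apart.
SELECT label,
       ST_Distance(a, b)                       AS dist_degrees,
       ST_Distance(a::geography, b::geography) AS dist_meters
FROM (
    SELECT 'equator' AS label,
           ST_SetSRID(ST_MakePoint(0, 0), 4326) AS a,
           ST_SetSRID(ST_MakePoint(1, 0), 4326) AS b
    UNION ALL
    SELECT 'lat 60',
           ST_SetSRID(ST_MakePoint(0, 60), 4326),
           ST_SetSRID(ST_MakePoint(1, 60), 4326)
) AS pts;
```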
The same applies to ST_Expand, and that's why I asked if your geometries are projected; I should have asked for a cartesian projection, to differentiate it from a geographical projection (not the official terminology, I guess...).
If you want consistent results (one degree of longitude doesn't represent the same distance at different latitudes), and want to use meters as units for your calculations (and as the return value of the functions used), you will need to reproject your data into an appropriate cartesian projection for your AOI.
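A possible way to do the reprojection without touching the original column is sketched below. Big caveat: EPSG:32723 (WGS 84 / UTM zone 23S) is purely a placeholder I picked, since I don't know your AOI; replace it with a CRS that fits your data. The column name geom_proj and the index name are made up as well, and the sketch assumes your geom column really carries SRID 4326, otherwise ST_Transform will fail.

```sql
-- ASSUMPTION: 32723 is a placeholder SRID; use a projection suited to your AOI.
-- geom_proj is a hypothetical column name.
ALTER TABLE u_nomen_via ADD COLUMN geom_proj geometry;

UPDATE u_nomen_via
SET geom_proj = ST_Transform(geom, 32723);

-- Index the projected column so spatial searches on it stay fast.
CREATE INDEX u_nomen_via_geom_proj_idx
    ON u_nomen_via
    USING gist (geom_proj);

VACUUM ANALYZE u_nomen_via;

-- Repeat the same steps for u_nomen_dom; distances and radii on geom_proj
-- are then expressed in meters, e.g. ST_DWithin(a.geom_proj, b.geom_proj, 150).
```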
However, if that is undesirable, PostGIS implements a second data type for geometries that is worth getting to know better: geography. Many functions (e.g. ST_DWithin, but not ST_Expand) offer a separate function signature that accepts geography types instead of geometry; if used, those functions will do their calculations based on spherical/spheroidal algebra (Haversine/Vincenty's formulae etc.) with significantly better precision (at the cost of performance) and will implicitly use meters as units. This data type assumes EPSG:4326 as the CRS when converting from geometry, and a geometry can simply be cast on the fly.
This is all pretty loosely explained and some things are only the tip of the iceberg. With the KNN query you should get better speed, given that you set up indexes and make sure you know what units you are working with. If precision is of concern, use a cast to geography, at the cost of some speed (e.g. g1.geom::geography).
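For completeness, a sketch of how the geography cast could be combined with the KNN query and a true 150 meter radius. Assumptions on my side: a PostGIS version in which <-> accepts geography (2.2 or later, as far as I know), and an expression index on the cast so the casted column can be searched via an index. The index name is made up. With a plain JOIN LATERAL, rows without any neighbor within 150 meters drop out of the result.

```sql
-- Expression index on the cast (hypothetical index name).
CREATE INDEX u_nomen_via_geog_idx
    ON u_nomen_via
    USING gist ((geom::geography));

SELECT g1.gid AS gref_gid,
       g2.gid AS gnn_gid,
       g2.dist_m
FROM u_nomen_dom AS g1
JOIN LATERAL (
    SELECT g.gid,
           ST_Distance(g1.geom::geography, g.geom::geography) AS dist_m  -- meters
    FROM u_nomen_via AS g
    WHERE g1.codigo_mun = g.codigo_mun
      AND ST_DWithin(g1.geom::geography, g.geom::geography, 150)         -- 150 m radius
    ORDER BY g1.geom::geography <-> g.geom::geography
    LIMIT 1
) AS g2 ON true;
```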
Run VACUUM ANALYZE on both tables, then implement a LATERAL JOIN with the KNN operator like this (possible duplicate?). Conditions go into the sub-query. What type are your geometries? ... WHERE ST_Expand(g1.geom, 150) && g2.geom ...; this will expand the bbox instead of doing a true radius search and should be faster than ST_DWithin.