Skip to main content

In the context of a database, optimisation refers to the process of the query optimiser selecting an efficient physical execution plan.

SQL is a declarative language, so the actual physical operations selected to fulfil a query are not under the direct control of the person writing the query. The database management system will have the option of a number of semantically equivalent strategies to execute the query. Most database management systems have a query optimiser that generates the set (or a subset) of possible execution plans and selects the best one.

A database query plan consists of a tree structure of operations that process data to produce the final result. For any given query, there may be multiple query plans that are semantically equivalent (i.e. will return the same final result) but vary in their structure and performance. A query plan can also be transformed algorithmically into other semantically equivalent query plans.

For example, a predicate (filter) could be applied at various points in a query. It may be quicker to apply it early, reducing the amount of data to be processed by the rest of a complex query. The query optimiser can apply transformations to push the query predicate down toward the leaves of the plan so it is evaluated earlier, which avoids subsequent operations having to process unnecessary data.

Other optimisations possible include generating intermediate results and post-processing them (sometimes known as spool operations), altering the order that tables are joined, and selecting the algorithm used for the join. For example, a hash join is efficient for matching a larger table against a smaller one but has a startup overhead in generating the hash table. A nested loos join is efficient for a query that processes a small amount of data as it has little startup overhead. A merge join processes data sequentially, so it is good for processing multiple large data sets if they can be sorted on the key used for the join.

Most DBMS platforms use a technique called 'cost based query optimisation' that works by joining a large number of query plans and using statistics about data volumes and distributions of key values to estimate a cost metric for the candidate plans. The cheapest plan is then selected and executed. Cost based optimisation is heuristic, and for various reasons a cost based optimiser can produce suboptimal plans. This can necessitate tuning work on the database if the suboptimal plans cause performance issues.

Other optimisation strategies include 'rule based', where a set of transformation rules are applied to query plans where specific patterns are found in the query plan. PostgreSQL has an unusual optimiser based on a genetic algorithm that evolves optimal plans by mutating query plans over time and retaining successful mutations.