Skip to main content

A database structure that can improve the speed of queries at the cost of disk space and slower inserts/updates. It stores a copy of one or more columns sorted but structures the data differently to allow faster access.

An index can be added to a table to facilitate efficient lookups. If a query predicate is not supported by an index then it will be necessary to scan the entire table to identify rows matching the predicate.

Various data structures can be used for indexing, with particular characteristics. Some commonly used on-disk index structures are:

B-Trees and their variants allow O(log n) lookup times and efficient ordinal, sequential access to records (i.e. the tree can be traversed in order). BTrees are tree structures that hold multiple keys in their nodes, which is efficient for disk access. B-Tree type indexes (the actual structures tend to be variations on the BTree) are supported by most DBMS platforms.

Linear Hash Tables is a method of constructing a hash table that can grow incrementally without requiring the table to be rebuilt as it expands. This feature makes the structure useful for disk indexing. Linear Hash Tables cannot traverse data ordinally, but have O(1) lookup times for a single record.

R-Trees are used for indexing spatial data. Commonly used in purpose built spatial database systems, modern RDBMS platforms often feature spatial indexing capabilities as well.

Bitmap Indexes: Are used for applications such as reporting systems where queries combine searches on a number of low cardinality columns where any column has a small number of distinct values and thus low selectivity. The data structure used in bitmap indexes is very sparse, so it compresses very well with run-length encoding schemes. Index intersection computations can be resolved very quickly with bitmap indexes, so they are very efficient for certain types of quereies. However they cannot be updated incrementally, so they are mainly useful for applications such as data warehouse systems where the data is loaded in batch operations for a subsequent read-only workload.

Related: What does "index" means on RDBMS?