
I can't find a definite answer to this question in the documentation. If a column is an array type, will all the entered values be individually indexed?

I created a simple table with one int[] column, and put a unique index on it. I noticed that I couldn't add the same array of ints, which leads me to believe the index is a composite of the array items, not an index of each item.

INSERT INTO "Test"."Test" VALUES ('{10, 15, 20}');
INSERT INTO "Test"."Test" VALUES ('{10, 20, 30}');

SELECT * FROM "Test"."Test" WHERE 20 = ANY ("Column1");

Is the index helping this query?


3 Answers


Yes, you can index an array, but you have to use the array operators and the GIN index type.

Example:

    -- gin__int_ops is provided by the intarray extension
    CREATE EXTENSION IF NOT EXISTS intarray;

    CREATE TABLE "Test"("Column1" int[]);
    INSERT INTO "Test" VALUES ('{10, 15, 20}');
    INSERT INTO "Test" VALUES ('{10, 20, 30}');

    CREATE INDEX idx_test ON "Test" USING GIN ("Column1" gin__int_ops);

    EXPLAIN ANALYZE
    SELECT * FROM "Test" WHERE "Column1" @> ARRAY[20];

Result:

    Bitmap Heap Scan on "Test"  (cost=4.26..8.27 rows=1 width=32) (actual time=0.014..0.015 rows=2 loops=1)
      Recheck Cond: ("Column1" @> '{20}'::integer[])
      ->  Bitmap Index Scan on idx_test  (cost=0.00..4.26 rows=1 width=0) (actual time=0.009..0.009 rows=2 loops=1)
            Index Cond: ("Column1" @> '{20}'::integer[])
    Total runtime: 0.062 ms

Note

It appears that in many cases the gin__int_ops operator class (provided by the intarray extension) is required:

create index <index_name> on <table_name> using GIN (<column> gin__int_ops)

I have not yet seen a case where the index would work with the && and @> operators without gin__int_ops.
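
For comparison, here is a minimal sketch of the same index without an explicit operator class. It assumes the intarray extension is not installed in the database, in which case the default GIN operator class for arrays should apply. The table and index names are hypothetical, chosen only for illustration.

```sql
-- Assumption: the intarray extension is NOT installed in this database.
-- test_arr and idx_test_arr are hypothetical names.
CREATE TABLE test_arr (column1 int[]);
INSERT INTO test_arr VALUES ('{10, 15, 20}'), ('{10, 20, 30}');

-- No operator class given, so the default GIN operator class for arrays is used.
CREATE INDEX idx_test_arr ON test_arr USING GIN (column1);

-- On a table this small the planner may still choose a sequential scan;
-- check the plan on realistic data volumes.
EXPLAIN
SELECT * FROM test_arr WHERE column1 @> ARRAY[20];
```

The default operator class is not limited to integers, so the same pattern should carry over to other element types such as text[].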

  • 28
    As the OP surmises, this doesn't actually index individual array values, but instead indexes the entire array. So, while this will help the query in question (see explain plan), this means you can't create unique constraints (easily) on individual array values. That said, if you are using integer arrays, you can use the contrib module "intarray" to index individual array values, which can be much faster in many cases. (IIRC there is some work being done on this for text values, but contributors would probably be welcome to help finish it off).
    – xzilla
    Commented Dec 5, 2011 at 18:13
  • 21
    Please don't use uppercase letters in PostgreSQL identifiers in code examples, it just confuses people who aren't familiar with the quoting/case folding rules, particularly people new to PostgreSQL.
    – intgr
    Commented Nov 6, 2015 at 9:32
  • 14
    To repeat my comment here: from my experience, these indexes offer little to no speedup unless gin__int_ops is used for integer[] columns. It took me years of frustration and looking for other solutions until I discovered this op class. It's a borderline miracle worker.
    – IamIC
    Commented Nov 28, 2017 at 17:46
  • 7
    @IamIC does that mean I should not bother indexing an array of strings? And I should only index integer arrays?
    Commented Feb 10, 2019 at 20:41
  • 3
    Operator class "gin__int_ops" is only required if you have installed "intarray" extension, otherwise the index works by default. I have expanded on this here: stackoverflow.com/questions/63996454/…
    Commented Sep 21, 2020 at 16:37

@Tregoreg raised a question in the comment to his offered bounty:

I didn't find the current answers working. Using GIN index on array-typed column does not increase the performance of ANY() operator. Is there really no solution?

@Frank's accepted answer tells you to use array operators, which is still correct for Postgres 16. The manual:

... the standard distribution of PostgreSQL includes a GIN operator class for arrays, which supports indexed queries using these operators:

<@
@>
=
&&

The complete list of built-in operator classes for GIN indexes in the standard distribution is here.

In Postgres, indexes are bound to operators (which are implemented for certain types), not to data types alone, functions, or anything else. That's a heritage from the original Berkeley design of Postgres and very hard to change now. And it's generally working just fine. Here is a thread on pgsql-bugs with Tom Lane commenting on this.

Some PostGIS functions (like ST_DWithin()) seem to violate this principle, but that is not so. Those functions are rewritten internally to use the respective operators.

The indexed expression must be to the left of the operator. For most operators (including all of the above) the query planner can achieve this by flipping operands if you place the indexed expression to the right - given that a COMMUTATOR has been defined. The ANY construct can be used in combination with various operators and is not an operator itself. When used as constant = ANY (array_expression) only indexes supporting the = operator on array elements would qualify and we would need a commutator for = ANY(). GIN indexes are out.

Postgres is not currently smart enough to derive a GIN-indexable expression from it. For starters, constant = ANY (array_expression) is not completely equivalent to array_expression @> ARRAY[constant]. The intarray versions of the array operators raise an error if any NULL elements are involved (the built-in ones never match a NULL element), while the ANY construct can deal with NULL on either side. And there are different results for data type mismatches.
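
A sketch of the manual rewrite this implies, assuming a table like the one in the question with a default GIN index on the array column and no intarray installed. The table and index names are hypothetical.

```sql
-- Hypothetical setup, for illustration only.
CREATE TABLE arr_demo (column1 int[]);
INSERT INTO arr_demo VALUES ('{10, 15, 20}'), ('{10, 20, 30}');
CREATE INDEX arr_demo_gin ON arr_demo USING GIN (column1);

-- ANY construct: not an array operator, so the GIN index does not qualify.
SELECT * FROM arr_demo WHERE 20 = ANY (column1);

-- Array operator with the indexed column on the left: GIN-indexable.
SELECT * FROM arr_demo WHERE column1 @> ARRAY[20];

-- Indexed column on the right: the planner can flip the operands,
-- because <@ and @> are defined as commutators of each other.
SELECT * FROM arr_demo WHERE ARRAY[20] <@ column1;

-- The two forms are not fully equivalent once NULL is involved;
-- compare the results of these two expressions on your own server.
SELECT NULL::int = ANY ('{10, 20}'::int[]);
SELECT '{10, 20}'::int[] @> ARRAY[NULL]::int[];
```

Running each SELECT under EXPLAIN on a realistically sized table is the simplest way to confirm which form actually uses the index.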


Asides

While working with integer arrays (int4, not int2 or int8) without NULL values (like your example implies), consider the additional module intarray, which provides specialized, faster operators and index support.

As for the UNIQUE constraint in your question that went unanswered: that's implemented with a btree index on the whole array value (as you suspected) and does not help with the search for elements at all.
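
A small sketch of that behavior, using a hypothetical table name: the constraint compares whole array values, so arrays that merely share elements are accepted, and only an identical array is rejected.

```sql
-- uniq_demo is a hypothetical table; UNIQUE is backed by a btree index
-- on the whole array value.
CREATE TABLE uniq_demo (column1 int[] UNIQUE);

INSERT INTO uniq_demo VALUES ('{10, 15, 20}');  -- accepted
INSERT INTO uniq_demo VALUES ('{10, 20, 30}');  -- accepted: shares 10 and 20, but the array differs
INSERT INTO uniq_demo VALUES ('{10, 15, 20}');  -- rejected: duplicate key violates the unique constraint
```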

  • 1
    Aaaaaaah, feeling quite embarrassed right now, but it just didn't come to my mind that postgres would not use the index even if theoretically possible. Maybe it's also because my lack of insight into postgres, such as that indices are bound to operators. Thank you for taking time to answer my ill-posed question and sharing your knowledge!
    – Tregoreg
    Commented Mar 25, 2015 at 22:17
  • 7
    @Tregoreg: Don't be too embarrassed, it's really not too obvious. I remember being confused by this myself when I first ran into it. The added question and clarification should be quite useful to the general public.
    Commented Mar 25, 2015 at 22:23
  • 1
    From my experience, these indexes offer little to no speedup unless gin__int_ops is used for integer[] columns. It took me years of frustration and looking for other solutions until I discovered this op class. It's a borderline miracle worker.
    – IamIC
    Commented Nov 28, 2017 at 17:46
  • 2
    @IamIC: I added pointers to intarray. Seems noteworthy, as you pointed out.
    Commented May 19, 2018 at 14:39
  • 1
    For ANY (array_expression) = constant expressions, GIN indexes work fine?
    – user10375
    Commented Mar 22, 2019 at 13:55

It's now possible to index the individual array elements. For example:

CREATE TABLE test (foo int[]);
INSERT INTO test VALUES ('{1,2,3}');
INSERT INTO test VALUES ('{4,5,6}');
CREATE INDEX test_index on test ((foo[1]));
SET enable_seqscan TO off;

EXPLAIN ANALYZE SELECT * from test WHERE foo[1]=1;
                                                QUERY PLAN                                                    
------------------------------------------------------------------------------------------------------------------
 Index Scan using test_index on test  (cost=0.00..8.27 rows=1 width=32) (actual   time=0.070..0.071 rows=1 loops=1)
   Index Cond: (foo[1] = 1)
 Total runtime: 0.112 ms
(3 rows)

This works on at least Postgres 9.2.1. Note that you need to build a separate index for each array position; in my example I only indexed the first element.

  • 42
    Let it not be lost - this approach is hopeless for variable length array where you want to use the ANY() operator.
    Commented Aug 5, 2014 at 11:10
  • 41
    This is really not very useful. If you have a fixed number of array elements, you'd rather use individual columns for each element (and plain btree indices) instead of building a more expensive expression index for each array item. Storage of individual columns is much cheaper without array overhead, too.
    Commented Mar 25, 2015 at 1:09
