
What considerations are involved in choosing between a jsonb column and a composite type column of the same structure?

For example, consider a column like that used in the Postgres documentation:

CREATE TYPE inventory_item AS (
name            text,
supplier_id     integer,
price           numeric
);

What are the tradeoffs involved between this approach vs a jsonb column mirroring this structure?

For example, I suspect that the composite type won't require storing the key names for every record, whereas the jsonb type would require this.
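One way to check this suspicion is to compare the on-disk size of the two representations with pg_column_size (a minimal sketch, assuming the inventory_item type above; exact numbers depend on alignment and TOAST):

```sql
-- Composite value: field names live in the type definition, not in each row.
SELECT pg_column_size(ROW('fuzzy dice', 42, 1.99)::inventory_item) AS composite_bytes;

-- jsonb value: every key name is stored inside the value itself.
SELECT pg_column_size('{"name":"fuzzy dice","supplier_id":42,"price":1.99}'::jsonb) AS jsonb_bytes;
```

On a typical build the jsonb value comes out noticeably larger, mostly because of the embedded key names.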

  • IMHO don't use json for relational data.
    – McNets
    Commented Jan 17, 2019 at 20:08
  • This question is good context to review this other question, about "Binary to binary cast with JSONb". The "cast to composite type" performance seems a good comparison case for the jsonb_populate_record function.
    Commented Jul 19, 2020 at 22:17

4 Answers


The TYPE inventory_item was defined in the question (the same as in the documentation), so we only need to define table tc with the composite (ROW) type and table tj with the JSONb type.

INSERT TIME

-- drop table tc; drop table tj;
CREATE TABLE tc (id serial, x inventory_item);
CREATE TABLE tj (id serial, x JSONb);

EXPLAIN ANALYSE 
  INSERT INTO tc(x) VALUES 
    (ROW('fuzzy dice', 42, 1.99)),
    (ROW('test pi', 3, 3.1415))
; -- Execution Time: try1 0.386 ms; try2 0.559 ms; try3 0.102 ms; ... 
EXPLAIN ANALYSE 
  INSERT INTO tj(x) VALUES 
    ('{"name":"fuzzy dice", "supplier_id":42, "price":1.99}'::jsonb),
    ('{"name":"test pi", "supplier_id":3, "price":3.1415}'::jsonb)
; -- Execution Time: try1 0.343; try2 0.355 ms; try3 0.112 ms; ...

Of course, a real benchmark needs loops and something more complex to test... but the times seem comparable; no big difference.
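A slightly more realistic insert benchmark bulk-loads many rows from generate_series instead of two literal VALUES (a sketch; the row contents are illustrative and timings will vary by machine):

```sql
-- Bulk insert into the composite-typed table.
EXPLAIN ANALYSE
  INSERT INTO tc(x)
  SELECT ROW('item-' || i, i % 100, i * 0.01)::inventory_item
  FROM generate_series(1, 100000) g(i);

-- Equivalent bulk insert into the jsonb table.
EXPLAIN ANALYSE
  INSERT INTO tj(x)
  SELECT jsonb_build_object('name', 'item-' || i,
                            'supplier_id', i % 100,
                            'price', i * 0.01)
  FROM generate_series(1, 100000) g(i);
```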

SELECT NATIVE TIME

Only retrieving the original datatype. A good benchmark is needed, but let's try something simple just to check for big differences.

EXPLAIN ANALYSE   SELECT x, i FROM tc, generate_series(1,999999) g(i); 
EXPLAIN ANALYSE   SELECT x, i FROM tj, generate_series(1,999999) g(i); 

No difference again: both show "Execution Time: ~460 ms".

EXPLODE TIME

EXPLAIN ANALYSE
  SELECT i, id, (x).name, (x).supplier_id, (x).price
  FROM tc, generate_series(1,999999) g(i)
; -- Execution Time: ~490 ms 
EXPLAIN ANALYSE
  SELECT i, tj.id, t.* 
  FROM tj, generate_series(1,999999) g(i),
       LATERAL jsonb_populate_record(null::inventory_item, tj.x) t  
; -- Execution Time: ~650 ms

It seems very fast to transform a JSONb object into an SQL row! It looks like a binary cast: we can suppose that the function jsonb_populate_record maps the JSONb types to SQL types using the internal definition of inventory_item.

And it is faster than the composite-type table.

Explode and calculate something

EXPLAIN ANALYSE
  SELECT i, (x).supplier_id+i, (x).price+0.01
  FROM tc, generate_series(1,999999) g(i)
; -- Execution Time: ~800 ms

EXPLAIN ANALYSE
  SELECT i,  t.supplier_id+i, t.price+0.01
  FROM tj, generate_series(1,999999) g(i),
       LATERAL jsonb_populate_record(null::inventory_item, tj.x) t  
; -- Execution Time: ~620 ms

Perhaps ~150 ms goes to the calculations, so about the expected time... There may be some error in the example above; a better benchmark is needed to check the real difference.


Check comparative time to cast from text.

EXPLAIN ANALYSE -- (supposed to) cast from binary
  SELECT i, id, x->>'name' as name, 
         (x->'supplier_id')::int as supplier_id, (x->'price')::float as price
  FROM tj, generate_series(1,999999) g(i)
; -- Execution Time: ~1600 ms 

EXPLAIN ANALYSE -- cast from text
  SELECT i, id, x->>'name' as name, 
         (x->>'supplier_id')::int as supplier_id, (x->>'price')::float as price
  FROM tj, generate_series(1,999999) g(i)
; -- Execution Time: ~1600 ms 

Long and identical times. It seems that (x->'supplier_id')::int is only syntactic sugar for (x->>'supplier_id')::int, that is, (x->'supplier_id')::text::int.
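A quick sanity check that the three spellings produce the same value (a sketch; the direct jsonb-to-int cast requires a reasonably recent PostgreSQL version):

```sql
SELECT (x->'supplier_id')::int        AS via_jsonb_cast,
       (x->>'supplier_id')::int       AS via_text_operator,
       (x->'supplier_id')::text::int  AS via_explicit_text
FROM tj
LIMIT 1;
```

All three columns should return the same integer for each row.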

PS: this answer is also a complement for this other question, about "Binary to binary cast with JSONb".

  • I couldn't replicate these results when running against existing data (as opposed to generating the series during the SELECT). See here for my results. It seemed like the benefits of the binary cast were outweighed by the lateral join. Maybe I'm doing something wrong?
    – Robert
    Commented Jun 15 at 0:16

It's not really a question of performance:

  • Composite types are strongly typed, meaning you have to define their shape.
  • JSON is weakly typed.

Moreover, JSON has its use cases, but with a composite type you almost never want one, because it's almost always better to normalize the data into its own relation. There is an exception, though: when you're going to build a whole library around a set of data, composite types can make everything cleaner and nicer. stdaddr in PostGIS is one such example: that type represents a physical address, and multiple functions can take it (like the PostGIS geocoder).

So you can do

CREATE TABLE foo ( foo_id int, address postgis.stdaddr );

now you can get the geocoded information relatively easily and you only have to pass around one value rather than 15.
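The "pass around one value" point can be sketched with a function that accepts the composite type directly (the function name is hypothetical, assuming the inventory_item type from the question):

```sql
-- One parameter of the composite type replaces three scalar parameters.
CREATE FUNCTION discounted_price(item inventory_item, pct numeric)
RETURNS numeric AS $$
  SELECT (item).price * (1 - pct / 100.0);
$$ LANGUAGE sql IMMUTABLE;

-- Usage: pass the single composite column, not its individual fields.
SELECT discounted_price(x, 10) FROM tc;
```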


Using a composite type, Pros:

  • less disk space
  • faster select/insert/update

Using a composite type, Cons:

  • it's structured (i.e., not flexible); you cannot make up fields on the fly
  • longer coding time (especially when you use an ORM)

There is one more option for storing structured data: using another table :). You can test it yourself by creating the two tables.
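The "another table" option, sketched for the inventory_item shape (table and constraint names are illustrative, not from the original question):

```sql
-- Normalized alternative: the item fields live in their own relation.
CREATE TABLE inventory_items (
    id          serial PRIMARY KEY,
    name        text    NOT NULL,
    supplier_id integer,
    price       numeric
);

-- The referring table stores only a foreign key.
CREATE TABLE orders (
    id      serial PRIMARY KEY,
    item_id integer REFERENCES inventory_items(id)
);
```

This trades the single-column convenience for the usual relational benefits: per-field constraints, indexes, and independent updates.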


Edit: These tests were performed in Postgres 12.16.

When I ran the queries in the selected answer against my own data, I found significantly different results. In particular, the LATERAL JOIN to jsonb_populate_record was very slow. Perhaps the difference is because I was running the queries against existing data, as opposed to generating the series during the SELECT?

Here is a fresh setup (similar to that in the selected answer) that demonstrates my results:

Initial setup:

CREATE TYPE inventory_item AS (name text, supplier_id integer, price numeric);
CREATE TABLE t_composite (id serial, x inventory_item);
CREATE TABLE t_jsonb (id serial, x JSONb);

Seeding data:

INSERT INTO t_composite (x)
  SELECT ROW('fuzzy-' || round(random()*1000), round(random()*70), random())::inventory_item 
  FROM generate_series(1,1000000)
; -- Execution time: 3.8s

INSERT INTO t_jsonb (x)
  SELECT json_build_object('name','fuzzy-' || round(random()*1000),'supplier_id',round(random()*70),'price',random()) 
  FROM generate_series(1,1000000)
; -- Execution time: 7.2s

Testing SELECT:

EXPLAIN ANALYZE     
  SELECT id, (x).price 
  FROM t_composite
; -- Execution time: 300ms

EXPLAIN ANALYZE     
  SELECT id, (x->>'price')::float 
  FROM t_jsonb
; -- Execution time: 510ms (casting from text -> float)

EXPLAIN ANALYZE     
  SELECT id, price 
  FROM t_jsonb, LATERAL jsonb_populate_record(null::inventory_item, t_jsonb.x) t
; -- Execution time: 2000ms (binary cast)

Joining to jsonb_populate_record made the query significantly slower. I'm guessing the performance boost of the binary cast is outweighed by having to do a lateral join?

Getting the jsonb value by simply casting it from text proved to be much faster than the jsonb_populate_record method, but it was still slower than using a composite type.
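An alternative worth benchmarking (not covered above) is jsonb_to_record, which avoids needing a pre-declared composite type by giving the column list inline; a sketch against the same t_jsonb table:

```sql
EXPLAIN ANALYZE
  SELECT id, t.price
  FROM t_jsonb,
       LATERAL jsonb_to_record(t_jsonb.x)
         AS t(name text, supplier_id integer, price numeric);
```

It still involves a lateral set-returning function call per row, so it may well show the same overhead as jsonb_populate_record here.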

  •
    Thanks Robert, good analysis (!). Please edit to show your PostgreSQL version. It is an old question; perhaps I need to run it again on newer versions. On INSERT (your "Seeding data" section) I get average execution times of 2.3 s for t_composite and 3.3 s for t_jsonb, using jsonb_build_object. With JSON instead of JSONb it grows to 4.2 s. So the factor is 70% (2.3/3.3), not 53% (your 3.8/7.2 ~ my 2.3/4.2). All other SELECT times are ok, with similar factors on my machine. Commented Jun 16 at 17:45
  • Thanks for the response, I've edited my post -- I'm still on a pretty old Postgres version (12.16). That's a bummer about the SELECT performance. I was hoping to use jsonb_populate_record to get around the wasteful type casting but I haven't been able to use it effectively on real data.
    – Robert
    Commented Jun 17 at 20:26
