I gave this talk at FOSDEM (https://fosdem.org/2016/) on January 31, 2016 in the PostgreSQL developer room.
The talk covered PostgreSQL JSON features, including the new functions and operators introduced in 9.5.
The SQL statements are available on GitHub at https://github.com/sjstoelting/talks/tree/master/json-by-example
JSON By Example
1. JSON by example
FOSDEM PostgreSQL Developer Room
January 2016
Stefanie Janine Stölting
@sjstoelting
2. JSON
JavaScript Object Notation
No need to care about encoding: it is always
Unicode, and most implementations use UTF-8
Used for data exchange in web applications
Currently two standards: RFC 7159 and ECMA-404
(JSON was originally specified by Douglas Crockford)
PostgreSQL implementation follows RFC 7159
3. JSON Datatypes
JSON
Available since 9.2
BSON
Available as extension on GitHub since 2013
JSONB
Available since 9.4
Compressed JSON
Fully transactional
Up to 1 GB (uses TOAST)
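A minimal sketch of how the two built-in types differ on input, using a made-up literal: json stores the text as given, while jsonb stores a parsed representation, so duplicate keys collapse to the last value and the original key order is not preserved.

```sql
-- json keeps the input text verbatim; jsonb normalizes it on input
SELECT '{"b": 1, "a": 2, "a": 3}'::json  AS as_json
    , '{"b": 1, "a": 2, "a": 3}'::jsonb AS as_jsonb
;
-- as_jsonb is returned as {"a": 3, "b": 1}
```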
7. Index on JSON
Index JSONB content for faster access
GIN index over the whole document
CREATE INDEX idx_1 ON jsonb.actor USING GIN (jsondata);
Even unique B-Tree indexes are possible
CREATE UNIQUE INDEX actor_id_2 ON jsonb.actor ((CAST(jsondata->>'actor_id' AS INTEGER)));
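The two index types can be tried out on a throwaway table. This is only a sketch: demo_actor and its rows are hypothetical stand-ins for jsonb.actor, which is not defined in these slides.

```sql
-- Hypothetical demo table standing in for jsonb.actor
CREATE TEMP TABLE demo_actor (jsondata jsonb);
INSERT INTO demo_actor VALUES
  ('{"actor_id": 1, "first_name": "Penelope", "last_name": "Guiness"}')
, ('{"actor_id": 2, "first_name": "Nick", "last_name": "Wahlberg"}')
;
CREATE INDEX demo_actor_gin ON demo_actor USING GIN (jsondata);
CREATE UNIQUE INDEX demo_actor_id ON demo_actor ((CAST(jsondata->>'actor_id' AS INTEGER)));

-- Containment (@>) is one of the operators a GIN index on jsonb can serve
SELECT jsondata->>'last_name' AS last_name
FROM demo_actor
WHERE jsondata @> '{"first_name": "Nick"}'
;

-- The unique expression index rejects a second document with actor_id 2:
-- INSERT INTO demo_actor VALUES ('{"actor_id": 2}');  -- raises unique_violation
```

On a table this small the planner will most likely prefer a sequential scan; the indexes pay off with larger data sets.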
8. New JSON functions
PostgreSQL 9.5 new JSONB functions:
jsonb_pretty: Formats JSONB in a human-readable way
jsonb_set: Update or add values
PostgreSQL 9.5 new JSONB operators:
||: Concatenate two JSONB
-: Delete key
Available as an extension for 9.4 at PGXN: jsonbx
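The functions and operators listed above can be tried on plain literals before applying them to real data. A small sketch with made-up documents, assuming PostgreSQL 9.5 or later:

```sql
SELECT jsonb_set('{"artist": "AC/DC", "albums": []}'::jsonb, '{artist}', '"Metallica"'::jsonb) AS replaced
    , '{"a": 1}'::jsonb || '{"a": 2, "b": 3}'::jsonb AS concatenated
    , '{"a": 1, "b": 2}'::jsonb - 'a' AS key_removed
;
-- replaced:     {"albums": [], "artist": "Metallica"}
-- concatenated: {"a": 2, "b": 3}
-- key_removed:  {"b": 2}

-- jsonb_pretty returns the same document as indented, multi-line text
SELECT jsonb_pretty('{"a": {"b": [1, 2]}}'::jsonb);
```

With ||, keys from the right-hand document overwrite equal keys on the left, which is why it can also serve as an update operator.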
9. Data sources
The Chinook database is available
at chinookdatabase.codeplex.com
Amazon book reviews of 1998 are
available at
examples.citusdata.com/customer_review
11. CTE
Common Table Expressions will be used in examples
● Example:
WITH RECURSIVE t(n) AS (
VALUES (1)
UNION ALL
SELECT n+1 FROM t WHERE n < 100
)
SELECT sum(n), min(n), max(n) FROM t;
● Result: sum = 5050, min = 1, max = 100
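The Chinook examples that follow mostly chain several non-recursive CTEs, each one building on the previous. A minimal sketch of that pattern, with made-up names (numbers, squares):

```sql
WITH numbers AS
(
    SELECT generate_series(1, 5) AS n
)
, squares AS
(
    -- A later CTE can read from an earlier one
    SELECT n
        , n * n AS square
    FROM numbers
)
SELECT sum(square) AS sum_of_squares
FROM squares
;
-- sum_of_squares: 55
```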
13. Live with Chinook data
-- Step 1: Tracks as JSON with the album identifier
WITH tracks AS
(
SELECT "AlbumId" AS album_id
, "TrackId" AS track_id
, "Name" AS track_name
FROM "Track"
)
SELECT row_to_json(tracks) AS tracks
FROM tracks
;
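To see what row_to_json produces without the Chinook tables installed, the same statement can be run against inline rows. The values below are invented for illustration and are not taken from the Chinook data.

```sql
WITH tracks(album_id, track_id, track_name) AS
(
    VALUES (1, 1, 'Track A')
         , (1, 2, 'Track B')
)
SELECT row_to_json(tracks) AS tracks
FROM tracks
;
-- Each row becomes one JSON object whose keys are the column names
```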
14. Live with Chinook data
-- Step 2: Albums including tracks with artist identifier
WITH tracks AS
(
SELECT "AlbumId" AS album_id
, "TrackId" AS track_id
, "Name" AS track_name
FROM "Track"
)
, json_tracks AS
(
SELECT row_to_json(tracks) AS tracks
FROM tracks
)
, albums AS
(
SELECT a."ArtistId" AS artist_id
, a."AlbumId" AS album_id
, a."Title" AS album_title
, array_agg(t.tracks) AS album_tracks
FROM "Album" AS a
INNER JOIN json_tracks AS t
ON a."AlbumId" = (t.tracks->>'album_id')::int
GROUP BY a."ArtistId"
, a."AlbumId"
, a."Title"
)
SELECT artist_id
, array_agg(row_to_json(albums)) AS album
FROM albums
GROUP BY artist_id
;
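In the statement above, array_agg collects json values into a PostgreSQL array, which row_to_json later renders as a JSON array. As a possible alternative that is not part of the talk, json_agg (available since 9.3) builds the JSON array directly. A sketch with invented rows:

```sql
WITH tracks(album_id, track_id, track_name) AS
(
    VALUES (1, 1, 'Track A')
         , (1, 2, 'Track B')
         , (2, 3, 'Track C')
)
SELECT album_id
    , json_agg(tracks ORDER BY track_id) AS album_tracks  -- one JSON array per album
FROM tracks
GROUP BY album_id
ORDER BY album_id
;
```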
16. Live with Chinook data
-- Step 3: Return one row for an artist with all albums as VIEW
CREATE OR REPLACE VIEW v_json_artist_data AS
WITH tracks AS
(
SELECT "AlbumId" AS album_id
, "TrackId" AS track_id
, "Name" AS track_name
, "MediaTypeId" AS media_type_id
, "Milliseconds" AS milliseconds
, "UnitPrice" AS unit_price
FROM "Track"
)
, json_tracks AS
(
SELECT row_to_json(tracks) AS tracks
FROM tracks
)
, albums AS
(
SELECT a."ArtistId" AS artist_id
, a."AlbumId" AS album_id
, a."Title" AS album_title
, array_agg(t.tracks) AS album_tracks
FROM "Album" AS a
INNER JOIN json_tracks AS t
ON a."AlbumId" = (t.tracks->>'album_id')::int
GROUP BY a."ArtistId"
, a."AlbumId"
, a."Title"
)
, json_albums AS
(
SELECT artist_id
, array_agg(row_to_json(albums)) AS album
FROM albums
GROUP BY artist_id
)
-- -> Next Page
17. Live with Chinook data
-- Step 3: Return one row for an artist with all albums as VIEW
, artists AS
(
SELECT a."ArtistId" AS artist_id
, a."Name" AS artist
, jsa.album AS albums
FROM "Artist" AS a
INNER JOIN json_albums AS jsa
ON a."ArtistId" = jsa.artist_id
)
SELECT (row_to_json(artists))::jsonb AS artist_data
FROM artists
;
18. Live with Chinook data
-- Select data from the view
SELECT *
FROM v_json_artist_data
;
19. Live with Chinook data
-- SELECT data from the view, filtered by artist name
SELECT jsonb_pretty(artist_data)
FROM v_json_artist_data
WHERE artist_data->>'artist' IN ('Miles Davis', 'AC/DC')
;
20. Live with Chinook data
-- SELECT some data from that VIEW using JSON methods
SELECT artist_data->>'artist' AS artist
, artist_data#>'{albums, 1, album_title}' AS album_title
, jsonb_pretty(artist_data#>'{albums, 1, album_tracks}') AS album_tracks
FROM v_json_artist_data
WHERE artist_data->'albums' @> '[{"album_title":"Miles Ahead"}]'
;
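The operators used here can be checked on a small inline document. Note that array positions in a path are zero-based, so {albums, 1, ...} addresses the second album of an artist. The document below is a made-up sketch, and its album order is an assumption.

```sql
WITH doc(artist_data) AS
(
    VALUES ('{"artist": "Miles Davis",
              "albums": [{"album_title": "The Essential Miles Davis"},
                         {"album_title": "Miles Ahead"}]}'::jsonb)
)
SELECT artist_data->>'artist' AS artist                      -- ->> returns text
    , artist_data#>'{albums, 1, album_title}' AS second_album -- #> returns jsonb
    , artist_data->'albums' @> '[{"album_title": "Miles Ahead"}]' AS has_miles_ahead
FROM doc
;
-- artist: Miles Davis, second_album: "Miles Ahead", has_miles_ahead: true
```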
21. Live with Chinook data
-- Array to records
SELECT artist_data->>'artist_id' AS artist_id
, artist_data->>'artist' AS artist
, jsonb_array_elements(artist_data#>'{albums}')->>'album_title' AS album_title
, jsonb_array_elements(jsonb_array_elements(artist_data#>'{albums}')#>'{album_tracks}')->>'track_name' AS song_titles
, jsonb_array_elements(jsonb_array_elements(artist_data#>'{albums}')#>'{album_tracks}')->>'track_id' AS song_id
FROM v_json_artist_data
WHERE artist_data->>'artist' = 'Metallica'
ORDER BY album_title
, song_id
;
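How several set-returning functions in one SELECT list are combined differs between PostgreSQL versions (the behaviour was reworked in version 10). A version-independent way to unnest albums and tracks is a LATERAL join, sketched here on an invented document rather than on the view:

```sql
WITH v(artist_data) AS
(
    VALUES ('{"artist": "Demo Artist",
              "albums": [{"album_title": "First",
                          "album_tracks": [{"track_id": 1, "track_name": "One"},
                                           {"track_id": 2, "track_name": "Two"}]},
                         {"album_title": "Second",
                          "album_tracks": [{"track_id": 3, "track_name": "Three"}]}]}'::jsonb)
)
SELECT v.artist_data->>'artist' AS artist
    , a.album->>'album_title' AS album_title
    , (t.track->>'track_id')::int AS song_id
    , t.track->>'track_name' AS song_title
FROM v
-- One row per album, then one row per track of that album
CROSS JOIN LATERAL jsonb_array_elements(v.artist_data->'albums') AS a(album)
CROSS JOIN LATERAL jsonb_array_elements(a.album->'album_tracks') AS t(track)
ORDER BY album_title
    , song_id
;
-- Returns three rows, one per track
```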
22. Live with Chinook data
-- Convert albums to a recordset
SELECT *
FROM jsonb_to_recordset(
(
SELECT (artist_data->>'albums')::jsonb
FROM v_json_artist_data
WHERE (artist_data->>'artist_id')::int = 50
)
) AS x(album_id int, artist_id int, album_title text, album_tracks jsonb)
;
23. Live with Chinook data
-- Convert the tracks to a recordset
SELECT album_id
, track_id
, track_name
, media_type_id
, milliseconds
, unit_price
FROM jsonb_to_recordset(
(
SELECT artist_data#>'{albums, 1, album_tracks}'
FROM v_json_artist_data
WHERE (artist_data->>'artist_id')::int = 50
)
) AS x(album_id int, track_id int, track_name text, media_type_id int, milliseconds int, unit_price float)
;
24. Live with Chinook data
-- Create a function, which will be used for UPDATE on the view v_json_artist_data
CREATE OR REPLACE FUNCTION trigger_v_json_artist_data_update()
RETURNS trigger AS
$BODY$
-- Data variables
DECLARE rec RECORD;
-- Error variables
DECLARE v_state TEXT;
DECLARE v_msg TEXT;
DECLARE v_detail TEXT;
DECLARE v_hint TEXT;
DECLARE v_context TEXT;
BEGIN
-- Update table Artist
IF (OLD.artist_data->>'artist')::varchar(120) <> (NEW.artist_data->>'artist')::varchar(120) THEN
UPDATE "Artist"
SET "Name" = (NEW.artist_data->>'artist')::varchar(120)
WHERE "ArtistId" = (OLD.artist_data->>'artist_id')::int;
END IF;
-- Update table Album with an UPSERT
-- Update table Track with an UPSERT
RETURN NEW;
EXCEPTION WHEN unique_violation THEN
RAISE NOTICE 'Sorry, but something went wrong while trying to update artist data';
RETURN OLD;
WHEN others THEN
GET STACKED DIAGNOSTICS
v_state = RETURNED_SQLSTATE,
v_msg = MESSAGE_TEXT,
v_detail = PG_EXCEPTION_DETAIL,
v_hint = PG_EXCEPTION_HINT,
v_context = PG_EXCEPTION_CONTEXT;
RAISE NOTICE '%', v_msg;
RETURN OLD;
END;
$BODY$
LANGUAGE plpgsql;
26. Live with Chinook data
-- The trigger will be fired instead of an UPDATE statement to save data
CREATE TRIGGER v_json_artist_data_instead_update INSTEAD OF UPDATE
ON v_json_artist_data
FOR EACH ROW
EXECUTE PROCEDURE trigger_v_json_artist_data_update()
;
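The whole mechanism (JSONB view, trigger function, INSTEAD OF trigger, UPDATE through the view) can be reproduced in a reduced form without the Chinook schema. This is a sketch only: demo_artist, demo_v_artist and demo_v_artist_update are hypothetical names, error handling is left out, and jsonb_build_object requires 9.5.

```sql
CREATE TEMP TABLE demo_artist (artist_id int PRIMARY KEY, name text NOT NULL);
INSERT INTO demo_artist VALUES (50, 'Metallica');

-- The view exposes each table row as a single JSONB document
CREATE TEMP VIEW demo_v_artist AS
SELECT jsonb_build_object('artist_id', artist_id, 'artist', name) AS artist_data
FROM demo_artist
;

-- Translate a changed document back into an UPDATE on the base table
CREATE FUNCTION demo_v_artist_update()
RETURNS trigger AS
$BODY$
BEGIN
    UPDATE demo_artist
    SET name = NEW.artist_data->>'artist'
    WHERE artist_id = (OLD.artist_data->>'artist_id')::int;
    RETURN NEW;
END;
$BODY$
LANGUAGE plpgsql;

CREATE TRIGGER demo_v_artist_instead_update INSTEAD OF UPDATE
ON demo_v_artist
FOR EACH ROW
EXECUTE PROCEDURE demo_v_artist_update()
;

-- The UPDATE targets the view; the trigger writes to the table
UPDATE demo_v_artist
SET artist_data = jsonb_set(artist_data, '{artist}', '"NEW Metallica"'::jsonb)
WHERE (artist_data->>'artist_id')::int = 50
;

SELECT name
FROM demo_artist
WHERE artist_id = 50
;
-- name: NEW Metallica
```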
27. Live with Chinook data
-- Manipulate data with jsonb_set
SELECT artist_data->>'artist_id' AS artist_id
, artist_data->>'artist' AS artist
, jsonb_set(artist_data, '{artist}', '"Whatever we want, it is just text"'::jsonb)->>'artist' AS new_artist
FROM v_json_artist_data
WHERE (artist_data->>'artist_id')::int = 50
;
28. Live with Chinook data
-- Update a JSONB column with a jsonb_set result
UPDATE v_json_artist_data
SET artist_data= jsonb_set(artist_data, '{artist}', '"NEW Metallica"'::jsonb)
WHERE (artist_data->>'artist_id')::int = 50
;
29. Live with Chinook data
-- View the changes done by the UPDATE statement
SELECT artist_data->>'artist_id' AS artist_id
, artist_data->>'artist' AS artist
FROM v_json_artist_data
WHERE (artist_data->>'artist_id')::int = 50
;
30. Live with Chinook data
-- Let's have a look at the explain plans
-- SELECT the data from the view
31. Live with Chinook data
-- View the changes in the table instead of the JSONB view
-- The result should be the same, only the column names differ
SELECT *
FROM "Artist"
WHERE "ArtistId" = 50
;
32. Live with Chinook data
-- Let's have a look at the explain plans
-- SELECT the data from table Artist
33. Live with Chinook data
-- Manipulate data with the concatenation / overwrite operator
SELECT artist_data->>'artist_id' AS artist_id
, artist_data->>'artist' AS artist
, jsonb_set(artist_data, '{artist}', '"Whatever we want, it is just text"'::jsonb)->>'artist' AS new_artist
, (artist_data || '{"artist":"Metallica"}'::jsonb)->>'artist' AS correct_name
FROM v_json_artist_data
WHERE (artist_data->>'artist_id')::int = 50
;
34. Live with Chinook data
-- Revert the name change of Metallica in a different way: with the replace operator
UPDATE v_json_artist_data
SET artist_data = artist_data || '{"artist":"Metallica"}'::jsonb
WHERE (artist_data->>'artist_id')::int = 50
;
35. Live with Chinook data
-- View the changes done by the UPDATE statement with the replace operator
SELECT artist_data->>'artist_id' AS artist_id
, artist_data->>'artist' AS artist
FROM v_json_artist_data
WHERE (artist_data->>'artist_id')::int = 50
;
36. Live with Chinook data
-- Remove some data with the - operator
SELECT jsonb_pretty(artist_data) AS complete
, jsonb_pretty(artist_data - 'albums') AS minus_albums
, jsonb_pretty(artist_data) <> jsonb_pretty(artist_data - 'albums') AS is_different
FROM v_json_artist_data
WHERE artist_data->>'artist' IN ('Miles Davis', 'AC/DC')
;
37. Live Amazon reviews
-- Create a table for JSON data with 1998 Amazon reviews
CREATE TABLE reviews(review_jsonb jsonb);
38. Live Amazon reviews
-- Import customer reviews from a file
COPY reviews
FROM '/var/tmp/customer_reviews_nested_1998.json'
;
39. Live Amazon reviews
-- There should be 589,859 records imported into the table
SELECT count(*)
FROM reviews
;
41. Live Amazon reviews
-- Select data with JSON
SELECT
review_jsonb#>> '{product,title}' AS title
, avg((review_jsonb#>> '{review,rating}')::int) AS average_rating
FROM reviews
WHERE review_jsonb@>'{"product": {"category": "Sheet Music & Scores"}}'
GROUP BY title
ORDER BY average_rating DESC
;
Without an Index: 248ms
42. Live Amazon reviews
-- Create a GIN index
CREATE INDEX review_review_jsonb ON reviews USING GIN (review_jsonb);
43. Live Amazon reviews
-- Select data with JSON
SELECT review_jsonb#>> '{product,title}' AS title
, avg((review_jsonb#>> '{review,rating}')::int) AS average_rating
FROM reviews
WHERE review_jsonb@>'{"product": {"category": "Sheet Music & Scores"}}'
GROUP BY title
ORDER BY average_rating DESC
;
The same query as before with the previously created GIN Index: 7ms
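Whether a containment query actually uses the GIN index can be checked with EXPLAIN. The sketch below builds a small synthetic table (demo_reviews is a hypothetical stand-in for reviews, filled with generated rows, not the Amazon data), so the timings from the slides will not be reproduced.

```sql
CREATE TEMP TABLE demo_reviews (review_jsonb jsonb);

-- 10,000 generated documents; every 100th one is in the category searched for
INSERT INTO demo_reviews
SELECT jsonb_build_object(
         'product', jsonb_build_object('category', CASE WHEN i % 100 = 0 THEN 'Sheet Music & Scores' ELSE 'Books' END)
       , 'review', jsonb_build_object('rating', i % 5 + 1)
       )
FROM generate_series(1, 10000) AS i
;
CREATE INDEX demo_reviews_gin ON demo_reviews USING GIN (review_jsonb);
ANALYZE demo_reviews;

-- Show the plan the planner picks for the containment filter
EXPLAIN
SELECT count(*)
FROM demo_reviews
WHERE review_jsonb @> '{"product": {"category": "Sheet Music & Scores"}}'
;
```

If the index is chosen, the plan contains a bitmap scan on demo_reviews_gin; the planner decides based on table statistics, so the result may vary.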
44. Live Amazon reviews
-- SELECT some statistics from the JSON data
SELECT review_jsonb#>>'{product,category}' AS category
, avg((review_jsonb#>>'{review,rating}')::int) AS average_rating
, count((review_jsonb#>>'{review,rating}')::int) AS count_rating
FROM reviews
GROUP BY category
;
Without an Index: 9747ms
45. Live Amazon reviews
-- Create a B-Tree index on a JSON expression
CREATE INDEX reviews_product_category ON reviews ((review_jsonb#>>'{product,category}'));
46. Live Amazon reviews
-- SELECT some statistics from the JSON data
SELECT review_jsonb#>>'{product,category}' AS category
, avg((review_jsonb#>>'{review,rating}')::int) AS average_rating
, count((review_jsonb#>>'{review,rating}')::int) AS count_rating
FROM reviews
GROUP BY category
;
The same query as before with the previously created BTREE Index: 1605ms
47. JSON by example
This document by Stefanie Janine Stölting is covered by the
Creative Commons Attribution 4.0 International license.