14

After running for a long time, I get more and more holes in the id field. Some tables' ids are int32, and the id sequence is reaching its maximum value. Some of the Java sources are read-only, so I cannot simply change the id column type from int32 to long, which would break the API.

I'd like to renumber them all. This may not be good practice, but whether it is good or bad is not the concern of this question. I especially want to renumber the very long IDs like "61789238" and "548273826529524324". I don't know why they got so long, but shorter IDs are also easier to handle manually.

But it's not easy to compact IDs by hand because of references and constraints.

Does PostgreSQL itself support ID renumbering? Or is there a plugin or maintenance utility for this job?

Maybe I can write some stored procedures? That would be very nice, so I could schedule the job once a year.

5 Answers

17

The question is old, but a desperate user posted a new question on dba.SE after trying to apply what is suggested here. You'll find an answer with more details and explanation over there:

The currently accepted answer will fail for most cases.

  • Typically, you have a PRIMARY KEY or UNIQUE constraint on an id column, which is NOT DEFERRABLE by default. (The OP mentions references and constraints.) Such constraints are checked after each row, so trying this you will most likely get unique violation errors. (A deferrable-constraint sketch follows this list.) Details:

  • Typically, one wants to retain the original order of rows while closing gaps. But the order in which rows are updated is arbitrary, leading to arbitrary numbers. The demonstrated example seems to retain the original sequence because physical storage still coincides with the desired order (the rows were inserted in the desired order just a moment earlier), which is almost never the case in real-world applications and is completely unreliable.
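As a sketch of the deferrable-constraint point above (my own addition, not part of the original answer; it reuses the table and constraint names tbl and tbl_pkey from the solution below): the PK could be recreated as DEFERRABLE and the uniqueness check postponed until commit. Note that a foreign key cannot reference a deferrable PK, so this only fits the simple case without FK references.

-- Sketch only: recreate the PK as deferrable (names tbl / tbl_pkey assumed)
ALTER TABLE tbl DROP CONSTRAINT tbl_pkey;
ALTER TABLE tbl ADD CONSTRAINT tbl_pkey
      PRIMARY KEY (id) DEFERRABLE INITIALLY IMMEDIATE;

BEGIN;
SET CONSTRAINTS tbl_pkey DEFERRED;   -- uniqueness is now checked only at COMMIT
UPDATE tbl t                         -- same renumbering UPDATE as below
SET    id = t1.new_id
FROM  (SELECT id, row_number() OVER (ORDER BY id) AS new_id FROM tbl) t1
WHERE  t.id = t1.id;
COMMIT;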

The matter is more complicated than it might seem at first. One solution (among others) if you can afford to remove the PK / UNIQUE constraint (and related FK constraints) temporarily:

BEGIN;

LOCK tbl;

-- remove all FK constraints to the column

ALTER TABLE tbl DROP CONSTRAINT tbl_pkey;  -- remove PK

-- for the simple case without FK references - or see below:    
UPDATE tbl t  -- intermediate unique violations are ignored now
SET    id = t1.new_id
FROM  (SELECT id, row_number() OVER (ORDER BY id) AS new_id FROM tbl) t1
WHERE  t.id = t1.id;

-- Update referencing value in FK columns at the same time (if any)

SELECT setval('tbl_id_seq', max(id)) FROM tbl;  -- reset sequence

ALTER TABLE tbl ADD CONSTRAINT tbl_pkey PRIMARY KEY(id); -- add PK back

-- add all FK constraints to the column back

COMMIT;

This is also much faster for big tables, because checking the PK (and FK) constraints for every row costs a lot more than removing the constraints and adding them back.

If there are FK columns in other tables referencing tbl.id, use data-modifying CTEs to update all of them.

Example for a table fk_tbl and a FK column fk_id:

WITH u1 AS (
   UPDATE tbl t
   SET    id = t1.new_id
   FROM  (SELECT id, row_number() OVER (ORDER BY id) AS new_id FROM tbl) t1
   WHERE  t.id = t1.id
   RETURNING t.id, t1.new_id  -- return old and new ID
   )
UPDATE fk_tbl f
SET    fk_id = u1.new_id      -- set to new ID
FROM   u1
WHERE  f.fk_id = u1.id;       -- match on old ID
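For completeness, here is a sketch (my addition; the constraint name fk_tbl_fk_id_fkey is a guess, check the real name with \d fk_tbl in psql) of dropping and re-adding the FK constraint around the renumbering:

ALTER TABLE fk_tbl DROP CONSTRAINT fk_tbl_fk_id_fkey;

-- ... renumber tbl.id and fk_tbl.fk_id with the CTE above ...

ALTER TABLE fk_tbl ADD CONSTRAINT fk_tbl_fk_id_fkey
      FOREIGN KEY (fk_id) REFERENCES tbl (id);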

More in the referenced answer on dba.SE.

4
  • There is another method: renaming the id column and adding a new serial id column; the same for referencing FKs; then using {oldid, newid} to update the referencing FKs; then dropping the {oldid, oldFK}. The order of renaming can be varied; in the extreme case the old and new ids and FKs coexist, allowing the old scheme to keep working while the work is in progress. Should I elaborate?
    – joop
    Commented Aug 24, 2015 at 15:36
  • @joop: You might add another answer with details here or, better yet, under the new question on dba.SE with a much more substantial answer. Commented Aug 24, 2015 at 15:47
  • I don't have an account there (what? no single sign on?) so I'll post it here.
    – joop
    Commented Aug 24, 2015 at 16:41
  • @joop: you can "register" with dba.se using your existing stackexchange account.
    – user330315
    Commented Aug 24, 2015 at 16:48
15

Assuming your ids are generated from a bigint sequence, just RESTART the sequence and update the table with idcolumn = DEFAULT.

CAVEAT: If this id column is used as a foreign key by other tables, make sure you have the ON UPDATE CASCADE modifier turned on for those foreign keys.
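A sketch of such a declaration (my addition; the table and column names are made up, and note that the referenced column needs a PRIMARY KEY or UNIQUE constraint for the foreign key to be allowed):

create table parent (
    id   bigserial primary key,
    data text
);
create table child (
    id        bigserial primary key,
    parent_id bigint references parent (id) on update cascade
);
-- renumbering parent.id now propagates to child.parent_id automatically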

For example:

Create the table, put some data in, and remove a middle value:

db=# create sequence xseq;
CREATE SEQUENCE
db=# create table foo ( id bigint default nextval('xseq') not null, data text );
CREATE TABLE
db=# insert into foo (data) values ('hello'), ('world'), ('how'), ('are'), ('you');
INSERT 0 5
db=# delete from foo where data = 'how';
DELETE 1
db=# select * from foo;
 id | data  
----+-------
  1 | hello
  2 | world
  4 | are
  5 | you
(4 rows)

Reset your sequence:

db=# ALTER SEQUENCE xseq RESTART;
ALTER SEQUENCE

Update your data:

db=# update foo set id = DEFAULT;
UPDATE 4
db=# select * from foo;
 id | data  
----+-------
  1 | hello
  2 | world
  3 | are
  4 | you
(4 rows)
2
  • 1
    This is not going to work as expected for most use cases. Consider details in the added answer. Commented Aug 24, 2015 at 17:32
  • A small trick can make this answer work in all cases: you just have to renumber the ids to some unique higher numbers that are sure not going to interfere with new compact ids. So before running the answer above, just do: UPDATE foo SET id = id + (SELECT max(id) FROM foo); Commented Dec 14, 2020 at 14:00
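Putting that trick together with the steps above, a minimal sketch (my own combination, reusing the foo table and xseq sequence from this answer):

-- move every id above the current maximum, so the compact new ids
-- assigned below cannot collide with an existing value
update foo set id = id + (select max(id) from foo);

-- then renumber compactly, as shown above
alter sequence xseq restart;
update foo set id = DEFAULT;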
2

The idea: add a new id column and new Foreign Key column(s) while the old ones are still in use. With some (quick) renaming, applications do not have to be aware. (But applications should be inactive during the final renaming step.)

\i tmp.sql
    -- the test tables
CREATE TABLE one (
    id serial NOT NULL PRIMARY KEY
    , payload text
    );
CREATE TABLE two (
    id serial NOT NULL PRIMARY KEY
    , the_fk INTEGER REFERENCES one(id)
            ON UPDATE CASCADE ON DELETE CASCADE
    );
    -- And the supporting index for the FK ...
CREATE INDEX ON two(the_fk);

    -- populate
INSERT INTO one(payload)
SELECT x::text FROM generate_series(1,1000) x;

INSERT INTO two(the_fk)
SELECT id FROM one WHERE random() < 0.3;

    -- make some gaps
DELETE FROM one WHERE id % 13 > 0;

-- SELECT * FROM two;

    -- Add new keycolumns to one and two
ALTER TABLE one
    ADD COLUMN new_id SERIAL NOT NULL UNIQUE
    ;

    -- UPDATE:
    -- This could need DEFERRABLE
    -- Note: since the update is only a permutation of the
    -- existing values, we don't need to reset the sequence.
UPDATE one SET new_id = self.new_id
FROM ( SELECT id, row_number() OVER(ORDER BY id) AS new_id FROM one ) self
WHERE one.id = self.id;

ALTER TABLE two
    ADD COLUMN new_fk INTEGER REFERENCES one(new_id)
    ;

    -- update the new FK
UPDATE two t
SET new_fk = o.new_id
FROM one o
WHERE t.the_fk = o.id
    ;

SELECT * FROM two;

    -- The crucial part: the final renaming
    -- (at this point it would be better not to allow other sessions
    -- to mess with the {one,two} tables ...)
    -- --------------------------------------------------------------
ALTER TABLE one DROP COLUMN id CASCADE;
ALTER TABLE one rename COLUMN new_id TO id;
ALTER TABLE one ADD PRIMARY KEY(id);

ALTER TABLE two DROP COLUMN the_fk CASCADE;
ALTER TABLE two rename COLUMN new_fk TO the_fk;
CREATE INDEX ON two(the_fk);

    -- Some checks.
    -- (the automatically generated names for the indexes
    -- and the sequence still contain the "new" names.)
SELECT * FROM two;
\d one
\d two

UPDATE: added the permutation of new_id (after creating it as a serial). Funny thing is: it doesn't seem to need 'DEFERRABLE'.

6
  • Some details: 1: Typically, one would want to retain original order while closing gaps. ADD COLUMN new_id SERIAL NOT NULL UNIQUE doesn't do that - just like the currently accepted answer. 2: The new FK constraints should CASCADE like the old one. 3: No CASCADE needed with DROP COLUMN the_fk. Commented Aug 24, 2015 at 17:16
  • 0) It was basically intended as a PoC. 1) You are right about the order; I didn't think anybody would be interested in the ordering of key values ... 2) Without CASCADE, the drop column didn't work here (9.3.5). 3) Ditto. 2+3 can easily be fixed (might need some extra steps); 1 is a bit harder and would need at least a row_number() plus a setval() afterwards.
    – joop
    Commented Aug 25, 2015 at 9:00
  • The added statement works with a not deferrable constraint because it happens to update rows in order. The window function row_number() produces an ordered set and Postgres simply uses that in the UPDATE, so no conflict arises. However, it's an implementation detail that's not documented and not guaranteed to work in all implementations or keep working across Postgres versions. The currently accepted answer updates in arbitrary order and is almost certain to fail. To verify my explanation, add ORDER BY random() to the subquery of the UPDATE, you'll get a unique violation error. Commented Aug 27, 2015 at 15:58
  • [I believe that this is caused by an implementation detail, but] I would expect that when permuting a set of N key values (onto themselves) one by one, touching the first (or any of them) would already create a (temporary) duplicate. So, for some reason PG is able to postpone part of the check (in this particular case) to a later point in the operation (could we call this "semi-deferrable"? /-) On second thought, this could be a side effect of the row-versioning process. BTW: it would be trivial to postpone the addition of the UNIQUE constraint on new_id to a later stage of the operation.
    – joop
    Commented Aug 27, 2015 at 16:15
  • Postgres does not postpone the check, that's documented explicitly. We discussed that in detail under this related question. I also added the link to my answer. Go through it step-by-step. No updated row violates the unique (PK) constraint if done in order. Commented Aug 27, 2015 at 16:27
1

This script will work for PostgreSQL.

This is a generic solution that works for all cases.

This query finds the description of the fields of all tables in any database.

WITH description_bd AS (
    SELECT colum.schemaname,
           coalesce(table_name, relname) AS table_name,
           column_name, ordinal_position, column_default, data_type,
           is_nullable, character_maximum_length, is_updatable, description
    FROM (
        SELECT columns.table_schema AS schemaname, columns.table_name,
               columns.column_name, columns.ordinal_position, columns.column_default,
               columns.data_type, columns.is_nullable, columns.character_maximum_length,
               columns.character_octet_length, columns.is_updatable, columns.udt_name
        FROM information_schema.columns
    ) colum
    FULL JOIN (
        SELECT schemaname, relid, relname, objoid, objsubid, description
        FROM pg_statio_all_tables, pg_description
        WHERE pg_statio_all_tables.relid = pg_description.objoid
    ) descre
      ON descre.relname = colum.table_name
     AND descre.objsubid = colum.ordinal_position
     AND descre.schemaname = colum.schemaname
)

This query proposes a solution to fix the sequences of all database tables (it generates, in the req field, a statement that fixes the sequence of each table). Run it appended to the CTE above; together the two snippets form one statement.

It takes the maximum value of the id column of each table and increments it by one.

SELECT table_name, column_name, ordinal_position, column_default,
       data_type, is_nullable, character_maximum_length, is_updatable, description,
       'SELECT setval('''
           || schemaname || '.'
           || replace(replace(column_default, '''::regclass)', ''), 'nextval(''', '')
           || ''',    (select max( ' || column_name || ')+1  from ' || table_name || ' ), true);' AS req
FROM description_bd
WHERE column_default LIKE '%nextva%'
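As an illustration (my own hypothetical example): for a table public.foo whose id column has the default nextval('foo_id_seq'::regclass), the generated req value would be:

SELECT setval('public.foo_id_seq',    (select max( id)+1  from foo ), true);

In psql 9.6 or later you could also select only the req column and follow the query with \gexec to run all generated statements in one go.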
2
  • Hello @Mesbah Gueffaf, it would help the readability of your answer if you formatted the SQL statements nicer (shorter lines and consistent indentation would be my suggestion), and explained exactly what the statements do and why they work. Commented Jan 19, 2016 at 9:22
  • Thanks @NielsAbildgaard for the review. We have added more explanation to the answer. We hope this contribution could help the community. Commented Jan 19, 2016 at 10:13
1

Since I didn't like the answers, I wrote a function in PL/pgSQL to do the job. It is called like this:

=> SELECT resequence('port','id','port_id_seq');
 resequence   
--------------
 5090 -> 3919

It takes 3 parameters:

  1. name of table
  2. name of column that is SERIAL
  3. name of sequence that the SERIAL uses

The function returns a short report of what it has done, with the previous value of the sequence and the new value.

The function LOOPs over the table ORDERed by the named column and makes an UPDATE for each row. Then it sets the new value for the sequence. That's it.

  1. The order of the values is preserved.
  2. No ADDing and DROPing of temporary columns or tables involved.
  3. No DROPing and ADDing of constraints and foreign keys needed.
  4. Of course, you had better have ON UPDATE CASCADE on those foreign keys.

The code:

CREATE OR REPLACE FUNCTION resequence(_tbl TEXT, _clm TEXT, _seq TEXT) RETURNS TEXT AS $FUNC$
DECLARE
        _old BIGINT;
        _new BIGINT := 0;
BEGIN
        -- walk the rows in order of the current value and assign 1, 2, 3, ...
        FOR _old IN EXECUTE 'SELECT '||_clm||' FROM '||_tbl||' ORDER BY '||_clm LOOP
                _new = _new + 1;
                EXECUTE 'UPDATE '||_tbl||' SET '||_clm||'='||_new||' WHERE '||_clm||'='||_old;
        END LOOP;
        -- reset the sequence and report "old last value -> new last value"
        RETURN (nextval(_seq::regclass)-1)||' -> '||setval(_seq::regclass,_new);
END $FUNC$ LANGUAGE plpgsql;
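One hedged hardening note (my addition, not part of the original answer): because the identifiers are concatenated into the dynamic SQL as plain text, a variant using format() with %I and passing the values via USING would also handle table or column names that need quoting. A sketch, under the hypothetical name resequence_safe:

-- sketch only: same logic as above, with quoted identifiers
CREATE OR REPLACE FUNCTION resequence_safe(_tbl TEXT, _clm TEXT, _seq TEXT) RETURNS TEXT AS $FUNC$
DECLARE
        _old BIGINT;
        _new BIGINT := 0;
BEGIN
        FOR _old IN EXECUTE format('SELECT %I FROM %I ORDER BY %I', _clm, _tbl, _clm) LOOP
                _new = _new + 1;
                EXECUTE format('UPDATE %I SET %I = $1 WHERE %I = $2', _tbl, _clm, _clm)
                USING _new, _old;
        END LOOP;
        RETURN (nextval(_seq::regclass) - 1) || ' -> ' || setval(_seq::regclass, _new);
END $FUNC$ LANGUAGE plpgsql;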
