How to replace multiple special characters in Postgres 9.5

Question

I have a table containing a list of name which might contain special character:

id   name
1    Johän
2    Jürgen
3    Janna
4    Üdyr
...

Is there a function that replaces each character for another specific one? (Not necessarily an unaccented one). Something like this:

SELECT id, function('ä,ü',name,'ae,ue');
Result:

    id   name
    1    Johaen
    2    Juergen
    3    Janna
    4    UEdyr
    ...

Possible duplicate of Multiple split and assign order_id CHECK the TRANSLATE part. How convert every char to <space> — Juan Carlos Oropeza, Commented Jul 27, 2016 at 16:59
@JuanCarlosOropeza this is little bit different task - you cannot use translate because it is designed for single characters only. — Pavel Stehule, Commented Jul 27, 2016 at 17:46
I had a similar problem and what I did was just use replace twice. Though I don’t know if that’s the best / efficient way to go about it. replace(replace(column,‘ä’,’ae’), ‘ü’,’ue’) — DardanM, Commented Jul 1, 2020 at 17:39

Community · Accepted Answer · 2017-05-23 12:22:36Z

replace()

If you want just to replace one or few characters you can use function replace(string text, from text, to text) that replaces all occurrences in string substring. The replace function can be used to replace one character to several characters.

translate()

If you want to translate some letters to other letters you can user function translate(string text, from text, to text) that replaces any character in a string that matches a character in the from by the corresponding character in the to set.

Some data to play with:

drop table if exists xyz;

create table xyz (
    id serial not null,
    name varchar(30)
);

insert into xyz (name) values
    ('Juhänäo'),
    ('Jürgüen'),
    ('Dannäu'),
    ('Übüdyr');

Example of replace function:

select replace(name, 'ä', 'a') from xyz;

This function replaces letter ä in the name column with letter a. Juhänäo becomes Juhanao.

select replace(name, 'ä', 'ae') from xyz;

Now it replaces letter ä with ae.

select replace(replace(replace(name, 'ä', 'ae'), 'ü', 'ue'), 'Ü', 'Ue') from xyz;

Not very nice, but in the example all ä become ae, ü become ue, and Ü become 'Ue'.

update xyz set name = replace(replace(replace(name, 'ä', 'ae'), 'ü', 'ue'), 'Ü', 'Ue');

Changes letters and updates rows. The result of the update is following:

Juhaenaeo
Juergueen
Dannaeu
Uebuedyr

Example of translate function:

select translate(name, 'ä,ü,Ü', 'a,u,U') from xyz;

Translates all letters ä to a, ü to u and Ü to U.

update xyz set name = translate(name, 'ä,ü,Ü', 'a,u,U');

Updates table so all predefined letters are translated and the change is saved to the database. The result of the update is following:

Juhanao
Jurguen
Dannau
Ubudyr

More information:

Replace characters with multi-character strings

Postgresql string functions

The commas in translate are superfluous; the correct syntax according to the docs would be translate(name, 'äüÜ', 'auU');. — Martin von Wittich, Commented Feb 10, 2023 at 10:18

Pavel Stehule · Accepted Answer · 2016-07-27 17:39:27Z

11

No, there are no this function. Probably is not to hard to write optimized C extension what does it. But C language is not necessary always. You can try SQL or PLpgSQL function:

CREATE OR REPLACE FUNCTION xx(text, text[], text[])
RETURNS text AS $$
   SELECT string_agg(coalesce($3[array_position($2, c)],c),'')
      FROM regexp_split_to_table($1,'') g(c)
$$ LANGUAGE sql;

postgres=# select xx('Jürgen', ARRAY['ä','ü'], ARRAY['ae','ue']);
┌─────────┐
│   xx    │
╞═════════╡
│ Juergen │
└─────────┘
(1 row)

On my comp it does 6000 transformation under 200ms (but I have developer build of PostgreSQL - it is slower).

answered Jul 27, 2016 at 17:39

Pavel Stehule

44.9k6 gold badges95 silver badges98 bronze badges

I wondering what you mean developer build, you mean production build will be faster? If that is the case can you point me to what check to improve the build speed.
– Juan Carlos Oropeza
Commented Jul 27, 2016 at 18:22
just in case: array_position is 9.5+ not available 9.4 But yes, is fast. 12000 random name. With ARRAY['a','e'], ARRAY['ae','ue'] took 300 ms
– Juan Carlos Oropeza
Commented Jul 27, 2016 at 18:33
The question's subject is related to PostgreSQL 9.5, so I used 9.5 function. I have developer build with enabled assertions - there are much more memory checks, etc. When you use Postgres from rpm, deb, ... then asserts should be disabled. The check: show debug_assertions;
– Pavel Stehule
Commented Jul 27, 2016 at 19:05
Good job. Note however that the function works well if both encoding and client_encoding are set to UTF8.
– klin
Commented Jul 27, 2016 at 19:14
@PavelStehule Is my fault didnt saw the 9.5 on the title.
– Juan Carlos Oropeza
Commented Jul 27, 2016 at 20:20

| Show 1 more comment

Tomi Panula-Ontto · Accepted Answer · 2021-10-19 12:14:06Z

1

Avoid writing your own unaccent function. Instead, I recommend using unaccent extension.

create extension unaccent;
select unaccent('Juhänäo');
select unaccent(name) from xyz;

Here is a link to Postgres documentation for Unaccent

answered Oct 19, 2021 at 12:14

Tomi Panula-Ontto

111 bronze badge

Add a comment |

soufiane ELAMMARI · Accepted Answer · 2022-11-10 20:08:51Z

1

you simply use this :

select  translate(column, 'âàãáéêèíóôõüúç', 'aaaaeeeiooouuc') from table

Good luck ^^

answered Nov 10, 2022 at 20:08

soufiane ELAMMARI

97910 silver badges13 bronze badges

Add a comment |

Rafs · Accepted Answer · 2023-05-31 12:27:18Z

If you are after German letters, then this works:

CREATE OR REPLACE FUNCTION public.udf_transliterate_german(
    german_word character varying)
    RETURNS character varying
    LANGUAGE 'sql'
    COST 100
    IMMUTABLE PARALLEL UNSAFE
AS $BODY$
SELECT REPLACE(
    REPLACE(
        REPLACE(
            REPLACE(
                REPLACE(
                    REPLACE(
                        REPLACE(
                            REPLACE(german_word,
                                'ä','ae'),
                            'ö','oe' ),
                        'ü','ue'),
                    'ß','ss'),
                'Ä', 'AE'),
            'Ö', 'OE'),
        'Ü', 'UE'),
    'ẞ', 'SS');
$BODY$;

It is not elegant though.

bcsta · Accepted Answer · 2022-12-11 11:47:23Z

This is a generic recursive function written in python (but can easily be replicated in whatever language you prefer) That takes in the original string and a list of substrings to be removed:

def replace_psql_string(str_to_remove, query):

    ss = str_to_remove.split(',')
    if ss[0] != '':
        query = "REGEXP_REPLACE('{0}', '{1}', '', 'gi')".format(query, ss[0])
        return self.replace_psql_string(','.join(ss[1:]), query)
    else:
        return query

# Run it
replace_psql_string("test,foo", "input test string")

The code splits the comma separated string to a list of substrings to remove, Creates a REGEXP_REPLACE function in sql with the first substring. Then, recursively calls the function again with [1:] elements and append each REGEXP query to the passed parameter. So that finally, when no more substrings are in the list to remove, the final aggregated query is returned.

This will output:

REGEXP_REPLACE(REGEXP_REPLACE('input test string, 'test', '', 'gi'), 'foo', '', 'gi')

Then use it in whatever part of your query that makes sense. Can be tested by appending a SELECT in front:

SELECT REGEXP_REPLACE(REGEXP_REPLACE('input test string, 'test', '', 'gi'), 'foo', '', 'gi');


regexp_replace 
----------------
 input  string
(1 row)

Collectives™ on Stack Overflow

How to replace multiple special characters in Postgres 9.5

6 Answers 6

Not the answer you're looking for? Browse other questions tagged
postgresql
special-characters
or ask your own question.

Linked

Hot Network Questions

Collectives™ on Stack Overflow

6 Answers 6

Not the answer you're looking for? Browse other questions tagged postgresqlspecial-characters or ask your own question.

Linked

Related

Not the answer you're looking for? Browse other questions tagged
postgresql
special-characters
or ask your own question.