16

I have a table containing a list of name which might contain special character:

id   name
1    Johän
2    Jürgen
3    Janna
4    Üdyr
...

Is there a function that replaces each character for another specific one? (Not necessarily an unaccented one). Something like this:

SELECT id, function('ä,ü',name,'ae,ue');
Result:

    id   name
    1    Johaen
    2    Juergen
    3    Janna
    4    UEdyr
    ...
3
  • Possible duplicate of Multiple split and assign order_id CHECK the TRANSLATE part. How convert every char to <space> Commented Jul 27, 2016 at 16:59
  • @JuanCarlosOropeza this is little bit different task - you cannot use translate because it is designed for single characters only. Commented Jul 27, 2016 at 17:46
  • 1
    I had a similar problem and what I did was just use replace twice. Though I don’t know if that’s the best / efficient way to go about it. replace(replace(column,‘ä’,’ae’), ‘ü’,’ue’)
    – DardanM
    Commented Jul 1, 2020 at 17:39

6 Answers 6

22

replace()

If you want just to replace one or few characters you can use function replace(string text, from text, to text) that replaces all occurrences in string substring. The replace function can be used to replace one character to several characters.

translate()

If you want to translate some letters to other letters you can user function translate(string text, from text, to text) that replaces any character in a string that matches a character in the from by the corresponding character in the to set.

Some data to play with:

drop table if exists xyz;

create table xyz (
    id serial not null,
    name varchar(30)
);

insert into xyz (name) values
    ('Juhänäo'),
    ('Jürgüen'),
    ('Dannäu'),
    ('Übüdyr');

Example of replace function:

select replace(name, 'ä', 'a') from xyz;

This function replaces letter ä in the name column with letter a. Juhänäo becomes Juhanao.

select replace(name, 'ä', 'ae') from xyz;

Now it replaces letter ä with ae.

select replace(replace(replace(name, 'ä', 'ae'), 'ü', 'ue'), 'Ü', 'Ue') from xyz;

Not very nice, but in the example all ä become ae, ü become ue, and Ü become 'Ue'.

update xyz set name = replace(replace(replace(name, 'ä', 'ae'), 'ü', 'ue'), 'Ü', 'Ue');

Changes letters and updates rows. The result of the update is following:

Juhaenaeo
Juergueen
Dannaeu
Uebuedyr

Example of translate function:

select translate(name, 'ä,ü,Ü', 'a,u,U') from xyz;

Translates all letters ä to a, ü to u and Ü to U.

update xyz set name = translate(name, 'ä,ü,Ü', 'a,u,U');

Updates table so all predefined letters are translated and the change is saved to the database. The result of the update is following:

Juhanao
Jurguen
Dannau
Ubudyr

More information:

Replace characters with multi-character strings

Postgresql string functions

1
  • 1
    The commas in translate are superfluous; the correct syntax according to the docs would be translate(name, 'äüÜ', 'auU');. Commented Feb 10, 2023 at 10:18
11

No, there are no this function. Probably is not to hard to write optimized C extension what does it. But C language is not necessary always. You can try SQL or PLpgSQL function:

CREATE OR REPLACE FUNCTION xx(text, text[], text[])
RETURNS text AS $$
   SELECT string_agg(coalesce($3[array_position($2, c)],c),'')
      FROM regexp_split_to_table($1,'') g(c)
$$ LANGUAGE sql;

postgres=# select xx('Jürgen', ARRAY['ä','ü'], ARRAY['ae','ue']);
┌─────────┐
│   xx    │
╞═════════╡
│ Juergen │
└─────────┘
(1 row)

On my comp it does 6000 transformation under 200ms (but I have developer build of PostgreSQL - it is slower).

6
  • I wondering what you mean developer build, you mean production build will be faster? If that is the case can you point me to what check to improve the build speed. Commented Jul 27, 2016 at 18:22
  • just in case: array_position is 9.5+ not available 9.4 But yes, is fast. 12000 random name. With ARRAY['a','e'], ARRAY['ae','ue'] took 300 ms Commented Jul 27, 2016 at 18:33
  • The question's subject is related to PostgreSQL 9.5, so I used 9.5 function. I have developer build with enabled assertions - there are much more memory checks, etc. When you use Postgres from rpm, deb, ... then asserts should be disabled. The check: show debug_assertions; Commented Jul 27, 2016 at 19:05
  • Good job. Note however that the function works well if both encoding and client_encoding are set to UTF8.
    – klin
    Commented Jul 27, 2016 at 19:14
  • @PavelStehule Is my fault didnt saw the 9.5 on the title. Commented Jul 27, 2016 at 20:20
1

Avoid writing your own unaccent function. Instead, I recommend using unaccent extension.

create extension unaccent;
select unaccent('Juhänäo');
select unaccent(name) from xyz;

Here is a link to Postgres documentation for Unaccent

1

you simply use this :

select  translate(column, 'âàãáéêèíóôõüúç', 'aaaaeeeiooouuc') from table

Good luck ^^

1

If you are after German letters, then this works:

CREATE OR REPLACE FUNCTION public.udf_transliterate_german(
    german_word character varying)
    RETURNS character varying
    LANGUAGE 'sql'
    COST 100
    IMMUTABLE PARALLEL UNSAFE
AS $BODY$
SELECT REPLACE(
    REPLACE(
        REPLACE(
            REPLACE(
                REPLACE(
                    REPLACE(
                        REPLACE(
                            REPLACE(german_word,
                                'ä','ae'),
                            'ö','oe' ),
                        'ü','ue'),
                    'ß','ss'),
                'Ä', 'AE'),
            'Ö', 'OE'),
        'Ü', 'UE'),
    'ẞ', 'SS');
$BODY$;

It is not elegant though.

0

This is a generic recursive function written in python (but can easily be replicated in whatever language you prefer) That takes in the original string and a list of substrings to be removed:

def replace_psql_string(str_to_remove, query):

    ss = str_to_remove.split(',')
    if ss[0] != '':
        query = "REGEXP_REPLACE('{0}', '{1}', '', 'gi')".format(query, ss[0])
        return self.replace_psql_string(','.join(ss[1:]), query)
    else:
        return query

# Run it
replace_psql_string("test,foo", "input test string")

The code splits the comma separated string to a list of substrings to remove, Creates a REGEXP_REPLACE function in sql with the first substring. Then, recursively calls the function again with [1:] elements and append each REGEXP query to the passed parameter. So that finally, when no more substrings are in the list to remove, the final aggregated query is returned.

This will output:

REGEXP_REPLACE(REGEXP_REPLACE('input test string, 'test', '', 'gi'), 'foo', '', 'gi')

Then use it in whatever part of your query that makes sense. Can be tested by appending a SELECT in front:

SELECT REGEXP_REPLACE(REGEXP_REPLACE('input test string, 'test', '', 'gi'), 'foo', '', 'gi');


regexp_replace 
----------------
 input  string
(1 row)

Not the answer you're looking for? Browse other questions tagged or ask your own question.