
I tried to find special characters using bracket patterns with LIKE and NOT LIKE, but the results are confusing. From what I have read, this does not work the way it does in SQL Server or elsewhere. I need to check three things:

  1. For finding if there is any character
  2. For finding if there is any number
  3. For finding if there is any special character

Patterns like '%[^0-9]%' or '%[^a-Z]%' do not work well for detecting non-numeric or non-alphabetic data, respectively:

SELECT column1 FROM some_table WHERE column1 LIKE '%[^0-9]%';
SELECT column1 FROM some_table WHERE column1 LIKE '%[^a-Z]%';
SELECT column1 FROM some_table WHERE column1 LIKE '%[^a-Z0-9]%';

I have also seen people use NOT LIKE '%[^0-9]%'.

  • LIKE does not support regular expressions in SQL
    – user330315
    Commented May 7, 2019 at 20:43

2 Answers


Postgres LIKE does not support regular expressions.
You need the regular expression operator ~.
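To make the difference concrete, here is a minimal sketch that runs both operators over a few made-up sample values (the VALUES list and the column alias s are just for illustration, no table is needed). LIKE only knows the wildcards % and _, so it treats the brackets as literal text:

```sql
-- Hypothetical sample values; s is an illustrative column alias.
SELECT s,
       s LIKE '%[^0-9]%' AS like_match,   -- true only if s contains the literal text [^0-9]
       s ~ '[^0-9]'      AS regex_match   -- true if s contains any character that is not 0-9
FROM  (VALUES ('12345'), ('12a45'), ('x[^0-9]y')) AS t(s);
```

Only the last value should satisfy the LIKE pattern, while the regular expression should flag the last two.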

Standard SQL also defines SIMILAR TO as an odd mix of the above, but it is best avoided.

For finding if there is any character

... meaning any character at all:

... WHERE col <> '';                        -- any character at all?

So the value is neither NULL nor empty.
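A small sketch of why this single comparison covers both cases, using made-up values (s is an illustrative alias): the comparison is false for the empty string and NULL for a NULL input, and a WHERE clause keeps only rows where the condition is true.

```sql
-- Hypothetical sample values, including an empty string and a NULL.
SELECT s,
       s <> '' AS has_any_char   -- true, false, or NULL (for NULL input)
FROM  (VALUES ('a'), (''), (NULL)) AS t(s);
```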

... meaning any alphabetic character (letter):

... WHERE col ~ '[[:alpha:]]';              -- any letters?

[[:alpha:]] is the character class for all alphabetic characters, not just the ASCII letters [A-Za-z]; it also includes letters like [ÄéÒçòý] etc.
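A quick way to try it on made-up values (s is an illustrative alias). How non-ASCII letters such as Ä are classified depends on the locale or collation of your database, so treat that row as something to check on your own setup rather than a guaranteed result:

```sql
-- Hypothetical sample values; the 'Ä1' row depends on the database locale.
SELECT s,
       s ~ '[[:alpha:]]' AS has_letter
FROM  (VALUES ('12345'), ('12a45'), ('Ä1'), ('')) AS t(s);
```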

For finding if there is any number

... meaning any digit:

... WHERE col ~ '\d';                       -- any digits?

\d is the class shorthand for [[:digit:]].
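As a sketch on made-up values (s is an illustrative alias), the same shorthand can also be anchored to test for digits only, which seems to be what the NOT LIKE '%[^0-9]%' idiom in the question is after. This assumes standard_conforming_strings is on (the default in current versions), so the backslash needs no doubling:

```sql
-- Hypothetical sample values.
SELECT s,
       s ~ '\d'     AS has_digit,     -- at least one digit anywhere
       s ~ '^\d+$'  AS only_digits    -- one or more digits and nothing else
FROM  (VALUES ('abc'), ('a1c'), ('123')) AS t(s);
```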

For finding if there is any special character

... meaning anything except digits and letters:

... WHERE col ~ '\W';                       -- anything but digits & letters? 

\W is the class shorthand for [^[:alnum:]_], so an underscore does not count as a special character here (the manual is currently confusing there).

... meaning anything except digits, letters and plain space:

... WHERE col ~ '[^[:alnum:]_ ]'            -- ... and space

That's the class shorthand \W spelled out, additionally excluding plain space.

... meaning anything except digits, letters and any white space:

... WHERE col ~ '[^[:alnum:]_\s]'           -- ... and any white space
... WHERE col ~ '[^[:alnum:]_[:space:]]'    -- ... the same spelled out

This time excluding all white space as defined by the Posix character class space.
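To compare the variants side by side, here is a sketch on made-up values (s and the result column names are illustrative). The E'...' literal is only used to embed a tab character:

```sql
-- Hypothetical sample values: plain, with space, with tab, with '#', with underscore.
SELECT s,
       s ~ '\W'                     AS special,               -- space and tab count as special
       s ~ '[^[:alnum:]_ ]'         AS special_except_space,  -- tab still counts
       s ~ '[^[:alnum:]_[:space:]]' AS special_except_ws      -- no white space counts
FROM  (VALUES ('abc123'), ('abc 123'), (E'abc\t123'), ('abc#123'), ('abc_123')) AS t(s);
```

Only 'abc#123' should be flagged by all three patterns; the underscore row should be flagged by none.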

... meaning any non-ASCII character:

If your database uses UTF8 encoding, there is a simple, very fast hack:

... WHERE octet_length(col) > length(col);  -- any non-ASCII character?

octet_length() counts the bytes in the string, while length() (aliases: character_length() or char_length()) counts characters in the string. All basic ASCII characters ([\x00-\x7F]) are encoded with 1 byte in UTF-8, all other characters use 2 to 4 bytes. So any non-ASCII character in the string makes the expression true.
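A sketch of the byte-versus-character comparison on made-up values (s is an illustrative alias), assuming the database encoding is UTF8:

```sql
-- Hypothetical sample values; results assume UTF8 server encoding.
SELECT s,
       length(s)                   AS chars,
       octet_length(s)             AS bytes,
       octet_length(s) > length(s) AS has_non_ascii
FROM  (VALUES ('abc'), ('naïve'), ('日本')) AS t(s);
```

Under that assumption, 'abc' has equal counts, while the other two values use more bytes than characters.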


  • Worked like a charm! Thank you so much for putting in the effort. ... WHERE column1 ~ '\W'; worked, but it also shows records with spaces. For that, I used the AND like '% %'; (there is <space> between '% %'). This eliminates records which have spaces and gives me the records which have special characters
    – Vish_er
    Commented May 8, 2019 at 20:59
  • I added some more to handle white space additionally. Commented May 8, 2019 at 22:18


The problem is that you are using LIKE incorrectly. These patterns are not recognized by LIKE.

Use ~ for regular expression matching:

select column1 from some_table where column1 ~ '[^a-Z0-9]' 

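or more aptly (in ASCII order a comes after Z, so the range a-Z is not a reliable way to say "all letters"; spell out both cases):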

select column1 from some_table where column1 ~ '[^a-zA-Z0-9]'

This will return every row where column1 contains at least one character that is not in the character class.
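As a sketch on made-up city-like values (s and the result column names are illustrative), note that the ^ inside the brackets negates the class; it does not anchor the match to the start of the string. The !~ operator gives the opposite test:

```sql
-- Hypothetical sample values.
SELECT s,
       s ~ '[^a-zA-Z0-9]'  AS has_other_char,    -- some character outside a-z, A-Z, 0-9
       s !~ '[^a-zA-Z0-9]' AS only_ascii_alnum   -- no such character (also true for '')
FROM  (VALUES ('Boston'), ('St. Louis'), ('Area51'), ('')) AS t(s);
```

Here only 'St. Louis' should be flagged, because of the dot and the space.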

Here is a db<>fiddle.

  • This does not work for me. I tried ~ '[^a-Z]' to find out if I can get non-character values but it just gave me names of the cities which did not start with 'a' or 'Z'. Any thoughts?
    – Vish_er
    Commented May 8, 2019 at 12:23
  • @VishwalShah . . . The regular expression matches a city name that has any non-letter anywhere in the name. I would assume that it just happens to return city names that start with different characters first. Commented May 8, 2019 at 12:26
  • It should return non-letter based records but it is just showing clean letter-based records. Your assumption might be right but it still does not show me any non-letter based records
    – Vish_er
    Commented May 8, 2019 at 12:34
