2

I have a table with a single column in a Postgres 13.1 database. It consists of many rows with comma-separated values - around 20 elements at most.

I want to split the data into multiple columns. But I have only a limited number of columns say 5 and more than 5 CSV values in a single row, so excess values must be shifted to new/next row). How to do this?

Example:

a1, b1, c1
a2, b2, c2, d2, e2, f2
a3, b3, c3, d3, e3, f3, g3, h3, i3, j3
a4
a5, b5, c5
'
'
'

Columns are only 5, so the output would be like:

c1 c2 c3 c4 c5
---------------
a1 b1 c1
a2 b2 c2 d2 e2 
f2
a3 b3 c3 d3 e3
f3 g3 h3 i3 j3
a4
a5 b5 c5
'
'
'
3
  • Does the order of result rows matter? If so, how? Commented Feb 20, 2021 at 12:21
  • @ErwinBrandstetter Yes, the order in which they are in. There's no logic part in the order. Thanks
    – Martin2000
    Commented Feb 20, 2021 at 12:27
  • @ErwinBrandstetter Hello. Is there any function or a query that puts the values from the right side into the columns? Like a1 value under c5, b1 under c4, and c1 under c3?
    – Martin2000
    Commented Feb 23, 2021 at 9:07

4 Answers 4

5

It is typically bad design to store CSV values in a single column. If at all possible, use an array or a properly normalized design instead.

While stuck with your current situation ...

For known small maximum number of elements

A simple solution without trickery or recursion will do:

SELECT id, 1 AS rnk
     , split_part(csv, ', ', 1) AS c1
     , split_part(csv, ', ', 2) AS c2
     , split_part(csv, ', ', 3) AS c3
     , split_part(csv, ', ', 4) AS c4
     , split_part(csv, ', ', 5) AS c5
FROM   tbl
WHERE  split_part(csv, ', ', 1) <> '' -- skip empty rows

UNION ALL
SELECT id, 2
     , split_part(csv, ', ', 6)
     , split_part(csv, ', ', 7)
     , split_part(csv, ', ', 8)
     , split_part(csv, ', ', 9)
     , split_part(csv, ', ', 10)
FROM   tbl
WHERE  split_part(csv, ', ', 6) <> '' -- skip empty rows

-- three more blocks to cover a maximum "around 20"

ORDER  BY id, rnk;

db<>fiddle here

id being the PK of the original table.
This assumes ', ' as separator, obviously.
You can adapt easily.

Related:

For unknown number of elements

Various ways. One way use regexp_replace() to replace every fifth separator before unnesting ...

-- for any number of elements
SELECT t.id, c.rnk
     , split_part(c.csv5, ', ', 1) AS c1
     , split_part(c.csv5, ', ', 2) AS c2
     , split_part(c.csv5, ', ', 3) AS c3
     , split_part(c.csv5, ', ', 4) AS c4
     , split_part(c.csv5, ', ', 5) AS c5
FROM   tbl t
     , unnest(string_to_array(regexp_replace(csv, '((?:.*?,){4}.*?),', '\1;', 'g'), '; ')) WITH ORDINALITY c(csv5, rnk)
ORDER  BY t.id, c.rnk;

db<>fiddle here

This assumes that the chosen separator ; never appears in your strings. (Just like , can never appear.)

The regular expression pattern is the key: '((?:.*?,){4}.*?),'

(?:) ... “non-capturing” set of parentheses
() ... “capturing” set of parentheses
*? ... non-greedy quantifier
{4}? ... sequence of exactly 4 matches

The replacement '\1;' contains the back-reference \1.

'g' as fourth function parameter is required for repeated replacement.

Further reading:

Other ways to solve this include a recursive CTE or a set-returning function ...

Fill from right to left

(Like you added in How to put values starting from the right side into columns?)
Simply count down numbers like:

SELECT t.id, c.rnk
     , split_part(c.csv5, ', ', 5) AS c1
     , split_part(c.csv5, ', ', 4) AS c2
     , split_part(c.csv5, ', ', 3) AS c3
     , split_part(c.csv5, ', ', 2) AS c4
     , split_part(c.csv5, ', ', 1) AS c5
FROM ...

db<>fiddle here

3
  • Thank you so much for the help. Question- what does '<>' this means in WHERE clause? Though I am new to this. Is there any option to do it with hard coding like what if in the future there is a row with more than 20 elements? Please help.
    – Martin2000
    Commented Feb 20, 2021 at 12:45
  • @Martin2000: <> is the "not equal" operator. If the first column is not equal to the empty string (''), then the row isn't noise. See: stackoverflow.com/a/23767625/939860 Commented Feb 20, 2021 at 12:49
  • @Martin2000: I added solutions for an arbitrary number of elements, and to fill from right to left. Commented Feb 24, 2021 at 12:19
3
CREATE UNLOGGED TABLE foo( x TEXT );
\copy foo FROM stdin
a1, b1, c1
a2, b2, c2, d2, e2, f2
a3, b3, c3, d3, e3, f3, g3, h3, i3, j3
a4
a5, b5, c5
\.

From lines to single column...

SELECT (ROW_NUMBER() OVER () - 1)/5 AS r, u FROM (SELECT unnest(string_to_array(x,', ')) u from foo) y;
 r | u
---+----
 0 | a1
 0 | b1
 0 | c1
 0 | a2
 0 | b2
 1 | c2
 1 | d2
...etc

...and back to lines of known length.

SELECT r,array_agg(u) a FROM (
 SELECT (ROW_NUMBER() OVER () - 1)/5 AS r, u FROM (
  SELECT unnest(string_to_array(x,', ')) u from foo) y) y1 
GROUP BY r ORDER BY r;
 r |    a
---+------------------
 0 | {a1,b1,c1,a2,b2}
 1 | {c2,d2,e2,f2,a3}
 2 | {b3,c3,d3,e3,f3}
 3 | {g3,h3,i3,j3,a4}
 4 | {a5,b5,c5}

After this you can insert it into a table using a[] for each column. What to do with the last line is left as an exercise to the reader...

1

Answer to related question: How to put values starting from the right side into columns?

The accepted great answer from @ErwinBrandstetter can be easily adapted to required right-to-left output.

You just need to change to order of the split parts. So you don't return split parts 1-5 and 6-10 but 5-1 and 10-6:

demo:db<>fiddle

SELECT id, 1 AS rnk
     , split_part(csv, ', ', 5) AS c1
     , split_part(csv, ', ', 4) AS c2
     , split_part(csv, ', ', 3) AS c3
     , split_part(csv, ', ', 2) AS c4
     , split_part(csv, ', ', 1) AS c5
FROM   tbl
WHERE  split_part(csv, ', ', 1) <> '' -- skip empty rows

UNION ALL
SELECT id, 2
     , split_part(csv, ', ', 10)
     , split_part(csv, ', ', 9)
     , split_part(csv, ', ', 8)
     , split_part(csv, ', ', 7)
     , split_part(csv, ', ', 6)
FROM   tbl
WHERE  split_part(csv, ', ', 6) <> '' -- skip empty rows

-- more?

ORDER  BY id, rnk;
1
  • What do you mean with "strong logic"? Sry, the question is not quite clear to me.
    – S-Man
    Commented Feb 23, 2021 at 14:58
0

You need to do this in the whatever backend layer you are using.

First, convert the CSV rows to array of string

Then, use logic something like this to add Values to the database

int row = 0; // database row index - can be used to just have a count

final int MAX_COLUMNS = 5;

for(int i = 0; i<rows.length; i++) {
    // Convert csv row string to array of each value.
    String [] values = rows[i].split(",");

    // Dividing whole row into chunks of size of number of columns

    for(int j = 0; j < (values.length/(MAX_COLUMNS)) + 1; j++) {
       Add Values [MAX_COLUMNS*j,MAX_COLUMNS*j+(MAX_COLUMNS - 1)] to the row [row + j]
      row++;
    }

}
1
  • Could you please tell me in Postgres? I am new to this and don't know the syntax you are using or how to convert your syntax into Postgres.
    – Martin2000
    Commented Feb 20, 2021 at 16:15

Not the answer you're looking for? Browse other questions tagged or ask your own question.