9

Problem

I plan to load a CSV file with more than 10 million records into PostgreSQL v12.1. One of its columns holds "categorical" values, so creating an enumerated type for it seems like a good choice, but it has 208 categories.

The shortest value is 2 characters long and the longest is 11. The average length across all values is 2.4 characters. The character encoding is UTF8, but all characters are ASCII.

Question:

Which type should I use: enumerated or varchar?

Additional info

I ruled out char because the official PostgreSQL documentation states the following about char, varchar and text:

Tip: There is no performance difference among these three types, apart from increased storage space when using the blank-padded type, and a few extra CPU cycles to check the length when storing into a length-constrained column. While character(n) has performance advantages in some other database systems, there is no such advantage in PostgreSQL; in fact character(n) is usually the slowest of the three because of its additional storage costs. In most situations text or character varying should be used instead.

An enum value in PostgreSQL occupies 4 bytes on disk (see 8.7.4. Implementation Details). Given this and the average string length of 2.4, using the enum type would lead to slightly higher disk usage (short strings in PostgreSQL need one extra byte of disk space). Still, my intuition is that enum is the better choice, because its implementation makes many operations on it faster.

2 Answers

12

With an average of 2.4 characters (more relevant: avg bytes - but that's the same for all ASCII characters) I would not bother to use enums. Those occupy 4 bytes on disk plus, possibly, alignment padding. (text does not require alignment padding.) You are not even saving storage and get more overhead for it.

With most values below 7 characters (= 8 bytes on disk), an index on a text category column will also be only slightly bigger than one on an enum. (Space for data is (typically) allocated in multiples of 8 bytes.)
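One way to check these size claims on your own server is to store a few sample values both ways and compare them with pg_column_size(). This is only a minimal sketch: the type name category_enum, the table size_test, and the sample values are made up for illustration.

```sql
-- Hypothetical names throughout: category_enum, size_test.
CREATE TYPE category_enum AS ENUM ('ab', 'cd', 'abcdefghijk');

CREATE TEMP TABLE size_test (
    cat_text text,
    cat_enum category_enum
);

INSERT INTO size_test
VALUES ('ab', 'ab'), ('cd', 'cd'), ('abcdefghijk', 'abcdefghijk');

-- pg_column_size() reports the number of bytes used to store each value.
SELECT cat_text,
       pg_column_size(cat_text) AS text_bytes,
       pg_column_size(cat_enum) AS enum_bytes
FROM   size_test;
```

For a fuller picture, loading a realistic sample into two test tables and comparing pg_total_relation_size() for each also accounts for alignment padding, per-row overhead, and indexes.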

For a fixed number of 208 categories, a "char" encoding (not to be confused with char!) might be an option to save storage.

But, again, not worth the trouble for such small strings. Just use text. Maybe enforce correctness with a FK constraint to a category table like:

CREATE TABLE category (category text PRIMARY KEY);

That table is also a good place to store additional information per category, and you can easily modify the set of categories. Make the FK constraint ON UPDATE CASCADE and you can change category names in one central place. Make it ON DELETE SET NULL, and you can easily remove a category. Etc.
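To make the FK suggestion concrete, here is a minimal sketch. The item table and its columns are hypothetical stand-ins for the 10-million-row table; only the category table comes from the answer itself (repeated here so the sketch is self-contained).

```sql
-- Lookup table, as defined in the answer.
CREATE TABLE category (category text PRIMARY KEY);

-- Hypothetical main table; only the category column matters here.
CREATE TABLE item (
    item_id  bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    category text REFERENCES category
                  ON UPDATE CASCADE
                  ON DELETE SET NULL
);

INSERT INTO category VALUES ('ab'), ('cd');
INSERT INTO item (category) VALUES ('ab'), ('cd'), ('ab');

-- Renaming a category propagates to every referencing row.
UPDATE category SET category = 'xy' WHERE category = 'ab';

-- Deleting a category sets the referencing rows to NULL.
DELETE FROM category WHERE category = 'cd';
```

An index on item (category) is worth considering here, since cascading updates and deletes have to find the referencing rows.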

7

I fully support Erwin's answer, but I wanted to add a warning against enums.

Enums are a good choice if you have a fixed number of possible values that can never change (at least there must be a guarantee that no values would have to be removed).

In all other cases, you should not use enums: It is impossible to remove an enum value once you have added it.

For example, when choosing a data type for a column that contains a US state, I would not choose an enum — unlikely as it is, it could be that a state secedes, or that two states unite.

Based on how you describe the data, I would not recommend enums in your case.
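The limitation can be seen with a small experiment. The sketch below uses made-up names (us_state, us_state_new, address) and shows one commonly used workaround, replacing the type, rather than any built-in way to drop a value.

```sql
-- Hypothetical enum and table for illustration.
CREATE TYPE us_state AS ENUM ('AL', 'AK', 'AZ');
CREATE TABLE address (state us_state);

-- Adding a value is supported.
ALTER TYPE us_state ADD VALUE 'AR';

-- There is no ALTER TYPE ... DROP VALUE. To get rid of 'AZ', build a
-- replacement type and convert the column through text.
-- Rows still holding 'AZ' must be updated or deleted first,
-- otherwise the cast fails.
CREATE TYPE us_state_new AS ENUM ('AL', 'AK', 'AR');

ALTER TABLE address
    ALTER COLUMN state TYPE us_state_new
    USING state::text::us_state_new;

DROP TYPE us_state;
ALTER TYPE us_state_new RENAME TO us_state;
```

Changing the column type this way rewrites the table under an exclusive lock, which is a real cost on a table with more than 10 million rows. With a lookup table and a foreign key, removing a category is an ordinary DELETE.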

5
  • 2
    Thank you, this warning will most likely save me a lot of headaches in the future.
    – atevm
    Commented Jan 31, 2020 at 15:20
  • Why should there be a guarantee that the enum values do not change? Isn't it possible to alter the type to add/drop an attribute? Commented Dec 15, 2021 at 8:00
  • @MortezaMilani You can add values, but not drop them. They could be referenced in tables. Commented Dec 15, 2021 at 8:43
  • @LaurenzAlbe Technically, we can drop them. What you probably mean is that we have to handle dependent data first. Using your example, we should replace all data with the new state name and then drop the unused value. The same would be required when using a foreign key. Commented Dec 17, 2021 at 22:40
  • I invite you to try and drop a value from an enum. You will see that you cannot do that. Commented Dec 18, 2021 at 15:33
