Problem
I plan to load a CSV with more than 10 million records into PostgreSQL v12.1. One of its columns holds "categorical" values, so creating an enumerated type for it seems like a good choice, but there are 208 categories.
The shortest value is 2 characters long and the longest is 11; the average length is 2.4 characters. The character encoding is UTF8, but all characters are ASCII.
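Statistics like these (number of distinct categories, minimum, maximum and average length, ASCII only) can be gathered in a single streaming pass over the CSV before settling on a type. A minimal sketch, assuming the file has a header row; the column name `category`, the path in the comment and the `column_stats` helper are hypothetical illustrations, not part of any library. It keeps running totals instead of a list of lengths so memory stays proportional to the number of distinct values, not the number of rows.

```python
import csv
import io
from typing import Iterable, Mapping


def column_stats(rows: Iterable[Mapping[str, str]], column: str) -> dict:
    """One pass over the rows: distinct values and length statistics."""
    distinct = set()
    count = total = 0
    min_len = max_len = None
    all_ascii = True
    for row in rows:
        value = row[column]
        length = len(value)
        distinct.add(value)
        count += 1
        total += length
        min_len = length if min_len is None else min(min_len, length)
        max_len = length if max_len is None else max(max_len, length)
        all_ascii = all_ascii and value.isascii()  # str.isascii needs Python 3.7+
    return {
        "rows": count,
        "distinct": len(distinct),
        "min_len": min_len,
        "max_len": max_len,
        "avg_len": total / count if count else 0.0,
        "all_ascii": all_ascii,
    }


if __name__ == "__main__":
    # Tiny in-memory sample; for the real file, something like
    # open("data.csv", newline="", encoding="utf-8") (hypothetical path).
    sample = io.StringIO("category\nab\nabc\nab\nxy\n")
    print(column_stats(csv.DictReader(sample), "category"))
```

On the real data, `distinct` would be expected to come out at 208 and `max_len` at 11, matching the figures stated in the question.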
Question:
Which type should I use: an enumerated type or varchar?
Additional info
I ruled out char because the official PostgreSQL documentation states the following about char, varchar and text:
Tip: There is no performance difference among these three types, apart from increased storage space when using the blank-padded type, and a few extra CPU cycles to check the length when storing into a length-constrained column. While character(n) has performance advantages in some other database systems, there is no such advantage in PostgreSQL; in fact character(n) is usually the slowest of the three because of its additional storage costs. In most situations text or character varying should be used instead.
An enum value in PostgreSQL occupies 4 bytes on disk (see 8.7.4. Implementation Details). Given that and the 2.4-character average length, the enum type would lead to slightly higher disk usage, since short strings in PostgreSQL need only one extra byte of disk space. Still, my intuition is that enum is the better choice, because its implementation makes many operations against it faster.
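The storage trade-off can be put into numbers with a back-of-the-envelope calculation. This is a rough sketch under stated assumptions: every value is ASCII (1 byte per character) and short enough (at most 126 bytes) for PostgreSQL's 1-byte varlena header, and tuple headers, alignment padding, NULLs and indexes are ignored. The `char(11)` row assumes a column sized to the 11-character maximum and is only included to show why char was ruled out; the helper names are illustrative.

```python
# Rough per-column payload estimate for the categorical column.
# Assumptions: ASCII-only values, 1-byte varlena header for short strings,
# no tuple headers, alignment padding, NULLs, TOAST or indexes counted.

ROWS = 10_000_000   # "more than 10 million records"
AVG_LEN = 2.4       # average value length in characters
MAX_LEN = 11        # longest value, i.e. the n a character(n) column would need
ENUM_BYTES = 4      # an enum value occupies 4 bytes on disk


def varchar_bytes_per_value(avg_len: float) -> float:
    """Short varchar/text: 1-byte header plus the string bytes."""
    return 1 + avg_len


def char_bytes_per_value(n: int) -> int:
    """character(n): blank-padded to n characters, plus a 1-byte header."""
    return 1 + n


def estimate_mb(per_value: float, rows: int = ROWS) -> float:
    """Total column payload in megabytes (10**6 bytes), rounded to 0.1 MB."""
    return round(per_value * rows / 1_000_000, 1)


if __name__ == "__main__":
    estimates = {
        "enum": estimate_mb(ENUM_BYTES),
        "varchar": estimate_mb(varchar_bytes_per_value(AVG_LEN)),
        "char(11)": estimate_mb(char_bytes_per_value(MAX_LEN)),
    }
    for name, mb in estimates.items():
        print(f"{name:>8}: {mb:6.1f} MB")
    print(f"enum - varchar: {estimates['enum'] - estimates['varchar']:.1f} MB")
```

Under these assumptions the enum column needs about 40 MB against about 34 MB for varchar, a gap of roughly 0.6 bytes per row (about 6 MB in total). That is small next to the per-row heap tuple header (23 bytes) the estimate leaves out, and because enum values are 4-byte aligned, padding could shift the gap either way depending on column order.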