0

I have columns in my application that have values ​​that always have 4 digits, e.g. '0100' or '1230'. One of examples could be Merchant Category Code.

For the time being, I have set the type of this column to char (4). I have many columns of this type. One of these types of columns is also included in nonclustered UNIQUE index. I would like to establish one approach to this type of column.

Is it better to use char (4) or maybe smallint, padding in the application by adding ' ' or 0 to the beginning?

The table in which I have these columns is partitioned and will contain about 300 million records. UPDATE will probably not be executed on columns with this type. They will also not appear in WHERE (maybe sometimes in ad hoc reports).

Which data type should I choose?

It's like a zip code, it can be numeric but you must store leading zeros

Exactly. I am thinking what is better approach in terms of performance. Also have to take into consideration that if I store it as smallint, the application code has to parse it to string and add leading zeros.

0

3 Answers 3

6

The standard clearly says "four-digit code", which in my view means it should always contain 4 numeric characters, which in turn means the appropriate data type is char(4).

If you want to ensure the code value adheres to ISO 18245, store it as char(4) in the database along with a check constraint.

7
  • why not smallint? Commented Oct 25, 2020 at 17:29
  • @NikitaSerbskiy because unless you have business rules prohibiting codes starting with one or more 0s, your code is a 4-character string, not a number.
    – alroc
    Commented Oct 25, 2020 at 17:55
  • @alroc it seems author is ready to adjust his application code to support smallint and asked what would be better in terms of performance Commented Oct 25, 2020 at 18:47
  • I would avoid a smallint because the data is not numeric (in the sense you would do arithmetic operations) and fields like that tend to evolve over time to include alpha characters. Additionally, coded fields often have structure (i.e. first digit means X, second digit means Y)
    – Jim D
    Commented Oct 27, 2020 at 3:12
  • 1
    @NikitaSerbskiy what about every other system that expects a char(4)? When you buy a airplane ticket the number is 014-1234567890 where 014 is the airline code. Not 14, 014. Some systems may use 0141234567890 but none will use 14. You really, really don't want to mix those two up, as everyone (airlines, travel agents, airports etc) use those specific digits as an identifier. If you used smallint you'd have to convert it back to char(3) at every point - all client applications, all imports, all exports. Commented Oct 27, 2020 at 8:39
2

The document says, smallint data type storage size is 2 Bytes and char(n) data type storage size is every character equal to 1 byte.

char [ ( n ) ] Fixed-size string data. n defines the string size in bytes and must be a value from 1 through 8,000. For single-byte encoding character sets such as Latin, the storage size is n bytes, and the number of characters that can be stored is also n.

smallint -2^15 (-32,768) to 2^15-1 (32,767) 2 Bytes

The data type size consequently will affect your index size, query result size, memory use.

But on the other side, you have to think about performance losing when adding 0, ('0121').

I think, there is no correct answer to the question. You can't decide which one is better without testing.

0
1

If you want the best of both worlds, you can store it as CHAR(4) and add a persisted computed column to the table that uses the CONVERT function to store it as an int too. This way if you need to report off of it and use it as a predicate, it's more efficient and you can even create indexes on it.

That being said, I don't think CHAR(4) would cause much difference in performance in your predicates anyway even at 300 million records, but only way to know is to test it. Note the persisted computed column will make the table consume more disk space too since it's essentially a materialized column at that point.

Not the answer you're looking for? Browse other questions tagged or ask your own question.