32

Inspired by this question on ASCII, I have wondered similar things about EBCDIC.

At work we have an EBCDIC file that gets sent to a mainframe (I presume an IBM one), and to view it on my laptop I needed to run a command to convert it:

    dd if=blah.ebcdic conv=ascii > blah.txt

Before I found that command, I took a peek at the code page to see if I could whip something up myself.
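For comparison, here is a minimal sketch of "whipping something up" in Python instead of using dd. It assumes the file uses code page 037 (Python's built-in `cp037` codec), which is only a guess about the real file; dd's `conv=ascii` uses its own table, so punctuation may come out differently. The function names are hypothetical.

```python
from pathlib import Path

def ebcdic_to_text(raw: bytes, codec: str = "cp037") -> str:
    """Decode EBCDIC bytes into a Python string.

    The default codec (cp037) is an assumption; other EBCDIC code pages
    such as cp500 place some punctuation at different code points.
    """
    return raw.decode(codec)

def convert_file(src: str, dst: str, codec: str = "cp037") -> None:
    """Read an EBCDIC file and write its contents back out as UTF-8 text."""
    text = ebcdic_to_text(Path(src).read_bytes(), codec)
    Path(dst).write_text(text, encoding="utf-8")

# "Hello" spelled out as cp037 bytes.
sample = bytes([0xC8, 0x85, 0x93, 0x93, 0x96])
print(ebcdic_to_text(sample))  # Hello
```

Letters and digits sit at the same code points in the common EBCDIC code pages, so a wrong guess about the code page mostly shows up in punctuation and special characters.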

Like ASCII, you can flip a single bit to get from lowercase to uppercase (0x8_ and 0xc_ differ by one bit). However, the letters within each case are not contiguous: the low nibbles 0x_a to 0x_f are skipped. Is there a reason?

Also like ASCII, the digits' low bits match the number they represent.
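These observations can be checked with a short Python sketch. It assumes code page 037 (Python's `cp037` codec) as a representative EBCDIC code page; the helper `code` and the constant `CASE_BIT` are names made up for this illustration.

```python
import string

CODEC = "cp037"   # assumption: one common EBCDIC code page
CASE_BIT = 0x40   # the single bit that separates 0x8_ from 0xC_

def code(ch: str) -> int:
    """Return the EBCDIC code point of a single character."""
    return ch.encode(CODEC)[0]

# Flipping one bit converts between lowercase and uppercase.
for lo, up in zip(string.ascii_lowercase, string.ascii_uppercase):
    assert code(lo) ^ CASE_BIT == code(up)

# The low nibble of each digit equals the digit itself.
for d in range(10):
    assert code(str(d)) & 0x0F == d

# The lowercase alphabet falls into three runs with gaps in between.
for start, stop in (("a", "i"), ("j", "r"), ("s", "z")):
    print(f"{start}-{stop}: 0x{code(start):02X}-0x{code(stop):02X}")
# a-i: 0x81-0x89
# j-r: 0x91-0x99
# s-z: 0xA2-0xA9
```

No letter has a low nibble above 9, which is exactly the 0x_a to 0x_f gap asked about.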

EBCDIC Code page

9
  • 14
    See en.wikipedia.org/wiki/EBCDIC for a start, and note the relationships with punched cards and not wanting holes too close to each other for structural integrity.
    – Jon Custer
    Commented Jun 26, 2019 at 16:39
  • 2
    @JonCuster thanks for the insight, can you post the relation with punch cards as an answer so I can give it an upvote? If you would rather not I can post it myself, I just don't want you to feel like I'm "stealing" it. Commented Jun 26, 2019 at 17:52
  • 2
    feel free to steal! It has been a long time since I used punch cards (or dropped them on the floor).
    – Jon Custer
    Commented Jun 26, 2019 at 17:56
  • I'm not convinced by logic about avoiding card damage, for two reasons. One is that you often get long runs of holes in the top three rows from alphabetic data. The other is that IBM also used "column binary" format cards where the 24 positions in two columns represented three 8-bit bytes. Storing binary data (e.g. executable file images) in that format, about 50% of the holes on every card were punched, and that never gave any problems. (We used to ship executable code in column binary format to customers who didn't have any compatible mag tape drives, and it never gave us any transmission errors).
    – alephzero
    Commented Jun 26, 2019 at 18:30
  • 3
    Radix-sorting cards that contain nothing but letters, numbers, and blanks requires two passes per character position. The first pass sorts cards into one of ten bins based upon the bottom nine rows, and the second sorts them into one of four bins based on the top three. Using more complicated hole patterns would necessitate the use of more passes or more complicated sorting apparatus.
    – supercat
    Commented Jun 26, 2019 at 18:41

2 Answers

28

There is a clue in the name - BCD stands for "binary-coded decimal", where 4 bits are used to represent 1 decimal digit (0-9). The hexadecimal values A-F are not used in BCD.

EBCDIC is an extended version of BCDIC, and it shifts BCDIC alphanumerics, and inserts characters in some of the non-decimal positions. But there's a simple relationship to ease conversion of BCDIC to EBCDIC.
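As an illustration of the BCD idea only (not of the BCDIC character table itself), here is a small Python sketch that stores one decimal digit per 4-bit nibble. The function names are hypothetical.

```python
def to_bcd_nibbles(n: int) -> list:
    """Return one 4-bit value (0-9) per decimal digit of a non-negative integer."""
    if n < 0:
        raise ValueError("only non-negative integers are handled in this sketch")
    return [int(d) for d in str(n)]

def pack_bcd(n: int) -> bytes:
    """Pack the decimal digits of n two per byte, high nibble first."""
    digits = to_bcd_nibbles(n)
    if len(digits) % 2:
        digits.insert(0, 0)  # pad with a leading zero to fill whole bytes
    return bytes((hi << 4) | lo for hi, lo in zip(digits[0::2], digits[1::2]))

print(pack_bcd(1995).hex())  # 1995
print(pack_bcd(123).hex())   # 0123
```

Because every nibble holds a decimal digit, the hex dump reads the same as the decimal number, and the nibble values A-F never appear.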

8
  • 2
    I suppose this begs the question why BCDIC encoding is not contiguous but as Jon Custer mentioned in a comment it has to do with punch cards and ensuring the holes are not too close together. Commented Jun 26, 2019 at 17:27
  • 5
    BCDIC has the same issue, "binary coded decimal" uses 4 bits to encode digits from 0-9, which means hex values a-f will generally not be used. The gaps where the a-f ranges fall will naturally lead to non-contiguous encodings.
    – Ken Gober
    Commented Jun 26, 2019 at 17:56
  • 5
    @CaptainMan It doesn't beg the question. It raises the question. See en.wikipedia.org/wiki/Begging_the_question. Commented Jun 27, 2019 at 22:22
  • 7
    Y'all are thinking too much in terms of bits and bytes in the way we use them today. Computers haven't always been base-2. It's only because 8-bit bytes and base-2 (and particularly, two's-complement math) are common today that these questions make sense. There is no "space" between "9" and "0" in a decimal computer. Commented Jun 27, 2019 at 23:43
  • 2
    @JulieinAustin there is however a space between 89 and 91. "Because it's decimal" doesn't justify why the 0 column is also skipped.
    – OrangeDog
    Commented Jun 28, 2019 at 15:00
21

As pointed out by Jon Custer, part of the reason is that input at the time came from punch cards. If holes were too close together, there was a risk of the card ripping or becoming unreadable.

In addition, this punch card from the Wikipedia article helps explain why both uppercase and lowercase end at 0x_9. The punch card only goes from 0 to 9. I don't know how A through F were entered, maybe different cards or multiple holes (or maybe Wikipedia is wrong and this is for BCDIC, not EBCDIC).

EBCDIC punch card
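To make the card connection concrete, here is a Python sketch of the standard card code for uppercase letters and digits: a letter is one "zone" punch (row 12, 11, or 0) plus one digit punch, and a digit is a single punch in its own row. The zone-to-high-nibble table below is my own reading of the code page, checked against Python's `cp037` codec (assumed here as a representative EBCDIC code page); the function names are hypothetical.

```python
# Zone row on the card -> high nibble of the EBCDIC byte.
# None means "no zone punch", which is how a plain digit is punched.
ZONE_TO_HIGH_NIBBLE = {12: 0xC, 11: 0xD, 0: 0xE, None: 0xF}

def punches_for(ch: str):
    """Return (zone_row, digit_row) for an uppercase letter or a digit."""
    if ch in "0123456789":
        return (None, int(ch))                 # single punch in the digit's row
    if "A" <= ch <= "I":
        return (12, ord(ch) - ord("A") + 1)    # zone 12 + rows 1..9
    if "J" <= ch <= "R":
        return (11, ord(ch) - ord("J") + 1)    # zone 11 + rows 1..9
    if "S" <= ch <= "Z":
        return (0, ord(ch) - ord("S") + 2)     # zone 0 + rows 2..9
    raise ValueError(f"not covered by this sketch: {ch!r}")

def card_to_ebcdic(zone, digit) -> int:
    """Combine a zone row and a digit row into one EBCDIC byte value."""
    return (ZONE_TO_HIGH_NIBBLE[zone] << 4) | digit

# Cross-check every uppercase letter and digit against the cp037 codec.
for ch in "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789":
    assert card_to_ebcdic(*punches_for(ch)) == ch.encode("cp037")[0]

print(hex(card_to_ebcdic(*punches_for("S"))))  # 0xe2
```

Since the digit rows only run from 0 to 9, the low nibble never reaches 0xA to 0xF for letters or digits, which is where the gaps in the code page come from.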

14
  • 7
    A..F weren't entered at all, as input was decimal. Mainframes were made to crank out invoices, all decimal in dollars and cents (or whatever else was used to create debt). Making them binary was already an odd move, creating a lot of fights between designers :))
    – Raffzahn
    Commented Jun 26, 2019 at 20:53
  • 7
    That card is a standard IBM punched card that uses 12 positions for encoding. Each of the decimal digits is represented by a hole in one of 10 positions. Each letter is represented by a hole in one of three extra positions and one of the digit positions. Other characters are represented by two or three holes in various combinations. BCDIC is a way of compressing the 12 bit code of the card into only 6 bits.
    – JeremyP
    Commented Jun 26, 2019 at 22:30
  • 1
    I'm not sure what you mean by "how were A through F encoded". They were encoded in exactly the same way as on that punched card. This is a character encoding, not a number encoding.
    – JeremyP
    Commented Jun 26, 2019 at 22:34
  • 7
    @supercat The punched card code came first. There was no need to be able to encode 0xa to 0xf, they couldn't be expressed on the punched card.
    – JeremyP
    Commented Jun 27, 2019 at 9:10
  • 3
    @Raffzahn it just so happens that the file that originally started all my curiosity is for sending out invoices. :) Commented Jun 27, 2019 at 14:50
