24

While doing some research on DOS device drivers, I took a peek at the console drivers DISPLAY.SYS and ANSI.SYS that are part of the DOS 6.20 installation. Both have "Microsoft" stamped on them, so I'm not surprised to see that one copied some code from the other.
I could tell because they copied the errors as well:

  • In their search for a previous CON: driver, both DISPLAY.SYS and ANSI.SYS think that the driver chain ends with a full doubleword of -1. This is not true: on DOS 6.20, and even on DOS 2.11, the COM4 driver ends the chain with a Next Driver Pointer of 0070:FFFF. Luckily, this error will never show, because the search is bound to succeed; the default CON: driver is always present. (A corrected test is sketched after this list.)

  • In their search for a previous CON: driver, both DISPLAY.SYS and ANSI.SYS also fail to properly check the 8th character of the device name. The AND CX, CX below overwrites the ZF that REPE CMPSB produced, so a mismatch in the final character still registers as a match. (A fix is sketched after this list.)

    push di             ; CX=8
    push si
    lea  di, [di+10]
    lea  si, [si+10]
    repe cmpsb
    pop  si
    pop  di
    and  cx, cx         <<<<<< Anything goes because of this!
    jne  <NotFound>
    <Found>
    
  • Both DISPLAY.SYS and ANSI.SYS have support for 20 driver functions, but ANSI.SYS erroneously can dispatch to a non-existing 21st function: the check below lets command codes 0 through 20 pass, which is 21 values for a 20-entry table. (Again, see the sketch after this list.)

    cmp  al, 20
    ja   <HasError>
    <IsFine>
    
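For what it's worth, each of these bugs has a tiny fix. A sketch, using placeholder labels and assuming ES:BX points at the current driver header during the chain walk:

    ; end of chain: test only the offset word of the Next Driver Pointer
    cmp  word ptr [es:bx], 0FFFFh
    je   <EndOfChain>

    ; name compare: let the ZF from REPE CMPSB decide; no AND to clobber it
    push di
    push si
    lea  di, [di+10]    ; device name field at offset 10 of the header
    lea  si, [si+10]
    mov  cx, 8
    repe cmpsb          ; ZF=1 only if all 8 characters matched
    pop  si             ; POP does not affect flags
    pop  di
    jne  <NotFound>

    ; dispatch bound: reject command code 20 as well
    cmp  al, 20
    jae  <HasError>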

What amazed me is the way this code deals with the loading and storing of segment registers and the stack pointer. Often (but not always) it uses an intermediate general-purpose register. See the next four snippets:

mov  ax, [es:di+6]
mov  [cs:bx+4], ax
mov  ax, es               <<<<<<
mov  [cs:bx+6], ax        <<<<<<  mov [cs:bx+6], es
mov  ax, [es:di+8]
mov  [cs:bx+8], ax
mov  ax, es               <<<<<<
mov  [cs:bx+10], ax       <<<<<<  mov [cs:bx+10], es
mov  di, [cs:0BABh]       <<<<<<
mov  es, di               <<<<<<  mov es, [cs:0BABh]
mov  di, [cs:0BADh]
cli
mov  si, sp               <<<<<<
mov  [cs:04D4h], si       <<<<<<  mov [cs:04D4h], sp
mov  si, ss               <<<<<<
mov  [cs:04D2h], si       <<<<<<  mov [cs:04D2h], ss
mov  ax, 04D0h            <<<<<<
mov  si, cs
mov  ss, si
mov  sp, ax               <<<<<<  mov sp, 04D0h
sti
cli
mov  ax, [cs:04D4h]       <<<<<<  
mov  si, [cs:04D2h]       <<<<<<
mov  ss, si               <<<<<<  mov ss, [cs:04D2h]
mov  sp, ax               <<<<<<  mov sp, [cs:04D4h]
sti

I have checked the surrounding code for any dependencies on the values in the general purpose registers that were used, and found none.

So my question is: Why did the programmer(s) not write the shortest code possible in an OS that is confined to conventional memory? The 8086 has always allowed the shorter and faster instructions that I have added on the right-hand side of the above snippets.
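
For the record, the encodings bear this out. Taking the first marked pair from the snippets above (byte counts from Intel's 8086 opcode map):

mov  ax, es               ; 8C C0           2 bytes
mov  [cs:bx+6], ax        ; 2E 89 47 06     4 bytes  -> 6 bytes, two instructions
mov  [cs:bx+6], es        ; 2E 8C 47 06     4 bytes  -> 4 bytes, one instruction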

Somewhat related is their frequent use of the LEA <reg>, <var> instruction instead of the one-byte-shorter MOV <reg>, OFFSET <var> when it comes to loading an address.
This is definitely a "MASM thingy"; I have seen it many times before in MASM programs. It would seem that having to insert OFFSET is a bit demanding of our natural-born laziness.
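
To illustrate, with buffer as a placeholder label (encodings again from the 8086 opcode map):

lea  di, buffer           ; 8D 3E xx xx     4 bytes
mov  di, offset buffer    ; BF xx xx        3 bytes, identical result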

The question that remains is: why use intermediate general-purpose registers when loading or storing segment registers and the stack pointer?

8
  • 38
    Are you sure the code is written in assembly? The inefficient patterns you observed look more like code a C compiler would generate, especially at low or no optimization. Commented Mar 27, 2022 at 20:15
  • @MichaelKarcher Good point. I did not consider that possibility.
    – Sep Roland
    Commented Mar 27, 2022 at 20:20
  • 1
    By the time of MS-DOS 5 or higher, we're in the early 90s, with 386-or-higher machines having megabytes of memory, and most software capable of using EMS and XMS. I don't think there was any necessity, technical or business-wise, to optimize away a couple of bytes. Commented Mar 27, 2022 at 20:26
  • 11
    @MichaelGraf Even though my machine had 4 MB, the conventional memory always fell short. And sure, Microsoft didn't feel the necessity to optimize individual components, instead they burdened us with MEMMAKER.EXE...
    – Sep Roland
    Commented Mar 27, 2022 at 20:37
  • 5
    I think the question would be much more efficient if you trimmed down the first few paragraphs in which you describe your discoveries concerning the poor programming of these drivers. You're only asking about the wasted bytes, right?
    – Schmuddi
    Commented Mar 28, 2022 at 7:59

3 Answers

18

Given that these snippets of assembly code look as if mechanically translated, one might suppose that they indeed were: that they are outputs of a compiler of a higher-level language or an assembler macro. This seems pretty reasonable; compilers of the time were rather awful at code generation and register allocation, while assembler macros are even more primitive, merely performing token or string substitution. It would have been a pretty good guess. Nevertheless, I happen to know that it is wrong. The driver was written directly in assembly language, and in fact, the asker’s disassembly is pretty much exactly how that code was originally written by the programmer; no macros were involved. How could I possibly know this? Let’s say, a little bird told me.

However, the little bird is rather tight-beaked about why this is the case. Deprived of authoritative sources, I am resigned to speculate. There are a handful of plausible reasons why someone might deliberately code something in such a roundabout way:

  • CPU errata. Sometimes hardware has bugs, and those were about as prevalent during DOS’s development as they are today. Working around those bugs involves replacing natural instruction sequences with less obvious circumlocutions. But I am having a hard time finding any specific erratum describing a bug affecting segment register loads with a memory source operand. The only remotely relevant one I managed to find is that an interrupt occurring immediately after a mov ss instruction may result in memory corruption on early steppings of original 8088, but even here, the kind of source operand doesn’t seem to matter.
  • Better instruction timings. Sometimes sequences of simple instructions can execute faster than a more complex instruction. But alas, this doesn’t seem to be the case here. According to this timings table, a mov r16, m16 / mov sreg, r16 pair was always slower than a direct mov sreg, m16 on processors that were in mainstream use when DOS was written. (A worked example follows this list.)
  • Backwards compatibility. Software vendors may preserve specific opcode sequences in order to preserve compatibility with other software that may be looking for them, possibly to patch them. This was especially common in DOS’s time, when concepts like ‘stable API surface’ were not particularly well-established in the programmers’ collective consciousness. And so, code that was originally written suboptimally might be kept that way in order to preserve a well-known fixed opcode pattern; or alternatively, modifications to what originally was good code may be written in a suboptimal way to keep a recognisable opcode pattern. But the code the asker provided doesn’t seem to be a particularly attractive target for patching. This does not seem right either.
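
To put rough numbers on the timing bullet above (cycle counts from Intel's published 8086 tables; EA = 6 for a direct address; prefetch and the segment-override penalty ignored, as they apply to both variants):

    mov  di, [cs:0BABh]   ; MOV r16, m16   8 + EA = 14 cycles
    mov  es, di           ; MOV sreg, r16           2 cycles  -> ~16 cycles
    mov  es, [cs:0BABh]   ; MOV sreg, m16  8 + EA = 14 cycles -> ~14 cycles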

With more charitable hypotheses eliminated, my strongest one is that this was simply a force of habit. The fact that there is no opcode for loading an immediate into a segment register was pretty well-known; so is the fact that one can always move data between segment registers and general-purpose registers. The fact one might as well use a memory location instead of a general-purpose register as the other operand seems comparatively more obscure. So much so, in fact, that I happened to find a StackOverflow post whose author was convinced this was not possible while researching this very answer. The hapless Microsoft coder who wrote that assembly might simply have had the same misconception, and therefore used the one workaround he knew out of habit, without stopping to think whether it was even necessary. As the asker points out, sloppy code wasn’t even all that rare at Microsoft. I happen to have found and documented a couple of considerably worse instances thereof myself.
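
To make that misconception concrete (0B800h is merely an illustrative value):

    ; mov  es, 0B800h     ; won't assemble: no MOV sreg, imm16 encoding exists
    mov  ax, 0B800h       ; hence the well-known two-step workaround
    mov  es, ax
    mov  es, [cs:0BABh]   ; yet MOV sreg, m16 (opcode 8E /r) has always existed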

All the above said, I doubt spending the effort to find more compact instruction encodings would have been very fruitful anyway. I suspect it might have saved them a few tens of bytes at best; this is not an amount that might tip the scales of whether a large program will fit in memory or not.

3
  • 3
    Another possibility is that programmers who ran into the lack of mov sreg,imm before they ever had occasion to use mov sreg,m16, and recognized the pattern mov reg16,imm / mov sreg,r16 as the workaround, may have assumed that the same workaround would be needed for mov sreg,m16. I know that I was writing x86 assembly for quite a while, and applied the needless "workaround" quite a few times, before I discovered the availability of mov sreg,m16.
    – supercat
    Commented Mar 28, 2022 at 16:41
  • 2
    @supercat Yes, that’s basically what I was referring to in the next-to-last paragraph. Commented Mar 28, 2022 at 16:44
  • 5
    When I went to college, that misconception (that there is no segment-register <-> memory MOV instruction) was being taught.
    – Joshua
    Commented Mar 29, 2022 at 1:10
27

As Michael Karcher already points out in a comment, the sequences shown are quite typical of a compiler of the time, most likely C. That, or the use of a standardized macro set.

This is marked not only by the way far pointers (they are all far pointers) are first moved into a register (pair), but also by which registers are used: while AX is the generic workhorse, it's DI (alongside AX) when a data pointer is handled and SI (alongside AX) for the stack pointer. These are the registers most likely to still be available at that point.

Similar is the use of LEA. A straight move-immediate (which is what an OFFSET <var> second operand generates) may save a byte, but it is only usable in the singular case where the address offset can be generated at assembly time. LEA, in contrast, works with any addressing construct possible, as the calculation happens at runtime. For compiler construction it provides a single way to encode the loading of an effective offset, independent of its structure.
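
A sketch of the difference, with table as a placeholder label:

    mov  di, offset table     ; valid only for an assembly-time constant offset
    lea  di, [table+bx+si]    ; one opcode covers any effective address, computed at runtime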

After all, LEA is part of the 8086's orientation toward high-level languages, a good reason for its long-term success. That these high-level amenities (like ENTER/LEAVE as well) came to be shunned in the late 1980s is a prime case of staircase wit. Long story short, it really simplified code generation for compilers.

Last but not least, it's Microsoft. Microsoft was well known, at least in 8-bit times, for using loads of macros to encapsulate complex operations, especially pointer handling; MS-BASIC is full of them. This was done not only to simplify programming by providing easy ways to express what on 8-bit CPUs may be anywhere from 1 to 10 instructions, but also to ease porting between different CPU architectures.
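
For illustration, a hypothetical macro in that style (the name SAVE_SS is invented here) would expand to exactly the pattern seen in the driver:

    SAVE_SS MACRO dst
            mov  si, ss         ; generic expansion: always via a scratch register
            mov  [cs:dst], si
            ENDM

            SAVE_SS 04D2h       ; yields the two-instruction sequence from the question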


As an additional note, I'm missing some information about which part of the file these code snippets are from: driver or loader?

If these 'long' pointer operations occur only within the loader part, which gets discarded anyway, then saving a few bytes and clock cycles in instructions that are executed only once during the whole lifetime of the driver isn't worth it. The advantage of using proven code sequences weighs far more; optimization would only add potential problems.


Now, what you call a "MASM thingy" is not really related to the MASM package, but a simple case of more abstract programming. Being a lifelong assembly programmer, I too used LEA almost exclusively when loading an address (offset). Doing so prevents some quite hard-to-find addressing errors. Of course, this may be influenced by many years of /370 programming, where LA, Load Address, is the way to calculate any address - except for precalculated constants, which are rare and explicit anyway.

2
  • Comments are not for extended discussion; this conversation has been moved to chat.
    – Chenmunka
    Commented Mar 29, 2022 at 17:28
    All snippets come from the resident part of the driver. The larger-than-necessary footprint reduces the amount of free application memory.
    – Sep Roland
    Commented Apr 3, 2022 at 13:25
12

So my question is: Why did the programmer(s) not write the shortest code possible in an OS that is confined to conventional memory?

Because they had deadlines and other things to work on, and "good enough to ship" is always a viable success criterion. Everything was fast-moving, yet everything was hard and slow to develop. For many projects the workflow "Make it work, make it fast, make it small" stopped at "make it work", because it's not just good, it's good enough.
