5

I found out today that a large project like Microsoft Windows 1.0 took 80 man-years to develop, and that it was written in x86 assembly language.

Is there a formula or rule of thumb that states how much more productive development would have been had the C programming language been used instead, assuming that the developers were equally good at both and that the hardware would not be a limit in terms of RAM and performance?

Is it possible to estimate, say, if a project took 80 man-years in x86 assembly language, how many man-years it would take using the C programming language instead?

Are there reliable figures on this topic, i.e. on productivity and efficiency?

22
  • 6
    I don't think this is possible to estimate in general. The work involved when you're building something entirely new, like Windows, would be a lot of design work and inventing new things, which you would have to do no matter what language you were using. On the other hand, if you're doing something like "dump this database, do some processing, and import it into this other one", then you're solving a problem that is well understood and you don't have to invent anything new along the way. The latter kind of problem would benefit more from choice of language than building Windows would. Commented May 4, 2023 at 23:50
  • 7
    Now that I think of it, you might get better answers over on softwareengineering.stackexchange.com. Commented May 4, 2023 at 23:51
  • 7
    My beloved C - a language that provides all the speed and low-level access of assembly language with the readability and maintainability of ... well, assembly language :-)
    – paxdiablo
    Commented May 5, 2023 at 3:19
  • 2
    I suspect the answer relies heavily on the time period. In the early 1980s assembler was still used heavily and was often taught in school; on the other hand, C compilers had lousy (at best) optimization. Today few people use assembler for anything big, and compilers have optimization logic that is hard for most programmers to outmatch.
    – UncleBod
    Commented May 5, 2023 at 7:54
  • 3
    Do we count how many man-years were spent waiting for the code to compile? Commented May 6, 2023 at 5:59

5 Answers

10

You'll never find a good single number for this sort of thing.

One factor is that software time estimation is famously difficult in general. Nobody has ever been good at estimating how long software will take to get to a working state. It was impossible to do in the 1980s. It's impossible to do today, despite all sorts of modern work on project management tools, techniques, certifications, cults, and training.

Another factor is that C isn't a fixed concept; you need to specify a more specific toolset. With C plus AddressSanitizer and a visual debugger, tracking down things like use-after-free bugs is far easier than it was with C as implemented in the early 1980s.
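To make the tooling difference concrete, here's a minimal sketch (my own illustrative example, not anything from the question) of the kind of bug AddressSanitizer flags automatically; the build flag shown is the usual gcc/clang spelling:

    /* A use-after-free that AddressSanitizer pinpoints at runtime.
       Build with a modern compiler, e.g.:  cc -fsanitize=address -g uaf.c  */
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        int *p = malloc(sizeof *p);
        if (p == NULL)
            return 1;
        *p = 42;
        free(p);
        printf("%d\n", *p);   /* use after free: ASan reports the file and line here */
        return 0;
    }

With a 1980s toolchain the same bug would typically surface, if at all, as mysterious corruption somewhere else entirely.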

Likewise, x86 Assembly has changed massively over time. Writing modern x86 Assembly means dealing with stuff like SSE vector instruction modes directly, and estimating the performance of code is incredibly difficult because instructions don't take fixed numbers of clock cycles any more. OTOH x86 Assembly was much simpler and more constrained on the original 8086 CPU so a person could sit down and learn all the relevant instructions a lot more quickly, and consult the relevant documentation a lot more conveniently because the scope was much smaller.

Because of factors like these, the question is just too underspecified to have a clear answer. Writing C is "easier" than writing Assembly, but you won't get a single number out of that. The only thing you could do is compare a list of historical projects and vaguely categorize some of them as having a "similar" complexity, but that would be a very squishy and arbitrary metric for collating the historical data, such that you could get the historical data to say whatever you wanted.

8

TL;DR The productivity gain is in the portability of the "C" language relative to Assembly.

I think you might have an unintended caveat in the question in referencing a specific dialect of Assembly (x86). The productivity issue with Assembly is that it is decidedly not portable. So, that makes it much more project-specific, and exacerbates the learning curve for programmers moving from one project to another. You could standardize all your projects on a particular ISA (e.g. x86) and a particular assembler (e.g. MASM) and a particular set of reusable assembly libraries... BUT, then you have constrained your programmers and they may actually lose productivity by having to adapt such "standards" to each new project.

So the key with "C" and other highly portable languages is that programmers can move among different projects that all use "C" with less of a learning curve, and that raises their productivity by a non-trivial amount.

Back to Assembly: intermediate languages that provide a level of portability have been a "boon" to making compilers more cross-platform. Using LLVM as an intermediate "language", compiler authors are freed from dealing with the non-portable native ISAs of the various processors they want to target. So, once again, productivity is actually realized through portability.
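A minimal sketch of that portability gain (my own example; the clang target triples in the comment are illustrative assumptions and depend on how the toolchain was built):

    /* The same C source retargets with a compiler switch, while an assembly
       version would have to be rewritten for each ISA. For example:
         clang --target=i686-linux-gnu       -S scale.c    (x86 assembly out)
         clang --target=armv7a-linux-gnueabi -S scale.c    (ARM assembly out) */
    int add_scaled(int a, int b, int scale)
    {
        return (a + b) * scale;
    }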

2
  • C code may be mostly portable but you have to include the usability of any given C compiler on a given platform. C compilers in the mid-1980's did not have a good reputation for optimizing code. Portable code that ran like molasses did not get you anywhere.
    – doneal24
    Commented May 6, 2023 at 18:10
    Then again, C doesn't provide much functionality beyond very basic I/O. Every project past the most primitive (like writing Windows) will most certainly consist mostly of project-specific libraries/functions to call, thus adding the same learning curve independent of the language used.
    – Raffzahn
    Commented May 8, 2023 at 8:29
4

Yes, there probably have been studies, but they are almost certainly in paywalled journals. You can find some links by going to scholar.google.com and searching for something like "programmer productivity": https://scholar.google.com/scholar?hl=en&as_sdt=0%2C37&q=productivity+programming+languages+comparison&btnG= which gives, for example, this hit:

https://dl.acm.org/doi/abs/10.1145/2851613.2851780 "An empirical study on the effect of programming languages on productivity", SAC '16: Proceedings of the 31st Annual ACM Symposium on Applied Computing, April 2016, pages 1434–1439. https://doi.org/10.1145/2851613.2851780

Like I said, paywalled.

2

Assuming that the developers were equally good at both and the hardware would not be a limit in terms of RAM and performance requirements? Is it possible to estimate, say, if a project took 80 man-years in x86 assembly language, how many man-years it would take using the C programming language instead?

For a project like this, the difference due to language is negligible.

In general, and assuming equally experienced programmers, productivity is more or less the same with all programming languages - including extremes like LISP and BASIC :)) Of course, at first sight, and in a bare-bones configuration, some languages may seem more productive, but that's simply due to more built-ins/delivered standard functions/libraries.

Hello-World-like tasks do require fewer lines in BASIC (or C) than in Assembly, but that's a rather short-lived advantage. Real-life applications past the Hello-World level rely heavily on application-built libraries and functions, which diminishes the influence of language constructs or default libraries.

There is no real difference between C constructs like:

 if (PointerValid(PTR))
   {
     DoStuff();
   }
 else
   {
     Error(ERRPTR);
   }

or writing Assembly (*1) like this:

       @IF   ZE
         @PASS NAME=PointerValid,PLIST=(PTR)
       @THEN
         @PASS NAME=DoStuff
       @ELSE
         ERROR ERRPTR
       @BEND

Besides that, coding is only one part of development, and often the least time-consuming one. Defining layers, program structure, interfaces, data structures (and of course the GUI) is usually far more challenging than coding them.

Writing a windowing environment means doing way more than printing "Hello World" or reading a file.


*1 - Yes, that's actual x86 assembly code of the kind I used for many projects. I wouldn't assume MS wrote code any less sophisticated than what we were already writing in the 1970s.

2

There's no single answer here, because there are a number of compromises involved.

Calling Conventions

For example, in assembly language you get (roughly) four choices about how to pass parameters to functions. One is to pass parameters on the stack, much as a high-level language would. Another is to specify common register usage throughout the project. A third is to pass parameters in whatever registers make sense for the specific function at hand. A fourth is (more or less) a combination of the second and third: specify common register usage on a per-type basis; for example, pass the first two pointers in SI and DI, and the first two integers in AX and DX.

A stack-based convention tends to improve productivity--f(a, b) will always be equivalent to something like:

push a
push b
call f

If you use a completely common register-based convention, it's almost as easy--something like:

mov ax, a
mov bx, b
call f

...but a function-specific convention means you'll need to look up f and see what registers it expects a and b to be passed in. A type-based system is a little easier, since you undoubtedly already know the types of a and b, and get accustomed to the convention--but it still requires a little more thought than a completely common convention (especially in assembly language, where types are often rather loosely defined).

If you care purely about productivity, the winner is almost certainly a stack-based convention. But that also reduces the code's efficiency--especially on mid-1980s processors, which didn't have a cache, so every memory reference went straight to main memory.

For what it's worth: MS-DOS used a common, register-based convention throughout. Windows used a stack-based convention from the beginning.
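For illustration, here is roughly how that stack-based convention later showed up in C code for 16-bit Windows (a sketch from memory in the classic windows.h style; the exact type spellings varied between SDK versions, so treat the details as illustrative):

    /* 16-bit Windows window procedure, declared "far pascal": arguments are
       pushed on the stack and the callee cleans them up, regardless of which
       registers the caller happened to have the values in. Sketch only. */
    #include <windows.h>

    long FAR PASCAL WndProc(HWND hWnd, unsigned message, WORD wParam, LONG lParam)
    {
        return DefWindowProc(hWnd, message, wParam, lParam);
    }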

Hardware Manipulation

In theory, Windows 1.0 was a layer on top of MS-DOS, so all hardware manipulation should have gone through DOS (or at least the BIOS). But (for one obvious example) DOS provided no access for drawing graphics at all, and although the BIOS did provide minimal support, it was grossly inadequate for Windows' needs.

At the time, graphics hardware mostly meant CGA and EGA (VGA came a couple of years later), with designs that were somewhat non-trivial to deal with in assembly language, and substantially more difficult in higher-level languages.
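To give a feel for what "dealing with the EGA" meant, here is a rough real-mode C sketch of setting a single pixel in 16-color planar graphics. It's illustrative only: the port numbers are the standard EGA register addresses, and outp()/MK_FP() are the old DOS-compiler helpers (Borland spells them outportb()/MK_FP()); none of this comes from the Windows source.

    /* Set one pixel in 640x350x16 planar EGA graphics (mode 0x10) using
       EGA "write mode 2". Sketch only; a real driver batches this work. */
    #include <conio.h>   /* outp() */
    #include <dos.h>     /* MK_FP() */

    void set_pixel(unsigned x, unsigned y, unsigned char color)
    {
        unsigned char far *vram = MK_FP(0xA000, y * 80u + (x >> 3));
        volatile unsigned char latch;

        outp(0x3CE, 0x05); outp(0x3CF, 0x02);             /* Graphics Mode: write mode 2 */
        outp(0x3CE, 0x08); outp(0x3CF, 0x80 >> (x & 7));  /* Bit Mask: this pixel only */

        latch = *vram;    /* the read loads the latches with the existing byte */
        (void)latch;
        *vram = color;    /* low 4 bits become the pixel's color across the planes */

        outp(0x3CE, 0x05); outp(0x3CF, 0x00);             /* restore write mode 0 */
        outp(0x3CE, 0x08); outp(0x3CF, 0xFF);             /* restore bit mask */
    }

Most of this maps one-to-one onto IN/OUT and MOV instructions anyway, which is part of why keeping such code in assembly cost little at the time.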

At the same time, a large part of the idea of anything like Windows is for most of the code to deal with some higher level abstractions, so code that's specific to a particular piece of hardware is isolated in its own little module. So even if you used assembly language inside that module, it would be fairly easy to write the rest of the code in a higher level language.

Tools

People who previously worked in mainframe environments (especially IBM's) often overestimate the sophistication of the tools Microsoft was using in the Windows 1.0 timeframe. For an obvious example, @Raffzahn's answer assumes an assembler that supports block-structured @if statements. Microsoft started to support that in MASM 6.0 (or maybe 6.1--my memory's a bit fuzzy), which didn't become available until around the Windows 3.x timeframe--well after Windows 1.0 was obsolete.

At least to my knowledge, the source code to Windows 1.x has never been released--but the source to Windows NT 4 and to Win2K was leaked decades ago, and to support running 16-bit Windows code, both included the full source to the Windows 3.1 kernel.exe (among other things). I never looked at it personally, but from what I recall of what I've read about it, that source appears to have been written for something like MASM 4.

Optimization

Assembly language on the x86 has almost uniquely poor productivity because the architecture has almost nothing that qualifies as a truly general-purpose register. On an x86, you frequently go through some fairly nasty contortions to avoid register spillage (which tended to be quite expensive at the time).

Especially in the Windows 1.0 development timeframe (early to mid-1980s) you had to pay pretty close attention to optimization. Windows 1.0 ran entirely in real mode, so it had only 640K of RAM available. As it was, Windows itself occupied around 430K (going from memory, so that could be a little off, but not drastically). As such, Windows 1 left only around 200K of RAM for your executables to use. That made it a lot more of a proof of concept than a useful tool.

Without that level of care in optimization, it probably wouldn't even have qualified as a proof of concept. For one example of careful optimization, look through the code for BitBlt. They basically created a small virtual machine to do bit blitting. When you call BitBlt, it compiles a small program for that VM on the stack, then executes it. Initially it can seem pretty weird, but once you figure out what's going on, you start to realize that it's very nice code--and very fast at what it does.
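The general shape of that trick, sketched very loosely in C (a toy, interpreter-style illustration of "build a little program for this blit, then run it"; the op codes, struct, and names are all invented here, and the real Windows code was far more elaborate):

    /* Toy illustration of compile-then-execute blitting. All decisions about
       the raster op are made once, at "compile" time, so the execution loop
       contains no per-pixel branching on the operation. */
    #include <stddef.h>
    #include <string.h>

    enum blit_op { OP_COPY_ROW, OP_XOR_ROW, OP_END };

    struct blit_step { enum blit_op op; size_t dst_off, src_off, len; };

    /* "Compile": one step per scan line for the requested operation. */
    size_t compile_blit(struct blit_step *prog, int rows, size_t pitch,
                        size_t width, int use_xor)
    {
        size_t n = 0;
        for (int r = 0; r < rows; r++) {
            prog[n].op = use_xor ? OP_XOR_ROW : OP_COPY_ROW;
            prog[n].dst_off = prog[n].src_off = (size_t)r * pitch;
            prog[n].len = width;
            n++;
        }
        prog[n].op = OP_END;
        return n + 1;
    }

    /* "Execute": run the little program against the two bitmaps. */
    void run_blit(const struct blit_step *prog,
                  unsigned char *dst, const unsigned char *src)
    {
        for (; prog->op != OP_END; prog++) {
            if (prog->op == OP_COPY_ROW)
                memcpy(dst + prog->dst_off, src + prog->src_off, prog->len);
            else
                for (size_t i = 0; i < prog->len; i++)
                    dst[prog->dst_off + i] ^= src[prog->src_off + i];
        }
    }

The real code built its little program on the stack at the moment BitBlt was called and ran it immediately, so the per-call setup cost was paid exactly once and the inner loops stayed tight.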

I'm honestly a bit uncertain how much difference language makes in this respect. As a really general rule of thumb, higher level languages tend to make it easier to do algorithmic improvements, while lower level languages make it easier to get fairly low-level improvements (like using registers better, so you do fewer references to main memory).

My immediate guess is that this probably favored assembly language overall. Under the circumstances, they probably cared more about memory usage than speed, and that tends to be easier to optimize with assembly language (at least in my experience).
