17

Why can high-level languages seemingly never reach lower-level languages in terms of speed? Examples of high-level languages would be Python, Haskell, and Java. Low-level languages would be trickier to define, but let's say C. Comparisons can be found all around the Internet$^1$ and they all agree that C is a lot faster, sometimes by a factor of 10 or more.

What is it that causes such a big difference in performance, and why can't high-level languages catch up?

Initially I believed that this is all the compilers' fault and that things will improve in the future, but some of the most popular higher-level languages have been around for decades and they still lag behind when it comes to speed. Can't they simply compile to a syntax tree similar to C's and then follow the same procedures that generate the machine code? Or is it something to do with the syntax itself?


$^1$ Examples:

7
  • 7
    $\begingroup$ "and they all agree that C is a lot faster" -- that's certainly wrong. $\endgroup$
    – Raphael
    Commented May 26, 2016 at 20:26
  • 2
    Anyway, I think this post is best answered by my answer to a similarly ill-conceived question; duplicate?
    – Raphael
    Commented May 26, 2016 at 20:28
  • 2
    See also here and here. Beware the junk answers.
    – Raphael
    Commented May 26, 2016 at 20:31
  • Relevant: stackoverflow.com/questions/6964392/… The myth of all "high-level languages" being slow is quite sad.
    – xji
    Commented Jun 4, 2016 at 13:47
  • I agree with others that abstraction != slowness, but I'm afraid there is a much larger issue that computer science teachers (I was one) are not aware of: for real programs (1000 lines of code and up) there are many ways to write them, and those ways can differ in performance by orders of magnitude. Just thinking about big-O totally misses the point. Check here. Commented Jul 12, 2016 at 22:07

4 Answers

21

Debunking some myths

  1. There is no such thing as a fast language. A language can generally produce fast code, but different languages excel on different benchmarks. We can rank languages on a particular set of flawed benchmarks, but we cannot rank languages in a vacuum.

  2. C code tends to be faster because people who need every inch of performance use C. A statistic that C is faster "by a factor of 10" might be inaccurate, because it might be that the people using Python just don't care as much about speed and didn't write optimal Python code. We see this in particular with languages like Haskell. If you try really hard, you can write Haskell that performs on par with C. But most people don't need that performance, so we have a bunch of flawed comparisons.

  3. Sometimes, it is unsafety, not abstraction, that makes C fast. Its lack of array-bounds and null-pointer checks saves time, and has been the cause of numerous security holes over the years (a sketch of what those omitted checks cost appears after this list).

  4. Languages aren't fast, implementations are fast. Many abstract languages start slow, because speed is not their goal, but get faster as more and more optimizations are added.
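To make point 3 above concrete, here is a minimal C sketch (the function names are mine, purely for illustration): `checked_get` performs the test that a bounds-checked language inserts on every access, while `raw_get` is all the work C actually does.

```c
#include <stdio.h>
#include <stdlib.h>

/* What C emits for a[i]: a single load, no checks at all. */
int raw_get(const int *a, size_t i) {
    return a[i];                      /* an out-of-range i is undefined behaviour */
}

/* Roughly what a bounds-checked language does implicitly on every access. */
int checked_get(const int *a, size_t len, size_t i) {
    if (a == NULL || i >= len) {      /* the checks C skips */
        fprintf(stderr, "index %zu out of range\n", i);
        abort();
    }
    return a[i];
}

int main(void) {
    int xs[4] = {1, 2, 3, 4};
    printf("%d %d\n", raw_get(xs, 2), checked_get(xs, 4, 2));
    return 0;
}
```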

The abstraction vs speed tradeoff is perhaps inaccurate. I would suggest a better comparison:

Simplicity, speed, abstraction: Choose two.

If we're running identical algorithms in different languages, the question of speed comes down to the problem of "How much stuff do we need to do at runtime to make this work?"

In a highly abstract language that is also simple, such as Python or JavaScript, the representation of data isn't known until runtime, so a program ends up doing a lot of pointer dereferencing and dynamic checking at runtime, and that is slow.

Likewise, there are a lot of checks that need to be done to ensure that you don't wreck your computer. When you call a function in Python, the runtime needs to make sure that the object you're calling actually is a function, since otherwise you could end up executing random code that does terrible things.
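A hedged sketch in C of what the last two paragraphs describe. The tagged `Value` type below is invented for illustration (it is not CPython's actual object layout, though the idea is similar): every dynamic addition or call must first check what kind of value it is working with, while the statically typed version compiles to a single machine add.

```c
#include <stdio.h>
#include <stdlib.h>

/* A toy tagged value, loosely modelled on what dynamic-language runtimes do. */
typedef enum { TAG_INT, TAG_STR, TAG_FUNC } Tag;

typedef struct {
    Tag tag;
    union { long i; const char *s; long (*f)(long); } as;
} Value;

/* Dynamic addition: check both tags before doing one machine add. */
Value dyn_add(Value a, Value b) {
    if (a.tag != TAG_INT || b.tag != TAG_INT) {
        fprintf(stderr, "TypeError: unsupported operand types\n");
        exit(1);
    }
    return (Value){ TAG_INT, { .i = a.as.i + b.as.i } };
}

/* Dynamic call: make sure the value really is callable first. */
long dyn_call(Value v, long arg) {
    if (v.tag != TAG_FUNC) {
        fprintf(stderr, "TypeError: value is not callable\n");
        exit(1);
    }
    return v.as.f(arg);
}

/* The statically typed equivalent: no tags, no checks, one add. */
long static_add(long a, long b) { return a + b; }

static long twice(long x) { return 2 * x; }

int main(void) {
    Value a = { TAG_INT, { .i = 20 } }, b = { TAG_INT, { .i = 22 } };
    Value f = { TAG_FUNC, { .f = twice } };
    printf("%ld %ld %ld\n",
           dyn_add(a, b).as.i,      /* 42, after two tag checks          */
           dyn_call(f, 21),         /* 42, after a callability check     */
           static_add(20, 22));     /* 42, with no runtime checks at all */
    return 0;
}
```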

Finally, most abstract languages have overhead from garbage collection. Most programmers have agreed that having to track the lifetimes of dynamically allocated memory is a pain, and they'd rather have a garbage collector do it for them at run time. This takes time that a C program doesn't need to spend.

Abstract doesn't mean slow

However, there are languages which are both abstract and fast. The most prominent in this era is Rust. By introducing a borrow checker and a sophisticated type system, it allows for abstract code and uses compile-time information to reduce the amount of work that we need to do at runtime (for example, eliminating the need for garbage collection).

Any statically typed language saves us time by reducing the number of runtime checks, and introduces complexity by requiring us to satisfy a typechecker at compile time. Similarly, if a language encodes whether a value can be null in its type system, we can replace runtime null-pointer checks with compile-time ones.
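As a C-flavoured sketch of that last point (the helper names are hypothetical): C's types cannot say "this pointer is never null", so the programmer either pays for a check on every call or relies on an unchecked contract; a type system with an Option/Maybe-style distinction lets the compiler discharge the check before the program runs.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Option 1: pay for the check at runtime, on every single call. */
size_t safe_len(const char *s) {
    if (s == NULL)                 /* the check a richer type system would remove */
        return 0;
    return strlen(s);
}

/* Option 2: document the contract and trust callers; the assert usually
   disappears in release builds, which is where the speed (and the crash
   reports) come from. */
size_t fast_len(const char *s) {
    assert(s != NULL);
    return strlen(s);
}

int main(void) {
    printf("%zu %zu\n", safe_len(NULL), fast_len("hello"));   /* prints: 0 5 */
    return 0;
}
```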

3
  • 3
    $\begingroup$ "C code tends to be faster because people who need every inch of performance use C" -- exactly. I want to see benchmarks of the form "average running time of code written by X-year students/professionals with Y years of experience in Z". I expect that the answer for C is usually "code not correct" for small X and Y. It'd be really interesting to see how much more experience/expertise you need to leverage the potential for performance C promises. $\endgroup$
    – Raphael
    Commented May 26, 2016 at 23:09
    Haskell is really an exception that proves the rule. ;) I think C++ has found a rather reasonable middle ground on GC with its smart pointers; as long as you don't nest shared pointers, it will be as fast as C.
    – Kaveh
    Commented Jun 1, 2016 at 13:50
  • Somewhat recently there was a "fastest language" "competition" by Dave's Garage on YouTube. The conclusions were not useful because they crowdsourced the code, every language had a different developer, and sometimes a completely different problem approach. At best it was a comparison of how well the coders knew their language of choice and how good their chosen approach was at solving the problem. The difference wasn't the languages; some of the slower languages did very well, and somehow Zig "beat" C despite being so new. Commented Sep 20, 2023 at 22:25
3

By its nature, abstraction reduces the communication of information, both to the programmer and to the lower layers of the system (the compiler, libraries, and runtime system). In favor of abstraction, this generally allows the lower layers to assume that the programmer is not concerned with any unspecified behavior, providing greater flexibility in supplying the behavior that is specified.

An example of a potential benefit of this "don't care" aspect is data layout. In C (low abstraction), the compiler is more constrained in data-layout optimizations. Even if the compiler could discern (e.g., through profile information) that hot/cold splitting or false-sharing avoidance would be beneficial, it is generally prevented from applying them. (There is some freedom in specifying "as if", i.e., treating the specification more abstractly, but deriving all the potential side effects adds a burden to the compiler.)
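A small C sketch of the data-layout point (the struct names are invented for illustration): C guarantees that fields stay in declaration order, so a hot/cold split is something the programmer must do by hand, whereas a more abstract specification would leave the implementation free to reorder or split the representation.

```c
#include <stdio.h>

/* C fixes this layout, padding included, in declaration order, even though
   'debug_name' is touched far less often than the coordinates around it. */
struct Particle {
    double x, y, z;          /* hot: read every frame               */
    char   debug_name[64];   /* cold: only read when logging        */
    double vx, vy, vz;       /* hot again, pushed a cache line away */
};

/* The hot/cold split the compiler is not allowed to perform for us;
   the programmer has to restructure the data by hand. */
struct ParticleHot  { double x, y, z, vx, vy, vz; };
struct ParticleCold { char debug_name[64]; };

int main(void) {
    printf("mixed: %zu bytes, hot: %zu bytes, cold: %zu bytes\n",
           sizeof(struct Particle),
           sizeof(struct ParticleHot),
           sizeof(struct ParticleCold));
    return 0;
}
```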

A more abstract specification is also more robust against changes in tradeoffs and uses. The lower layers are less constrained in reoptimizing the program for new system characteristics or new uses. A more concrete specification must either be rewritten by a programmer or additional effort must be made by the lower layers to guarantee "as if" behavior.

The performance hindering aspect of information hiding abstraction is "can't express", which the lower layers will typically handle as "don't know." This means that the lower layers must discern information useful for optimization from other means such as typical general use, use targeted by the language or library, or specific profile information.

The impact of information hiding works in the other direction as well. The programmer can be more productive by not having to consider and specify every detail, but the programmer may have less information about the impact of higher level design choices.

On the other hand, when code is more specific (less abstract), the lower layers of the system can more simply do what they are told to do as they are told to do it. If the code is well-written for its targeted use, it will fit its targeted use well. A less abstract language (or programming paradigm) allows the programmer to optimize the implementation by detailed design and by the use of information which is not easily communicated in a given language to the lower layers.

As has been noted, less abstract languages (or programming techniques) are attractive when additional programmer skill and effort can produce worthwhile results. When more programmer effort and skill are applied, results will typically be better. In addition, a language system that is used less for performance-critical applications, emphasizing instead development effort or reliability (bounds checks and garbage collection are not just about programmer productivity but about correctness, and reducing the programmer's mental load through abstraction can improve reliability), will face less pressure to improve performance.

Specificity (lower abstraction) also works against the principle of don't repeat yourself, but tailoring code to a specific use is a common optimization technique. This reduced abstraction has obvious reliability and programming effort implications.

The abstractions provided by a language may also include undesired or unnecessary work with constrained ability to choose a less heavyweight abstraction. While unnecessary work can sometimes be discovered and removed by the lower layers (e.g., bounds checks may be extracted from the body of a loop and entirely removed in some cases), determining that such is a valid optimization requires more "skill and effort" by the compiler.
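A hand-written C model of the bounds-check hoisting mentioned above (shown manually, not as actual compiler output): the first loop pays for a check on every iteration, the second pays once before the loop and then runs a check-free body.

```c
#include <stdio.h>
#include <stdlib.h>

/* What a bounds-checked language conceptually executes: one check per element. */
long sum_checked(const long *a, size_t len, size_t n) {
    long s = 0;
    for (size_t i = 0; i < n; i++) {
        if (i >= len) abort();       /* per-iteration bounds check */
        s += a[i];
    }
    return s;
}

/* What a good optimizer can often turn that into: a single hoisted check. */
long sum_hoisted(const long *a, size_t len, size_t n) {
    if (n > len) abort();            /* one check before the loop */
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

int main(void) {
    long xs[5] = {1, 2, 3, 4, 5};
    printf("%ld %ld\n", sum_checked(xs, 5, 5), sum_hoisted(xs, 5, 5));
    return 0;
}
```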

Language age and popularity are also noteworthy factors both in availability of skilled programmers and the quality of the lower layers of the system (including mature libraries and code examples).

Another conflating factor in such comparisons is the somewhat orthogonal difference between ahead-of-time compilation and just-in-time compilation. While just-in-time compilation may more easily exploit profile information (not relying on the programmer to provide profile runs) and system-specific optimization (ahead-of-time compilation may target broader compatibility), the overhead of aggressive optimization is accounted as part of the runtime performance. JIT results can be cached (reducing the overhead for commonly used code) but typically not persistently.

For ahead-of-time compilation, binary reoptimization can provide some advantages of JIT compilation, but traditional binary distribution formats drop most source-code information potentially forcing the system to attempt to discern intent from a specific implementation.

(Install-time compilation/finalization is an intermediate choice with intermediate trade-offs.)

Lower abstraction languages, because of their emphasis on programmer control, favor the use of ahead-of-time compilation. Install-time compilation might be tolerated, though link-time implementation selection would provide greater programmer control. JIT compilation sacrifices significant control.

There is also the issue of benchmarking methodology. Equal effort/skill is effectively impossible to establish, but even if such could be achieved the language goals would bias the results. If a low maximum programming time was required, a program for a less abstract language might fail even to be completely written compared to a simple idiomatic expression in a more abstract language. If a high maximum programming time/effort was allowed, lower-abstraction languages would have an advantage. Benchmarks presenting best-effort results would naturally be biased in favor of less abstract languages.

It is sometimes possible to program in a less idiomatic manner in a language to gain the advantages of other programming paradigms, but even when the expressive power is available the tradeoffs for doing such may not be favorable.

1

Here are a few key ideas on this.

  • An easy comparison/case study to make with respect to abstraction vs speed is Java vs C++. Java was designed to abstract away some of the lower-level aspects of C++, such as memory management. In the early days (around the language's invention in the mid-1990s), Java garbage collection was not very fast, but after a few decades of research the garbage collectors are extremely fine-tuned, fast, and optimized, so garbage collection has largely been eliminated as a performance drain on Java. E.g., see even this 1998 headline: Performance tests show Java as fast as C++ / JavaWorld

  • Programming languages and their long evolution have an inherent "pyramidal/hierarchical structure" as a sort of transcendent design pattern. At the top of the pyramid is something that controls the lower sections; in other words, building blocks are made out of building blocks. This can be seen in API structure as well. In this sense, greater abstraction always leads to some new component at the top of the pyramid controlling other components. So it is not so much that all languages are level with each other as that languages call upon routines written in other languages: many scripting languages (Python, Ruby) call C or C++ libraries, with numerical or matrix routines as a typical example (a sketch of this layering follows the list). There are higher-level and lower-level languages, and the high-level languages call the lower-level ones, so to speak; in this sense measuring their relative speed is not really an apples-to-apples comparison.

  • One might say that new languages are being invented all the time to try to optimize the abstraction/speed tradeoff, i.e., it is a key design goal. Maybe it is not so much that greater abstraction always sacrifices speed, but that a better balance is always being sought with newer designs. E.g., Google Go was in many ways specifically designed with this tradeoff in mind, to be simultaneously high-level and performant. See e.g. Google Go: Why Google's programming language can rival Java in the enterprise / Techworld
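As a sketch of the layering described in the second bullet (the file name, build line, and function are hypothetical), here is a small numerical kernel compiled into a shared library, with the Python ctypes call shown in a comment; the tight inner loop stays in C while the scripting language only orchestrates.

```c
/* dot.c -- a hypothetical numerical kernel a scripting language might call.
 * Build:  cc -O2 -shared -fPIC -o libdot.so dot.c
 *
 * From Python, roughly (using the standard ctypes module):
 *   import ctypes
 *   lib = ctypes.CDLL("./libdot.so")
 *   lib.dot.restype  = ctypes.c_double
 *   lib.dot.argtypes = [ctypes.POINTER(ctypes.c_double),
 *                       ctypes.POINTER(ctypes.c_double), ctypes.c_size_t]
 */
#include <stddef.h>

double dot(const double *a, const double *b, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++)    /* the hot loop runs as plain machine code */
        s += a[i] * b[i];
    return s;
}
```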

0

The way I like to think about performance is "where the rubber meets the road". The computer executes instructions, not abstractions.

What I want to see is this: is every instruction that is executed "earning its keep" by contributing substantially to the end result? As a too-simple example, consider looking up an entry in a table of 1024 entries. That is a 10-bit problem because the program has to "learn" 10 bits before it knows the answer. If the algorithm is binary search, every iteration contributes 1 bit of information, because it shrinks the uncertainty by a factor of 2. So it takes 10 iterations, one for each bit.

Linear search, on the other hand, is initially very inefficient because the first iterations shrink the uncertainty by a very small factor. So they are not learning much for the effort expended.
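A small C illustration of the counting argument above: in a 1024-entry table the answer's index has 10 bits, so binary search needs only about 10 probes (each probe halves the remaining range), while linear search may touch all 1024 entries.

```c
#include <stdio.h>

#define N 1024

/* Binary search: each probe halves the remaining range, learning one bit. */
static int binary_find(const int *a, int key, int *steps) {
    int lo = 0, hi = N - 1;
    *steps = 0;
    while (lo <= hi) {
        ++*steps;
        int mid = lo + (hi - lo) / 2;
        if (a[mid] == key) return mid;
        if (a[mid] < key)  lo = mid + 1;
        else               hi = mid - 1;
    }
    return -1;
}

/* Linear search: each early probe rules out only 1/1024 of the table. */
static int linear_find(const int *a, int key, int *steps) {
    *steps = 0;
    for (int i = 0; i < N; i++) {
        ++*steps;
        if (a[i] == key) return i;
    }
    return -1;
}

int main(void) {
    int table[N], b = 0, l = 0;
    for (int i = 0; i < N; i++) table[i] = 2 * i;   /* sorted toy data */
    binary_find(table, 2000, &b);
    linear_find(table, 2000, &l);
    printf("binary: %d probes, linear: %d probes\n", b, l);   /* 10 vs 1001 */
    return 0;
}
```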

OK, so if the compiler and language let the user wrap good instructions in something considered "abstract", that's fine: the abstraction costs nothing as long as the instructions that actually execute are still earning their keep.
