63
$\begingroup$

Safe programming languages (PL) are gaining popularity. What is the formal definition of a safe PL? For example, C is not safe, but Java is safe. I suspect that the property “safe” should be applied to a PL implementation rather than to the PL itself. If so, how do we define what a safe PL implementation is?

$\endgroup$
2
  • $\begingroup$ Comments are not for extended discussion; this conversation has been moved to chat. $\endgroup$ Commented Jul 6, 2018 at 19:19
  • $\begingroup$ "the property “safe” should be applied to a PL implementation rather than to the PL itself" - you can call a PL safe if a safe implementation of it exists. $\endgroup$ Commented Apr 29, 2019 at 8:08

10 Answers

32
$\begingroup$

When we call a language “safe” in some respect, that formally means that there’s a proof that no well-formed program in the language can do something we consider dangerous. The word “safe” is also used less formally, but that’s what people here understand your question to mean. There are many different definitions of properties we want a “safe” language to have.

A few important ones are:

  • Andrew Wright and Matthias Felleisen’s definition of “type soundness”, which is cited in many places (including Wikipedia) as an accepted definition of “type safety,” and their 1994 proof that a subset of ML meets it.

  • Michael Hicks lists several definitions of “memory safety” here. Some are lists of types of errors that cannot occur, and some are based on treating pointers as capabilities. Java guarantees that none of those errors are possible (unless you explicitly use a feature marked unsafe) by having a garbage collector manage all allocations and deallocations. Rust makes the same guarantee (again, unless you explicitly mark code as unsafe) through its affine type system: every value has a single owner, can be moved at most once, and may otherwise be accessed only through borrows that the compiler checks.

  • Similarly, thread-safe code is usually defined as code that cannot exhibit certain kinds of bugs involving threads and shared memory, including data races and deadlocks. These properties are often enforced at the language level: Rust guarantees in its type system that data races cannot occur; C++ guarantees that std::shared_ptr smart pointers to the same object in multiple threads will not delete the object prematurely or fail to delete it when the last reference is destroyed; and C and C++ additionally have atomic variables built into the language, with atomic operations guaranteed to enforce certain kinds of memory consistency if used correctly. MPI restricts interprocess communication to explicit messages, and OpenMP has syntax to ensure that access to variables from different threads is safe.

  • The property that memory will never leak is often called safe-for-space. Automatic garbage collection is one language feature intended to help ensure this.

  • Many languages guarantee that their operations have well-defined results and that their programs are well-behaved. As supercat's answer illustrates, C does this for unsigned arithmetic (guaranteed to wrap around safely) but not for signed arithmetic (where overflow is allowed to cause arbitrary bugs, because C needed to support CPUs that do wildly different things when signed arithmetic overflows), and yet the language sometimes silently converts unsigned quantities to signed ones.

  • Functional languages have a large number of invariants that any well-formed program is guaranteed to maintain, for example, that pure functions cannot cause side-effects. These may or may not be described as “safe.”

  • Some languages, such as SPARK or OCaml, are designed to facilitate proving program correctness. This may or may not be described as “safe” from bugs.

  • Proofs that a system cannot violate a formal security model (hence the quip, “Any system that’s provably secure probably isn’t.”)

$\endgroup$
10
  • 1
    $\begingroup$ This may or may not be described as “safe” from bugs. Would you please elaborate this a little? What do you mean by "from bugs"? $\endgroup$
    – Nobody
    Commented Jul 4, 2018 at 8:05
  • 2
    $\begingroup$ @scaaahu Here’s an example of a website that refers to software formally proven correct as “provably safe.” In this context, it’s referring to software to prevent aircraft from colliding, so it means safe from collisions. $\endgroup$
    – Davislor
    Commented Jul 4, 2018 at 9:01
  • 1
    $\begingroup$ I am accepting this answer because it lists types of safety. The type that I had in mind is memory safety. $\endgroup$
    – beroal
    Commented Jul 16, 2018 at 13:32
  • $\begingroup$ While this answer lists some helpful links and a lot of examples, most of them are completely messed up. Garbage collection ensures that memory will never leak, or not using "unsafe" blocks automatically gives you safety, or signed overflow being Undefined Behavior because C compilers need to support some weird CPUs, seriously? And just a brief word for Ada / SPARK which is the only one of the mentioned languages that does take safety seriously. $\endgroup$ Commented Apr 13, 2019 at 23:08
  • 2
    $\begingroup$ OCaml is not designed to facilitate proofs of correctness. It may have a sophisticated type system that can make many guarantees, but OCaml isn't a proof assistant. (From the Curry-Howard point of view, OCaml's type system is an inconsistent logic, unlike the type system of e.g. Inria's Coq.) $\endgroup$
    – Del
    Commented Jan 21, 2020 at 23:20
100
$\begingroup$

There is no formal definition of "safe programming language"; it's an informal notion. Rather, languages that claim to provide safety usually provide a precise formal statement of what kind of safety is being claimed/guaranteed/provided. For instance, the language might provide type safety, memory safety, or some other similar guarantee.

$\endgroup$
12
  • 14
    $\begingroup$ As an addendum, if we talk about C vs Java like OP's post: it's memory safety that is guaranteed in Java and not in C. Type safety is provided by both in their own way. (Yeah, a lot of people reading this already know that, but maybe some don't). $\endgroup$
    – Walfrat
    Commented Jul 3, 2018 at 12:31
  • 17
    $\begingroup$ @Walfrat That's part of it. Java also has no undefined behaviour, which is something we expect of a language that calls itself 'safe'. As for type systems, I don't think a strong static type system is what people tend to mean by 'safe'. Dynamically typed languages like Python are generally 'safe' after all. $\endgroup$ Commented Jul 3, 2018 at 13:24
  • 3
    $\begingroup$ My definition of type safety is the compilation check that handles that. That may not be the formal definition. Note that I said "type safety", not "safe". For me a "safe" language refers to "my" definition of "type and memory safety" and I think it may be the most widespread one. Of course I'm not talking of some pitfalls like reflection/void pointers in C that compilation cannot handle. Another possible definition of safe is programs that don't crash with a segmentation fault like the uninitialized pointer in C. Things like that are generally granted in Python and Java. $\endgroup$
    – Walfrat
    Commented Jul 3, 2018 at 13:52
  • 7
    $\begingroup$ @Walfrat All that gets you though is a language where the syntax is well defined. It doesn't guarantee that execution is well defined - and the number of times I've seen a JRE crash, I can tell you flat out that as a system it isn't "safe". In C on the other hand, MISRA has put work into avoiding undefined behaviour to get a safer subset of the language, and compilation of C into assembler is much better defined. So it does depend on what you consider "safe". $\endgroup$
    – Graham
    Commented Jul 3, 2018 at 15:55
  • 5
    $\begingroup$ @MaxBarraclough - "Java also has no undefined behaviour" - Java has no undefined behaviour in the sense used in the C specifications in the language definition (although it does allow some code to produce values that do not have a single predefined value, e.g. accessing a variable that is being modified in another thread, or by accessing a double or long while it is being modified in another thread, which isn't guaranteed not to produce half of one value mixed in some unspecified way with half of another) but the API specification however has undefined behavior in some cases. $\endgroup$
    – Jules
    Commented Jul 6, 2018 at 10:18
42
$\begingroup$

If you can get your hands on a copy of Benjamin Pierce's Types and Programming Languages, in the introduction, he has a good overview on various perspectives on the term "safe language".

One proposed interpretation of the term that you might find interesting is:

“A safe language is completely defined by its programmer’s manual.” Let the definition of a language be the set of things the programmer needs to understand in order to predict the behavior of every program in the language. Then the manual for a language like C does not constitute a definition, since the behavior of some programs (e.g., ones involving unchecked array accesses or pointer arithmetic) cannot be predicted without knowing the details of how a particular C compiler lays out structures in memory, etc., and the same program may have quite different behaviors when executed by different compilers.

So, I would be hesitant to use the term "unsafe" to refer to a programming language implementation. If an undefined construct in a language produces different behavior in different implementations, one implementation's behavior might be closer to what the programmer expected, but I wouldn't call it "safe".

$\endgroup$
21
  • 8
    $\begingroup$ The Halting Problem of course says that no matter the language, there will always be programs whose behavior is not predictable from the language definition. So any definition that hinges on "predict the behavior of every program in the language" is fundamentally flawed for any Turing-complete language. $\endgroup$
    – MSalters
    Commented Jul 3, 2018 at 9:53
  • 16
    $\begingroup$ @MSalters This is a popular misunderstanding of the halting problem. The undecidability of the halting problem implies that it is impossible to mechanically derive the behavior of an arbitrary program in a Turing-complete language. But it is possible that for any given program, the behavior is predictable. It's just that you can't make a computer program that makes this prediction. $\endgroup$ Commented Jul 3, 2018 at 12:22
  • 7
    $\begingroup$ @Giles: That is not the case. Suppose there exists a proof of non-termination for every non-terminating program. Then you can enumerate the proofs of non-termination to find whether a given program halts. So the halting problem is decidable. Contradiction. Thus some non-terminating programs are not provably non-terminating. $\endgroup$
    – Kevin
    Commented Jul 3, 2018 at 14:12
  • 9
    $\begingroup$ @Gilles: I'm perfectly aware of the fact that many programs are trivially proven to halt or not. But the statement here is literally about the behavior of every program. The proof of the Halting Theorem shows that there exists at least one program for which that is not true. It's just a non-constructive proof, it won't tell you which program is undecidable. $\endgroup$
    – MSalters
    Commented Jul 3, 2018 at 14:27
  • 8
    $\begingroup$ @MSalters I think the implied bit is that the statement is about the small-scale behavior of the program, rather than large-scale, emergent behavior. For example, take the Collatz conjecture. The individual steps of the algorithm are simple and well defined, but the emergent behavior (how many iterations until stopping, and if it does at all), is anything but. -- "Predict" is being used informally here. It might be better written as "know how any given statement in an arbitrary program will execute." $\endgroup$
    – R.M.
    Commented Jul 3, 2018 at 15:36
20
$\begingroup$

Safe is not binary, it's a continuum.

Informally speaking, safety is defined in opposition to bugs, the two most often mentioned being:

  • Memory Safety: the language and its implementation prevent a variety of memory related errors such as use-after-free, double-free, out-of-bounds access, ...
  • Type Safety: the language and its implementation prevent a variety of type related errors such as unchecked casts, ...

Those are not the sole classes of bugs that languages prevent: data-race freedom or deadlock freedom are rather desirable, proofs of correctness are pretty sweet, etc.

Simply incorrect programs are rarely considered "unsafe" however (only buggy), and the term safety is generally reserved for guarantees affecting our ability to reason about a program. Thus, C, C++ or Go, having Undefined Behavior, are unsafe.

And of course, there are languages with unsafe subsets (Java, Rust, ...) which purposefully delineate areas where the developer is responsible for maintaining the language guarantees and the compiler is in "hands-off" mode. The languages are still generally dubbed safe, despite this escape hatch, a pragmatic definition.

$\endgroup$
3
  • 10
    $\begingroup$ I'd say it's a lattice. $\endgroup$
    – PatJ
    Commented Jul 3, 2018 at 14:26
  • 1
    $\begingroup$ Most programming languages implementations have unsafe features (e.g. Obj.magic in Ocaml). And in practice, these are really needed $\endgroup$ Commented Jul 3, 2018 at 16:44
  • 4
    $\begingroup$ @BasileStarynkevitch: Indeed. I would think that any language with FFI necessarily contains some level of unsafe, as calling a C function will require "pinning" GC'ed objects and manually ensuring that the signatures on both sides match up. $\endgroup$ Commented Jul 3, 2018 at 17:06
15
$\begingroup$

While I don't disagree with D.W.'s answer, I think it leaves one part of "safe" unaddressed.

As noted, there are multiple types of safety promoted. I believe it's good to understand why there are multiple notions. Each notion is associated with a class of bugs from which programs especially suffer, together with the idea that programmers cannot introduce that specific kind of bug if the language blocks them from doing so.

Note that these different notions therefore target different classes of bugs, and these classes are neither mutually exclusive nor exhaustive. To take D.W.'s two examples: the question of whether a certain memory location holds an object of a certain type is a matter of both type safety and memory safety.

A further criticism of "safe languages" follows from the observation that banning certain constructs deemed dangerous leaves the programmer with the need to come up with alternatives. Empirically, safety is better achieved by good libraries: using code that's already field-tested saves you from introducing new bugs.

$\endgroup$
2
  • 10
    $\begingroup$ It's rather off-topic for this site, because software engineering is not really a science, but I disagree with your empirical statement. Using good libraries won't save you in unsafe languages, because you aren't protected from using them wrong. Safe languages let you get more guarantees from the library author and let you gain more assurance that you're using them correctly. $\endgroup$ Commented Jul 3, 2018 at 21:49
  • 3
    $\begingroup$ I'm with MSalters on this. -"Safe languages let you get more guarantees from the library author and let you gain more assurance that you're using them correctly." This is a non sequitur for all practical purposes. $\endgroup$ Commented Jul 4, 2018 at 0:05
8
$\begingroup$

A fundamental difference between C and Java is that if one avoids certain easily-identifiable features of Java (e.g. those in the Unsafe namespace), every possible action one may attempt--including "erroneous" ones--will have a limited range of possible outcomes. While this limits what one can do in Java--at least without using the Unsafe namespace--it also makes it possible to limit the damage that can be caused by an erroneous program, or--more importantly--by a program which would correctly process valid files but is not particularly guarded against erroneous ones.

Traditionally, C compilers would process many actions in Standard-defined fashion in "normal" cases, while processing many corner cases "in a manner characteristic of the environment". If one were using a CPU which would short out and catch fire if numerical overflow occurred and wanted to avoid having the CPU catch fire, one would need to write code to avoid numerical overflow. If, however, one were using a CPU which would perfectly happily truncate values in two's-complement fashion, one didn't have to avoid overflows in cases where such truncation would result in acceptable behavior.

Modern C takes things a step further: even if one is targeting a platform which would naturally define a behavior for something like numerical overflow where the Standard would impose no requirements, overflow in one portion of a program may affect the behavior of other parts of the program in arbitrary fashion not bound by the laws of time and causality. For example, consider something like:

uint32_t test(uint16_t x)
{
    if (x < 50000) foo(x);
    return x*x;  // Note: x will promote to "int" if that type is >16 bits.
}

A "modern" C compiler given something like the above might reason as follows: the computation of x*x would overflow--undefined behavior--if x were greater than 46340, so the compiler may assume x never exceeds 46340; the test x < 50000 is then always true, and the call to "foo" can be performed unconditionally. Note that even if it would be acceptable to have the program abnormally terminate when x is out of range, or have the function return any value whatsoever in such cases, calling foo() with an out-of-range x might cause damage far beyond either of those possibilities. Traditional C wouldn't provide any safety gear beyond what the programmer and underlying platform supplied, but would allow safety gear to limit the damage from unexpected situations. Modern C will bypass any safety gear that isn't 100% effective at keeping everything under control.

$\endgroup$
18
  • 3
    $\begingroup$ @DavidThornley: Perhaps my example was too subtle. If int is 32 bits, then x will get promoted to signed int. Judging from the rationale, the authors of the Standard expected that non-weird implementations would treat signed and unsigned types in equivalent fashion outside of some specific cases, but gcc sometimes "optimizes" in ways that break if a uint16_t by uint16_t multiply yields a result beyond INT_MAX, even when the result is being used as an unsigned value. $\endgroup$
    – supercat
    Commented Jul 3, 2018 at 20:17
  • 4
    $\begingroup$ Good example. This is one reason why we should always (on GCC or Clang) compile with -Wconversion. $\endgroup$
    – Davislor
    Commented Jul 3, 2018 at 22:31
  • 2
    $\begingroup$ @Davislor: Ah, I just noticed that godbolt reversed the order in which compiler versions are listed, so selecting the last version of gcc in the list yields the latest rather than earliest. I don't think the warning is particularly helpful since it's prone to flag a lot of situations like return x+1; which shouldn't be problematic, and casting the result to uint32_t would stifle the message without fixing the issue. $\endgroup$
    – supercat
    Commented Jul 4, 2018 at 19:12
  • 2
    $\begingroup$ @supercat Eliminating tests is pointless if the compiler is required to put the tests back in a different place. $\endgroup$ Commented Jul 4, 2018 at 23:05
  • 3
    $\begingroup$ @immibis: A "checked assume" directive may allow a compiler to replace many tests, or a check that would be performed many times within a loop, with a single test which may be hoisted outside a loop. That's better than requiring programmers to add checks that would not be needed in machine code for a program to meet requirements, for the purpose of ensuring that a compiler doesn't "optimize out" checks that are necessary to meet requirements. $\endgroup$
    – supercat
    Commented Jul 5, 2018 at 0:45
6
$\begingroup$

There are several layers of correctness in a language. In order of increasing abstraction:

  • Few programs are error-free (only those whose correctness can be proven). Others have mentioned already that error containment is therefore the most concrete safety aspect. Languages which run in a virtual machine, like Java and the .NET languages, are generally safer in this respect: program errors are normally intercepted and handled in a defined way.1
  • At the next level, errors detected at compile time instead of at run time make a language safer. A syntactically correct program should also be semantically correct as far as possible. Of course the compiler cannot know the big picture, so this concerns the detail level. Strong and expressive data types are one aspect of safety on this level. One could say the language should make it hard to commit certain kinds of errors (type errors, out-of-bounds access, uninitialized variables etc.). Run-time type information, like arrays which carry length information, avoids errors. I programmed Ada 83 in college and found that a compiling Ada program typically contained perhaps an order of magnitude fewer errors than the corresponding C program. Just take Ada's ability to let users define integer types which are not assignable to one another without explicit conversion: whole space ships have crashed because feet and meters were confused, which one could trivially avoid with Ada.

  • At the next level, the language should provide means to avoid boilerplate code. If you have to write your own containers, or their sorting, or their concatenation, or if you must write your own string::trim(), you'll make mistakes. Since the abstraction level rises, this criterion involves the language proper as well as the language's standard library.

  • These days the language should provide means for concurrent programming on the language level. Concurrency is hard to get right and perhaps impossible to do correctly without language support.

  • The language should provide means for modularization and collaboration. The strong, elaborate, user-defined types from above help create expressive APIs.

Somewhat orthogonally the language definition should be intelligible; the language and libraries should be well documented. Bad or missing documentation leads to bad and wrong programs.


1 But because usually the correctness of the virtual machine cannot be proven such languages may somewhat paradoxically not be suitable for very strict safety requirements.

$\endgroup$
2
  • 1
    $\begingroup$ +1 For clear layer by layer explanation. A question for you, Whole space ships have crashed because feet and meters were confused, which one could trivially avoid with Ada., are you talking about Mars Probe Lost Due to Simple Math Error? Do you happen to know the language they were using for that space ship? $\endgroup$
    – Nobody
    Commented Jul 5, 2018 at 3:43
  • 2
    $\begingroup$ @scaaahu Yes, I think I was referring to that. No, I don't know the language. Actually, reading the report, it seems that data sent by the probe was processed by software on earth producing a data file which was then used to determine thrust levels. Simple language typing is not applicable in this scenario. Btw., they had multiple issues with the ground software and data file format, a confusion which prevented early detection of the problem. So the story is not a direct argument for strong typing but still a cautionary tale. $\endgroup$ Commented Jul 5, 2018 at 7:08
2
$\begingroup$

Please, do not say that every PL has unsafe commands. We can always take a safe subset.

Every language I know of has ways of writing illegal programs that can be (compiled and) run. And every language that I know of has a safe subset. So, what's your question really?


Safety is multidimensional and subjective.

Some languages have lots of operations that are "unsafe". Others have fewer such operations. In some languages, the default way of doing something is inherently unsafe. In others, the default way is safe. In some languages, there is an explicit "unsafe" subset. In other languages, there is no such subset at all.

In some languages, "safety" refers exclusively to memory safety — a service offered by the standard library and/or the runtime where memory access violations are made difficult or impossible. In other languages, "safety" explicitly includes thread safety. In other languages, "safety" refers to the guarantee that a program will not crash (a requirement that includes not allowing uncaught exceptions of any kind). Finally, in many languages "safety" refers to type safety — if the type system is consistent in certain ways, it is said to be "sound" (incidentally, Java and C# do not have completely sound type systems).

And in some languages, all the different meanings of "safety" are considered subsets of type safety (e.g. Rust and Pony achieve thread safety through properties of the type system).

$\endgroup$
1
  • $\begingroup$ I would say that being Turing complete logically precludes any sort of absolute safety. Tautologically stated, if you can write any program then you can write any program. $\endgroup$
    – Max Power
    Commented Nov 20, 2023 at 5:48
0
$\begingroup$

Take this C code:

int* p;
*p = 1;

C is a very unsafe language. p is uninitialised, so it could point anywhere. Likely effects are an immediate crash, overwriting memory somewhere with no bad side effect, or overwriting something important, and then Things Go Wrong. Now we change it to

int* p = NULL;
*p = 1;

On many implementations this will crash, because storing through a null pointer traps on those implementations. But you may have an optimising compiler that treats the null store as undefined behaviour and throws this and any following code away!

Now in a safe language, the first example would not compile: the compiler must prove that every variable has been assigned a value before it is used. In the second example, assigning NULL is only allowed if p is a nullable variable, and assigning through such a pointer is not allowed unless it has first been successfully tested for null, so the invalid store cannot happen.

A safe language also needs to handle the case that a pointer becomes invalid. Examples:

int* p = malloc(100);
int* q = p;
free(p);

Now both p and q are invalid and assignment through either will be unsafe; a safe language prevents this. Or

int* p;
{ int x; p = &x; }
*p = 2;

At the time of the assignment p has changed from a valid pointer to an invalid one because x doesn’t exist anymore. A safe language will prevent this.

$\endgroup$
1
  • $\begingroup$ There are very rare cases where one would want to purposefully do what you labeled an invalid pointer operation. It may not be used in a typical desktop app but there is a use somewhere, which is why the compiler allows it. see Chesterton's Fence. In any case it is good to remember that by the time the source code has been preprocessed, parsed, compiled, optimized, linked and decoded into machine micro-operations, all high-level language safety mechanisms have been stripped out. $\endgroup$
    – Max Power
    Commented Nov 28, 2023 at 4:15
-2
$\begingroup$

All languages are converted to the same instruction set for execution, so at a very low-level view no language can be more or less inherently safe than another.

The words "safe" and "safety" have in recent decades been mutilated by certain politically oriented parts of English-speaking society, such that the common usage has almost no definition. However, for technical matters I define "safety" and "safe" as a device (physical or procedural) that prevents the unintentional use of something, or that makes accidental use substantially more difficult, and the state of being under the protection of such a device.

So, a safe language has some device to limit a particular class of bugs. Limits by their nature often come with some other inconvenience or outright lack of ability. This inconvenience or inability to do certain things may or may not be an acceptable tradeoff, and must be judged for each situation.

This also should not be taken to imply that "unsafe" languages will result in bugs. For example, I don't have safety corks on my forks and yet I have, without much effort, managed for decades to avoid stabbing my eye while eating. It is certainly less effort to use bare forks with a bit of care and skill than the effort that would be needed to use the fork-cork. (The fork-cork is an old obscure pop culture reference.) The saved effort frees up resources I can use to improve other parts of my life.

$\endgroup$
2
  • $\begingroup$ Moving code from less safe Objective-C to safer Swift the code often became shorter and more readable. Probably helped by lots of syntactic sugar that made safety easy. $\endgroup$
    – gnasher729
    Commented Nov 23, 2023 at 17:15
  • $\begingroup$ @gnasher729 Overall syntax is mostly independent of safety mechanisms, but I can see how being readable reduces risk. (Hazard mitigation and risk reduction are not synonymous with safe.) The benefits seem to be worth the costs in your case.(Maybe obj-c is just a poor language all around, I don't know it well.) Those languages are highly abstracted from the underlying machine and more concerned with the efficacy of writing code than the efficiency of execution, or ease of writing a compiler for new systems. eg They aren't good picks at all for running bare metal without the support of an OS. $\endgroup$
    – Max Power
    Commented Nov 28, 2023 at 4:02
