10

I've been thinking a lot lately about safe code. Thread-safe. Memory-safe. Not-going-to-explode-in-your-face-with-a-segfault safe. But for the sake of clarity in the question, let's use Rust's safety model as our going definition.

Often, ensuring safety is a bit of a how-big-is-the-net problem because, as evidenced by Rust's need for unsafe, there are some very reasonable programming ideas, like the primitives underlying concurrency, that cannot be implemented in Rust without using the unsafe keyword. Even though concurrency can be made perfectly safe with locks, mutexes, channels, memory isolation or what have you, this requires working outside of Rust's safety model with unsafe, and then manually assuring the compiler that, "Yeah, I know what I'm doing. This looks unsafe, but I've proven mathematically it's perfectly safe."

But typically, this comes down to manually building models of these things and proving them safe with theorem provers. From both a computer science perspective (is it possible) and a practicality perspective (is it going to take the lifetime of the universe), is it reasonable to imagine a program that takes arbitrary code in an arbitrary language and evaluates whether or not it's "Rust safe"?

Caveats:

  • An easy out on this one is to point out that the program could be non-halting, and therefore the halting problem fails us. Let's say any program fed to the reader is guaranteed to halt
  • While "arbitrary code in an arbitrary language" is the goal, I am of course aware that this depends on the program's familiarity with the chosen language, which we'll take as a given
4
  • 2
    Arbitrary code? No. I imagine you can’t even prove the safety of most useful code because of I/O and hardware exceptions.
    – Telastyn
    Commented Nov 8, 2018 at 5:56
  • 7
    Why are you disregarding the Halting Problem? Every single one of the examples you mentioned, and many more, have been proven to be equivalent to solving the Halting Problem, the Function Problem, Rice's Theorem, or any of the other many Undecidability Theorems: pointer-safety, memory-safety, thread-safety, exception-safety, purity, I/O-safety, lock-safety, progress guarantees, etc. The Halting Problem is one of the simplest possible static properties that you could possibly want to know, everything else you list is much harder. Commented Nov 8, 2018 at 10:32
  • 3
    If you only care about false positives, and are willing to accept false negatives, I have an algorithm that classifies everything: "Is it safe? No"
    – Caleth
    Commented Nov 8, 2018 at 12:18
  • You absolutely do not need to use unsafe Rust to write concurrent code. There are several different mechanisms available, ranging from synchronization primitives to actor-inspired channels.
    – RubberDuck
    Commented Apr 10, 2019 at 21:48

5 Answers

8

What we're ultimately talking about here is compile time vs runtime.

Compile time errors, if you think about it, ultimately amount to the compiler being able to determine what problems your program has before it even gets run. It's obviously not an "arbitrary language" compiler, but I'll get back to that shortly. The compiler, in all its infinite wisdom, doesn't however list every problem that could in principle be determined before the program runs. This is partially dependent on how well the compiler is written, but the primary reason is that a good many things are only determined at runtime.

Runtime errors, which I'm sure you are as familiar with as I am, are errors that happen during execution of the program itself. These include dividing by zero, null pointer exceptions, hardware issues, and many other things.

The nature of runtime errors means you cannot anticipate them at compile time. If you could, they would almost certainly be checked at compile time instead. If you could guarantee a number is zero at compile time, then you could draw certain logical conclusions, such as that dividing any number by it will result in an arithmetic error caused by dividing by zero.
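To make that split concrete, here's a minimal Rust sketch (the function name safe_divide is invented for illustration): a divisor known only at runtime can't be ruled out by the compiler, so checked_div performs the zero-check dynamically and reports failure through the return type instead of crashing:

```rust
fn safe_divide(dividend: i32, divisor: i32) -> Option<i32> {
    // checked_div returns None instead of panicking when divisor == 0:
    // the zero-check has moved from compile time to an explicit runtime check.
    dividend.checked_div(divisor)
}

fn main() {
    // Imagine the divisor came from user input: the compiler cannot rule
    // out the error path statically, so we must handle it at runtime.
    assert_eq!(safe_divide(10, 2), Some(5));
    assert_eq!(safe_divide(10, 0), None);
    println!("ok");
}
```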

As such, in a very real way, the enemy of programmatically guaranteeing the proper functioning of a program is performing runtime checks as opposed to compile time checks. An example of this is performing a dynamic cast to another type. If this is allowed, you, the programmer, are essentially overriding the compiler's ability to know whether that's a safe thing to do. Some programming languages have decided this is acceptable, while others will at least warn you at compile time.
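A small Rust sketch of what a dynamic cast looks like when the language surfaces it honestly: behind dyn Any, the compiler no longer knows the concrete type, so the cast becomes a runtime check that can fail:

```rust
use std::any::Any;

fn main() {
    // The static type is erased behind `dyn Any`; the compiler can no
    // longer verify what the box actually holds.
    let value: Box<dyn Any> = Box::new(42i32);

    // Downcasting is therefore a runtime check that may fail, rather than
    // something the compiler can prove safe in advance.
    assert!(value.downcast_ref::<i32>().is_some());
    assert!(value.downcast_ref::<String>().is_none()); // a "bad cast", caught at runtime
    println!("ok");
}
```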

Another good example is allowing nulls to be part of the language, since null pointer exceptions can happen wherever nulls are allowed. Some languages have eliminated this problem entirely: a variable not explicitly declared as able to hold null cannot be declared without immediately being assigned a value (take Kotlin, for example). While you cannot eliminate the null pointer exception as a category of runtime error, you can prevent it from ever occurring by removing that dynamic aspect of the language. In Kotlin you can still allow a variable to hold null, of course, but it goes without saying that this is a metaphorical "buyer beware", as you have to explicitly declare it as such.
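Rust takes the same route even further: ordinary references can never be null, and "might be absent" must be opted into through the Option type. A minimal sketch (find_even is an invented example function):

```rust
// There is no null to dereference: absence must be declared in the type.
fn find_even(values: &[i32]) -> Option<&i32> {
    values.iter().find(|&&v| v % 2 == 0)
}

fn main() {
    let values = [1, 3, 4];
    // The compiler forces the None case to be handled before the value is
    // used; forgetting the check is a compile error, not a runtime exception.
    match find_even(&values) {
        Some(v) => println!("first even: {}", v),
        None => println!("no even value"),
    }
}
```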

Could you conceptually have a compiler which could check errors in every language? Yes, but it would likely be a clunky and highly unstable compiler, and you would necessarily have to tell it beforehand which language is being compiled. It also couldn't know many things about your program, any more than compilers for specific languages can, such as whether it halts, as you mentioned. As it turns out, a good many pieces of information that might be interesting to learn about a program are impossible to glean. This has been proven, so it isn't likely to change anytime soon.

Returning to your primary point: methods are not automatically thread safe in most languages, for a practical reason: enforcing thread safety at runtime makes methods slower even when threads are not being used. Rust decides it can eliminate these runtime problems by enforcing thread safety at compile time instead, through ownership and the Send and Sync traits, and that is its choice. It comes at a cost though: the restrictiveness of the type system.

It may be possible to mathematically prove the correctness of a program, but only with the caveat that the language has essentially no behavior left to determine at runtime. You would be able to read such a language and know what it does without any surprises. The language would probably look very mathematical in nature, and that is likely no coincidence. The second caveat is that runtime errors can still happen for reasons that have nothing to do with the program itself. The program can therefore only be proven correct under a set of assumptions about the computer it is being run on, assumptions which, of course, do get violated in practice, and often.

3

Type systems are automatically verifiable proofs of some aspects of correctness. For example, Rust's type system can prove that a reference does not outlive the referenced object, or that a referenced object is not modified by another thread.

But type systems are quite restricted:

  • They quickly run into decidability problems. In particular, the type system itself should be decidable, yet many practical type systems are accidentally Turing Complete (including C++ because of templates and Rust because of traits). Also, certain properties of the program they are verifying might be undecidable in the general case, most famously whether some program halts (or diverges).

  • Additionally, type systems should run quickly, ideally in linear time. Not all proofs that are possible should be featured in the type system. E.g. whole program analysis is usually avoided, and proofs are scoped to single modules or functions.

Because of these limitations, type systems tend to only verify fairly weak properties that are easy to prove, e.g. that a function is called with values of correct type. Yet even that substantially limits expressiveness, so it is common to have workarounds (like interface{} in Go, dynamic in C#, Object in Java, void* in C) or even to use languages that completely shun static typing.

The stronger the properties we verify, the less expressive the language typically becomes. If you've written Rust, you will know those "fighting with the compiler" moments where the compiler rejects seemingly correct code because it wasn't able to prove correctness. In some cases, it is not possible to express a certain program in Rust even when we believe we can prove its correctness. The unsafe mechanism in Rust or C# allows you to escape the confines of the type system. In some cases, deferring checks to runtime can be another option – but this means we cannot reject some invalid programs. This is a matter of definition: a Rust program that panics is safe as far as the type system is concerned, but not necessarily from the perspective of a programmer or user.

Languages are designed together with their type system. It is rare that a new type system is imposed on an existing language (but see e.g. MyPy, Flow, or TypeScript). The language will try to make it easy to write code that conforms to the type system, e.g. by offering type annotations or by introducing easy to prove control flow structures. Different languages may end up with different solutions. E.g. Java has the concept of final variables that are assigned exactly once, similar to Rust's non-mut variables:

final int x;
if (...) { ... }
else     { ... }
doSomethingWith(x);

Java has type system rules to determine whether all paths assign the variable or terminate the function before the variable could be accessed. In contrast, Rust simplifies this proof by not having declared-but-unassigned variables at all; instead, it lets you return values from control flow expressions:

let x = if ... { ... } else { ... };
do_something_with(x)

This looks like a really minor point when figuring out assignment, but clear scoping is extremely important for lifetime-related proofs.

If we were to apply a Rust-style type system to Java, we would have much bigger problems than that: Java objects are not annotated with lifetimes, so we would have to treat them as &'static SomeClass or Arc<dyn SomeClass>. That would weaken any resulting proofs. Java also has no type-level concept of immutability, so we cannot distinguish between & and &mut types. We would have to treat every object as a Cell or a Mutex, though even this might assume stronger guarantees than Java actually offers (changing a Java field is not thread-safe unless the field is volatile or access to it is synchronized). Finally, Rust has no concept of Java-style implementation inheritance.

TL;DR: type systems are theorem provers. But they are limited by decidability problems and performance concerns. You can't simply take one type system and apply it to a different language, as the target language's syntax might not provide the necessary information, and its semantics might be incompatible.

3

How safe is safe?

Yes, it is trivially possible to write such a verifier: your program just has to return the constant UNSAFE. You'll be right 99% of the time.

Because even if you run a safe Rust program, someone can still pull the plug during its execution: so your program might halt even though it is theoretically not supposed to.

And even if your server is running in a Faraday cage in a bunker, a neighbouring process might execute a rowhammer exploit and flip a bit in your supposedly safe Rust program's memory.

What I'm trying to say is that your software will run in a non-deterministic environment, and many outside factors might influence its execution.

Jokes aside: automated verification

There are already static code analyzers that can spot risky programming constructs (uninitialised variables, buffer overflows, etc.). These work by building a graph of your program and analyzing the propagation of constraints (types, value ranges, sequencing).
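A toy sketch of such constraint propagation, in Rust (the Range representation and the example "program" being analyzed are invented for illustration): track the possible interval of each variable and flag any division whose divisor interval contains zero:

```rust
// A toy interval analysis over a straight-line program: track the possible
// range of each variable and flag divisions whose divisor range contains 0.
#[derive(Clone, Copy, PartialEq)]
struct Range {
    lo: i64,
    hi: i64,
}

// Propagate the constraint through an addition: the result's bounds are
// the sums of the operands' bounds.
fn add(a: Range, b: Range) -> Range {
    Range { lo: a.lo + b.lo, hi: a.hi + b.hi }
}

// A division is risky if zero lies inside the divisor's possible range.
fn may_be_zero(r: Range) -> bool {
    r.lo <= 0 && 0 <= r.hi
}

fn main() {
    // x comes from user input, known only to lie in 0..=9.
    let x = Range { lo: 0, hi: 9 };

    // y = x + 1 lies in 1..=10: dividing by y is provably safe.
    let y = add(x, Range { lo: 1, hi: 1 });
    assert!(!may_be_zero(y));

    // z = x - 5 lies in -5..=4, which contains 0: flagged as risky.
    let z = add(x, Range { lo: -5, hi: -5 });
    assert!(may_be_zero(z));
    println!("analysis ok");
}
```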

This kind of analysis is by the way also performed by some compilers for the sake of optimization.

It is certainly possible to go a step further and also analyze concurrency, making deductions about constraint propagation across several threads, synchronization, and race conditions. However, you would very quickly run into the problem of combinatorial explosion of execution paths, along with many unknowns (I/O, OS scheduling, user input, behavior of external programs, interrupts, etc.) that dilute the known constraints to their bare minimum and make it very difficult to draw any useful automated conclusions about arbitrary code.

1

Turing addressed this back in 1936 with his paper on the halting problem. One consequence of his result (later generalized by Rice's theorem) is that, just as it is impossible to write an algorithm that can analyze code and correctly determine 100% of the time whether it will halt, it is impossible to write an algorithm that can correctly determine 100% of the time whether code has any particular non-trivial semantic property, including "safety" however you want to define it.

However, Turing's result does not preclude the possibility of a program that can, 100% of the time, either (1) absolutely determine that code is safe, (2) absolutely determine that code is unsafe, or (3) anthropomorphically throw up its hands and say, "Heck, I don't know." Rust's compiler, generally speaking, is in this category.

2
  • So as long as you have a “not sure” option, yes? Commented Apr 8, 2019 at 23:51
  • 1
    The takeaway is that it's always possible to write a program capable of confusing a program-analyzing program. Perfection is impossible. Practicality may be possible. Commented Apr 10, 2019 at 15:09
1

If a program is total (the technical name for a program that is guaranteed to halt), then it is theoretically possible to prove any arbitrary property of the program, given enough resources: you can simply explore every potential state the program could enter and check whether any of them violates your property. The TLA+ specification language, with its model checker, uses a variant of this approach, using set theory to check your properties against sets of potential program states rather than computing every individual state.

Technically, any program executed on practical physical hardware is either total or provably loops forever, due to the fact that you only have a limited amount of storage available, so there is only a finite number of states the computer can be in. (Any practical computer is actually a finite state machine, not Turing complete, but the state space is so large it's easier to pretend they are Turing complete.)

The problem with this approach is that its complexity is exponential in the amount of storage and the size of the program, rendering it impractical for anything other than the core of an algorithm, and impossible to apply to significant code bases in their entirety.

So the vast majority of research is focused on proofs. The Curry–Howard correspondence states that proofs and programs (and propositions and types) are one and the same thing, so most of the practical research goes under the name of type systems. Particularly relevant to this discussion are Coq and Idris, in addition to Rust, which you have already mentioned. Coq approaches the underlying engineering problem from the other direction: taking a proof of correctness written in the Coq language, it can extract executable code for the program proven. Idris, meanwhile, uses a dependent type system to prove properties of arbitrary code in a pure, Haskell-like language. What both these languages do is push the hard problem of constructing a workable proof onto the writer, allowing the type checker to focus on checking the proof. Checking a proof is a much simpler problem, but this does leave the languages much harder to work with.

Both these languages were specifically designed to make proofs easier, using purity to control which state is relevant to which parts of the program. In many mainstream languages, merely proving that a piece of state is irrelevant to a proof about part of the program can be a complex issue, due to the pervasiveness of side effects and mutable values.
