What is safe and unsafe really?

“You are allowed to do this, but you had better know what you are doing.”

- A Rustacean

When we talk about safety in programming languages, we are talking about a property that spans several levels. A language can be memory-safe, type-safe, or concurrency-safe. Memory safety means that a program doesn't write to a forbidden memory address and doesn't access invalid memory. Type safety means that the language doesn't allow you to, say, assign a number to a string variable, and that this check happens at compile time. Concurrency safety means that the program does not run into race conditions when multiple threads are executing and modifying shared state. If a language provides all of these levels of safety by itself, then it is said to be safe. To put it more generally, a program is deemed safe if, in all possible executions of the program and for all possible inputs, it gives correct outputs, does not crash, and does not clobber or corrupt its internal or external state. With Rust in safe mode, this holds for memory safety, type safety, and data races, although a safe Rust program can still panic or contain logic errors.

An unsafe program is one that violates an invariant at runtime or triggers undefined behavior. These unsafe effects may be local to a function, or may propagate later into the global state of the program. Some of them are inflicted by programmers themselves, such as logic errors, while some are due to side effects of the compiler implementation that's used, and some come from the language specification itself. Invariants are conditions that must always be true during the execution of the program, in all code paths. The simplest example would be that a pointer pointing to an object on the heap should never be null within a certain section of code. If that invariant breaks, code that depends on that pointer might dereference it and crash. Languages such as C and C++, and languages based on them, are unsafe because quite a few operations are categorized as undefined behavior in the language specification. Undefined behavior is what you get when a program hits a situation for which the language specification does not say what happens, so you have to assume that anything can happen. One example of undefined behavior is using an uninitialized variable. Consider the following C code:

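// both_true_false.c

#include <stdbool.h>
#include <stdio.h>

int
main(void) {
    bool var; /* deliberately left uninitialized: reading it is undefined behavior */
    if (var) {
        fputs("var is true! ", stdout);
    }
    if (!var) {
        fputs("var is false! ", stdout);
    }
    return 0;
}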

The output of this program is not the same with all C compiler implementations, because using an uninitialized variable is undefined behavior. On some C compilers with some optimizations enabled, you may even see both branches taken and get the following output:

var is true! var is false!

Having your code take unpredictable code paths like this is something you don't want to see happen in production. Another example of undefined behavior in C is writing past the end of an array of size n. When the write happens at index n, one element past the end, the program may either crash or modify an unrelated memory location. In the best case scenario, the program would crash immediately and you would get to know about this. In the worst case scenario, the program would continue running but may later corrupt other parts of its state and give faulty results. Undefined behavior exists in C in the first place to allow compilers to optimize code for performance: the compiler goes with the assumption that a certain corner case never happens and does not add error-checking code for it, avoiding the overhead associated with error handling. It would be great if undefined behavior could be converted to compile-time errors, but detecting some of these behaviors at compile time can be resource intensive, and not doing so keeps the compiler implementation simple.
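For contrast, here is a minimal sketch of how safe Rust treats the same out-of-bounds write. The helper name checked_write is hypothetical and only serves the illustration; get_mut is the standard slice method that returns None for an out-of-range index.

```rust
/// Writes `value` at `index` if the index is in bounds.
/// Returns `false` instead of touching memory when it is not.
fn checked_write(buf: &mut [i32], index: usize, value: i32) -> bool {
    match buf.get_mut(index) {
        Some(slot) => {
            *slot = value;
            true
        }
        None => false,
    }
}

fn main() {
    let mut buf = [0i32; 4];

    // Index 3 is the last valid slot of a 4-element array.
    let wrote_last = checked_write(&mut buf, 3, 7);
    // Index 4 is one past the end: the write is refused.
    let wrote_past_end = checked_write(&mut buf, 4, 9);

    println!("{} {} {:?}", wrote_last, wrote_past_end, buf);
}
```

Plain indexing such as buf[i] with an out-of-range i would panic at runtime rather than silently modify another memory location, so the failure is immediate and visible.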

Now, when Rust has to interact with these languages, it knows very little about how function calls and types are represented at lower levels in them. Because undefined behavior can occur in unexpected places, Rust sidesteps all of these gotchas and instead provides us with a special unsafe {} block for interacting with things that come from other languages. In unsafe mode, you get some extra abilities to perform operations that the compiler cannot verify, and that can trigger undefined behavior if used incorrectly, much as in C/C++. However, with great power comes great responsibility. A developer who uses unsafe in their code has to be careful about the operations that are performed within the unsafe block. With Rust in unsafe mode, the onus is on you. Rust places trust in the programmer to keep operations safe. Fortunately, this unsafe feature is provided in a very controlled manner and is easily identifiable by reading the code, because unsafe code is always annotated with the unsafe keyword or wrapped in unsafe {} blocks. This is unlike C, where most things are likely to be unsafe.
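As a rough illustration of what "the onus is on you" means in practice, the sketch below wraps a single unsafe operation in a safe function. The function first_or_none is a hypothetical example; get_unchecked is the standard slice method that skips the bounds check and is therefore unsafe to call.

```rust
/// Returns the first element of `items`, or `None` when the slice is empty.
fn first_or_none(items: &[u32]) -> Option<u32> {
    if items.is_empty() {
        return None;
    }
    // SAFETY: the slice was just checked to be non-empty, so index 0 is in
    // bounds. The compiler does not verify this claim; the programmer does.
    Some(unsafe { *items.get_unchecked(0) })
}

fn main() {
    println!("{:?}", first_or_none(&[5, 6, 7]));
    println!("{:?}", first_or_none(&[]));
}
```

The unsafe block is small and easy to find, and the comment next to it records why the operation is believed to be sound. If the emptiness check were removed, the code would still compile, but calling it with an empty slice would be undefined behavior.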

Now, it's important to mention that, while Rust offers to protect you from major unsafe situations in programs, there are also cases where Rust can't save you, even if the program you wrote is safe. The following examples mix logic errors, which Rust cannot catch, with errors that Rust does guard against:

  • A program uses floating-point numbers to represent currency. However, floating-point numbers are not precise and lead to rounding errors. This error is somewhat predictable (since, given the same input, it always manifests itself in the same way) and easy to fix. This is a logic and implementation error, and Rust offers no protection for such errors.
  • A program to control a spacecraft uses primitive numbers as parameters in functions to calculate distance metrics. However, a library may be providing an API where the distances are interpreted in the metric system, and the user might provide numbers in the imperial system, leading to invalid measurements. A similar error occurred in 1999 in NASA's Mars Climate Orbiter spacecraft and caused a loss of nearly $125 million. Rust won't fully protect you from such mistakes, although, with the help of type system abstractions such as enums and the newtype pattern, we can isolate different units from each other and restrict the API's surface to only valid operations, making this error much less likely.
  • A program writes to shared data from multiple threads without the appropriate locking mechanisms. The error manifests itself unpredictably, and finding it can be very difficult since it is non-deterministic. In this case, Rust fully protects you against data races with its ownership and borrowing rules, which are applicable to concurrent code too, but it cannot detect deadlocks for you.
  • A program accesses an object through a pointer, which, in some situations, is a null pointer, causing the program to crash. In safe mode, Rust fully protects you against null pointers. However, when using unsafe mode, the programmer has to make sure that operations with a pointer from other languages are safe.
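The newtype pattern mentioned in the spacecraft example can be sketched as follows. The types Meters and Feet and the function remaining_distance are hypothetical names chosen for this illustration, not part of any real library.

```rust
/// A distance in meters. The wrapped f64 cannot be mixed up with feet.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Meters(f64);

/// A distance in feet.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Feet(f64);

/// The only way to turn feet into meters is an explicit conversion.
impl From<Feet> for Meters {
    fn from(feet: Feet) -> Meters {
        Meters(feet.0 * 0.3048)
    }
}

/// The API accepts only metric distances.
fn remaining_distance(total: Meters, travelled: Meters) -> Meters {
    Meters(total.0 - travelled.0)
}

fn main() {
    let total = Meters(1000.0);
    let travelled = Feet(1000.0);

    // remaining_distance(total, travelled); // rejected: expected Meters, found Feet
    let left = remaining_distance(total, travelled.into());
    println!("{:?}", left);
}
```

Passing a Feet value where Meters is expected becomes a compile-time type error, so the unit mismatch has to be resolved explicitly at the call site. The conversion factor itself is still the programmer's responsibility.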

The unsafe feature of Rust is also needed for situations where the programmer knows better than the compiler and has to implement some of the tricky parts of their code, where the compile-time ownership rules become too restrictive and get in the way. For instance, let's say you need to convert a sequence of bytes into a String value and you know that your Vec<u8> is a valid UTF-8 sequence. In this case, you can directly use the unsafe String::from_utf8_unchecked method instead of the usual safe String::from_utf8 method, bypassing the UTF-8 validity check in from_utf8 and gaining a bit of speed. Also, when doing low-level embedded system development, or writing any program that interfaces with the operating system kernel, you need to switch to unsafe mode. However, not everything requires unsafe mode; there are only a few select operations that the Rust compiler sees as unsafe. They are as follows:

  • Reading or updating a mutable static variable
  • Dereferencing raw pointers, such as *const T and *mut T
  • Calling an unsafe function
  • Reading values from a union type
  • Invoking a declared function in extern blocks – items from other languages
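The first four operations in this list can be sketched in one small program. The names COUNTER, IntOrFloat, and read_through are hypothetical, and the call to String::from_utf8_unchecked stands in for the byte-to-String conversion described earlier. The extern block case is omitted to keep the sketch self-contained.

```rust
// A mutable static: every read or write of it needs `unsafe`.
static mut COUNTER: u32 = 0;

// A union: reading a field needs `unsafe`, because the compiler
// cannot know which field was written last.
union IntOrFloat {
    int: u32,
    float: f32,
}

/// Hypothetical unsafe function.
/// The caller must pass a pointer that is valid for reads.
unsafe fn read_through(ptr: *const u32) -> u32 {
    unsafe { *ptr }
}

fn main() {
    // 1. Reading and updating a mutable static variable.
    unsafe { COUNTER += 1 };
    let count = unsafe { COUNTER };

    // 2. Dereferencing a raw pointer. Creating the pointer is safe.
    let value = 42u32;
    let ptr = &value as *const u32;
    let through_ptr = unsafe { *ptr };

    // 3. Calling unsafe functions: our own and one from the standard library.
    let through_fn = unsafe { read_through(ptr) };
    let bytes = vec![b'o', b'k']; // known to be valid UTF-8
    let text = unsafe { String::from_utf8_unchecked(bytes) };

    // 4. Reading a value from a union: the bits of 1.0f32 viewed as a u32.
    let u = IntOrFloat { float: 1.0 };
    let bits = unsafe { u.int };

    println!("{} {} {} {} {:#x}", count, through_ptr, through_fn, text, bits);
}
```

In each case the unsafe block marks the exact expression whose soundness the compiler does not check, while everything around it is still ordinary checked Rust.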

Some of the memory-safety guarantees are relaxed in the aforementioned situations, but the borrow checker is still active during these operations, and all the scoping and ownership rules still apply. The Rust reference about unsafety at https://doc.rust-lang.org/stable/reference/unsafety.html distinguishes between behavior that is considered undefined and behavior that is not considered unsafe. To make the aforementioned operations easy to spot, Rust requires you to use the unsafe keyword when performing them. It allows only a handful of places to be marked as unsafe, such as the following:

  • Functions and methods
  • Unsafe block expressions, such as unsafe {}
  • Traits
  • Implementation blocks
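A minimal sketch that uses the unsafe keyword in each of these four positions might look like the following. The trait ZeroValid and the functions zeroed_unchecked and zeroed are hypothetical names made up for this illustration; std::mem::zeroed is the standard library function they build on.

```rust
/// Unsafe trait: implementors promise that an all-zero bit pattern
/// is a valid value of the type. The compiler cannot check this promise.
unsafe trait ZeroValid: Sized {}

// Unsafe implementation blocks: the author vouches for each type.
unsafe impl ZeroValid for u32 {}
unsafe impl ZeroValid for [u8; 4] {}

/// Unsafe function: the caller must ensure that all-zero bytes
/// form a valid `T`.
unsafe fn zeroed_unchecked<T>() -> T {
    unsafe { std::mem::zeroed() }
}

/// Safe wrapper that relies on the trait's promise.
fn zeroed<T: ZeroValid>() -> T {
    // Unsafe block expression.
    // SAFETY: `T: ZeroValid` states that all-zero bytes are a valid `T`.
    unsafe { zeroed_unchecked::<T>() }
}

fn main() {
    let n: u32 = zeroed();
    let bytes: [u8; 4] = zeroed();
    println!("{} {:?}", n, bytes);
}
```

The unsafe trait and its unsafe impl blocks move the obligation from every caller to the person implementing the trait, which lets the zeroed function itself stay safe to call.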