Hands-On Data Structures and Algorithms with Rust

4.5 (2 reviews total)
By Claus Matzinger
  • Instant online access to over 7,500+ books and videos
  • Constantly updated with 100+ new titles each month
  • Breadth and depth in over 1,000+ technologies

About this book

Rust has come a long way and is now utilized in several contexts. Its key strengths are its software infrastructure and resource-constrained applications, including desktop applications, servers, and performance-critical applications, not forgetting its importance in systems' programming. This book will be your guide as it takes you through implementing classic data structures and algorithms in Rust, helping you to get up and running as a confident Rust programmer.

The book begins with an introduction to Rust data structures and algorithms, while also covering essential language constructs. You will learn how to store data using linked lists, arrays, stacks, and queues. You will also learn how to implement sorting and searching algorithms. You will learn how to attain high performance by implementing algorithms to string data types and implement hash structures in algorithm design. The book will examine algorithm analysis, including Brute Force algorithms, Greedy algorithms, Divide and Conquer algorithms, Dynamic Programming, and Backtracking.

By the end of the book, you will have learned how to build components that are easy to understand, debug, and use in different applications.

Publication date:
January 2019
Publisher
Packt
Pages
316
ISBN
9781788995528

 

Chapter 1. Hello Rust!

First, thank you for picking up a copy of this book! Many of you will only have talked about the topic of algorithms and data structures back in university. In fact, regardless of whether this is your first endeavor in programming or not, we worked hard to make this book a great learning experience. Our primary focus will be the unique influence of Rust on algorithm and data structure design, so we would like to start with a recap of important fundamentals.

Starting off with the Rust 2018 edition changes, we will cover how borrowing and ownership, mutability, and concurrency influence how and where data can be held, and what algorithms can be executed. In this chapter, you can look forward to learning about the following:

  • A quick refresh on Rust and what awaits in the 2018 edition (Rust 1.31)
  • The latest and greatest about borrowing and ownership
  • How we can leverage concurrency and mutability properly
  • References (not pointers!) to where Rust lives
 

Rust in 2018


How old is Rust? It started off in 2006 as a side project of Graydon Hoare, an engineer at Mozilla, and was later (in 2009) adopted by the company. Fast forward to less than a decade later to May 15, 2015, and the Rust team announced a stable version 1.0!

During its journey, there have been many features that have been added and removed again (for example, a garbage collector, classes, and interfaces) to help it become the fast and safe language that it is today.

Before getting deeper into borrowing and ownership, mutability, concurrency, safety, and so on in Rust, we would like to recap some major concepts in Rust and why they change architectural patterns significantly.

The 2018 edition

Rust in the 2015 edition is essentially the 1.0 version with a few non-breaking additions. Between 2015 and 2018, however, features and Requests for Comments (RFCs), Rust's way of changing core features with the community, accumulated, and worries about backward compatibility arose.

With the goal of keeping this compatibility, editions were introduced and, with the first additional edition, many major changes made it into the language:

  • Changes to the module path system
  • dyn Trait and impl Trait syntax
  • async/await syntax
  • Simplifications to the lifetime syntax

With these additions, Rust will introduce asynchronous programming into its syntax (async/await keywords) and improve the language's usability. This book uses the Rust 2018, released on December 6, 2018 (https://blog.rust-lang.org/2018/12/06/Rust-1.31-and-rust-2018.html) edition by default, so all the following snippets will already include these new language features!

The Rust language

Many of the established programming languages today are multi-paradigm languages, but still remain focused on the principles of object orientation. This means that they have classes, methods, interfaces, inheritance, and so on, none of which can be found in Rust, giving it a steep learning curve for many established developers.

Note

More experienced readers will miss many aspects of what makes Rust an excellent language, such as static versus dynamic method invocation, memory layouts, and so on. I recognize the importance of those things, yet for brevity and focus chose to leave it to you to explore these things further. Check the Further reading section for resources.

As a multi-paradigm language, Rust has many functional concepts and paradigms that guide it, but they make traditional object-oriented patterns more difficult to apply. Other than organizing code without classes and interfaces, there are various methods to handle errors, change the code itself, or even work with raw pointers.

In the following sections, we want to explore a few concepts that make Rust unique and have a major influence on the way we develop algorithms and data structures.

Objects and behavior

Organizing code in Rust is a bit different from regular object-oriented languages such as C#. There, an object is supposed to change its own state, interfaces are simple contract definitions, and specialization is often modeled using class inheritance:

class Door {
    private bool is_open = false;

    public void Open() {
        this.is_open = true;
    }
}

With Rust, this pattern would require constant mutability of any Door instance (thereby requiring explicit locking for thread safety), and without inheritance GlassDoor would have to duplicate code, making it harder to maintain.

Instead, it's recommended to create traits to implement (shared) behavior. Traits have a lot in common with abstract classes in traditional languages (such as default implementations of methods/functions), yet any struct in Rust can (and should) implement several of those traits:

struct Door {
    is_open: bool
}

impl Door {
    fn new(is_open: bool) -> Door {
        Door { is_open: is_open }
    }
}

trait Openable {
    fn open(&mut self);
}

impl Openable for Door {
    fn open(&mut self) {
        self.is_open = true;
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn open_door() {
        let mut door = Door::new(false);
        door.open();
        assert!(door.is_open);
    }
}

This pattern is very common in the standard library, and often third-party libraries will even add behavior to existing types by implementing traits in their code (also known as extension traits).

Other than a typical class, where data fields and methods are in a single construct, Rust emphasizes the separation between those by declaring a struct for data and an impl part for the methods/functions. Traits name and encapsulate behaviors so they can easily be imported, shared, and reused.

Going wrong

Other than classes, Rust comes without another well-known companion: null. In the absence of pointers and with a very different memory management model, there is no typical null pointer/reference.

Instead, the language works with Option and Result types that let developers model success or failure. In fact, there is no exception system either, so any failed execution of a function should be indicated in the return type. Only in rare cases when immediate termination is required does the language provide a macro for panicking: panic!().

Option<T> and Result<T, E> both encapsulate one (Option<T>) or two (Result<T, E>) values that can be returned to communicate an error or whether something was found or not. For example, a find()functioncould returnOption<T>, whereas something likeread_file()would typically have aResult<T, E>return type to communicate the content or errors:

fn find(needle: u16, haystack: Vec<u16>) -> Option<usize> {
    // find the needle in the haystack
}

fn read_file(path: &str) -> Result<String, io::Error> {
    // open the path as a file and read it
}

Handling those return values is often done with match or if let clauses in order to handle the cases of success or failure:

match find(2, vec![1,3,4,5]) {
    Some(_) => println!("Found!"),
    None => println!("Not found :(")
}

// another way
if let Some(result) = find(2, vec![1,2,3,4]) {
    println!("Found!")
}

// similarly for results!
match read_file("/tmp/not/a/file") {
    Ok(content) => println!(content),
    Err(error) => println!("Oh no!")
}

This is due to Option<T> and Result<T, E> both being enumerations that have generic type parameters; they can assume any type in their variants. Matching on their variants provides access to their inner values and types to allow a branch of the code to be executed and handle the case accordingly. Not only does this eliminate the need for constructs such as try/catch with multiple—sometimes cast—exception arms, it makes failure part of the normal workflow that needs to be taken care of.

Macros

Another aspect of Rust is the ability to do metaprogramming—basically programming programming—using macros! Macros are expanded in Rust code before compilation, which gives them more power than a regular function. The generated code can, for instance, create functions on the fly or implement traits for a structure.

These pieces of code make everyday life a lot easier by reducing the need to create and then initialize vectors, deriving the ability to clone a structure, or simply printing stuff to the command line.

This is a simplified example for the declarative vec![] macro provided in the Rust Book (second edition, Appendix D):

#[macro_export]
macro_rules! vec {
     ( $( $x:expr ),* ) => {
         {             
letmut temp_vec = Vec::new();              
             $( temp_vec.push($x); )*      
            temp_vec
         }     
    }; 
}

Declarative macros work on patterns and run code if that pattern matches; the previous example matches 0 - n expressions (for example, a number, or a function that returns a number) and inserts temp_vec.push(...)n times, iterating over the provided expressions as a parameter.

The second type, procedural macros, operate differently and are often used to provide a default trait implementation. In many code bases, the #[derive(Clone, Debug)] statement can be found on top of structures to implement the Clone and Debug traits automatically.

Later in this chapter, we are going to use a structure, FileName, to illustrate reference counting, but for printing it to the command line using the debug literal "{:?}", we need to derive Debug, which recursively prints all members to the command line:

#[derive(Debug)]
structFileName {
    name: Rc<String>,
    ext: Rc<String>
}

The Rust standard library provides several macros already, and by creating custom macros, you can minimize the boilerplate code you have to write.

Unsafe

Rust's code is "safe" because the compiler checks and enforces certain behavior when it comes to memory access and management. However, sometimes these rules have to be forgone, making the code unsafe. unsafe is a keyword in Rust and declares a section of code that can do most of the things the C programming language would let you do. For example, it lets the user do the following (from the Rust Book, chapter 19.1):

  • Dereference a raw pointer
  • Call an unsafe function or method
  • Access or modify a mutable static variable
  • Implement an unsafe trait

These four abilities can be used for things such as very low-level device access, language interoperability (the compiler can't know what native libraries do with their memory), and so on. In most cases, and certainly in this book, unsafe is not required. In fact, the Rustonomicon (https://doc.rust-lang.org/nomicon/what-unsafe-does.html) defines a list of issues the language is trying to prevent from happening by providing the safe part:

  • Dereferencing null, dangling, or unaligned pointers.
  • Reading uninitialized memory.
  • Breaking the pointer aliasing rules.
  • Producing invalid primitive values:
    • Dangling/null references
    • Null fn pointers
    • A bool that isn't 0 or 1
    • An undefined enum discriminant
    • A char outside the ranges [0x0, 0xD7FF] and [0xE000, 0x10FFFF]
    • A non-UTF8 string
  • Unwinding into another language.
  • Causing a data race.

The fact that these potential issues are prevented in safe Rust certainly makes the life of a developer easier, especially when designing algorithms or data structures. As a consequence, this book will always work with safe Rust.

 

Borrowing and ownership


Rust is famous for its memory management model, which replaces runtime garbage collection with compile-time checks for memory safety. The reason why Rust can work without a garbage collector and still free the programmer from error-prone memory management is simple (but not easy): borrowing and ownership.

While the particulars are quite complex, the high-level view is that the compiler inserts any "provide x amounts of memory" and "remove x amounts of memory" (somewhat like malloc() and free() for C programmers) statements for the developer. Yet how can it do that?

Note

The rules of ownership are as follows:

  • The owner of a value is a variable
  • At any time, only a single owner is allowed
  • The value is lost once the owner goes out of scope

This is where Rust's declarative syntax comes into play. By declaring a variable, the compiler knows—at compile time—that a certain amount of memory needs to be reserved. The lifetime is clearly defined too, from the beginning to end of a block or function, or as long as the struct instance lives. If the size of this variable is known at compile time, the compiler can provide exactly the necessary amount of memory to the function for the time required. To illustrate, let's consider this snippet, where two variables are allocated and removed in a deterministic order:

fn my_func() {
    // the compiler allocates memory for x 
    let x = LargeObject::new(); 
    x.do_some_computation();
    // allocate memory for y
    let y = call_another_func();
    if y > 10 {
        do_more_things();
    }
} // deallocate (drop) x, y

Is this not what every other compiler does? The answer is yes—and no. At compile time, the "provide x amounts of memory" part is fairly simple; the tricky part is keeping track of how much is still in use when references can be passed around freely. If, during the course of a function, a particular local reference becomes invalid, a static code analysis will tell the compiler about the lifetime of the value behind the reference. However, what if a thread changes that value at an unknown time during the function's execution?

At compile time, this is impossible to know, which is why many languages do these checks at runtime using a garbage collector. Rust forgoes this, with two primary strategies:

  • Every variable is owned by exactly one scope at any time
  • Therefore, the developer is forced to pass ownership as required

Especially when working with scopes, the nature of stack variables comes in handy. There are two areas of memory, stack and heap, and, similar to other languages, the developer uses types to decide whether to allocate heap (Box, Rc, and so on) or stack memory.

Stack memory is usually short-lived and smaller, and operates in a first-in, last-out manner. Consequently, a variable's size has to be known before it is put on the stack:

Heap memory is different; it's a large portion of the memory, which makes it easy to allocate more whenever needed. There is no ordering, and memory is accessed by using an addresses. Since the pointer to an address on the heap has a known size at compile time, it fits nicely on the stack:

Stack variables are typically passed by value in other languages, which means that the entire value is copied and placed into the stack frame of the function. Rust does the same, but it also invalidates further use of that variable in the—now parent—scope. Ownership moves into the new scope and can only be transferred back as a return value. When trying to compile this snippet, the compiler will complain:

fn my_function() {
    let x = 10;
    do_something(x); // ownership is moved here
    let y = x;       // x is now invalid!
}

Borrowing is similar but, instead of copying the entire value, a reference to the original value is moved into the new scope. Just like in real life, the value continues to be owned by the original scope; scopes with a reference are just allowed to use it as it was provided. Of course, this comes with drawbacks for mutability, and some functions will require ownership for technical and semantic reasons, but it also has advantages such as a smaller memory footprint.

Note

These are the rules of borrowing:

  • Owners can have immutable or mutable references, but not both
  • There can be multiple immutable references, but only one mutable reference
  • References cannot be invalid

By changing the previous snippet to borrow the variable to do_something() (assuming this is allowed, of course), the compiler will be happy:

fn my_function() {
    let x = 10;
    do_something(&x); // pass a reference to x
    let y = x;        // x is still valid!
}

Borrowed variables rely heavily on lifetimes. The most basic lifetime is the scope it was created in. However, if a reference should go into a struct field, how can the compiler know that the underlying value has not been invalidated? The answer is explicit lifetimes!

Exceptional lifetimes

Some lifetimes are different and Rust denominates them with a '. While this could be the predefined 'static, it's equally possible to create your own, something that is often required when working with structures.

This makes sense when thinking about the underlying memory structure: if an input parameter is passed into the function and returned at the end, its lifetime surpasses the function's. While the function owns this part of the memory during its lifetime, it cannot borrow a variable for longer than it actually exists. So, this snippet cannot work:

fn another_function(mut passing_through: MyStruct) -> MyStruct {
    let x = vec![1, 2, 3];

    // passing_through cannot hold a reference 
    // to a shorter lived x!
    // the compiler will complain.
    passing_through.x = &x;

    return passing_through;
} // x's life ends here

The reason is that the passing_throughvariableoutlivesx. There are several solutions to this problem:

  • Change the type definition of MyStruct to require ownership. This way, the structure now owns the variable and it will live as long as the structure:
fn another_function(mut passing_through: MyStruct) -> MyStruct {
    let x = vec![1, 2, 3];

    // passing_through owns x and it will be
    // dropped together with passing_through.
    passing_through.x = x;

    return passing_through;
}
  • Clone x to pass ownership into passing_through:
fn another_function(mut passing_through: MyStruct) -> MyStruct {
    let x = vec![1, 2, 3];
    let y = &x;

    // passing_through owns a deep copy of x'value that is be
    // dropped together with passing_through.
    passing_through.x = y.clone();

    return passing_through;
}
  • In this case, vec![] is statically defined, so it could make sense to add it as a function parameter. This is not only more allocation-efficient, but also can enforce an appropriate lifetime:
fn another_function<'a>(mut passing_through: MyStruct<'a>, x: &'a Vec<u32>) -> MyStruct<'a> {

    // The compiler knows and expects a lifetime that is 
    // at least as long as the struct's 
    // of any reference passed in as x. 
    passing_through.x = x;

    return passing_through;
}

Lifetimes cause a lot of strange errors for many Rust users, and in the 2018 edition there is one less to worry about. With the introduction of non-lexical lifetimes, the borrow checker got a lot smarter and it is now able to check—up to a certain degree—semantically whether the variable was used. Recall from the rules of borrowing that, if a mutable reference is created, no immutable references can exist.

This code did not compile before Rust 1.31:

fn main() {     
    let mut a = 42;     
    let b = &a; // borrow a     
    let c = &mut a; // borrow a again, mutably         
    // ... but don't ever use b
}

Now it will compile since the compiler does not just check the beginning and ending of a scope, but also if the reference was used at all.

Multiple owners

As powerful as single ownership is, it does not work for every use case. Large objects or shared objects that other instances need to own are examples where immutable ownership makes life easier. Consider a function that requires an owned object to be passed in:

#[derive(Debug)]
struct FileName {
    name: String,
    ext: String 
}

fn no_ref_counter() {
    let name =String::from("main");
    let ext = String::from("rs");

for _ in0..3 {
        println!("{;?}", FileName {
                    name: name, 
                    ext: ext 
        });
}
}

When trying to compile no_ref_counter(), the compiler creates a scope for each iteration of the loop and owns any value that is used within it. This works exactly once, since afterward, the variable has been moved and is inaccessible for subsequent iterations.

Consequently, these values (in this case, name and ext) are gone and compilation will yield two errors, one for each "second" move of a string:

error[E0382]: use of moved value: `name`
 --> src/main.rs:63:33
 |
63 | let _ = FileName { name: name, ext: ext };
 |                            ^^^^ value moved here in previous iteration of loop
 |
 = note: move occurs because `name` has type `std::string::String`, which does not implement the `Copy` trait

error[E0382]: use of moved value: `ext`
 --> src/main.rs:63:44
 |
63 | let _ = FileName { name: name, ext: ext };
 |                                       ^^^ value moved here in previous iteration of loop
 |
 = note: move occurs because `ext` has type `std::string::String`, which does not implement the `Copy` trait

One solution is to clone the object in every iteration, but that causes a lot of slow memory allocations. For this, the Rust standard library provides a solution: reference counting.

A reference counter (std::rc::Rc<T>) encapsulates a variable of type T allocated on the heap and returns an immutable reference when created. This reference can be cloned with low overhead (it's only a reference count that is incremented) but never transformed into a mutable reference. Regardless, it acts just like owned data, passing through function calls and property lookups.

While this requires a change to the variable types, a call to clone() is now far cheaper than cloning the data directly:

use std::rc::Rc;

#[derive(Debug)]
struct FileName {
    name: Rc<String>,
    ext: Rc<String> 
}

fnref_counter() {
    let name = Rc::new(String::from("main"));
    let ext = Rc::new(String::from("rs")));

for _ in0..3 {
        println!("{;?}", FileName {
                    name: name.clone(), 
                    ext: ext.clone() 
        });
}
}

Running this snippet prints the debug version of the FileName object three times:

FileName { name: "main", ext: "rs" }
FileName { name: "main", ext: "rs" }
FileName { name: "main", ext: "rs" }

This approach works great for single-threaded and immutable scenarios, but will refuse to compile multithreaded code. The solution to this will be discussed in the next section.

 

Concurrency and mutability


Rust's approach to managing memory is a powerful concept. In fact, it is powerful enough to also facilitate concurrency and parallel execution. However, first things first: how do threads work in the Rust standard library?

Note

Concurrency and parallelism are two different modes of execution. While concurrency means that parts of a program run independently of each other, parallelism refers to these parts executing at the same time. For simplicity, we will refer to both concepts as concurrency.

Due to its low-level nature, Rust provides an API to the operating system's threading capabilities (for example, POSIX on Linux/Unix systems). If no variables are passed into the scope, their usage is very straightforward:

use std::thread; 

fn threading() { 
    // The to pipes (||) is the space where parameters go,
    // akin to a function signature's parameters, without
    // the need to always declare types explicitly. 
    // This way, variables can move from the outer into the inner scope
let handle = thread::spawn(|| { 
println!("Hello from a thread");
    }); 
    handle.join().unwrap(); 
}

However, when passing data back and forth, more work has to be done to hold up Rust's safety guarantees, especially when mutability comes into play. Before getting into that, it is important to recap immutability.

Immutable variables

Rust—like many functional languages—embraces immutable variables. They are the default, and changing mutability requires explicit declaration with mut, which tells the compiler what the variable is going to be used for (reading or writing).

Functional programming languages are known for facilitating the ability to work concurrently, thanks to immutability guarantees; reading data does not produce side effects! Requiring explicit mutability gives the compiler a chance to check where and if mutability is required, and therefore whether a data race may occur.

This results in compile-time warnings and errors instead of crashes and strange race conditions at runtime, something that many production users appreciate. In short, it's easier to think through your code if mutability is a (rare) option instead of the norm.

Shadowing

Instead of changing variable properties, it's often more readable to overwrite a variable with a different value (for example, a changed copy of the original). This technique is called shadowing.

Typically, this is used to reuse a variable name, even though the actual value has changed, to work in the current situation. This snippet sanitizes String and, by using the same name throughout the function, it's always clear that it's the input parameter that is changed:

fn sanitize(s: String) -> String {
    let s = s.trim();
    let s = s.replace(" ", "_");
    s
} 

While this is akin to changing the value of a variable, shadowing does not replace mutability, especially when it's less costly to actually change properties of that variable; Rust has a specific design pattern for that!

Interior mutability

Can a variable be immutable and mutable at the same time? Of course. Boxed variables (Box, Rc, and so on) are an immutable reference to the heap and they contain the actual value.

For these kinds of containers, there is no reason why the inner variable cannot be changed—a task that can be done safely in Rust using RefCell. RefCell maintains single ownership of a value but allows mutable borrowing checked at runtime. Instead of compiler errors, violating the rules of borrowing will lead to a runtime panic!, crashing the program.

This entire concept is called interior mutability and is often used in combination with Rc in order to provide a value to multiple owners with mutability at will. Clearly, to provide a great user experience, it is strongly recommended to make sure the borrowing rules can't be violated in other ways.

Wrapping a RefCell in an Rc acts as the gatekeeper for having multiple owners, including a way to change the contents. This is actually similar to more traditional programming languages such as Java or C#, where typically references are moved between method calls, pointing to the object's instance on the heap memory.

This pattern is very important for implementing complex programs and data structures, since ownership of a specific variable is not always clear. For example, later in the book we will examine doubly linked lists, which famously have a pointer to the preceding and succeeding node. Which node should have ownership of which pointer? Interior mutability allows us to say both. Consider the node declaration we will uselater:

use std::cell::RefCell;
use std::rc::Rc;

#[derive(Clone)]
struct Node {
    value: String,
    next: Link,
    prev: Link,
}

type Link = Option<Rc<RefCell<Node>>>;

With this list declaration, we can see the pattern in this simpler version of the append function:

pub fn append(&mut self, value: String) {
    let new = Rc::new(RefCell::new(Node::new(value)));
    match self.tail.take() {
        Some(old) => {
            old.borrow_mut().next = Some(new.clone());
            new.borrow_mut().prev = Some(old);
        }
        None => self.head = Some(new.clone()),
    };
}

This code adds a new node at the front (head) of the list, which contains all data in the form of nodes stored on the heap. In order to add a node at the head of the list, the references have to be set properly, so the previous and next pointers actually refer to the same nodes instead of copies. A more detailed exploration is going to be covered in Chapter 3, Lists, Lists, and More Lists. For now, the important part is setting the variables using borrow_mut(). This mutable reference only lives as long as the assignment takes, thereby ruling out creating a too-large scope and violating the borrowing rules.

By using the RefCell function's borrow_mut(), it will check for and enforce borrowing rules and panic in the case of a violation. Later on, we will also talk about the Mutex type, which is essentially a multithreaded version of these cells.

Moving data

The introductory snippet showed code that spawns a thread but did not pass any data into the scope. Just like any other scope, it requires either ownership of a value or at least a borrowed reference in order to work with that data. In this case, passing ownership is what we want, something that is called moving data into the scope.

If we change the snippet from the introduction to include a simple variable to print from within the thread, compilation is going to fail:

use std::thread; 

fn threading() { 
    let x = 10;
let handle = thread::spawn(|| { 
println!("Hello from a thread, the number is {}", x);
    }); 
    handle.join().unwrap(); 
}

The reason for this is simple: the compiler cannot determine the lifetimes of each of the scopes (will x still be there when the thread needs it?), so it refuses to compile the code:

Compiling ch1 v0.1.0 (file:///code/ch1) 
error[E0373]: closure may outlive the current function, but it borrows `x`, which is owned by the current function 
 --> src/main.rs:5:32 
  | 
5 |     let handle = thread::spawn(|| {  
  |                                ^^ may outlive borrowed value `x` 
6 |         println!("Hello from a thread, the number is {}", x); 
  |                                                           - `x` is borrowed here 
help: to force the closure to take ownership of `x` (and any other referenced variables), use the `move` keyword 
  | 
5 |     let handle = thread::spawn(move || { 
  |                                ^^^^^^^

As the compiler messages indicate, adding the move keyword will solve the issue! This keyword lets a thread pass ownership to a different thread; it "moves" the memory area:

fn threading() { 
    let x = 10;
    let handle = thread::spawn(move || {
        println!("Hello from a thread, the number is {}", x);
    }); 
    handle.join().unwrap(); 
} 

When running this snippet, the output is as follows:

Hello from a thread, the number is 10

However, for passing multiple messages into a thread or implementing an actor model, the Rust standard library offers channels. Channels are single-consumer, multi-producer queues that let the caller send messages from multiple threads.

This snippet will spawn 10 threads and have each send a number into the channel, where it will be collected into a vector after the senders have finished executing:

use std::sync::mpsc::{channel, Sender, Receiver};

fn channels() {
    const N: i32 = 10;
    let (tx, rx): (Sender<i32>, Receiver<i32>) = channel();
    let handles = (0..N).map(|i| {
        let _tx = tx.clone();
        thread::spawn(move || { 
            // don't use the result
            let _ = _tx.send(i).unwrap(); 
        })
    });
    // close all threads
    for h in handles {
        h.join().unwrap();
    }
    // receive N times
    let numbers: Vec<i32> = (0..N).map(|_|
            rx.recv().unwrap()
        ).collect();

    println!("{:?}", numbers);
}

As expected, the output is as follows:

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

With these tools, a multithreaded application can move data between threads without the need for manual locking or the dangers of inadvertently creating side effects.

Sharing data

Other than sending data into threadsone way, many programs operate on a shared state where multiple execution streams have to access and change one or more shared variables. Typically, this warrants a mutex (short for mutual exclusion), so that any time something is accessed within this locked mutex, it is guaranteed to be a single thread.

This is an old concept and implemented in the Rust standard library. How does that facilitate accessing a variable? Wrapping a variable into a Mutex type will provide for the locking mechanism, thereby making it accessible from multiple concurrent writers. However, they don't have ownership of that memory area yet.

In order to provide that ownership across threads—similar to what Rc does within a single thread—Rust provides the concept of an Arc, an atomic reference counter. Using this Mutex on top, it's the thread-safe equivalent of an Rc wrapping a RefCell, a reference counter that wraps a mutable container. To provide an example, this works nicely:

use std::thread;
use std::sync::{Mutex, Arc};

fn shared_state() {
    let v = Arc::new(Mutex::new(vec![]));
    let handles = (0..10).map(|i| {
        let numbers = Arc::clone(&v);
        thread::spawn(move || {
            let mut vector = numbers
                .lock()
                .unwrap();
            (*vector).push(i);
        })
    });

    for handle in handles {
        handle.join().unwrap();
    }
    println!("{:?}", *v.lock().unwrap());
}

When running this example, the output is this:

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

While the preferred way of doing concurrent programming is still to use immutable variables as often as possible, safe Rust provides the tools for working with shared data without side effects.

Send and Sync

These marker traits are fundamental to Rust's multithreading policies. They have distinct purposes:

  • Send: A data type is safe to send (move) from one thread to the other
  • Sync: The data type can be shared across threads without manual locks or mutex areas

These marker traits are implemented in all basic types of the standard library and can be inherited for custom types (if all properties of a type are Sync, then the type itself is Sync too).

Implementing Sync or Send is unsafe because there is no way for the compiler to know if you are right and the code can be shared/sent between threads, which is why it's very unusual to do this.

In case your program requires this depth of Rust programming, be sure to read up on this topic in the Rust Book, chapter 16 (https://doc.rust-lang.org/1.31.0/book/ch16-04-extensible-concurrency-sync-and-send.html).

 

Deeper into Rust


Another one of Rust's strong points is its thriving community. Many users actively participate in their local community (by going to or organizing meetups) or online via working groups, the IRC/Discord channel, or the official forum. The most important online resources are as follows:

Other than that, users have created additional content, such as podcasts, blogs, and various tools and libraries. The most impressive user contributions, however, can be found in the core language!

Rust's official GitHub repository at https://github.com/rust-lang holds the source code for many of the resources (for example, the website, blog, book, and documentation), and contributions are very welcome.

Mozilla has an impressive record of creating and fostering open source communities, and Rust is no different. As active members of these communities, we encourage everyone to take part and help make Rust the most enjoyable and useful language around!

Requests for Comments (RFCs)

Due to the open source nature of Rust, there are some governance rules in place to maintain stable and flexible interfaces, yet encourage change and discussion as the language evolves.

For something as sensitive as a programming language and its standard library, a more rigid process than the regular pull request approval is required to have deeper discussions. Imagine the impact of changing a single keyword and how many projects would stop working immediately!

This is where RFCs come in. They provide a way for all stakeholders to contribute to the discussion with an equal chance to comment. A typical workflow for integrating change in open source projects uses the fork and pull method where the contributor creates a pull request (PR) to propose changes (https://help.github.com/articles/about-pull-requests/). Unlike in the RFC process, this gets hard to manage in larger code bases and only starts the discussion after a solution has been proposed, narrowing the focus considerably.

A repository of active and past RFCs can be found here: https://github.com/rust-lang/rfcs.

 

Summary


Rust is a multi-paradigm language with exceptional concepts: the language emphasizes data and behavior separation with structures and traits, uses macros for metaprogramming, and leverages explicit ownership of memory to determine variable lifetimes. Knowing these lifetimes removes the need for runtime garbage collection and, at the same time, greatly facilitates concurrency by allowing mutable borrowing only in certain circumstances.

Consequently, threads and other asynchronous processes can change variables only when they have mutable ownership of them, something that is mostly enforced at compile time, but can also be done at runtime! Therefore, safe Rust is effectively free of data races.

Another strong point of the Rust ecosystem is its diverse and welcoming community. Sponsored by Mozilla, development is guided by RFCs, events are organized and centrally advertised, and learning resources are available online. Another way to be a part of the ecosystem is to contribute packages to crates.io (https://crates.io/), Rust's public package repository. Read the next chapter to find out more about cargo, Rust's universal tool to build and package.

 

Questions


  • What are traits and how are they different from interfaces?
  • Why doesn't Rust have a garbage collector?
  • Name three examples of how lifetimes are created in Rust (explicitly and implicitly)!
  • Why is immutability for variables important?
  • What does the Sync marker trait do?
  • Where can you go to participate in the Rust community?
  • Why are RFCs preferred over PRs?
 

Further reading


Refer to the following books for more information:

  • Hands-On Concurrency with Rust by Brian L. Troutwine (Packt)
  • Functional Programming in Rust by Andrew Johnson (Packt)

About the Author

  • Claus Matzinger

    Claus Matzinger is a software engineer with a very diverse background. After working in a small company maintaining code for embedded devices, he joined a large corporation to work on legacy Smalltalk applications. This led to a great interest in programming languages early on, and Claus became the CTO for a health games start-up based on Scala technology. Since then, Claus' roles have shifted toward customer-facing roles in the IoT database-technology start-up crate.io and, most recently, Microsoft. There, he hosts a podcast, writes code together with customers, and blogs about the solutions arising from these engagements. For more than 5 years, Claus has implemented software to help customers innovate, achieve, and maintain success.

    Browse publications by this author

Latest Reviews

(2 reviews total)
Looks as good as it promised to be
this is useful - relevant to my own project

Recommended For You

Book Title
Unlock this full book FREE 10 day trial
Start Free Trial