The Complete Rust Programming Reference Guide

By Rahul Sharma , Vesa Kaihlavirta , Claus Matzinger

About this book

Rust is a powerful language with a rare combination of safety, speed, and zero-cost abstractions. This Learning Path is filled with clear and simple explanations of its features along with real-world examples, demonstrating how you can build robust, scalable, and reliable programs.

You’ll get started with an introduction to Rust data structures, algorithms, and essential language constructs. Next, you will understand how to store data using linked lists, arrays, stacks, and queues. You’ll also learn to implement sorting and searching algorithms, such as Brute Force algorithms, Greedy algorithms, Dynamic Programming, and Backtracking. As you progress, you’ll pick up on using Rust for systems programming, network programming, and the web. You’ll then move on to discover a variety of techniques, right from writing memory-safe code, to building idiomatic Rust libraries, and even advanced macros.

By the end of this Learning Path, you’ll be able to implement Rust for enterprise projects, writing better tests and documentation, designing for performance, and creating idiomatic Rust code.

This Learning Path includes content from the following Packt products:

  • Mastering Rust - Second Edition by Rahul Sharma and Vesa Kaihlavirta
  • Hands-On Data Structures and Algorithms with Rust by Claus Matzinger
Publication date:
May 2019
Publisher
Packt
Pages
698
ISBN
9781838828103

 

Chapter 1. Getting Started with Rust

Learning a new language is like building a house – the foundation needs to be strong. With a language that changes the way you think and reason about your code, there's always more effort involved in the beginning, and it's important to be aware of that. The end result, however, is that you get to shift your thinking with these new-found concepts and tools.

This chapter will give you a whirlwind tour of the design philosophy of Rust and an overview of its syntax and type system. We assume that you have a basic knowledge of mainstream languages such as C, C++, or Python, and the ideas that surround object-oriented programming. Each section contains example code along with an explanation of it, and ample output from the compiler to help you become familiar with the language. We'll also delve into a brief history of the language and how it continues to evolve.

Getting familiar with a new language requires perseverance, patience, and practice. I highly recommend to all readers that you manually write and don't copy/paste the code examples listed here. The best part of writing and fiddling with Rust code is the precise and helpful error messages you get from the compiler, which the Rust community often likes to call error-driven development. We'll see these errors frequently throughout this book to understand how the compiler thinks of our code.

In this chapter, we will cover the following topics:

  • What is Rust and why should you care?
  • Installing the Rust compiler and the toolchain
  • A brief tour of the language and its syntax
  • A final exercise, where we'll put what we've learned together

 

 

 

What is Rust and why should you care?


"Rust is technology from the past come to save the future from itself." - Graydon Hoare

Rust is a fast, concurrent, safe, and empowering programming language originally started and developed by Graydon Hoare in 2006. It is now an open source language developed mainly by a team from Mozilla in collaboration with many open source contributors. The first stable version, 1.0, was released in May 2015. The project began with the hope of mitigating memory safety issues that came up in Gecko with the use of C++. Gecko is the browser engine used in Mozilla's Firefox browser. C++ is not an easy language to tame, and its concurrency abstractions are easily misused. A couple of attempts were made (in 2009 and 2011) to parallelize Gecko's cascading style sheets (CSS) parsing code to leverage modern parallel CPUs. They failed, as the concurrent C++ code was too hard to maintain and reason about. With a large number of developers collaborating on Gecko's mammoth code base, writing concurrent code in C++ is not a joyride. In the hope of incrementally removing the painful parts of C++, Rust was born and, with it, Servo, a research project to create a browser engine from scratch. The Servo project provides feedback to the language team by using bleeding-edge language features, which in turn influences the evolution of the language. Around November 2017, parts of the Servo project, particularly the stylo project (a parallel CSS parser in Rust), started shipping in the latest Firefox release (Project Quantum), which is a great feat in such a short amount of time. Servo's end goal is to incrementally replace components in Gecko with its own components.

Rust is inspired by a multitude of languages, the notable ones being Cyclone (a safe dialect of the C language) for its ideas on region-based memory management; C++ for its RAII principle; and Haskell for its type system, error handling types, and typeclasses.

Note

RAII stands for Resource Acquisition Is Initialization, a paradigm suggesting that resources must be acquired during the initialization of an object and must be released when their destructors are called or when they are deallocated.

The language has a very minimal runtime, does not need garbage collection, and by default prefers stack allocation over heap allocation (which carries an overhead) for any value that's declared in a program. We'll explain all of this in Chapter 5, Memory Management and Safety. The Rust compiler, rustc, was originally written in OCaml (a functional language) and became self-hosting in 2011 after being rewritten in Rust itself.

Note

Self-hosting is when a compiler is built by compiling its own source code. This process is known as bootstrapping a compiler. Compiling its own source code acts as a really good test case for the compiler.

Rust is openly developed on GitHub at https://github.com/rust-lang/rust and continues to evolve at a fast pace. New features are added to the language through a community-driven Request For Comments (RFC) process where anybody can propose new language features. These are then described in detail in an RFC document. A consensus is then sought after for the RFC and if agreed upon, the implementation phase begins for the feature. The implemented feature then gets reviewed by the community, where it is eventually merged to the master branch after undergoing several tests by users in nightly releases. Getting feedback from the community is crucial for the language's evolution. Every six weeks, a new stable version of the compiler is released. Along with fast moving incremental updates, Rust also has this notion of editions, which is proposed to provide a consolidated update to the language. This includes tooling, documentation, its ecosystem, and to phase in any breaking changes. So far, there have been two editions: Rust 2015, which had a focus on stability, and Rust 2018, which is the current edition at the time of writing this book and focuses on productivity.

While being a general-purpose, multi-paradigm language, Rust targets the systems programming domain, where C and C++ have been predominant. This means that you can write operating systems, game engines, and many performance-critical applications with it. At the same time, it is expressive enough that you can build high-performance web applications, network services, and type-safe database Object Relational Mapper (ORM) libraries, and it can also run on the web by compiling down to WebAssembly. Rust has also gained a fair share of interest for building safety-critical, real-time applications on embedded platforms such as Arm's Cortex-M based microcontrollers, a domain mostly dominated by C at present. This breadth of applicability across domains, which Rust exhibits quite well, is very rare to find in a single programming language. Moreover, established companies such as Cloudflare, Dropbox, Chucklefish, npm, and many more are already using it in production for their high-stakes projects.

Rust is characterized as a statically and strongly typed language. The static property means that the compiler has information about all of the variables and their types at compile time and does most of its checks at compile time, leaving very minimal type checking at runtime. Its strong nature means that it does not allow things such as auto-conversion between types, and that a variable pointing to an integer cannot be changed to point to a string later in code. For example, in weakly typed languages such as JavaScript, you can easily do something like two = "2"; two = 2 + two;. JavaScript weakens the type of 2 to be a string at runtime, thus storing 22 as a string in two, something totally contrary to your intent and meaningless. In Rust, the same code, that is, let mut two = "2"; two = 2 + two;, would get caught at compile time, throwing the following error: cannot add `&str` to `{integer}`. This property enables safe refactoring of code and catches most bugs at compile time rather than causing issues at runtime.
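To make the contrast with JavaScript concrete, here is a minimal sketch of how the same intent can be written in Rust once the conversion is spelled out. The helper name add_two is our own and is only used for illustration; parse() and format! are standard library facilities.

```rust
use std::num::ParseIntError;

// Hypothetical helper: converts the text to a number explicitly, then adds 2.
fn add_two(text: &str) -> Result<i32, ParseIntError> {
    let parsed: i32 = text.parse()?; // the conversion is explicit and can fail
    Ok(2 + parsed)
}

fn main() {
    let two = "2";
    // let sum = 2 + two; // rejected at compile time: cannot add `&str` to `{integer}`
    println!("{:?}", add_two(two)); // prints Ok(4)
    println!("{}", add_two("two").is_err()); // prints true

    // If string concatenation is what we want, we have to say so.
    let joined = format!("{}{}", 2, two);
    println!("{}", joined); // prints 22
}
```

Either way, the programmer states which conversion is intended, and the failure case of the numeric conversion is visible in the return type.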

Programs written in Rust are very expressive as well as performant, in the sense that you can have most of the features of high-level functional style languages such as higher-order functions and lazy iterators, yet it compiles down to efficient code like a C/C++ program. The defining principles that underline many of its design decisions are compile-time memory safety, fearless concurrency, and zero cost abstractions. Let's elaborate on these ideas.

Compile time memory safety: The Rust compiler can track variables owning a resource in your program at compile time and does all of this without a garbage collector.

Note

Resources can be a memory address, a variable holding a value, a shared memory reference, file handles, network sockets, or database connection handles.

This means that you can't run into infamous pointer problems such as use after free, double free, or dangling pointers at runtime. Reference types in Rust (types with & before them) are implicitly associated with a lifetime tag ('foo), which is sometimes annotated explicitly by the programmer. Through lifetimes, the compiler can track the places in code where a reference is safe to use, reporting an error at compile time if a use is illegal. To achieve this, Rust runs a borrow/reference checking algorithm that uses these lifetime tags on references to ensure that you can never access a memory address that has been freed. It also ensures that you cannot free any pointer while it is being used by some other variable. We will go into the details of this in Chapter 5, Memory Management and Safety.
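As a small taste of what an explicit lifetime tag looks like, consider the following sketch. The function longest and the tag name 'a are our own choices for illustration; the tag tells the compiler that the returned reference must not outlive either of the two inputs.

```rust
// The returned reference is tied to the lifetime 'a shared by both inputs.
fn longest<'a>(left: &'a str, right: &'a str) -> &'a str {
    if left.len() >= right.len() {
        left
    } else {
        right
    }
}

fn main() {
    let outer = String::from("borrow checker");
    {
        let inner = String::from("lifetime");
        let winner = longest(&outer, &inner);
        println!("{}", winner); // fine: both strings are still alive here
    }
    // Holding on to `winner` past this point would be rejected at compile
    // time, because `inner` has already been dropped.
}
```

No check happens at runtime here. The compiler accepts or rejects the program based on where each reference is used.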

 

 

Zero-cost abstractions: Programming is all about managing complexity, which is facilitated by good abstractions. Let's go through a fine example of abstraction in both Rust and Kotlin (a language targeting the Java virtual machine (JVM)) that lets us write high-level code that is easy to read and reason about. We'll compare Kotlin's streams and Rust's iterators in manipulating a list of numbers and highlight the zero-cost abstraction principle that Rust provides. The abstraction here is the ability to use methods that take other methods as arguments to filter numbers based on a condition without using manual loops. Kotlin is used here for its visual similarity with Rust. The code is fairly simple to understand and we aim to give a high-level explanation. We'll be glossing over the details in the code, as the whole point of this example is to understand the zero-cost property.

First, let's look at the code in Kotlin (the following code can be run online: https://try.kotlinlang.org):

1. import java.util.stream.Collectors
2. 
3. fun main(args: Array<String>) {
5.     // Create a stream of numbers
6.     val numbers = listOf(1, 2, 3, 4, 5, 6, 7, 8, 9, 10).stream()
7.     val evens = numbers.filter { it -> it % 2 == 0 } 
8.     val evenSquares = evens.map { it -> it * it }  
9.     val result = evenSquares.collect(Collectors.toList())
10.    println(result)       // prints [4, 16, 36, 64, 100]
11.    
12.    println(evens)
13.    println(evenSquares)
14. }

We create a stream of numbers (line 6) and call a chain of methods (filter and map) to transform the elements to collect only squares of even numbers. These methods can take a closure or a function (that is, it -> it * it at line 8) to transform each element in the collection. In functional style languages, when we call these methods on the stream/iterator, for every such call, the language creates an intermediate object to keep any state or metadata in regard to the operation being performed. As a result, evens and evenSquares will be two different intermediate objects that are allocated on the JVM heap. Allocating things on the heap incurs a memory overhead. That's the extra cost of abstraction we have to pay in Kotlin !

When we print the value of evens and evenSquares, we indeed get different objects, as shown here:

java.util.stream.ReferencePipeline$2@<hash code>

java.util.stream.ReferencePipeline$3@<hash code>

The hex value after the @ is the object's hash code on the JVM. Since the hash codes are different, they are different objects.

In Rust, we do the same thing (the following code can be run online: https://gist.github.com/rust-play/e0572da05d999cfb6eb802d003b33ffa):

1. fn main() {
2.     let numbers = vec![1, 2, 3, 4, 5, 6, 7, 8, 9, 10].into_iter();
3.     let evens = numbers.filter(|x| *x % 2 == 0);
4.     let even_squares = evens.clone().map(|x| x * x);
5.     let result = even_squares.clone().collect::<Vec<_>>();
6.     println!("{:?}", result);      // prints [4, 16, 36, 64, 100]
7.     println!("{:?}\n{:?}", evens, even_squares);
8. }

Glossing over the details, on line 2 we call vec![] to create a list of numbers on the heap, followed by calling into_iter() to make it an iterator/stream of numbers. The into_iter() method creates a wrapper Iterator type, IntoIter([1,2,3,4,5,6,7,8,9,10]), out of a collection (here, Vec<i32>, a list of signed 32-bit integers). This iterator type takes ownership of the original list of numbers. We then perform filter and map transformations (lines 3 and 4), just like we did in Kotlin. Line 7 prints the debug representation of evens and even_squares, as follows (some details have been omitted for brevity):

evens:

 Filter { iter: IntoIter( <numbers> ) } 

even_squares:

 Map { iter: Filter { iter: IntoIter( <numbers> ) }}

 

The intermediate objects, Filter and Map, are wrapper types (not allocated on the heap) on the base iterator structure, which itself is a wrapper around the original list of numbers from line 2. The wrapper structures that get created on lines 3 and 4 by calling filter and map, respectively, do not have any pointer indirection in between and impose no heap allocation overhead, as was the case with Kotlin. All of this boils down to efficient assembly code, which would be equivalent to the manually written version using loops.
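For comparison, here is a sketch of the iterator chain next to the kind of hand-written loop it is claimed to match. The function names are ours, and the example only demonstrates that both produce the same result; it does not inspect the generated assembly.

```rust
// Iterator version: filter and map are lazy wrappers, collect drives them.
fn even_squares_iter(numbers: &[i32]) -> Vec<i32> {
    numbers
        .iter()
        .filter(|x| **x % 2 == 0)
        .map(|x| x * x)
        .collect()
}

// Hand-written loop doing the same work step by step.
fn even_squares_loop(numbers: &[i32]) -> Vec<i32> {
    let mut result = Vec::new();
    for &n in numbers {
        if n % 2 == 0 {
            result.push(n * n);
        }
    }
    result
}

fn main() {
    let numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
    println!("{:?}", even_squares_iter(&numbers)); // prints [4, 16, 36, 64, 100]
    println!("{:?}", even_squares_loop(&numbers)); // prints [4, 16, 36, 64, 100]
}
```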

Fearless concurrency: When we said Rust is concurrent-safe, we meant that the language has Application Programming Interfaces (APIs) and abstractions that make it really easy to write correct and safe concurrent code. In C++, by contrast, the possibility of making mistakes in concurrent code is quite high. When synchronizing data access across multiple threads in C++, you are responsible for calling mutex.lock() every time you enter the critical section, and mutex.unlock() when you exit this section:

// C++

mutex.lock();                         // Mutex locked, good to go 
 // Do super critical stuff
mutex.unlock();                       // We're done

Note

Critical section: This is a group of instructions/statements that need to be executed atomically. Here, atomically means no other thread can interrupt the currently executing thread in the critical section, and no intermediate value is perceived by any thread during execution of code in the critical section.

In a large code base with many developers collaborating on the code, you might forget to call mutex.lock() before accessing the shared object from multiple threads, which can lead to data races. In other cases, you might forget to unlock the mutex and starve the other threads that want access to the data.

Rust has a different take on this. Here, you wrap your data in a Mutex type to ensure synchronized mutable access to the data from multiple threads:

// Rust

use std::sync::Mutex;

fn main() {
    let value = Mutex::new(23);
    *value.lock().unwrap() += 1;   // modify; the temporary guard unlocks at the end of this statement
}

 

 

In the preceding code, we were able to modify the data after calling lock() on value. Rust uses the notion of protecting the shared data itself and not the code. The interaction with Mutex and the protected data is not independent, as is the case with C++. You cannot access the inner data without calling lock on the Mutex type. What about releasing the lock? Well, calling lock() returns something called MutexGuard (wrapped in a Result, hence the unwrap()), which automatically releases the lock when it goes out of scope. It's one of the many safe concurrency abstractions Rust provides. We'll go into detail on them in Chapter 8, Concurrency.

Another novel idea is the notion of marker traits, which validate and ensure synchronized and safe access to data in concurrent code at compile time. Traits are described in detail in Chapter 4, Types, Generics, and Traits. Types are annotated with marker traits called Send and Sync to indicate whether they are safe to send to threads or safe to share between threads, respectively. When a program sends a value to a thread, the compiler checks whether the value implements the required marker trait and forbids the usage of the value if it doesn't. In this way, Rust allows you to write concurrent code without fear, where the compiler catches mistakes in multi-threaded code at compile time.

Writing concurrent code is already hard. With C/C++, it gets even harder and more arcane. CPU clock rates are no longer climbing; instead, more cores are being added. As a result, concurrent programming is the way forward. Rust makes it a breeze to write concurrent code and lowers the bar for many people to get into writing safe, concurrent code.
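The single-threaded snippet above does not show any sharing, so here is a minimal sketch of the same idea with several threads. It assumes the common pattern of wrapping the Mutex in an Arc (an atomically reference-counted pointer from the standard library) so that each thread can own a handle to it; the function name count_in_parallel is hypothetical.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical helper: each worker thread increments a shared counter once.
fn count_in_parallel(workers: usize) -> usize {
    let counter = Arc::new(Mutex::new(0usize));

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let counter = Arc::clone(&counter); // one handle per thread
            thread::spawn(move || {
                let mut guard = counter.lock().unwrap();
                *guard += 1;
            }) // `guard` is dropped when the closure ends, releasing the lock
        })
        .collect();

    // Wait for every worker to finish before reading the result.
    for handle in handles {
        handle.join().unwrap();
    }

    let total = *counter.lock().unwrap();
    total
}

fn main() {
    println!("{}", count_in_parallel(8)); // prints 8
}
```

There is no way to reach the counter without going through lock(), and passing the Arc<Mutex<usize>> into thread::spawn compiles only because that type is Send.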

Rust also employs C++'s RAII idiom for resource initialization. This technique ties a resource's lifetime to the lifetime of the object that owns it, and the deallocation of heap-allocated types is performed through the drop method, which is provided by the Drop trait. This is automatically called when the variable goes out of scope. Rust also replaces the concept of null pointers with the Result and Option types, which we'll cover in detail in Chapter 6, Error Handling. This means that Rust doesn't allow null/undefined values in code, except when interacting with other languages through foreign function interfaces and when using unsafe code. The language also puts emphasis on composition over inheritance and has a trait system, which is implemented by data types and is similar to Haskell typeclasses, also known as Java interfaces on steroids. Traits in Rust are the backbone of many of its features, as we'll see in upcoming chapters.
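The following sketch illustrates both ideas from the preceding paragraph. The Connection type and the find_even function are made up for this example; Drop and Option come from the standard library.

```rust
// Hypothetical resource that announces when it is released.
struct Connection {
    name: String,
}

impl Drop for Connection {
    // Called automatically when a Connection goes out of scope.
    fn drop(&mut self) {
        println!("closing {}", self.name);
    }
}

// Returns the first even number, or None instead of a null value.
fn find_even(numbers: &[i32]) -> Option<i32> {
    numbers.iter().cloned().find(|n| n % 2 == 0)
}

fn main() {
    {
        let _conn = Connection { name: String::from("db") };
        println!("using the connection");
    } // `_conn` goes out of scope here, so "closing db" is printed

    match find_even(&[1, 3, 4]) {
        Some(n) => println!("first even: {}", n), // prints first even: 4
        None => println!("no even number"),
    }
}
```

The caller of find_even has to handle the None case before it can get at the number, which is what replaces a null check.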

Last but not least, Rust's community is quite active and friendly, and the language has comprehensive documentation, which can be found at https://doc.rust-lang.org. For the third year in a row (2016, 2017, and 2018), Stack Overflow's Developer Survey highlighted Rust as the most-loved programming language, so it can be said that the overall programming community is very interested in it. To summarize, you should care about Rust if you aim to write high-performing software with fewer bugs while enjoying many modern language features and an awesome community!

 

Installing the Rust compiler and toolchain


The Rust toolchain has two major components: the compiler, rustc, and the package manager, cargo, which helps manage Rust projects. The toolchain comes in three release channels:

  • Nightly: The daily successful build from the master development branch. This contains all the latest features, many of which are unstable.
  • Beta: This is released every six weeks. A new beta branch is taken from nightly. It contains only features that are flagged as stable.
  • Stable: This is released every six weeks. The previous beta branch becomes the new stable release.

Developers are encouraged to use the stable release channel. However, the nightly version enables bleeding edge features, and some libraries and programs require it. You can change to the nightly toolchain easily with rustup. We'll see how we can do that in a moment.

Using rustup.rs

Rustup is a tool that installs the Rust compiler on all supported platforms. The Rust team developed it to make it easier for developers on different platforms to download and use the language. It's a command-line tool written in Rust that provides an easy way to install pre-built binaries of the compiler and binary builds of the standard library for cross-compiling needs. It can also install other components, such as the Rust source code, documentation, the Rust formatting tool (rustfmt), the Rust Language Server (RLS for IDEs), and other developer tools, and it runs on all platforms, including Windows.

From their official page at https://rustup.rs, the recommended way to install the toolchain is to run the following command:

curl https://sh.rustup.rs -sSf | sh

By default, the installer installs the stable version of the Rust compiler, its package manager, Cargo, and the language's standard library documentation so that it can be viewed offline. These are installed by default under the ~/.cargo directory. Rustup also updates your PATH environment variable to point to this directory.

The following is a screenshot of running the preceding command on Ubuntu 16.04:

If you need to make any changes to your installation, choose 2. However, the defaults are fine for us, so we'll go ahead and choose 1. Here's the output after the installation:

 

Rustup also has other capabilities, such as updating the toolchain to the latest version, which can be done by running rustup update. It can also update itself via rustup self update. It also provides directory-specific toolchain configuration. The default toolchain is set globally to whatever toolchain gets installed, which in most cases is the stable toolchain. You can view the default one by invoking rustup show. If you want to use the latest nightly toolchain for one of your projects, you can tell rustup to switch to nightly for that particular directory by running rustup override set nightly. If, for some reason, someone wants to use an older version of the toolchain or downgrade (say, the nightly build on 2016-06-03), rustup can also download that if we were to run rustup install nightly-2016-06-03, followed by setting the same using the override sub-command. More information on rustup can be found at https://github.com/rust-lang-nursery/rustup.rs.

Note

All of the code examples and projects in this book are based on compiler version rustc 1.32.0 (9fda7c223 2019-01-16).

Now, you should have everything you need to compile and run programs written in Rust. Let's get Rusty!

 

A tour of the language


For the fundamental language features, Rust does not stray far from what you are used to in other languages. At a high level, a Rust program is organized into modules, with the root module containing a main() function. For executables, the root module is usually a main.rs file and for libraries, a lib.rs file. Within a module, you can define functions, import libraries, define types, create constants, write tests and macros, or even create nested modules. We'll see all of them, but let's start with the basics. Here's a simple Rust program that greets you:

// greet.rs

1. use std::env;
2. 
3. fn main() {
4.    let name = env::args().skip(1).next();
5.    match name {
6.       Some(n) => println!("Hi there ! {}", n),
7.       None => panic!("Didn't receive any name ?")
8.    }
9. }

Let's compile and run this program. Write it to a file called greet.rs and run rustc with the file name, and pass your name as the argument. I passed the name Ferris, Rust's unofficial mascot, and got the following output on my machine:

Awesome! It greets Ferris. Let's get a cursory view of this program, line by line.

On line 1, we import a module called env from the std crate (libraries are called crates). std is the standard library for Rust. On line 3, we have our usual function main. Then, on line 4, we call the function args() from the env module, which returns an iterator (sequence) of arguments that has been passed to our program. Since the first argument contains our program name, we want to skip it, so we call skip and pass in a number, which is how many elements (1) we want to skip. As iterators are lazy and do not pre-compute things in Rust, we have to explicitly ask it to give the next element, so we call next(), which returns an enum type called Option. This can be either a Some(value) value or a None value because a user might forget to provide an argument.

On line 5, we use Rust's awesome match expression on the variable name and check whether it's a Some(n) or a None value. match is like the if else construct, but more powerful. On line 6, when it's a Some(n), we call println!(), passing in our inner string variable n (this gets auto-declared when using match expressions), which then greets our user. The println! call is not a function, but a macro (they all end with a !). Finally, on line 7, if it's a None variant of the enum, we just panic!() (another macro), which aborts the program, making it leave an error message.

The println! macro, as we saw, accepts a string, which can contain placeholders for items using the "{}" syntax. These strings are called format strings, while the "{}" in the string are called format specifiers. For printing simple types such as primitives, we can use the "{}" format specifier, whereas for other types, we use the "{:?}" format specifier. There are more details to this, though. When println! encounters a format specifier, that is, "{}", and a corresponding substitution value, it calls a method on that value, which returns a string representation of it. This method is part of a trait. For the "{}" specifier, it calls a method from the Display trait, whereas for "{:?}", it calls a method from the Debug trait. The latter is mostly used for debugging, while the former is for displaying a human-readable output of data types. It is somewhat similar to the toString() method in Java. When developing, you usually need to print your data types for debugging. When these methods are not available on a type and we use the "{:?}" specifier, we need to add a #[derive(Debug)] attribute over the type to get them. We'll explain attributes in detail in subsequent chapters, but expect to see this in future code examples. We'll also revisit the println! macro in Chapter 9, Metaprogramming with Macros.
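Here is a short sketch of the two traits side by side. The Point type is our own illustration: Debug is derived with the attribute, while Display is implemented by hand because it cannot be derived.

```rust
use std::fmt;

// Debug is generated for us by the derive attribute.
#[derive(Debug)]
struct Point {
    x: i32,
    y: i32,
}

// Display has to be written manually; it decides the human-readable form.
impl fmt::Display for Point {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "({}, {})", self.x, self.y)
    }
}

fn main() {
    let point = Point { x: 1, y: 2 };
    println!("{}", point);   // uses Display, prints (1, 2)
    println!("{:?}", point); // uses Debug, prints Point { x: 1, y: 2 }
}
```

Removing the #[derive(Debug)] line makes the second println! fail to compile, which is the situation described above.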

Running rustc manually is not how you will do this for real programs, but it will do for these small programs in this chapter. In subsequent chapters, we will be using Rust's package manager to build and run our programs. Apart from running the compiler locally, another tool that can be used to run the code examples is the official online compiler called Rust playground, which can be found at http://play.rust-lang.org. Following is the screenshot from my machine:

The Rust playground also supports external libraries to be imported and to be used when trying out sample programs.

With the previous example, we got a high-level overview of a basic Rust program, but did not dive into all of the details and the syntax. In the following section, we will explain the language features separately and their syntax. The explanations that follow are here to give you enough context so that you can quickly get up and running in regard to writing Rust programs without going through all of the use cases exhaustively. To make it brief, each section also contains references to chapters that explain these concepts in more detail. Also, the Rust documentation page at https://doc.rust-lang.org/std/index.html will help you get into the details and is very readable with its built-in search feature. You are encouraged to proactively search for any of the constructs that are explained in the following sections. This will help you gain more context about the concepts you're learning about.

 All of the code examples in this chapter can be found in this book's GitHub repository (PacktPublishing/The-Complete-Rust-Programming-Reference-Guide).

Note

Some of the code files are deliberately presented to not compile so that you can fix them yourselves with the help of the compiler.

With that said, let's start with the fundamental primitive types in Rust.

Primitive types

Rust has the following built-in primitive types:

  • bool: These are the usual booleans and can be either true or false.
  • char: A single character, such as 'e'.
  • Integer types: These are characterized by the bit width. Rust supports integers that are up to 128 bits wide:

 

    signed    unsigned
    i8        u8
    i16       u16
    i32       u32
    i64       u64
    i128      u128

  • isize: The pointer-sized signed integer type. Equivalent to i32 on a 32-bit CPU and i64 on a 64-bit CPU.
  • usize: The pointer-sized unsigned integer type. Equivalent to u32 on a 32-bit CPU and u64 on a 64-bit CPU.
  • f32: The 32-bit floating point type. Implements the IEEE 754 standard for floating point representation.
  • f64: The 64-bit floating point type.
  • [T; N]: A fixed-size array, for the element type, T, and the non-negative compile-time constant size N.
  • [T]: A dynamically-sized view into a contiguous sequence, for any type T.
  • str: String slices, mainly used as a reference, that is, &str.
  • (T, U, ..): A finite sequence, (T, U, ..) where T and U can be different types.
  • fn(i32) -> i32: A function that takes an i32 and returns an i32. Functions also have a type.
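To see these types in one place, here is a small sketch that declares a value of most of them with an explicit type annotation. The function names double and sum_slice are ours; the annotations are written out only for illustration, since Rust can usually infer them.

```rust
use std::mem::size_of;

fn double(n: i32) -> i32 {
    n * 2
}

fn sum_slice(values: &[i32]) -> i32 {
    values.iter().sum()
}

fn main() {
    let flag: bool = true;
    let letter: char = 'e';
    let byte: u8 = 255;
    let ratio: f64 = 0.5;
    let array: [i32; 4] = [1, 2, 3, 4];  // fixed-size array
    let slice: &[i32] = &array[1..3];    // view into part of the array
    let text: &str = "hello";            // string slice
    let pair: (i32, char) = (7, 'x');    // tuple of different types
    let func: fn(i32) -> i32 = double;   // functions have types too

    println!("{} {} {} {}", flag, letter, byte, ratio);
    println!("{:?} {:?} {} {:?}", array, slice, text, pair);
    println!("{} {}", func(21), sum_slice(slice)); // prints 42 5

    // isize and usize always have the same width as a pointer.
    println!("{} {}", size_of::<usize>(), size_of::<isize>());
}
```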

Declaring variables and immutability

Variables allow us to store a value and easily refer to it later in code. In Rust, we use the let keyword to declare variables. We already had a glimpse of it in the greet.rs example in the previous section. In mainstream imperative languages such as C or Python, initializing a variable does not stop you from reassigning it to some other value. Rust deviates from the mainstream here by making variables immutable by default, that is, you cannot assign the variable to some other value after you have initialized it. If you need a variable to point to something else (of the same type) later, you need to put the mut keyword before it. Rust asks you to be explicit about your intent as much as possible. Consider the following code:

// variables.rs

fn main() {
    let target = "world";
    let mut greeting = "Hello";
    println!("{}, {}", greeting, target);
    greeting = "How are you doing";
    target = "mate";
    println!("{}, {}", greeting, target);
}

We declared two variables, target and greeting. target is an immutable binding, while greeting has a mut before it, which makes it a mutable binding. If we try to compile this program, though, the compiler rejects it with error E0384, reporting that it cannot assign twice to the immutable variable target.

As you can see from the preceding error message, Rust does not let you assign to target again. To make this program compile, we'll need to add mut before target in the let statement and compile and run it again. The following is the output when you run the program:

$ rustc variables.rs
$ ./variables
Hello, world
How are you doing, mate

let does much more than assign variables. It is a pattern-matching statement in Rust. In Chapter 7, Advanced Concepts, we'll take a closer look at let. Next, we'll look at functions.
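
As a small preview of let acting on patterns, here is a sketch in which let destructures a tuple. The min_max function is a hypothetical helper made up for this example:

```rust
// Hypothetical helper that returns a pair, used only for illustration.
fn min_max(a: i32, b: i32) -> (i32, i32) {
    if a <= b { (a, b) } else { (b, a) }
}

fn main() {
    // `let` destructures the returned tuple into two separate bindings.
    let (low, high) = min_max(7, 3);
    println!("low = {}, high = {}", low, high);

    // Individual bindings inside a pattern can be marked mutable.
    let (mut count, label) = (0, "items");
    count += 1;
    println!("{} {}", count, label);
}
```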

Functions

Functions abstract a bunch of instructions into named entities, which can be invoked later by other code and help manage complexity. We already used a function in our greet.rs program, that is, the main function. Let's look at how we can define another one:

// functions.rs

fn add(a: u64, b: u64) -> u64 {
    a + b
}

fn main() {
    let a: u64 = 17;
    let b = 3;
    let result = add(a, b);
    println!("Result {}", result);
}

In the preceding code, we created a new function named add. A function is defined with the fn keyword, followed by its name, add, its parameters a and b inside parentheses, and the function body inside braces {}. Each parameter has its type on the right, after the colon :. The return type is specified using ->, followed by the type, u64; it can be omitted if the function has nothing to return. Functions also have types. The type of our add function is denoted as fn(u64, u64) -> u64. Functions can also be stored in variables and passed to other functions.

If you look at the body of add, we don't need a return keyword to return a + b as in other languages. The last expression is returned automatically. However, we do have the return keyword available for early returns. A function body is basically an expression that evaluates to a value, which is the () (Unit) type by default, akin to the void return type in C/C++. Functions can also be declared within other functions. The use case for that is when you have functionality within a function (say, foo) that is hard to reason about as a sequence of statements. In this case, one can extract those lines into a local function, bar, which is then defined within the parent function, foo.
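
The points above (early returns, local functions, and functions as values) can be sketched as follows. All names here (checked_div, sum_of_squares, apply) are hypothetical and exist only for illustration; returning 0 on division by zero is an arbitrary choice for the example:

```rust
// Uses `return` for an early exit when the divisor is zero.
fn checked_div(a: u64, b: u64) -> u64 {
    if b == 0 {
        return 0; // early return; the rest of the body is skipped
    }
    a / b // last expression, returned without `return`
}

fn sum_of_squares(a: u64, b: u64) -> u64 {
    // A local function, visible only inside `sum_of_squares`.
    fn square(x: u64) -> u64 {
        x * x
    }
    square(a) + square(b)
}

// Accepts any function of type fn(u64, u64) -> u64.
fn apply(f: fn(u64, u64) -> u64, a: u64, b: u64) -> u64 {
    f(a, b)
}

fn main() {
    // A function stored in a variable, with its type written out.
    let op: fn(u64, u64) -> u64 = sum_of_squares;
    println!("{}", apply(op, 3, 4));
    println!("{}", apply(checked_div, 10, 0));
}
```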

In main, we declared two variables, a and b, using the let keyword. As is the case with b, we can even omit specifying the type as Rust is able to infer types of variables in most cases by examining your code. This is also the case with the result, which is a u64 value. This feature helps prevent type signature clutter and improves the readability of code, especially when your types are nested inside several other types that have long names.

Note

Rust's type inference is based on the Hindley-Milner type system, a set of rules and algorithms that enable type inference in a programming language. For typical programs, inference runs in close to linear time, which makes it practical to type check large programs.

We can also have functions that modify their arguments. Consider the following code:

// function_mut.rs

fn increase_by(mut val: u32, how_much: u32) {
    val += how_much;
    println!("You made {} points", val);
}

fn main() {
    let score = 2048;
    increase_by(score, 30);
}

We declare a score variable with 2048 as the value, and call the increase_by function, passing score as the first argument and the value 30 as the second. In the increase_by function, we have specified the first parameter as mut val, indicating that the parameter should be taken as mutable, which allows the binding to be mutated from inside the function. Because a u32 is copied when it is passed by value, this changes only the function's own val; score in main stays at 2048. Our increase_by function modifies the val binding and prints the value. The following is the output when running the program:

$ rustc function_mut.rs 
$ ./function_mut 
You made 2078 points
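
To see that a mut parameter only affects the function's own copy, consider the following sketch. It is a variation on the listing above, not part of the original program: the increased_by name and the return value are additions made for illustration:

```rust
// Same shape as increase_by, but returns the updated value
// (the return value is an addition for this illustration).
fn increased_by(mut val: u32, how_much: u32) -> u32 {
    val += how_much; // mutates the function's own copy
    val
}

fn main() {
    let score = 2048;
    let updated = increased_by(score, 30);
    // `score` was copied into the function, so it is unchanged here.
    println!("score = {}, updated = {}", score, updated);
}
```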

Next, let's look at closures.

Closures

Rust also has support for closures. Closures are like functions, but they can also capture variables from the environment or scope in which they are declared. While functions have names associated with them, closures are defined without a name, but they can be assigned to a variable. Another advantage of Rust's type inference is that, in most cases, you can specify parameters for a closure without their type. Here's the simplest possible closure: let my_closure = || ();. We just defined a no-parameter closure that does nothing. We can call this by invoking my_closure(), just like functions. The two vertical bars || hold the parameters for the closure (if any), such as |a, b|. Specifying the types of parameters (|a: u32|) is sometimes required when Rust cannot figure out the proper types. Like functions, closures can also be stored in variables and invoked later or passed to other functions. The body of a closure can be either a single expression or a block in braces for multi-line bodies. A more involved closure would be as follows:

// closures.rs

fn main() {
    let doubler = |x| x * 2;
    let value = 5;
    let twice = doubler(value);
    println!("{} doubled is {}", value, twice);

    let big_closure = |b, c| {
        let z = b + c;
        z * twice
    };

    let some_number = big_closure(1, 2);
    println!("Result from closure: {}", some_number);
}

In the preceding code, we have defined two closures: doubler and big_closure. doubler doubles a value given to it; in this case, it is passed value from the parent scope or environment, that is, the function main. Similarly, in big_closure, we use the variable twice from its environment. This closure has a multi-line body within braces, and the let statement that assigns it to big_closure needs to end with a semicolon. Later, we call big_closure, passing in 1 and 2, and print some_number.

The major use case for closures is as parameters to higher-order functions. A higher-order function is a function that takes another function or closure as its argument. For example, the thread::spawn function from the standard library takes in a closure where you can write code you want to run in another thread. Another example where closures provide a convenient abstraction is when you have a function that operates on a collection such as Vec and you want to filter the items based on some condition. Rust's Iterator trait has a method called filter, which takes in a closure as an argument. This closure is defined by the user and it returns either true or false, depending on how the user wants to filter the items in the collection. We'll get more in-depth with closures in Chapter 7, Advanced Concepts.
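
Both higher-order functions mentioned above can be tried in a few lines. This is a minimal sketch: the above helper is a hypothetical name, and the move keyword relates to ownership, which is explained in a later chapter:

```rust
use std::thread;

// Hypothetical helper: keeps only the scores above `threshold`.
fn above(scores: &[u32], threshold: u32) -> Vec<u32> {
    scores
        .iter()
        .copied()
        // the closure captures `threshold` from the enclosing scope
        .filter(|s| *s > threshold)
        .collect()
}

fn main() {
    let scores = vec![10, 55, 72, 30];
    println!("{:?}", above(&scores, 50));

    // thread::spawn takes a closure to run on a new thread;
    // `move` transfers ownership of `scores` into that closure.
    let handle = thread::spawn(move || scores.iter().sum::<u32>());
    let total = handle.join().unwrap();
    println!("total = {}", total);
}
```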

Strings

Strings are one of the most frequently used data types in any programming language. In Rust, they are usually found in two forms: the &str type (pronounced stir) and the String type. Rust strings are guaranteed to be valid UTF-8 encoded byte sequences. Unlike C strings, they are not null terminated and can contain null bytes. The following program shows the two types in action:

// strings.rs

fn main() {
    let question = "How are you ?";            // a &str type
    let person: String = "Bob".to_string();
    let namaste = String::from("नमस्ते");        // unicodes yay!

    println!("{}! {} {}", namaste, question, person);
}

In the preceding code, person and namaste are of type String, while question is of type &str. There are multiple ways you can create String types. String values are allocated on the heap, while &str values are usually pointers to an existing string, which could be on the stack, on the heap, or in the data segment of the compiled object code. The & operator creates a reference, a kind of pointer, to a value of any type. After initializing the strings in the preceding code, we then use the println! macro to print them together using format strings. That's the very basics of strings. Strings are covered in detail in Chapter 7, Advanced Concepts.
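
The relationship between the two types can be sketched as follows. The shout function is a hypothetical helper; the example also shows that, because strings are UTF-8, the length in bytes can differ from the number of characters:

```rust
// Hypothetical helper: accepts a borrowed &str and returns an owned String.
fn shout(text: &str) -> String {
    let mut owned = String::from(text); // heap-allocated and growable
    owned.push('!');
    owned
}

fn main() {
    let greeting: &str = "नमस्ते";
    // len() counts UTF-8 bytes, while chars() yields Unicode scalar values.
    println!("bytes: {}, chars: {}", greeting.len(), greeting.chars().count());

    let loud: String = shout("Hello");
    // A reference to a String can be used where a &str is expected.
    let view: &str = &loud;
    println!("{}", view);
}
```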

Conditionals and decision making

Conditionals are also similar to how they're found in other languages. They follow the C-like if {} else {} structure:

// if_else.rs

fn main() {
    let rust_is_awesome = true;
    if rust_is_awesome {
        println!("Indeed");
    } else {
        println!("Well, you should try Rust !");
    }
}

In Rust, the if construct is not a statement, but an expression. In general programming parlance, statements do not return any value, but an expression does. This distinction means that if else conditionals in Rust always return a value. The value may be an empty () unit type, or it may be an actual value. Whatever remains in the last line inside the braces becomes the return value of the if else expression. It is important to note that both if and else branches should have the same return type. Also, we don't need parentheses around the if condition expression, as you can see in the preceding code. We can even assign the value of if else blocks to a variable:

// if_assign.rs

fn main() {
    let result = if 1 == 2 { 
        "Wait, what ?" 
    } else { 
        "Rust makes sense" 
    };

    println!("You know what ? {}.", result);
}

When assigning the value of an if else expression to a variable, we need to end the statement with a semicolon: if { ... } else { ... } is an expression, while let is a statement that expects a semicolon at the end. In the case of assignment, if we were to remove the else {} block from the preceding code, the compiler would reject the program with error E0317, noting that the if may be missing an else clause.

Without the else block, if the if condition evaluates to false, then the result will be (), and there would be two possible types for the result variable, that is, () and &str. Rust does not allow multiple types to be stored in one variable. So, in this case, we need both the if {} and else {} blocks returning the same type. Also, adding a semicolon in the conditional branches changes the meaning of the code. By adding a semicolon after the strings in both branches in the following code, the compiler would interpret it as you wanting to throw the value away:

// if_else_no_value.rs

fn main() { 
    let result = if 1 == 2 { 
        "Nothing makes sense"; 
    } else { 
        "Sanity reigns"; 
    };

    println!("Result of computation: {:?}", result); 
}

In this case, the result will be an empty (), which is why we had to change the println! expression slightly (the {:?}); this type cannot be printed out in the regular way. Now, for the more complex multi-valued decision making; Rust has another powerful construct called match expressions, which we'll look at next.

Match expressions

Rust's match expressions are quite a joy to use. A match is basically C's switch statement on steroids: it allows you to make decisions depending on what value the variable has, and it has advanced filtering capabilities. Here's a program that uses match expressions:

// match_expression.rs

fn req_status() -> u32 {
    200
}

fn main() {
    let status = req_status();
    match status {
        200 => println!("Success"),
        404 => println!("Not Found"),
        other => {
            println!("Request failed with code: {}", other);
            // get response from cache
        }
    }
}

In the preceding code, we have a req_status function that returns a dummy HTTP request status code of 200, which we call in main and assign to status. We then match on this value using the match keyword, followed by the variable we want to check the value of (status), followed by a pair of braces. Within the braces, we write match arms, which represent the possible values that the variable being matched can take. Each match arm is written as a possible value of the variable, followed by =>, and then an expression on the right. This can be either a single-line expression or a multi-line expression within {} braces. Single-line arms need to be delimited with a comma. Also, every match arm must return the same type. In this case, each match arm returns the Unit type ().

Another nice feature, or rather guarantee, of match expressions is that we have to match exhaustively against all possible cases of the value we are matching against. In our case, this would mean listing all the numbers up to the maximum value of u32. Practically, this is not possible, so Rust allows us to cover the rest of the possibilities by using a catch-all variable (here, this is other) or an _ (underscore) if we want to ignore the value. Match expressions are a primary way to make decisions around values when you have more than one possible value, and they are very concise to write. Like if else expressions, the value of a match expression can also be assigned to a variable in a let statement that ends with a semicolon, with all match arms returning the same type.
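
Here is a sketch of a match whose value is assigned to a variable, using the underscore catch-all and a match guard as one form of the filtering mentioned earlier. The describe function and its messages are hypothetical:

```rust
// Hypothetical helper mapping a status code to a description.
fn describe(status: u32) -> String {
    // The whole match is an expression; every arm yields a &str.
    let text = match status {
        200 => "Success",
        404 => "Not Found",
        // a match guard filters further with an `if` condition
        code if code >= 500 => "Server error",
        // `_` covers every remaining value without binding it
        _ => "Unhandled status",
    };
    text.to_string()
}

fn main() {
    for status in [200, 404, 503, 302].iter() {
        println!("{}: {}", status, describe(*status));
    }
}
```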

Loops

Repeating things in Rust can be done using three constructs, namely loop, while, and for. In all of them, we have the usual continue and break keywords, which allow you to skip and break out of a loop, respectively. Here's an example of using loop, which is equivalent to C's while(true):

// loops.rs 

fn main() { 
    let mut x = 1024;
    loop { 
        if x < 0 { 
            break; 
        } 
        println!("{} more runs to go", x); 
        x -= 1; 
    } 
}

loop represents an infinite loop. In the preceding code, we simply decrement the value x until it hits the if condition x < 0, where we break out of the loop. An extra feature of using loop in Rust is being able to tag the loop block with a name. This can be used in cases where you have two or more nested loops and want to break out from any one of them and not just the loop immediately enclosing the break statement. The following is an example of using loop labels to break out of the loop:

// loop_labels.rs

fn silly_sub(a: i32, b: i32) -> i32 {
    let mut result = 0;
    'increment: loop {
        if result == a {
            let mut dec = b;
            'decrement: loop {
                if dec == 0 {
                    // breaks directly out of 'increment loop
                    break 'increment;
                } else {
                    result -= 1;
                    dec -= 1;
                }
            }
        } else {
            result += 1;
        }
    }
    result
}

fn main() {
    let a = 10;
    let b = 4;
    let result = silly_sub(a, b);
    println!("{} minus {} is {}", a, b, result);
}

In the preceding code, we are doing a very inefficient subtraction just to demonstrate the usage of labels with nested loops. In the inner 'decrement loop, when dec equals 0, we pass a label to break (here, this is 'increment) and break out of the outer 'increment loop instead of just the inner one.

Now, let's take a look at while loops. Nothing fancy here:

// while.rs 

fn main() { 
    let mut x = 1000; 
    while x > 0 { 
        println!("{} more runs to go", x); 
        x -= 1;     
    }
}

Rust also has a for keyword. It looks similar to the for loops used in other languages, but is quite different in its implementation. Rust's for is basically syntax sugar for a more powerful repetition construct known as iterators. We'll discuss them in more detail in Chapter 7, Advanced Concepts. To put it simply, for loops in Rust only work on types that can be converted into iterators. One such type is the Range type. The Range type can refer to a range of numbers, such as (0..10). Ranges can be used in for loops like so:

// for_loops.rs

fn main() {
    // does not include 10
    print!("Normal ranges: ");
    for i in 0..10 {
        print!("{},", i);
    }

    println!();       // just a newline
    print!("Inclusive ranges: ");
    // counts till 10
    for i in 0..=10 {
        print!("{},", i);
    }
}

Apart from the normal range syntax, that is, 0..10, which does not include 10, Rust also has inclusive range syntax 0..=10, which iterates all the way until 10, as can be seen in the second for loop. Now, let's move on to user-defined data types.
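
Ranges are not the only thing for can iterate over. The following sketch, with a hypothetical sum_even helper, combines an inclusive range with continue, and then loops over an array and a reversed range:

```rust
// Hypothetical helper: sums the even numbers in 0..=limit.
fn sum_even(limit: u32) -> u32 {
    let mut total = 0;
    for i in 0..=limit {
        if i % 2 != 0 {
            continue; // skip odd numbers
        }
        total += i;
    }
    total
}

fn main() {
    println!("{}", sum_even(10));

    // Arrays can be iterated as well, via iter().
    let names = ["loop", "while", "for"];
    for name in names.iter() {
        println!("{}", name);
    }

    // Ranges are iterators, so adaptor methods such as rev() apply.
    for i in (1..4).rev() {
        print!("{} ", i);
    }
    println!();
}
```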

User-defined types

As the name says, user-defined types are types that are defined by you. These can be composed of several types. They may either be a wrapper over a primitive type or a composition of several user-defined types. They come in three forms: structures, enumerations, and unions, more commonly known as structs, enums, and unions. They allow you to easily express your data. The naming convention for user-defined types follows the CamelCase style. Structs and enums are more powerful than C's structs and enums, while unions in Rust are very close to C's and are there mainly to interact with C code bases. We'll cover structs and enums in this section, while unions are covered in Chapter 7, Advanced Concepts.

Structs

In Rust, there are three forms of structs that we can declare. The simplest of them is the unit struct, which is written with the struct keyword, followed by its name and a semicolon at the end. The following code example defines a unit struct:

// unit_struct.rs

struct Dummy;

fn main() {
    let value = Dummy;
}

We have defined a unit struct called Dummy in the preceding code. In main, we can initialize this type using only its name. value now contains an instance of Dummy and is a zero sized value. Unit structs do not take any size at runtime as they have no data associated with them. There are very few use cases for unit structs. They can be used to model entities with no data or state associated with them. Another use case is to use them to represent error types, where the struct itself is sufficient to understand the error without needing a description of it. Another use case is to represent states in a state machine implementation. Next, let's look at the second form of structs.
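
The zero-size claim and the error-type use case can both be checked with a short sketch. The EmptyInput type and first_byte function are hypothetical names; Result is the standard library's type for returning either a value or an error:

```rust
use std::mem::size_of;

// A unit struct used as a data-free error type (hypothetical name).
#[derive(Debug, PartialEq)]
struct EmptyInput;

// Hypothetical helper: returns the first byte, or the unit-struct error.
fn first_byte(input: &str) -> Result<u8, EmptyInput> {
    match input.as_bytes().first() {
        Some(byte) => Ok(*byte),
        None => Err(EmptyInput),
    }
}

fn main() {
    // A unit struct carries no data, so its size is zero bytes.
    println!("size of EmptyInput: {}", size_of::<EmptyInput>());
    println!("{:?}", first_byte("abc"));
    println!("{:?}", first_byte(""));
}
```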

The second form of struct is the tuple struct, which has associated data. Here, the individual fields are not named, but are referred to by their position in the definition. Let's say you are writing a color conversion/calculation library for use in your graphics application and want to represent RGB color values in code. We can represent our Color type and the related items like so:

// tuple_struct.rs 

struct Color(u8, u8, u8);

fn main() {
    let white = Color(255, 255, 255);

    // You can pull them out by index
    let red = white.0;
    let green = white.1;
    let blue = white.2;

    println!("Red value: {}", red);
    println!("Green value: {}", green);
    println!("Blue value: {}\n", blue);

    let orange = Color(255, 165, 0);

    // You can also destructure the fields directly
    let Color(r, g, b) = orange;
    println!("R: {}, G: {}, B: {} (orange)", r, g, b);

    // Can also ignore fields while destructuring
    let Color(r, _, b) = orange;
}

In the preceding code, Color(u8, u8, u8) is a tuple struct, an instance of which is created and stored in white. We then access the individual color components in white using the white.0 syntax. Fields within the tuple struct can be accessed by the variable.<index> syntax, where the index refers to the position of the field in the struct, which starts with 0. Another way to access the individual fields of a struct is by destructuring the struct using the let statement. In the second part, we created a color, orange. Following that, we wrote the let statement with Color(r, g, b) on the left-hand side and to the right we put our orange. This results in the three fields in orange getting stored within the r, g, and b variables. The types of r, g, and b are also inferred automatically for us.

The tuple struct is an ideal choice when you need to model data that has fewer than four or five attributes. Anything more than that hinders readability and reasoning. For a data type that has more fields than that, it's recommended to use a C-like struct, which is the third form and the most commonly used one. Consider the following code:

// structs.rs

struct Player {
    name: String,
    iq: u8,
    friends: u8,
    score: u16
}

fn bump_player_score(mut player: Player, score: u16) {
    player.score += score;
    println!("Updated player stats:");
    println!("Name: {}", player.name);
    println!("IQ: {}", player.iq);
    println!("Friends: {}", player.friends);
    println!("Score: {}", player.score);
}

fn main() {
    let name = "Alice".to_string();
    let player = Player {
        name,
        iq: 171,
        friends: 134,
        score: 1129,
    };

    bump_player_score(player, 120);
}

In the preceding code, structs are declared in the same way as tuple structs, that is, by writing the struct keyword followed by the name of the struct. However, the body is enclosed in braces and the field declarations are named. Within braces, we can write fields as field: type comma-separated pairs. Creating an instance of a struct is also simple; we write Player, followed by a pair of braces, which contains comma-separated field initializations. When initializing a field from a variable that has the same name as the field name, we can use the field init shorthand feature, which is the case with the name field in the preceding code. We can then access the fields from the created instance easily by using the struct.field_name syntax. In the preceding code, we also have a function called bump_player_score, which takes the struct Player as a parameter. Function parameters are immutable by default, so when we want to modify the score of the player, we need to change the parameter to mut player in our function, which allows us to modify any of its fields. Having a mut on the struct implies mutability for all of its fields.

The advantage of using a struct rather than a tuple struct is that we can initialize the fields in any order. It also allows us to provide meaningful names to the fields. As a side note, the size of a struct is simply the sum of the sizes of its fields, along with any data alignment padding, if required. Structs don't have any extra metadata size overhead associated with them. Next, let's look at enumerations, also known as enums.
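
Both points can be observed in a small sketch. The Sample struct is hypothetical, and the exact size printed depends on the padding the compiler chooses, so the code only asserts properties that always hold:

```rust
use std::mem::{align_of, size_of};

// Hypothetical struct used only to inspect layout.
struct Sample {
    flag: u8,
    count: u32,
    level: u8,
}

fn main() {
    // Fields can be initialized in any order.
    let s = Sample { level: 3, flag: 1, count: 42 };
    println!("{} {} {}", s.flag, s.count, s.level);

    let fields = size_of::<u8>() + size_of::<u32>() + size_of::<u8>();
    // The struct is at least as large as its fields; padding may add more.
    println!("fields: {} bytes, struct: {} bytes", fields, size_of::<Sample>());
    assert!(size_of::<Sample>() >= fields);
    // The total size is always a multiple of the alignment.
    assert_eq!(size_of::<Sample>() % align_of::<Sample>(), 0);
}
```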

Enums

When you need to model something that can be of different kinds, enums are the way to go. They are created using the enum keyword, followed by the name of the enum, followed by a pair of braces. Within braces, we can write all the possibilities of the type, which are called variants. These variants can be defined with or without data contained in them, and the data contained can be any primitive type, structs, tuple structs, or even an enum. However, in the recursive case, where you have an enum, Foo, and also a variant that holds Foo, the inner Foo needs to be behind a pointer type (Box, Rc, and so on) to avoid a recursively infinite type definition. Because enums can also be created on the stack, they need to have a predetermined size, and infinite type definitions make it impossible to determine the size at compile time. Now, let's take a look at how to create one:

// enums.rs

#[derive(Debug)]
enum Direction {
    N, 
    E, 
    S, 
    W
}

enum PlayerAction {
    Move {
        direction: Direction,
        speed: u8
    },
    Wait, 
    Attack(Direction)   
}

fn main() {
    let simulated_player_action = PlayerAction::Move {
        direction: Direction::N,
        speed: 2,
    };
    match simulated_player_action {
        PlayerAction::Wait => println!("Player wants to wait"),
        PlayerAction::Move { direction, speed } => {
          println!("Player wants to move in direction {:?} with speed {}",
                direction, speed)
        }
        PlayerAction::Attack(direction) => {
            println!("Player wants to attack direction {:?}", direction)
        }
    };
}

The preceding code defines two enum types: Direction and PlayerAction. We then create an instance of them by choosing any variant, such as Direction::N or PlayerAction::Wait using the double colon :: in between. Note that we can't have something like an uninitialized enum, and it needs to be one of the variants. Given an enum value, to see what variant an enum instance has, we use pattern matching by using match expressions. When we match on enums, we can directly destructure the contents of the variants by putting variables in place of fields such as direction in PlayerAction::Attack(direction), which in turn means that we can use them inside our match arms.

The Direction enum needs a #[derive(Debug)] annotation. This is an attribute, and it allows Direction instances to be printed using the {:?} format string in println!(). This is done by generating methods from a trait called Debug. Without the attribute, the compiler reports that Direction does not implement Debug and suggests how to fix it.

From a functional programmer's perspective, structs and enums are also known as Algebraic Data Types (ADTs) because the possible range of values they can represent can be expressed using the rules of algebra. For instance, an enum is called a sum type because the range of values that it can hold is basically the sum of the range of values of its variants, while a struct is called a product type because its range of possible values is the cartesian product of its individual fields' ranges of values. We'll sometimes refer to them as ADTs when talking about them in general.
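
The recursive case described at the start of this section, where a variant holds the enum itself behind a pointer, can be sketched as follows. The List type and sum function are hypothetical names chosen for illustration:

```rust
// Hypothetical recursive enum: a singly linked list of numbers.
enum List {
    Empty,
    // Box gives the variant a fixed, pointer-sized field,
    // so the size of List is known at compile time.
    Node(u32, Box<List>),
}

// Walks the list and adds up the values.
fn sum(list: &List) -> u32 {
    match list {
        List::Empty => 0,
        List::Node(value, rest) => *value + sum(rest),
    }
}

fn main() {
    let list = List::Node(1, Box::new(List::Node(2, Box::new(List::Empty))));
    println!("sum = {}", sum(&list));
}
```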

Functions and methods on types

Types without behavior can be limiting, and it's often the case that we want to have functions or methods on types so that we can return new instances of them rather than constructing them manually, or so that we have the ability to manipulate the fields of a user-defined type. We can do this via impl blocks, which is read as providing implementations for a type. We can provide implementations for all user-defined types or any wrapper type. First, let's take a look at how to write implementations for a struct.

Impl blocks on structs

We can add behavior to our previously defined Player struct with two functionalities: a constructor-like function that takes a name and sets default values for the remaining fields in Player, and getter and setter methods for the friend count of Player:

// struct_methods.rs

struct Player {
    name: String,
    iq: u8,
    friends: u8
}

impl Player {
    fn with_name(name: &str) -> Player {
        Player {
            name: name.to_string(),
            iq: 100,
            friends: 100
        }
    }

    fn get_friends(&self) -> u8 {
        self.friends
    }

    fn set_friends(&mut self, count: u8) {
        self.friends = count;
    }
}

fn main() {
    let mut player = Player::with_name("Dave");
    player.set_friends(23);
    println!("{}'s friends count: {}", player.name, player.get_friends());
    // another way to call instance methods.
    let _ = Player::get_friends(&player);
}

We use the impl keyword, followed by the type we are implementing the methods for, followed by braces. Within braces, we can write two kinds of methods:

  • Associated methods: Methods without self as their first parameter. The with_name method is an associated method because it does not take self. It is similar to a static method in object-oriented languages. These methods are available on the type itself and do not need an instance of the type to invoke them. Associated methods are invoked by prefixing the method name with the struct name and double colons, like so:
      Player::with_name("Dave");
  • Instance methods: Methods that take self as their first parameter. The self symbol here is similar to self in Python and refers to the instance on which the method is called (here, an instance of Player). Therefore, the get_friends() method can only be called on already created instances of the struct:
      let player = Player::with_name("Dave");
      player.get_friends();

If we were to call get_friends with the associated method syntax and no argument, that is, Player::get_friends(), the compiler complains (error E0061) that the function takes one argument but none was supplied.

The error is misleading here, but it indicates that instance methods are basically associated methods with self as the first parameter and that instance.foo() is a syntax sugar. This means that we can call it like this, too: Player::get_friends(&player);. In this invocation, we pass the method an instance of Player, that is, &self is &player.

There are three variants of instance methods that we can implement on types:

  • self as the first parameter. In this case, calling this method won't allow you to use the type later.
  • &self as the first parameter. This method only provides read access to the instance of a type.
  • &mut self as the first parameter. This method provides mutable access to the instance of a type.

Our set_friends method is a &mut self method, which allows us to mutate the fields of player. We need the & operator before mut self, meaning that self is borrowed for the duration of the method, which is exactly what we want here. Without the ampersand, the caller would move ownership into the method, which means that the value would get deallocated after the method returns and we would not get to use our Player instance anymore. Don't worry if the terms move and borrow do not make sense yet, as we explain all of this in Chapter 5, Memory Management and Safety.
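
The three receiver forms can be seen side by side in the following sketch. The Counter type and its methods are hypothetical and are not part of the Player example:

```rust
// Hypothetical type used to show the three receiver forms.
struct Counter {
    hits: u32,
}

impl Counter {
    // &self: read-only access to the instance
    fn hits(&self) -> u32 {
        self.hits
    }

    // &mut self: mutable access to the instance
    fn record(&mut self) {
        self.hits += 1;
    }

    // self: takes ownership; the Counter cannot be used afterwards
    fn finish(self) -> u32 {
        self.hits
    }
}

fn main() {
    let mut counter = Counter { hits: 0 };
    counter.record();
    counter.record();
    println!("so far: {}", counter.hits());
    let total = counter.finish();
    println!("total: {}", total);
    // counter.hits(); // would not compile: `counter` was moved by finish()
}
```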

Now, onto implementations for enums.

Impl blocks for enums

We can also provide implementations for enums. For example, consider a payments library built in Rust, which exposes a single API called pay:

// enum_methods.rs

enum PaymentMode {
    Debit,
    Credit,
    Paypal
}

// Bunch of dummy payment handlers

fn pay_by_credit(amt: u64) {
    println!("Processing credit payment of {}", amt);
}
fn pay_by_debit(amt: u64) {
    println!("Processing debit payment of {}", amt);
}
fn paypal_redirect(amt: u64) {
    println!("Redirecting to paypal for amount: {}", amt);
}

impl PaymentMode {
    fn pay(&self, amount: u64) {
        match self {
            PaymentMode::Debit => pay_by_debit(amount),
            PaymentMode::Credit => pay_by_credit(amount),
            PaymentMode::Paypal => paypal_redirect(amount)
        }
    }
}

fn get_saved_payment_mode() -> PaymentMode {
    PaymentMode::Debit
}

fn main() {
    let payment_mode = get_saved_payment_mode();
    payment_mode.pay(512);
}

The preceding code has a function called get_saved_payment_mode(), which returns a user's saved payment mode. This can be Debit, Credit, or Paypal. This is best modeled as an enum, where different payment methods can be added as its variants. The library then provides us with a single pay() method to which we can conveniently provide an amount to pay. This method determines which variant of the enum it is and dispatches to the correct payment service provider accordingly, without the library consumer worrying about checking which payment method to use.

Enums are also widely used for modeling state machines, and when combined with match expressions, they make state transition code very concise to write. They are also used to model custom error types. When enum variants don't have any data associated with them, they can be used like C enums, where the variants implicitly have integer values starting with 0, but can also be manually tagged with integer (isize) values. This is useful when interacting with foreign C libraries.
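
A brief sketch of both ideas follows: C-like enums with implicit and explicit integer values, and a tiny state machine written with match. The HttpStatus and Light enums and the next function are hypothetical:

```rust
// Hypothetical C-like enum with explicit integer values.
#[derive(Debug, Clone, Copy)]
enum HttpStatus {
    Success = 200,
    NotFound = 404,
}

// Without explicit values, numbering starts at 0.
#[derive(Debug, Clone, Copy)]
enum Light {
    Red,
    Green,
}

// A tiny state machine: each call returns the next state.
fn next(light: Light) -> Light {
    match light {
        Light::Red => Light::Green,
        Light::Green => Light::Red,
    }
}

fn main() {
    // `as` converts a data-free variant to its integer value.
    println!("{} {}", HttpStatus::Success as i32, HttpStatus::NotFound as i32);
    println!("{}", Light::Green as i32);
    println!("{:?}", next(Light::Red));
}
```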

Modules, imports, and use statements

Languages often provide a way to split large code bases into multiple files to manage complexity. Java follows the convention of a single public class per .java file, while C++ provides us with header files and include statements. Rust is no different and provides us with modules. Modules are a way to namespace or organize code in a Rust program. To allow flexibility in organizing our code, there are multiple ways to create modules. Modules are a complex topic to understand and to make it brief for this section, we'll highlight only the important aspects about using them. Modules are covered in detail in Chapter 2, Managing Projects with Cargo. The following are the key takeaways about modules in Rust:

  • Every Rust program needs to have a root module. In executables, it is usually the main.rs file, and for libraries, it is lib.rs.
  • Modules can be declared within other modules or can be organized as files and directories.
  • To let the compiler know about our module, we need to declare it using the mod keyword, as in mod my_module;, in our root module.
  • To refer to an item within a module without spelling out its full path every time, we use the use keyword, along with the path to the item. This is known as bringing the item into scope.
  • Items defined within modules are private by default, and you need to use the pub keyword to expose them to their consumers.
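The takeaways above can be sketched in a single file using inline modules. The greetings module and its functions are hypothetical names made up for this illustration; modules that live in separate files are declared with mod my_module; instead of a body in braces:

```rust
// modules_sketch.rs (illustrative file name)

// An inline module declared inside the root module.
mod greetings {
    // Private by default: only visible inside `greetings`.
    fn prefix() -> &'static str {
        "Hello"
    }

    // `pub` exposes the item to consumers of the module.
    pub fn greet(name: &str) -> String {
        format!("{}, {}!", prefix(), name)
    }

    // Modules can be declared within other modules.
    pub mod formal {
        pub fn greet(name: &str) -> String {
            format!("Good day, {}.", name)
        }
    }
}

// Bring items into scope with `use`.
use greetings::formal;
use greetings::greet;

fn main() {
    println!("{}", greet("Rust"));
    println!("{}", formal::greet("Rust"));
    // greetings::prefix(); // would not compile: `prefix` is private
}
```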

That was modules in brief. Some of the advanced aspects of modules are also covered in Chapter 7, Advanced Concepts. Next, let's look at the commonly used collection types that are available in the standard library.

Collections

It's often the case that your program has to process more than one instance of data. For that, we have collection types. Depending on what you want and where your data resides in memory, Rust provides many kinds of built-in types to store a collection of data. First, we have arrays and tuples. Then, we have dynamic collection types in the standard library, of which we'll cover the most commonly used ones, that is, vectors (list of items) and maps (key/value items). Then, we also have references to collection types, called slices, which are basically a view into a contiguous piece of data owned by some other variable. Let's start with arrays first.

Arrays

Arrays have a fixed length and store items of the same type. They are denoted by [T; N], where T is any type and N is the number of elements in the array. The size of the array cannot be a runtime variable; it has to be a usize value known at compile time, such as a literal or a constant:

// arrays.rs

fn main() { 
    let numbers: [u8; 10] = [1, 2, 3, 4, 5, 7, 8, 9, 10, 11]; 
    let floats = [0.1f64, 0.2, 0.3]; 

    println!("Number: {}", numbers[5]);
    println!("Float: {}", floats[2]);
}

In the preceding code, we declared an array, numbers, which contains 10 elements for which we specified the type on the left. In the second array, floats, we specified the type as a suffix to the first item of the array, that is, 0.1f64. This is another way to specify types. Next, let's look at tuples.
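To illustrate that the length is part of an array's type, here is a small sketch that uses a constant for the size. The SIZE constant and the sum function are assumptions introduced for this example only:

```rust
// array_basics.rs (illustrative file name)

// The length is part of the type and must be known at compile time.
const SIZE: usize = 4;

// Only arrays of exactly SIZE elements are accepted here.
fn sum(values: [u32; SIZE]) -> u32 {
    let mut total = 0;
    for value in values.iter() {
        total += value;
    }
    total
}

fn main() {
    // `[value; N]` creates an array of N copies of `value`.
    let zeros = [0u32; SIZE];
    let squares: [u32; SIZE] = [1, 4, 9, 16];

    println!("zeros: {:?}, length: {}", zeros, zeros.len());
    println!("sum of squares: {}", sum(squares));
}
```

Passing an array of any other length to sum is a compile-time type error rather than a runtime failure.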

Tuples

Tuples differ from arrays in the way that elements of an array have to be of the same type, while items in a tuple can be a mix of types. They are heterogeneous collections and are useful for storing distinct types together. They can also be used when returning multiple values from a function. Consider the following code that uses tuples:

// tuples.rs

fn main() { 
    let num_and_str: (u8, &str) = (40, "Have a good day!");
    println!("{:?}", num_and_str);
    let (num, string) = num_and_str;
    println!("From tuple: Number: {}, String: {}", num, string);
}

In the preceding code, num_and_str is a tuple of two items, (u8, &str). We can also extract values from an already declared tuple into individual variables. After printing the tuple, we destructure it on the next line into the num and string variables, and their types are inferred automatically. That's pretty neat.
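Returning multiple values from a function, as mentioned above, might look like the following sketch. The div_rem function is a hypothetical helper written for this illustration:

```rust
// tuple_return.rs (illustrative file name)

// Returns the quotient and the remainder together as a tuple.
fn div_rem(dividend: u32, divisor: u32) -> (u32, u32) {
    (dividend / divisor, dividend % divisor)
}

fn main() {
    // Destructure the returned tuple into two variables.
    let (quotient, remainder) = div_rem(17, 5);
    println!("17 / 5 = {} remainder {}", quotient, remainder);

    // Fields can also be accessed by position with .0, .1, and so on.
    let result = div_rem(9, 3);
    println!("9 / 3 = {} remainder {}", result.0, result.1);
}
```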

Vectors

Vectors are like arrays, except that their content or length doesn't need to be known in advance and can grow on demand. They are allocated on the heap. They can be created by either calling the Vec::new constructor or by using the vec![] macro:

// vec.rs

fn main() {
    let mut numbers_vec: Vec<u8> = Vec::new(); 
    numbers_vec.push(1); 
    numbers_vec.push(2); 

    let mut vec_with_macro = vec![1]; 
    vec_with_macro.push(2);
    let _ = vec_with_macro.pop();    // value ignored with `_`

    let message = if numbers_vec == vec_with_macro {
        "They are equal"
    } else {
        "Nah! They look different to me"
    };

    println!("{} {:?} {:?}", message, numbers_vec, vec_with_macro); 
}

In the preceding code, we created two vectors, numbers_vec and vec_with_macro, in different ways. We can push elements to our vector using the push() method and can remove elements using pop(). There are more methods for you to explore if you go to their documentation page: https://doc.rust-lang.org/std/vec/struct.Vec.html . Vectors can also be iterated using the for loop syntax, as they implement the IntoIterator trait, which converts them into an iterator.
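As a rough sketch of iterating a vector with for, the following code shows the three common forms: by shared reference, by mutable reference, and by value. The total and doubled functions are hypothetical helpers for this example:

```rust
// vec_iteration.rs (illustrative file name)

// Reads every element through a shared reference.
fn total(values: &[u32]) -> u32 {
    let mut sum = 0;
    for value in values {
        sum += value;
    }
    sum
}

// Modifies every element in place through a mutable reference.
fn doubled(mut values: Vec<u32>) -> Vec<u32> {
    for value in values.iter_mut() {
        *value *= 2;
    }
    values
}

fn main() {
    let numbers = vec![1, 2, 3];
    // A reference to a Vec<u32> can be passed where &[u32] is expected.
    println!("total: {}", total(&numbers));

    let numbers = doubled(numbers);
    // Iterating by value consumes the vector; it can't be used afterward.
    for value in numbers {
        println!("{}", value);
    }
}
```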

Hashmaps

Rust also provides us with maps, which can be used to store key-value data. They come from the std::collections module and are named HashMap. They are created with the HashMap::new constructor function:

// hashmaps.rs

use std::collections::HashMap; 

fn main() { 
    let mut fruits = HashMap::new(); 
    fruits.insert("apple", 3);
    fruits.insert("mango", 6);
    fruits.insert("orange", 2);
    fruits.insert("avocado", 7);
    for (k, v) in &fruits {
        println!("I got {} {}", v, k);
    }

    fruits.remove("orange");
    let old_avocado = fruits["avocado"];
    fruits.insert("avocado", old_avocado + 5);
    println!("\nI now have {} avocados", fruits["avocado"]);
}

In the preceding code, we created a new HashMap called fruits. We then insert some fruits into the map, along with their counts, using the insert method. Following that, we iterate over the key-value pairs using a for loop, taking a reference to the map with &fruits because we only want read access to the keys and values. Without the reference, the for loop would consume the map. Each iteration yields a two-field tuple, (k, v). There are also separate keys() and values() methods for iterating over just the keys or just the values, respectively. By default, HashMap currently hashes its keys with SipHash, which can be replaced with a custom hasher depending on the use case and performance requirements. The table layout itself is an implementation detail: the standard library used Robin Hood open addressing when this book was written, and releases since Rust 1.36 use a SwissTable-based design.
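The keys() and values() methods mentioned above can be sketched as follows, together with get() and entry(), which are two other HashMap methods that avoid the panic that indexing with a missing key causes. The stock and total functions are hypothetical helpers for this example:

```rust
// hashmap_methods.rs (illustrative file name)

use std::collections::HashMap;

// Builds a small map of fruit names to counts.
fn stock() -> HashMap<&'static str, u32> {
    let mut fruits = HashMap::new();
    fruits.insert("apple", 3);
    fruits.insert("mango", 6);
    fruits
}

// Sums the counts by iterating over the values only.
fn total(fruits: &HashMap<&str, u32>) -> u32 {
    fruits.values().sum()
}

fn main() {
    let mut fruits = stock();

    // Iteration order is arbitrary, so we sort the keys before printing.
    let mut names: Vec<&str> = fruits.keys().copied().collect();
    names.sort();
    println!("Fruits: {:?}", names);
    println!("Total count: {}", total(&fruits));

    // get() returns an Option instead of panicking on a missing key.
    match fruits.get("kiwi") {
        Some(count) => println!("I have {} kiwis", count),
        None => println!("No kiwis here"),
    }

    // entry() inserts a default if the key is absent, then lets us update it.
    *fruits.entry("apple").or_insert(0) += 5;
    println!("I now have {} apples", fruits["apple"]);
}
```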

Next, let's look at slices.

Slices

Slices are a generic way to get a view into a collection type. Most often, they are used to get read-only access to a certain range of items in a collection. A slice is basically a pointer or a reference that points to a contiguous range in an existing collection that's owned by some other variable. Under the hood, slices are fat pointers to existing data somewhere on the stack or the heap. A fat pointer carries the number of elements it points to, along with the pointer to the data.

Slices are denoted by &[T], where T is any type. They are quite similar to arrays in terms of usage:

// slices.rs

fn main() {
    let mut numbers: [u8; 4] = [1, 2, 3, 4];
    {
        let all: &[u8] = &numbers[..];
        println!("All of them: {:?}", all);
    }

    {
        let first_two: &mut [u8] = &mut numbers[0..2];
        first_two[0] = 100;
        first_two[1] = 99;
    }

    println!("Look ma! I can modify through slices: {:?}", numbers);
}

In the preceding code, we have an array, numbers, which is a stack-allocated value. We then take a slice into the array using the &numbers[..] syntax and store it in all, which has the type &[u8]. The [..] at the end means that we want to take a full slice of the collection. We need the & here as we can't have slices as bare values – only behind a pointer. This is because slices are unsized types. We'll cover them in detail in Chapter 7, Advanced Concepts. We can also provide a range, such as [0..2], to get a slice of only part of the collection. Slices can also be mutably acquired. first_two is a mutable slice through which we can modify the original numbers array.

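The astute reader may have noticed the extra pairs of braces around the code that takes slices. They isolate the immutable borrow of numbers from the mutable one, because the two must not be alive at the same time. Compilers that predate non-lexical lifetimes (which became the default with the Rust 2018 edition) reject the code without the braces; newer compilers accept it either way, since all is no longer used by the time first_two is created. These concepts will be made clearer to you in Chapter 5, Memory Management and Safety.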

Note: The &str type is also a slice type: under the hood, it is a byte slice (&[u8]). The only distinction from other byte slices is that its contents are guaranteed to be valid UTF-8. Slices can also be taken from Vecs and Strings.
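Taking slices from a Vec or a String might look like the following sketch. The sum_slice and first_word functions are hypothetical helpers for this example; because they accept slices, they work the same way on arrays, vectors, and strings:

```rust
// slices_on_collections.rs (illustrative file name)

// Accepts a view into any contiguous run of u32 values.
fn sum_slice(values: &[u32]) -> u32 {
    values.iter().sum()
}

// Returns a string slice that points into the original text.
fn first_word(text: &str) -> &str {
    text.split(' ').next().unwrap_or("")
}

fn main() {
    let array = [1u32, 2, 3];
    let vector = vec![4u32, 5, 6, 7];

    // The same function works on an array and on part of a Vec.
    println!("array sum: {}", sum_slice(&array));
    println!("middle of vector: {}", sum_slice(&vector[1..3]));

    // String slices use byte offsets, which must fall on character boundaries.
    let greeting = String::from("hello world");
    let hello: &str = &greeting[0..5];
    println!("{} / {}", hello, first_word(&greeting));
}
```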

Next, let's look at iterators.

Iterators

An iterator is a construct that provides an efficient way to act on the elements of collection types. They are not a new concept, though. In many imperative languages, they are implemented as objects that are constructed from collection types such as lists or maps. For instance, Python's iter(some_list) or C++'s vector.begin() are ways to construct iterators from an existing collection. The main motivation for iterators is that they provide a higher-level abstraction for walking through the items of a collection than manually indexed loops, which are very much prone to off-by-one errors. Another advantage is that iterators are lazy and do not read the whole collection into memory. By lazy, we mean that the iterator only evaluates or accesses an element in a collection when needed. Iterators can also be chained with multiple transformation operations, such as filtering elements based on a condition, and do not evaluate the transformations until you need the results. To access items when you need them, iterators provide a next() method, which tries to read the next item from the collection. It is each call to next() that makes the iterator evaluate the chain of computation for one more item.

In Rust, an iterator is any type that implements the Iterator trait. Such a type can then be used in a for loop to walk over its items. Most standard library collection types, such as Vec, HashMap, and BTreeMap, provide iterators, and you can also implement the trait for your own types.

Note: It only makes sense to implement the Iterator trait if the type has collection-like semantics. For instance, it doesn't make sense to implement the Iterator trait for the () unit type.

Iterators are frequently used whenever we are dealing with collection types in Rust. In fact, Rust's for loop is desugared into a normal match expression with next calls on the object being iterated over. Also, we can convert most collection types into an iterator by calling iter() or into_iter() on them. That's enough information on iterators – now, we can tackle the following exercise. We'll go deep into iterators and implement one ourselves in Chapter 7, Advanced Concepts.
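Here is a minimal sketch of both ideas: a lazy chain of transformations, and a loop that calls next() by hand, which is roughly what a for loop does for us. The even_squares and sum_with_next functions are hypothetical helpers for this example:

```rust
// iterator_basics.rs (illustrative file name)

// Chains two lazy adapters; nothing runs until collect() asks for items.
fn even_squares(values: &[u32]) -> Vec<u32> {
    values
        .iter()
        .filter(|n| *n % 2 == 0) // keep even numbers only
        .map(|n| n * n)          // square each remaining number
        .collect()
}

// Walks the iterator manually, roughly mirroring what a for loop does.
fn sum_with_next(values: &[u32]) -> u32 {
    let mut iter = values.iter();
    let mut total = 0;
    loop {
        match iter.next() {
            Some(n) => total += n, // an item was available
            None => break,         // the iterator is exhausted
        }
    }
    total
}

fn main() {
    let numbers = vec![1, 2, 3, 4, 5, 6];
    println!("even squares: {:?}", even_squares(&numbers));
    println!("sum: {}", sum_with_next(&numbers));
}
```

In everyday code, you would write the second function with a for loop or with sum(); the explicit loop is only there to show the calls to next().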

 

 

 

Exercise – fixing the word counter


Armed with the basics, it's time to put our knowledge to use! Here, we have a program that counts instances of words in a text file, which is passed to it as an argument. It's almost complete, but has a few bugs that the compiler catches and a couple of subtle ones. Here's our incomplete program:

// word_counter.rs

use std::env;
use std::fs::File;
use std::io::prelude::BufRead;
use std::io::BufReader;

#[derive(Debug)]
struct WordCounter(HashMap<String, u64>);

impl WordCounter {
    fn new() -> WordCounter {
        WordCounter(HashMap::new());
    }

    fn increment(word: &str) {
        let key = word.to_string();
        let count = self.0.entry(key).or_insert(0);
        *count += 1;
    }

    fn display(self) {
        for (key, value) in self.0.iter() {
            println!("{}: {}", key, value);
        }
    }
}

fn main() {
    let arguments: Vec<String> = env::args().collect();
    let filename = arguments[1];
    println!("Processing file: {}", filename);

    let file = File::open(filenam).expect("Could not open file");
    let reader = BufReader::new(file);

    let mut word_counter = WordCounter::new();

    for line in reader.lines() {
        let line = line.expect("Could not read line");
        let words = line.split(" ");
        for word in words {
            if word == "" {
                continue
            } else {
                word_counter.increment(word);
            }
        }
    }

    word_counter.display();
}

Go ahead and type the program into a file, then try to compile it and fix all the bugs with the help of the compiler. Fix one bug at a time and get feedback from the compiler by recompiling the code. The point of this exercise, in addition to covering the topics of this chapter, is to make you more comfortable with the error messages from the compiler, which is an important step in getting to know the compiler and how it analyzes your code. You might also be surprised to see how smart the compiler is at helping you remove errors from the code.

 

Summary


We covered so many topics in this chapter. We got to know a bit about the history of Rust and the motivations behind the language. We had a brief walkthrough on its design principles and the basic features of the language. We also got a glimpse of how Rust provides rich abstractions through its type system. We learned how to install the language toolchain, and how to use rustc to build and run trivial example programs.

In the next chapter, we'll take a look at the standard way of building Rust applications and libraries using its dedicated package manager, and also set up our Rust development environment with a code editor, which will provide the foundation for all the subsequent exercises and projects in this book.

About the Authors

  • Rahul Sharma

    Rahul Sharma is passionately curious about teaching programming. He has been writing software for the last two years. He got started with Rust with his work on Servo, a browser engine by Mozilla Research as part of his GSoC project. At present, he works at AtherEnergy, where he is building resilient cloud infrastructure for smart scooters. His interests include systems programming, distributed systems, compilers and type theory. He is also an occasional contributor to the Rust language and does mentoring of interns on the Servo project by Mozilla.

  • Vesa Kaihlavirta

    Vesa Kaihlavirta has been programming since he was five, beginning with C64 Basic. His main professional goal in life is to increase awareness of programming languages and software quality in all industries that use software. He's an Arch Linux Developer Fellow, and has been working in the telecom and financial industry for a decade. Vesa lives in Jyvaskyla, central Finland.

  • Claus Matzinger

    Claus Matzinger is a software engineer with a very diverse background. After working in a small company maintaining code for embedded devices, he joined a large corporation to work on legacy Smalltalk applications. This led to a great interest in programming languages early on, and Claus became the CTO for a health games start-up based on Scala technology. Since then, Claus' roles have shifted toward customer-facing roles in the IoT database-technology start-up crate.io and, most recently, Microsoft. There, he hosts a podcast, writes code together with customers, and blogs about the solutions arising from these engagements. For more than 5 years, Claus has implemented software to help customers innovate, achieve, and maintain success.
