Rust Standard Library Cookbook

By Jan Nils Ferner , Daniel Durante

About this book

Mozilla’s Rust is gaining much attention with amazing features and a powerful library. This book will take you through varied recipes to teach you how to leverage the Standard library to implement efficient solutions.

The book begins with a brief look at the basic modules of the Standard library and collections. From here, the recipes will cover packages that support file/directory handling and interaction through parsing. You will learn about packages related to advanced data structures, error handling, and networking. You will also learn to work with futures and experimental nightly features. The book also covers the most relevant external crates in Rust.

By the end of the book, you will be proficient at using the Rust Standard library.

Publication date:
March 2018
Publisher
Packt
Pages
360
ISBN
9781788623926

 

Chapter 1. Learning the Basics

In this chapter, we will cover the following recipes:

  • Concatenating strings
  • Using the format! macro
  • Providing a default implementation
  • Using the constructor pattern
  • Using the builder pattern
  • Parallelism through simple threads
  • Generating random numbers
  • Querying with regexes
  • Accessing the command line
  • Interacting with environment variables
  • Reading from stdin
  • Accepting a variable number of arguments
 

Introduction


There are some code snippets and patterns of thought that prove time and again to be the bread and butter of a certain programming language. We will start this book by looking at a handful of such techniques in Rust. They are so quintessential for elegant and flexible code that you will use at least some of them in just about any project you tackle.

The next chapters will then build on this foundation and work hand in hand with Rust's zero-cost abstractions, which are as powerful as the ones in higher-level languages. We are also going to look at the intricate inner aspects of the standard library and implement our own similar constructs with the help of fearless concurrency and careful use of unsafe blocks, which enable us to work at the same low level that some systems languages, such as C, operate at.

 

Concatenating strings


String manipulation is typically a bit less straightforward in system programming languages than in scripting languages, and Rust is no exception. There are multiple ways to do it, all managing the involved resources differently.

Getting ready

We will assume for the rest of the book that you have an editor open, the newest Rust compiler ready, and a command line available. As of the time of writing, the newest version is 1.24.1. Because of Rust's strong guarantees about backward compatibility, you can rest assured that all of the recipes shown (with the exception of Chapter 10, Using Experimental Nightly Features) are always going to work the same way. You can download the newest compiler with its command-line tools at https://www.rustup.rs.

How to do it...

  1. Create a Rust project to work on during this chapter with cargo new chapter-one
  2. Navigate to the newly created chapter-one folder. For the rest of this chapter, we will assume that your command line is currently in this directory
  3. Inside the src folder, create a new folder called bin
  4. Delete the generated lib.rs file, as we are not creating a library
  5. In the src/bin folder, create a file called concat.rs
  6. Add the following code and run it with cargo run --bin concat:
1  fn main() {
2   by_moving();
3   by_cloning();
4   by_mutating();
5  }
6
7  fn by_moving() {
8   let hello = "hello ".to_string();
9   let world = "world!";
10
11   // Moving hello into a new variable
12   let hello_world = hello + world;
13   // Hello CANNOT be used anymore
14   println!("{}", hello_world); // Prints "hello world!"
15  }
16
17  fn by_cloning() {
18   let hello = "hello ".to_string();
19   let world = "world!";
20
21   // Creating a copy of hello and moving it into a new variable
22   let hello_world = hello.clone() + world;
23   // Hello can still be used
24   println!("{}", hello_world); // Prints "hello world!"
25  }
26
27  fn by_mutating() {
28   let mut hello = "hello ".to_string();
29   let world = "world!";
30
31   // hello gets modified in place
32   hello.push_str(world);
33   // hello is both usable and modifiable
34   println!("{}", hello); // Prints "hello world!"
35  }

How it works...

In all functions, we start by allocating memory for a string of variable length. We do this by creating a string slice (&str) and applying the to_string function on it [8, 18 and 28]. The first way to concatenate strings in Rust, as shown in the by_moving function, is by taking said allocated memory and moving it, together with an additional string slice, into a new variable [12]. This has a couple of advantages:

  • It's very straightforward and clear to look at, as it follows the common programming convention of concatenating with the + operator
  • It uses only immutable data. Remember to always try to write code in a style that creates as little stateful behavior as possible, as it results in more robust and reusable code bases
  • It reuses the memory allocated by hello [8], which makes it very performant

As such, this way of concatenating should be preferred whenever possible. So, why would we even list other ways to concatenate strings? Well, I'm glad you asked, dear reader. Although elegant, this approach comes with two downsides:

  • hello is no longer usable after line [12], as it was moved. This means you can no longer read it in any way
  • Sometimes you may actually prefer mutable data in order to use state in small, contained environments

The two remaining functions address one concern each. by_cloning [17] looks nearly identical to the first function, but it clones the allocated string [22] into a temporary object, allocating new memory in the process, which it then moves, leaving the original hello untouched and still accessible. Of course, this comes at the price of redundant memory allocations at runtime.

by_mutating [27] is the stateful way of solving our problem. It performs the involved memory management in-place, which means that the performance should be the same as in by_moving. In the end, it leaves hello mutable, ready for further changes. You may notice that this function doesn't look as elegant as the others, as it doesn't use the + operator. This is intentional, as Rust tries to push you through its design towards moving data in order to create new variables without mutating existing ones. As mentioned before, you should only do this if you really need mutable data or want to introduce state in a very small and manageable context.
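
The three approaches can also be compared side by side when they are written as functions that hand their result back to the caller. The following is only a minimal sketch: the helper names concat_by_moving, concat_by_cloning, and concat_by_mutating are hypothetical and do not appear in the recipe.

```rust
// Hypothetical helpers mirroring the three functions of the recipe,
// rewritten to return their result so that it can be inspected.

// Takes ownership of `hello`; the caller cannot use it afterwards
fn concat_by_moving(hello: String, world: &str) -> String {
    hello + world
}

// Only borrows `hello`; the copy made by to_owned allocates new memory
fn concat_by_cloning(hello: &str, world: &str) -> String {
    hello.to_owned() + world
}

// Modifies the caller's string in place
fn concat_by_mutating(hello: &mut String, world: &str) {
    hello.push_str(world);
}

fn main() {
    let moved = concat_by_moving("hello ".to_string(), "world!");
    println!("{}", moved);

    let original = "hello ".to_string();
    let cloned = concat_by_cloning(&original, "world!");
    // original is still usable here
    println!("{}| {}", original, cloned);

    let mut mutated = "hello ".to_string();
    concat_by_mutating(&mut mutated, "world!");
    println!("{}", mutated);
}
```

The signatures already tell the story: a String parameter consumes its argument, a &str parameter leaves it alone, and a &mut String parameter announces that the argument is going to change.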

 

Using the format! macro


There is an additional way to combine strings, which can also be used to combine them with other data types, such as numbers.

How to do it...

  1. In the src/bin folder, create a file called format.rs
  2. Add the following code and run it with cargo run --bin format:
1  fn main() {
2    let colour = "red";
3    // The '{}' in the format string gets replaced by the parameter
4    let favourite = format!("My favourite colour is {}", colour);
5    println!("{}", favourite);
6     
7    // You can add multiple parameters, which will be
8    // put in place one after another
9    let hello = "hello ";
10   let world = "world!";
11   let hello_world = format!("{}{}", hello, world);
12   println!("{}", hello_world); // Prints "hello world!"
13     
14   // format! can concatenate any data types that
15   // implement the 'Display' trait, such as numbers
16   let favourite_num = format!("My favourite number is {}", 42);
17   println!("{}", favourite_num); // Prints "My favourite number is 42"
18     
19   // If you want to include certain parameters multiple times
20   // into the string, you can use positional parameters
21   let duck_duck_goose = format!("{0}, {0}, {0}, {1}!", "duck", "goose");
22   println!("{}", duck_duck_goose); // Prints "duck, duck, duck, goose!"
23     
24   // You can even name your parameters!
25   let introduction = format!(
26     "My name is {surname}, {forename} {surname}",
27     surname="Bond",
28     forename="James"
29   );
30   println!("{}", introduction); // Prints "My name is Bond, James Bond"
31  }

How it works...

The format! macro combines strings by accepting a format string filled with formatting parameters (for example, {}, {0}, or {foo}) and a list of arguments, which are then inserted into the placeholders. Let's look at this using the example in line [16]:

format!("My favourite number is {}", 42);

Let's break down the preceding line of code:

  • "My favourite number is {}" is the format string
  • {} is the formatting parameter
  • 42 is the argument

As demonstrated, format! works not only with strings, but also with numbers. In fact, it works with all types that implement the Display trait. This means that, by providing such an implementation yourself, you can easily make your own data structures printable however you want.

By default, format! replaces one parameter after another. If you want to override this behavior, you can use positional parameters like {0} [21]. Positions are zero-indexed, so the behavior is straightforward: {0} gets replaced by the first argument, {1} gets replaced by the second, and so on. At times, this can become a bit unwieldy when using a lot of parameters. For this purpose, you can use named arguments [26], just like in Python. Keep in mind that all of your unnamed arguments have to be placed before your named ones. For example, the following is invalid:

format!("{message} {}", message="Hello there,", "friendo")

It should be rewritten as:

format!("{message} {}", "friendo", message="Hello there,")
 // Returns "Hello there, friendo"
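
To make the remark about Display more concrete, here is a minimal sketch of a custom implementation. The Point type and the describe function are hypothetical and serve only as an illustration; they are not part of the recipe.

```rust
use std::fmt;

// Hypothetical type, used only to illustrate a custom Display implementation
struct Point {
    x: i32,
    y: i32,
}

impl fmt::Display for Point {
    // write! accepts the same format strings as format!
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "({}, {})", self.x, self.y)
    }
}

// Once Display is implemented, the type works with '{}' like any other
fn describe(point: &Point) -> String {
    format!("The point is at {}", point)
}

fn main() {
    let origin = Point { x: 0, y: 0 };
    println!("{}", describe(&origin));
    // Named arguments work with custom types as well
    println!("{p} and again {p}", p = Point { x: 3, y: -4 });
}
```

Implementing Display also provides to_string() for the type, because the standard library implements ToString for everything that implements Display.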

There's more...

You can combine positional parameters with normal ones, but it's probably not a good idea, as it can quite easily become confusing to look at. The behavior in this case is as follows: imagine that format! internally uses a counter to determine which argument is the next to be placed. This counter is increased whenever format! encounters a {} without a position in it. This rule results in the following:

format!("{1} {} {0} {}", "a", "b") // Returns "b a a b"

There are also a ton of extra formatting options if you want to display your data in different formats. {:?} prints the implementation of the Debug trait for the respective argument, often resulting in a more verbose output. {:.*} lets you specify the decimal precision of floating point numbers via the argument, like so:

format!("{:.*}", 2, 1.234567) // Returns "1.23"

For a complete list, visit https://doc.rust-lang.org/std/fmt/.

All of the information in this recipe applies to println! and print! as well, as it is essentially the same macro. The only difference is that println! doesn't return its processed string but instead, well, prints it!
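
A few of these options, collected in one place, might look as follows. This is only a small sketch: the formatting_samples function is a hypothetical helper, and the width and zero-padding flags in the last two lines are additional options from the std::fmt documentation that the text above does not discuss.

```rust
// Hypothetical helper that collects a few formatted strings
fn formatting_samples() -> Vec<String> {
    vec![
        // Debug output of a string slice includes the quotes
        format!("{:?}", "hi"),
        // Vec implements Debug, but not Display
        format!("{:?}", vec![1, 2, 3]),
        // Precision taken from the argument list
        format!("{:.*}", 2, 1.234567),
        // Precision written directly into the format string
        format!("{:.2}", 1.234567),
        // Right-aligned in a field that is five characters wide
        format!("{:>5}", 42),
        // Padded with zeros up to five characters
        format!("{:05}", 42),
    ]
}

fn main() {
    for sample in formatting_samples() {
        println!("[{}]", sample);
    }
}
```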

 

Providing a default implementation


Often, when dealing with structures that represent configurations, you don't care about certain values and just want to silently assign them a standard value.

How to do it...

  1. In the src/bin folder, create a file called default.rs

  2. Add the following code and run it with cargo run --bin default:

1  fn main() {
2    // There's a default value for nearly every primitive type
3    let foo: i32 = Default::default();
4    println!("foo: {}", foo); // Prints "foo: 0"
5 
6 
7    // A struct that derives Default can be initialized like this
8    let pizza: PizzaConfig = Default::default();
9    // Prints "wants_cheese: false"
10   println!("wants_cheese: {}", pizza.wants_cheese);
11 
12   // Prints "number_of_olives: 0"
13   println!("number_of_olives: {}", pizza.number_of_olives);
14 
15   // Prints "special message: "
16   println!("special message: {}", pizza.special_message);
17 
18   let crust_type = match pizza.crust_type {
19     CrustType::Thin => "Nice and thin",
20     CrustType::Thick => "Extra thick and extra filling",
21   };
22   // Prints "crust_type: Nice and thin"
23   println!("crust_type: {}", crust_type);
24 
25 
26   // You can also configure only certain values
27   let custom_pizza = PizzaConfig {
28     number_of_olives: 12,
29     ..Default::default()
30   };
31 
32   // You can define as many values as you want
33   let deluxe_custom_pizza = PizzaConfig {
34     number_of_olives: 12,
35     wants_cheese: true,
36     special_message: "Will you marry me?".to_string(),
37     ..Default::default()
38   };
39
40 }
41 
42  #[derive(Default)]
43  struct PizzaConfig {
44    wants_cheese: bool,
45    number_of_olives: i32,
46    special_message: String,
47    crust_type: CrustType,
48  }
49 
50  // You can implement default easily for your own types
51  enum CrustType {
52    Thin,
53    Thick,
54  }
55  impl Default for CrustType {
56    fn default() -> CrustType {
57      CrustType::Thin
58    }
59  }

How it works...

Most primitive and standard library types in Rust have a Default implementation. When you define your own struct that only contains members that already implement Default, you have the option to derive Default as well [42]. In the case of enums or complex structs, you can easily write your own implementation of Default instead [55], as there's only one method you have to provide. After this, the compiler can infer that Default::default() should return your type, as long as you tell it what the target type is. This is why in line [3] we have to write foo: i32, or else Rust wouldn't know what type the default object should become.

If you only want to specify some elements and leave the others at the default, you can use the syntax in line [29]. Keep in mind that you can configure and skip as many values as you want, as shown in lines [33 to 37].
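
The following sketch condenses the recipe into a function that fills in a single member and leaves the rest to Default. The olive_pizza function is a hypothetical helper. Note also one deviation from the recipe: the sketch assumes a compiler of version 1.62 or newer, where an enum can derive Default if one variant is marked with #[default]; with the compiler version used in this book, the manual implementation from the recipe is required.

```rust
// Assumes Rust 1.62 or newer: #[default] marks the variant
// that the derived Default implementation returns
#[derive(Debug, Default, PartialEq)]
enum CrustType {
    #[default]
    Thin,
    Thick,
}

#[derive(Debug, Default, PartialEq)]
struct PizzaConfig {
    wants_cheese: bool,
    number_of_olives: i32,
    special_message: String,
    crust_type: CrustType,
}

// Hypothetical helper: sets one member, takes the rest from Default
fn olive_pizza(olives: i32) -> PizzaConfig {
    PizzaConfig {
        number_of_olives: olives,
        ..Default::default()
    }
}

fn main() {
    let pizza = olive_pizza(12);
    println!("{:?}", pizza);

    let thick_pizza = PizzaConfig {
        crust_type: CrustType::Thick,
        ..Default::default()
    };
    println!("{:?}", thick_pizza);
}
```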

 

Using the constructor pattern


You may have asked yourself how to idiomatically initialize complex structs in Rust, considering it doesn't have constructors. The answer is simple: there is a constructor, it's just a convention rather than a rule. Rust's standard library uses this pattern very often, so we need to understand it if we want to use std effectively.

Getting ready

In this recipe, we are going to talk about how a user interacts with a struct. When we say user in this context, we don't mean the end user that clicks on the GUI of the app you're writing. We're referring to the programmer that instantiates and manipulates the struct.

How to do it...

  1. In the src/bin folder, create a file called constructor.rs

  2. Add the following code and run it with cargo run --bin constructor:

1  fn main() {
2    // We don't need to care about
3    // the internal structure of NameLength
4    // Instead, we can just call its constructor
5    let name_length = NameLength::new("John");
6 
7    // Prints "The name 'John' is '4' characters long"
8    name_length.print();
9  }
10 
11  struct NameLength {
12    name: String,
13    length: usize,
14  }
15 
16  impl NameLength {
17    // The user doesn't need to set up length
18    // We do it for them!
19    fn new(name: &str) -> Self {
20      NameLength {
21        length: name.len(),
22        name: name.to_string(),
23      }
24    }
25 
26    fn print(&self) {
27      println!(
28        "The name '{}' is '{}' characters long",
29          self.name,
30          self.length
31      );
32    }
33  }

How it works...

If a struct provides a method called new that returns Self, the user of the struct should not configure or depend upon the members of the struct directly, as they are considered to be internal, hidden state.

In other words, if you see a struct that has a new function, always use it to create the structure. This has the nice effect of enabling you to change as many members of the struct as you want without the user noticing anything, as they are not supposed to look at them anyway.

The other reason to use this pattern is to guide the user to the correct way of instantiating a struct. If one has nothing but a big list of members that have to be filled with values, one might feel a bit lost. If one, however, has a method with only a few self-documenting parameters, it feels way more inviting.

There's more...

You might have noticed that for our example we really didn't need a length member and could have just calculated a length whenever we print. We use this pattern anyway, to illustrate its usefulness in hiding implementations. Another good use for it is when the members of a struct themselves have their own constructors and one needs to cascade the constructor calls. This happens, for example, when we have a Vec as a member, as we will see later in the book, in the Using a vector section in Chapter 2, Working with Collections.

Sometimes, your structs might need more than one way to initialize themselves. When this happens, try to still provide a new() method as your default way of construction and name the other options according to how they differ from the default. A good example of this is again vector, which not only provides a Vec::new() constructor but also a Vec::with_capacity(10), which initializes it with enough space for 10 items. More on that again in the Using a vector section in Chapter 2, Working with Collections.
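
As a rough illustration of both points, cascading constructor calls and naming alternative constructors, consider the following sketch. The Playlist type and all of its methods are hypothetical and exist only for this example.

```rust
// Hypothetical struct with a Vec member, used only for illustration
struct Playlist {
    songs: Vec<String>,
}

impl Playlist {
    // The default way of construction cascades to Vec::new()
    fn new() -> Self {
        Playlist { songs: Vec::new() }
    }

    // The alternative is named after how it differs from the default
    fn with_capacity(capacity: usize) -> Self {
        Playlist {
            songs: Vec::with_capacity(capacity),
        }
    }

    fn add(&mut self, song: &str) {
        self.songs.push(song.to_string());
    }

    fn song_count(&self) -> usize {
        self.songs.len()
    }

    fn capacity(&self) -> usize {
        self.songs.capacity()
    }
}

fn main() {
    let mut small = Playlist::new();
    small.add("First song");
    println!("{} song(s) stored", small.song_count());

    let big = Playlist::with_capacity(10);
    println!("Room for at least {} songs", big.capacity());
}
```

The caller never touches the songs member directly, so the Vec could later be swapped for another collection without breaking any code that uses Playlist.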

When accepting a kind of string (either &str, that is, a borrowed string slice, or String, that is, an owned string) with plans to store it in your struct, like we do in our example, also consider a Cow. No, not the big milk animal friends. A Cow in Rust is a Clone On Write wrapper around a type, which means that it will borrow the data for as long as possible and only make an owned clone when absolutely necessary, which happens at the first mutation. The practical effect of this is that, if we rewrote our NameLength struct in the following way, it would not care whether the caller passed a &str or a String to it, and would instead try to work in the most efficient way possible:

use std::borrow::Cow;
struct NameLength<'a> {
    name: Cow<'a, str>,
    length: usize,
}

impl<'a> NameLength<'a> {
    // The user doesn't need to set up length
    // We do it for them!
    fn new<S>(name: S) -> Self
    where
        S: Into<Cow<'a, str>>,
    {
        let name: Cow<'a, str> = name.into();
        NameLength {
            length: name.len(),
            name,
        }
    }

    fn print(&self) {
        println!(
            "The name '{}' is '{}' characters long",
            self.name, self.length
        );
    }
}

If you want to read more about Cow, check out this easy-to-understand blog post by Joe Wilm: https://jwilm.io/blog/from-str-to-cow/.

The Into trait used in the Cow code is going to be explained in the Converting types into each other section in Chapter 5, Advanced Data Structures.
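
To see the effect of the Cow version in practice, the following sketch repeats the struct from above and adds a hypothetical is_borrowed helper (not part of the original code) that reveals whether the name was borrowed or is owned.

```rust
use std::borrow::Cow;

struct NameLength<'a> {
    name: Cow<'a, str>,
    length: usize,
}

impl<'a> NameLength<'a> {
    fn new<S>(name: S) -> Self
    where
        S: Into<Cow<'a, str>>,
    {
        let name: Cow<'a, str> = name.into();
        NameLength {
            length: name.len(),
            name,
        }
    }

    // Hypothetical helper, added only to make the borrow visible
    fn is_borrowed(&self) -> bool {
        matches!(self.name, Cow::Borrowed(_))
    }
}

fn main() {
    // A string slice is borrowed, so nothing is copied
    let borrowed = NameLength::new("John");
    // A String is moved into the struct and stays owned
    let owned = NameLength::new(String::from("Johanna"));

    for entry in [&borrowed, &owned] {
        println!(
            "'{}' has length {} and is borrowed: {}",
            entry.name,
            entry.length,
            entry.is_borrowed()
        );
    }
}
```

In both cases no extra allocation takes place inside new: the slice is stored as a reference, and the String hands over the buffer it already owns.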

See also

  • Using a vector recipe in Chapter 2, Working with Collections
  • Converting types into each other recipe in Chapter 5, Advanced Data Structures
 

Using the builder pattern


Sometimes you need something between the customization of the constructor and the implicitness of the default implementation. Enter the builder pattern, another technique frequently used by the Rust standard library, as it allows a caller to fluidly chain together configurations that they care about and lets them ignore details that they don't care about.

How to do it...

  1. In the src/bin folder, create a file called builder.rs

  2. Add all of the following code and run it with cargo run --bin builder:

1  fn main() {
2    // We can easily create different configurations
3    let normal_burger = BurgerBuilder::new().build();
4    let cheese_burger = BurgerBuilder::new()
       .cheese(true)
       .salad(false)
       .build();
5    let veggie_bigmac = BurgerBuilder::new()
       .vegetarian(true)
       .patty_count(2)
       .build();
6
7    if let Ok(normal_burger) = normal_burger {
8      normal_burger.print();
9    }
10   if let Ok(cheese_burger) = cheese_burger {
11     cheese_burger.print();
12   }
13   if let Ok(veggie_bigmac) = veggie_bigmac {
14     veggie_bigmac.print();
15   }
16
17   // Our builder can perform a check for
18   // invalid configurations
19   let invalid_burger = BurgerBuilder::new()
       .vegetarian(true)
       .bacon(true)
       .build();
20   if let Err(error) = invalid_burger {
21     println!("Failed to print burger: {}", error);
22   }
23
24   // If we omit the last step, we can reuse our builder
25   let cheese_burger_builder = BurgerBuilder::new().cheese(true);
26   for i in 1..10 {
27     let cheese_burger = cheese_burger_builder.build();
28     if let Ok(cheese_burger) = cheese_burger {
29       println!("cheese burger number {} is ready!", i);
30       cheese_burger.print();
31     }
32   }
33 }

This is the configurable object:

35 struct Burger {
36    patty_count: i32,
37    vegetarian: bool,
38    cheese: bool,
39    bacon: bool,
40    salad: bool,
41 }
42 impl Burger {
43    // This method is just here for illustrative purposes
44    fn print(&self) {
45        let pretty_patties = if self.patty_count == 1 {
46            "patty"
47        } else {
48            "patties"
49        };
50        let pretty_bool = |val| if val { "" } else { "no " };
51        let pretty_vegetarian = if self.vegetarian { "vegetarian " } else { "" };
52        println!(
53            "This is a {}burger with {} {}, {}cheese, {}bacon and {}salad",
54            pretty_vegetarian,
55            self.patty_count,
56            pretty_patties,
57            pretty_bool(self.cheese),
58            pretty_bool(self.bacon),
59            pretty_bool(self.salad)
60        )
61    }
62 }

And this is the builder itself. It is used to configure and create a Burger:

64  struct BurgerBuilder {
65    patty_count: i32,
66    vegetarian: bool,
67    cheese: bool,
68    bacon: bool,
69    salad: bool,
70  }
71  impl BurgerBuilder {
72    // in the constructor, we can specify
73    // the standard values
74    fn new() -> Self {
75      BurgerBuilder {
76        patty_count: 1,
77        vegetarian: false,
78        cheese: false,
79        bacon: false,
80        salad: true,
81      }
82    }
83
84    // Now we have to define a method for every
85    // configurable value
86    fn patty_count(mut self, val: i32) -> Self {
87      self.patty_count = val;
88      self
89    }
90
91    fn vegetarian(mut self, val: bool) -> Self {
92      self.vegetarian = val;
93      self
94    }
95    fn cheese(mut self, val: bool) -> Self {
96      self.cheese = val;
97      self
98    }
99    fn bacon(mut self, val: bool) -> Self {
100     self.bacon = val;
101     self
102   }
103   fn salad(mut self, val: bool) -> Self {
104     self.salad = val;
105     self
106   }
107
108   // The final method actually constructs our object
109   fn build(&self) -> Result<Burger, String> {
110     let burger = Burger {
111       patty_count: self.patty_count,
112       vegetarian: self.vegetarian,
113       cheese: self.cheese,
114       bacon: self.bacon,
115       salad: self.salad,
116   };
117   // Check for invalid configuration
118   if burger.vegetarian && burger.bacon {
119     Err("Sorry, but we don't serve vegetarian bacon yet".to_string())
120     } else {
121       Ok(burger)
122     }
123   }
124 }

How it works...

Whew, that's a lot of code! Let's start by breaking it up.

In the first part, we illustrate how to use this pattern to effortlessly configure a complex object. We do this by relying on sensible standard values and only specifying what we really care about:

let normal_burger = BurgerBuilder::new().build();
let cheese_burger = BurgerBuilder::new()
    .cheese(true)
    .salad(false)
    .build();
let veggie_bigmac = BurgerBuilder::new()
    .vegetarian(true)
    .patty_count(2)
    .build();

The code reads pretty nicely, doesn't it?

In our version of the builder pattern, we return the object wrapped in a Result in order to tell the world that there are certain invalid configurations and that our builder might not always be able to produce a valid product. Because of this, we have to check the validity of our burger before accessing it [7, 10 and 13].

Our invalid configuration is vegetarian(true) and bacon(true). Unfortunately, our restaurant doesn't serve vegetarian bacon yet! When you start the program, you will see that the following line will print an error:

if let Err(error) = invalid_burger {
    println!("Failed to print burger: {}", error);
}

If we omit the final build step, we can reuse the builder in order to build as many objects as we want [25 to 32].

Let's see how we implemented all of this. The first thing after the main function is the definition of our Burger struct. No surprises here, it's just plain old data. The print method is just here to provide us with some nice output during runtime. You can ignore it if you want.

The real logic is in the BurgerBuilder [64]. It should have one member for every value you want to configure. As we want to configure every aspect of our burger, we will have the exact same members as Burger. In the constructor [74], we can specify some default values. We then create one method for every configuration. In the end, in build() [109], we assemble a Burger out of all of our members and perform some error checking. If the configuration is OK, we return the Burger [121]. Otherwise, we return an error [119].

There's more...

If you want your object to be constructable without a builder, you could also provide Burger with a Default implementation. BurgerBuilder::new() could then just return Default::default().

In build(), if your configuration can inherently not be invalid, you can, of course, return the object directly without wrapping it in a Result.
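
One possible reading of these two suggestions is sketched below; it is an assumption about how they could be combined, not the only way to do it. Burger gets a hand-written Default, the builder simply wraps a Burger so that it can derive Default itself, and build() returns the object directly because this reduced burger has no invalid configuration. The reduced set of members is chosen only to keep the example short.

```rust
#[derive(Clone, Debug, PartialEq)]
struct Burger {
    patty_count: i32,
    cheese: bool,
    salad: bool,
}

// The standard values now live in the Default implementation
impl Default for Burger {
    fn default() -> Self {
        Burger {
            patty_count: 1,
            cheese: false,
            salad: true,
        }
    }
}

// Assumption of this sketch: the builder wraps a Burger,
// which lets it derive Default from the Burger's Default
#[derive(Default)]
struct BurgerBuilder {
    burger: Burger,
}

impl BurgerBuilder {
    fn new() -> Self {
        Default::default()
    }

    fn patty_count(mut self, val: i32) -> Self {
        self.burger.patty_count = val;
        self
    }

    fn cheese(mut self, val: bool) -> Self {
        self.burger.cheese = val;
        self
    }

    // No invalid configuration exists here, so no Result is needed
    fn build(&self) -> Burger {
        self.burger.clone()
    }
}

fn main() {
    let plain = Burger::default();
    let double_cheese = BurgerBuilder::new().cheese(true).patty_count(2).build();
    println!(
        "plain: {} patty, cheese: {}, salad: {}",
        plain.patty_count, plain.cheese, plain.salad
    );
    println!("double cheese: {:?}", double_cheese);
}
```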

 

Parallelism through simple threads


Every year, parallelism and concurrency become more important as processors tend to have more and more physical cores. In most languages, writing parallel code is tricky. Very tricky. Not so in Rust, as it has been designed around the principle of fearless concurrency since the beginning.

How to do it...

  1. In the src/bin folder, create a file called parallelism.rs

  2. Add the following code and run it with cargo run --bin parallelism:

1   use std::thread;
2
3   fn main() {
4     // Spawning a thread lets it execute a lambda
5     let child = thread::spawn(|| println!("Hello from a new thread!"));
6     println!("Hello from the main thread!");
7     // Joining a child thread with the main thread means
8     // that the main thread waits until the child has
9     // finished its work
10    child.join().expect("Failed to join the child thread");
11 
12    let sum = parallel_sum(&[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]);
13    println!("The sum of the numbers 1 to 10 is {}", sum);
14  }
15 
16  // We are going to write a function that
17  // sums the numbers in a slice in parallel
18  fn parallel_sum(range: &[i32]) -> i32 {
19    // We are going to use exactly 4 threads to sum the numbers
20    const NUM_THREADS: usize = 4;
21     
22    // If we have fewer numbers than threads,
23    // there's no point in multithreading them
24    if range.len() < NUM_THREADS {
25      sum_bucket(range)
26    } else {
27      // We define "bucket" as the amount of numbers
28      // we sum in a single thread
29      let bucket_size = range.len() / NUM_THREADS;
30      let mut count = 0;
31      // This vector will keep track of our threads
32      let mut threads = Vec::new();
33      // We try to sum as much as possible in other threads
34      while count + bucket_size < range.len() {
35        let bucket = range[count..count + bucket_size].to_vec();
36        let thread = thread::Builder::new()
37          .name("calculation".to_string())
38          .spawn(move || sum_bucket(&bucket))
39          .expect("Failed to create the thread");
40        threads.push(thread);
41
42        count += bucket_size;
43      }
44      // We are going to sum the rest in the main thread
45      let mut sum = sum_bucket(&range[count..]);
46
47      // Time to add the results up
48      for thread in threads {
49        sum += thread.join().expect("Failed to join thread");
50      }
51      sum
52    }
53  }
54 
55  // This is the function that will be executed in the threads
56  fn sum_bucket(range: &[i32]) -> i32 {
57    let mut sum = 0;
58    for num in range {
59      sum += *num;
60    }
61     sum
62  }

How it works...

You can create a new thread by calling thread::spawn, which will then begin executing the provided lambda. This will return a JoinHandle, which you can use to, well, join the thread. Joining a thread means waiting for the thread to finish its work. If you don't join a thread, you have no guarantee of it actually ever finishing. This might be valid though when setting up threads to do tasks that never complete, such as listening for incoming connections.

Keep in mind that you cannot predetermine the order in which your threads will complete any work. In our example, it is impossible to foretell whether Hello from a new thread! or Hello from the main thread! is going to be printed first, although most of the time it will probably be the main thread, as the operating system needs to put some effort into spawning a new thread. This is the reason why small algorithms can be faster when not executed in parallel. Sometimes, the overhead of letting the OS spawn and manage new threads is just not worth it.

As demonstrated by line [49], joining a thread will return a Result that contains the value your lambda returned.

Threads can also be given names. Depending on your OS, in case of a crash, the name of the responsible thread will be displayed. In line [37], we call our new summation threads calculation. If one of them were to crash, we would be able to quickly identify the issue. Try it out for yourself: insert a call to panic!(); at the beginning of sum_bucket in order to intentionally crash the program and run it. If your OS supports named threads, you will now be told that your thread calculation panicked with an explicit panic.
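
Instead of calling expect, the Result returned by join can also be inspected, which turns a panic in a child thread into an ordinary error value. The following is a minimal sketch of that idea; the run_named function and its should_panic flag are hypothetical and only serve the demonstration.

```rust
use std::thread;

// Hypothetical helper: runs a closure in a named thread and reports
// either the returned value or the fact that the thread panicked
fn run_named(name: &str, should_panic: bool) -> Result<i32, String> {
    let handle = thread::Builder::new()
        .name(name.to_string())
        .spawn(move || {
            if should_panic {
                panic!("intentional crash");
            }
            42
        })
        .expect("Failed to create the thread");

    // join returns Ok with the closure's return value,
    // or Err if the thread panicked
    handle
        .join()
        .map_err(|_| format!("thread '{}' panicked", name))
}

fn main() {
    println!("{:?}", run_named("calculation", false));
    println!("{:?}", run_named("calculation", true));
}
```

When the second call runs, the panic message that the runtime writes to the error output should mention the thread name as well, while the main thread keeps running and merely receives an Err.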

parallel_sum is a function that takes a slice of integers and adds them together in parallel on four threads. If you have limited experience in working with parallel algorithms, this function will be hard to grasp at first. I invite you to copy it by hand into your text editor and play around with it in order to get a grasp on it. If you still feel a bit lost, don't worry, we will revisit parallelism again later.

Adapting algorithms to run in parallel normally comes at the risk of data races. A data race occurs when multiple threads access the same resource at the same time without synchronization and at least one of them modifies it, so that the output depends on the unpredictable timing of the threads. Normally, programmers have to analyze their usage of resources and use external tools in order to catch all of the data races. In contrast, Rust's ownership rules let the compiler catch data races at compile time, and it refuses to compile code in which it finds one. This is the reason why we had to call .to_vec() in line [35]:

let bucket = range[count..count + bucket_size].to_vec();

We will cover vectors in a later recipe (the Using a vector section in Chapter 2, Working with Collections), so if you're curious about what is happening here, feel free to jump to Chapter 2, Working with Collections and come back again. The essence of it is that we're copying the data into bucket. If we instead passed a reference into sum_bucket in our new thread, we would have a problem: the memory referenced by range is only guaranteed to live inside of parallel_sum, but the threads we spawn are allowed to outlive their parent threads. This would mean that, in theory, if we didn't join the threads at the right time, sum_bucket might get unlucky and get called late enough for range to be invalid.

This would then be a data race, as the outcome of our function would depend on the uncontrollable sequence in which our operating system decides to launch the threads.

But don't just take my word for it; try it yourself. Simply replace the aforementioned line with let bucket = &range[count..count + bucket_size]; and try to compile it.
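The ownership rule described above can be condensed into a minimal sketch. It is not the recipe's listing: the name parallel_sum_sketch, the thread_count parameter, and the use of chunks() are assumptions made for this illustration. The important line is the call to to_vec(), which gives every thread its own copy of the data.

```rust
use std::thread;

// Hypothetical sketch: sums `range` on up to `thread_count` threads.
// `thread_count` must be greater than zero.
fn parallel_sum_sketch(range: &[i32], thread_count: usize) -> i32 {
    // Ceiling division, so that every element lands in some bucket
    let bucket_size = ((range.len() + thread_count - 1) / thread_count).max(1);
    let handles: Vec<_> = range
        .chunks(bucket_size)
        .map(|chunk| {
            // Copy the chunk so that the thread owns its data.
            // Moving the borrowed `chunk` itself into the thread
            // would be rejected by the compiler.
            let bucket = chunk.to_vec();
            thread::spawn(move || bucket.iter().sum::<i32>())
        })
        .collect();
    // Wait for every thread and add up the partial sums
    handles
        .into_iter()
        .map(|handle| handle.join().expect("Worker thread panicked"))
        .sum()
}

fn main() {
    let numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
    // Prints: Sum: 55
    println!("Sum: {}", parallel_sum_sketch(&numbers, 4));
}
```

Copying costs memory and time, which is one reason why this approach is called suboptimal in the next section.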

There's more...

If you're experienced with parallelism, you might have noticed how suboptimal our algorithm here is. This is intentional, as the elegant and efficient way of writing parallel_sum would require using techniques we have not discussed yet. We will revisit this algorithm in Chapter 7, Parallelism and Rayon, and rewrite it in a professional manner. In that chapter, we will also learn how to concurrently modify resources using locks.

See also

  • Access resources in parallel with RwLocks recipe in Chapter 7, Parallelism and Rayon
 

Generating random numbers


As described in the preface, the Rust core team intentionally left some functionality out of the standard library and put it into its own external crate. Generating pseudo-random numbers is one such functionality.

How to do it...

  1. Open the Cargo.toml file that was generated earlier for you
  2. Under [dependencies], add the following line:
rand = "0.3"
  3. If you want, you can go to rand's crates.io page (https://crates.io/crates/rand) to check for the newest version and use that one instead
  4. In the bin folder, create a file called rand.rs

  5. Add the following code and run it with cargo run --bin rand:

1   extern crate rand;
2 
3   fn main() {
4     // random_num1 will be any integer between
5     // std::i32::MIN and std::i32::MAX
6     let random_num1 = rand::random::<i32>();
7     println!("random_num1: {}", random_num1);
8     let random_num2: i32 = rand::random();
9     println!("random_num2: {}", random_num2);
10    // The initialization of random_num1 and random_num2
11    // is equivalent.
12 
13    // Every primitive data type can be randomized
14    let random_char = rand::random::<char>();
15    // Although random_char will probably not be
16    // representable on most operating systems
17    println!("random_char: {}", random_char);
18 
19 
20    use rand::Rng;
21    // We can use a reusable generator
22    let mut rng = rand::thread_rng();
23    // This is equivalent to rand::random()
24    if rng.gen() {
25      println!("This message has a 50-50 chance of being printed");
26    }
27    // A generator enables us to use ranges
28    // random_num3 will be between 0 and 9
29    let random_num3 = rng.gen_range(0, 10);
30    println!("random_num3: {}", random_num3);
31 
32    // random_float will be between 0.0 and 0.999999999999...
33    let random_float = rng.gen_range(0.0, 1.0);
34    println!("random_float: {}", random_float);
35 
36    // By default, thread_rng picks the generator algorithm
37    // for you, which should be good enough for nearly all of
38    // your use cases. If you require a particular algorithm,
39    // you create that generator explicitly:
40    let mut chacha_rng = rand::ChaChaRng::new_unseeded();
41    let random_chacha_num = chacha_rng.gen::<i32>();
42    println!("random_chacha_num: {}", random_chacha_num);
43  }

How it works...

Before you can use rand, you have to tell Rust that you're using the crate by writing:

extern crate rand;

After that, rand will provide a random generator. We can access it by either calling rand::random(); [6] or by accessing it directly with rand::thread_rng(); [22].

If we go the first route, the generator will need to be told what type to generate. You can either explicitly state the type in the method call [6] or annotate the type of the resulting variable [8]. Both are equivalent and produce exactly the same result. Which one you use is up to you. In this book, we will use the first convention.

As you can see in lines [29 and 33], you need neither if the type is unambiguous in the called context.

The generated value will be between its type's MIN and MAX constants. In the case of i32, this would be std::i32::MIN and std::i32::MAX, or, in concrete numbers, -2147483648 and 2147483647. You can verify these numbers easily by calling the following:

println!("min: {}, max: {}", std::i32::MIN, std::i32::MAX);

As you can see, these are very big numbers. For most purposes, you will probably want to define custom limits. You can go the second route discussed earlier and use rand::Rng for that [22]. It has a gen method, which is actually implicitly called by rand::random(), but also a gen_range() method that accepts a minimum and maximum value. Keep in mind that this range is half-open, which means that the maximum value itself can never be generated. This is why in line [29], rng.gen_range(0, 10) will only generate the numbers 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9, without the 10.

All of the described ways of generating random values use a uniform distribution, which means that every number in the range has the same chance of being generated. In some contexts, it makes sense to use a different generator algorithm than the default one. You can choose one by creating that generator explicitly [40]. As of the time of publication, the rand crate ships the ChaCha and ISAAC generators.

There's more...

If you want to randomly populate an entire struct, you can use the rand_derive helper crate in order to derive Rand for it. You can then generate your own struct, just as you would generate any other type.

 

Querying with regexes


When parsing simple data formats, it is often easier to write regular expressions (or regex for short) than use a parser. Rust has pretty decent support for this through its regex crate.

Getting ready

In order to really understand this recipe, you should be familiar with regexes. There are countless free online resources for this, like regexone (https://www.regexone.com/).

Note

This recipe will not conform to clippy, as we kept the regexes intentionally too simple because we want to keep the focus of the recipe on the code, not the regex. Some of the examples shown could have been rewritten to use .contains() instead.

How to do it...

  1. Open the Cargo.toml file that was generated earlier for you

  2. Under [dependencies], add the following line:
regex = "0.2"
  3. If you want, you can go to regex's crates.io page (https://crates.io/crates/regex) to check for the newest version and use that one instead
  4. In the bin folder, create a file called regex.rs

  5. Add the following code and run it with cargo run --bin regex:

1   extern crate regex;
2
3   fn main() {
4     use regex::Regex;
5     // Beginning a string with 'r' makes it a raw string,
6     // in which you don't need to escape any symbols
7     let date_regex = Regex::new(r"^\d{2}.\d{2}.\d{4}$")
        .expect("Failed to create regex");
8     let date = "15.10.2017";
9     // Check for a match
10    let is_date = date_regex.is_match(date);
11    println!("Is '{}' a date? {}", date, is_date);
12
13    // Let's use capture groups now
14    let date_regex = Regex::new(r"(\d{2}).(\d{2}).(\d{4})")
        .expect("Failed to create regex");
15    let text_with_dates = "Alan Turing was born on 23.06.1912 and died on 07.06.1954. \
16      A movie about his life called 'The Imitation Game' came out on 14.11.2017";
17    // Iterate over the matches
18    for cap in date_regex.captures_iter(text_with_dates) {
19      println!("Found date {}", &cap[0]);
20      println!("Year: {} Month: {} Day: {}", &cap[3], &cap[2], &cap[1]);
21    }
22    // Replace the date format
23    println!("Original text:\t\t{}", text_with_dates);
24    let text_with_indian_dates =
        date_regex.replace_all(text_with_dates, "$1-$2-$3");
25    println!("In Indian format:\t{}", text_with_indian_dates);
26
27    // Replacing groups is easier when we name them
28    // ?P<somename> gives a capture group a name
29    let date_regex = Regex::new(r"(?P<day>\d{2}).(?P<month>\d{2}).(?P<year>\d{4})")
30      .expect("Failed to create regex");
31    let text_with_american_dates =
        date_regex.replace_all(text_with_dates, "$month/$day/$year");
32    println!("In American format:\t{}", text_with_american_dates);
33    let rust_regex = Regex::new(r"(?i)rust").expect("Failed to create regex");
34    println!("Do we match RuSt? {}", rust_regex.is_match("RuSt"));
35    use regex::RegexBuilder;
36    let rust_regex = RegexBuilder::new(r"rust")
37      .case_insensitive(true)
38      .build()
39      .expect("Failed to create regex");
40    println!("Do we still match RuSt? {}", rust_regex.is_match("RuSt"));
41  }

How it works...

You can construct a regex object by calling Regex::new() with a valid regex string [7]. Most of the time, you will want to pass a raw string in the form of r"...". Raw means that all symbols in the string are taken literally, with no escape sequences being processed. This is important because of the backslash (\) character, which regexes use to represent a couple of important concepts, such as digits (\d) or whitespace (\s). However, Rust already uses the backslash to escape special non-printable symbols, such as the newline (\n) or the tab (\t) [23]. If we wanted to use a backslash in a normal string, we would have to escape it by repeating it (\\). The regex on line [14], for example, would have to be rewritten as:

"(\\d{2}).(\\d{2}).(\\d{4})"

Worse yet, if we wanted to match the backslash itself, we would have to escape it as well because of regex. With normal strings, we would have to quadruple-escape it (\\\\)! We can save ourselves this loss of readability and the resulting confusion by using raw strings and writing our regex normally. In fact, it is considered good style to use raw strings in every regex, even when it doesn't have any backslashes [33]. This helps your future self if you notice down the line that you actually would like to use a feature that requires a backslash.
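The equivalence of raw and escaped strings can be checked with the standard library alone, without the regex crate. The constant names in this small sketch are hypothetical and only serve the illustration.

```rust
// The same pattern, written as a normal string and as a raw string
const ESCAPED_DATE: &str = "(\\d{2}).(\\d{2}).(\\d{4})";
const RAW_DATE: &str = r"(\d{2}).(\d{2}).(\d{4})";

// A pattern that matches one literal backslash consists of
// two backslash characters
const ESCAPED_BACKSLASH: &str = "\\\\";
const RAW_BACKSLASH: &str = r"\\";

fn main() {
    assert_eq!(ESCAPED_DATE, RAW_DATE);
    assert_eq!(ESCAPED_BACKSLASH, RAW_BACKSLASH);
    // Four characters in the source, two characters in the string
    assert_eq!(ESCAPED_BACKSLASH.len(), 2);
    println!("Raw and escaped strings are identical");
}
```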

We can iterate over the results of our regex [18]. The object we get on every match is a collection of our capture groups. Keep in mind that the zeroth index is always the entire match [19]. The first index is then the string from our first capture group, the second index is the string of the second capture group, and so on [20]. Unfortunately, we do not get a compile-time check on our index, so if we accessed &cap[4], our program would compile but then crash during runtime.

When replacing, we follow the same concept: $0 is the entire match, $1 the result of the first capture group, and so on. To make our life easier, we can give the capture groups names by starting them with ?P<somename> [29] and then use this name when replacing [31].

There are many flags that you can specify, in the form of (?flag), for fine-tuning, such as i, which makes the match case insensitive [33], or x, which ignores whitespace in the regex string. If you want to read up on them, visit their documentation (https://doc.rust-lang.org/regex/regex/index.html). Most of the time though, you can get the same result by using the RegexBuilder that is also in the regex crate [36]. Both of the rust_regex objects we generate in lines [33] and [36] are equivalent. While the second version is definitely more verbose, it is also way easier to understand at first glance.

There's more...

The regex crate works by compiling a pattern string into an internal matching program when the regex is created, which is a comparatively expensive step. For performance reasons, you are advised to reuse your regexes instead of creating them anew every time you use them. A good way of doing this is by using the lazy_static crate, which we will look at later in the book, in the Creating lazy static objects section in Chapter 5, Advanced Data Structures.

Note

Be careful not to overdo it with regexes. As they say, "When all you have is a hammer, everything looks like a nail." If you parse complicated data, regexes can quickly become an unbelievably complex mess. When you notice that your regex has become too big to understand at first glance, try to rewrite it as a parser.

See also

  • Creating lazy static objects recipe in Chapter 5, Advanced Data Structures
 

Accessing the command line


Sooner or later, you'll want to interact with the user in some way or another. The most basic way to do this is by letting the user pass parameters while calling the application through the command line.

How to do it...

  1. In the bin folder, create a file called cli_params.rs

  2. Add the following code and run it with cargo run --bin cli_params some_option some_other_option:

1   use std::env;
2
3   fn main() {
4     // env::args returns an iterator over the parameters
5     println!("Got following parameters: ");
6     for arg in env::args() {
7       println!("- {}", arg);
8     }
9
10    // The iterator API hands out the parameters one by one
11    let mut args = env::args();
12    if let Some(arg) = args.next() {
13      println!("The path to this program is: {}", arg);
14    }
15    if let Some(arg) = args.next() {
16      println!("The first parameter is: {}", arg);
17    }
18    if let Some(arg) = args.next() {
19      println!("The second parameter is: {}", arg);
20    }
21
22    // Or as a vector
23    let args: Vec<_> = env::args().collect();
24    println!("The path to this program is: {}", args[0]);
25    if args.len() > 1 {
26      println!("The first parameter is: {}", args[1]);
27    }
28    if args.len() > 2 {
29      println!("The second parameter is: {}", args[2]);
30    }
31  }

How it works...

Calling env::args() returns an iterator over the provided parameters [6]. By convention, the first command-line parameter on most operating systems is the path to the executable itself [12].

We can access specific parameters in two ways: keep them as an iterator [11] or collect them into a collection such as Vec [23]. Don't worry, we are going to talk about both in detail in Chapter 2, Working with Collections. For now, it's enough for you to know that:

  • Accessing an iterator forces you to check at compile time whether the element exists, for example, with an if let binding [12]

  • Accessing a vector checks the validity at runtime

This means that the compiler would have let us execute lines [26] and [29] without checking for their validity first in [25] and [28]; with too few parameters, the program would then panic at runtime. Try it yourself: add the &args[3]; line at the end of the program and run it.

Note

We check the length anyway because it is considered good style to check whether the expected parameters were provided. With the iterator way of accessing parameters, you don't have to worry about forgetting to check, as it forces you to do it. On the other hand, by using a vector, you can check for the parameters once at the beginning of the program and not worry about them afterward.
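The difference between the two access styles can be sketched without touching the real command line. In the following example, sample_args is a hypothetical stand-in for env::args().collect(), and param_at is a hypothetical helper that uses the vector's get method, which returns an Option instead of panicking on a missing index.

```rust
// Hypothetical stand-in for env::args().collect(), so that the
// example does not depend on how the program was started
fn sample_args() -> Vec<String> {
    ["cli_params", "some_option", "some_other_option"]
        .iter()
        .map(|arg| arg.to_string())
        .collect()
}

// Index 0 is the program path, index 1 the first parameter, and so on
fn param_at(args: &[String], index: usize) -> Option<&str> {
    args.get(index).map(|arg| arg.as_str())
}

fn main() {
    let args = sample_args();
    // Prints: Some("some_option")
    println!("{:?}", param_at(&args, 1));
    // Prints: None, whereas &args[3] would panic
    println!("{:?}", param_at(&args, 3));

    // Iterators consume what they hand out: after next() has returned
    // the program path, nth(1) skips one element and returns the one after it
    let mut iter = args.iter();
    // Prints: Some("cli_params")
    println!("{:?}", iter.next());
    // Prints: Some("some_other_option")
    println!("{:?}", iter.nth(1));
}
```

Because nth consumes all elements up to and including the one it returns, stepping through parameters with next() is usually the less surprising choice.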

There's more...

If you are building a serious command-line utility in the style of *nix tools, you will have to parse a lot of different parameters. Instead of reinventing the wheel, you should take a look at third-party libraries, such as clap (https://crates.io/crates/clap).

 

Interacting with environment variables


According to the Twelve-Factor App (https://12factor.net/), you should store your configuration in the environment (https://12factor.net/config). This means that you should pass values that could change between deployments, such as ports, domains, or database handles, as environment variables. Many programs also use environment variables to communicate with each other.

How to do it...

  1. In the bin folder, create a file called env_vars.rs

  2. Add the following code and run it with cargo run --bin env_vars:

1   use std::env;
2
3   fn main() {
4     // We can iterate over all the env vars for the current process
5     println!("Listing all env vars:");
6     for (key, val) in env::vars() {
7       println!("{}: {}", key, val);
8     }
9
10    let key = "PORT";
11    println!("Setting env var {}", key);
12    // Setting an env var for the current process
13    env::set_var(key, "8080");
14
15    print_env_var(key);
16
17    // Removing an env var for the current process
18    println!("Removing env var {}", key);
19    env::remove_var(key);
20
21    print_env_var(key);
22  }
23
24  fn print_env_var(key: &str) {
25    // Accessing an env var
26    match env::var(key) {
27      Ok(val) => println!("{}: {}", key, val),
28      Err(e) => println!("Couldn't print env var {}: {}", key, e),
29    }
30  }

How it works...

With env::vars(), we can access an iterator over all the env vars that were set for the current process at the time of execution [6]. This list is pretty huge though, as you'll see when running the code, and for the most part, irrelevant for us.

It's more practical to access a single env var with env::var() [26], which returns an Err if the requested var is either not present or doesn't contain valid Unicode. We can see this in action in line [21], where we try to print a variable that we just deleted.

Because env::var returns a Result, you can easily set up a default value by using unwrap_or. One real-life example of this, involving the address of a running instance of the popular Redis (https://redis.io/) key-value storage, looks like this:

let redis_addr = env::var("REDIS_ADDR")
    .unwrap_or("localhost:6379".to_string());

Keep in mind that creating an env var with env::set_var() [13] and deleting it with env::remove_var() [19] both only change the env var for our current process. This means that the created env vars are not going to be readable by other programs. It also means that if we accidentally remove an important env var, the rest of the operating system is not going to care, as it can still access it.
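The default-value idea can be taken one step further for typed configuration such as ports. The following is a minimal sketch; the helpers env_or and port_or and the variable names starting with EXAMPLE_ are hypothetical and chosen so that they are unlikely to be set on your machine.

```rust
use std::env;

// Hypothetical helper: returns the value of `key`, or `default` if the
// variable is missing or does not contain valid Unicode
fn env_or(key: &str, default: &str) -> String {
    env::var(key).unwrap_or_else(|_| default.to_string())
}

// Hypothetical helper: like env_or, but additionally falls back to
// `default` if the value is not a valid port number
fn port_or(key: &str, default: u16) -> u16 {
    env::var(key)
        .ok()
        .and_then(|value| value.parse::<u16>().ok())
        .unwrap_or(default)
}

fn main() {
    let redis_addr = env_or("EXAMPLE_UNSET_REDIS_ADDR", "localhost:6379");
    let port = port_or("EXAMPLE_UNSET_PORT", 8080);
    println!("Redis address: {}, port: {}", redis_addr, port);
}
```

Using unwrap_or_else instead of unwrap_or means that the default String is only allocated when it is actually needed.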

There's more...

At the beginning of this recipe, I wrote about storing your configuration in the environment. The industry standard way to do this is by creating a file called .env that contains said config in the form of key-value pairs, and loading it into the process at some point during the build. One easy way to do this in Rust is by using the dotenv (https://crates.io/crates/dotenv) third-party crate.

 

Reading from stdin


If you want to create an interactive application, it's easy to prototype your functionality with the command line. For CLI programs, this will be all the interaction you need.

How to do it...

  1. In the src/bin folder, create a file called stdin.rs

  2. Add the following code and run it with cargo run --bin stdin:

1   use std::io;
2   use std::io::prelude::*;
3
4   fn main() {
5     print_single_line("Please enter your forename: ");
6     let forename = read_line_iter();
7
8     print_single_line("Please enter your surname: ");
9     let surname = read_line_buffer();
10
11    print_single_line("Please enter your age: ");
12    let age = read_number();
13
14    println!(
15      "Hello, {} year old human named {} {}!",
16      age, forename, surname
17    );
18  }
19
20  fn print_single_line(text: &str) {
21    // We can print lines without adding a newline
22    print!("{}", text);
23    // However, we need to flush stdout afterwards
24    // in order to guarantee that the data actually displays
25    io::stdout().flush().expect("Failed to flush stdout");
26  }
27
28  fn read_line_iter() -> String {
29    let stdin = io::stdin();
30    // Read one line of input iterator-style
31    let input = stdin.lock().lines().next();
32    input
33      .expect("No lines in buffer")
34      .expect("Failed to read line")
35      .trim()
36      .to_string()
37  }
38
39  fn read_line_buffer() -> String {
40    // Read one line of input buffer-style
41    let mut input = String::new();
42    io::stdin()
43      .read_line(&mut input)
44      .expect("Failed to read line");
45    input.trim().to_string()
46  }
47
48  fn read_number() -> i32 {
49    let stdin = io::stdin();
50    loop {
51      // Iterate over all lines that will be inputted
52      for line in stdin.lock().lines() {
53        let input = line.expect("Failed to read line");
54        // Try to convert a string into a number
55        match input.trim().parse::<i32>() {
56          Ok(num) => return num,
57        Err(e) => println!("Failed to read number: {}", e),
58        }
59      }
60    }
61  }

How it works...

In order to read from the standard console input, stdin, we first need to obtain a handle to it. We do this by calling io::stdin() [29]. Imagine the returned object as a reference to a global stdin object. This global buffer is managed by a Mutex, which means that only one thread can access it at a time (more on that later in the book, in the Parallelly accessing resources with Mutexes section in Chapter 7, Parallelism and Rayon). We get this access by locking (using lock()) the buffer, which returns a new handle [31]. After we have done this, we can call the lines method on it, which returns an iterator over the lines the user will write [31 and 52]. More on iterators in the Accessing collections as Iterators section in Chapter 2, Working with Collections.

Finally, we can iterate over as many submitted lines as we want until some kind of break condition is reached, otherwise the iteration would go on forever. In our example, we break the number-checking loop as soon as a valid number has been entered [56].

If we're not particularly picky about our input and just want the next line, we have two options:

  • We can continue using the infinite iterator provided by lines(), but simply call next on it in order to just take the first one. This comes with an additional error check as, generally speaking, we cannot guarantee that there is a next element.

  • We can use read_line in order to populate an existing buffer [43]. This doesn't require that we lock the handle first, as it is done implicitly.

Although they both result in the same end effect, you should choose the first option. It is more idiomatic as it uses iterators instead of a mutable state, which makes it more maintainable and readable.

On a side note, we are using print! instead of println! in some places in this recipe for aesthetic reasons [22]. If you prefer the look of a newline before user input, you can use println! instead.

There's more...

This recipe is written with the assumption that you want to use stdin for live interaction over the CLI. If you plan on instead piping some data into it (for example, cat foo.txt | cargo run --bin stdin on *nix), you can stop treating the iterator returned by lines() as infinite and retrieve the individual lines, not unlike how you retrieved the individual parameters in the Accessing the command line recipe.
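For piped input, the end of the data has to be handled explicitly. The following sketch shows one possible way: a hypothetical function first_number that accepts anything implementing BufRead and returns None once the input ends without a valid number. Taking a generic reader is a design choice made for this example, since it lets us try the function on an in-memory byte slice.

```rust
use std::io::BufRead;

// Hypothetical helper: returns the first line that parses as an i32,
// or None if the input ends (or fails) before one is found
fn first_number<R: BufRead>(reader: R) -> Option<i32> {
    reader
        .lines()
        // Stop at the first read error instead of retrying forever
        .map_while(Result::ok)
        .find_map(|line| line.trim().parse::<i32>().ok())
}

fn main() {
    // A byte slice stands in for piped data here
    let piped = "not a number\n 42 \n7\n".as_bytes();
    match first_number(piped) {
        Some(num) => println!("Read number: {}", num),
        None => println!("Input ended without a valid number"),
    }
}
```

To read from the real standard input, you would pass io::stdin().lock() as the reader instead of the byte slice.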

There are various calls to trim() in our recipe [35, 45 and 55]. This method removes leading and trailing whitespace in order to enhance the user-friendliness of our program. We are going to look at it in detail in the Using a string section in Chapter 2, Working with Collections.

See also

  • Interacting with environment variables recipe in Chapter 1, Learning the Basics
  • Using a string and Accessing collections as iterators recipes in Chapter 2, Working with Collections
  • Parallelly accessing resources with Mutexes recipe in Chapter 7, Parallelism and Rayon
 

Accepting a variable number of arguments


Most of the time, when you want to operate on a dataset, you will design a function that takes a collection. In some cases, however, it is nice to have functions that just accept an unbounded number of parameters, like JavaScript's rest parameters. Such functions are called variadic functions, and Rust does not support them. However, we can implement the concept ourselves by defining a recursive macro.

Getting ready

The code in this recipe might be small, but it will look like gibberish if you're not familiar with macros. If you have not yet learned about macros or need a refresh, I recommend that you take a quick look at the relevant chapter in the official Rust book (https://doc.rust-lang.org/stable/book/first-edition/macros.html).

How to do it...

  1. In the src/bin folder, create a file called variadic.rs

  2. Add the following code and run it with cargo run --bin variadic:

1   macro_rules! multiply {
2     // Edge case
3     ( $last:expr ) => { $last };
4
5     ( $head:expr, $($tail:expr), +) => {
6       // Recursive call
7       $head * multiply!($($tail),+)
8     };
9   }
10
11  fn main() {
12    // You can call multiply! with
13    // as many parameters as you want
14    let val = multiply!(2, 4, 8);
15    println!("2*4*8 = {}", val)
16  }

How it works...

Let's start with our intention: we want to create a macro called multiply that accepts an arbitrary number of parameters and multiplies them all together. In macros, this is done via recursion. We begin every recursive definition with the edge case, that is, the parameters where the recursion should stop. Most of the time, this is where a function call stops making sense. In our case, this is the single parameter. Think about it: what should multiply!(3) return? It doesn't make sense to multiply it with anything, since we have no other parameter to multiply it with. Our best reaction is to simply return the parameter unmodified.

Our other condition is a match against more than one parameter, a $head and a comma-separated list of parameters inside of a $tail. Here, we just define the return value as the $head multiplied with the multiplication of the $tail. This will call multiply! with the $tail and without the $head, which means that on every call we process one parameter less until we finally reach our edge case, one single parameter.
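The recursion described above can be traced step by step. The sketch below repeats the macro from the recipe and adds a few checks; the expansion steps in the comments are written out by hand as an illustration and are not compiler output.

```rust
macro_rules! multiply {
    // Edge case: a single parameter is returned unmodified
    ( $last:expr ) => { $last };
    // Recursive case: split off the head and recurse on the tail
    ( $head:expr, $($tail:expr),+ ) => {
        $head * multiply!($($tail),+)
    };
}

fn main() {
    // multiply!(2, 4, 8)
    //   becomes 2 * multiply!(4, 8)
    //   becomes 2 * (4 * multiply!(8))
    //   becomes 2 * (4 * 8)
    assert_eq!(multiply!(2, 4, 8), 64);
    // The edge case on its own
    assert_eq!(multiply!(3), 3);
    // Every parameter is matched as a whole expression,
    // so this is (1 + 1) * 3 and not 1 + 1 * 3
    assert_eq!(multiply!(1 + 1, 3), 6);
    // The macro works for any type that supports the * operator
    assert_eq!(multiply!(1.5, 2.0), 3.0);
    println!("All expansions behaved as expected");
}
```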

There's more...

Keep in mind that you should use this technique sparingly. Most of the time, it is clearer to just accept and operate on a slice instead. However, it makes sense to use this in combination with other macros and higher kinds of concepts where the analogy of a graspable list of things breaks down. Finding a good example for this is difficult since they tend to be extremely specific. You can find one of them at the end of the book though.
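For comparison, the slice-based alternative mentioned above might look like the following minimal sketch. The function name multiply_all is hypothetical.

```rust
// Hypothetical slice-based counterpart to the multiply! macro
fn multiply_all(values: &[i32]) -> i32 {
    // product() returns 1 for an empty slice
    values.iter().product()
}

fn main() {
    // Prints: 2*4*8 = 64
    println!("2*4*8 = {}", multiply_all(&[2, 4, 8]));
}
```

Unlike the macro, this version also accepts collections whose length is only known at runtime, at the price of requiring all elements to have the same type.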

See also

  • Composing functions recipe in Chapter 10, Using Experimental Nightly Features

About the Authors

  • Jan Nils Ferner

    Jan Nils Ferner is a senior software engineer and an active contributor to, and advocator of, the open source community.

    Over the years, he has acquired a deep understanding of systems programming through languages such as C++ and Rust by modernizing and refactoring big and complex codebases. His passions include all things AI and Blockchain. In his free time, he researches innovative approaches to bringing biology and technology closer together, which has led him to develop his own Machine Learning framework. You can follow his projects on GitHub. His username is jnferner.

  • Daniel Durante

    Daniel Durante is an avid coffee drinker/roaster, motorcyclist, archer, welder, and carpenter whenever he isn’t programming. Right from the age of 12, he has been involved with web and embedded programming with PHP, Node.js, Golang, Rust, and C.

    He has worked on text-based browser games that have reached over 1,000,000 active players, created bin-packing software for CNC machines, embedded programming with cortex-m and PIC circuits, high-frequency trading applications, and helped contribute to one of the oldest ORMs of Node.js (SequelizeJS).

