Ranges

Exclusive offer: get 50% off this eBook here
D Cookbook

D Cookbook — Save 50%

Discover the advantages of programming in D with over 100 incredibly effective recipes with this book and ebook.

$29.99    $15.00
by Adam D. Ruppe | May 2014 | Cookbooks Open Source

This article by Adam D. Ruppe, the author of the book D Cookbook, goes into the details of an artefact central to the Phobos standard library with a little core language support: ranges. Ranges are user-defined objects used to build iterators over a collection of items. We will concentrate more on sorting and searching ranges, and using functional tools to query data.

(For more resources related to this topic, see here.)

Sorting ranges efficiently

Phobos' std.algorthm includes sorting algorithms. Let's look at how they are used, what requirements they have, and the dangers of trying to implement range primitives without minding their efficiency requirements.

Getting ready

Let's make a linked list container that exposes an appropriate view, a forward range, and an inappropriate view, a random access range that doesn't meet its efficiency requirements. A singly-linked list can only efficiently implement forward iteration due to its nature; the only tool it has is a pointer to the next element. Implementing any other range primitives will require loops, which is not recommended. Here, however, we'll implement a fully functional range, with assignable elements, length, bidirectional iteration, random access, and even slicing on top of a linked list to see the negative effects this has when we try to use it.

How to do it…

We're going to both sort and benchmark this program.

To sort

Let's sort ranges by executing the following steps:

  1. Import std.algorithm.

  2. Determine the predicate you need. The default is (a, b) => a < b, which results in an ascending order when the sorting is complete (for example, [1,2,3]). If you want ascending order, you don't have to specify a predicate at all. If you need descending order, you can pass a greater-than predicate instead, as shown in the following line of code:

    auto sorted = sort!((a, b) => a > b)([1,2,3]); // results: [3,2,1]

  3. When doing string comparisons, the functions std.string.cmp (case-sensitive) or std.string.icmp (case-insensitive) may be used, as is done in the following code:

    auto sorted = sort!((a, b) => cmp(a, b) < 0)(["b", "c", "a"]); // results: a, b, c

  4. Your predicate may also be used to sort based on a struct member, as shown in the following code:

    auto sorted = sort!((a, b) => a.value < b.value)(structArray);

  5. Pass the predicate as the first compile-time argument. The range you want to sort is passed as the runtime argument.

  6. If your range is not already sortable (if it doesn't provide the necessary capabilities), you can convert it to an array using the array function from std.range, as shown in the following code:

    auto sorted = sort(fibanocci().take(10)); // won't compile, not enough capabilities auto sorted = sort(fibanocci().take(10).array); // ok, good

  7. Use the sorted range. It has a unique type from the input to signify that it has been successfully sorted. Other algorithms may use this knowledge to increase their efficiency.

To benchmark

Let's sort objects using benchmark by executing the following steps:

  1. Put our range and skeleton main function from the Getting ready section of this recipe into a file.

  2. Use std.datetime.benchmark to test the sorting of an array from the appropriate walker against the slow walker and print the results at the end of main. The code is as follows:

    auto result = benchmark!( { auto sorted = sort(list.walker.array); }, { auto sorted = sort(list.slowWalker); } )(100); writefln("Emulation resulted in a sort that was %d times slower.", result[1].hnsecs / result[0].hnsecs);

  3. Run it. Your results may vary slightly, but you'll see that the emulated, inappropriate range functions are consistently slower. The following is the output:

    Emulation resulted in a sort that was 16 times slower.

  4. Tweak the size of the list by changing the initialization loop. Instead of 1000 entries, try 2000 entries. Also, try to compile the program with inlining and optimization turned on (dmd –inline –O yourfile.d) and see the difference. The emulated version will be consistently slower, and as the list becomes longer, the gap will widen.

On my computer, a growing list size led to a growing slowdown factor, as shown in the following table:

List size

Slowdown factor

500

13

1000

16

2000

29

4000

73

How it works…

The interface to Phobos' main sort function hides much of the complexity of the implementation. As long as we follow the efficiency rules when writing our ranges, things either just work or fail, telling us we must call an array in the range before we can sort it. Building an array has a cost in both time and memory, which is why it isn't performed automatically (std.algorithm prefers lazy evaluation whenever possible for best speed and minimum memory use). However, as you can see in our benchmark, building an array is much cheaper than emulating unsupported functions.

The sort algorithms require a full-featured range and will modify the range you pass to it instead of allocating memory for a copy. Thus, the range you pass to it must support random access, slicing, and either assignable or swappable elements. The prime example of such a range is a mutable array. This is why it is often necessary to use the array function when passing data to sort.

Our linked list code used static if with a compile-time parameter as a configuration tool. The implemented functions include opSlice and properties that return ref. The ref value can only be used on function return values or parameters. Assignments to a ref value are forwarded to the original item. The opSlice function is called when the user tries to use the slice syntax: obj[start .. end].

Inside the beSlow condition, we broke the main rule of implementing range functions: avoid loops. Here, we see the consequences of breaking that rule; it ruined algorithm restrictions and optimizations, resulting in code that performs very poorly. If we follow the rules, we at least know where a performance problem will arise and can handle it gracefully.

For ranges that do not implement the fast length property, std.algorithm includes a function called walkLength that determines the length by looping through all items (like we did in the slow length property). The walkLength function has a longer name than length precisely to warn you that it is a slower function, running in O(n) (linear with length) time instead of O(1) (constant) time. Slower functions are OK, they just need to be explicit so that the user isn't surprised.

See also

  • The std.algorithm module also includes other sorting algorithms that may fit a specific use case better than the generic (automatically specialized) function. See the documentation at http://dlang.org/phobos/std_algorithm.html for more information.

Searching ranges

Phobos' std.algorithm module includes search functions that can work on any ranges. It automatically specializes based on type information. Searching a sorted range is faster than an unsorted range.

How to do it…

Searching has a number of different scenarios, each with different methods:

  • If you want to know if something is present, use canFind.

  • Finding an item generically can be done with the find function. It returns the remainder of the range, with the located item at the front.

  • When searching for a substring in a string, you can use haystack.find(boyerMooreFinder(needle)). This uses the Boyer-Moore algorithm which may give better performance.

  • If you want to know the index where the item is located, use countUntil. It returns a numeric index into the range, just like the indexOf function for strings.

  • Each find function can take a predicate to customize the search operation.

  • When you know your range is sorted but the type doesn't already prove it, you may call assumeSorted on it before passing it to the search functions. The assumeSorted function has no runtime cost; it only adds information to the type that is used for compile-time specialization.

How it works…

The search functions in Phobos make use of the ranges' available features to choose good-fit algorithms. Pass them efficiently implemented ranges with accurate capabilities to get best performance.

The find function returns the remainder of the data because this is the most general behavior; it doesn't need random access, like returning an index, and doesn't require an additional function if you are implementing a function to split a range on a given condition. The find function can work with a basic input range, serving as a foundation to implement whatever you need on top of it, and it will transparently optimize to use more range features if available.

Using functional tools to query data

The std.algorithm module includes a variety of higher-order ranges that provide tools similar to functional tools. Here, we'll see how D code can be similar to a SQL query.

A SQL query is as follows:

SELECT id, name, strcat("Title: ", title) FROM users WHERE name LIKE 'A%' ORDER BY id DESC LIMIT 5;

How would we express something similar in D?

Getting ready

Let's create a struct to mimic the data table and make an array with the some demo information. The code is as follows:

struct User { int id; string name; string title; } User[] users; users ~= User(1, "Alice", "President"); users ~= User(2, "Bob", "Manager"); users ~= User(3, "Claire", "Programmer");

How to do it…

Let's use functional tools to query data by executing the following steps:

  1. Import std.algorithm.

  2. Use sort to translate the ORDER BY clause. If your dataset is large, you may wish to sort it at the end. This will likely require a call to an array, but it will only sort the result set instead of everything. With a small dataset, sorting early saves an array allocation.

  3. Use filter to implement the WHERE clause.

  4. Use map to implement the field selection and functions. The std.typecons.tuple module can also be used to return specific fields.

  5. Use std.range.take to implement the LIMIT clause.

  6. Put it all together and print the result.

The code is as follows:

import std.algorithm; import std.range; import std.typecons : tuple; // we use this below auto resultSet = users. sort!((a, b) => a.id > b.id). // the ORDER BY clause filter!((item) => item.name.startsWith("A")). // the WHERE clause take(5). map!((item) => tuple(item.id, item.name,"Title: " ~ item.title));
// the field list and transformations import std.stdio; foreach(line; resultSet) writeln(line[0], " ", line[1], " ", line[2]);

It will print the following output:

1 Alice Title: President

How it works…

Many SQL operations or list comprehensions can be expressed in D using some building blocks from std.algorithm. They all work generally the same way; they take a predicate as a compile-time argument. The predicate is passed one or two items at a time and you perform a check or transformation on it. Chaining together functions with the dot syntax, like we did here, is possible thanks to uniform function call syntax. It could also be rewritten as take(5, filter!pred(map!pred(users))). It depends on author's preference, as both styles work exactly the same way.

It is important to remember that all std.algorithm higher-order ranges are evaluated lazily. This means no computations, such as looping over or printing, are actually performed until they are required. Writing code using filter, take, map, and many other functions is akin to preparing a query. To execute it, you may print or loop the result, or if you want to save it to an array for use later, simply call .array at the end.

There's more…

The std.algorithm module also includes other classic functions, such as reduce. It works the same way as the others.

D has a feature called pure functions. The functions in std.algorithm are conditionally pure, which means they can be used in pure functions if and only if the predicates you pass are also pure. With lambda functions, like we've been using here, the compiler will often automatically deduce this for you. If you use other functions you define as predicates and want to use it in a pure function, be sure to mark them pure as well.

See also

Summary

In this article, we learned how to sort ranges in an efficient manner by using sorting algorithms. We learned how to search a range using different functions. We also learned how to use functional tools to query data (similar to a SQL query).

Resources for Article:


Further resources on this subject:


D Cookbook Discover the advantages of programming in D with over 100 incredibly effective recipes with this book and ebook.
Published: May 2014
eBook Price: $29.99
Book Price: $49.99
See more
Select your format and quantity:

About the Author :


Adam D. Ruppe

Adam D. Ruppe is a professional software developer living in Watertown, New York. He started programming PCs in high school, writing assembly language, and later C and C++, using the Digital Mars compiler to build programs based on MS DOS on a hand-me-down computer. Programming in the DOS environment with the slow computer gave him early practical experience in low-level and effi cient code—skills he carries on developing today.

After finishing school, he started doing web programming—initially with PHP. While he'd make it work, he often found himself longing for the good old days. One day, he decided to check back with the vendor of his old compiler and discovered the D programming language (well before it reached 1.0!).

He was enamored with it and used it to write some games, and then started writing web libraries to use it for work too, to replace PHP. He found success in this endeavor in early 2009.

Combining his pioneering spirit with his blend of low-level and high-level programming experience, he was able to forge ahead with D, taking it to places many people didn't believe possible

Books From Packt


C++ Application Development with Code::Blocks
C++ Application Development with Code::Blocks

Mastering Windows 8 C++ App Development
Mastering Windows 8 C++ App Development

Boost.Asio C++ Network Programming
Boost.Asio C++ Network Programming

C++ Multithreading Cookbook
C++ Multithreading Cookbook

C Programming for Arduino
C Programming for Arduino

Microsoft Visual C++ Windows Applications by Example
Microsoft Visual C++ Windows Applications by Example

Multithreading in C# 5.0 Cookbook
Multithreading in C# 5.0 Cookbook

Getting Started with C++ Audio Programming for Game Development
Getting Started with C++ Audio Programming for Game Development


Code Download and Errata
Packt Anytime, Anywhere
Register Books
Print Upgrades
eBook Downloads
Video Support
Contact Us
Awards Voting Nominations Previous Winners
Judges Open Source CMS Hall Of Fame CMS Most Promising Open Source Project Open Source E-Commerce Applications Open Source JavaScript Library Open Source Graphics Software
Resources
Open Source CMS Hall Of Fame CMS Most Promising Open Source Project Open Source E-Commerce Applications Open Source JavaScript Library Open Source Graphics Software