In this chapter, we will get started with D and explore some of its core features. You will learn the following recipes:
Installing the compiler and writing a "Hello World" program
Adding additional modules (files) to your program
Using external libraries
Building and processing arrays
Using associative arrays to translate input
Creating a user-defined vector type
Using a custom exception type
Understanding immutability
Slicing a string to get a substring
Creating a tree of classes
The D language borrows from several other programming languages, including statically typed languages such as C, C++, and Java, as well as dynamic languages such as Ruby, Python, and JavaScript. The overall syntax is very similar to C: curly braces denote blocks, declarations take the form typename initializer, and so on. In fact, a lot of, but not all, C code will compile in D too.
D also aims at convenience, productivity, and modeling power. These principles can be illustrated with D's type inference feature. Type inference means you can write code without explicitly thinking of and repeating a variable's type. This gives the convenience of a dynamic language without sacrificing the compile-time checks of static typing. You'll use type inference throughout your programs. Any variable declared without an explicit type (typically with the keyword auto) has an inferred type, determined automatically by the right-hand side of the assignment. D is one of the fastest compiling languages available, which gives the quick edit-run cycles that make development in dynamic languages so rapid. Modeling power comes in the form of D's rich code generation, introspection, and user-defined types, which you'll start exploring in this chapter when you look at structs and classes.
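As a small sketch of type inference (the variable names are illustrative and not taken from the recipes), the following shows auto picking up the type of each initializer. The static assert lines check the inferred types at compile time.

```d
import std.stdio : writeln;

void main() {
    auto count = 42;          // inferred as int
    auto ratio = 1.5;         // inferred as double
    auto name = "D";          // inferred as string
    auto numbers = [1, 2, 3]; // inferred as int[]

    // typeof exposes the inferred type; these checks run at compile time.
    static assert(is(typeof(count) == int));
    static assert(is(typeof(ratio) == double));
    static assert(is(typeof(name) == string));
    static assert(is(typeof(numbers) == int[]));

    writeln(count, " ", ratio, " ", name, " ", numbers);
}
```

If any of the inferred types differed from what the assertions state, the program would fail to compile rather than fail at runtime.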
You're going to create your first D program: a simple "Hello World" program.
Let's execute the following steps to create your first program:
Download the DMD compiler from http://dlang.org/download.html.
If you use a platform installer, it will install the compiler for you. If you use a ZIP file, simply unzip it and use it in place of the platform installer. The binaries for each operating system are found in dmd2/your_os_name/bin. You may choose to add this directory to your PATH environment variable so you do not need to use the full path each time you run the compiler.
Create a file with your favorite text editor with the following content and name it hello.d:
import std.stdio : writeln;

void main() {
    writeln("Hello, world!");
}
Compile the program. In your command prompt, run the following:
dmd hello.d
Run the program as follows:
hello
You should see the following message appear:
Hello, world!
The DMD compiler is the key tool needed to use D. Although IDEs exist, you won't be using them in this book. So, you'll learn how to use the compiler and be better equipped to handle problems encountered during the build process.
D source files are Unicode text, which are compiled into executable programs. The DMD compiler, by default, generates an executable file with the same base name as the first file passed to it. So here, when you called dmd hello.d, it created a file named hello.exe on Windows or hello on Unix systems. You can change the output file with the -of option; for example, dmd -oftest hello.d will create a file named test.exe on Windows or test on Unix systems. You'll learn more about the options of dmd as they are needed.
Next, let's look at each of the lines of hello.d
, beginning with the following import
statement:
import std.stdio;
A D program is composed of modules. Each module is a file, but unlike C or C++, where you use textual #include
directives, D uses a symbolic import
. When you import a module, its public members become available for use. You can import the same module multiple times without any negative effect, and the order of top-level imports does not matter.
In this case, you're importing the module std.stdio
, which is a part of the standard library that provides input and output functions, including the writeln
function you'll use later in the code. Next, let's discuss the following main()
function:
void main()
D programs typically begin execution at the main() function. D's main() function can optionally take command-line arguments of type string[], and it may return either void or int. All forms are equally valid.
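The following sketch shows the form of main() that takes arguments and returns a status code. The helper run and its usage message are hypothetical, introduced only so the logic can be exercised separately from the entry point.

```d
import std.stdio : writeln;

/// Hypothetical helper: returns 0 (success) if at least one
/// user argument was supplied, 1 otherwise.
int run(string[] args) {
    // args[0] is the program name; user arguments start at args[1].
    if (args.length < 2) {
        writeln("usage: ", args.length ? args[0] : "program", " <name>");
        return 1;
    }
    writeln("Hello, ", args[1], "!");
    return 0;
}

// main may accept the command-line arguments and return an int status,
// which becomes the process exit code.
int main(string[] args) {
    return run(args);
}
```

Running the compiled program with no arguments would print the usage line and exit with status 1; with an argument, it greets that name and exits with status 0.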
Tip
It is possible to write D programs that start somewhere other than main()
, which allows you to bypass D runtime initialization. You'll see this in Chapter 11, D for Kernel Coding.
Here, you're returning void because you aren't returning any specific value. The runtime will automatically return zero to the operating system upon normal termination, and it will return an error code automatically if the program is terminated by an exception. Now, let's look at the following output function:
writeln("Hello, world!");
Finally, you'll call the function writeln from the std.stdio module to print Hello, world!. The writeln function can take any number of arguments of any type, and it will automatically convert them to string for printing. This function automatically adds a newline character to the end of the output.
Here, you used the DMD compiler. There are two other major D compilers available: GDC and LDC. You can learn more about these at http://gdcproject.org/ and http://github.com/ldc-developers/ldc, respectively.
As your program grows, you'll want to break it up across multiple files. D offers a way to do this that is similar, but not identical to other popular programming languages. D source files are also called D modules.
Let's add modules to your program by executing the following steps:
Create an additional file.
Put the module declaration in the other file with a package and module name. These names should be meaningful for your project and this specific file, respectively, as shown:
module yourpackage.yourmodulename;
Import it in the existing file to use it. Import is flexible: it does not have to be at the top of the file and may appear more than once in the file. The import statement should always match the full name given in the module statement in the file being imported, as shown in the following code:
import yourpackage.yourmodulename;
Use the functions, disambiguating with the full module name if required.
Compile with all files listed on the command line using the following syntax:
dmd file1.d file2.d
This will produce a single executable file out of the passed files. The name of the executable is, by default, the same as the first file passed. So here, the executable file will be named file1.
In D, code is organized into modules. There is a one-to-one correspondence between files and modules—every D source file is a module, and every D module is a file. Using a module involves importing it into the current scope, accessing its members in code, and adding it to the build command when compiling.
Tip
If you forget to add a module to the build command, you'll see an error such as cannot find module of name NAME
or undefined symbol _D6module
.
Modules are conceptually similar to static classes with a single instance; they can have constructors, destructors, fields, and so on. Each declaration inside a module may have attributes and protection qualifiers. In D, unlike C++, modules (not classes) are the unit of encapsulation. Therefore, any code can access any entity within the same module (regardless of the entity's protection qualifier).
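To make the "static class" comparison concrete, here is a minimal single-file sketch. The names registry and registeredCount are hypothetical. It shows a private module-level field, a module constructor that initializes it before main() runs, and a public function that exposes it.

```d
import std.stdio : writeln;

// A module-level field; private restricts access to this module.
private string[] registry;

// Module constructor: runs once per thread, before main() starts.
static this() {
    registry = ["alpha", "beta"];
}

// Public function exposing information about the private field.
size_t registeredCount() {
    return registry.length;
}

void main() {
    // Code in the same module may touch registry directly,
    // regardless of its protection qualifier.
    registry ~= "gamma";
    writeln("registered: ", registeredCount());
}
```

Code in another module that imports this one could call registeredCount() but could not name registry.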
Modules have logical names that do not need to match the filename. This is set with the module
statement, which must appear at the top of the file. The module
statement is not strictly required. If you leave it off, it will default to the filename. However, it is strongly recommended that you write it, including a package name, (the first part of the dot-separated full name) in every module that may ever be imported, because relying on the default module name will cause trouble if you organize your code into directories. The common error module foo.bar
must be imported as the foo
module is caused by a missing module
statement in the imported module. The typical organization of modules into packages mirrors the source files' directory structure. You are not required to match package and module names with directory and filenames, but doing so will help other users and build tools understand your project layout.
The import
statement may appear at any point. In module scope, it may appear anywhere and any number of times without changing anything. It can also be used in local scopes, where it must appear before the member is used and visibility is limited to that scope.
Names of members in a module are not required to be unique among all the modules that make up a program. Instead, when necessary, they are disambiguated at the usage point or at the import
statement by the module identifier, as shown in the following code:
import project.foo; // disambiguate with project.foo
import bar; // you can disambiguate calls with the name bar

project.foo.func(); // call project.foo.func
bar.func(); // call bar.func
The compiler will always issue an error if a reference is ambiguous so that you can specify your intent.
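The forms of import discussed here can be sketched in one file using only standard library modules. The function hypotenuse and the alias str are illustrative names, not part of any recipe.

```d
import std.stdio : writeln;   // selective import: only writeln is visible
import str = std.string;      // renamed import: members need the str. prefix

double hypotenuse(double a, double b) {
    import std.math : sqrt;   // scoped import: visible only in this function
    return sqrt(a * a + b * b);
}

void main() {
    // The renamed import forces an explicit, unambiguous prefix.
    writeln(str.strip("  padded  "));
    writeln(hypotenuse(3, 4));
}
```

Selective and renamed imports are one way to avoid ambiguity up front, instead of qualifying names at each usage point.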
The dmd
distribution also includes a program called rdmd
, which can recursively find dependent modules and compile them all automatically. With rdmd
, you only have to pass the module that contains main
.
The module documentation at http://dlang.org/module.html details how D's module system works, including a list of all the forms of the import statement and information about symbol protection.
D can use external libraries written in D, as well as other languages, with direct access to C functions, such as those provided by the operating system or the wealth of code written in C that can perform a variety of tasks. Here, you'll learn how to use an external library in your D program.
Let's use external libraries by executing the following steps:
Create or download bindings: a list of function prototypes and data structure definitions from the library.
On 32-bit Windows with dmd only, get or create an import library (.lib file). If you already have a .lib file in the COFF format, you can convert it with coffimplib. If you only have a .dll file, you can create a .lib file from it with implib.
Import the binding by using the following statement:
import package.module.name;
Compile with the library. For Linux, pass -L-llibname to dmd. On Windows, pass the .lib file to dmd when compiling your program. This will link the file with the generated executable, producing a working program.
D is binary compatible with C, but not source compatible. This means you can link directly to C libraries, including most operating system libraries, without any wrapper or invoker code. You do need, however, to port the header files, the function prototypes, and the variable declarations, to D. This process is called binding.
While you can use a library by only providing the prototypes for the functions you need, being minimally type-safe, the recommended way is to port the C header as closely as possible. This will minimize bugs and maximize the ease of use for programmers who are familiar with the usage and documentation of C.
In your code, using the library is the same as using any other module; you import the module, call the functions, and disambiguate the names by using fully-qualified package and module names.
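As a minimal sketch of a hand-written binding, the following declares a single prototype for the C standard library function strlen and calls it. A real binding would normally port the whole header (and druntime already ships one in core.stdc.string); the single declaration here is only for illustration.

```d
import std.stdio : writeln;
import std.string : toStringz;

// Hand-written binding. It must match the C declaration:
//     size_t strlen(const char *s);
// extern (C) selects C linkage and calling convention.
extern (C) size_t strlen(const(char)* s) nothrow @nogc;

void main() {
    // D strings are not zero-terminated in general; toStringz
    // produces a zero-terminated pointer suitable for C functions.
    auto len = strlen(toStringz("hello"));
    writeln("strlen says: ", len);
}
```

No extra linker flag is needed in this case because the C runtime is linked by default; a third-party library would need the flags described in the steps above.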
When compiling, the -L
flag to dmd passes the rest of the argument straight to the linker. On 32-bit Windows, using an existing library may be difficult because dmd uses an old library file format called OMF that is incompatible with the newer and more common COFF format. This is where implib
and coffimplib
come into play—these programs generate the format that the linker, optlink
, expects from the more common formats available. The implib
command creates a .lib
file that you can use with D directly from a .dll
file. The implib
command's invocation format is as follows:
implib /s myfile.lib myfile.dll
The coffimplib
command converts the more common COFF .lib
format to the format D requires. The coffimplib
command's invocation format is as follows:
coffimplib myfile.lib
These programs can be separately downloaded from Digital Mars, the small company behind the D programming language and DMD compiler. They are not necessary when building 64-bit Windows programs, or programs on any other operating system.
The DMD compiler supports pragma(lib, "name");
, which will automatically handle the linker flag while building, if you pass the module to dmd's command line. This pragma is not fully supported on GDC, but it doesn't necessarily hurt either. It will issue a warning about an unsupported pragma.
You can also create D interface files for D libraries; the extension .di is traditionally used. The .di files can be automatically generated with the dmd -H option. The D interface files are similar to header files in C or C++; they list the interface definitions, but omit function bodies. The use of D interface files is optional.
Sometimes, using other libraries isn't as simple as calling their function, or you want to improve upon the API somehow. Chapter 4, Integration, explains how to address these issues.
Deimos (https://github.com/d-programming-deimos) is the official repository for translated bindings of common C libraries. It makes no attempt to change the API; it is simply a collection of ports of C library headers that can be used in D, saving you the trouble of recreating the prototypes yourself.
Dub (http://code.dlang.org) is the semi-official D package manager, and code.dlang.org lists community libraries that are available through it. This includes C bindings as well as pure D libraries.
If you are developing for 32-bit Windows, the Basic Utilities Package from Digital Mars (http://digitalmars.com/download/freecompiler.html) contains the implib tool as well as others to build advanced Windows .exe files.
The directory dmd2/src/druntime/import in dmd's ZIP file has various D interface files for the D runtime library and the C standard library.
D has three types of arrays built in: static arrays, with a set length known at compile time; dynamic arrays, with a variable length; and associative arrays, which are similar to hashmaps or dictionaries in other languages. D's arrays and array slices are a very flexible and easy-to-use tool used in almost all D programs. Here, you'll look at some of their capabilities by building a list of integers and then finding the sum of the contents.
Let's use the following steps to build and process arrays:
Declare an array variable by using the following statement:
int[] arr;
Append data to it as shown:
arr ~= 1; arr ~= [2, 3];
Create a function that takes a slice and does some processing on it, as shown in the following code:
int sum(in int[] data) {
    int total = 0;
    foreach(item; data)
        total += item;
    return total;
}
Pass a slice of the array to the function, as shown in the following code:
// Dynamic arrays can be passed directly. Static
// arrays can be sliced with the [] operator.
writeln("The sum of ", arr, " is ", sum(arr));
D types are always read from right to left. The int[] array is an array of integers. The string*[]* pointer is a pointer to an array of pointers to string. The int[][] array is an array of arrays of integers: a jagged array.
Setting associative arrays aside, there are two kinds of arrays in D: static and dynamic. A static array is a value type that represents a solid, fixed-size block of memory (this corresponds to an array in C). A dynamic array is conceptually a struct with two members: a pointer to the data and the length of the data. Thus, unlike a static array, dynamic arrays and slices have reference semantics. You can access the pointer and length components with the .ptr and .length properties, respectively. In the example here, you used a dynamic array, which has the same syntax as a slice.
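The difference between value and reference semantics can be seen in a short sketch (variable names are illustrative). Copying a static array copies its elements, while copying a dynamic array copies only the pointer and length.

```d
import std.stdio : writeln;

void main() {
    int[3] fixed = [1, 2, 3];   // static array: a value type
    int[3] copy = fixed;        // copies all three elements
    copy[0] = 99;
    assert(fixed[0] == 1);      // the original is untouched

    int[] dynamic = [1, 2, 3];  // dynamic array: pointer + length
    int[] view = dynamic;       // copies only the pointer and length
    view[0] = 99;
    assert(dynamic[0] == 99);   // both names refer to the same memory

    assert(view.ptr is dynamic.ptr);
    writeln(dynamic.length, " elements at ", dynamic.ptr);
}
```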
There are three major operations on an array: appending, indexing, and slicing.
Appending: This is done with the ~= operator. There's also the binary ~ operator (not to be confused with the unary ~ operator, which inverts bits), which concatenates two arrays to form a new one. You can append an individual element or another static or dynamic array of compatible type.
Indexing: This is done with the [expr] operator, for example, arr[0]. This is very similar to C, but a key difference in D is that arrays know their own length, enabling automatic bounds checking. If you attempt to access an out-of-bounds index, you will see a RangeError.
Slicing: This is done with the [] operator, for example, arr[] or arr[0 .. 2]. This gets a view into an array starting at the left-hand index (inclusive) and ending at the right-hand index (exclusive), which means you can use the array's length as an ending bound; the length also has the shorthand syntax $. arr[] gets a slice of the whole array, and it is useful for passing static arrays or user-defined array types to functions expecting slices or dynamic arrays. Slicing is a fast, constant-time operation.
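The three operations can be exercised together in a short sketch. The values are arbitrary, and the assertions document what each expression yields.

```d
import std.stdio : writeln;

void main() {
    int[] arr;
    arr ~= 1;                  // append one element
    arr ~= [2, 3];             // append another array
    int[] longer = arr ~ [4];  // binary ~ builds a new array

    assert(longer[0] == 1);            // indexing (bounds-checked)
    assert(longer[1 .. 3] == [2, 3]);  // left inclusive, right exclusive
    assert(longer[2 .. $] == [3, 4]);  // $ stands for longer.length

    int[3] fixed = [7, 8, 9];
    int[] whole = fixed[];     // slice of an entire static array
    assert(whole.length == 3);

    writeln(longer);
}
```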
You can iterate over an array or slice using the foreach loop. You put the iteration variable, then a semicolon, and then the variable you want to iterate over. You do not have to explicitly name the iteration variable's type: both foreach(item; array) and foreach(int item; array) work.
In the example code, the function parameter is declared with the in keyword. The in keyword is shorthand that gives the parameter the storage classes const and scope. What this means in practice is that you must not modify the array or its contents (this will be a compile error), nor should you keep a copy of or return a reference to the passed array.
D also supports array vector operations like the following:
arr[] = arr[] + 5;
This code will add five to every element of the array. You can also create array copies this way: arr2[] = arr[]
. This will copy arr
into arr2
. For this to work, the lengths of the two arrays must already match. To do an array copy without matching lengths, you can write array.dup
.
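A short sketch of these operations follows, using arbitrary values. It shows the element-wise addition, a slice copy into a destination of matching length, and .dup for a copy that allocates its own storage.

```d
import std.stdio : writeln;

void main() {
    int[] arr = [1, 2, 3];
    arr[] = arr[] + 5;          // element-wise addition
    assert(arr == [6, 7, 8]);

    // Slice copy: the destination must already have the same length.
    auto arr2 = new int[](arr.length);
    arr2[] = arr[];
    arr2[0] = 0;
    assert(arr[0] == 6);        // arr2 has its own storage

    // .dup allocates a new array and copies in one step.
    int[] arr3 = arr.dup;
    arr3 ~= 9;
    assert(arr.length == 3);    // the original is unchanged

    writeln(arr, " ", arr2, " ", arr3);
}
```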
http://dlang.org/d-array-article.html for details of array memory management and the difference between a dynamic array and a slice.
http://dlang.org/arrays.html for a more complete listing of what D's arrays can do.
The Creating an array replacement recipe in Chapter 5, Resource Management. This will show you how to create a new array type that has the same capabilities as the built-in arrays, with custom behavior or memory allocation strategies.
The Avoiding the garbage collector recipe in Chapter 5, Resource Management. This will discuss the built-in array's memory allocation habits and what to avoid if you don't want to use the garbage collector.
D also has associative arrays, sometimes called maps or dictionaries. An associative array maps arbitrary keys to values. Unlike a regular array, the keys do not need to be sequential and do not need to be integers. Here, you'll explore their functionality by creating a program that translates input strings to other strings.
Let's translate an input by using the following steps:
Declare the associative array with string keys and string values.
Initialize it with initial data.
Loop over input lines. If the line is in the array, show the value and remove it. If not, add this line with a replacement.
When you're done, loop over the array to show your changes.
The code is as follows:
void main() {
    import std.stdio, std.string;

    string[string] replacements = ["test" : "passed", "text" : "replaced"];
    replacements["foo"] = "bar";
    assert(replacements["test"] == "passed");

    foreach(line; stdin.byLine()) {
        line = line.strip(); // cut off whitespace
        // see if the given line is in the mapping…
        if(auto replacement = line in replacements) {
            // if yes, show the replacement, then unmap it
            writeln(line, " => ", *replacement);
            replacements.remove(line.idup);
        } else {
            // if no, add it to the map
            writeln(line);
            replacements[line.idup] = "previously inserted!";
        }
    }

    foreach(line, replacement; replacements)
        writeln("Mapping ", line, " => ", replacement);
}
When the program runs out of lines to process, it will print out the current array contents, showing you what has been added and removed as you entered data.
First, you declared your main
function and then imported the std.stdio
and std.string
modules, which contain the I/O and whitespace stripping functions that you used later.
Next, you declared an associative array that maps strings to other strings. The syntax is ValueType[KeyType]
, and both sides can be of any D type. You also initialized the replacements with an associative array literal.
Note
It is also possible to use user-defined types as associative array keys. Using custom types as key types requires that they implement toHash and opEquals; older compiler versions also required opCmp.
The syntax of an associative array (AA) literal is [Key:Value, Key:Value, …]
. AA literals can have both compile-time constant and runtime data; ["foo":x]
is legal too.
Next, you can set a value outside the literal and check the value of a key, just for demonstration purposes. Associative arrays have similar syntax to regular arrays: you use the same bracket syntax to get and set elements.
Then, you can enter the replacement loop, reading the standard input by line and then stripping off whitespace and looking for the line in the replacements array. Let's look at this line in more detail, as follows:
if(auto replacement = line in replacements) {
On the right-hand side, you can use the in
operator to do a key lookup. This operator returns a pointer to the element if it is found, and null if it is not found.
Tip
You don't have to use the pointer returned by the in
operator. if(line in replacements)
works just as well. There's also the inverse of in
, which is !in
. The if(line !in replacements)
statement is true if line is not in the replacements array.
On the left-hand side, you can declare and assign the variable right inside the if
statement. This keeps the newly declared variable limited in scope. If the variable replacement is available, you can be certain that it is not null, since the if
statement will not execute otherwise!
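The lookup idiom can be isolated in a small sketch. The function translate and the table contents are hypothetical. It uses the pointer returned by in when the key exists, and falls back to a default otherwise.

```d
import std.stdio : writeln;

/// Hypothetical helper: look up a word, falling back to a default.
string translate(string[string] table, string word) {
    // found is a pointer to the value and is only in scope in this branch.
    if (auto found = word in table)
        return *found;
    return "(unknown)";
}

void main() {
    string[string] table = ["hello": "bonjour"];

    assert(translate(table, "hello") == "bonjour");
    assert(translate(table, "bye") == "(unknown)");
    assert("bye" !in table);

    // The built-in get property performs the same fallback lookup.
    assert(table.get("bye", "(unknown)") == "(unknown)");

    writeln(translate(table, "hello"));
}
```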
Next, you'll proceed into the true
branch of the if
statement. This branch uses the dereference operator, *replacement
, to print out the value. The *
operator is necessary because the in
operator returns a pointer to the element rather than the element itself. Then you'll remove this key from the mapping by using the built-in associative array property remove
. Next time you insert that line, it will not be replaced.
In the false branch of the if statement, the variable replacement is out of scope, so the null pointer it would hold is not available to use. Any attempt to access it will be a compile error. Instead, you can add the new line to the replacement map. The .idup
property is required because associative array keys must be immutable, and stdin.byLine
returns a mutable buffer. Array.idup
creates a new, immutable copy of the data.
Finally, once the input has been exhausted, you can loop over the associative array with a foreach
loop. The syntax is foreach(index, value; array)
, and you can print out the current state. The index
parameter is optional if you only need the values.
User-defined types are used everywhere in D to group data, model objects, provide compile-time checks, and more. Here, you'll create a simple vector type with a length and direction to look at some basic capabilities.
Whenever you create a user-defined type in D, the first decision to make is whether it should be a class, struct, mixin template, or union. Mixin templates are great for code reuse. They define code that can be copied (or mixed in) to another type, with parameterization. Unions are for the cases when you need the same block of memory to have multiple types, and are the least common in typical D code. Classes and structs are the backbone of user-defined types in D, and they have the most in common. The key difference is polymorphic inheritance; if you need it, you probably want a class. Otherwise, structs are lighter weight and give maximum flexibility. Using them, you can precisely define the layout of each byte with no hidden data, overload all operators, use deterministic destruction (the RAII idiom from C++), and use either reference or value semantics depending on your specific needs. D's structs also support a form of subtyping, though not virtual functions, which you'll see in Chapter 6, Wrapped Types.
Let's summarize as follows:
Struct | Class |
---|---|
This offers precise control over memory layout | This offers virtual functions and inheritance |
This is ideal for lightweight wrappers of other types | This is always a reference type |
This offers deterministic destruction | This is usually managed by the garbage collector |
Since your vector type will not need virtual functions, it will be a struct.
Let's look at creating a vector type using the following steps:
Declare the struct with a name. This declaration can appear anywhere; but, in your case, you want it to be generally accessible. So, it should go in the top-level scope of your module. Unlike C++, there is no need to put a semicolon at the end of the struct definition, as shown in the following code:
struct Vector {}
Determine which data members are needed and add them to the struct. Here, you need a magnitude and direction, and they will be floating point types:
struct Vector {
    float magnitude;
    float direction;
}
Add methods that operate on the data to the struct. In this case, you want to be able to add vectors together and convert from (x, y) coordinates. The complete code is as follows:
struct Vector {
    // the data
    float magnitude;
    float direction;

    // the methods

    /// create a Vector from an (x, y) point
    static Vector fromPoint(float[2] point) {
        import std.math;
        Vector v;
        float x = point[0];
        float y = point[1];
        v.magnitude = sqrt(x ^^ 2 + y ^^ 2);
        v.direction = atan2(y, x);
        return v;
    }

    /// converts to an (x,y) point. returns in an array.
    float[2] toPoint() const {
        import std.math;
        float x = cos(direction) * magnitude;
        float y = sin(direction) * magnitude;
        return [x, y];
    }

    /// the addition operator
    Vector opBinary(string op : "+")(Vector rhs) const {
        auto point = toPoint(), point2 = rhs.toPoint();
        point[0] += point2[0];
        point[1] += point2[1];
        return Vector.fromPoint(point);
    }
}
Use the new type as follows:
auto origin = Vector(0, 0);
import std.math;
auto result = origin + Vector(1.0, PI);
import std.stdio;
writeln("Vector result: ", result);
writeln(" Point result: ", result.toPoint());
It will print Vector(1.0, 3.14)
and [-1, 0]
, showing the vector sum as magnitude and direction, and then x, y. Your run may have slightly different results printed due to differences in how your computer rounds off the floating point result.
Structs are aggregate types that can contain data members and methods. All members and methods are defined directly inside the struct, between the opening and closing braces. Data members have the same syntax as a variable declaration: a type (which can be inferred, if there is an initializer), a name, and optionally, an initializer. Initializers must be evaluated at compile time. When you declare a struct variable without an explicit initializer, all members are set to the value of their initializers inside the struct definition.
Methods have the same syntax as functions at module scope, with two differences: they can be declared static, and they may have const
, immutable
, or inout
attached, which applies to the variable this
. The this
variable is an automatically declared variable that represents the current object instance in a method. The following recipe on immutability will discuss these keywords in more detail.
Operator overloading in D is done with methods and special names. In this section, you defined opBinary
, which lets you overload the binary operators such as the addition and subtraction operators. It is specialized only on the +
operator. It is also possible to overload casting, assignment, equality checking, and more.
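As a sketch of a less specialized overload, the following hypothetical Point type handles both + and - with a single opBinary template. It uses a template constraint to restrict the accepted operators and a string mixin to paste the operator into the expression at compile time. This is one common idiom, not the only way to write it.

```d
import std.stdio : writeln;

struct Point {   // hypothetical type, for illustration only
    int x, y;

    // op is the operator as a compile-time string; the constraint
    // rejects anything other than + and -.
    Point opBinary(string op)(Point rhs) const
        if (op == "+" || op == "-")
    {
        return Point(mixin("x " ~ op ~ " rhs.x"),
                     mixin("y " ~ op ~ " rhs.y"));
    }
}

void main() {
    auto a = Point(1, 2);
    auto b = Point(3, 4);
    assert(a + b == Point(4, 6));
    assert(b - a == Point(2, 2));
    // a * b would not compile: no opBinary matches "*".
    writeln(a + b);
}
```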
At the usage point, you declared a vector with auto
, using the automatically defined constructor.
Finally, when you write the result, you use the automatic string formatting that prints the name and the values, in the same way as the automatic constructor. It is also possible to take control of this by implementing your own toString
method.
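A minimal sketch of a custom toString follows. It uses a reduced copy of the Vector struct with only its data members, and the output format is an arbitrary choice for illustration.

```d
import std.format : format;
import std.stdio : writeln;

struct Vector {
    float magnitude;
    float direction;

    // writeln and the formatting functions call toString when it exists.
    string toString() const {
        return format("%.2f @ %.2f rad", magnitude, direction);
    }
}

void main() {
    auto v = Vector(1, 0.5);
    assert(v.toString() == "1.00 @ 0.50 rad");
    writeln(v);   // uses the custom toString
}
```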
Chapter 6, Wrapped Types, will show more advanced capabilities, including how to use structs to make a reference type and to use constructors, destructors, postblits, and so on.
Inheritance and dynamic class casting will show how to make the most of classes.
Visit http://dlang.org/operatoroverloading.html for the language documentation on operator overloading. It details all the operators available for overloading and how to do it.
D uses exceptions to handle errors, just like many other programming languages. In D, exceptions are always implemented as classes that derive from the class Throwable
, and they are differentiated by their type. So, it is best to generate a new exception subclass for different types of errors your code can generate. This way, users of your code will get the most information and control out of your exceptions.
Let's use a custom exception type by using the following steps:
Declare a class that inherits from Exception.
Make a constructor that takes, minimally, two parameters: string file and size_t line, with default values of __FILE__ and __LINE__, respectively.
Have the constructor forward the arguments to the constructor of Exception.
Use your exception.
The following is the code:
class MyException : Exception {
    this(string message, string file = __FILE__, size_t line = __LINE__, Throwable next = null) {
        super(message, file, line, next);
    }
}

void main() {
    import std.stdio;
    try
        throw new MyException("message here");
    catch(MyException e)
        writeln("caught ", e);
}
D uses exceptions to handle errors. All throwable objects inherit from either Exception
, for recoverable events, or Error
for unrecoverable errors, which generally ought not be caught. The common base class is Throwable
.
Typically, a custom exception inherits from Exception
, then declares, minimally, a constructor that forwards the functionality to super()
. You may also store additional information specific to your use case.
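Storing additional information might look like the following sketch. ParseException and its column field are hypothetical names chosen for illustration; the constructor otherwise follows the same forwarding pattern as the recipe.

```d
import std.stdio : writeln;

class ParseException : Exception {   // hypothetical exception type
    size_t column;                   // extra, use-case-specific data

    this(string message, size_t column,
         string file = __FILE__, size_t line = __LINE__,
         Throwable next = null)
    {
        super(message, file, line, next);
        this.column = column;
    }
}

void main() {
    try {
        throw new ParseException("unexpected token", 17);
    } catch (ParseException e) {
        // The handler can read the extra field as well as the message.
        assert(e.column == 17);
        writeln(e.msg, " at column ", e.column);
    }
}
```

Because handlers match on type, a catch (ParseException e) block gets typed access to column, which a plain Exception handler would not have.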
The constructor of Exception
(here, called with super()
) takes four arguments: a string message, a filename, a line number, and optionally, a reference to another exception. The message, filename, and line number are used to construct a message for the user, which is printed to the console if the exception is not caught.
You don't have to specify the file and line number at the throw site; any default argument of __FILE__
or __LINE__
is automatically expanded at the function's call site. This is useful to make the error message more useful by showing exactly where the exception came from.
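The call-site expansion can be observed directly with a small sketch. The helper callerLine is hypothetical. Two calls on consecutive lines report line numbers that differ by one, which would not happen if __LINE__ were evaluated where the function is declared.

```d
import std.stdio : writeln;

/// Hypothetical helper that reports the line it was called from.
size_t callerLine(size_t line = __LINE__) {
    return line;
}

void main() {
    auto first = callerLine();
    auto second = callerLine();
    // Each call captures its own line, so the values differ by one.
    assert(second == first + 1);
    writeln(first, " ", second);
}
```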
The fourth parameter, Throwable next
, is used if an exception handler throws an exception. It references the exception that was being handled when this one was generated.
You should check error codes when using C functions and turn them into exceptions. If a function sets errno for error details, the std.exception module has a subclass called ErrnoException that is perfect for the job, as in the following code:
import core.sys.posix.unistd; // for the low-level Posix functions
import core.sys.posix.fcntl; // for more low-level Posix functions
import std.exception; // for ErrnoException

auto fd = open("myfile.txt", O_RDONLY);
// open() returns -1 if it was unable to open the file,
// and sets errno with error details. Check for that failure.
if(fd == -1)
    throw new ErrnoException("Couldn't open myfile.txt");
// close the file automatically at the end of the scope
scope(exit) close(fd);
/* read the file here */
Scope guards, discussed in Chapter 5, Resource Management, are convenient for use with exceptions. They let you put clean-up or recovery code near the creation point in an exception-safe way. In the preceding example, you used a scope guard to ensure the file is properly closed when the function returns, even if an exception is thrown.
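To make the ordering visible without touching real files, here is a small sketch that records events in a hypothetical module-level array named events. The work function is also illustrative; it leaves its scope either normally or by throwing:

```d
import std.stdio;

// Hypothetical event log, used only to show the order of operations.
string[] events;

void work(bool fail) {
    events ~= "acquire";
    // Runs when the scope is left, whether normally or by an exception.
    scope(exit) events ~= "release";
    if (fail)
        throw new Exception("something went wrong");
    events ~= "finish";
}

void main() {
    work(false);
    try
        work(true);
    catch (Exception e)
        events ~= "caught";
    writeln(events);
}
```

In both calls, "release" is recorded after "acquire", and in the failing call it is recorded before the handler runs.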
Here, you'll look at how to use immutability in your functions and data types. Immutability helps us to write code that is easier to understand and maintain because it limits the places where things can change.
First, write a function. Then, look at it and determine what it needs to do. Does it just look at the data passed to it? Does it store or return a reference to data passed in? We'll use these facts about how the function uses its arguments to determine the best-fit qualifiers.
The use of const and immutable is slightly different on free functions and object methods.
If you are accepting a value type, const and immutable aren't very important.
If you are borrowing a value (going to look at it, but neither store nor modify it), use the in keyword and, if it is a character string, use char[] instead of string (string is an alias for immutable(char)[]):
void foo(in char[] lookAtThis) { /* inspect lookAtThis */ }
If you are going to store a reference, it is best to take immutable data, if possible, as follows:
void foo(immutable(ubyte)[] data) { stored = data; }
If you are going to modify the data, but not store it, use scope, but not const (in is shorthand for scope const), as follows:
void foo(scope char[] changeTheContents) { /* change it */ }
If you are not going to modify or store the contents, but will return a reference to it, use inout as follows:
inout(char)[] substring(inout(char)[] haystack, size_t start, size_t end) { return haystack[start .. end]; }
If you are going to change the value itself (not just the contents it references), use ref as follows:
void foo(ref string target) { /* change target itself */ }
When writing object methods, all of the preceding guidelines still apply; in addition, you can put a qualifier on the this parameter. The qualifier for this goes either before or after the function as follows:
int foo() const { return this.member; } /* this is const */
const int foo() { return this.member; } /* same as above */
Since the second form can be easily confused with returning a const value (the correct syntax for that is const(int) foo() { … }), the first form is preferred. Put qualifiers on this at the end of the function.
D's const qualifier differs from that of C++ in two key ways: D also has an immutable qualifier, which means the data will never change, and D's const and immutable qualifiers are transitive, that is, everything reachable through a const/immutable reference is also const/immutable. There is no escape like the mutable keyword of C++.
These two differences result in a stronger guarantee, which is useful, especially when storing data.
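Transitivity can be observed at compile time. The following sketch uses a hypothetical Account struct holding a pointer, and __traits(compiles, ...) to confirm that writes through a const view are rejected, both to the struct's own field and to the data behind its pointer:

```d
// Hypothetical struct with one direct field and one indirection.
struct Account {
    int balance;
    int* bonus; // points at data stored elsewhere
}

int transitivityDemo() {
    int bonusStorage = 1;
    auto account = Account(100, &bonusStorage);
    const(Account)* view = &account;

    // Neither the field nor the data reachable through the pointer
    // can be written through the const view.
    static assert(!__traits(compiles, view.balance = 0));
    static assert(!__traits(compiles, *view.bonus = 0));

    // The original mutable reference still works, and the view sees the change.
    *account.bonus = 5;
    return *view.bonus;
}

void main() {
    assert(transitivityDemo() == 5);
}
```

The last two statements also illustrate the difference between const and immutable: const only restricts this particular view, so the data can still change through another reference.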
When storing data, you generally want either immutable or mutable data. The const qualifier usually isn't very useful on a member variable; although it prevents your class from modifying the data, it doesn't prevent other code from modifying it. Immutable means nobody will ever modify it. You can store that with confidence that it won't change unexpectedly. Of course, mutable member data is always useful to hold the object's own private state.
The guarantee that the data will never change is the strength of immutable data. You can get all the benefits of a private copy, knowing that nobody else can change it, without the cost of actually making a copy. The const and immutable qualifiers are most useful on reference types such as pointers, arrays, and classes. They have relatively little benefit on value types such as scalars (int, float, and so on) or structs because these are copied when passed to functions anyway.
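As a sketch of storing immutable data without copying, the hypothetical Packet class below keeps the slice it is given. The sharesMemoryWith helper is an illustrative addition that compares pointers to show that no copy was made:

```d
// Hypothetical class that stores a reference to immutable bytes.
class Packet {
    private immutable(ubyte)[] payload;

    this(immutable(ubyte)[] payload) {
        this.payload = payload; // stores a reference; no copy is made
    }

    // True if the stored slice starts at the same memory as `other`.
    bool sharesMemoryWith(const(ubyte)[] other) const {
        return payload.ptr is other.ptr;
    }
}

void main() {
    immutable(ubyte)[] fixed = [1, 2, 3];
    auto a = new Packet(fixed);
    assert(a.sharesMemoryWith(fixed));

    ubyte[] buffer = [4, 5, 6];
    // A mutable array is rejected, because someone could still change it.
    static assert(!__traits(compiles, new Packet(buffer)));
    auto b = new Packet(buffer.idup); // the caller makes the copy explicitly
    assert(!b.sharesMemoryWith(buffer));
}
```

With this signature, a copy is made only when the caller starts from mutable data, and it appears explicitly in the caller's code as idup.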
When inspecting data, however, you don't need such a strong guarantee. That's where const comes in. The const qualifier means you will not modify the data, without insisting that nobody else can modify it. The in keyword is a shorthand that expands to scope const. The scope parameters aren't fully implemented as of the time of this writing, but scope is a useful concept nonetheless. A scope parameter is a parameter where you promise that no reference to it will escape. You'll look at the data, but not store a reference anywhere. When combined with const, you have a perfect combination for input data that you'll only look at. Better still, the combination has a short and convenient spelling: the in keyword.
When you do return a reference to const data, it is important that the constancy is preserved, and doing so should be easy. This is where D's inout keyword is used. Consider the standard C function strstr:
char *strstr(const char *haystack, const char *needle);
This function returns a pointer into haystack where it finds needle, or null if needle is not found. The problem with this prototype is that the const qualifier attached to haystack is lost on the return value. It is possible to write to constant data through the pointer returned by strstr, breaking the type system.
In C++, the solution to this is often to duplicate the function, one version that uses const, and one version that does not. D aims to fix the system, keeping the strong constancy guarantee that C loses and avoiding the duplication that C++ requires. The appropriate definition for a strstr-style function in D will be as follows:
inout(char)* strstr(inout(char)* haystack, in char* needle);
The inout qualifier is used on the return value, in place of const, and is also attached to one or more parameters, or the this reference. Inside the function, inout(T) data is treated the same as const(T) data. In the signature, it serves as a wildcard that changes based on the input data. If you pass a mutable haystack, it will return a mutable pointer. A const haystack returns a const pointer. Also, an immutable haystack will return an immutable pointer. One function, three uses.
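The wildcard behavior can be checked with static assertions on the return type. The following sketch uses a hypothetical firstWord function on slices rather than pointers:

```d
// Returns the part of `text` before the first space, keeping the
// caller's qualifier on the result.
inout(char)[] firstWord(inout(char)[] text) {
    foreach (i, c; text)
        if (c == ' ')
            return text[0 .. i];
    return text;
}

void main() {
    char[] mutableText = "hello world".dup;
    const(char)[] constText = mutableText;
    string immutableText = "hello world";

    // One function, three result types.
    static assert(is(typeof(firstWord(mutableText)) == char[]));
    static assert(is(typeof(firstWord(constText)) == const(char)[]));
    static assert(is(typeof(firstWord(immutableText)) == string));

    // The mutable result can be written through; the others cannot.
    firstWord(mutableText)[0] = 'H';
    assert(mutableText == "Hello world");
    static assert(!__traits(compiles, firstWord(constText)[0] = 'H'));
}
```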
D also has ref function parameters. These give a reference to the variable itself, as shown in the following code:
void foo(int a) { a = 10; }
void bar(ref int a) { a = 10; }

void main() {
    int test = 0;
    foo(test);
    assert(test == 0);
    bar(test);
    assert(test == 10);
}
In this example, the variable test is passed to foo normally. Changes to a inside the function are not seen outside the function.
Note
If a were a pointer, changes to a itself would not be seen, but changes to *a would be visible. That's why const and immutable are useful there.
The function bar, on the other hand, takes the parameter by reference. Here, the changes made to a inside the function are seen at the call site; test becomes 10.
Tip
Some guides recommend passing structs to a function by ref for performance reasons rather than because they want changes to be seen at the call site. Personally, I do not recommend this unless you have profiled your code and have identified the struct copy as a performance problem. Also, you cannot pass a struct literal as ref, because there is no outer variable for it to update. So, ref limits your options too.
D's strings are actually just arrays of characters. This means any operation that you can do on arrays also works on strings. However, since a string is an array of UTF-8 code units, there are some behaviors that you may find surprising. Here, you'll get a substring by slicing and discuss potential pitfalls.
Let's try to get a substring from a string using the following steps:
Declare a string as follows:
string s = "月明かり is some Japanese text.";
Get the correct index for start and end. You'll get the Japanese text out by searching the string for the first space, and slice up to that point by using the following code:
import std.string;
string japaneseText = s[0 .. s.indexOf(" ")];
Loop over the string, looking at the UTF-8 code units as well as the Unicode code points, so you can see the difference in your string by using the following code:
import std.stdio;
foreach(idx, char c; japaneseText)
    writefln("UTF-8 Code unit at index %d is %d", idx, c);
foreach(dchar c; japaneseText)
    writefln("UTF-32 code unit with value %d is %c", c, c);
The program will print out more UTF-8 code units than dchars, because the Japanese text is composed of multibyte characters, unlike English text.
D's implementation of strings uses Unicode. Unicode is a complicated standard that could take up a whole book on its own, but you can use it in D knowing just some basics. D strings, as well as D source code, use UTF-8 encoding. This means you can paste in text from any language into a D source file and process it with D code.
However, UTF-8 has a complication; the length of a single code point is variable. Often, one code point is one character, though Unicode's complexity means graphemes (that is, what you might also call a visible character) may consist of more than one code point! For English text, UTF-8 beautifully maps directly to ASCII, which means that one code unit is one character. However, for other languages, there are too many characters to express in one byte. Japanese is one example where all the characters are multibyte in UTF-8.
So, while there are only four characters in the Japanese text, if you slice s[0 .. 4], you won't get all four characters. D's slice operator works on code units, and each of these characters occupies three UTF-8 code units. You'll get a partial result here, which may not be usable.
Instead, you found the correct index by using the standard library function indexOf. This searches the string for the given substring and returns the index, or -1 if it could not be found. The slice [start .. end] goes from start, inclusive, to end, exclusive. So, [0 .. indexOf(…)] goes from the start, up to, but not including, the space. This slice is safe to use, even if it contains multibyte characters.
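The difference between the two slices can be verified with a short sketch. It relies on std.utf.validate, std.utf.toUTFindex, and std.range.walkLength from the standard library; isValidUtf8 is a hypothetical helper name:

```d
import std.range : walkLength;
import std.string : indexOf;
import std.utf : toUTFindex, validate, UTFException;

// Returns true if `text` is a complete, well-formed UTF-8 sequence.
bool isValidUtf8(string text) {
    try {
        validate(text);
        return true;
    } catch (UTFException) {
        return false;
    }
}

void main() {
    string s = "月明かり is some Japanese text.";

    // Four code units: one whole character plus a fragment of the next.
    assert(!isValidUtf8(s[0 .. 4]));

    // The space sits at code unit index 12, since each of the four
    // characters before it takes three code units.
    auto end = s.indexOf(" ");
    assert(end == 12);
    assert(isValidUtf8(s[0 .. end]));
    assert(s[0 .. end].walkLength == 4); // four code points

    // toUTFindex converts a code point count into a code unit index.
    assert(s.toUTFindex(4) == 12);
}
```

When no convenient delimiter exists, toUTFindex is one option for finding a slice boundary from a count of code points.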
Finally, you looped over the Japanese text to examine the encodings. The foreach loop understands UTF encoding. The first variant asks for chars, or UTF-8 code units, and yields them without decoding. The second variant asks for dchars, which are UTF-32 code units that are numerically equivalent to Unicode code points. Asking for dchars is slower than iterating over chars, but has the advantage of removing much of the complexity of handling multibyte characters. The second loop prints only one entry per Japanese character, or any other character that cannot be encoded in a single UTF-8 unit.
D also supports UTF-16 and UTF-32 strings. These are typed wstring and dstring, respectively. Let's look at each of these as follows:
wstring: This is very useful on Windows, because the Windows operating system natively works with UTF-16.
dstring: This eats a lot of memory, about 4 times more than strings for English text, but sidesteps some of the issues discussed here. The reason is that each array index corresponds to one Unicode code point.
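A brief sketch of converting between the three string types with std.conv.to, showing how the length in code units differs for the same text. The variable names are illustrative:

```d
import std.conv : to;

void main() {
    string  s8  = "月明かり";
    wstring s16 = s8.to!wstring;
    dstring s32 = s8.to!dstring;

    assert(s8.length == 12);  // UTF-8: three code units per character here
    assert(s16.length == 4);  // UTF-16: one code unit per character here
    assert(s32.length == 4);  // UTF-32: always one code unit per code point

    // With a dstring, indexing and slicing work in code points.
    assert(s32[1] == '明');
    assert(s32[0 .. 2].to!string == "月明");

    // The price is memory: four bytes for every code point.
    assert("hello".to!dstring.length * dchar.sizeof == 20);
}
```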
Classes are used to provide object-oriented features in D. To explore how they work, you're going to write a small inheritance hierarchy to evaluate basic addition and subtraction operations.
Before writing a class, step back and ask yourself whether it is the best tool for the job. Will you be using inheritance to create objects that are substitutable for their parent? If not, a struct may be more appropriate. If you plan to use inheritance for code reuse without substitutability, a mixin template may be more appropriate. Here, you'll use classes for substitutability, and a mixin template for some code reuse.
Let's create a tree of classes by executing the following steps:
Create a class, with the data and methods it needs. For your expression evaluator, you'll create two classes: AddExpression and SubtractExpression. They will need variables for the left and right-hand side of the expression, and a method to evaluate the result.
Move common methods from substitutable classes out to an interface, and make the classes inherit from it by putting a colon after the class name, followed by the interface name. In both AddExpression and SubtractExpression, you will have an evaluate method. You'll move this function signature, but not the function body, to the interface, called Expression.
If there is still a lot of code duplication, move the identical code out to a mixin template, and mix it in at the usage point.
Functions should operate on interface parameters, if possible, instead of classes, for maximum reusability.
The following is the code you have so far:
interface Expression {
    // this is the common method from the classes we made
    int evaluate();
}

mixin template BinaryExpression() {
    // this is the common implementation code from the classes
    private int a, b;
    this(int left, int right) { this.a = left; this.b = right; }
}

// printResult can evaluate and print any expression class
// thanks to taking the general interface
void printResult(Expression expression) {
    import std.stdio;
    writeln(expression.evaluate());
}

class AddExpression : Expression { // inherit from the interface
    mixin BinaryExpression!(); // adds the shared code
    int evaluate() { return a + b; } // implement the method
}

class SubtractExpression : Expression {
    mixin BinaryExpression!();
    int evaluate() { return a - b; }
}
Let's also add a BrokenAddExpression class that uses inheritance to override the evaluate function of AddExpression:
class BrokenAddExpression : AddExpression {
    this(int left, int right) {
        super(left, right);
    }
    // this changes evaluate to subtract instead of add!
    // note the override keyword
    override int evaluate() { return a - b; }
}
Finally, you'll construct some instances and use them as follows:
auto add = new AddExpression(1, 2);
printResult(add);
auto subtract = new SubtractExpression(2, 1);
printResult(subtract); // same function as above!
The usage will print 3 and 1, showing the different operations. You can also create a BrokenAddExpression object and assign it to add as follows:
add = new BrokenAddExpression(1, 2);
printResult(add); // prints -1
Classes in D are similar to classes in Java. They are always reference types, have a single inheritance model with a root object, and may implement any number of interfaces.
Class constructors are defined with the this keyword. Any time you create a new object with new, it calls one of the constructors. You may define as many as you want, as long as each has a unique set of parameters.
Note
Classes may have destructors, but you typically should not use them. When a class object is collected by the garbage collector, its child members may have already been collected, which means that they cannot be accessed by the destructor. Any attempt to do so will likely lead to a program crash. Moreover, since the garbage collector may not run at a predictable time (from the class's point of view), it is hard to know when, if ever, the destructor will actually be run. If you need deterministic destruction, you should use a struct instead, or wrap your class in a struct and call the destructor yourself with the destroy() function.
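The wrapping approach can be sketched as follows. Connection, ScopedConnection, and the openConnections counter are hypothetical names; the struct's destructor runs when the variable goes out of scope and calls destroy() on the class object it holds:

```d
// Hypothetical counter that tracks how many connections are alive.
int openConnections;

class Connection {
    this() { ++openConnections; }
    ~this() { --openConnections; }
}

// Struct wrapper that destroys the class object deterministically.
struct ScopedConnection {
    Connection conn;
    @disable this(this); // forbid copies, so destroy() runs only once
    ~this() {
        if (conn !is null)
            destroy(conn); // runs Connection's destructor right now
    }
}

// Returns the number of open connections while the wrapper is alive.
int useConnection() {
    auto scoped = ScopedConnection(new Connection);
    return openConnections;
} // scoped's destructor runs here

void main() {
    assert(useConnection() == 1);
    assert(openConnections == 0); // already released, no GC involved
}
```

Disabling the postblit is one simple way to keep this sketch safe; a more complete wrapper might use reference counting instead.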
Object instances are upcast implicitly. This is why you could assign a BrokenAddExpression to the add variable, which is statically typed as AddExpression. This is also the reason why you can pass any of these classes to the printResult function, since they will all be implicitly cast to the interface when needed. However, going the other way, when casting from an interface or a base class to a derived class, you must use an explicit cast. It returns null if the cast fails. Use the following code to better understand this:
if(auto bae = cast(BrokenAddExpression) expression) {
    /* we were passed an instance of BrokenAddExpression
       and can now use the bae variable to access its specific members */
} else {
    /* we were passed some other class */
}
In classes, all methods are virtual by default. You can create non-virtual methods with the final keyword, which prevents a subclass from overriding a method. Abstract functions, created with the abstract keyword, need not have an implementation, and they must be implemented in a child class if the object is to be instantiated. All methods in an interface that are not marked as final or static are abstract and must be implemented by a non-abstract class.
When you override a virtual or abstract function from a parent class, you must use the override keyword. If no matching parent function can be found for a method marked override, the compiler will issue an error. This ensures that the child class's method is actually compatible with the parent definition, ensuring that it is substitutable for the parent class. (Of course, ensuring the behavior is substitutable too is your responsibility as the programmer!)
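The three keywords can be seen together in a short sketch. The Operation and Multiply classes are hypothetical and separate from the recipe's Expression hierarchy:

```d
import std.conv : text;

abstract class Operation {
    protected int a, b;
    this(int a, int b) { this.a = a; this.b = b; }

    // abstract: no body here, so subclasses must supply one
    abstract int evaluate();

    // final: subclasses cannot replace this method
    final string report() { return text("result: ", evaluate()); }
}

class Multiply : Operation {
    this(int a, int b) { super(a, b); }
    // override is required when implementing or replacing a parent method
    override int evaluate() { return a * b; }
}

void main() {
    // An abstract class cannot be instantiated directly.
    static assert(!__traits(compiles, new Operation(1, 2)));

    Operation op = new Multiply(3, 4);
    // report() is non-virtual, but it calls the virtual evaluate().
    assert(op.report() == "result: 12");
}
```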
The mixin template is a feature of D that neither C++ nor Java has. A mixin template is a list of declarations: variables, methods, and/or constructors. At the usage point, use the following code:
mixin BinaryExpression!();
This will essentially copy and paste the code inside the template to the point of the mixin statement. The template can take arguments as well, given in the parentheses. Here, you didn't need any parameterization, so the parentheses are empty. Templates in D, including mixin templates, can take a variety of arguments including values, types, and symbols. We'll discuss templates in more depth later in the book.
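As a sketch of a mixin template that does take an argument, the hypothetical Operands template below is parameterized on the operand type, so the same declarations can be mixed into classes working with different types:

```d
// Hypothetical mixin template parameterized on the operand type.
mixin template Operands(T) {
    private T a, b;
    this(T left, T right) { this.a = left; this.b = right; }
}

class AddInts {
    mixin Operands!int; // a and b are ints here
    int evaluate() { return a + b; }
}

class AddDoubles {
    mixin Operands!double; // a and b are doubles here
    double evaluate() { return a + b; }
}

void main() {
    assert((new AddInts(1, 2)).evaluate() == 3);
    assert((new AddDoubles(0.5, 0.25)).evaluate() == 0.75);
}
```

With a single argument, the parentheses after the exclamation mark can be omitted, which is why the usage reads Operands!int.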
Using interfaces and mixin templates, like you did here, can also be extended to achieve a result similar to multiple inheritance in C++, without the inheritance of state and avoiding the diamond inheritance problem that C++ has.
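One possible shape for this combination is sketched below, under the assumption that each interface is paired with a mixin template supplying its implementation. All of the names (Named, Counted, NamedImpl, CountedImpl, Inventory) are hypothetical:

```d
interface Named   { string name(); }
interface Counted { int count(); }

// Each mixin template supplies the implementation for one interface.
mixin template NamedImpl() {
    private string name_;
    string name() { return name_; }
}

mixin template CountedImpl() {
    private int count_;
    int count() { return count_; }
}

// One class, two interfaces, both implementations mixed in.
class Inventory : Named, Counted {
    mixin NamedImpl!();
    mixin CountedImpl!();
    this(string name, int count) { name_ = name; count_ = count; }
}

void main() {
    auto inv = new Inventory("bolts", 12);
    Named asNamed = inv;     // usable through either interface
    Counted asCounted = inv;
    assert(asNamed.name() == "bolts");
    assert(asCounted.count() == 12);
}
```

The fields mixed in here belong to Inventory itself rather than to a shared base class, so each class that mixes in the templates gets its own independent copy of them.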
The Simulating inheritance with structs recipe in Chapter 6, Wrapped Types, shows how you can also achieve something like subtyping and data extension, similar to inheritance with data, using structs.
The official documentation can be found at http://dlang.org/class.html and it goes into additional details about the capabilities of classes.