Programming | 0 articles | Tech News, Tutorials & Expert Insights

17 May 2018

9 min read

That '70s language: AWK programming

17 May 2018

AWK is an interpreted programming language designed for text processing and report generation. It is typically used for data manipulation, such as searching for items within data, performing arithmetic operations, and restructuring raw data for generating reports in most Unix-like operating systems. Today, we will explore the AWK philosophy and different types of AWK that exist, starting from its original implementation in 1977 at AT&T's Laboratories, Inc. We will also look at the various implementation areas of AWK in data science today. Using AWK programs, one can handle repetitive text-editing problems with very simple and short programs. It is a pattern-action language; it searches for patterns in a given input and, when a match is found, it performs the corresponding action. The pattern can be made of strings, regular expressions, comparison operations on numbers, fields, variables, and so on. It reads the input files and splits each input line of the file into fields automatically. AWK has most of the well-designed features that every programming language should contain. Its syntax particularly resembles that of the C programming language. It is named after its original three authors: Alfred V. Aho Peter J. Weinberger Brian W. Kernighan AWK is a very powerful, elegant, and simple that every person dealing with text processing should be familiar with. This article is an excerpt from a book written by Shiwang Kalkhanda, titled Learning AWK Programming. This book will introduce you to AWK programming language and get you hands-on working with practical implementation of AWK. Types of AWK The AWK language was originally implemented as an AWK utility on Unix. Today, most Linux distributions provide GNU implementation of AWK (GAWK), and a symlink for AWK is created from the original GAWK binary. The AWK utility can be categorized into the following three types, depending upon the type of interpreter it uses for executing AWK programs: AWK: This is the original AWK interpreter available from AT&T Laboratories. However, it is not used much nowadays and hence it might not be well-maintained. Its limitation is that it splits a line into a maximum 99 fields. It was updated and replaced in the mid-1980s with an enhanced version called New AWK (NAWK). NAWK: This is AT&T's latest development on the AWK interpreter. It is well-maintained by one of the original authors of AWK - Dr. Brian W. Kernighan. GAWK: This is the GNU project's implementation of the AWK programming language. All GNU/Linux distributions are shipped with GAWK by default and hence it is the most popular version of AWK. GAWK interpreter is fully compatible with AWK and NAWK. Beyond these, we also have other, less popular, AWK interpreters and translators, mentioned as follows. These variants are useful in operations when you want to translate your AWK program to C, C++, or Perl: MAWK: Michael Brennan interpreter for AWK. TAWK: Thompson Automation interpreter/compiler/Microsoft Windows DLL for AWK. MKSAWK: Mortice Kern Systems interpreter/compiler/for AWK. AWKCC: An AWK translator to C (might not be well-maintained). AWKC++: Brian Kernighan's AWK translator to C++ (experimental). It can be downloaded from: https://9p.io/cm/cs/who/bwk/awkc++.ps. AWK2C: An AWK translator to C. It uses GNU AWK libraries extensively. A2P: An AWK translator to Perl. It comes with Perl. AWKA: Yet another AWK translator to C (comes with the library), based on MAWK. It can be downloaded from: http://awka.sourceforge.net/download.html. When and where to use AWK AWK is simpler than any other utility for text processing and is available as the default on Unix-like operating systems. However, some people might say Perl is a superior choice for text processing, as AWK is functionally a subset of Perl, but the learning curve for Perl is steeper than that of AWK; AWK is simpler than Perl. AWK programs are smaller and hence quicker to execute. Anybody who knows the Linux command line can start writing AWK programs in no time. Here are a few use cases of AWK: Text processing Producing formatted text reports/labels Performing arithmetic operations on fields of a file Performing string operations on different fields of a file Programs written in AWK are smaller than they would be in other higher-level languages for similar text processing operations. AWK programs are interpreted on a GNU/Linux Terminal and thus avoid the compiling, debugging phase of software development in other languages. Getting started with installation This section describes how to set up the AWK environment on your GNU/Linux system, and we'll also discuss the workflow of AWK. Then, we'll look at different methods for executing AWK programs. Installation on Linux Generally, AWK is installed by default on most GNU/Linux distributions. Using the which command, you can check whether it is installed on your system or not. In case AWK is not installed on your system, you can do so in one of two ways: Using the package manager of the corresponding GNU/Linux system Compiling from the source code Let's take a look at each method in detail in the following sections. Using the package manager Different flavors of GNU/Linux distribution have different package-management utilities. If you are using a Debian-based GNU/Linux distribution, such as Ubuntu, Mint, or Debian, then you can install it using the Advance Package Tool (APT) package manager, as follows: [ shiwang@linux ~ ] $ sudo apt-get update -y [ shiwang@linux ~ ] $ sudo apt-get install gawk -y Similarly, to install AWK on an RPM-based GNU/Linux distribution, such as Fedora, CentOS, or RHEL, you can use the Yellowdog Updator Modified (YUM) package manager, as follows: [ root@linux ~ ] # yum update -y [ root@linux ~ ] # yum install gawk -y For installation of AWK on openSUSE, you can use the zypper (zypper command line) package-management utility, as follows: [ root@linux ~ ] # zypper update -y [ root@linux ~ ] # zypper install gawk -y Once the installation is finished, make sure AWK is accessible through the command line. We can check that using the which command, which will return the absolute path of AWK on our system: [ root@linux ~ ] # which awk /usr/bin/awk You can also use awk --version to find the AWK version on our system: [ root@linux ~ ] # awk --version Compiling from the source code Like every other open source utility, the GNU AWK source code is freely available for download as part of the GNU project. Previously, you saw how to install AWK using the package manager; now, you will see how to install AWK by compiling from its source code on the GNU/Linux distribution. The following steps are applicable to most of the GNU/Linux software for installation: Download the source code from a GNU project ftp site. Here, we will use the wget command line utility to download it, however you are free to choose any other program, such as curl, you feel comfortable with: [ shiwang@linux ~ ] $ wget http://ftp.gnu.org/gnu/gawk/gawk-4.1.3.tar.xz Extract the downloaded source code: [ shiwang@linux ~ ] $ tar xvf gawk-4.1.3.tar.xz Change your working directory and execute the configure file to configure the GAWK as per the working environment of your system: [ shiwang@linux ~ ] $ cd gawk-4.1.3 && ./configure Once the configure command completes its execution successfully, it will generate the make file. Now, compile the source code by executing the make command: [ shiwang@linux ~ ] $ make Type make install to install the programs and any data files and documentation. When installing into a prefix owned by root, it is recommended that the package be configured and built as a regular user, and only the make install phase is executed with root privileges: [ shiwang@linux ~ ] $ sudo make install Upon successful execution of these five steps, you have compiled and installed AWK on your GNU/Linux distribution. You can verify this by executing the which awk command in the Terminal or awk --version: [ root@linux ~ ] # which awk /usr/bin/awk Now you have a working AWK/GAWK installation and we are ready to begin AWK programming, but before that, our next section describes the workflow of the AWK interpreter. If you are running on macOS X, AWK, and not GAWK, would be installed as a default on it. For GAWK installation on macOS X, please refer to MacPorts for GAWK. Workflow of AWK Having a basic knowledge of the AWK interpreter workflow will help you to better understand AWK and will result in more efficient AWK program development. Hence, before getting your hands dirty with AWK programming, you need to understand its internals. The AWK workflow can be summarized as shown in the following figure: Let's take a look at each operation: READ OPERATION: AWK reads a line from the input stream (file, pipe, or stdin) and stores it in memory. It works on text input, which can be a file, the standard input stream, or from a pipe, which it further splits into records and fields: Records: An AWK record is a single, continuous data input that AWK works on. Records are bounded by a record separator, whose value is stored in the RS variable. The default value of RS is set to a newline character. So, the lines of input are considered records for the AWK interpreter. Records are read continuously until the end of the input is reached. Figure 1.2 shows how input data is broken into records and then goes further into how it is split into fields: Fields: Each record can further be broken down into individual chunks called fields. Like records, fields are bounded. The default field separator is any amount of whitespace, including tab and space characters. So by default, lines of input are further broken down into individual words separated by whitespace. You can refer to the fields of a record by a field number, beginning with 1. The last field in each record can be accessed by its number or with the NF special variable, which contains the number of fields in the current record, as shown in Figure 1.3: EXECUTE OPERATION: All AWK commands are applied sequentially on the input (records and fields). By default, AWK executes commands on each record/line. This behavior of AWK can be restricted by the use of patterns. REPEAT OPERATION: The process of read and execute is repeated until the end of the file is reached. The following flowchart depicts the workflow: We introduced you to the AWK programming language and got ourselves a quick primer to get started with application development. If you found this post is useful, do check out the book Learning AWK Programming to learn more about the intricacies of AWK programming language for text processing. The oldest programming languages in use today What is the difference between functional and object oriented programming? Systems programming with Go in UNIX and Linux

0
0
20184

article-image-work-with-classes-in-typescript

Amey Varangaonkar

15 May 2018

8 min read

How to work with classes in Typescript

Amey Varangaonkar

15 May 2018

8 min read

If we are developing any application using TypeScript, be it a small-scale or a large-scale application, we will use classes to manage our properties and methods. Prior to ES 2015, JavaScript did not have the concept of classes, and we used functions to create class-like behavior. TypeScript introduced classes as part of its initial release, and now we have classes in ES6 as well. The behavior of classes in TypeScript and JavaScript ES6 closely relates to the behavior of any object-oriented language that you might have worked on, such as C#. This excerpt is taken from the book TypeScript 2.x By Example written by Sachin Ohri. Object-oriented programming in TypeScript Object-oriented programming allows us to represent our code in the form of objects, which themselves are instances of classes holding properties and methods. Classes form the container of related properties and their behavior. Modeling our code in the form of classes allows us to achieve various features of object-oriented programming, which helps us write more intuitive, reusable, and robust code. Features such as encapsulation, polymorphism, and inheritance are the result of implementing classes. TypeScript, with its implementation of classes and interfaces, allows us to write code in an object-oriented fashion. This allows developers coming from traditional languages, such as Java and C#, feel right at home when learning TypeScript. Understanding classes Prior to ES 2015, JavaScript developers did not have any concept of classes; the best way they could replicate the behavior of classes was with functions. The function provides a mechanism to group together related properties and methods. The methods can be either added internally to the function or using the prototype keyword. The following is an example of such a function: function Name (firstName, lastName) { this.firstName = firstName; this.lastName = lastName; this.fullName = function() { return this.firstName + ' ' + this.lastName ; }; } In this preceding example, we have the fullName method encapsulated inside the Name function. Another way of adding methods to functions is shown in the following code snippet with the prototype keyword: function Name (firstName, lastName) { this.firstName = firstName; this.lastName = lastName; } Name.prototype.fullName = function() { return this.firstName + ' ' + this.lastName ; }; These features of functions did solve most of the issues of not having classes, but most of the dev community has not been comfortable with these approaches. Classes make this process easier. Classes provide an abstraction on top of common behavior, thus making code reusable. The following is the syntax for defining a class in TypeScript: The syntax of the class should look very similar to readers who come from an object-oriented background. To define a class, we use a class keyword followed by the name of the class. The News class has three member properties and one method. Each member has a type assigned to it and has an access modifier to define the scope. On line 10, we create an object of a class with the new keyword. Classes in TypeScript also have the concept of a constructor, where we can initialize some properties at the time of object creation. Access modifiers Once the object is created, we can access the public members of the class with the dot operator. Note that we cannot access the author property with the espn object because this property is defined as private. TypeScript provides three types of access modifiers. Public Any property defined with the public keyword will be freely accessible outside the class. As we saw in the previous example, all the variables marked with the public keyword were available outside the class in an object. Note that TypeScript assigns public as a default access modifier if we do not assign any explicitly. This is because the default JavaScript behavior is to have everything public. Private When a property is marked as private, it cannot be accessed outside of the class. The scope of a private variable is only inside the class when using TypeScript. In JavaScript, as we do not have access modifiers, private members are treated similarly to public members. Protected The protected keyword behaves similarly to private, with the exception that protected variables can be accessed in the derived classes. The following is one such example: class base{ protected id: number; } class child extends base{ name: string; details():string{ return `${name} has id: ${this.id}` } } In the preceding code, we extend the child class with the base class and have access to the id property inside the child class. If we create an object of the child class, we will still not have access to the id property outside. Readonly As the name suggests, a property with a readonly access modifier cannot be modified after the value has been assigned to it. The value assigned to a readonly property can only happen at the time of variable declaration or in the constructor. In the above code, line 5 gives an error stating that property name is readonly, and cannot be an assigned value. Transpiled JavaScript from classes While learning TypeScript, it is important to remember that TypeScript is a superset of JavaScript and not a new language on its own. Browsers can only understand JavaScript, so it is important for us to understand the JavaScript that is transpiled by TypeScript. TypeScript provides an option to generate JavaScript based on the ECMA standards. You can configure TypeScript to transpile into ES5 or ES6 (ES 2015) and even ES3 JavaScript by using the flag target in the tsconfig.json file. The biggest difference between ES5 and ES6 is with regard to the classes, let, and const keywords which were introduced in ES6. Even though ES6 has been around for more than a year, most browsers still do not have full support for ES6. So, if you are creating an application that would target older browsers as well, consider having the target as ES5. So, the JavaScript that's generated will be different based on the target setting. Here, we will take an example of class in TypeScript and generate JavaScript for both ES5 and ES6. The following is the class definition in TypeScript: This is the same code that we saw when we introduced classes in the Understanding Classes section. Here, we have a class named News that has three members, two of which are public and one private. The News class also has a format method, which returns a string concatenated from the member variables. Then, we create an object of the News class in line 10 and assign values to public properties. In the last line, we call the format method to print the result. Now let's look at the JavaScript transpiled by TypeScript compiler for this class. ES6 JavaScript ES6, also known as ES 2015, is the latest version of JavaScript, which provides many new features on top of ES5. Classes are one such feature; JavaScript did not have classes prior to ES6. The following is the code generated from the TypeScript class, which we saw previously: If you compare the preceding code with TypeScript code, you will notice minor differences. This is because classes in TypeScript and JavaScript are similar, with just types and access modifiers additional in TypeScript. In JavaScript, we do not have the concept of declaring public members. The author variable, which was defined as private and was initialized at its declaration, is converted to a constructor initialization in JavaScript. If we had not have initialized author, then the produced JavaScript would not have added author in the constructor. ES5 JavaScript ES5 is the most popular JavaScript version supported in browsers, and if you are developing an application that has to support the majority of browser versions, then you need to transpile your code to the ES5 version. This version of JavaScript does not have classes, and hence the transpiled code converts classes to functions, and methods inside the classes are converted to prototypically defined methods on the functions. The following is the code transpiled when we have the target set as ES5 in the TypeScript compiler options: As discussed earlier, the basic difference is that the class is converted to a function. The interesting aspect of this conversion is that the News class is converted to an immediately invoked function expression (IIFE). An IIFE can be identified by the parenthesis at the end of the function declaration, as we see in line 9 in the preceding code snippet. IIFEs cause the function to be executed immediately and help to maintain the correct scope of a function rather than declaring the function in a global scope. Another difference was how we defined the method format in the ES5 JavaScript. The prototype keyword is used to add the additional behavior to the function, which we see here. A couple of other differences you may have noticed include the change of the let keyword to var, as let is not supported in ES5. All variables in ES5 are defined with the var keyword. Also, the format method now does not use a template string, but standard string concatenation to print the output. TypeScript does a good job of transpiling the code to JavaScript while following recommended practices. This helps in making sure we have a robust and reusable code with minimum error cases. If you found this tutorial useful, make sure you check out the book TypeScript 2.x By Example for more hands-on tutorials on how to effectively leverage the power of TypeScript to develop and deploy state-of-the-art web applications. How to install and configure TypeScript Understanding Patterns and Architectures in TypeScript Writing SOLID JavaScript code with TypeScript

0
0
30675

article-image-install-configure-typescript

Amey Varangaonkar

11 May 2018

9 min read

How to install and configure TypeScript

Amey Varangaonkar

11 May 2018

9 min read

In this tutorial, we will look at the installation process of TypeScript and the editor setup for TypeScript development. Microsoft does well in providing easy-to-perform steps to install TypeScript on all platforms, namely Windows, macOS, and Linux. [box type="shadow" align="" class="" width=""]The following excerpt is taken from the book TypeScript 2.x By Example written by Sachin Ohri. This book presents hands-on examples and projects to learn the fundamental concepts of the popular TypeScript programming language.[/box] Installation of TypeScript TypeScript's official website is the best source to install the latest version. On the website, go to the Download section. There, you will find details on how to install TypeScript. Node.js and Visual Studio are the two most common ways to get it. It supports a host of other editors and has plugins available for them in the same link. We will be installing TypeScript using Node.js and using Visual Studio Code as our primary editor. You can use any editor of your choice and be able to run the applications seamlessly. If you use full-blown Visual Studio as your primary development IDE, then you can use either of the links, Visual Studio 2017 or Visual Studio 2013, to download the TypeScript SDK. Visual Studio does come with a TypeScript compiler but it's better to install it from this link so as to get the latest version. To install TypeScript using Node.js, we will use npm (node package manager), which comes with Node.js. Node.js is a popular JavaScript runtime for building and running server-side JavaScript applications. As TypeScript compiles into JavaScript, Node is an ideal fit for developing server-side applications with the TypeScript language. As mentioned on the website, just running the following command in the Terminal (on macOS) / Command Prompt (on Windows) window will install the latest version: npm install -g typescript To load any package from Node.js, the npm command starts with npm install; the -g flag identifies that we are installing the package globally. The last parameter is the name of the package that we are installing. Once it is installed, you can check the version of TypeScript by running the following command in the Terminal window: tsc -v You can use the following command to get the help for all the other options that are available with tsc: tsc -h TypeScript editors One of the outstanding features of TypeScript is its support for editors. All the editors provide support for language services, thereby providing features such as IntelliSense, statement completion, and error highlighting. If you are coming from a .NET background, then Visual Studio 2013/2015/2017 is a good option for you. Visual Studio does not require any configuration and it's easy to start using TypeScript. As we discussed earlier, just install the SDK and you are good to go. If you are from a Java background, TypeScript supports Eclipse as well. It also supports plugins for Sublime, WebStorm, and Atom, and each of these provides a rich set of features. Visual Studio Code (VS Code) is another good option for an IDE. It's a smaller, lighter version of Visual Studio and primarily used for web application development. VS Code is lightweight and cross-platform, capable of running on Windows, Linux, and macOS. It has an ever-increasing set of plugins to help you write better code, such as TSLint, a static analysis tool to help TypeScript code for readability, maintainability, and error checking. VS Code has a compelling case to be the default IDE for all sorts of web application development. In this post, we will briefly look at the Visual Studio and VS Code setup for TypeScript. Visual Studio Visual Studio is a full-blown IDE provided by Microsoft for all .NET based development, but now Visual Studio also has excellent support for TypeScript with built-in project templates. A TypeScript compiler is integrated into Visual Studio to allow automatic transpiling of code to JavaScript. Visual Studio also has the TypeScript language service integrated to provide IntelliSense and design-time error checking, among other things. With Visual Studio, creating a project with a TypeScript file is as simple as adding a new file with a .ts extension. Visual Studio will provide all the features out of the box. VS Code VS Code is a lightweight IDE from Microsoft used for web application development. VS Code can be installed on Windows, macOS, and Linux-based systems. VS Code can recognize the different type of code files and comes with a huge set of extensions to help in development. You can install VS Code from https://code.visualstudio.com/download. VS Code comes with an integrated TypeScript compiler, so we can start creating projects directly. The following screenshot shows a TypeScript file opened in VS Code: To run the project in VS Code, we need a task runner. VS Code includes multiple task runners which can be configured for the project, such as Gulp, Grunt, and TypeScript. We will be using the TypeScript task runner for our build. VS Code has a Command Palette which allows you to access various different features, such as Build Task, Themes, Debug options, and so on. To open the Command Palette, use Ctrl + Shift + P on a Windows machine or Cmd + Shift + P on a macOS. In the Command Palette, type Build, as shown in the following screenshot, which will show the command to build the project: When the command is selected, VS Code shows an alert, No built task defined..., as follows: We select Configure Build Task and, from all the available options as shown in the following screenshot, choose TypeScript build: This creates a new folder in your project, .vscode and a new file, task.json. This JSON file is used to create the task that will be responsible for compiling TypeScript code in VS Code. TypeScript needs another JSON file (tsconfig.json) to be able to configure compiler options. Every time we run the code, tsc will look for a file with this name and use this file to configure itself. TypeScript is extremely flexible in transpiling the code to JavaScript as per developer requirements, and this is achieved by configuring the compiler options of TypeScript. TypeScript compiler The TypeScript compiler is called tsc and is responsible for transpiling the TypeScript code to JavaScript. The TypeScript compiler is also cross-platform and supported on Windows, macOS, and Linux. To run the TypeScript compiler, there are a couple of options. One is to integrate the compiler in your editor of choice, which we explained in the previous section. In the previous section, we also integrated the TypeScript compiler with VS Code, which allowed us to build our code from the editor itself. All the compiler configurations that we would want to use are added to the tsconfig.json file. Another option is to use tsc directly from the command line / Terminal window. TypeScript's tsc command takes compiler configuration options as parameters and compiles code into JavaScript. For example, create a simple TypeScript file in Notepad and add the following lines of code to it. To create a file as a TypeScript file, we just need to make sure we have the file extension as *.ts: class Editor { constructor(public name: string,public isTypeScriptCompatible : Boolean) {} details() { console.log('Editor: ' + this.name + ', TypeScript installed: ' + this.isTypeScriptCompatible); } } class VisualStudioCode extends Editor{ public OSType: string constructor(name: string,isTypeScriptCompatible : Boolean, OSType: string) { super(name,isTypeScriptCompatible); this.OSType = OSType; } } let VS = new VisualStudioCode('VSCode', true, 'all'); VS.details(); This is the same code example we used in the TypeScript features section of this chapter. Save this file as app.ts (you can give it any name you want, as long as the extension of the file is *.ts). In the command line / Terminal window, navigate to the path where you have saved this file and run the following command: tsc app.ts This command will build the code and the transpile it into JavaScript. The JavaScript file is also saved in the same location where we had TypeScript. If there is any build issue, tsc will show these messages on the command line only. As you can imagine, running the tsc command manually for medium- to large-scale projects is not a productive approach. Hence, we prefer to use an editor that has TypeScript integrated. The following table shows the most commonly used TypeScript compiler configurations. We will be discussing these in detail in upcoming chapters: Compiler option Type Description allowUnusedLabels boolean By default, this flag is false. This option tells the compiler to flag unused labels. alwaysStrict boolean By default, this flag is false. When turned on, this will cause the compiler to compile in strict mode and emit use strict in the source file. module string Specify module code generation: None, CommonJS, AMD, System, UMD, ES6, or ES2015. moduleResolution string Determines how the module is resolved. noImplicitAny boolean This property allows an error to be raised if there is any code which implies data type as any. This flag is recommended to be turned off if you are migrating a JavaScript project to TypeScript in an incremental manner. noImplicitReturn boolean Default value is false; raises an error if not all code paths return a value. noUnusedLocals boolean Reports an error if there are any unused locals in the code. noUnusedParameter boolean Reports an error if there are any unused parameters in the code. outDir string Redirects output structure to the directory. outFile string Concatenates and emits output to a single file. The order of concatenation is determined by the list of files passed to the compiler on the command line along with triple-slash references and imports. See the output file order documentation for more details. removeComments boolean Remove all comments except copyright header comments beginning with /*!. sourcemap boolean Generates corresponding .map file. Target string Specifies ECMAScript target version: ES3(default), ES5, ES6/ES2015, ES2016, ES2017, or ESNext. Watch Runs the compiler in watch mode. Watches input files and triggers recompilation on changes. We saw it is quite easy to set up and configure TypeScript, and we are now ready to get started with our first application! To learn more about writing and compiling your first TypeScript application, make sure you check out the book TypeScript 2.x By Example. Introduction to TypeScript Introducing Object Oriented Programmng with TypeScript Elm and TypeScript – Static typing on the Frontend

0
1
50226

article-image-8-recipes-to-master-promises-in-ecmascript-2018

Richa Tripathi

08 May 2018

17 min read

8 recipes to master Promises in ECMAScript 2018

Richa Tripathi

08 May 2018

17 min read

0
0
19228

article-image-single-responsibility-principle-solid-net-core

Aaron Lazar

07 May 2018

7 min read

Applying Single Responsibility principle from SOLID in .NET Core

Aaron Lazar

07 May 2018

7 min read

0
0
16646

article-image-azure-function-asp-net-core-mvc-application

Aaron Lazar

03 May 2018

10 min read

How to call an Azure function from an ASP.NET Core MVC application

Aaron Lazar

03 May 2018

10 min read

In this tutorial, we'll learn how to call an Azure Function from an ASP.NET Core MVC application. [box type="shadow" align="" class="" width=""]This article is an extract from the book C# 7 and .NET Core Blueprints, authored by Dirk Strauss and Jas Rademeyer. This book is a step-by-step guide that will teach you essential .NET Core and C# concepts with the help of real-world projects.[/box] We will get started with creating an ASP.NET Core MVC application that will call our Azure Function to validate an email address entered into a login screen of the application: This application does no authentication at all. All it is doing is validating the email address entered. ASP.NET Core MVC authentication is a totally different topic and not the focus of this post. In Visual Studio 2017, create a new project and select ASP.NET Core Web Application from the project templates. Click on the OK button to create the project. This is shown in the following screenshot: On the next screen, ensure that .NET Core and ASP.NET Core 2.0 is selected from the drop-down options on the form. Select Web Application (Model-View-Controller) as the type of application to create. Don't bother with any kind of authentication or enabling Docker support. Just click on the OK button to create your project: After your project is created, you will see the familiar project structure in the Solution Explorer of Visual Studio: Creating the login form For this next part, we can create a plain and simple vanilla login form. For a little bit of fun, let's spice things up a bit. Have a look on the internet for some free login form templates: I decided to use a site called colorlib that provided 50 free HTML5 and CSS3 login forms in one of their recent blog posts. The URL to the article is: https://colorlib.com/wp/html5-and-css3-login-forms/. I decided to use Login Form 1 by Colorlib from their site. Download the template to your computer and extract the ZIP file. Inside the extracted ZIP file, you will see that we have several folders. Copy all the folders in this extracted ZIP file (leave the index.html file as we will use this in a minute): Next, go to the solution for your Visual Studio application. In the wwwroot folder, move or delete the contents and paste the folders from the extracted ZIP file into the wwwroot folder of your ASP.NET Core MVC application. Your wwwroot folder should now look as follows: 4. Back in Visual Studio, you will see the folders when you expand the wwwroot node in the CoreMailValidation project. 5. I also want to focus your attention to the Index.cshtml and _Layout.cshtml files. We will be modifying these files next: Open the Index.cshtml file and remove all the markup (except the section in the curly brackets) from this file. Paste the HTML markup from the index.html file from the ZIP file we extracted earlier. Do not copy the all the markup from the index.html file. Only copy the markup inside the <body></body> tags. Your Index.cshtml file should now look as follows: @{ ViewData["Title"] = "Login Page"; } <div class="limiter"> <div class="container-login100"> <div class="wrap-login100"> <div class="login100-pic js-tilt" data-tilt> <img src="images/img-01.png" alt="IMG"> </div> <form class="login100-form validate-form"> <span class="login100-form-title"> Member Login </span> <div class="wrap-input100 validate-input" data-validate="Valid email is required: ex@abc.xyz"> <input class="input100" type="text" name="email" placeholder="Email"> <span class="focus-input100"></span> <span class="symbol-input100"> <i class="fa fa-envelope" aria-hidden="true"></i> </span> </div> <div class="wrap-input100 validate-input" data-validate="Password is required"> <input class="input100" type="password" name="pass" placeholder="Password"> <span class="focus-input100"></span> <span class="symbol-input100"> <i class="fa fa-lock" aria-hidden="true"></i> </span> </div> <div class="container-login100-form-btn"> <button class="login100-form-btn"> Login </button> </div> <div class="text-center p-t-12"> <span class="txt1"> Forgot </span> <a class="txt2" href="#"> Username / Password? </a> </div> <div class="text-center p-t-136"> <a class="txt2" href="#"> Create your Account <i class="fa fa-long-arrow-right m-l-5" aria-hidden="true"></i> </a> </div> </form> </div> </div> </div> The code for this chapter is available on GitHub here: Next, open the Layout.cshtml file and add all the links to the folders and files we copied into the wwwroot folder earlier. Use the index.html file for reference. You will notice that the _Layout.cshtml file contains the following piece of code—@RenderBody(). This is a placeholder that specifies where the Index.cshtml file content should be injected. If you are coming from ASP.NET Web Forms, think of the _Layout.cshtml page as a master page. Your Layout.cshtml markup should look as follows: <!DOCTYPE html> <html> <head> <meta charset="utf-8" /> <meta name="viewport" content="width=device-width, initial-scale=1.0" /> <title>@ViewData["Title"] - CoreMailValidation</title> <link rel="icon" type="image/png" href="~/images/icons/favicon.ico" /> <link rel="stylesheet" type="text/css" href="~/vendor/bootstrap/css/bootstrap.min.css"> <link rel="stylesheet" type="text/css" href="~/fonts/font-awesome-4.7.0/css/font-awesome.min.css"> <link rel="stylesheet" type="text/css" href="~/vendor/animate/animate.css"> <link rel="stylesheet" type="text/css" href="~/vendor/css-hamburgers/hamburgers.min.css"> <link rel="stylesheet" type="text/css" href="~/vendor/select2/select2.min.css"> <link rel="stylesheet" type="text/css" href="~/css/util.css"> <link rel="stylesheet" type="text/css" href="~/css/main.css"> </head> <body> <div class="container body-content"> @RenderBody() <hr /> <footer> <p>© 2018 - CoreMailValidation</p> </footer> </div> <script src="~/vendor/jquery/jquery-3.2.1.min.js"></script> <script src="~/vendor/bootstrap/js/popper.js"></script> <script src="~/vendor/bootstrap/js/bootstrap.min.js"></script> <script src="~/vendor/select2/select2.min.js"></script> <script src="~/vendor/tilt/tilt.jquery.min.js"></script> <script> $('.js-tilt').tilt({ scale: 1.1 }) </script> <script src="~/js/main.js"></script> @RenderSection("Scripts", required: false) </body> </html> If everything worked out right, you will see the following page when you run your ASP.NET Core MVC application. The login form is obviously totally non-functional: However, the login form is totally responsive. If you had to reduce the size of your browser window, you will see the form scale as your browser size reduces. This is what you want. If you want to explore the responsive design offered by Bootstrap, head on over to https://getbootstrap.com/ and go through the examples in the documentation: The next thing we want to do is hook this login form up to our controller and call the Azure Function we created to validate the email address we entered. Let's look at doing that next. Hooking it all up To simplify things, we will be creating a model to pass to our controller: Create a new class in the Models folder of your application called LoginModel and click on the Add button: 2. Your project should now look as follows. You will see the model added to the Models folder: The next thing we want to do is add some code to our model to represent the fields on our login form. Add two properties called Email and Password: namespace CoreMailValidation.Models { public class LoginModel { public string Email { get; set; } public string Password { get; set; } } } Back in the Index.cshtml view, add the model declaration to the top of the page. This makes the model available for use in our view. Take care to specify the correct namespace where the model exists: @model CoreMailValidation.Models.LoginModel @{ ViewData["Title"] = "Login Page"; } The next portion of code needs to be written in the HomeController.cs file. Currently, it should only have an action called Index(): public IActionResult Index() { return View(); } Add a new async function called ValidateEmail that will use the base URL and parameter string of the Azure Function URL we copied earlier and call it using an HTTP request. I will not go into much detail here, as I believe the code to be pretty straightforward. All we are doing is calling the Azure Function using the URL we copied earlier and reading the return data: private async Task<string> ValidateEmail(string emailToValidate) { string azureBaseUrl = "https://core-mail- validation.azurewebsites.net/api/HttpTriggerCSharp1"; string urlQueryStringParams = $"? code=/IS4OJ3T46quiRzUJTxaGFenTeIVXyyOdtBFGasW9dUZ0snmoQfWoQ ==&email={emailToValidate}"; using (HttpClient client = new HttpClient()) { using (HttpResponseMessage res = await client.GetAsync( $"{azureBaseUrl}{urlQueryStringParams}")) { using (HttpContent content = res.Content) { string data = await content.ReadAsStringAsync(); if (data != null) { return data; } else return ""; } } } } Create another public async action called ValidateLogin. Inside the action, check to see if the ModelState is valid before continuing. For a nice explanation of what ModelState is, have a look at the following article—https://www.exceptionnotfound.net/asp-net-mvc-demystified-modelstate/. We then do an await on the ValidateEmail function, and if the return data contains the word false, we know that the email validation failed. A failure message is then passed to the TempData property on the controller. The TempData property is a place to store data until it is read. It is exposed on the controller by ASP.NET Core MVC. The TempData property uses a cookie-based provider by default in ASP.NET Core 2.0 to store the data. To examine data inside the TempData property without deleting it, you can use the Keep and Peek methods. To read more on TempData, see the Microsoft documentation here: https://docs.microsoft.com/en-us/aspnet/core/fundamentals/app-state?tabs=aspnetcore2x. If the email validation passed, then we know that the email address is valid and we can do something else. Here, we are simply just saying that the user is logged in. In reality, we will perform some sort of authentication here and then route to the correct controller. So now you know how to call an Azure Function from an ASP.NET Core application. If you found this tutorial helpful and you'd like to learn more, go ahead and pick up the book C# 7 and .NET Core Blueprints. What is ASP.NET Core? Why ASP.NET makes building apps for mobile and web easy How to dockerize an ASP.NET Core application

0
1
40715

article-image-solve-web-dev-problem-functional-programming

Guest Contributor

01 May 2018

14 min read

Seven wrongs don’t make the one right: Solving a problem with Functional Javascript

Guest Contributor

01 May 2018

14 min read

0
0
16658

article-image-common-design-patterns-javascript

Richa Tripathi

01 May 2018

14 min read

Implementing 5 Common Design Patterns in JavaScript (ES8)

Richa Tripathi

01 May 2018

14 min read

0
0
37744

article-image-how-to-dockerize-asp-net-core-application

Aaron Lazar

27 Apr 2018

5 min read

How to dockerize an ASP.NET Core application

Aaron Lazar

27 Apr 2018

5 min read

0
0
53751

article-image-r6-classes-retrieve-live-data-markets-wallets

Pravin Dhandre

23 Apr 2018

11 min read

Using R6 classes in R to retrieve live data for markets and wallets

Pravin Dhandre

23 Apr 2018

11 min read

0
0
6401

article-image-chaos-engineering-managing-complexity-by-breaking-things

Richard Gall

20 Apr 2018

7 min read

Chaos Engineering: managing complexity by breaking things

Richard Gall

20 Apr 2018

7 min read

Chaos Engineering is based on a fundamental assertion about software infrastructure today: that it is inherently chaotic. Or, to be more specific, it is chaotic because it is complex. Whereas software infrastructure used to be centralized, owned and licensed by large enterprise vendors, today much of the software that comprises infrastructure is open source. This is where we get back to chaos - because software infrastructure is comprised of many different parts, the way these parts can be unpredictable. Chaos Engineering is an attempt to acknowledge that fact and develop software accordingly. Who invented Chaos Engineering? Chaos Engineering began at Netflix. That makes sense when you consider the complexity of the Netflix technology stack and the way the company have scaled over the last 5 years or so. It built a number of tools to help adopt this chaos-first approach, the most prominent being Chaos Monkey. First launched in 2011 and open-sourced in 2012, Chaos Monkey was a tool that randomly selects instances in production and pulls them down; a little bit like monkeys pulling off your windscreen wipers in a safari park. However, Chaos Monkey became part of a wider suite of tools - called the Simian Army - that were built by Netflix to cause chaos in different part of its infrastructure. Here are the other two components used to simulate chaos: Chaos Gorilla causes big trouble by pulling down an entire AWS availability zone Latency monkey delays communication, essentially simulating poor network performance From that point Chaos Engineering grew. A number of large Silicon Valley organizations have adopted a similar approaches. For example, Facebook's Project Storm simulates data center failures on a huge scale, while Uber uses a tool called uDestroy. Slack has recently spoken in detail on the importance of stress testing their software too; the company is looking to build an engineering team simply to perform Chaos Engineering and improve Slack's reliability. One of the most interesting figures in Chaos Engineering is a man called Kolton Andrus. Andrus used to work at Amazon and Google, but today he is the CEO and founder of Gremlin, a startup that "helps engineers build resilient systems". Essentially, Andrus helped to develop the concept of Chaos Engineering while he was working at Netflix. Gremlin is his vehicle that is making it accessible to others. Chaos Engineering in practice Now the conceptual stuff is out of the way, here's how chaos engineering works. It's actually quite straightforward: Chaos Engineering simulates all sorts of unpredictable situations and scenarios in order to see how the system responds. It's effectively a form of stress testing. As we've seen, over the past few years companies have built their own tools to allow them to stress test their infrastructure. But Gremlin is taking the approach of offering this as a service. It's product is described as 'resiliency-as-a-service.' Its' product is a whole library of 'attacks' which can replicate different types of outages within a system. These are what it calls 'chaos experiments' that allows you to 'identify weak points in your system and fix them before they become a problem'. In this sense, Chaos Engineering is a bit like using the principles of penetration testing an applying it to software testing more broadly. By simulating everything that could possibly go wrong it allows you to make much better optimization decisions. The principles of Chaos Engineering are documented here. This is effectively its 'manifesto'. There's a lot in there worth reading, but here are the 5 principles that any sort of testing or experimentation should aspire to: Base your testing hypothesis on steady state behavior. Consider your infrastructure holistically, making individual parts work is important but not the priority. Simulate a variety of real-world events. This could be hardware or software failures, or simply external changes like spikes in traffic. What's important is that they're all unpredictable. Test in production. Your tests should be authentic. Automate! Testing things could be laborious and require a lot of manual work. Make use of automation tools to do lots of different tests without taking up too much of your time. Don't cause unnecessary pain. While it's important that your stress-tests are authentic, the impact must be contained and minimized by the engineer. Why Chaos Engineering now? Chaos Engineering isn't particularly new. As you've seen, Netflix has been doing it since 2011. But it does feel more urgent and relevant today. That's because the complexity of the software infrastructure behind many of the biggest Silicon Valley companies is now mainstream. It's normal. Cloud isn't an exotic buzzword any more - it's a reality (a reality that often has failures). Microservices are common - they're a commonsense way of building better applications and websites. Alongside this increased complexity, there is also a growing awareness of how much software outages can cost businesses. In a white paper, Gremlin make a big deal out of how much money is lost due to outages. Gremlin cite BAs system failure in summer 2017, which led to passengers stranded all over the world. This outage was estimated to have cost BA $135 million. It also refers to the Amazon S3 outage in March 2017, which is believed to have cost Amazon's customers $150 million. So - outages cost money. Yes, it's marketing spiel from Gremlin, but it's also true. It doesn't take a genius to work out that if you're eCommerce site is down for an hour, you're going to have lost a lot of money. Because software performance is so tied up with business performance, it feels incredibly fragile. That's why Chaos Engineering is perhaps more important and popular than ever. It's a way of countering that fragility. The key challenges of Chaos Engineering Chaos Engineering poses many challenges to software engineering teams. First and foremost, it requires a big cultural change. If you're intent on breaking everything, there are no rules about how things should work or what you're trying to build. Instead you're looking for the best way to build software that performs for the user. More practically, Chaos Engineering isn't that easy to do in a cost-effective manner. Everything Gremlin details in its white paper is very much true - of course outages cost a hell of a lot. But creative destruction and experimentation feels like an expensive route through software projects. It's not hard to see how it might appear self-indulgent, especially to a company or organization where software isn't properly understood. And more to the point, how often do businesses actually do the smart thing when they're building software? Long term projects are always difficult. So much software evolves pragmatically - often for the worse. Adding in an extra layer of experimentation and detailed testing is a weird mix of bacchanalian and hyper-organized, something that many organizations just couldn't process or properly understand. Chaos engineering and the future of software development Chaos Engineering certainly looks like the future of software development. The only question is whether services like those provided by Gremlin will take off. To understand the true value of stress testing your infrastructure you do need at least a modicum awareness of the complexity of your infrastructure. Indeed, you probably need to have a conversation about what services and dependencies are most business critical. Or rather, which ones most impact the user. That's something this TechCrunch piece addresses: "Testing can... be very political. Finding the points of failure in a system might force deep conversations about a particular software architecture and its robustness in the face of tough situations. A particular company might be deeply invested in a specific technical roadmap (e.g. microservices) that chaos engineering tests show is not as resilient to failures as originally predicted." This means there is going to be a question mark over the extent to which Chaos Engineering ever really enters the mainstream. How many businesses want to have these conversations? It's not just about the inclination - it's also about the time and money. It's an innovative software engineering approach that really calls people's bluff when they talk about innovation. It asks difficult questions about how and why you innovate: do you do new things because you think you should? Is this new thing going to be good for the business? And how well will it work for users? Of course these questions are vital when you're building software. But they rarely make building software easier.

0
0
34553

article-image-boost-r-codes-using-c-fortran

Pravin Dhandre

18 Apr 2018

16 min read

How to boost R codes using C++ and Fortran

Pravin Dhandre

18 Apr 2018

16 min read

Sometimes, R code just isn't fast enough. Maybe you've used profiling to figure out where your bottlenecks are, and you've done everything you can think of within R, but your code still isn't fast enough. In those cases, a useful alternative can be to delegate some parts of the implementation to Fortran or C++. This is an advanced technique that can often prove to be quite useful if know how to program in such languages. In today’s tutorial, we will try to explore techniques to boost R codes and calculations using efficient languages like Fortran and C++. Delegating code to other languages can address bottlenecks such as the following: Loops that can't be easily vectorized due to iteration dependencies Processes that involve calling functions millions of times Inefficient but necessary data structures that are slow in R Delegating code to other languages can provide great performance benefits, but it also incurs the cost of being more explicit and careful with the types of objects that are being moved around. In R, you can get away with simple things such as being imprecise about a number being an integer or a real. In these other languages, you can't; every object must have a precise type, and it remains fixed for the entire execution. Boost R codes using an old-school approach with Fortran We will start with an old-school approach using Fortran first. If you are not familiar with it, Fortran is the oldest programming language still under use today. It was designed to perform lots of calculations very efficiently and with very few resources. There are a lot of numerical libraries developed with it, and many high-performance systems nowadays still use it, either directly or indirectly. Here's our implementation, named sma_fortran(). The syntax may throw you off if you're not used to working with Fortran code, but it's simple enough to understand. First, note that to define a function technically known as a subroutine in Fortran, we use the subroutine keyword before the name of the function. As our previous implementations do, it receives the period and data (we use the dataa name with an extra a at the end because Fortran has a reserved keyword data, which we shouldn't use in this case), and we will assume that the data is already filtered for the correct symbol at this point. Next, note that we are sending new arguments that we did not send before, namely smas and n. Fortran is a peculiar language in the sense that it does not return values, it uses side effects instead. This means that instead of expecting something back from a call to a Fortran subroutine, we should expect that subroutine to change one of the objects that was passed to it, and we should treat that as our return value. In this case, smas fulfills that role; initially, it will be sent as an array of undefined real values, and the objective is to modify its contents with the appropriate SMA values. Finally, the n represents the number of elements in the data we send. Classic Fortran doesn't have a way to determine the size of an array being passed to it, and it needs us to specify the size manually; that's why we need to send n. In reality, there are ways to work around this, but since this is not a tutorial about Fortran, we will keep the code as simple as possible. Next, note that we need to declare the type of objects we're dealing with as well as their size in case they are arrays. We proceed to declare pos (which takes the place of position in our previous implementation, because Fortran imposes a limit on the length of each line, which we don't want to violate), n, endd (again, end is a keyword in Fortran, so we use the name endd instead), and period as integers. We also declare dataa(n), smas(n), and sma as reals because they will contain decimal parts. Note that we specify the size of the array with the (n) part in the first two objects. Once we have declared everything we will use, we proceed with our logic. We first create a for loop, which is done with the do keyword in Fortran, followed by a unique identifier (which are normally named with multiples of tens or hundreds), the variable name that will be used to iterate, and the values that it will take, endd and 1 to n in this case, respectively. Within the for loop, we assign pos to be equal to endd and sma to be equal to 0, just as we did in some of our previous implementations. Next, we create a while loop with the do…while keyword combination, and we provide the condition that should be checked to decide when to break out of it. Note that Fortran uses a very different syntax for the comparison operators. Specifically, the .lt. operator stand for less-than, while the .ge. operator stands for greater-than-or-equal-to. If any of the two conditions specified is not met, then we will exit the while loop. Having said that, the rest of the code should be self-explanatory. The only other uncommon syntax property is that the code is indented to the sixth position. This indentation has meaning within Fortran, and it should be kept as it is. Also, the number IDs provided in the first columns in the code should match the corresponding looping mechanisms, and they should be kept toward the left of the logic-code. For a good introduction to Fortran, you may take a look at Stanford's Fortran 77 Tutorial (https://web.stanford.edu/class/me200c/tutorial_77/). You should know that there are various Fortran versions, and the 77 version is one of the oldest ones. However, it's also one of the better supported ones: subroutine sma_fortran(period, dataa, smas, n) integer pos, n, endd, period real dataa(n), smas(n), sma do 10 endd = 1, n pos = endd sma = 0.0 do 20 while ((endd - pos .lt. period) .and. (pos .ge. 1)) sma = sma + dataa(pos) pos = pos - 1 20 end do if (endd - pos .eq. period) then sma = sma / period else sma = 0 end if smas(endd) = sma 10 continue end Once your code is finished, you need to compile it before it can be executed within R. Compilation is the process of translating code into machine-level instructions. You have two options when compiling Fortran code: you can either do it manually outside of R or you can do it within R. The second one is recommended since you can take advantage of R's tools for doing so. However, we show both of them. The first one can be achieved with the following code: $ gfortran -c sma-delegated-fortran.f -o sma-delegated-fortran.so This code should be executed in a Bash terminal (which can be found in Linux or Mac operating systems). We must ensure that we have the gfortran compiler installed, which was probably installed when R was. Then, we call it, telling it to compile (using the -c option) the sma-delegated-fortran.f file (which contains the Fortran code we showed before) and provide an output file (with the -o option) named sma-delegatedfortran.so. Our objective is to get this .so file, which is what we need within R to execute the Fortran code. The way to compile within R, which is the recommended way, is to use the following line: system("R CMD SHLIB sma-delegated-fortran.f") It basically tells R to execute the command that produces a shared library derived from the sma-delegated-fortran.f file. Note that the system() function simply sends the string it receives to a terminal in the operating system, which means that you could have used that same command in the Bash terminal used to compile the code manually. To load the shared library into R's memory, we use the dyn.load() function, providing the location of the .so file we want to use, and to actually call the shared library that contains the Fortran implementation, we use the .Fortran() function. This function requires type checking and coercion to be explicitly performed by the user before calling it. To provide a similar signature as the one provided by the previous functions, we will create a function named sma_delegated_fortran(), which receives the period, symbol, and data parameters as we did before, also filters the data as we did earlier, calculates the length of the data and puts it in n, and uses the .Fortran() function to call the sma_fortran() subroutine, providing the appropriate parameters. Note that we're wrapping the parameters around functions that coerce the types of these objects as required by our Fortran code. The results list created by the .Fortran() function contains the period, dataa, smas, and n objects, corresponding to the parameters sent to the subroutine, with the contents left in them after the subroutine was executed. As we mentioned earlier, we are interested in the contents of the sma object since they contain the values we're looking for. That's why we send only that part back after converting it to a numeric type within R. The transformations you see before sending objects to Fortran and after getting them back is something that you need to be very careful with. For example, if instead of using single(n) and as.single(data), we use double(n) and as.double(data), our Fortran implementation will not work. This is something that can be ignored within R, but it can't be ignored in the case of Fortran: system("R CMD SHLIB sma-delegated-fortran.f") dyn.load("sma-delegated-fortran.so") sma_delegated_fortran <- function(period, symbol, data) { data <- data[which(data$symbol == symbol), "price_usd"] n <- length(data) results <- .Fortran( "sma_fortran", period = as.integer(period), dataa = as.single(data), smas = single(n), n = as.integer(n) ) return(as.numeric(results$smas)) } Just as we did earlier, we benchmark and test for correctness: performance <- microbenchmark( sma_12 <- sma_delegated_fortran(period, symboo, data), unit = "us" ) all(sma_1$sma - sma_12 <= 0.001, na.rm = TRUE) #> TRUE summary(performance)$me In this case, our median time is of 148.0335 microseconds, making this the fastest implementation up to this point. Note that it's barely over half of the time from the most efficient implementation we were able to come up with using only R. Take a look at the following table: Boost R codes using a modern approach with C++ Now, we will show you how to use a more modern approach using C++. The aim of this section is to provide just enough information for you to start experimenting using C++ within R on your own. We will only look at a tiny piece of what can be done by interfacing R with C++ through the Rcpp package (which is installed by default in R), but it should be enough to get you started. If you have never heard of C++, it's a language used mostly when resource restrictions play an important role and performance optimization is of paramount importance. Some good resources to learn more about C++ are Meyer's books on the topic, a popular one being Effective C++ (Addison-Wesley, 2005), and specifically for the Rcpp package, Eddelbuettel's Seamless R and C++ integration with Rcpp by Springer, 2013, is great. Before we continue, you need to ensure that you have a C++ compiler in your system. On Linux, you should be able to use gcc. On Mac, you should install Xcode from the application store. O n Windows, you should install Rtools. Once you test your compiler and know that it's working, you should be able to follow this section. We'll cover more on how to do this in Appendix, Required Packages. C++ is more readable than Fortran code because it follows more syntax conventions we're used to nowadays. However, just because the example we will use is readable, don't think that C++ in general is an easy language to use; it's not. It's a very low-level language and using it correctly requires a good amount of knowledge. Having said that, let's begin. The #include line is used to bring variable and function definitions from R into this file when it's compiled. Literally, the contents of the Rcpp.h file are pasted right where the include statement is. Files ending with the .h extensions are called header files, and they are used to provide some common definitions between a code's user and its developers. The using namespace Rcpp line allows you to use shorter names for your function. Instead of having to specify Rcpp::NumericVector, we can simply use NumericVector to define the type of the data object. Doing so in this example may not be too beneficial, but when you start developing for complex C++ code, it will really come in handy. Next, you will notice the // [[Rcpp::export(sma_delegated_cpp)]] code. This is a tag that marks the function right below it so that R know that it should import it and make it available within R code. The argument sent to export() is the name of the function that will be accessible within R, and it does not necessarily have to match the name of the function in C++. In this case, sma_delegated_cpp() will be the function we call within R, and it will call the smaDelegated() function within C++: #include using namespace Rcpp; // [[Rcpp::export(sma_delegated_cpp)]] NumericVector smaDelegated(int period, NumericVector data) { int position, n = data.size(); NumericVector result(n); double sma; for (int end = 0; end < n; end++) { position = end; sma = 0; while(end - position < period && position >= 0) { sma = sma + data[position]; position = position - 1; } if (end - position == period) { sma = sma / period; } else { sma = NA_REAL; } result[end] = sma; } return result; } Next, we will explain the actual smaDelegated() function. Since you have a good idea of what it's doing at this point, we won't explain its logic, only the syntax that is not so obvious. The first thing to note is that the function name has a keyword before it, which is the type of the return value for the function. In this case, it's NumericVector, which is provided in the Rcpp.h file. This is an object designed to interface vectors between R and C++. Other types of vector provided by Rcpp are IntegerVector, LogicalVector, and CharacterVector. You also have IntegerMatrix, NumericMatrix, LogicalMatrix, and CharacterMatrix available. Next, you should note that the parameters received by the function also have types associated with them. Specifically, period is an integer (int), and data is NumericVector, just like the output of the function. In this case, we did not have to pass the output or length objects as we did with Fortran. Since functions in C++ do have output values, it also has an easy enough way of computing the length of objects. The first line in the function declare a variables position and n, and assigns the length of the data to the latter one. You may use commas, as we do, to declare various objects of the same type one after another instead of splitting the declarations and assignments into its own lines. We also declare the vector result with length n; note that this notation is similar to Fortran's. Finally, instead of using the real keyword as we do in Fortran, we use the float or double keyword here to denote such numbers. Technically, there's a difference regarding the precision allowed by such keywords, and they are not interchangeable, but we won't worry about that here. The rest of the function should be clear, except for maybe the sma = NA_REAL assignment. This NA_REAL object is also provided by Rcpp as a way to denote what should be sent to R as an NA. Everything else should result familiar. Now that our function is ready, we save it in a file called sma-delegated-cpp.cpp and use R's sourceCpp() function to bring compile it for us and bring it into R. The .cpp extension denotes contents written in the C++ language. Keep in mind that functions brought into R from C++ files cannot be saved in a .Rdata file for a later session. The nature of C++ is to be very dependent on the hardware under which it's compiled, and doing so will probably produce various errors for you. Every time you want to use a C++ function, you should compile it and load it with the sourceCpp() function at the moment of usage. library(Rcpp) sourceCpp("./sma-delegated-cpp.cpp") sma_delegated_cpp <- function(period, symbol, data) { data <- as.numeric(data[which(data$symbol == symbol), "price_usd"]) return(sma_cpp(period, data)) } If everything worked fine, our function should be usable within R, so we benchmark and test for correctness. I promise this is the last one: performance <- microbenchmark( sma_13 <- sma_delegated_cpp(period, symboo, data), unit = "us" ) all(sma_1$sma - sma_13 <= 0.001, na.rm = TRUE) #> TRUE summary(performance)$median #> [1] 80.6415 This time, our median time was 80.6415 microseconds, which is three orders of magnitude faster than our first implementation. Think about it this way: if you provide an input for sma_delegated_cpp() so that it took around one hour for it to execute, sma_slow_1() would take around 1,000 hours, which is roughly 41 days. Isn't that a surprising difference? When you are in situations that take that much execution time, it's definitely worth it to try and make your implementations as optimized as possible. You may use the cppFunction() function to write your C++ code directly inside an .R file, but you should not do so. Keep that just for testing small pieces of code. Separating your C++ implementation into its own files allows you to use the power of your editor of choice (or IDE) to guide you through the development as well as perform deeper syntax checks for you. You read an excerpt from R Programming By Example authored by Omar Trejo Navarro. This book provides step-by-step guide to build simple-to-advanced applications through examples in R using modern tools. Getting Inside a C++ Multithreaded Application Understanding the Dependencies of a C++ Application

0
0
15700

article-image-cross-validation-r-predictive-models

Pravin Dhandre

17 Apr 2018

8 min read

Cross-validation in R for predictive models

Pravin Dhandre

17 Apr 2018

8 min read

In today’s tutorial, we will efficiently train our first predictive model, we will use Cross-validation in R as the basis of our modeling process. We will build the corresponding confusion matrix. Most of the functionality comes from the excellent caret package. You can find more information on the vast features of caret package that we will not explore in this tutorial. Before moving to the training tutorial, lets understand what a confusion matrix is. A confusion matrix is a summary of prediction results on a classification problem. The number of correct and incorrect predictions are summarized with count values and broken down by each class. This is the key to the confusion matrix. The confusion matrix shows the ways in which your classification model is confused when it makes predictions. It gives you insight not only into the errors being made by your classifier but more importantly the types of errors that are being made. Training our first predictive model Following best practices, we will use Cross Validation (CV) as the basis of our modeling process. Using CV we can create estimates of how well our model will do with unseen data. CV is powerful, but the downside is that it requires more processing and therefore more time. If you can take the computational complexity, you should definitely take advantage of it in your projects. Going into the mathematics behind CV is outside of the scope of this tutorial. If interested, you can find out more information on cross validation on Wikipedia . The basic idea is that the training data will be split into various parts, and each of these parts will be taken out of the rest of the training data one at a time, keeping all remaining parts together. The parts that are kept together will be used to train the model, while the part that was taken out will be used for testing, and this will be repeated by rotating the parts such that every part is taken out once. This allows you to test the training procedure more thoroughly, before doing the final testing with the testing data. We use the trainControl() function to set our repeated CV mechanism with five splits and two repeats. This object will be passed to our predictive models, created with the caret package, to automatically apply this control mechanism within them: cv.control <- trainControl(method = "repeatedcv", number = 5, repeats = 2) Our predictive models pick for this example are Random Forests (RF). We will very briefly explain what RF are, but the interested reader is encouraged to look into James, Witten, Hastie, and Tibshirani's excellent "Statistical Learning" (Springer, 2013). RF are a non-linear model used to generate predictions. A tree is a structure that provides a clear path from inputs to specific outputs through a branching model. In predictive modeling they are used to find limited input-space areas that perform well when providing predictions. RF create many such trees and use a mechanism to aggregate the predictions provided by this trees into a single prediction. They are a very powerful and popular Machine Learning model. Let's have a look at the random forests example: Random forests aggregate trees To train our model, we use the train() function passing a formula that signals R to use MULT_PURCHASES as the dependent variable and everything else (~ .) as the independent variables, which are the token frequencies. It also specifies the data, the method ("rf" stands for random forests), the control mechanism we just created, and the number of tuning scenarios to use: model.1 <- train( MULT_PURCHASES ~ ., data = train.dfm.df, method = "rf", trControl = cv.control, tuneLength = 5 ) Improving speed with parallelization If you actually executed the previous code in your computer before reading this, you may have found that it took a long time to finish (8.41 minutes in our case). As we mentioned earlier, text analysis suffers from very high dimensional structures which take a long time to process. Furthermore, using CV runs will take a long time to run. To cut down on the total execution time, use the doParallel package to allow for multi-core computers to do the training in parallel and substantially cut down on time. We proceed to create the train_model() function, which takes the data and the control mechanism as parameters. It then makes a cluster object with the makeCluster() function with a number of available cores (processors) equal to the number of cores in the computer, detected with the detectCores() function. Note that if you're planning on using your computer to do other tasks while you train your models, you should leave one or two cores free to avoid choking your system (you can then use makeCluster(detectCores() -2) to accomplish this). After that, we start our time measuring mechanism, train our model, print the total time, stop the cluster, and return the resulting model. train_model <- function(data, cv.control) { cluster <- makeCluster(detectCores()) registerDoParallel(cluster) start.time <- Sys.time() model <- train( MULT_PURCHASES ~ ., data = data, method = "rf", trControl = cv.control, tuneLength = 5 ) print(Sys.time() - start.time) stopCluster(cluster) return(model) } Now we can retrain the same model much faster. The time reduction will depend on your computer's available resources. In the case of an 8-core system with 32 GB of memory available, the total time was 3.34 minutes instead of the previous 8.41 minutes, which implies that with parallelization, it only took 39% of the original time. Not bad right? Let's have look at how the model is trained: model.1 <- train_model(train.dfm.df, cv.control) Computing predictive accuracy and confusion matrices Now that we have our trained model, we can see its results and ask it to compute some predictive accuracy metrics. We start by simply printing the object we get back from the train() function. As can be seen, we have some useful metadata, but what we are concerned with right now is the predictive accuracy, shown in the Accuracy column. From the five values we told the function to use as testing scenarios, the best model was reached when we used 356 out of the 2,007 available features (tokens). In that case, our predictive accuracy was 65.36%. If we take into account the fact that the proportions in our data were around 63% of cases with multiple purchases, we have made an improvement. This can be seen by the fact that if we just guessed the class with the most observations (MULT_PURCHASES being true) for all the observations, we would only have a 63% accuracy, but using our model we were able to improve toward 65%. This is a 3% improvement. Keep in mind that this is a randomized process, and the results will be different every time you train these models. That's why we want a repeated CV as well as various testing scenarios to make sure that our results are robust: model.1 #> Random Forest #> #> 212 samples #> 2007 predictors #> 2 classes: 'FALSE', 'TRUE' #> #> No pre-processing #> Resampling: Cross-Validated (5 fold, repeated 2 times) #> Summary of sample sizes: 170, 169, 170, 169, 170, 169, ... #> Resampling results across tuning parameters: #> #> mtry Accuracy Kappa #> 2 0.6368771 0.00000000 #> 11 0.6439092 0.03436849 #> 63 0.6462901 0.07827322 #> 356 0.6536545 0.16160573 #> 2006 0.6512735 0.16892126 #> #> Accuracy was used to select the optimal model using the largest value. #> The final value used for the model was mtry = 356. To create a confusion matrix, we can use the confusionMatrix() function and send it the model's predictions first and the real values second. This will not only create the confusion matrix for us, but also compute some useful metrics such as sensitivity and specificity. We won't go deep into what these metrics mean or how to interpret them since that's outside the scope of this tutorial, but we highly encourage the reader to study them using the resources cited in this tutorial: confusionMatrix(model.1$finalModel$predicted, train$MULT_PURCHASES) #> Confusion Matrix and Statistics #> #> Reference #> Prediction FALSE TRUE #> FALSE 18 19 #> TRUE 59 116 #> #> Accuracy : 0.6321 #> 95% CI : (0.5633, 0.6971) #> No Information Rate : 0.6368 #> P-Value [Acc > NIR] : 0.5872 #> #> Kappa : 0.1047 #> Mcnemar's Test P-Value : 1.006e-05 #> #> Sensitivity : 0.23377 #> Specificity : 0.85926 #> Pos Pred Value : 0.48649 #> Neg Pred Value : 0.66286 #> Prevalence : 0.36321 #> Detection Rate : 0.08491 #> Detection Prevalence : 0.17453 #> Balanced Accuracy : 0.54651 #> #> 'Positive' Class : FALSE You read an excerpt from R Programming By Example authored by Omar Trejo Navarro. This book gets you familiar with R’s fundamentals and its advanced features to get you hands-on experience with R’s cutting edge tools for software development. Getting Started with Predictive Analytics Here’s how you can handle the bias variance trade-off in your ML models

0
0
22003

article-image-26-new-java-9-enhancements-you-will-love

Aarthi Kumaraswamy

09 Apr 2018

11 min read

26 new Java 9 enhancements you will love

Aarthi Kumaraswamy

09 Apr 2018

11 min read

Java 9 represents a major release and consists of a large number of internal changes to the Java platform. Collectively, these internal changes represent a tremendous set of new possibilities for Java developers, some stemming from developer requests, others from Oracle-inspired enhancements. In this post, we will review 26 of the most important changes. Each change is related to a JDK Enhancement Proposal (JEP). JEPs are indexed and housed at openjdk.java.net/jeps/0. You can visit this site for additional information on each JEP. [box type="note" align="" class="" width=""]The JEP program is part of Oracle's support for open source, open innovation, and open standards. While other open source Java projects can be found, OpenJDK is the only one supported by Oracle. [/box] These changes have several impressive implications, including: Heap space efficiencies Memory allocation Compilation process improvements Type testing Annotations Automated runtime compiler tests Improved garbage collection 26 Java 9 enhancements you should know Improved Contended Locking [JEP 143] The general goal of JEP 143 was to increase the overall performance of how the JVM manages contention over locked Java object monitors. The improvements to contended locking were all internal to the JVM and do not require any developer actions to benefit from them. The overall improvement goals were related to faster operations. These include faster monitor enter, faster monitor exit, and faster notifications. 2. Segmented code cache [JEP 197] The segmented code cache JEP (197) upgrade was completed and results in faster, more efficient execution time. At the core of this change was the segmentation of the code cache into three distinct segments--non-method, profiled, and non-profiled code. 3. Smart Java compilation, phase two [JEP 199] The JDK Enhancement Proposal 199 is aimed at improving the code compilation process. All Java developers will be familiar with the javac tool for compiling source code to bytecode, which is used by the JVM to run Java programs. Smart Java Compilation, also referred to as Smart Javac and sjavac, adds a smart wrapper around the javac process. Perhaps the core improvement sjavac adds is that only the necessary code is recompiled. [box type="shadow" align="" class="" width=""]Check out this tutorial to know how you can recognize patterns with neural networks in Java.[/box] 4. Resolving Lint and Doclint warnings [JEP 212] Both Lint and Doclint report errors and warnings during the compile process. Resolution of these warnings was the focus of JEP 212. When using core libraries, there should not be any warnings. This mindset led to JEP 212, which has been resolved and implemented in Java 9. 5. Tiered attribution for javac [JEP 215] JEP 215 represents an impressive undertaking to streamline javac's type checking schema. In Java 8, type checking of poly expressions is handled by a speculative attribution tool. The goal with JEP 215 was to change the type checking schema to create faster results. The new approach, released with Java 9, uses a tiered attribution tool. This tool implements a tiered approach for type checking argument expressions for all method calls. Permissions are also made for method overriding. 6. Annotations pipeline 2.0 [JEP 217] Java 8 related changes impacted Java annotations but did not usher in a change to how javac processed them. There were some hardcoded solutions that allowed javac to handle the new annotations, but they were not efficient. Moreover, this type of coding (hardcoding workarounds) is difficult to maintain. So, JEP 217 focused on refactoring the javac annotation pipeline. This refactoring was all internal to javac, so it should not be evident to developers. 7. New version-string scheme [JEP 223] Prior to Java 9, the release numbers did not follow industry standard versioning--semantic versioning. Oracle has embraced semantic versioning for Java 9 and beyond. For Java, a major-minor-security schema will be used for the first three elements of Java version numbers: Major: A major release consisting of a significant new set of features Minor: Revisions and bug fixes that are backward compatible Security: Fixes deemed critical to improving security 8. Generating run-time compiler tests automatically [JEP 233] The purpose of JEP 233 was to create a tool that could automate the runtime compiler tests. The tool that was created starts by generating a random set of Java source code and/or byte code. The generated code will have three key characteristics: Be syntactically correct Be semantically correct Use a random seed that permits reusing the same randomly-generated code 9. Testing class-file attributes generated by Javac [JEP 235] Prior to Java 9, there was no method of testing a class-file's attributes. Running a class and testing the code for anticipated or expected results was the most commonly used method of testing javac generated class-files. This technique falls short of testing to validate the file's attributes. The lack of, or insufficient, capability to create tests for class-file attributes was the impetus behind JEP 235. The goal is to ensure javac creates a class-file's attributes completely and correctly. 10. Storing interned strings in CDS archives [JEP 250] CDS archives now allocate specific space on the heap for strings: The string space is mapped using a shared-string table, hash tables, and deduplication. 11. Preparing JavaFX UI controls and CSS APIs for modularization [JEP 253] Prior to Java 9, JavaFX controls as well as CSS functionality were only available to developers by interfacing with internal APIs. Java 9's modularization has made the internal APIs inaccessible. Therefore, JEP 253 was created to define public, instead of internal, APIs. This was a larger undertaking than it might seem. Here are a few actions that were taken as part of this JEP: Moving javaFX control skins from the internal to public API (javafx.scene.skin) Ensuring API consistencies Generation of a thorough javadoc 12. Compact strings [JEP 254] The string data type is an important part of nearly every Java app. While JEP 254's aim was to make strings more space-efficient, it was approached with caution so that existing performance and compatibilities would not be negatively impacted. Starting with Java 9, strings are now internally represented using a byte array along with a flag field for encoding references. 13. Merging selected Xerces 2.11.0 updates into JAXP [JEP 255] Xerces is a library used for parsing XML in Java. It was updated to 2.11.0 in late 2010, so JEP 255's aim was to update JAXP to incorporate changes in Xerces 2.11.0. 14. Updating JavaFX/Media to a newer version of GStreamer [JEP 257] The purpose of JEP 257 was to ensure JavaFX/Media was updated to include the latest release of GStreamer for stability, performance, and security assurances. GStreamer is a multimedia processing framework that can be used to build systems that take in media from several different formats and, after processing, export them in selected formats. 15. HarfBuzz Font-Layout Engine [JEP 258] Prior to Java 9, the layout engine used to handle font complexities; specifically, fonts that have rendering behaviors beyond what the common Latin fonts have. Java used the uniform client interface, also referred to as ICU, as the defacto text rendering tool. The ICU layout engine has been depreciated and, in Java 9, has been replaced with the HarfBuzz font layout engine. HarfBuzz is an OpenType text rendering engine. This type of layout engine has the characteristic of providing script-aware code to help ensure text is laid out as desired. 16. HiDPI graphics on Windows and Linux [JEP 263] JEP 263 was focused on ensuring the crispness of on-screen components, relative to the pixel density of the display. The following terms are relevant to this JEP and are provided along with the below listed descriptive information: DPI-aware application: An application that is able to detect and scale images for the display's specific pixel density DPI-unaware application: An application that makes no attempt to detect and scale images for the display's specific pixel density HiDPI graphics: High dots-per-inch graphics Retina display: This term was created by Apple to refer to displays with a pixel density of at least 300 pixels per inch Prior to Java 9, automatic scaling and sizing were already implemented in Java for the Mac OS X operating system. This capability was added in Java 9 for Windows and Linux operating systems. 17. Marlin graphics renderer [JEP 265] JEP 265 replaced the Pisces graphics rasterizer with the Marlin graphics renderer in the Java 2D API. This API is used to draw 2D graphics and animations. The goal was to replace Pisces with a rasterizer/renderer that was much more efficient and without any quality loss. This goal was realized in Java 9. An intended collateral benefit was to include a developer-accessible API. Previously, the means of interfacing with the AWT and Java 2D was internal. 18. Unicode 8.0.0 [JEP 267] Unicode 8.0.0 was released on June 17, 2015. JEP 267 focused on updating the relevant APIs to support Unicode 8.0.0. In order to fully comply with the new Unicode standard, several Java classes were updated. The following listed classes were updated for Java 9 to comply with the new Unicode standard: java.awt.font.NumericShaper java.lang.Character java.lang.String java.text.Bidi java.text.BreakIterator java.text.Normalizer 19. Reserved stack areas for critical sections [JEP 270] The goal of JEP 270 was to mitigate problems stemming from stack overflows during the execution of critical sections. This mitigation took the form of reserving additional thread stack space. [box type="shadow" align="" class="" width=""]Are you looking out for running parallel data operations using Java streams, check out this post for more details.[/box] 20. Dynamic linking of language-defined object models [JEP 276] Java interoperability was enhanced with JEP 276. The necessary JDK changes were made to permit runtime linkers from multiple languages to coexist in a single JVM instance. This change applies to high-level operations, as you would expect. An example of a relevant high-level operation is the reading or writing of a property with elements such as accessors and mutators. The high-level operations apply to objects of unknown types. They can be invoked with INVOKEDYNAMIC instructions. Here is an example of calling an object's property when the object's type is unknown at compile time: INVOKEDYNAMIC "dyn:getProp:age" 21. Additional tests for humongous objects in G1 [JEP 278] One of the long-favored features of the Java platform is the behind the scenes garbage collection. JEP 278's focus was to create additional WhiteBox tests for humongous objects as a feature of the G1 garbage collector. 22. Improving test-failure troubleshooting [JEP 279] For developers that do a lot of testing, JEP 279 is worth reading about. Additional functionality has been added in Java 9 to automatically collect information to support troubleshooting test failures as well as timeouts. Collecting readily available diagnostic information during tests stands to provide developers and engineers with greater fidelity in their logs and other output. 23. Optimizing string concatenation [JEP 280] JEP 280 is an interesting enhancement for the Java platform. Prior to Java 9, string concatenation was translated by javac into StringBuilder : : append chains. This was a sub-optimal translation methodology often requiring StringBuilder presizing. The enhancement changed the string concatenation bytecode sequence, generated by javac, so that it uses INVOKEDYNAMIC calls. The purpose of the enhancement was to increase optimization and to support future optimizations without the need to reformat the javac's bytecode. 24. HotSpot C++ unit-test framework [JEP 281] HotSpot is the name of the JVM. This Java enhancement was intended to support the development of C++ unit tests for the JVM. Here is a partial, non-prioritized, list of goals for this enhancement: Command-line testing Create appropriate documentation Debug compile targets Framework elasticity IDE support Individual and isolated unit testing Individualized test results Integrate with existing infrastructure Internal test support Positive and negative testing Short execution time testing Support all JDK 9 build platforms Test compile targets Test exclusion Test grouping Testing that requires the JVM to be initialized Tests co-located with source code Tests for platform-dependent code Write and execute unit testing (for classes and methods) This enhancement is evidence of the increasing extensibility. 25. Enabling GTK 3 on Linux [JEP 283] GTK+, formally known as the GIMP toolbox, is a cross-platform tool used for creating Graphical User Interfaces (GUI). The tool consists of widgets accessible through its API. JEP 283's focus was to ensure GTK 2 and GTK 3 were supported on Linux when developing Java applications with graphical components. The implementation supports Java apps that employ JavaFX, AWT, and Swing. 26. New HotSpot build system [JEP 284] The Java platform used, prior to Java 9, was a build system riddled with duplicate code, redundancies, and other inefficiencies. The build system has been reworked for Java 9 based on the build-infra framework. In this context, infra is short for infrastructure. The overarching goal for JEP 284 was to upgrade the build system to one that was simplified. Specific goals included: Leverage existing build system Maintainable code Minimize duplicate code Simplification Support future enhancements Summary We explored some impressive new features of the Java platform, with a specific focus on javac, JDK libraries, and various test suites. Memory management improvements, including heap space efficiencies, memory allocation, and improved garbage collection represent a powerful new set of Java platform enhancements. Changes regarding the compilation process resulting in greater efficiencies were part of this discussion We also covered important improvements, such as with the compilation process, type testing, annotations, and automated runtime compiler tests. You just enjoyed an excerpt from the book, Mastering Java 9 written by By Dr. Edward Lavieri and Peter Verhas.

0
0
23716

Packt Editorial Staff

03 Apr 2018

18 min read

What is domain driven design?

Packt Editorial Staff

03 Apr 2018

18 min read

Domain driven design exists because all software exists for a purpose. It does something. For example, you can't provide a software solution for a financial system such as online stock trading if you don't understand the stock exchanges and their functioning. Having domain knowledge is essential to solving problems with software. Domain driven design is simply designing software with the specific domain - whether that's finance, medicine, law, eCommerce - in mind. This has been taken from Mastering Microservices with Java 9 - Second Edition. Central to Domain Driven Design is the concept of a model. A model is an abstraction, or a blueprint, of the domain. Domain driven design is a collaborative activity Designing this model is not rocket science, but it does take a lot of effort, refining, and input from domain experts. It is the collective job of software designers, domain experts, and developers. They organize information, divide it into smaller parts, group them logically, and create modules. Each module can be taken up individually, and can be divided using a similar approach. This process can be followed until we reach the unit level, or when we cannot divide it any further. A complex project may have more of such iterations; similarly, a simple project could have just a single iteration of it. Once a model is defined and well documented, it can move onto the next stage - code design. So, here we have a software design—a domain model and code design, and code implementation of the domain model. The domain model provides a high level of the architecture of a solution (software/application), and the code implementation gives the domain model a life, as a working model. Domain Driven Design makes design and development work together. It provides the ability to develop software continuously, while keeping the design up to date based on feedback received from the development. It solves one of the limitations offered by Agile and Waterfall methodologies, making software maintainable, including design and code, as well as keeping application minimum viable. It gives developers the right platform to understand the domain, and provides the opportunity to share early feedback of the domain model implementation. It removes the bottleneck that appears in later stages when stockholders wait for deliverables. The fundamental components of Domain Driven Design To understand domain driven design, you can break it down into 3 fundamental concepts: Ubiquitous language and unified model language (UML) Multilayer architecture Artifacts (components) Ubiquitous language Ubiquitous language is a common language to communicate within a project. It's because designing a model is a collaborative effort of software designers, domain experts, and developers that it requires a common language to communicate with. It removes misunderstandings, misinterpretations. Communication gaps so often lead to bad software - ubiquitous language minimizes these gaps. It does, however, need to be used everywhere on a project. Unified Modeling Language (UML) is widely used and very popular when creating models. It also has a few limitations; for example, when you have thousands of classes drawn from a paper, it's difficult to represent class relationships and simultaneously understand their abstraction while taking a meaning from it. Also, UML diagrams do not represent the concepts of a model and what objects are supposed to do. Therefore, UML should always be used with other documents, code, or any other reference for effective communication. Multilayered architecture Multilayered architecture is a common solution for Domain Driven Design. It contains four layers: Presentation layer or (UI) Application layer - responsible for application logic. It maintains and coordinates the overall flow of the product/service. It does not contain business logic or UI. It may hold the state of application objects, like tasks in progress. Domain layer - contains the domain information and business logic. It holds the state of the business object. Infrastructure layer - provides support to all the other layers and is responsible for communication between them. To understand the interaction of the different layers, take the example of table booking at a restaurant. The end user places a request for a table booking using UI. The UI passes the request to the application layer. The application layer fetches the domain objects, such as the restaurant, the table, a date, and so on, from the domain layer. The domain layer fetches these existing persisted objects from the infrastructure, and invokes relevant methods to make the booking and persist them back to the infrastructure layer. Once domain objects are persisted, the application layer shows the booking confirmation to the end user. Artifacts used in Domain Driven Design There are seven different artifacts used in Domain Driven Design to express, create, and retrieve domain models: Entities Value objects Services Aggregates Repository Factory Module Entities are certain types of objects that are identifiable and remain the same throughout the states of the products/services. These objects are not identified by their attributes, but by their identity and thread of continuity. These type of objects are known as entities. It sounds pretty simple, but it carries complexity. You need to understand how we can define the entities. Let's take an example of a table booking system, where we have a restaurant class with attributes such as restaurant name, address, phone number, establishment data, and so on. We can take two instances of the restaurant class that are not identifiable using the restaurant name, as there could be other restaurants with the same name. Similarly, if we go by any other single attribute, we will not find any attributes that can singularly identify a unique restaurant. If two restaurants have all the same attribute values, they are therefore the same and are interchangeable with each other. Still, they are not the same entities, as both have different references (memory addresses). Conversely, let's take a class of U.S. citizens. Every U.S. citizen has his or her own social security number. This number is not only unique, but remains unchanged throughout the life of the citizen and assures continuity. This citizen object would exist in the memory, would be serialized, and would be removed from the memory and stored in the database. It even exists after the person is deceased. It will be kept in the system for as long as the system exists. A citizen's social security number remains the same irrespective of its representation. Therefore, creating entities in a product means creating an identity. So, now give an identity to any restaurant in the previous example, then either use a combination of attributes such as restaurant name, establishment date, and street, or add an identifier such as restaurant_id to identify it. The basic rule is that two identifiers cannot be the same. Therefore, when we introduce an identifier for an entity, we need to be sure of it. There are different ways to create a unique identity for objects, described as follows: Using the primary key in a table. Using an automated generated ID by a domain module. A domain program generates the identifier and assigns it to objects that are being persisted among different layers. A few real-life objects carry user-defined identifiers themselves. For example, each country has its own country codes for dialing ISD calls. Composite key. This is a combination of attributes that can also be used for creating an identifier, as explained for the preceding restaurant object. Value objects Value objects (VOs) simplify the design. In contrast to entities, value objects have only attributes and no conceptual identity. A best practice is to keep value objects as immutable objects. If possible, you should even keep entity objects immutable too. You might want to keep all objects as entities, but you're likely to run into problems if you do this; there has to be one instance for each object. Let's say you are creating customers as entity objects. Each customer object would represent the restaurant guest; this cannot be used for booking orders for other guests. This may create millions of customer entity objects in the memory if millions of customers are using the system. Not only are there millions of uniquely identifiable objects that exist in the system, but each object is being tracked. Tracking as well as creating an identity is complex. A highly credible system is required to create and track these objects, which is not only very complex, but also resource heavy. It may result in system performance degradation. Therefore, it is important to use value objects instead of using entities. The reasons are explained in the next few paragraphs. Applications don't always need to have to be trackable and have an identifiable customer object. There are cases when you just need to have some or all attributes of the domain element. These are the cases when value objects can be used by the application. It makes things simple and improves the performance. Value objects can easily be created and destroyed, owing to the absence of identity. This simplifies the design—it makes value objects available for garbage collection if no other object has referenced them. Value objects should be designed and coded as immutable. Once they are created, they should never be modified during their life-cycle. If you need a different value of the VO, or any of its objects, then simply create a new value object, but don't modify the original value object. Here, immutability carries all the significance from object-oriented programming (OOP). A value object can be shared and used without impacting on its integrity if, and only if, it is immutable. Services While creating the domain model, you may come across situations where behavior may not be related to any object. These behaviors can be accommodated in service objects. Service objects are part of the domain layer and do not have any internal state. The sole purpose of service objects is to provide behavior to the domain that does not belong to a single entity or value object. Ubiquitous language helps you to identify different objects, identities, or value objects with different attributes and behaviors during the process of domain driven design and domain modelling. During the course of creating the domain model, you may find different behaviors or methods that do not belong to any specific object. Such behaviors are important, and so cannot be neglected. Neither can you add them to entities or value objects. It would spoil the object to add behavior that does not belong to it. Keep in mind, that behavior may impact on various objects. The use of object-oriented programming makes it possible to attach to some objects; this is known as a service. Services are common in technical frameworks. These are also used in domain layers in domain driven design. A service object does not have any internal state; its only purpose is to provide a behavior to the domain. Service objects provide behaviors that cannot be related to specific entities or value objects. Service objects may provide one or more related behaviors to one or more entities or value objects. It is a practice to define the services explicitly in the domain model. While creating the services, you need to tick all of the following points: Service objects' behavior performs on entities and value objects, but it does not belong to entities or value objects Service objects' behavior state is not maintained, and hence, they are stateless Services are part of the domain model Services may also exist in other layers. It is very important to keep domain-layer services isolated. It removes the complexities and keeps the design decoupled. Let's take an example where a restaurant owner wants to see the report of his monthly table bookings. In this case, he will log in as an admin and click the Display Report button after providing the required input fields, such as duration. Application layers pass the request to the domain layer that owns the report and templates objects, with some parameters such as report ID, and so on. Reports get created using the template, and data is fetched from either the database or other sources. Then the application layer passes through all the parameters, including the report ID to the business layer. Here, a template needs to be fetched from the database or another source to generate the report based on the ID. This operation does not belong to either the report object or the template object. Therefore, a service object is used that performs this operation to retrieve the required template from the database. Aggregates Aggregate domain pattern is related to the object's life cycle. It defines ownership and boundaries which is crucial in Domain Driven Design When you reserve a table at your favorite restaurant online using an application, you don't need to worry about the internal system and process that takes place to book your reservation, including searching for available restaurants, then for available tables on the given date, time, and so on and so forth. Therefore, you can say that a reservation application is an aggregate of several other objects, and works as a root for all the other objects for a table reservation system. This root should be an entity that binds collections of objects together. It is also called the aggregate root. This root object does not pass any reference of inside objects to external worlds, and protects the changes performed within internal objects. We need to understand why aggregators are required. A domain model can contain large numbers of domain objects. The bigger the application functionalities and size and the more complex its design, the greater number of objects present. A relationship exists between these objects. Some may have a many-to-many relationship, a few may have a one-to-many relationship, and others may have a one-to-one relationship. These relationships are enforced by the model implementation in the code, or in the database that ensures that these relationships among the objects are kept intact. Relationships are not just unidirectional; they can also be bidirectional. They can also increase in complexity. The designer's job is to simplify these relationships in the model. Some relationships may exist in a real domain, but may not be required in the domain model. Designers need to ensure that such relationships do not exist in the domain model. Similarly, multiplicity can be reduced by these constraints. One constraint may do the job where many objects satisfy the relationship. It is also possible that a bidirectional relationship could be converted into a unidirectional relationship. No matter how much simplification you input, you may still end up with relationships in the model. These relationships need to be maintained in the code. When one object is removed, the code should remove all the references to this object from other places. For example, a record removal from one table needs to be addressed wherever it has references in the form of foreign keys and such, to keep the data consistent and maintain its integrity. Also, invariants (rules) need to be forced and maintained whenever data changes. Relationships, constraints, and invariants bring a complexity that requires an efficient handling in code. We find the solution by using the aggregate represented by the single entity known as the root, which is associated with the group of objects that maintains consistency with regards to data changes. This root is the only object that is accessible from outside, so this root element works as a boundary gate that separates the internal objects from the external world. Roots can refer to one or more inside objects, and these inside objects can have references to other inside objects that may or may not have relationships with the root. However, outside objects can also refer to the root, and not to any inside objects. An aggregate ensures data integrity and enforces the invariant. Outside objects cannot make any change to inside objects; they can only change the root. However, they can use the root to make a change inside the object by calling exposed operations. The root should pass the value of inside objects to outside objects if required. If an aggregate object is stored in the database, then the query should only return the aggregate object. Traversal associations should be used to return the object when it is internally linked to the aggregate root. These internal objects may also have references to other aggregates. An aggregate root entity holds its global identity, and holds local identities inside their entities. A simple example of an aggregate in the table booking system is the customer. Customers can be exposed to external objects, and their root object contains their internal object address and contact information. When requested, the value object of internal objects, such as address, can be passed to external objects: Repository In a domain model, at a given point in time, many domain objects may exist. Each object may have its own life-cycle, from the creation of objects to their removal or persistence. Whenever any domain operation needs a domain object, it should retrieve the reference of the requested object efficiently. It would be very difficult if you didn't maintain all of the available domain objects in a central object. A central object carries the references of all the objects, and is responsible for returning the requested object reference. This central object is known as the repository. The repository is a point that interacts with infrastructures such as the database or file system. A repository object is the part of the domain model that interacts with storage such as the database, external sources, and so on, to retrieve the persisted objects. When a request is received by the repository for an object's reference, it returns the existing object's reference. If the requested object does not exist in the repository, then it retrieves the object from storage. For example, if you need a customer, you would query the repository object to provide the customer with ID 31. The repository would provide the requested customer object if it is already available in the repository, and if not, it would query the persisted stores such as the database, fetch it, and provide its reference. The main advantage of using the repository is having a consistent way to retrieve objects where the requestor does not need to interact directly with the storage such as the database. A repository may query objects from various storage types, such as one or more databases, filesystems, or factory repositories, and so on. In such cases, a repository may have strategies that also point to different sources for different object types As you can see in the repository object flow diagram on the right, the repository interacts with the infrastructure layer, and this interface is part of the domain layer. The requestor may belong to a domain layer, or an application layer. The repository helps the system to manage the life cycle of domain objects. Factory A factory is required when a simple constructor is not enough to create the object. It helps to create complex objects, or an aggregate that involves the creation of other related objects. A factory is also a part of the life cycle of domain objects, as it is responsible for creating them. Factories and repositories are in some way related to each other, as both refer to domain objects. The factory refers to newly created objects, whereas the repository returns the already existing objects either from the memory, or from external storage. Let's see how control flows, by using a user creation process application. Let's say that a user signs up with a username user1. This user creation first interacts with the factory, which creates the name user1 and then caches it in the domain using the repository, which also stores it in the storage for persistence. When the same user logs in again, the call moves to the repository for a reference. This uses the storage to load the reference and pass it to the requestor. The requestor may then use this user1 object to book the table in a specified restaurant, and at a specified time. These values are passed as parameters, and a table booking record is created in storage using the repository: The factory may use one of the object-oriented programming patterns, such as the factory or abstract factory pattern, for object creation. Modules Modules are the best way to separate related business objects. These are best suited to large projects where the size of domain objects is bigger. For the end user, it makes sense to divide the domain model into modules and set the relationship between these modules. Once you understand the modules and their relationship, you start to see the bigger picture of the domain model, thus it's easier to drill down further and understand the model. Modules also help you to write code that is highly cohesive, or maintains low coupling. Ubiquitous language can be used to name these modules. For the table booking system, we could have different modules, such as user-management, restaurants and tables, analytics and reports, and reviews, and so on. This introduction to domain driven design should give you a strong foundation for using it when you build software. It's principles are useful - in particular, making sure you collaborate and use the same language as different stakeholders is one of domain driven design's most valuable contributions to the way we approach software development.

0
0
25354

How-To Tutorials - Programming

That '70s language: AWK programming

How to work with classes in Typescript

How to install and configure TypeScript

8 recipes to master Promises in ECMAScript 2018

Applying Single Responsibility principle from SOLID in .NET Core

How to call an Azure function from an ASP.NET Core MVC application

Seven wrongs don’t make the one right: Solving a problem with Functional Javascript

Implementing 5 Common Design Patterns in JavaScript (ES8)

How to dockerize an ASP.NET Core application

Using R6 classes in R to retrieve live data for markets and wallets

Trending Topics

Chaos Engineering: managing complexity by breaking things

How to boost R codes using C++ and Fortran

Cross-validation in R for predictive models

26 new Java 9 enhancements you will love

What is domain driven design?

Create a Free Account To Continue Reading

SignIn Free Account To Continue Reading