CORS in Node.js

Packt
20 Jun 2017
14 min read
In this article by Randall Goya and Rajesh Gunasundaram, the authors of the book CORS Essentials, we look at CORS in Node.js. Node.js is a cross-platform JavaScript runtime environment that executes JavaScript code on the server side. This makes it possible to use a single, unified language across web application development: JavaScript runs on both the client side and the server side. (For more resources related to this topic, see here.)

In this article we will learn about the following:

- Node.js is a JavaScript platform for developing server-side web applications.
- Node.js can provide the web server for other frameworks, including Express.js, AngularJS, Backbone.js, Ember.js and others. Some other JavaScript frameworks, such as ReactJS, Ember.js and Socket.IO, may also use Node.js as the web server.
- Isomorphic JavaScript can add server-side functionality to client-side frameworks.
- JavaScript frameworks are evolving rapidly. This article reviews some of the current techniques and framework-specific syntax. Make sure to check each project's documentation to discover the latest techniques.
- Because JavaScript is a loosely structured language, you may create your own solution once you understand the CORS concepts. All the examples are based on the fundamentals of CORS: allowed origin(s), methods, and headers such as Content-Type, plus preflight where the CORS specification requires it.

JavaScript frameworks are very popular

JavaScript is sometimes called the lingua franca of the Internet, because it is cross-platform and supported by many devices. It is also a loosely structured language, which makes it possible to craft solutions for many types of applications. Sometimes an entire application is built in JavaScript. Frequently, JavaScript provides a client-side front end for applications built with Symfony, content management systems such as Drupal, and other back-end frameworks. Node.js is server-side JavaScript and provides a web server as an alternative to Apache, IIS, Nginx and other traditional web servers.

Introduction to Node.js

Node.js is an open-source, cross-platform library for developing server-side web applications. Applications written in JavaScript for Node.js can run on many operating systems, including OS X, Microsoft Windows, Linux, and many others. Node.js provides non-blocking I/O and an event-driven architecture designed to optimize an application's performance and scalability for real-time web applications.

The biggest difference between PHP and Node.js is that PHP is a blocking language, where commands execute only after the previous command has completed, while Node.js is a non-blocking language, where commands execute in parallel and use callbacks to signal completion. Node.js can move files, payloads from services, and data asynchronously, without waiting for a command to complete, which improves performance.

Most JS frameworks that work with Node.js use the concept of routes to manage pages and other parts of the application. Each route may have its own set of configurations; for example, CORS may be enabled only for a specific page or route.

Node.js loads modules that extend its functionality via the npm package manager. The developer selects which packages to load with npm, which reduces bloat. The developer community has created a large number of npm packages for specific functions.

JXcore is a fork of Node.js targeting mobile devices and IoT (Internet of Things) devices.
JXcore can use both Google V8 and Mozilla SpiderMonkey as its JavaScript engine, and it can run Node applications on iOS devices using Mozilla SpiderMonkey. MEAN is a popular JavaScript software stack with MongoDB (a NoSQL database), Express.js and AngularJS, all of which run on a Node.js server.

JavaScript frameworks that work with Node.js

Node.js provides a server for other popular JS frameworks, including AngularJS, Express.js, Backbone.js, Socket.IO, and Connect.js. ReactJS was designed to run in the client browser, but it is often combined with a Node.js server. As we shall see in the following descriptions, these frameworks are not necessarily exclusive, and they are often combined in applications.

Express.js is a Node.js server framework

Express.js is a Node.js web application server framework, designed for building single-page, multi-page, and hybrid web applications. It is considered the "standard" server framework for Node.js. The package is installed with the command npm install express --save.

AngularJS extends static HTML with dynamic views

HTML was designed for static content, not for dynamic views. AngularJS extends HTML syntax with custom tag attributes. It provides model-view-controller (MVC) and model-view-viewmodel (MVVM) architectures in a front-end, client-side framework. AngularJS is often combined with a Node.js server and other JS frameworks. AngularJS runs client-side and Express.js runs on the server; therefore, Express.js is considered more secure for functions such as validating user input, which can be tampered with client-side. AngularJS applications can use the Express.js framework to connect to databases, for example in the MEAN stack.

Connect.js provides middleware for Node.js requests

Connect.js is a JavaScript framework providing middleware to handle requests in Node.js applications. Connect.js provides middleware to handle Express.js and cookie sessions, to provide parsers for the HTML body and cookies, to create vhosts (virtual hosts) and error handlers, and to override methods.

Backbone.js often uses a Node.js server

Backbone.js is a JavaScript framework with a RESTful JSON interface, based on the model-view-presenter (MVP) application design. It is designed for developing single-page web applications, and for keeping various parts of web applications (for example, multiple clients and the server) synchronized. Backbone depends on Underscore.js, plus jQuery for use of all the available features. Backbone often uses a Node.js server, for example to connect to data storage.

ReactJS handles user interfaces

ReactJS is a JavaScript library for creating user interfaces while addressing challenges encountered in developing single-page applications where data changes over time. React handles the user interface in a model-view-controller (MVC) architecture. ReactJS typically runs client-side and can be combined with AngularJS. Although ReactJS was designed to run client-side, it can also be used server-side in conjunction with Node.js. PayPal and Netflix leverage the server-side rendering of ReactJS, known as Isomorphic ReactJS. There are React-based add-ons that take care of the server-side parts of a web application.

Socket.IO uses WebSockets for realtime event-driven applications

Socket.IO is a JavaScript library for event-driven web applications that uses the WebSocket protocol, with realtime, bi-directional communication between web clients and servers. It has two parts: a client-side library that runs in the browser, and a server-side library for Node.js.
Although it can be used simply as a wrapper for WebSocket, it provides many more features, including broadcasting to multiple sockets, storing data associated with each client, and asynchronous I/O. Socket.IO provides better security than WebSocket alone, since allowed domains must be specified for its server.

Ember.js can use Node.js

Ember is another popular JavaScript framework with routing that uses Mustache templates. It can run on a Node.js server, or with Express.js. Ember can also be combined with Rack, a component of Ruby on Rails (RoR). Ember Data is a library for modeling data in Ember.js applications.

CORS in Express.js

The following code adds the Access-Control-Allow-Origin and Access-Control-Allow-Headers headers globally to all requests on all routes in an Express.js application. A route is a path in the Express.js application, for example /user for a user page. app.all sets the configuration for all routes in the application. Specific HTTP requests such as GET or POST are handled by app.get and app.post.

app.all('*', function(req, res, next) {
  res.header("Access-Control-Allow-Origin", "*");
  res.header("Access-Control-Allow-Headers", "X-Requested-With");
  next();
});
app.get('/', function(req, res, next) {
  // Handle GET for this route
});
app.post('/', function(req, res, next) {
  // Handle the POST for this route
});

For better security, consider limiting the allowed origin to a single domain, or adding some additional code to validate or limit the domain(s) that are allowed. Also, consider sending the headers only for routes that require CORS by replacing app.all with a more specific route and method. The following code sends the CORS headers only on a GET request on the route /user, and only allows the request from http://www.localdomain.com.

app.get('/user', function(req, res, next) {
  res.header("Access-Control-Allow-Origin", "http://www.localdomain.com");
  res.header("Access-Control-Allow-Headers", "X-Requested-With");
  next();
});

Since this is JavaScript code, you may dynamically manage the values of routes, methods, and domains via variables, instead of hard-coding the values, as the sketch below illustrates.
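For example, a minimal sketch of this idea (not from the book: the domain names and the allowedOrigins variable are illustrative assumptions, and the same app object as in the preceding snippets is assumed) checks the request's Origin header against a configurable whitelist before setting the headers:

// Hypothetical whitelist; adjust the domains for your own application
var allowedOrigins = ['http://www.localdomain.com', 'http://app.localdomain.com'];

app.all('*', function(req, res, next) {
  var origin = req.header('Origin');
  // Echo the origin back only when it appears in the whitelist
  if (origin && allowedOrigins.indexOf(origin) !== -1) {
    res.header("Access-Control-Allow-Origin", origin);
    res.header("Access-Control-Allow-Headers", "X-Requested-With");
  }
  next();
});

Because the whitelist is an ordinary array, it can be loaded from configuration or changed at runtime without touching the middleware itself.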
CORS npm for Express.js using Connect.js middleware

Connect.js provides middleware to handle requests in Express.js. You can use the Node Package Manager (npm) to install a package that enables CORS in Express.js with Connect.js:

npm install cors

The package offers flexible options, which should be familiar from the CORS specification, including the use of credentials and preflight. It provides dynamic ways to validate an origin domain using a function or a regular expression, and handler functions to process preflight.

Configuration options for CORS npm

origin: Configures the Access-Control-Allow-Origin CORS header with a string containing the full URL and protocol making the request, for example http://localdomain.com. Possible values for origin:
- The default value TRUE uses req.header('Origin') to determine the origin, and CORS is enabled
- When set to FALSE, CORS is disabled
- It can be set to a function with the request origin as the first parameter and a callback function as the second parameter
- It can be a regular expression, for example /localdomain.com$/, or an array of regular expressions and/or strings to match

methods: Sets the Access-Control-Allow-Methods CORS header. Possible values for methods:
- A comma-delimited string of HTTP methods, for example 'GET, POST'
- An array of HTTP methods, for example ['GET', 'PUT', 'POST']

allowedHeaders: Sets the Access-Control-Allow-Headers CORS header. Possible values for allowedHeaders:
- A comma-delimited string of allowed headers, for example 'Content-Type, Authorization'
- An array of allowed headers, for example ['Content-Type', 'Authorization']
- If unspecified, it defaults to the value specified in the request's Access-Control-Request-Headers header

exposedHeaders: Sets the Access-Control-Expose-Headers header. Possible values for exposedHeaders:
- A comma-delimited string of exposed headers, for example 'Content-Range, X-Content-Range'
- An array of exposed headers, for example ['Content-Range', 'X-Content-Range']
- If unspecified, no custom headers are exposed

credentials: Sets the Access-Control-Allow-Credentials CORS header. Possible values for credentials:
- TRUE: the header is passed, including for preflight
- FALSE or unspecified: the header is omitted and there is no preflight

maxAge: Sets the Access-Control-Max-Age header. Possible values for maxAge:
- An integer value in seconds for the TTL to cache the request
- If unspecified, the request is not cached

preflightContinue: Passes the CORS preflight response to the next handler.

The default configuration without setting any values allows all origins and methods without preflight. Keep in mind that complex CORS requests other than GET, HEAD, POST will fail without preflight, so make sure you enable preflight in the configuration when using them. Without setting any values, the configuration defaults to:

{
  "origin": "*",
  "methods": "GET,HEAD,PUT,PATCH,POST,DELETE",
  "preflightContinue": false
}

Code examples for CORS npm

These examples demonstrate the flexibility of CORS npm for specific configurations. Note that the express and cors packages are always required.

Enable CORS globally for all origins and all routes

The simplest implementation of CORS npm enables CORS for all origins and all requests. The following example enables CORS for an arbitrary route /product/:id for a GET request by telling the entire app to use CORS for all routes:

var express = require('express')
  , cors = require('cors')
  , app = express();
app.use(cors()); // this tells the app to use CORS for all requests and all routes
app.get('/product/:id', function(req, res, next){
  res.json({msg: 'CORS is enabled for all origins'});
});
app.listen(80, function(){
  console.log('CORS is enabled on the web server listening on port 80');
});
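Building on the configuration options described above, here is a minimal sketch of a custom global configuration (the specific domain, method list, header list, and maxAge value are illustrative assumptions, not taken from the book):

var express = require('express')
  , cors = require('cors')
  , app = express();

// Illustrative options object combining several of the settings described above
var corsOptions = {
  origin: 'http://localdomain.com',                   // only this origin is allowed
  methods: ['GET', 'PUT', 'POST'],                    // allowed HTTP methods
  allowedHeaders: ['Content-Type', 'Authorization'],  // allowed request headers
  credentials: true,                                  // sets Access-Control-Allow-Credentials
  maxAge: 600                                         // cache the preflight response for 600 seconds
};

app.use(cors(corsOptions)); // apply this CORS policy to every route

app.listen(80, function(){
  console.log('CORS is enabled with custom options on port 80');
});

The same options object can instead be passed to an individual route, for example app.get('/product/:id', cors(corsOptions), ...), as the following examples do.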
Allow CORS for dynamic origins for a specific route

The following example uses corsOptions to check whether the domain making the request is in the whitelist array and passes the result to a callback function. This CORS option is passed to the route /product/:id, which is the only route that has CORS enabled. The allowed origins can be made dynamic by changing the value of the variable whitelist.

var express = require('express')
  , cors = require('cors')
  , app = express();
// define the whitelisted domains and set the CORS options to check them
var whitelist = ['http://localdomain.com', 'http://localdomain-other.com'];
var corsOptions = {
  origin: function(origin, callback){
    var originWhitelisted = whitelist.indexOf(origin) !== -1;
    callback(null, originWhitelisted);
  }
};
// add the CORS options to a specific route /product/:id for a GET request
app.get('/product/:id', cors(corsOptions), function(req, res, next){
  res.json({msg: 'A whitelisted domain matches and CORS is enabled for route product/:id'});
});
// log that CORS is enabled on the server
app.listen(80, function(){
  console.log('CORS is enabled on the web server listening on port 80');
});

You may set different CORS options for specific routes, or sets of routes, by defining the options in variables with unique names, for example corsUserOptions, and passing the specific configuration variable to each route that requires that set of options.

Enabling CORS preflight

CORS requests that use an HTTP method other than GET, HEAD, POST (for example, DELETE), or that use custom headers, are considered complex and require a preflight request before proceeding with the CORS requests. Enable preflight by adding an OPTIONS handler for the route:

var express = require('express')
  , cors = require('cors')
  , app = express();
// add the OPTIONS handler to the route /products/:id
app.options('/products/:id', cors());
// use the OPTIONS handler for the DELETE method on the route /products/:id
app.delete('/products/:id', cors(), function(req, res, next){
  res.json({msg: 'CORS is enabled with preflight on the route /products/:id for the DELETE method for all origins!'});
});
app.listen(80, function(){
  console.log('CORS is enabled on the web server listening on port 80');
});

You can enable preflight globally on all routes with the wildcard:

app.options('*', cors());

Configuring CORS asynchronously

One of the reasons to use Node.js frameworks is to take advantage of their asynchronous abilities, handling multiple tasks at the same time. Here we use a callback function, corsDelegateOptions, and add it to the cors parameter passed to the route /products/:id. The callback function can handle multiple requests asynchronously.

var express = require('express')
  , cors = require('cors')
  , app = express();
// define the allowed origins stored in a variable
var whitelist = ['http://example1.com', 'http://example2.com'];
// create the callback function
var corsDelegateOptions = function(req, callback){
  var corsOptions;
  if(whitelist.indexOf(req.header('Origin')) !== -1){
    corsOptions = { origin: true }; // the requested origin matches and is allowed
  }else{
    corsOptions = { origin: false }; // the requested origin doesn't match, and CORS is disabled for this request
  }
  callback(null, corsOptions); // callback expects two parameters: error and options
};
// add the callback function to the cors parameter for the route /products/:id for a GET request
app.get('/products/:id', cors(corsDelegateOptions), function(req, res, next){
  res.json({msg: 'A whitelisted domain matches and CORS is enabled for route product/:id'});
});
app.listen(80, function(){
  console.log('CORS is enabled on the web server listening on port 80');
});

Summary

We have learned the essentials of applying CORS in Node.js.
Let us have a quick recap of what we have learned:

- Node.js provides a web server built with JavaScript and can be combined with many other JS frameworks as the application server.
- Although some frameworks have specific syntax for implementing CORS, they all follow the CORS specification by specifying allowed origin(s) and method(s). More robust frameworks allow custom headers such as Content-Type, and preflight when required for complex CORS requests.
- JavaScript frameworks may depend on the jQuery XHR object, which must be configured properly to allow cross-origin requests.
- JavaScript frameworks are evolving rapidly. The examples here may become outdated. Always refer to the project documentation for up-to-date information.
- With knowledge of the CORS specification, you may create your own techniques using JavaScript based on these examples, depending on the specific needs of your application.

Reference: https://en.wikipedia.org/wiki/Node.js

Resources for Article:

Further resources on this subject:

- An Introduction to Node.js Design Patterns [article]
- Five common questions for .NET/Java developers learning JavaScript and Node.js [article]
- API with MongoDB and Node.js [article]

Understanding the Basics of RxJava

Packt
20 Jun 2017
15 min read
In this article by Tadas Subonis, the author of the book Reactive Android Programming, we will go through the core basics of RxJava so that we can fully understand what it is, what its core elements are, and how they work. Before that, let's take a step back and briefly discuss how RxJava is different from other approaches.

RxJava is about reacting to results. It might be an item that originated from some source. It can also be an error. RxJava provides a framework to handle these items in a reactive way and to create complicated manipulation and handling schemes in a very easy-to-use interface. Things like waiting for the arrival of an item before transforming it become very easy with RxJava. To achieve all this, RxJava provides some basic primitives:

- Observables: A source of data
- Subscriptions: An activated handle to the Observable that receives data
- Schedulers: A means to define where (on which thread) the data is processed

First of all, we will cover Observables: the source of all the data and the core structure/class that we will be working with. We will explore how they are related to Disposables (Subscriptions). Furthermore, the life cycle and hook points of an Observable will be described, so we will actually know what's happening when an item travels through an Observable and what the different stages are that we can tap into. Finally, we will briefly introduce Flowable, a big brother of Observable that lets you handle big amounts of data with high rates of publishing. To summarize, we will cover these aspects:

- What is an Observable?
- What are Disposables (formerly Subscriptions)?
- How do items travel through the Observable?
- What is backpressure and how can we use it with Flowable?

Let's dive into it! (For more resources related to this topic, see here.)

Observables

Everything starts with an Observable. It's a source of data that you can observe for emitted data (hence the name). In almost all cases, you will be working with the Observable class. It is possible to (and we will!) combine different Observables into one Observable. Basically, it is a universal interface to tap into data streams in a reactive way. There are lots of different ways to create Observables. The simplest way is to use the .just() method like we did before:

Observable.just("First item", "Second item");

It is usually a perfect way to glue non-Rx-like parts of the code to an Rx-compatible flow. When an Observable is created, it is not usually defined when it will start emitting data. If it was created using simple tools such as .just(), it won't start emitting data until there is a subscription to the observable. How do you create a subscription? It's done by calling .subscribe():

Observable.just("First item", "Second item")
  .subscribe();

Usually (but not always), the observable will be activated the moment somebody subscribes to it. So, if a new Observable was just created, it won't magically start sending data "somewhere".

Hot and Cold Observables

Quite often in the literature and documentation, the terms Hot and Cold Observables can be found. A Cold Observable is the most common Observable type. For example, it can be created with the following code:

Observable.just("First item", "Second item")
  .subscribe();

Cold Observable means that the items won't be emitted by the Observable until there is a Subscriber. This means that before .subscribe() is called, no items will be produced, and thus none of the items that are intended to be emitted will be missed; everything will be processed.
A Hot Observable is an Observable that will begin producing (emitting) items internally as soon as it is created. The status updates are produced constantly, and it doesn't matter whether there is something ready to receive them (like a Subscription). If there were no subscriptions to the Observable, the updates will be lost.

Disposables

A Disposable (previously called a Subscription in RxJava 1.0) is a tool that can be used to control the life cycle of an Observable. If the stream of data that the Observable is producing is boundless, it will stay active forever. That might not be a problem for a server-side application, but it can cause some serious trouble on Android. Usually, this is a common source of memory leaks. Obtaining a reference to a Disposable is pretty simple:

Disposable disposable = Observable.just("First item", "Second item")
  .subscribe();

Disposable is a very simple interface. It has only two methods: dispose() and isDisposed(). dispose() can be used to cancel the existing Disposable (Subscription). This will stop the call of .subscribe() from receiving any further items from the Observable, and the Observable itself will be cleaned up. isDisposed() has a pretty straightforward function: it checks whether the subscription is still active. However, it is not used very often in regular code, as subscriptions are usually unsubscribed and forgotten. Disposed subscriptions (Disposables) cannot be re-enabled; they can only be created anew. Finally, Disposables can be grouped using CompositeDisposable like this:

Disposable disposable = new CompositeDisposable(
  Observable.just("First item", "Second item").subscribe(),
  Observable.just("1", "2").subscribe(),
  Observable.just("One", "Two").subscribe()
);

It's useful in cases where there are many Observables that should be canceled at the same time, for example, when an Activity is being destroyed.

Schedulers

As described in the documentation, a Scheduler is something that can schedule a unit of work to be executed now or later. In practice, it means that Schedulers control where the code will actually be executed, and usually that means selecting some kind of specific thread. Most often, Schedulers are used to execute long-running tasks on some background thread so that they don't block the main computation or UI thread. This is especially relevant on Android, where long-running tasks must not be executed on the main thread. Schedulers can be set with a simple .subscribeOn() call:

Observable.just("First item", "Second item")
  .subscribeOn(Schedulers.io())
  .subscribe();

There are only a few main Schedulers that are commonly used:

- Schedulers.io()
- Schedulers.computation()
- Schedulers.newThread()
- AndroidSchedulers.mainThread()

AndroidSchedulers.mainThread() is only used on Android systems.

Scheduling examples

Let's explore how Schedulers work by checking out a few examples.
Let's run the following code:

Observable.just("First item", "Second item")
  .doOnNext(e -> Log.d("APP", "on-next:" + Thread.currentThread().getName() + ":" + e))
  .subscribe(e -> Log.d("APP", "subscribe:" + Thread.currentThread().getName() + ":" + e));

The output will be as follows:

on-next:main:First item
subscribe:main:First item
on-next:main:Second item
subscribe:main:Second item

Now let's try changing the code as shown:

Observable.just("First item", "Second item")
  .subscribeOn(Schedulers.io())
  .doOnNext(e -> Log.d("APP", "on-next:" + Thread.currentThread().getName() + ":" + e))
  .subscribe(e -> Log.d("APP", "subscribe:" + Thread.currentThread().getName() + ":" + e));

Now, the output should look like this:

on-next:RxCachedThreadScheduler-1:First item
subscribe:RxCachedThreadScheduler-1:First item
on-next:RxCachedThreadScheduler-1:Second item
subscribe:RxCachedThreadScheduler-1:Second item

We can see how the code was executed on the main thread in the first case and on a new thread in the second. Android requires that all UI modifications be done on the main thread. So, how can we execute a long-running process in the background but process the result on the main thread? That can be done with the .observeOn() method:

Observable.just("First item", "Second item")
  .subscribeOn(Schedulers.io())
  .doOnNext(e -> Log.d("APP", "on-next:" + Thread.currentThread().getName() + ":" + e))
  .observeOn(AndroidSchedulers.mainThread())
  .subscribe(e -> Log.d("APP", "subscribe:" + Thread.currentThread().getName() + ":" + e));

The output will be as illustrated:

on-next:RxCachedThreadScheduler-1:First item
on-next:RxCachedThreadScheduler-1:Second item
subscribe:main:First item
subscribe:main:Second item

You will note that the items in the doOnNext block were executed on the "RxThread", and the subscribe block items were executed on the main thread.

Investigating the Flow of an Observable

Logging inside the steps of an Observable is a very powerful tool when one wants to understand how they work. If you are in doubt at any point as to what's happening, add logging and experiment. A few quick iterations with logs will definitely help you understand what's going on under the hood. Let's use this technique to analyze the full flow of an Observable. We will start off with this script:

private void log(String stage, String item) {
  Log.d("APP", stage + ":" + Thread.currentThread().getName() + ":" + item);
}

private void log(String stage) {
  Log.d("APP", stage + ":" + Thread.currentThread().getName());
}

Observable.just("One", "Two")
  .subscribeOn(Schedulers.io())
  .doOnDispose(() -> log("doOnDispose"))
  .doOnComplete(() -> log("doOnComplete"))
  .doOnNext(e -> log("doOnNext", e))
  .doOnEach(e -> log("doOnEach"))
  .doOnSubscribe((e) -> log("doOnSubscribe"))
  .doOnTerminate(() -> log("doOnTerminate"))
  .doFinally(() -> log("doFinally"))
  .observeOn(AndroidSchedulers.mainThread())
  .subscribe(e -> log("subscribe", e));

It can be seen that it has lots of additional and unfamiliar steps (more about this later). They represent different stages during the processing of an Observable. So, what's the output of the preceding script?

doOnSubscribe:main
doOnNext:RxCachedThreadScheduler-1:One
doOnEach:RxCachedThreadScheduler-1
doOnNext:RxCachedThreadScheduler-1:Two
doOnEach:RxCachedThreadScheduler-1
doOnComplete:RxCachedThreadScheduler-1
doOnEach:RxCachedThreadScheduler-1
doOnTerminate:RxCachedThreadScheduler-1
doFinally:RxCachedThreadScheduler-1
subscribe:main:One
subscribe:main:Two
doOnDispose:main

Let's go through some of the steps.
First of all, by calling .subscribe(), the doOnSubscribe block was executed. This started the emission of items from the Observable, as we can see on the doOnNext and doOnEach lines. Finally, the stream finished and the termination life cycle was activated: doOnComplete, doOnTerminate and doFinally. Also, the reader will note that the doOnDispose block was called on the main thread along with the subscribe block. The flow will be a little different if the .subscribeOn() and .observeOn() calls aren't there:

doOnSubscribe:main
doOnNext:main:One
doOnEach:main
subscribe:main:One
doOnNext:main:Two
doOnEach:main
subscribe:main:Two
doOnComplete:main
doOnEach:main
doOnTerminate:main
doOnDispose:main
doFinally:main

You will readily note that now the doFinally block was executed after doOnDispose, while in the former setup doOnDispose was last. This happens due to the way the Android Looper schedules code blocks for execution and the fact that we used two different threads in the first case. The takeaway here is that whenever you are unsure of what is going on, start logging actions (and the threads they are running on) to see what's actually happening.

Flowable

Flowable can be regarded as a special type of Observable (but internally it isn't). It has almost the same method signatures as the Observable as well. The difference is that Flowable allows you to process items that are emitted faster from the source than some of the following steps can handle. It might sound confusing, so let's analyze an example. Assume that you have a source that can emit a million items per second. However, the next step uses those items to do a network request. We know, for sure, that we cannot do more than 50 requests per second. That poses a problem. What will we do after 60 seconds? There will be 60 million items in the queue waiting to be processed. The items are accumulating at a rate of 1 million items per second between the first and the second steps because the second step processes them at a much slower rate. Clearly, the problem here is that the available memory will be exhausted and the program will fail with an OutOfMemory (OOM) exception. For example, this script will cause excessive memory usage because the processing step just won't be able to keep up with the pace at which the items are emitted:

PublishSubject<Integer> observable = PublishSubject.create();
observable
  .observeOn(Schedulers.computation())
  .subscribe(v -> log("s", v.toString()), this::log);
for (int i = 0; i < 1000000; i++) {
  observable.onNext(i);
}

private void log(Throwable throwable) {
  Log.e("APP", "Error", throwable);
}

By converting this to a Flowable, we can start controlling this behavior:

observable.toFlowable(BackpressureStrategy.MISSING)
  .observeOn(Schedulers.computation())
  .subscribe(v -> log("s", v.toString()), this::log);

Since we have chosen not to specify how we want to handle items that cannot be processed (this is called backpressuring), it will throw a MissingBackpressureException. However, if the number of items was 100 instead of a million, it would have been just fine, as it wouldn't hit the internal buffer of Flowable. By default, the size of the Flowable queue (buffer) is 128. There are a few backpressure strategies that define how the excessive amount of items should be handled.

Drop Items

Dropping means that if the downstream processing steps cannot keep up with the pace of the source Observable, just drop the data that cannot be handled.
This can only be used in cases where losing data is okay and you care more about the values that were emitted in the beginning. There are a few ways in which items can be dropped. The first one is just to specify the backpressure strategy, like this:

observable.toFlowable(BackpressureStrategy.DROP)

Alternatively, it can be done like this:

observable.toFlowable(BackpressureStrategy.MISSING)
  .onBackpressureDrop()

A similar way to do that would be to call .sample(). It will emit items only periodically, and it will take only the last value that's available (while BackpressureStrategy.DROP drops an item instantly unless it is free to push it down the stream). All the other values between "ticks" will be dropped:

observable.toFlowable(BackpressureStrategy.MISSING)
  .sample(10, TimeUnit.MILLISECONDS)
  .observeOn(Schedulers.computation())
  .subscribe(v -> log("s", v.toString()), this::log);

Preserve Latest Item

Preserving the last item means that if the downstream cannot cope with the items that are being sent to it, stop emitting values and wait until it becomes available. While waiting, keep dropping all the values except the last one that arrived, and when the downstream becomes available, send the last message that's currently stored. As with dropping, the "latest" strategy can be specified while creating an Observable:

observable.toFlowable(BackpressureStrategy.LATEST)

Alternatively, by calling .onBackpressureLatest():

observable.toFlowable(BackpressureStrategy.MISSING)
  .onBackpressureLatest()

Finally, a method, .debounce(), can periodically take the last value at specific intervals:

observable.toFlowable(BackpressureStrategy.MISSING)
  .debounce(10, TimeUnit.MILLISECONDS)

Buffering

Buffering is usually a poor way to handle different paces of items being emitted and consumed, as it often just delays the problem. However, it can work just fine if there is only a temporary slowdown in one of the consumers. In this case, the items emitted will be stored until later processing and, when the slowdown is over, the consumers will catch up. If the consumers cannot catch up, at some point the buffer will run out and we will see very similar behavior to the original Observable, with memory running out. Enabling buffers is, again, pretty straightforward, by calling the following:

observable.toFlowable(BackpressureStrategy.BUFFER)

or

observable.toFlowable(BackpressureStrategy.MISSING)
  .onBackpressureBuffer()

If there is a need to specify a particular value for the buffer, one can use .buffer():

observable.toFlowable(BackpressureStrategy.MISSING)
  .buffer(10)

Completable, Single, and Maybe Types

Besides the Observable and Flowable types, there are three more types that RxJava provides:

- Completable: It represents an action without a result that will be completed in the future
- Single: It's just like an Observable (or Flowable) that returns a single item instead of a stream
- Maybe: It stands for an action that can complete (or fail) without returning any value (like Completable) but can also return an item (like Single)

However, all these are used quite rarely. Let's take a quick look at the examples.

Completable

Since Completable can basically process just two types of actions, onComplete and onError, we will cover it very briefly. Completable has many static factory methods available to create it but, most often, it will just be found as a return value in some other libraries.
For example, a Completable can be created by calling the following:

Completable completable = Completable.fromAction(() -> {
  log("Let's do something");
});

Then, it can be subscribed to with the following:

completable.subscribe(() -> {
  log("Finished");
}, throwable -> {
  log(throwable);
});

Single

Single provides a way to represent an Observable that will return just a single item (thus the name). You might ask why it is worth having it at all. These types are useful to tell developers about the specific behavior that they should expect. To create a Single, one can use this example:

Single.just("One item")

The Single and the Subscription to it can be created with the following:

Single.just("One item")
  .subscribe((item) -> {
    log(item);
  }, (throwable) -> {
    log(throwable);
  });

Note that this differs from Completable in that the first argument to the .subscribe() action now expects to receive an item as a result.

Maybe

Finally, the Maybe type is very similar to the Single type, but the item might not be returned to the subscriber in the end. The Maybe type can be created in a very similar fashion as before:

Maybe.empty();

or like:

Maybe.just("Item");

However, .subscribe() can be called with arguments dedicated to handling onSuccess (for received items), onError (to handle errors), and onComplete (to do a final action after the item is handled):

Maybe.just("Item")
  .subscribe(
    s -> log("success: " + s),
    throwable -> log("error"),
    () -> log("onComplete")
  );

Summary

In this article, we covered the most essential parts of RxJava.

Resources for Article:

Further resources on this subject:

- The Art of Android Development Using Android Studio [article]
- Drawing and Drawables in Android Canvas [article]
- Optimizing Games for Android [article]

Introduction to NFRs

Packt
20 Jun 2017
14 min read
In this article by Sameer Paradkar, the author of the book Mastering Non-Functional Requirements, we will learn that non-functional requirements (NFRs) are those aspects of the IT system that, while not directly affecting the business functionality of the application, have a profound impact on the efficiency and effectiveness of business systems for end users as well as for the people responsible for supporting the program. The definition of these requirements is an essential factor in developing a total customer solution that delivers business goals. Non-functional requirements are used primarily to drive the operational aspects of the architecture, in other words, to address major operational and technical areas of the system to ensure the robustness and ruggedness of the application. A benchmark or proof of concept can be used to verify whether the implementation meets these requirements or to indicate whether a corrective action is necessary. Ideally, a series of tests should be planned that maps to the development schedule and grows in complexity. The topics that are covered in this article are as follows:

- Definition of NFRs
- NFR KPIs and metrics

(For more resources related to this topic, see here.)

Introducing NFRs

The following pointers state the purpose of NFRs:

- To define requirements and constraints on the IT system
- As a basis for cost estimates and early system sizing
- To assess the viability of the proposed IT system
- NFRs are an important determining factor of the architecture and design of the operational models
- As a guideline for the design phase to meet NFRs such as performance, scalability, and availability

The NFRs for each of the domains, for example scalability, availability, and so on, must be understood to facilitate the design and development of the target operating model. These include the servers, networks, and platforms, including the application runtime environments. They are critical for the execution of benchmark tests. They also affect the design of technical and application components. End users have expectations about the effectiveness of the application. These characteristics include ease of software use, speed, reliability, and recoverability when unexpected conditions arise. The NFRs define these aspects of the IT system. Non-functional requirements should be defined precisely, which involves quantifying them. NFRs should provide measurements the application must meet. For example, the maximum amount of time allowed to execute a process, the number of hours in a day an application must be available, the maximum size of a database on disk, and the number of concurrent users supported are typical NFRs the software must implement.

Figure 1: Key Non-Functional Requirements

There are many kinds of non-functional requirements, including the following:

Performance

Performance is the responsiveness of the application when performing specific actions in a given time span. Performance is measured in terms of throughput or latency. Latency is the time taken by the application to respond to an event. Throughput is the number of events processed in a given time interval. An application's performance can directly impact its scalability, and enhancing an application's performance often enhances scalability by reducing contention for shared resources. Performance attributes specify the timing characteristics of the application. Certain features are more time-sensitive than others; the NFRs should identify those software tasks that have constraints on their performance.
Response time relates to the time needed to complete specific business processes, batch or interactive, within the target business system. The system must be designed to fulfil the agreed response time requirements, while supporting the defined workload mapped against the given static baseline, on a system platform that does not exceed the stated utilization. The key attributes are:

- Throughput: The ability of the system to execute a given number of transactions within a given unit of time
- Response time: The distribution of time the system takes to respond to a request

Scalability

Scalability is the ability to handle an increase in the workload without impacting performance, or the ability to quickly expand the architecture. It is the ability to expand the architecture to accommodate more users, more processes, more transactions, and additional systems and services as the business requirements change and the systems evolve to meet future business demands. This permits existing systems to be extended without replacing them, and it directly affects the architecture and the selection of software components and hardware. The solution must allow the hardware and the deployed software services and components to be scaled horizontally as well as vertically. Horizontal scaling involves replicating the same functionality across additional nodes; vertical scaling involves the same functionality across bigger and more powerful nodes. Scalability definitions measure the volumes of users and data the system should support. There are two key techniques for improving both vertical and horizontal scalability. Vertical scaling, also known as scaling up, includes adding more resources such as memory, CPU, and hard disk to a system. Horizontal scaling, also known as scaling out, includes adding more nodes to a cluster for workload sharing. The key attributes are:

- Throughput: The maximum number of transactions the system needs to handle, for example, a thousand a day or a million
- Storage: The amount of data you are going to need to store
- Growth requirements: Data growth in the next three to five years

Availability

Availability is the time frame in which the system functions normally and without failures. Availability is measured as the percentage of total application downtime over a defined time period. Availability is affected by failures, exceptions, infrastructure issues, malicious attacks, and maintenance and upgrades. It is the uptime, or the amount of time the system is operational and available for use. This is specified because some systems are architected with expected downtime for activities like database upgrades and backups. Availability also conveys the number of hours or days per week, or weeks per year, the application will be available to its end customers, as well as how rapidly it can recover from faults. Since the architecture establishes software, hardware, and networking entities, this requirement extends to all of them. Hardware availability, recoverability, and reliability definitions measure system uptime; for example, availability is often specified in terms of the mean time between failures (MTBF). A simple worked example follows the list of attributes below. The key attributes are:

- Availability: Application availability considering weekends, holidays, maintenance times, and failures
- Locations of operation: Geographic location, connection requirements, and the restrictions of the network prevail
- Offline requirement: Time available for offline operations, including batch processing and system maintenance
- Length of time between failures
- Recoverability: The time required for the system to resume operation in the event of failure
- Resilience: The reliability characteristics of the system and its sub-components
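To make these availability measures concrete, a commonly used industry formula (not specific to this book; the figures below are illustrative) estimates steady-state availability from MTBF and the mean time to repair (MTTR):

Availability = MTBF / (MTBF + MTTR)

For instance, an MTBF of 1,000 hours and an MTTR of 1 hour give an availability of 1000 / 1001, or roughly 99.9 percent, which corresponds to about 8.76 hours of downtime over a year.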
Capacity

This non-functional requirement defines the ways in which the system is expected to scale up by increasing capacity, upgrading hardware, or adding machines, based on business objectives. Capacity is about delivering enough functionality for the end users. A request for a web service to provide 1,000 requests per second when the server is only capable of 100 requests a second may not succeed. While this sounds like an availability issue, it occurs because the server is unable to handle the requisite capacity. A single node may not be able to provide enough capacity, and one needs to deploy multiple nodes with a similar configuration to meet organizational capacity requirements. The capacity to identify a failing node and restart it on another machine or VM is also a non-functional requirement. The key attributes are:

- Throughput: The number of peak transactions the system needs to handle
- Storage: The volume of data the system can persist at run time to disk; relates to memory and disk
- Year-on-year growth requirements (users, processing, and storage)
- E-channel growth projections
- Different types of things (for example, activities or transactions supported, and so on)
- For each type of transaction, volumes on an hourly, daily, weekly, monthly basis, and so on
- Whether volumes are significantly higher during specific times of the day (for example, at lunch), week, month, or year
- Expected transaction volume growth and the additional volumes you will be able to handle

Security

Security is the ability of an application to avoid malicious incidents and events outside of the designed system usage, and to prevent disclosure or loss of information. Improving security increases the reliability of the application by reducing the likelihood of an attack succeeding and impairing operations. Adding security controls protects assets and prevents unauthorized access to and manipulation of critical information. The factors that affect application security are confidentiality and integrity. The key security controls used to secure systems are authorization, authentication, encryption, auditing, and logging. Definition and monitoring of effectiveness in meeting the security requirements of the system, for example, to avoid financial harm in accounting systems, is critical. Integrity requirements restrict access to functionality or data to certain users and protect the privacy of data entered into the software. The key attributes are:

- Authentication: Correct identification of parties attempting to access systems and protection of systems from unauthorized parties
- Authorization: The mechanism required to authorize users to perform different functions within the systems
- Encryption (data at rest or data in flight): All external communications between the data server and clients must be encrypted
- Data confidentiality: All data must be protectively marked, stored, and protected
- Compliance: The process to confirm systems' compliance with the organization's security standards and policies

Maintainability

Maintainability is the ability of any application to go through modifications and updates with a degree of ease. It is the degree of flexibility with which the application can be modified, whether for bug fixes or to update functionality. These changes may impact any of the components, services, functionality, or interfaces in the application landscape when modifying the application to fix errors or to meet changing business requirements. It is also the degree of time it takes to restore the system to its normal state following a failure or fault. Improving maintainability can improve availability and reduce run-time defects. An application's maintainability is dependent on its overall quality attributes. It is critical, as a large chunk of the IT budget is spent on the maintenance of systems; the more maintainable a system is, the lower the total cost of ownership. The key attributes are:

- Conformance to design standards, coding standards, best practices, reference architectures, and frameworks
- Flexibility: The degree to which the system is intended to support change
- Release support: The way in which the system supports the introduction of the initial release, phased rollouts, and future releases

Manageability

Manageability is the ease with which administrators can manage the application, through useful instrumentation exposed for monitoring. It is the ability of the system, or the group of systems, to provide key information to the operations and support team so that they can debug, analyze, and understand the root cause of failures. It also deals with compliance and governance against the relevant frameworks and policies. The key is to design an application that is easy to manage, by exposing useful instrumentation for monitoring systems and for understanding the cause of failures. The key attributes are:

- The system must maintain total traceability of transactions
- Business objects and database fields are part of auditing
- User and transactional timestamps
- File characteristics, including size before, size after, and structure
- Getting events and alerts as thresholds (for example, memory, storage, processor) are breached
- Remotely managing applications and creating new virtual instances at the click of a button
- A rich graphical dashboard for all key application metrics and KPIs

Reliability

Reliability is the ability of the application to maintain its integrity and veracity over a time span, and also in the event of faults or exceptions. It is measured as the probability that the software will not fail and that it will continue functioning for a defined time interval. It also specifies the ability of the system to maintain its performance over a time span. Unreliable software is prone to failures, and a few processes may be more sensitive to failure than others because such processes may not be able to recover from a fault or exception. The key attributes are:

- The characteristic of a system to perform its functions under stated conditions for a specific period of time
- Mean Time To Recovery: The time available to get the system back up online
- Mean Time Between Failures: The acceptable threshold for downtime
- Data integrity, also known as referential integrity, in database tables and interfaces
- Application integrity and information integrity during transactions
- Fault trapping (I/O): Handling failures and recovery

Extensibility

Extensibility is the ability of a system to cater to future changes through a flexible architecture, design, or implementation. Extensible applications have excellent endurance, which prevents the expensive process of procuring large, inflexible applications and retiring them due to changes in business needs. Extensibility enables organizations to take advantage of opportunities and respond to risks.
While there is a significant difference, extensibility is often tangled with the modifiability quality. Modifiability means that it is possible to change the software, whereas extensibility means that change has been planned for and will be effortless. Adaptability is at times erroneously conflated with extensibility; however, adaptability deals with how user interactions with the system are managed and governed. Extensibility allows a system, people, technology, information, and processes, all working together, to achieve the following objectives:

- Handle new information types
- Manage new or changed business entities
- Consume or provide new feeds

Recovery

In the event of a natural calamity, for example a flood or hurricane, the entire facility where the application is hosted may become inoperable or inaccessible. Business-critical applications should have a strategy to recover from such disasters within a reasonable time frame. The solution implementing the various processes must be integrated with the existing enterprise disaster recovery plan. The processes must be analysed to understand the criticality of each process to the business and the impact of its loss in case of non-availability. Based on this analysis, appropriate disaster procedures must be developed and plans should be outlined. As part of disaster recovery, electronic backups of data and procedures must be maintained at the recovery location and be retrievable within the appropriate time frames for system function restoration. In the case of high criticality, real-time mirroring to a mirror site should be deployed. The key attributes are:

- Recovery process: Recovery Time Objectives (RTO) / Recovery Point Objectives (RPO)
- Restore time: The time required to switch to the secondary site when the primary fails
- RPO/Backup time: The time it takes to back up your data
- Backup frequencies: The frequency of backing up the transaction data, configuration data, and code

Interoperability

Interoperability is the ability to exchange information and communicate with internal and external applications and systems. Interoperable systems make it easier to exchange information both internally and externally. Data formats, transport protocols, and interfaces are the key attributes for architecting interoperable systems, and their standardization is the key aspect to be considered when architecting an interoperable system. Interoperability is achieved through:

- Publishing and describing interfaces
- Describing the syntax used to communicate
- Describing the semantics of the information the system produces and consumes
- Leveraging open standards to communicate with external systems
- Loose coupling with external systems

The key attributes are:

- Compatibility with shared applications: Other systems it needs to integrate with
- Compatibility with third-party applications: Other systems it has to live with amicably
- Compatibility with various OS: Different OS compatibility
- Compatibility on different platforms: Hardware platforms it needs to work on

Usability

Usability measures characteristics such as consistency and aesthetics in the user interface. Consistency is the constant use of mechanisms employed in the user interface, while aesthetics refers to the artistic, visual quality of the user interface. It is the ease with which users operate the system and make productive use of it.
Usability is usually discussed in relation to the system interfaces, but it can just as well be applied to any tool, device, or rich system. It addresses the factors that establish the ability of the software to be understood, used, and learned by its intended users. The application interfaces must be designed with end users in mind so that they are intuitive to use, are localized, provide access for differently abled users, and provide an excellent overall user experience. The key attributes are:

- Look and feel standards: Layout and flow, screen element density, keyboard shortcuts, UI metaphors, colors
- Localization/internationalization requirements: Keyboards, paper sizes, languages, spellings, and so on

Summary

This article introduced NFRs and explained why they are critical for building software systems. It also covered the key KPIs for each of the key NFRs, such as scalability, availability, reliability, and so on.

Resources for Article:

Further resources on this subject:

- Software Documentation with Trac [article]
- The Software Task Management Tool - Rake [article]
- Installing Software and Updates [article]

Monitoring, Logging, and Troubleshooting

Packt
20 Jun 2017
6 min read
In this article by Gigi Sayfan, the author of the book Mastering Kubernetes, we will learn how to monitor Kubernetes with Heapster. (For more resources related to this topic, see here.)

Monitoring Kubernetes with Heapster

Heapster is a Kubernetes project that provides a robust monitoring solution for Kubernetes clusters. It runs as a pod (of course), so it can be managed by Kubernetes itself. Heapster supports Kubernetes and CoreOS clusters, and it has a very modular and flexible design. Heapster collects both operational metrics and events from every node in the cluster, stores them in a persistent backend (with a well-defined schema), and allows visualization and programmatic access. Heapster can be configured to use different backends (or sinks, in Heapster's parlance) and their corresponding visualization frontends. The most common combination is InfluxDB as the backend and Grafana as the frontend. The Google Cloud platform integrates Heapster with the Google monitoring service. There are many other, less common backends, such as the following:

- Log
- InfluxDB
- Google Cloud monitoring
- Google Cloud logging
- Hawkular-Metrics (metrics only)
- OpenTSDB
- Monasca (metrics only)
- Kafka (metrics only)
- Riemann (metrics only)
- Elasticsearch

You can use multiple backends by specifying sinks on the command line:

--sink=log --sink=influxdb:http://monitoring-influxdb:80/

cAdvisor

cAdvisor is part of the kubelet, which runs on every node. It collects information about the CPU/core usage, memory, network, and file systems of each container. It provides a basic UI on port 4194 but, most importantly for Heapster, it provides all this information through the kubelet. Heapster records the information collected by cAdvisor on each node and stores it in its backend for analysis and visualization. The cAdvisor UI is useful if you want to quickly verify that a particular node is set up correctly, for example, while creating a new cluster when Heapster is not hooked up yet.

InfluxDB backend

InfluxDB is a modern and robust distributed time-series database. It is very well suited to, and used broadly for, centralized metrics and logging. It is also the preferred Heapster backend (outside the Google Cloud platform). The only caveat is that InfluxDB clustering and high availability are part of the enterprise offering.

The storage schema

The InfluxDB storage schema defines the information that Heapster stores in InfluxDB and that is available for querying and graphing later. The metrics are divided into multiple categories, called measurements. You can treat and query each metric separately, or you can query a whole category as one measurement and receive the individual metrics as fields. The naming convention is <category>/<metrics name> (except for uptime, which has a single metric). If you have a SQL background, you can think of measurements as tables. Each metric is stored per container.
Each metric is labeled with the following information: pod_id – Unique ID of a pod pod_name – User-provided name of a pod pod_namespace – The namespace of a pod container_base_image – Base image for the container container_name – User-provided name of the container or full cgroup name for system containers host_id – Cloud-provider-specified or user-specified identifier of a node hostname – Hostname where the container ran labels – Comma-separated list of user-provided labels; format is key:value namespace_id – UID of the namespace of a pod resource_id – A unique identifier used to differentiate multiple metrics of the same type, for example, FS partitions under filesystem/usage Here are all the metrics grouped by category. As you can see, it is quite extensive. CPU cpu/limit – CPU hard limit in millicores cpu/node_capacity – CPU capacity of a node cpu/node_allocatable – CPU allocatable of a node cpu/node_reservation – Share of CPU that is reserved on the node allocatable cpu/node_utilization – CPU utilization as a share of node allocatable cpu/request – CPU request (the guaranteed amount of resources) in millicores cpu/usage – Cumulative CPU usage on all cores cpu/usage_rate – CPU usage on all cores in millicores File system filesystem/usage – Total number of bytes consumed on a filesystem filesystem/limit – The total size of the filesystem in bytes filesystem/available – The number of available bytes remaining in the filesystem Memory memory/limit – Memory hard limit in bytes memory/major_page_faults – Number of major page faults memory/major_page_faults_rate – Number of major page faults per second memory/node_capacity – Memory capacity of a node memory/node_allocatable – Memory allocatable of a node memory/node_reservation – Share of memory that is reserved on the node allocatable memory/node_utilization – Memory utilization as a share of memory allocatable memory/page_faults – Number of page faults memory/page_faults_rate – Number of page faults per second memory/request – Memory request (the guaranteed amount of resources) in bytes memory/usage – Total memory usage memory/working_set – Total working set usage; working set is the memory being used and not easily dropped by the kernel Network network/rx – Cumulative number of bytes received over the network network/rx_errors – Cumulative number of errors while receiving over the network network/rx_errors_rate – Number of errors per second while receiving over the network network/rx_rate – Number of bytes received over the network per second network/tx – Cumulative number of bytes sent over the network network/tx_errors – Cumulative number of errors while sending over the network network/tx_errors_rate – Number of errors while sending over the network network/tx_rate – Number of bytes sent over the network per second Uptime uptime – Number of milliseconds since the container was started You can work with InfluxDB directly if you're familiar with it. You can either connect to it using its own API or use its web interface. Type the following command to find its port: k describe service monitoring-influxdb --namespace=kube-system | grep NodePort Type: NodePort NodePort: http 32699/TCP NodePort: api 30020/TCP Now you can browse to the InfluxDB web interface using the HTTP port. You'll need to configure it to point to the API port. The username and password are root and root by default: Once you're set up, you can select what database to use (see top-right corner). The Kubernetes database is called k8s.
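As a quick, hedged illustration of what a raw query against that k8s database might look like, here is a small InfluxQL sketch; the cpu/usage_rate measurement, the value field, and the pod_name tag follow the schema described above, but the exact names can vary with your Heapster version and retention policy:
-- Average CPU usage rate (millicores) per pod over the last hour.
-- Assumes Heapster's default InfluxDB schema; adjust names to your deployment.
SELECT MEAN("value") FROM "cpu/usage_rate" WHERE time > now() - 1h GROUP BY "pod_name"
A query along these lines can be pasted into the web interface once the k8s database is selected, or sent through the HTTP API.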
You can now query the metrics using the InfluxDB query language. Grafana visualization Grafana runs in its own container and serves a sophisticated dashboard that works well with InfluxDB as a data source. To locate the port, type the following command: k describe service monitoring-grafana --namespace=kube-system | grep NodePort Type: NodePort NodePort: <unset> 30763/TCP Now you can access the Grafana web interface on that port. The first thing you need to do is set up the data source to point to the InfluxDB backend: Make sure to test the connection and then go explore the various options in the dashboards. There are several default dashboards, but you should be able to customize them to your preferences. Grafana is designed to let you adapt it to your needs. Summary In this article we have learned how to monitor Kubernetes with Heapster.  Resources for Article: Further resources on this subject: The Microsoft Azure Stack Architecture [article] Building A Recommendation System with Azure [article] Setting up a Kubernetes Cluster [article]

Manipulating functions in functional programming

Packt
20 Jun 2017
6 min read
In this article by Wisnu Anggoro, author of the book Learning C++ Functional Programming, you will learn to apply functional programming techniques to C++ to build highly modular, testable, and reusable code. In this article, you will learn the following topics: Applying a first-class function in all functions Passing a function as another function's parameter Assigning a function to a variable Storing a function in the container (For more resources related to this topic, see here.) Applying a first-class function in all functions There's nothing special about the first-class function itself: we can treat the first-class function like any other data type. However, in a language that supports the first-class function, we can perform the following tasks without invoking the compiler recursively: Passing a function as another function's parameter Assigning a function to a variable Storing functions in collections Fortunately, C++ can be used to perform the preceding tasks. We will discuss them in depth in the following topics. Passing a function as another function's parameter Let's start by passing a function as a function parameter. We will choose one of four functions and invoke the function from the main function. The code will look as follows: /* first-class-1.cpp */ #include <functional> #include <iostream> using namespace std; typedef function<int(int, int)> FuncType; int addition(int x, int y) { return x + y; } int subtraction(int x, int y) { return x - y; } int multiplication(int x, int y) { return x * y; } int division(int x, int y) { return x / y; } void PassingFunc(FuncType fn, int x, int y) { cout << "Result = " << fn(x, y) << endl; } int main() { int i, a, b; FuncType func; cout << "Select mode:" << endl; cout << "1. Addition" << endl; cout << "2. Subtraction" << endl; cout << "3. Multiplication" << endl; cout << "4. Division" << endl; cout << "Choice: "; cin >> i; cout << "a = "; cin >> a; cout << "b = "; cin >> b; switch(i) { case 1: PassingFunc(addition, a, b); break; case 2: PassingFunc(subtraction, a, b); break; case 3: PassingFunc(multiplication, a, b); break; case 4: PassingFunc(division, a, b); break; } return 0; } From the preceding code, we can see that we have four functions, and we want the user to choose one of them and then run it. In the switch statement, we will invoke one of the four functions based on the choice of the user. We will pass the selected function to PassingFunc(), as we can see in the following code snippet: case 1: PassingFunc(addition, a, b); break; case 2: PassingFunc(subtraction, a, b); break; case 3: PassingFunc(multiplication, a, b); break; case 4: PassingFunc(division, a, b); break; The result we get on the screen should look like the following screenshot: The preceding screenshot shows that we selected the Subtraction mode and gave 8 to a and 3 to b. As we expected, the code gives us 5 as a result. Assigning a function to a variable We can also assign a function to a variable so that we can call the function by calling the variable. We will refactor first-class-1.cpp, and it will be as follows: /* first-class-2.cpp */ #include <functional> #include <iostream> using namespace std; // Adding the addition, subtraction, multiplication, and // division function as we've got in first-class-1.cpp int main() { int i, a, b; function<int(int, int)> func; cout << "Select mode:" << endl; cout << "1. Addition" << endl; cout << "2. Subtraction" << endl; cout << "3. Multiplication" << endl; cout << "4.
Division" << endl; cout << "Choice: "; cin >> i; cout << "a = "; cin >> a; cout << "b = "; cin >> b; switch(i) { case 1: func = addition; break; case 2: func = subtraction; break; case 3: func = multiplication; break; case 4: func = division; break; } cout << "Result = " << func(a, b) << endl; return 0; } We will now assign the four functions based on the user choice. We will store the selected function in func variable inside the switch statement, as follows: case 1: func = addition; break; case 2: func = subtraction; break; case 3: func = multiplication; break; case 4: func = division; break; After the func variable is assigned with the user's choice, the code will just call the variable like calling the function as follows: cout << "Result = " << func(a, b) << endl; Moreover, we will obtain the same output on the console if we run the code. Storing a function in the container Now, let's save the function to the container. Here, we will use the vector as the container. The code is as follows: /* first-class-3.cpp */ #include <vector> #include <functional> #include <iostream> using namespace std; typedef function<int(int, int)> FuncType; // Adding the addition, subtraction, multiplication, and // division function as we've got in first-class-1.cpp int main() { vector<FuncType> functions; functions.push_back(addition); functions.push_back(subtraction); functions.push_back(multiplication); functions.push_back(division); int i, a, b; function<int(int, int)> func; cout << "Select mode:" << endl; cout << "1. Addition" << endl; cout << "2. Subtraction" << endl; cout << "3. Multiplication" << endl; cout << "4. Division" << endl; cout << "Choice: "; cin >> i; cout << "a = "; cin >> a; cout << "b = "; cin >> b; cout << "Result = " << functions.at(i - 1)(a, b) << endl; return 0; } From the preceding code, we can see that we created a new vector named functions, then stored four different functions to it. Same with our two previous code samples, we ask the user to select the mode as well. However, now the code becomes simpler since we don't need to add the switch statement as we can select the function directly by selecting the vector index, as we can see in the following line of code: cout << "Result = " << functions.at(i - 1)(a, b) << endl; However, since the vector is a zero-based index, we have to adjust the index with the menu choice. The result will be the same with our two previous code samples. Summary In this article, we discussed that there are some techniques to manipulate a function to produce the various purpose on it. Since we can implement the first-class function in C++ language, we can pass a function as other functions parameter. We can treat a function as a data object so we can assign it to a variable and store it in the container. Resources for Article: Further resources on this subject: Introduction to the Functional Programming [article] Functional Programming in C# [article] Putting the Function in Functional Programming [article]

Understanding the Basics of Gulp

Packt
19 Jun 2017
15 min read
In this article written by Travis Maynard, author of the book Getting Started with Gulp - Second Edition, we will take a look at the basics of gulp and how it works. Understanding some of the basic principles and philosophies behind the tool and its plugin system will assist you as you begin writing your own gulpfiles. We'll start by taking a look at the engine behind gulp and then follow up by breaking down the inner workings of gulp itself. By the end of this article, you will be prepared to begin writing your own gulpfile. (For more resources related to this topic, see here.) Installing node.js and npm As you learned in the introduction, node.js and npm are the engines that work behind the scenes to allow us to operate gulp and keep track of any plugins we decide to use. Downloading and installing node.js For Mac and Windows, the installation is quite simple. All you need to do is navigate over to http://nodejs.org and click on the big green install button. Once the installer has finished downloading, run the application and it will install both node.js and npm. For Linux, there are a couple more steps, but don't worry; with your newly acquired command-line skills, it should be relatively simple. To install node.js and npm on Linux, you'll need to run the following three commands in Terminal: sudo add-apt-repository ppa:chris-lea/node.js sudo apt-get update sudo apt-get install nodejs The details of these commands are outside the scope of this book, but just for reference, they add a repository to the list of available packages, update the total list of packages, and then install the application from the repository we added. Verify the installation To confirm that our installation was successful, try the following command in your command line: node -v If node.js is successfully installed, node -v will output a version number on the next line of your command line. Now, let's do the same with npm: npm -v Like before, if your installation was successful, npm -v should output the version number of npm on the next line. The versions displayed in this screenshot reflect the latest Long Term Support (LTS) release currently available as of this writing. This may differ from the version that you have installed depending on when you're reading this. It's always suggested that you use the latest LTS release when possible. The -v flag is a common flag used by most command-line applications to quickly display their version number. This is very useful for debugging version issues while using command-line applications. Creating a package.json file Having npm in our workflow will make installing packages incredibly easy; however, we should look ahead and establish a way to keep track of all the packages (or dependencies) that we use in our projects. Keeping track of dependencies is very important to keep your workflow consistent across development environments. Node.js uses a file named package.json to store information about your project, and npm uses this same file to manage all of the package dependencies your project requires to run properly. In any project using gulp, it is always a great practice to create this file ahead of time so that you can easily populate your dependency list as you are installing packages or plugins.
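For reference, here is a minimal, hypothetical example of what such a package.json might look like once gulp has been added as a development dependency later in this process; the project name, version number, and gulp version shown here are placeholders rather than values taken from this article:
{
  "name": "gulp-book",
  "version": "1.0.0",
  "devDependencies": {
    "gulp": "^3.9.1"
  }
}
You will not need to write this file by hand; the npm init command shown next generates the skeleton for you, and installing packages with the --save-dev flag fills in the devDependencies section automatically.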
To create the package.json file, we will need to run npm's built-in init action using the following command: npm init Now, using the preceding command, the terminal will show the following output: Your command line will prompt you several times asking for basic information about the project, such as the project name, author, and the version number. You can accept the defaults for these fields by simply pressing the Enter key at each prompt. Most of this information is used primarily on the npm website if a developer decides to publish a node.js package. For our purposes, we will just use it to initialize the file so that we can properly add our dependencies as we move forward. The screenshot for the preceding command is as follows: Installing gulp With npm installed and our package.json file created, we are now ready to begin installing node.js packages. The first and most important package we will install is none other than gulp itself. Locating gulp Locating and gathering information about node.js packages is very simple, thanks to the npm registry. The npm registry is a companion website that keeps track of all the published node.js modules, including gulp and gulp plugins. You can find this registry at http://npmjs.org. Take a moment to visit the npm registry and do a quick search for gulp. The listing page for each node.js module will give you detailed information on each project, including the author, version number, and dependencies. Additionally, it also features a small snippet of command-line code that you can use to install the package along with readme information that will outline basic usage of the package and other useful information. Installing gulp locally Before we install gulp, make sure you are in your project's root directory, gulp-book, using the cd and ls commands you learned earlier. If you ever need to brush up on any of the standard commands, feel free to take a moment to step back and review as we progress through the book. To install packages with npm, we will follow a similar pattern to the ones we've used previously. Since we will be covering both versions 3.x and 4.x in this book, we'll demonstrate installing both: For installing gulp 3.x, you can use the following: npm install --save-dev gulp For installing gulp 4.x, you can use the following: npm install --save-dev gulpjs/gulp#4.0 This command is quite different from the 3.x command because this command is installing the latest development release directly from GitHub. Since the 4.x version is still being actively developed, this is the only way to install it at the time of writing this book. Once released, you will be able to run the previous command without installing from GitHub. Upon executing the command, it will result in output similar to the following: To break this down, let's examine each piece of this command to better understand how npm works: npm: This is the application we are running install: This is the action that we want the program to run. In this case, we are instructing npm to install something in our local folder --save-dev: This is a flag that tells npm to add this module to the dev dependencies list in our package.json file gulp: This is the package we would like to install Additionally, npm has a --save flag that saves the module to the list of dependencies instead of devDependencies. These dependency lists are used to separate the modules that a project depends on to run, and the modules a project depends on when in development.
Since we are using gulp to assist us in development, we will always use the --save-dev flag throughout the book. So, this command will use npm to contact the npm registry, and it will install gulp to our local gulp-book directory. After using this command, you will note that a new folder has been created that is named node_modules. This is where node.js and npm store all of the installed packages and dependencies of your project. Take a look at the following screenshot: Installing gulp-cli globally For many of the packages that we install, this will be all that is needed. With gulp, we must install a companion module gulp-cli globally so that we can use the gulp command from anywhere in our filesystem. To install gulp-cli globally, use the following command: npm install -g gulp-cli In this command, not much has changed compared to the original command where we installed the gulp package locally. We've only added a -g flag to the command, which instructs npm to install the package globally. On Windows, your console window should be opened under an administrator account in order to install an npm package globally. At first, this can be a little confusing, and for many packages it won't apply. Similar build systems actually separate these usages into two different packages that must be installed separately: one that is installed globally for command-line use and another that is installed locally in your project. Gulp was created so that both of these usages could be combined into a single package, and, based on where you install it, it could operate in different ways. Anatomy of a gulpfile Before we can begin writing tasks, we should take a look at the anatomy and structure of a gulpfile. Examining the code of a gulpfile will allow us to better understand what is happening as we run our tasks. Gulp started with four main methods: .task(), .src(), .watch(), and .dest(). The release of version 4.x introduced additional methods such as .series() and .parallel(). In addition to the gulp API methods, each task will also make use of the node.js .pipe() method. This small list of methods is all that is needed to understand how to begin writing basic tasks. They each represent a specific purpose and will act as the building blocks of our gulpfile. The task() method The .task() method is the basic wrapper with which we create our tasks. Its syntax is .task(string, function). It takes two arguments—a string value representing the name of the task and a function that will contain the code you wish to execute upon running that task. The src() method The .src() method is our input or how we gain access to the source files that we plan on modifying. It accepts either a single glob string or an array of glob strings as an argument. Globs are a pattern that we can use to make our paths more dynamic. When using globs, we can match an entire set of files with a single string using wildcard characters as opposed to listing them all separately. The syntax for this method is .src(string || array). The watch() method The .watch() method is used to specifically look for changes in our files. This will allow us to keep gulp running as we code so that we don't need to rerun gulp any time we need to process our tasks. This syntax is different between the 3.x and 4.x versions. For version 3.x the syntax is—.watch(string || array, array) with the first argument being our paths/globs to watch and the second argument being the array of task names that need to be run when those files change.
For version 4.x the syntax has changed a bit to allow for two new methods that provide more explicit control of the order in which tasks are executed. When using 4.x instead of passing in an array as the second argument, we will use either the .series() or .parallel() method like so—.watch(string || array, gulp.series() || gulp.parallel()). The dest() method The dest() method is used to set the output destination of your processed file. Most often, this will be used to output our data into a build or dist folder that will be either shared as a library or accessed by your application. The syntax for this method is .dest(string). The pipe() method The .pipe() method will allow us to pipe together smaller single-purpose plugins or applications into a pipechain. This is what gives us full control of the order in which we would need to process our files. The syntax for this method is .pipe(function). The parallel() and series() methods The parallel and series methods were added in version 4.x as a way to easily control whether your tasks are run together all at once or in a sequence one after the other. This is important if one of your tasks requires that other tasks complete before it can be ran successfully. When using these methods the arguments will be the string names of your tasks separated by a comma. The syntax for these methods is—.series(tasks) and .parallel(tasks); Understanding these methods will take you far, as these are the core elements of building your tasks. Next, we will need to put these methods together and explain how they all interact with one another to create a gulp task. Including modules/plugins When writing a gulpfile, you will always start by including the modules or plugins you are going to use in your tasks. These can be both gulp plugins or node.js modules, based on what your needs are. Gulp plugins are small node.js applications built for use inside of gulp to provide a single-purpose action that can be chained together to create complex operations for your data. Node.js modules serve a broader purpose and can be used with gulp or independently. Next, we can open our gulpfile.js file and add the following code: // Load Node Modules/Plugins var gulp = require('gulp'); var concat = require('gulp-concat'); var uglify = require('gulp-uglify'); The gulpfile.js file will look as shown in the following screenshot: In this code, we have included gulp and two gulp plugins: gulp-concat and gulp-uglify. As you can now see, including a plugin into your gulpfile is quite easy. After we install each module or plugin using npm, you simply use node.js' require() function and pass it in the name of the module. You then assign it to a new variable so that you can use it throughout your gulpfile. This is node.js' way of handling modularity, and because a gulpfile is essentially a small node.js application, it adopts this practice as well. Writing a task All tasks in gulp share a common structure. Having reviewed the five methods at the beginning of this section, you will already be familiar with most of it. Some tasks might end up being larger than others, but they still follow the same pattern. To better illustrate how they work, let's examine a bare skeleton of a task. This skeleton is the basic bone structure of each task we will be creating. Studying this structure will make it incredibly simple to understand how parts of gulp work together to create a task. 
An example of a sample task is as follows: gulp.task(name, function() { return gulp.src(path) .pipe(plugin) .pipe(plugin) .pipe(gulp.dest(path)); }); In the first line, we use the new gulp variable that we created a moment ago and access the .task() method. This creates a new task in our gulpfile. As you learned earlier, the task method accepts two arguments: a task name as a string and a callback function that will contain the actions we wish to run when this task is executed. Inside the callback function, we reference the gulp variable once more and then use the .src() method to provide the input to our task. As you learned earlier, the source method accepts a path or an array of paths to the files that we wish to process. Next, we have a series of three .pipe() methods. In each of these pipe methods, we will specify which plugin we would like to use. This grouping of pipes is what we call our pipechain. The data that we have provided gulp with in our source method will flow through our pipechain to be modified by each piped plugin that it passes through. The order of the pipe methods is entirely up to you. This gives you a great deal of control in how and when your data is modified. You may have noticed that the final pipe is a bit different. At the end of our pipechain, we have to tell gulp to move our modified file somewhere. This is where the .dest() method comes into play. As we mentioned earlier, the destination method accepts a path that sets the destination of the processed file as it reaches the end of our pipechain. If .src() is our input, then .dest() is our output. Reflection To wrap up, take a moment to look at a finished gulpfile and reflect on the information that we just covered. This is the completed gulpfile that we will be creating from scratch, so don't worry if you still feel lost. This is just an opportunity to recognize the patterns and syntaxes that we have been studying so far. We will begin creating this file step by step: // Load Node Modules/Plugins var gulp = require('gulp'); var concat = require('gulp-concat'); var uglify = require('gulp-uglify'); // Process Styles gulp.task('styles', function() {     return gulp.src('css/*.css')         .pipe(concat('all.css'))         .pipe(gulp.dest('dist/')); }); // Process Scripts gulp.task('scripts', function() {     return gulp.src('js/*.js')         .pipe(concat('all.js'))         .pipe(uglify())         .pipe(gulp.dest('dist/')); }); // Watch Files For Changes gulp.task('watch', function() {     gulp.watch('css/*.css', 'styles');     gulp.watch('js/*.js', 'scripts'); }); // Default Task gulp.task('default', gulp.parallel('styles', 'scripts', 'watch')); The gulpfile.js file will look as follows: Summary In this article, you installed node.js and learned the basics of how to use npm and understood how and why to install gulp both locally and globally. We also covered some of the core differences between the 3.x and 4.x versions of gulp and how they will affect your gulpfiles as we move forward. To wrap up the article, we took a small glimpse into the anatomy of a gulpfile to prepare us for writing our own gulpfiles from scratch. Resources for Article: Further resources on this subject: Performing Task with Gulp [article] Making a Web Server in Node.js [article] Developing Node.js Web Applications [article]

Overview of Important Concepts of Microsoft Dynamics NAV 2016

Packt
19 Jun 2017
15 min read
In this article by Rabindra Sah, author of the book Mastering Microsoft Dynamics NAV 2016, we will cover the important concepts of Microsoft Dynamics NAV 2016. (For more resources related to this topic, see here.) Version control Version control systems are third-party systems that track changes to the files and folders of a system. In this section, we will be discussing two popular version control systems for Microsoft Dynamics NAV. Let's take a look at the web services architecture in Dynamics NAV. Microsoft Dynamics NAV 2016 uses two types of web services: SOAP web services and OData web services. Pages are exposed through both OData and SOAP, queries are exposed only through OData, and codeunits are exposed only through SOAP. The main difference between the two in Microsoft Dynamics NAV is that with SOAP web services, you can publish and reuse business logic, and with OData web services, you can expose data to external applications. In a dataset, you can make certain changes in order to improve the performance of the report. This is not always applicable for all reports; it depends on the nature of the problem, and you should spend time analyzing the problem before fixing it. The following are some of the basic considerations that you should keep in mind when dealing with datasets in NAV reports: Try to avoid the creation of dataset lines if possible; create variables instead Try to reduce the number of rows and columns Apply filters to the request page For slow reports with higher runtime, use the job queue to generate the report on the server Use text constants for the caption, if needed Try to avoid Include Caption for columns with no need for captions Technical upgrade Technical upgrade is the least used upgrade process, which applies when you are making one version upgrade at a time, that is, from Version 2009 to Version 2013 or Version 2013 to Version 2015. So, when you are planning to jump multiple versions at the same time, a technical upgrade might not be the perfect option to choose. It can be efficient when there are minimal changes in the source database objects, that is, fewer customizations. It can also be considered an efficient choice when the business requirement for the product is still the same or has very few significant changes. Upgrading estimates In this section, we are going to look at the core components that are responsible for the estimates of the upgrade process. The components that are to be considered while estimating for the upgrade process are as follows: Code upgrade Object transformation Data upgrade Testing and implementation Code upgrade The best method to estimate the code upgrade is to use a file compare tool. Such tools help with file and folder comparison, version control, conflict resolution, automatic intelligent merging, in-place editing of files, change tracking, and code analysis. You can also design your own compare tool if you want: for example, take two versions of the same object, say two versions of the Customer table. Open them in Notepad and check line by line whether there is any difference in the line, and then get that line value and show it as a log. You can achieve this via C# or any programming language. This should run for each object in the two versions of the NAV system and provide you with statistics on the amount of changes: This can be really handy when it comes to estimating the code changes. You can also do it manually if the number of objects is small.
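As a rough illustration of the idea just described, here is a hedged C# sketch of such a line-by-line compare over two exported object text files; the file paths and the way differences are logged are purely hypothetical and would need to be adapted to your own export format and object list:
using System;
using System.IO;

class ObjectCompare
{
    static void Main()
    {
        // Hypothetical paths to the same object exported from two NAV versions.
        string[] oldLines = File.ReadAllLines(@"C:\Export\Old\Customer.txt");
        string[] newLines = File.ReadAllLines(@"C:\Export\New\Customer.txt");

        int max = Math.Max(oldLines.Length, newLines.Length);
        int changed = 0;

        for (int i = 0; i < max; i++)
        {
            // Treat a missing line in either file as an empty line.
            string oldLine = i < oldLines.Length ? oldLines[i] : string.Empty;
            string newLine = i < newLines.Length ? newLines[i] : string.Empty;

            if (oldLine != newLine)
            {
                changed++;
                Console.WriteLine($"Line {i + 1} differs: {newLine}");
            }
        }

        Console.WriteLine($"Total changed lines: {changed}");
    }
}
Running a sketch like this over every exported object gives a crude count of changed lines per object, which is usually enough to rank objects by expected upgrade effort.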
It is recommended to use the Microsoft Dynamics Sure Step methodology while carrying out any upgrade project. Dynamics Sure Step is a full life cycle methodology tool designed to provide discipline and best practices for upgrading, migrating, configuring, and deploying a Microsoft Dynamics NAV solution. Object transformation We must take a close look at objects that cannot be upgraded directly. For example, if your source database reports are in the classic version or an early RTC version, it might not be feasible to transform them into the latest reports because of the huge technological gap between the report types. In these cases, you must be very careful while estimating for these upgrades. For example, TransHeader and TransFooter and other categorizations that are present in classic reports are hard to map directly into Dynamics NAV 2016 reports. We might have to develop our own logic to achieve these grouping values, which might take some additional time. So, always treat this section as a customization instead of an upgrade. Mostly, Microsoft partner vendors keep this section separate and, in most cases, separate resources are assigned to do that in parallel work environments. Reports can also have Word layouts, which should also be considered during the estimates. Data upgrade We perform a number of distinct steps while upgrading data. You must consider the time for each section in the data upgrade process in order to correctly estimate the time for it: The first thing that we do is a trial data upgrade. This allows us to analyze different aspects, such as how long it takes and whether the data upgrade process works, and it lets us test the results of this trial data upgrade. We might need to repeat the trial data upgrade a number of times before it works. Then, we can do a preproduction data upgrade: since the moment we started our analysis and development, the production data might have changed, so we also need a preproduction run to have a closer estimate of the time window that we will have available when we do the real implementation. Acceptance testing is also a very important phase. Once you have done the data upgrade, you need the end users or key users to confirm that the data has been converted correctly. And then you are ready to perform the live data upgrade. All of these different phases in the data upgrade will also require some time. The amount of time will also depend on the size of the database and the version that you are starting from. So, this gives you an overview of the different pillars that are important to estimate how much time it might take to prepare and analyze the upgrade project. Software as a Service Software as a Service is a cloud services delivery model, which offers an on-demand online software subscription. The latest SaaS release from Microsoft is Dynamics 365 (previously known as Project Madeira). The following image illustrates the SaaS taxonomy.
Here you can clearly see different services such as Salesforce, NetSuite, and QuickBooks, which are distributed as SaaS: Software as a Service taxonomy Understanding the PowerShell cmdlets We can categorize the PowerShell commands into five major categories of use: Commands for server administrators Commands for implementers for company management Commands for administrators for upgrades Commands for administrators for security Commands for developers Commands for server administrators The first category contains commands that can be used for administrative operations like create, save, remove, get, import, export, set, and the like, as given in the following table: Dismount-NAVTenant New-NAVServerConfiguration Export-NAVApplication New-NAVServerInstance Get-NAVApplication New-NAVWebServerInstance Get-NAVServerConfiguration Remove-NAVApplication Get-NAVServerInstance Remove-NAVServerInstance Get-NAVServerSession Remove-NAVServerSession Get-NAVTenant Save-NAVTenantConfiguration Get-NAVWebServerInstance Set-NAVServerConfiguration Export-NAVServerLicenseInformation Set-NAVServerInstance Import-NAVServerLicense Set-NAVWebServerInstanceConfiguration Mount-NAVApplication Sync-NAVTenant Mount-NAVTenant We can set up web server instances, change configurations, and create a multitenant environment; in fact, a multitenant environment can only be set up through PowerShell. Commands for implementers for company management The second category of commands can be used by implementers, in particular for operations related to installation and configuration of the system. The following are a few examples of this category of commands: Copy-NAVCompany Get-NAVCompany New-NAVCompany Remove-NAVCompany Rename-NAVCompany Commands for administrators for upgrades The third category is a special category for administrators, which is related to upgrade operations: Get-NAVDataUpgrade Resume-NAVDataUpgrade Start-NAVDataUpgrade Stop-NAVDataUpgrade This category of commands can be useful along with the upgrade toolkit. Commands for administrators for security This is one of the most important categories, which is related to the backend of the system. The commands in this category give administrators control over access and permissions. I strongly recommend these make-life-easy commands if you are working on security operations. Commands in this category include the following: Get-NAVServerUser Remove-NAVServerPermission Get-NAVServerUserPermissionSet Remove-NAVServerPermissionSet New-NAVServerPermission Remove-NAVServerUser New-NAVServerPermissionSet Remove-NAVServerUserPermissionSet New-NAVServerUser Set-NAVServerPermission New-NAVServerUserPermissionSet Set-NAVServerPermissionSet Set-NAVServerUser These commands are basically used to add users and to manage permission sets. Commands for developers The last, but not the least, category of commands is dedicated to developers, and it contains some of my most-used commands. It covers a wide range of commands, and should be included in your daily work life.
This set of commands includes the following: Get-NAVWebService Join-NAVApplicationObjectFile Invoke-NAVCodeunit Join-NAVApplicationObjectLanguageFile New-NAVWebService Merge-NAVApplicationObject Remove-NAVWebService Remove-NAVApplicationObjectLanguage Compare-NAVApplicationObject Set-NAVApplicationObjectProperty Export-NAVApplicationObjectLanguage Split-NAVApplicationObjectFile Get-NAVApplicationObjectProperty Split-NAVApplicationObjectLanguageFile Import-NAVApplicationObjectLanguage Test-NAVApplicationObjectLanguage Update-NAVApplicationObject Microsoft Dynamics NAV 2016 Posting preview In Microsoft Dynamics NAV 2016, you can review the entries to be created before you post a document or journal. This is made possible by the introduction of a new feature called Preview Posting, which enables you to preview the impact of a posting against each ledger associated with a document. In every document and journal that can be posted, you can click on Preview Posting to review the different types of entries that will be created when you post the document or journal. Workflow Workflow enables you to model real-life business processes in line with best practices or industry-standard practices. For example, ensuring that a customer credit limit has been independently verified, or that the requirement of two approvers for a payment process has been met. Workflow has these three main capabilities: Approvals Notifications Automation Workflow basically has three components, that is, Event, Condition, and Response. The event defines an occurrence in the system, the On condition specifies the condition under which the event applies, and the Then response is the action that is taken on the basis of that condition. This is shown in the following screenshot: Exception handling Exception handling is a new concept in Microsoft Dynamics NAV. It was imported from .NET, and is now gaining popularity among C/AL programmers because of its effective usage. Like C#, for exception handling, we use the Try function. The Try functions are new additions to the function library, which enable you to handle errors that occur in the application during runtime. Here we are not dealing with compile time issues. For example, the message Error Returned: Divisible by Zero Error is always a critical error, and should be handled in order to be avoided. This also stops the system from entering an unsafe state. Like C# and other rich programming languages, the Try functions in C/AL provide easy-to-understand error messages, which can also be dynamic and directly generated by the system. This feature helps us preplan for those errors and present better, more descriptive errors to the users. You can use the Try functions to catch errors/exceptions that are thrown by Microsoft Dynamics NAV or exceptions that are thrown during .NET Framework interoperability operations. The Try function is in many ways similar to the conditional Codeunit.Run function except for the following points: The database records that are changed because of the Try function cannot be rolled back The Try function calls do not need to be committed to the database Visual Basic programming Visual Basic (VB) is an event-driven programming language. It also comes with an integrated development environment (IDE). If you are familiar with the BASIC programming language, then it will be easy to understand Visual Basic, since it is derived from BASIC.
I will try to provide the basics of this language here, since it is the least discussed topic in the NAV community, but it is essential for all report designers and developers to understand. Here we do not need to understand each and every detail of the VB programming language, but understanding the syntax and structure will help us understand the code that we are going to use in the RDLC report. Example VB code can be written as follows: Public Function BlankZero(ByVal Value As Decimal) If Value = 0 Then Return "" End If Return Value End Function The preceding function, BlankZero, returns an empty string when the value is zero and otherwise just returns the value of the parameter. This is the simplest kind of function that can be found in the code section of an RDLC report. Unlike C/AL, we do not need to end each code line with a semicolon (;): Writing your own Test unit Writing your own Test unit is very important, not just to test your code but also to give you an eagle's-eye view of how your code is actually interacting with the system. It gives your coding meaning, and allows others to understand and relate to your development. Writing a unit test involves basically four steps, as shown in the following diagram: Certificates A certificate is nothing but a token that binds an identity to a cryptographic key. Microsoft Management Console (MMC) is a presentation service for management applications in the Microsoft Windows environment. It is a part of the independent software vendor (ISV) extensible service, that is, it provides a common integrated environment for snap-ins provided by Microsoft and third-party software vendors. Certificate authority A certification authority (CA) is an entity that issues certificates. If all certificates have a common issuer, then the issuer's public key can be distributed out of band: In the preceding diagram, the certificate server is the third party, which has a secure relationship with both the parties that want to communicate. The CA is connected to both parties through a secure channel. User B sends a copy of his public key to the CA. Then the CA encrypts the public key of User B using a different key. Two files are created because of this: the first is an encrypted package, which is nothing but the certificate, and the second is the digital signature of the certificate server. The certificate server returns the certificate to User B. Now User A asks for a certificate from User B. User B sends a copy of its certificate to User A. This is again done using a secure communication channel. User A decrypts the certificate using the key obtained from the certificate server, and extracts the public key of User B. User A also checks the digital signature of the certificate server to ensure that the certificate is authentic. Here, whatever data is encrypted using the public key of User B can only be decrypted using the private key of User B, which is present only with User B and not with any intruder on the Internet. So only User B can decrypt and read the content sent by User A. Once the keys are transferred, User A can communicate with User B. In case User B wants to send data to User A, User B would need the public key of User A, which will again be obtained through the CA. Certificates are issued to a principal. The issuance policy specifies the principals to which the CA will issue certificates. The certification authority does not need to be online to check the validity of the certificate. It can be kept on a server in a locked room. It is only consulted when a principal needs a certificate.
Certificates are a way of associating an identity with a public key and a distinguished name. Authentication policy for CA The authentication policy defines the way principals prove their identities. Each CA has its own requirements, constrained by contractual requirements such as those with a Primary Certification Authority (PCA): A PCA issues certificates to CAs CAs issue certificates to individuals and organizations All rely on non-electronic proofs of identity, such as biometrics (fingerprints), documents (driver's license or passport), or personal knowledge. A specific authentication policy can be determined by checking the policy of the CA that signed the certificate. Kinds of certificates There are at least four kinds of certificates, which are as follows: Site certificates (for example, www.msdn.microsoft.com). Personal certificates (used if the server wants to authenticate the client). You can install a personal certificate in your browser. Software vendor certificates (used when software is installed). Often, when you run a program, a dialog box appears warning that The publisher could not be verified. Are you sure you want to run this software? This is caused either because the software does not have a software vendor certificate, or because you do not trust the CA who signed the software vendor certificate. Anonymous certificates (used, for example, by a whistle-blower to indicate that the same person sent a sequence of messages, without revealing who that person is). Other types of certificates Certificates can also be based on a principal's association with an organization (such as Microsoft (MSDN)), where the principal lives, or the role played in an organization (such as the comptroller). Summary In this article we covered the important concepts in Dynamics NAV 2016, such as version control, dataset considerations, technical upgrade, Software as a Service, certificates, and so on. Resources for Article: Further resources on this subject: Introduction to Microsoft Dynamics NAV 2016 [article] Customization in Microsoft Dynamics CRM [article] Exploring Microsoft Dynamics NAV – An Introduction [article]

Basics of Python for Absolute Beginners

Packt
19 Jun 2017
5 min read
In this article by Bhaskar Das and Mohit Raj, authors of the book Learn Python in 7 days, we will learn the basics of Python. The Python language had a humble beginning in the late 1980s when a Dutchman, Guido van Rossum, started working on a fun project at Centrum Wiskunde and Informatica that would be a successor to the ABC language, with better exception handling and the capability to interface with the Amoeba OS. It first appeared in 1991. Python 2.0 was released in the year 2000 and Python 3.0 was released in the year 2008. The language was named Python after the famous British television comedy show Monty Python's Flying Circus, which was one of the favorite television programmes of Guido. Here, we will see why Python has suddenly influenced our lives, various applications that use Python, and Python's implementations. In this article, you will learn the basic installation steps to perform on different platforms (that is, Windows, Linux, and Mac), about environment variables and setting them up, file formats, the Python interactive shell, basic syntax, and, finally, printing formatted output. (For more resources related to this topic, see here.) Why Python? Now you might be suddenly bogged down with the question, why Python? According to the Institute of Electrical and Electronics Engineers (IEEE) 2016 ranking, Python ranked third after C and Java. As per Indeed.com's data for 2016, Python ranked fifth in job market searches. Clearly, all the data points to the ever-rising demand in the job market for Python. It's a cool language if you want to learn it just for fun. Also, you will adore the language if you want to build your career around Python. At the school level, many schools have started including Python programming for kids. With new technologies taking the market by storm, Python has been playing a dominant role. Whether it's cloud platforms, mobile app development, big data, IoT with Raspberry Pi, or the new blockchain technology, Python is being seen as a niche language platform to develop and deliver scalable and robust applications. Some key features of the language are: Python programs can run on any platform; you can carry code created on a Windows machine and run it on Mac or Linux Python has a large inbuilt library with prebuilt and portable functionality, known as the standard library Python is an expressive language Python is free and open source Python code is about one third of the size of equivalent C++ and Java code. Python is both dynamically and strongly typed Being dynamically typed means that the type of a variable is interpreted at runtime, so there is no need to declare the type (int, float) of a variable in Python Python applications One of the most famous platforms where Python is extensively used is YouTube. Other places where you will find Python being extensively used are special effects in Hollywood movies, drug evolution and discovery, traffic control systems, ERP systems, cloud hosting, e-commerce platforms, CRM systems, and whichever field you can think of. Versions At the time of writing this book, the two main versions of the Python programming language available in the market were Python 2.x and Python 3.x. The stable releases at the time of writing this book were Python 2.7.13 and Python 3.6.0. Implementations of Python Major implementations include CPython, Jython, IronPython, MicroPython and PyPy.
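Before moving on to installation, here is a tiny, hedged example of the dynamic-but-strong typing mentioned above; the variable names are purely illustrative:
# Dynamic typing: the same name can be rebound to values of different types.
value = 42        # value starts as an int
value = "hello"   # now value is a str; no type declaration is needed

# Strong typing: Python refuses to silently mix incompatible types.
try:
    result = value + 3
except TypeError as error:
    print("Strong typing at work:", error)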
Installation Here, we will look at the installation of Python on three different OS platforms, namely Windows, Linux, and Mac OS. Let's begin with the Windows platform. Installation on Windows platform Python 2.x can be downloaded from https://www.python.org/downloads. The installer is simple and easy to use. Follow these steps to run the setup: Once you click on the setup installer, you will get a small window on your desktop screen as shown. Click on Next: Provide a suitable installation folder to install Python. If you don't provide the installation folder, then the installer will automatically create an installation folder for you, as shown in the following screenshot. Click on Next: After the completion of Step 2, you will get a window to customize Python as shown in the following screenshot. Note that the Add python.exe to Path option has been marked with an x. Select this option to add Python to the system path variable. Click on Next: Finally, click Finish to complete the installation: Summary So far, we walked through the beginnings and brief history of Python. We looked at the various implementations and flavors of Python. You also learned about installing on Windows OS. We hope this article has sparked enough interest in Python and serves as your first step into the kingdom of Python, with its enormous possibilities! Resources for Article: Further resources on this subject: Layout Management for Python GUI [article] Putting the Fun in Functional Python [article] Basics of Jupyter Notebook and Python [article]

Lambda Architecture Pattern

Packt
19 Jun 2017
8 min read
In this article by Tomcy John and Pankaj Misra, the authors of the book Data Lake For Enterprises, we will learn how data in the landscape of big data solutions can be processed in near real time, and certain practices that can be adopted for realizing the Lambda Architecture in the context of a data lake. (For more resources related to this topic, see here.) The concept of a data lake in an enterprise was driven by certain challenges that enterprises were facing with the way data was handled, processed, and stored. Initially, all the individual applications in the enterprise, via a natural evolution cycle, started maintaining huge amounts of data within themselves with almost no reuse by other applications in the same enterprise. This created information silos across various applications. As the next step of evolution, these individual applications started exposing this data across the organization as a data mart access layer over a central data warehouse. While data marts solved one part of the problem, other problems still persisted. These problems were more about data governance, data ownership, and data accessibility, which needed to be resolved so as to have better availability of enterprise-relevant data. This is where a need was felt to have data lakes, which could not only make such data available but also store any form of data and process it so that the data is analyzed and kept ready for consumption by consumer applications. In this article, we will look at some of the critical aspects of a data lake and understand why it matters for an enterprise. If we need to define the term data lake, it can be defined as a vast repository of a variety of enterprise-wide raw information that can be acquired, processed, analyzed, and delivered. The information thus handled could be any type of information, ranging from structured and semi-structured data to completely unstructured data. A data lake is expected to be able to derive enterprise-relevant meaning and insights from this information using various analysis and machine learning algorithms. Lambda architecture and data lake Lambda architecture as a pattern provides ways and means to perform highly scalable, performant, distributed computing on large sets of data, and yet provide eventually consistent data, with the required processing done both in batch as well as in near real time. Lambda architecture defines ways and means to enable a scale-out architecture across various data load profiles in an enterprise, with low latency expectations. The architecture pattern became significant with the emergence of big data and enterprises' focus on real-time analytics and digital transformation. The pattern named Lambda (symbol λ) is indicative of a way by which data comes from two places (batch and speed - the curved parts of the lambda symbol), which are then combined and served through the serving layer (the line merging from the curved parts). Figure 01: Lambda Symbol The main layers constituting the Lambda architecture are shown below: Figure 02: Components of Lambda Architecture In the above high level representation, data is fed to both the batch and speed layers. The batch layer keeps producing and re-computing views at every set batch interval. The speed layer also creates the relevant real-time/speed views. The serving layer orchestrates the query by querying both the batch and speed layers, merges the results, and sends them back. A practical realization of such a data lake can be illustrated as shown below.
The figure below shows multiple technologies used for such a realization; however, once the data is acquired from multiple sources and queued in the messaging layer for ingestion, the Lambda architecture pattern, in the form of the ingestion layer, batch layer, and speed layer, springs into action: Figure 03: Layers in Data Lake Data Acquisition Layer: In an organization, data exists in various forms, which can be classified as structured data, semi-structured data, or unstructured data. One of the key roles expected from the acquisition layer is to be able to convert the data into messages that can be further processed in a data lake; hence, the acquisition layer is expected to be flexible enough to accommodate a variety of schema specifications, and at the same time must have a fast connect mechanism to seamlessly push all the translated data messages into the data lake. A typical flow can be represented as shown below. Figure 04: Data Acquisition Layer Messaging Layer: The messaging layer would form the Message Oriented Middleware (MOM) for the data lake architecture and hence would be the primary layer for decoupling the various layers from each other, but with guaranteed delivery of messages. The other aspect of a messaging layer is its ability to enqueue and dequeue messages, as is the case with most messaging frameworks. Most messaging frameworks provide enqueue and dequeue mechanisms to manage publishing and consumption of messages respectively. Every messaging framework provides its own set of libraries to connect to its resources (queues/topics). Figure 05: Message Queue Additionally, the messaging layer can also perform the role of a data stream producer, converting the queued data into continuous streams of data that can be passed on for data ingestion. Data Ingestion Layer: A fast ingestion layer is one of the key layers in the Lambda architecture pattern. This layer determines how fast data can be delivered into the working models of the Lambda architecture. The data ingestion layer is responsible for consuming the messages from the messaging layer and performing the required transformation for ingesting them into the lambda layer (batch and speed layer) such that the transformed output conforms to the expected storage or processing formats. Figure 06: Data Ingestion Layer Batch Processing: The batch processing layer of the lambda architecture is expected to process the ingested data in batches so as to have optimum utilization of system resources; at the same time, long-running operations may be applied to the data to ensure a high quality of data output, which is also known as modelled data. The conversion of raw data to modelled data is the primary responsibility of this layer, wherein the modelled data is the data model that can be served by the serving layer of the lambda architecture. While Hadoop as a framework has multiple components that can process data as a batch, each data processing job in Hadoop is a map reduce process. The map and reduce paradigm of process execution is not a new paradigm; rather, it has been used in many applications ever since mainframe systems came into existence. It is based on divide and rule and stems from the traditional multi-threading model. The primary mechanism here is to divide the batch across multiple processes and then combine/reduce the output of all the processes into a single output. Figure 07: Batch Processing Speed (Near Real Time) Data Processing: This layer is expected to perform near real time processing on data received from the ingestion layer.
Since the processing is expected to be in near real time, such data processing needs to be quick and efficient, designed to support high-concurrency scenarios with an eventually consistent outcome. Real-time processing often depends on look-up data and reference data, hence there is a need for a very fast data layer, so that fetching any look-up or reference data does not adversely impact the real-time nature of the processing. The near real time data processing pattern is not very different from the way processing is done in batch mode; the primary difference is that the data is processed as soon as it is available and does not have to be batched, as shown below. Figure 08: Speed (Near Real Time) Processing Data Storage Layer: The data storage layer is a prominent part of the Lambda architecture pattern, as this layer defines the reactivity of the overall solution to the incoming event/data streams. The storage, in the context of a Lambda-architecture-driven data lake, can be classified broadly into non-indexed and indexed data storage. Typically, batch processing is performed on non-indexed data stored as data blocks for faster batch processing, while speed (near real time) processing is performed on indexed data, which can be accessed randomly and supports complex search patterns by means of inverted indexes. Both of these models are depicted below. Figure 09: Non-Indexed and Indexed Data Storage Examples Lambda in action Once all the layers in the Lambda architecture have performed their respective roles, the data can be exported, exposed via services, and delivered through other protocols from the data lake. This can also include merging the high-quality processed output from batch processing with indexed storage, using suitable technologies and frameworks, so as to provide enriched data for near real time requirements as well as interesting visualizations. Figure 10: Lambda in action Summary In this article we have briefly discussed a practical approach towards implementing a data lake for enterprises by leveraging the Lambda architecture pattern. Resources for Article: Further resources on this subject: The Microsoft Azure Stack Architecture [article] System Architecture and Design of Ansible [article] Microservices and Service Oriented Architecture [article]

article-image-thread-synchronization-and-communication
Packt
19 Jun 2017
20 min read
Save for later

Thread synchronization and communication

In this article by Maya Posch, the author of the book Mastering C++ Multithreading, we will work through and understand a basic multithreaded C++ application. While threads are generally used to work on a task more or less independently from other threads, there are many occasions where one would want to pass data between threads, or even control other threads, such as from a central task scheduler thread. This article looks at how such tasks are accomplished. Topics covered in this article include: Use of mutexes, locks, and similar synchronization structures. The use of condition variables and signals to control threads. Safely passing and sharing data between threads. (For more resources related to this topic, see here.) Safety first The central problem with concurrency is that of ensuring safe access to shared resources, including when communicating between threads. There is also the issue of threads being able to communicate and synchronize themselves. What makes multithreaded programming such a challenge is the need to keep track of each interaction between threads and to ensure that each and every form of access is secured, while not falling into the traps of deadlocks and data races. In this article we will look at a fairly complex example involving a task scheduler. This is a form of high-concurrency, high-throughput situation where many different requirements come together, with many potential traps, as we will see in a moment. The scheduler A good example of multithreading with a significant amount of synchronization and communication between threads is the scheduling of tasks. Here, the goal is to accept incoming tasks and assign them to worker threads as quickly as possible. A number of different approaches are possible. Often one has worker threads running in an active loop, constantly polling a central queue for new tasks. Disadvantages of this approach include the wasting of processor cycles on said polling and the congestion which forms at the synchronization mechanism used, generally a mutex. Furthermore, this active polling approach scales very poorly as the number of worker threads increases. Ideally, each worker thread would idly wait until it is needed again. To accomplish this, we have to approach the problem from the other side: not from the perspective of the worker threads, but from that of the queue. Much like the scheduler of an operating system, it is the scheduler which is aware of both the tasks which require processing and the available worker threads. In this approach, a central scheduler instance would accept new tasks and actively assign them to worker threads. Said scheduler instance may also manage these worker threads, such as their number and priority, depending on the number of incoming tasks and the type of task or other properties. High-level view At its core, our scheduler or dispatcher is quite simple, functioning like a queue with all of the scheduling logic built into it: As one can see from the high-level view, there really isn't much to it. As we'll see in a moment, the actual implementation does, however, have a number of complications.
Implementation As usual, we start off with the main function, contained in main.cpp: #include "dispatcher.h" #include "request.h" #include <iostream> #include <string> #include <csignal> #include <thread> #include <chrono> #include <mutex> using namespace std; sig_atomic_t signal_caught = 0; mutex logMutex; The custom headers we include are those for our dispatcher implementation, as well as the Request class we'll be using; the <mutex> header is included for the global mutex. Globally we define an atomic variable to be used with the signal handler, as well as a mutex which will synchronize the output (on the standard output) from our logging method. void sigint_handler(int sig) { signal_caught = 1; } Our signal handler function (for SIGINT signals) simply sets the global atomic variable we defined earlier. void logFnc(string text) { logMutex.lock(); cout << text << "\n"; logMutex.unlock(); } In our logging function we use the global mutex to ensure writing to the standard output is synchronized. int main() { signal(SIGINT, &sigint_handler); Dispatcher::init(10); In the main function we install the signal handler for SIGINT to allow us to interrupt the execution of the application. We also call the static init() function on the Dispatcher class to initialize it. cout << "Initialised.\n"; int cycles = 0; Request* rq = 0; while (!signal_caught && cycles < 50) { rq = new Request(); rq->setValue(cycles); rq->setOutput(&logFnc); Dispatcher::addRequest(rq); cycles++; } Next we set up the loop in which we will create new requests. In each cycle we create a new Request instance and use its setValue() function to set an integer value (the current cycle number). We also set our logging function on the request instance before adding this new request to the Dispatcher using its static addRequest() function. This loop will continue until the maximum number of cycles has been reached, or SIGINT has been signaled using Ctrl+C or similar. this_thread::sleep_for(chrono::seconds(5)); Dispatcher::stop(); cout << "Clean-up done.\n"; return 0; } Finally we wait for five seconds, using the thread's sleep_for() function and the chrono::seconds() function from the chrono STL header. We also call the stop() function on the Dispatcher before returning. Request class A request for the Dispatcher always derives from the pure virtual AbstractRequest class: #pragma once #ifndef ABSTRACT_REQUEST_H #define ABSTRACT_REQUEST_H class AbstractRequest { public: virtual void setValue(int value) = 0; virtual void process() = 0; virtual void finish() = 0; }; #endif This class defines an API with three functions which a deriving class always has to implement, of which the process() and finish() functions are the most generic and likely to be used in any practical implementation. The setValue() function is specific to this demonstration implementation and would likely be adapted or extended to fit a real-life scenario. The advantage of using an abstract class as the basis for a request is that it allows the Dispatcher class to handle many different types of requests, as long as they all adhere to this same basic API.
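To make that flexibility concrete, here is a hypothetical second request type (a sketch only, not part of the book's example project) which the same dispatcher could handle unchanged, simply because it implements the same three functions:

// Hypothetical example of another request type deriving from AbstractRequest.
// It is not used by the demo application; it only illustrates that any class
// implementing this small API can be handed to the dispatcher.
#include "abstract_request.h"
#include <iostream>

class EchoRequest : public AbstractRequest {
    int value;
public:
    void setValue(int value) { this->value = value; }
    void process() { std::cout << "Echoing value " << value << "\n"; }
    void finish() { std::cout << "Echo request " << value << " finished\n"; }
};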
Using this abstract interface, we implement a basic Request class: #pragma once #ifndef REQUEST_H #define REQUEST_H #include "abstract_request.h" #include <string> using namespace std; typedef void (*logFunction)(string text); class Request : public AbstractRequest { int value; logFunction outFnc; public: void setValue(int value) { this->value = value; } void setOutput(logFunction fnc) { outFnc = fnc; } void process(); void finish(); }; #endif In its header file we first define the logging function pointer's format. After this we implement the request API, adding the setOutput() function to the base API, which accepts a function pointer for logging. Both setter functions merely assign the provided parameter to their respective private class members. Next, the class function implementations: #include "request.h" void Request::process() { outFnc("Starting processing request " + std::to_string(value) + "..."); // } void Request::finish() { outFnc("Finished request " + std::to_string(value)); }Both of these implementations are very basic, merely using the function pointer to output a string indicating the status of the worker thread. In a practical implementation, one would add the business logic to the process()function, with the finish() function containing any functionality to finish up a request, such as writing a map into a string. Worker class Next, the Worker class. This contains the logic which will be called by the dispatcher in order to process a request: #pragma once #ifndef WORKER_H #define WORKER_H #include "abstract_request.h" #include <condition_variable> #include <mutex> using namespace std; class Worker { condition_variable cv; mutex mtx; unique_lock<mutex> ulock; AbstractRequest* request; bool running; bool ready; public: Worker() { running = true; ready = false; ulock = unique_lock<mutex>(mtx); } void run(); void stop() { running = false; } void setRequest(AbstractRequest* request) { this->request = request; ready = true; } void getCondition(condition_variable* &cv); }; #endif Whereas the adding of a request to the dispatcher does not require any special logic, the Worker class does require the use of condition variables to synchronize itself with the dispatcher. For the C++11 threads API, this requires a condition variable, a mutex and a unique lock. The unique lock encapsulates the mutex and will ultimately be used with the condition variable as we will see in a moment. Beyond this we define methods to start and stop the worker, to set a new request for processing and to obtain access to its internal condition variable. Moving on, the rest of the implementation: #include "worker.h" #include "dispatcher.h" #include <chrono> using namespace std; void Worker::getCondition(condition_variable* &cv) { cv = &(this)->cv; } void Worker::run() { while (running) { if (ready) { ready = false; request->process(); request->finish(); } if (Dispatcher::addWorker(this)) { // Use the ready loop to deal with spurious wake-ups. while (!ready && running) { if (cv.wait_for(ulock, chrono::seconds(1)) == cv_status::timeout) { // We timed out, but we keep waiting unless // the worker is // stopped by the dispatcher. } } } } } Beyond the getter function for the condition variable, we define the run() function, which the dispatcher will run for each worker thread upon starting it. Its main loop merely checks that the stop() function hasn't been called yet, which would have set the running boolean value to false and ended the work thread. 
This is used by the dispatcher when shutting down, allowing it to terminate the worker threads. Since boolean values are generally atomic, setting and checking can be done simultaneously without risk and without requiring a mutex. Moving on, the check of the ready variable is to ensure that a request is actually waiting when the thread is first run. On the first run of the worker thread, no request will be waiting and thus attempting to process one would result in a crash. Upon the dispatcher setting a new request, this boolean variable will be set to true. If a request is waiting, the ready variable will be set to false again, after which the request instance will have its process() and finish() functions called. This will run the business logic of the request on the worker thread and finalize it. Finally, the worker thread adds itself to the dispatcher using its static addWorker() function. This function will return false if no new request was available, causing the worker thread to wait until a new request has become available. Otherwise the worker thread will continue with the processing of the new request that the dispatcher will have set on it. If asked to wait, we enter a new loop which will ensure that, upon waking up from waiting for the condition variable to be signaled, we woke up because we got signaled by the dispatcher (ready variable set to true), and not because of a spurious wake-up. Last of all, we enter the actual wait() function of the condition variable, using the unique lock instance we created before, along with a timeout. If a timeout occurs, we can either terminate the thread, or keep waiting. Here we choose to do nothing and just re-enter the waiting loop. Dispatcher As the last item, we have the Dispatcher class itself: #pragma once #ifndef DISPATCHER_H #define DISPATCHER_H #include "abstract_request.h" #include "worker.h" #include <queue> #include <mutex> #include <thread> #include <vector> using namespace std; class Dispatcher { static queue<AbstractRequest*> requests; static queue<Worker*> workers; static mutex requestsMutex; static mutex workersMutex; static vector<Worker*> allWorkers; static vector<thread*> threads; public: static bool init(int workers); static bool stop(); static void addRequest(AbstractRequest* request); static bool addWorker(Worker* worker); }; #endif Most of this should look familiar by now. As one may have surmised, this is a fully static class. Moving on with its implementation: #include "dispatcher.h" #include <iostream> using namespace std; queue<AbstractRequest*> Dispatcher::requests; queue<Worker*> Dispatcher::workers; mutex Dispatcher::requestsMutex; mutex Dispatcher::workersMutex; vector<Worker*> Dispatcher::allWorkers; vector<thread*> Dispatcher::threads; bool Dispatcher::init(int workers) { thread* t = 0; Worker* w = 0; for (int i = 0; i < workers; ++i) { w = new Worker; allWorkers.push_back(w); t = new thread(&Worker::run, w); threads.push_back(t); } return true; } After setting up the static class members, the init() function is defined. It starts the specified number of worker threads, keeping a reference to each worker and thread instance in their respective vector data structures. bool Dispatcher::stop() { for (int i = 0; i < allWorkers.size(); ++i) { allWorkers[i]->stop(); } cout << "Stopped workers.\n"; for (int j = 0; j < threads.size(); ++j) { threads[j]->join(); cout << "Joined threads.\n"; } return true; } In the stop() function each worker instance has its stop() function called.
This will cause each worker thread to terminate, as we saw earlier in the Worker class description. Finally, we wait for each thread to join (that is, finish), prior to returning. void Dispatcher::addRequest(AbstractRequest* request) { workersMutex.lock(); if (!workers.empty()) { Worker* worker = workers.front(); worker->setRequest(request); condition_variable* cv; worker->getCondition(cv); cv->notify_one(); workers.pop(); workersMutex.unlock(); } else { workersMutex.unlock(); requestsMutex.lock(); requests.push(request); requestsMutex.unlock(); } } The addRequest() function is where things get interesting. In this one function a new request is added. What happens next to it depends on whether a worker thread is waiting for a new request or not. If no worker thread is waiting (the worker queue is empty), the request is added to the request queue. The use of mutexes ensures that access to these queues occurs safely, as the worker threads will simultaneously try to access both queues as well. An important gotcha to note here is the possibility of a deadlock. That is, a situation where each of two threads holds the lock on a resource, with each waiting for the other thread to release its lock before releasing its own. Every situation where more than one mutex is used in a single scope holds this potential. In this function the potential for deadlock lies in the releasing of the lock on the workers mutex and the point where the lock on the requests mutex is obtained. In the case that this function holds the workers mutex and tries to obtain the requests lock (when no worker thread is available), there is a chance that another thread holds the requests mutex (looking for new requests to handle), while simultaneously trying to obtain the workers mutex (finding no requests and adding itself to the workers queue). The solution here is simple: release a mutex before obtaining the next one. In the situation where one feels that more than one mutex lock has to be held, it is paramount to examine and test one's code for potential deadlocks. In this particular situation the workers mutex lock is explicitly released when it is no longer needed, or before the requests mutex lock is obtained, preventing a deadlock. Another important aspect of this particular section of code is the way it signals a worker thread. As one can see in the first section of the if/else block, when the workers queue is not empty, a worker is fetched from the queue, has the request set on it and then has its condition variable referenced and signaled, or notified. Internally the condition variable uses the mutex we handed it before in the Worker class definition to guarantee only atomic access to it. When the notify_one() function (generally called signal() in other APIs) is called on the condition variable, it will notify the first thread in the queue of threads waiting for the condition variable to return and continue. In the Worker class's run() function we would be waiting for this notification event. Upon receiving it, the worker thread would continue and process the new request. The thread reference will then be removed from the queue until it adds itself again once it is done processing the request.
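The explicit lock() and unlock() calls make the release-before-acquire ordering visible, but they are also easy to get wrong once a function touches several mutexes. The same addRequest() logic can be expressed with RAII scope guards (std::lock_guard), so that each mutex is held only for the duration of a small block. The following is a sketch of that variant, not the book's implementation, and it is meant to replace, not sit alongside, the version above:

// Sketch only: the same "release one lock before taking the next" idea,
// expressed with RAII scope guards instead of manual lock()/unlock() calls.
// Assumes the same includes and static members as the Dispatcher shown above.
void Dispatcher::addRequest(AbstractRequest* request) {
    Worker* worker = nullptr;
    {   // workersMutex is held only inside this block
        lock_guard<mutex> lock(workersMutex);
        if (!workers.empty()) {
            worker = workers.front();
            workers.pop();
        }
    }   // workersMutex released here, before any other lock is taken
    if (worker) {
        worker->setRequest(request);
        condition_variable* cv;
        worker->getCondition(cv);
        cv->notify_one();
    } else {
        lock_guard<mutex> lock(requestsMutex);
        requests.push(request);
    }
}

With the deadlock considerations in mind, the next function to look at is the addWorker() implementation: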
bool Dispatcher::addWorker(Worker* worker) { bool wait = true; requestsMutex.lock(); if (!requests.empty()) { AbstractRequest* request = requests.front(); worker->setRequest(request); requests.pop(); wait = false; requestsMutex.unlock(); } else { requestsMutex.unlock(); workersMutex.lock(); workers.push(worker); workersMutex.unlock(); } return wait; } With this function a worker thread will add itself to the queue once it is done processing a request. It is similar to the earlier function in that the incoming worker is first actively matched with any request which may be waiting in the request queue. If none are available, the worker is added to the worker queue. Important to note here is that we return a boolean value which indicates whether the calling thread should wait for a new request, or whether it already has received a new request while trying to add itself to the queue. While this code is less complex than that of the previous function, it still holds the same potential deadlock issue due to the handling of two mutexes within the same scope. Here, too, we first release the mutex we hold before obtaining the next one. Makefile The Makefile for this dispatcher example is very basic again, gathering all C++ source files in the current folder and compiling them into a binary using g++: GCC := g++ OUTPUT := dispatcher_demo SOURCES := $(wildcard *.cpp) CCFLAGS := -std=c++11 -g3 all: $(OUTPUT) $(OUTPUT): $(GCC) -o $(OUTPUT) $(CCFLAGS) $(SOURCES) clean: rm $(OUTPUT) .PHONY: all Output After compiling the application, running it produces the following output for the fifty total requests: $ ./dispatcher_demo.exe Initialised. Starting processing request 1... Starting processing request 2... Finished request 1 Starting processing request 3... Finished request 3 Starting processing request 6... Finished request 6 Starting processing request 8... Finished request 8 Starting processing request 9... Finished request 9 Finished request 2 Starting processing request 11... Finished request 11 Starting processing request 12... Finished request 12 Starting processing request 13... Finished request 13 Starting processing request 14... Finished request 14 Starting processing request 7... Starting processing request 10... Starting processing request 15... Finished request 7 Finished request 15 Finished request 10 Starting processing request 16... Finished request 16 Starting processing request 17... Starting processing request 18... Starting processing request 0… At this point we we can already clearly see that even with each request taking almost no time to process, the requests are clearly being executed in parallel. The first request (request 0) only starts being processed after the 16th request, while the second request already finishes after the ninth request, long before this. The factors which determine which thread and thus which request is processed first depends on the OS scheduler and hardware-based scheduling. This clearly shows just how few assumptions one can be made about how a multithreaded application will be executed, even on a single platform. Starting processing request 5... Finished request 5 Starting processing request 20... Finished request 18 Finished request 20 Starting processing request 21... Starting processing request 4... Finished request 21 Finished request 4 Here the fourth and fifth requests also finish in a rather delayed fashion. Starting processing request 23... Starting processing request 24... Starting processing request 22... 
Finished request 24 Finished request 23 Finished request 22 Starting processing request 26... Starting processing request 25... Starting processing request 28... Finished request 26 Starting processing request 27... Finished request 28 Finished request 27 Starting processing request 29... Starting processing request 30... Finished request 30 Finished request 29 Finished request 17 Finished request 25 Starting processing request 19... Finished request 0 At this point the first request finally finishes. This may indicate that the initialization time for the first request will always delay it relative to the successive requests. Running the application multiple times can confirm this. It's important that if the order of processing is relevant, that this randomness does not negatively affect one's application. Starting processing request 33... Starting processing request 35... Finished request 33 Finished request 35 Starting processing request 37... Starting processing request 38... Finished request 37 Finished request 38 Starting processing request 39... Starting processing request 40... Starting processing request 36... Starting processing request 31... Finished request 40 Finished request 39 Starting processing request 32... Starting processing request 41... Finished request 32 Finished request 41 Starting processing request 42... Finished request 31 Starting processing request 44... Finished request 36 Finished request 42 Starting processing request 45... Finished request 44 Starting processing request 47... Starting processing request 48... Finished request 48 Starting processing request 43... Finished request 47 Finished request 43 Finished request 19 Starting processing request 34... Finished request 34 Starting processing request 46... Starting processing request 49... Finished request 46 Finished request 49 Finished request 45 Request 19 also became fairly delayed, showing once again just how unpredictable a multithreaded application can be. If we were processing a large data set in parallel here, with chunks of data in each request, we might have to pause at some points to account for these delays as otherwise our output cache might grow too large. As doing so would negatively affect an application's performance, one might have to look at low-level optimizations, as well as the scheduling of threads on specific processor cores in order to prevent this from happening. Stopped workers. Joined threads. Joined threads. Joined threads. Joined threads. Joined threads. Joined threads. Joined threads. Joined threads. Joined threads. Joined threads. Clean-up done. All ten worker threads which were launched in the beginning terminate here as we call the stop() function of the Dispatcher. Sharing data In this article's example we saw how to share information between threads in addition to the synchronizing of threads. This in the form of the requests we passed from the main thread into the dispatcher, from which each request gets passed on to a different thread. The essential idea behind the sharing of data between threads is that the data to be shared exists somewhere in a way which is accessible to two threads or more. After this we have to ensure that only one thread can modify the data, and that the data does not get modified while it's being read. Generally we would use mutexes or similar to ensure this. Using R/W-locks Readers-writer locks are a possible optimization here, because they allow multiple threads to read simultaneously from a single data source. 
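The standard library has offered a ready-made readers-writer lock since C++17 in the form of std::shared_mutex (with std::shared_timed_mutex available since C++14). The sketch below is separate from the dispatcher example and simply shows the usual pattern: shared locks for the readers, an exclusive lock for the writer.

// Sketch (not part of the dispatcher example): protecting a shared map with a
// readers-writer lock. Requires C++17 for std::shared_mutex.
#include <shared_mutex>
#include <mutex>
#include <map>
#include <string>

class SharedConfig {
    mutable std::shared_mutex rwMutex;
    std::map<std::string, std::string> settings;
public:
    std::string get(const std::string& key) const {
        std::shared_lock<std::shared_mutex> lock(rwMutex);   // many readers at once
        auto it = settings.find(key);
        return (it != settings.end()) ? it->second : "";
    }
    void set(const std::string& key, const std::string& value) {
        std::unique_lock<std::shared_mutex> lock(rwMutex);   // exclusive for writers
        settings[key] = value;
    }
};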
If one has an application in which multiple worker threads read the same information repeatedly, it would be more efficient to use read-write locks than basic mutexes, because attempts to read the data will not block other threads. A read-write lock can thus be used as a more advanced version of a mutex, namely one which adapts its behavior to the type of access. Internally it builds on mutexes (or semaphores) and condition variables. Using shared pointers First available via the Boost library and introduced natively with C++11, shared pointers are an abstraction of memory management using reference counting for heap-allocated instances. They are partially thread-safe, in that multiple shared pointer instances referring to the same object can be created and destroyed concurrently, but access to the referenced object itself is not thread-safe. Depending on the application this may suffice, however. To make them properly thread-safe one can use atomics. Summary In this article we looked at how to pass data between threads in a safe manner as part of a fairly complex scheduler implementation. We also looked at the resulting asynchronous processing of said scheduler and considered some potential alternatives and optimizations for passing data between threads. At this point one should be able to safely pass data between threads, as well as synchronize the access to other shared resources. In the next article we will be looking at the native C++ threading API and its primitives.  Resources for Article: Further resources on this subject: Multithreading with Qt [article] Understanding the Dependencies of a C++ Application [article] Boost.Asio C++ Network Programming [article]
article-image-introduction-cyber-extortion
Packt
19 Jun 2017
21 min read
Save for later

Introduction to Cyber Extortion

In this article, Dhanya Thakkar, the author of the book Preventing Digital Extortion, explains how often we make the mistake of relying on the past for predicting the future, and nowhere is this more relevant than in the sphere of the Internet and smart technology. People, processes, data, and things are tightly and increasingly connected, creating new, intelligent networks unlike anything else we have seen before. The growth is exponential and the consequences are far-reaching for individuals, and progressively so for businesses. We are creating the Internet of Things and the Internet of Everything. (For more resources related to this topic, see here.) It has become unimaginable to run a business without using the Internet. It is not only an essential tool for current products and services, but an unfathomable well for innovation and fresh commercial breakthroughs. The transformative revolution is spilling into the public sector, affecting companies as vanguards and diffusing to consumers, who are in a feedback loop with suppliers, constantly obtaining and demanding new goods. Advanced technologies that apply not only to machine-to-machine communication but also to smart sensors generate complex networks to which theoretically anything that can carry a sensor can be connected. Cloud computing and cloud-based applications provide immense yet affordable storage capacity for people and organizations and facilitate the spread of data in more ways than one. Keeping in mind the Internet's nature, the physical boundaries of business become blurred, and virtual data protection must incorporate a new characteristic of security: encryption. In the middle of the storm of the IoT, major opportunities arise, and equally so, unprecedented risks lurk. People often think that what they put on the Internet is protected and closed information. It is hardly so. Sending an e-mail is not like sending a letter in a closed envelope. It is more like sending a postcard, where anyone who gets their hands on it can read what's written on it. Along with people who want to utilize the Internet as an open business platform, there are people who want to find ways of circumventing legal practices and misusing the wealth of data on computer networks by unlawfully gaining financial profits, assets, or authority that can be monetized. Being connected is now critical. As cyberspace grows, so do attempts to violate vulnerable information, and they are gaining global scale. This newly discovered business dynamic is under persistent threat from criminals. Cyberspace, cybercrime, and cybersecurity are perceptibly being found in the same sentence. Cybercrime – under-defined and under-regulated A massive problem encouraging the perseverance and evolution of cybercrime is the lack of an adequate, unanimous definition and the under-regulation on a national, regional, and global level. Nothing is criminal unless stipulated by the law. Global law enforcement agencies, academia, and state policies have studied the constant development of the phenomenon since its first appearance in 1989, in the shape of the AIDS Trojan virus transferred from an infected floppy disk. Regardless of the bizarre beginnings, there is nothing entertaining about cybercrime. It is serious. It is dangerous. Significant efforts are made to define cybercrime on a conceptual level in academic research and in national and regional cybersecurity strategies. Still, as the nature of the phenomenon evolves, so must the definition.
Research reports are still at a descriptive level, and underreporting is a major issue. On the other hand, businesses are more exposed due to ignorance of the fact that modern-day criminals increasingly rely on the Internet to enhance their criminal operations. Case in point: Aaushi Shah and Srinidhi Ravi from the Asian School of Cyber Laws have created a cybercrime list by compiling a set of 74 distinctive and creativelynamed actions emerging in the last three decades that can be interpreted as cybercrime. These actions target anything from e-mails to smartphones, personal computers, and business intranets: piggybacking, joe jobs, and easter eggs may sound like cartoons, but their true nature resembles a crime thriller. The concept of cybercrime Cyberspace is a giant community made out of connected computer users and data on a global level. As a concept, cybercrime involves any criminal act dealing withcomputers andnetworks, including traditional crimes in which the illegal activities are committed through the use of a computer and the Internet. As businesses become more open and widespread, the boundary between data freedom and restriction becomes more porous. Countless e-shopping transactions are made, hospitals keep record of patient histories, students pass exams, and around-the-clock payments are increasingly processed online. It is no wonder that criminals are relentlessly invading cyberspace trying to find a slipping crack. There are no recognizable border controls on the Internet, but a business that wants to evade harm needs to understand cybercrime's nature and apply means to restrict access to certain information. Instead of identifying it as a single phenomenon, Majid Jar proposes a common denominator approach for all ICT-related criminal activities. In his book Cybercrime and Society, Jar refers to Thomas and Loader’s working concept of cybercrime as follows: “Computer-mediated activities which are either illegal or considered illicit by certain parties and which can be conducted through global electronic network.” Jar elaborates the important distinction of this definition by emphasizing the difference between crime and deviance. Criminal activities are explicitly prohibited by formal regulations and bear sanctions, while deviances breach informal social norms. This is a key note to keep in mind. It encompasses the evolving definition of cybercrime, which keeps transforming after resourceful criminals who constantly think of new ways to gain illegal advantages. Law enforcement agencies on a global level make an essential distinction between two subcategories of cybercrime: Advanced cybercrime or high-tech crime Cyber-enabled crime The first subcategory, according to Interpol, includes newly emerged sophisticated attacks against computer hardware and software. On the other hand, the second category contains traditional crimes in modern clothes,for example crimes against children, such as exposing children to illegal content; financial crimes, such as payment card frauds, money laundering, and counterfeiting currency and security documents; social engineering frauds; and even terrorism. We are much beyond the limited impact of the 1989 cybercrime embryo. Intricate networks are created daily. They present new criminal opportunities, causing greater damage to businesses and individuals, and require a global response. 
Cybercrime is conceptualized as a service embracing a commercial component.Cybercriminals work as businessmen who look to sell a product or a service to the highest bidder. Critical attributes of cybercrime An abridged version of the cybercrime concept provides answers to three vital questions: Where are criminal activities committed and what technologies are used? What is the reason behind the violation? Who is the perpetrator of the activities? Where and how – realm Cybercrime can be an online, digitally committed, traditional offense. Even if the component of an online, digital, or virtual existence were not included in its nature, it would still have been considered crime in the traditional, real-world sense of the word. In this sense, as the nature of cybercrime advances, so mustthe spearheads of lawenforcement rely on laws written for the non-digital world to solve problems encountered online. Otherwise, the combat becomesstagnant and futile. Why – motivation The prefix "cyber" sometimes creates additional misperception when applied to the digital world. It is critical to differentiate cybercrime from other malevolent acts in the digital world by considering the reasoning behind the action. This is not only imperative for clarification purposes, but also for extending the definition of cybercrime over time to include previously indeterminate activities. Offenders commit a wide range of dishonest acts for selfish motives such as monetary gain, popularity, or gratification. When the intent behind the behavior is misinterpreted, confusion may arise and actions that should not have been classified as cybercrime could be charged with criminal prosecution. Who –the criminal deed component The action must be attributed to a perpetrator. Depending on the source, certain threats can be translated to the criminal domain only or expanded to endanger potential larger targets, representing an attack to national security or a terrorist attack. Undoubtedly, the concept of cybercrime needs additional refinement, and a comprehensive global definition is in progress. Along with global cybercrime initiatives, national regulators are continually working on implementing laws, policies, and strategies to exemplify cybercrime behaviors and thus strengthen combating efforts. Types of common cyber threats In their endeavors to raise cybercrime awareness, the United Kingdom'sNational Crime Agency (NCA) divided common and popular cybercrime activities by affiliating themwith the target under threat. While both individuals and organizations are targets of cyber criminals, it is the business-consumer networks that suffer irreparable damages due to the magnitude of harmful actions. Cybercrime targeting consumers Phishing The term encompasses behavior where illegitimate e-mails are sent to the receiver to collect security information and personal details Webcam manager A webcam manager is an instance of gross violating behavior in which criminals take over a person's webcam File hijacker Criminals hijack files and hold them "hostage" until the victim pays the demanded ransom Keylogging With keylogging, criminals have the means to record what the text behind the keysyou press on your keyboard is Screenshot manager A screenshot manager enables criminals to take screenshots of an individual’s computer screen Ad clicker Annoying but dangerous ad clickers direct victims’ computer to click on a specific harmful link Cybercrime targeting businesses Hacking Hacking is basically unauthorized access to computer data. 
Hackers inject specialist software with which they try to take administrative control of a computerized network or system. If the attack is successful, the stolen data can be sold on the dark web and compromise people’s integrity and safety by intruding and abusing the privacy of products as well as sensitive personal and business information. Hacking is particularly dangerous when it compromises the operation of systems that manage physical infrastructure, for example, public transportation. Distributed denial of service (DDoS) attacks When an online service is targeted by a DDoS attack, the communication links overflow with data from messages sent simultaneously by botnets. Botnets are a bunch of controlled computers that stop legitimate access to online services for users. The system is unable to provide normal access as it cannot handle the huge volume of incoming traffic. Cybercrime in relation to overall computer crime Many moons have passed since 2001, when the first international treatythat targeted Internet and computer crime—the Budapest Convention on Cybercrime—was adopted. The Convention’s intention was to harmonize national laws, improve investigative techniques, and increase cooperation among nations. It was drafted with the active participation of the Council of Europe's observer states Canada, Japan, South Africa, and the United States and drawn up by the Council of Europe in Strasbourg, France. Brazil and Russia, on the other hand, refused to sign the document on the basis of not being involved in the Convention's preparation. In The Understanding Cybercrime: A Guide to Developing Countries(Gercke, 2011), Marco Gercke makes an excellent final point: “Not all computer-related crimes come under the scope of cybercrime. Cybercrime is a narrower notion than all computer-related crime because it has to include a computer network. On the other hand, computer-related crime in general can also affect stand-alone computer systems.” Although progress has been made, consensus over the definition of cybercrime is not final. Keeping history in mind, a fluid and developing approach must be kept in mind when applying working and legal interpretations. In the end, international noncompliance must be overcome to establish a common and safe ground to tackle persistent threats. Cybercrime localized – what is the risk in your region? Europol’s heat map for the period between 2014 and 2015 reports on the geographical distribution of cybercrime on the basis of the United Nations geoscheme. The data in the report encompassed cyber-dependent crime and cyber-enabled fraud, but it did not include investigations into online child sexual abuse. North and South America Due to its overwhelming presence, it is not a great surprise that the North American region occupies several lead positions concerning cybercrime, both in terms of enabling malicious content and providing residency to victims in the regions that participate in the global cybercrime numbers. The United States hosted between 20% and nearly 40% of the total world's command-and-control servers during 2014. Additionally, the US currently hosts over 45% of the world's phishing domains and is in the pack of world-leading spam producers. Between 16% and 20% percent of all global bots are located in the United States, while almost a third of point-of-sale malware and over 40% of all ransomware incidents were detected there. 
Twenty EU member states have initiated criminal procedures in which the parties under suspicion were located in the United States. In addition, over 70 percent of the countries located in the Single European Payment Area have been subject to losses from skimmed payment cards because of the distinct way in which the US, under certain circumstances, processes card payments without chip-and-PIN technology. There are instances of cybercrime in South America, but the scope of participation by the southern continent is way smaller than that of its northern neighbor, both in industry reporting and in criminal investigations. Ecuador, Guatemala, Bolivia, Peru, and Brazil are constantly rated high on the malware infection scale, and the situation is not changing, while Argentina and Colombia remain among the top 10 spammer countries. Brazil has a critical role in point-of-sale malware, ATM malware, and skimming devices. Europe The key aspect making Europe a region with excellent cybercrime potential is the fast, modern, and reliable ICT infrastructure. According to The Internet Organised Crime Threat Assessment (IOCTA) 2015, Cybercriminals abuse Western European countries to host malicious content and launch attacks inside and outside the continent. EU countries host approximately 13 percent of the global malicious URLs, out of which Netherlands is the leading country, while Germany, the U.K., and Portugal come second, third, and fourth respectively. Germany, the U.K., the Netherlands, France, and Russia are important hosts for bot C&C infrastructure and phishing domains, while Italy, Germany, the Netherlands, Russia, and Spain are among the top sources of global spam. Scandinavian countries and Finland are famous for having the lowest malware infection rates. France, Germany, Italy, and to some extent the U.K. have the highest malware infection rates and the highest proportion of bots found within the EU. However, the findings are presumably the result of the high population of the aforementioned EU countries. A half of the EU member states identified criminal infrastructure or suspects in the Netherlands, Germany, Russia, or the United Kingdom. One third of the European law enforcement agencies confirmed connections to Austria, Belgium, Bulgaria, the Czech Republic, France, Hungary, Italy, Latvia, Poland, Romania, Spain, or Ukraine. Asia China is the United States' counterpart in Asia in terms of the top position concerning reported threats to Internet security. Fifty percent of the EU member states' investigations on cybercrime include offenders based in China. Moreover, certain authorities quote China as the source of one third of all global network attacks. In the company of India and South Korea, China is third among the top-10 countries hosting botnet C&C infrastructure, and it has one of the highest global malware infection rates. India, Indonesia, Malaysia, Taiwan, and Japan host serious bot numbers, too. Japan takes on a significant part both as a source country and as a victim of cybercrime. Apart from being an abundant spam source, Japan is included in the top three Asian countries where EU law enforcement agencies have identified cybercriminals. On the other hand, Japan, along with South Korea and the Philippines, is the most popular country in the East and Southeast region of Asia where organized crime groups run sextortion campaigns. Vietnam, India, and China are the top Asian countries featuring spamming sources. 
Alternatively, China and Hong Kong are the most prominent locations for hosting phishing domains. From another point of view, the country code top-level domains (ccTLDs) for Thailand and Pakistan are commonly used in phishing attacks. In this region, most SEPA members reported losses from the use of skimmed cards. In fact, five (Indonesia, Philippines, South Korea, Vietnam, and Malaysia) out of the top six countries are from this region. Africa Africa remains renowned for combined and sophisticated cybercrime practices. Data from the Europol heat map report indicates that the African region holds a ransomware-as-a-service presence equivalent to the one of the European black market. Cybercriminals from Africa make profits from the same products. Nigeria is on the list of the top 10 countries compiled by the EU law enforcement agents featuring identified cybercrime perpetrators and related infrastructure. In addition, four out of the top five top-level domains used for phishing are of African origin: .cf, .za, .ga, and .ml. Australia and Oceania Australia has two critical cybercrime claims on a global level: First, the country is present in several top-10 charts in the cybersecurity industry, including bot populations, ransomware detection, and network attack originators. Second, the country-code top-level domain for the Palau Islands in Micronesia is massively used by Chinese attackers as the TLD with the second highest proportion of domains used for phishing. Cybercrime in numbers Experts agree that the past couple of years have seen digital extortion flourishing. In 2015 and 2016, cybercrime reached epic proportions. Although there is agreement about the serious rise of the threat, putting each ransomware aspect into numbers is a complex issue. Underreporting is not an issue only in academic research but also in practical case scenarios. The threat to businesses around the world is growing, because businesses keep it quiet. The scope of extortion is obscured because companies avoid reporting and pay the ransom in order to settle the issue in a conducive way. As far as this goes for corporations, it is even more relevant for public enterprises or organizations that provide a public service of any kind. Government bodies, hospitals, transportation companies, and educational institutions are increasingly targeted with digital extortion. Cybercriminals estimate that these targets are likely to pay in order to protect drops in reputation and to enable uninterrupted execution of public services. When CEOs and CIOs keep their mouths shut, relying on reported cybercrime numbers can be a tricky question. The real picture is not only what is visible in the media or via professional networking, but also what remains hidden and is dealt with discreetly by the security experts. In the second quarter of 2015, Intel Security reported an increase in ransomware attacks by 58%. Just in the first 3 months of 2016, cybercriminals amassed $209 million from digital extortion. By making businesses and authorities pay the relatively small average ransom amount of $10,000 per incident, extortionists turn out to make smart business moves. Companies are not shaken to the core by this amount. Furthermore, they choose to pay and get back to business as usual, thus eliminating further financial damages that may arise due to being out of business and losing customers. Extortionists understand the nature of ransom payment and what it means for businesses and institutions. As sound entrepreneurs, they know their market. 
Instead of setting unreasonable skyrocketing prices that may cause major panic and draw severe law enforcement action, they keep it low profile. In this way, they maintain the dark business in flow, moving from one victim to the next and evading legal measures. A peculiar perspective – Cybercrime in absolute and normalized numbers “To get an accurate picture of the security of cyberspace, cybercrime statistics need to be expressed as a proportion of the growing size of the Internet similar to the routine practice of expressing crime as a proportion of a population, i.e., 15 murders per 1,000 people per year.” This statement by Eric Jardine from the Global Commission on Internet Governance (Jardine, 2015) launched a new perspective of cybercrime statistics, one that accounts for the changing nature and size of cyberspace. The approach assumes that viewing cybercrime findings isolated from the rest of the changes in cyberspace provides a distorted view of reality. The report aimed at normalizing crime statistics and thus avoiding negative, realistic cybercrime scenarios that emerge when drawing conclusions from the limited reliability of absolute numbers. In general, there are three ways in which absolute numbers can be misinterpreted: Absolute numbers can negatively distort the real picture, while normalized numbers show whether the situation is getting better Both numbers can show that things are getting better, but normalized numbers will show that the situation is improving more quickly Both numbers can indicate that things are deteriorating, but normalized numbers will indicate that the situation is deteriorating at a slower rate than absolute numbers Additionally, the GCIG (Global Commission on Internet Governance) report includes some excellent reasoning about the nature of empirical research undertaken in the age of the Internet. While almost everyone and anything is connected to the network and data can be easily collected, most of the information is fragmented across numerous private parties. Normally, this entangles the clarity of the findings of cybercrime presence in the digital world. When data is borrowed from multiple resources and missing slots are modified with hypothetical numbers, the end result can be skewed. Keeping in mind this observation, it is crucial to emphasize that the GCIG report measured the size of cyberspace by accounting for eight key aspects: The number of active mobile broadband subscriptions The number of smartphones sold to end users The number of domains and websites The volume of total data flow The volume of mobile data flow The annual number of Google searches The Internet’s contribution to GDP It has been illustrated several times during this introduction that as cyberspace grows, so does cybercrime. To fight the menace, businesses and individuals enhance security measures and put more money into their security budgets. A recent CIGI-Ipsos (Centre for International Governance Innovation - Ipsos) survey collected data from 23,376 Internet users in 24 countries, including Australia, Brazil, Canada, China, Egypt, France, Germany, Great Britain, Hong Kong, India, Indonesia, Italy, Japan, Kenya, Mexico, Nigeria, Pakistan, Poland, South Africa, South Korea, Sweden, Tunisia, Turkey, and the United States. Survey results showed that 64% of users were more concerned about their online privacy compared to the previous year, whereas 78% were concerned about having their banking credentials hacked. 
Additionally, 77% of users were worried about cyber criminals stealing private images and messages. These perceptions led to behavioral changes: 43% of users started avoiding certain sites and applications, some 39% regularly updated passwords, while about 10% used the Internet less (CIGI-Ipsos, 2014). The GCIG report results are indicative of a heterogeneous cybersecurity picture. Although many cybersecurity aspects are deteriorating over time, there are some that are staying constant, and a surprising number are actually improving. Jardine compares cyberspace security to trends in crime rates in a specific country, operationalizing cyber attacks via 13 measures presented in the following table, as seen in Table 2 of Summary Statistics for the Security of Cyberspace (E. Jardine, GCIG Report, p. 6):

Measure | Minimum | Maximum | Mean | Standard Deviation
New Vulnerabilities | 4,814 | 6,787 | 5,749 | 781.880
Malicious Web Domains | 29,927 | 74,000 | 53,317 | 13,769.99
Zero-day Vulnerabilities | 8 | 24 | 14.85714 | 6.336
New Browser Vulnerabilities | 232 | 891 | 513 | 240.570
Mobile Vulnerabilities | 115 | 416 | 217.35 | 120.85
Botnets | 1,900,000 | 9,437,536 | 4,485,843 | 2,724,254
Web-based Attacks | 23,680,646 | 1,432,660,467 | 907,597,833 | 702,817,362
Average per Capita Cost | 188 | 214 | 202.5 | 8.893818078
Organizational Cost | 5,403,644 | 7,240,000 | 6,233,941 | 753,057
Detection and Escalation Costs | 264,280 | 455,304 | 372,272 | 83,331
Response Costs | 1,294,702 | 1,738,761 | 1,511,804 | 152,502.2526
Lost Business Costs | 3,010,000 | 4,592,214 | 3,827,732 | 782,084
Victim Notification Costs | 497,758 | 565,020 | 565,020 | 30,342

While reading the table results, an essential argument must be kept in mind. Statistics for cybercrime costs are not available worldwide. The author worked with the assumption that data about US costs of cybercrime indicates costs on a global level. For obvious reasons, however, this assumption may not be true, and many countries will have had significantly lower costs than the US. To mitigate the assumption's flaws, the author provides comparative levels of those measures. The organizational cost of data breaches in 2013 in the United States was a little less than six million US dollars, while the average number on the global level, drawn from the Ponemon Institute's annual Cost of Data Breach Study (from 2011, 2013, and 2014, via Jardine, p. 7), measured the overall cost of data breaches, including the US ones, as US$2,282,095. The conclusion is that US numbers will distort global cost findings by inflating the real costs, and will work against the paper's suggestion, which is that normalized numbers paint a rosier picture than the one provided by absolute numbers. Summary In this article, we have covered the birth and concept of cybercrime and the challenges law enforcement, academia, and security professionals face when combating its threatening behavior. We also explored the impact of cybercrime by numbers on varied geographical regions, industries, and devices. Resources for Article: Further resources on this subject: Interactive Crime Map Using Flask [article] Web Scraping with Python [article]

article-image-article-movie-recommendation
Packt
16 Jun 2017
14 min read
Save for later

Article: Movie Recommendation

This article is by Robert Layton, author of the book Learning Data Mining with Python - Second Edition, which improves upon the first edition with updated examples, more in-depth discussion, and exercises for your future development with data analytics. In this snippet from the book, we look at movie recommendation with a technique known as Affinity Analysis. (For more resources related to this topic, see here.) Affinity Analysis Affinity Analysis is the task of determining when objects are used in similar ways, rather than whether the objects themselves are similar. The data for Affinity Analysis are often described in the form of a transaction. Intuitively, this comes from a transaction at a store: determining when objects are purchased together is a way to recommend products to users that they might purchase. Other use cases for Affinity Analysis include: Fraud detection Customer segmentation Software optimization Product recommendations Affinity Analysis is usually much more exploratory than classification. At the very least, we often simply rank the results and choose the top 5 recommendations (or some other number), rather than expect the algorithm to give us a specific answer. Algorithms for Affinity Analysis A brute force solution, testing all possible combinations, is not efficient enough for real-world use. We could expect even a small store to have hundreds of items for sale, while many online stores would have thousands (or millions!). As we add more items, the time it takes to compute all rules increases significantly faster. Specifically, the total possible number of rules is 2^n - 1. Even the drastic increase in computing power couldn't possibly keep up with the increases in the number of items stored online. Therefore, we need algorithms that work smarter, as opposed to computers that work harder. The Apriori algorithm addresses the exponential problem of creating sets of items that occur frequently within a database, called frequent itemsets. Once these frequent itemsets are discovered, creating association rules is straightforward. The intuition behind Apriori is both simple and clever. First, we ensure that a rule has sufficient support within the dataset. Defining a minimum support level is the key parameter for Apriori. To build a frequent itemset, for an itemset (A, B) to have a support of at least 30, both A and B must occur at least 30 times in the database. This property extends to larger sets as well. For an itemset (A, B, C, D) to be considered frequent, the set (A, B, C) must also be frequent (as must D). Apriori discovers larger frequent itemsets by building off smaller frequent itemsets. The picture below outlines the full process: The Movie Recommendation Problem Product recommendation is big business. Online stores use it to up-sell to customers by recommending other products that they could buy. Making better recommendations leads to better sales. When online shopping sells to millions of customers every year, there is a lot of potential money to be made by selling more items to these customers. Grouplens, a research group at the University of Minnesota, has released several datasets that are often used for testing algorithms in this area. They have released several versions of a movie rating dataset, which have different sizes. There is a version with 100,000 reviews, one with 1 million reviews and one with 10 million reviews.
The movie recommendation problem

Product recommendation is big business. Online stores use it to up-sell to customers by recommending other products that they could buy. Making better recommendations leads to better sales; when online shopping is selling to millions of customers every year, there is a lot of potential money to be made by selling more items to these customers. Grouplens, a research group at the University of Minnesota, has released several datasets that are often used for testing algorithms in this area. They have released several versions of a movie rating dataset, which have different sizes: there is a version with 100,000 reviews, one with 1 million reviews, and one with 10 million reviews.

The datasets are available from http://grouplens.org/datasets/movielens/ and the dataset we are going to use in this article is the MovieLens 100K dataset (with 100,000 reviews). Download this dataset and unzip it in your data folder. Start a new Jupyter Notebook and type the following code:

import os
import pandas as pd

data_folder = os.path.join(os.path.expanduser("~"), "Data", "ml-100k")
ratings_filename = os.path.join(data_folder, "u.data")

Ensure that ratings_filename points to the u.data file in the unzipped folder.

Loading with pandas

The MovieLens dataset is in good shape; however, there are some changes from the default options of pandas.read_csv that we need to make. When loading the file, we set the delimiter parameter to the tab character, tell pandas not to read the first row as the header (with header=None), and set the column names with given values. Let's look at the following code:

all_ratings = pd.read_csv(ratings_filename, delimiter="\t", header=None,
                          names=["UserID", "MovieID", "Rating", "Datetime"])

While we won't use it in this article, you can properly parse the date timestamp using the following line. Dates for reviews can be an important feature in recommendation prediction, as movies that are rated together often have more similar rankings than movies rated separately. Accounting for this can improve models significantly.

all_ratings["Datetime"] = pd.to_datetime(all_ratings['Datetime'], unit='s')

Understanding the Apriori algorithm and its implementation

The goal of this article is to produce rules of the following form: if a person recommends this set of movies, they will also recommend this movie. We will also discuss extensions where a person who recommends a set of movies is likely to recommend another particular movie. To do this, we first need to determine whether a person recommends a movie. We can do this by creating a new feature, Favorable, which is True if the person gave a favorable review to a movie:

all_ratings["Favorable"] = all_ratings["Rating"] > 3

We will sample our dataset to form training data. This also helps reduce the size of the dataset that will be searched, making the Apriori algorithm run faster. We obtain all reviews from the first 200 users:

ratings = all_ratings[all_ratings['UserID'].isin(range(200))]

Next, we can create a dataset of only the favorable reviews in our sample:

favorable_ratings = ratings[ratings["Favorable"]]

We will be searching the users' favorable reviews for our itemsets. So, the next thing we need is the movies which each user has given a favorable rating. We can compute this by grouping the dataset by UserID and iterating over the movies in each group:

favorable_reviews_by_users = dict((k, frozenset(v.values))
    for k, v in favorable_ratings.groupby("UserID")["MovieID"])

In the preceding code, we stored the values as a frozenset, allowing us to quickly check whether a movie has been rated by a user. Sets are much faster than lists for this type of operation, and we will use them in later code. Finally, we can create a DataFrame that tells us how frequently each movie has been given a favorable review:

num_favorable_by_movie = ratings[["MovieID", "Favorable"]].groupby("MovieID").sum()

We can see the top five movies by running the following code:

num_favorable_by_movie.sort_values(by="Favorable", ascending=False).head()

Implementing the Apriori algorithm

On the first iteration of Apriori, the newly discovered itemsets will have a length of 2, as they will be supersets of the initial itemsets created in the first step.
On the second iteration (after applying the fourth step and going back to step 2), the newly discovered itemsets will have a length of 3. This allows us to quickly identify the newly discovered itemsets, as needed in the second step. We can store our discovered frequent itemsets in a dictionary, where the key is the length of the itemsets. This allows us to quickly access the itemsets of a given length, and therefore the most recently discovered frequent itemsets, with the help of the following code:

frequent_itemsets = {}

We also need to define the minimum support needed for an itemset to be considered frequent. This value is chosen based on the dataset, but try different values to see how that affects the result. I recommend only changing it by 10 percent at a time though, as the time the algorithm takes to run will be significantly different! Let's set a minimum support value:

min_support = 50

To implement the first step of the Apriori algorithm, we create an itemset with each movie individually and test whether the itemset is frequent. We use frozensets, as they allow us to perform faster set-based operations later on, and they can also be used as keys in our counting dictionary (normal sets cannot). Let's look at the following frozenset code:

frequent_itemsets[1] = dict((frozenset((movie_id,)), row["Favorable"])
    for movie_id, row in num_favorable_by_movie.iterrows()
    if row["Favorable"] > min_support)

We implement the second and third steps together for efficiency by creating a function that takes the newly discovered frequent itemsets, creates the supersets, and then tests whether they are frequent. First, we set up the function to perform these steps:

from collections import defaultdict

def find_frequent_itemsets(favorable_reviews_by_users, k_1_itemsets, min_support):
    counts = defaultdict(int)
    for user, reviews in favorable_reviews_by_users.items():
        for itemset in k_1_itemsets:
            if itemset.issubset(reviews):
                for other_reviewed_movie in reviews - itemset:
                    current_superset = itemset | frozenset((other_reviewed_movie,))
                    counts[current_superset] += 1
    return dict([(itemset, frequency) for itemset, frequency in counts.items()
                 if frequency >= min_support])

In keeping with our rule of thumb of reading through the data as little as possible, we iterate over the dataset once per call to this function. While this doesn't matter too much in this implementation (our dataset is relatively small compared to the average computer), a single pass is a good practice to get into for larger applications.

Let's have a look at the core of this function in detail. We iterate through each user and each of the previously discovered itemsets (stored in k_1_itemsets; note that here, k_1 means k-1), and then check whether the itemset is a subset of the current user's set of reviews. This is done by the itemset.issubset(reviews) line. If it is, it means that the user has reviewed each movie in the itemset. We can then go through each individual movie that the user has reviewed (that is not already in the itemset), create a superset by combining the itemset with the new movie, and record that we saw this superset in our counting dictionary. These are the candidate frequent itemsets for this value of k. We end our function by testing which of the candidate itemsets have enough support to be considered frequent, returning only those whose support is at least our min_support value.
This function forms the heart of our Apriori implementation, and we now create a loop that iterates over the steps of the larger algorithm, storing the new itemsets as we increase k from 1 to a maximum value. In this loop, k represents the length of the soon-to-be-discovered frequent itemsets, allowing us to access the previously discovered ones by looking in our frequent_itemsets dictionary using the key k - 1. We create the frequent itemsets and store them in our dictionary by their length. Let's look at the code:

import sys

for k in range(2, 20):
    # Generate candidates of length k, using the frequent itemsets of length k-1
    # Only store the frequent itemsets
    cur_frequent_itemsets = find_frequent_itemsets(favorable_reviews_by_users,
                                                   frequent_itemsets[k-1], min_support)
    if len(cur_frequent_itemsets) == 0:
        print("Did not find any frequent itemsets of length {}".format(k))
        sys.stdout.flush()
        break
    else:
        print("I found {} frequent itemsets of length {}".format(len(cur_frequent_itemsets), k))
        sys.stdout.flush()
        frequent_itemsets[k] = cur_frequent_itemsets

Extracting association rules

After the Apriori algorithm has completed, we have a list of frequent itemsets. These aren't exactly association rules, but they can easily be converted into such rules. For each itemset, we can generate a number of association rules by setting each movie to be the conclusion and the remaining movies as the premise:

candidate_rules = []
for itemset_length, itemset_counts in frequent_itemsets.items():
    for itemset in itemset_counts.keys():
        for conclusion in itemset:
            premise = itemset - set((conclusion,))
            candidate_rules.append((premise, conclusion))

In these rules, the first part is the set of movies forming the premise, while the number after it is the movie given as the conclusion. For example, in the first of these rules, if a reviewer recommends movie 79, they are also likely to recommend movie 258.

The process of computing confidence starts by creating dictionaries to store how many times we see the premise leading to the conclusion (a correct example of the rule) and how many times it doesn't (an incorrect example). We then iterate over all reviews and rules, working out whether the premise of the rule applies and, if it does, whether the conclusion is accurate.

correct_counts = defaultdict(int)
incorrect_counts = defaultdict(int)
for user, reviews in favorable_reviews_by_users.items():
    for candidate_rule in candidate_rules:
        premise, conclusion = candidate_rule
        if premise.issubset(reviews):
            if conclusion in reviews:
                correct_counts[candidate_rule] += 1
            else:
                incorrect_counts[candidate_rule] += 1

We then compute the confidence for each rule by dividing the correct count by the total number of times the rule was seen:

rule_confidence = {candidate_rule:
                   (correct_counts[candidate_rule] /
                    float(correct_counts[candidate_rule] + incorrect_counts[candidate_rule]))
                   for candidate_rule in candidate_rules}

Now we can print the top five rules by sorting this confidence dictionary and printing the results:

from operator import itemgetter

sorted_confidence = sorted(rule_confidence.items(), key=itemgetter(1), reverse=True)
for index in range(5):
    print("Rule #{0}".format(index + 1))
    premise, conclusion = sorted_confidence[index][0]
    print("Rule: If a person recommends {0} they will also recommend {1}".format(premise, conclusion))
    print(" - Confidence: {0:.3f}".format(rule_confidence[(premise, conclusion)]))
    print("")

The resulting printout shows only the movie IDs, which isn't very helpful without the names of the movies as well.
The dataset came with a file called u.item, which stores the movie names and their corresponding MovieID (as well as other information, such as the genre). We can load the titles from this file using pandas. Additional information about the file and categories is available in the README file that came with the dataset. The data in the file is in CSV format, but with the data separated by the | symbol; it has no header, and the encoding is important to set. The column names were found in the README file.

movie_name_filename = os.path.join(data_folder, "u.item")
movie_name_data = pd.read_csv(movie_name_filename, delimiter="|", header=None,
                              encoding="mac-roman")
movie_name_data.columns = ["MovieID", "Title", "Release Date", "Video Release",
                           "IMDB", "<UNK>", "Action", "Adventure", "Animation",
                           "Children's", "Comedy", "Crime", "Documentary", "Drama",
                           "Fantasy", "Film-Noir", "Horror", "Musical", "Mystery",
                           "Romance", "Sci-Fi", "Thriller", "War", "Western"]

Let's also create a helper function for finding the name of a movie by its ID:

def get_movie_name(movie_id):
    title_object = movie_name_data[movie_name_data["MovieID"] == movie_id]["Title"]
    title = title_object.values[0]
    return title

We can now adjust our previous code for printing out the top rules to also include the titles:

for index in range(5):
    print("Rule #{0}".format(index + 1))
    premise, conclusion = sorted_confidence[index][0]
    premise_names = ", ".join(get_movie_name(idx) for idx in premise)
    conclusion_name = get_movie_name(conclusion)
    print("Rule: If a person recommends {0} they will also recommend {1}".format(premise_names, conclusion_name))
    print(" - Confidence: {0:.3f}".format(rule_confidence[(premise, conclusion)]))
    print("")

The result gives recommendations for movies based on previous movies that a person liked. Give it a shot and see if it matches your expectations!

Learning Data Mining with Python

In this short section of Learning Data Mining with Python, Second Edition, we performed Affinity Analysis in order to recommend movies based on a large set of reviewers. We did this in two stages. First, we found frequent itemsets in the data using the Apriori algorithm. Then, we created association rules from those itemsets. We performed training on a subset of our data in order to find the association rules, and then tested those rules on the rest of the data, a testing set. We could extend this concept to use cross-fold validation to better evaluate the rules, which would lead to a more robust evaluation of the quality of each rule.
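The held-out evaluation mentioned above isn't shown in this excerpt. As a rough sketch of how it might look, assuming the all_ratings, candidate_rules, and defaultdict names defined earlier are still in scope, you could recompute confidence on the users outside the training sample; the variable names below are otherwise invented for illustration.

# A sketch of evaluating the candidate rules on held-out users (those not in
# the first 200 used for training). Assumes all_ratings, candidate_rules and
# defaultdict from the earlier code; the other names here are illustrative.
test_ratings = all_ratings[~all_ratings['UserID'].isin(range(200))]
test_favorable = test_ratings[test_ratings["Favorable"]]
test_reviews_by_users = dict((k, frozenset(v.values))
    for k, v in test_favorable.groupby("UserID")["MovieID"])

test_correct = defaultdict(int)
test_incorrect = defaultdict(int)
for user, reviews in test_reviews_by_users.items():
    for candidate_rule in candidate_rules:
        premise, conclusion = candidate_rule
        if premise.issubset(reviews):
            if conclusion in reviews:
                test_correct[candidate_rule] += 1
            else:
                test_incorrect[candidate_rule] += 1

test_confidence = {rule: test_correct[rule] /
                   float(test_correct[rule] + test_incorrect[rule])
                   for rule in candidate_rules
                   if test_correct[rule] + test_incorrect[rule] > 0}

Rules whose confidence holds up on the held-out users are the ones worth trusting; extending this to several folds is a straightforward loop over different user ranges.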
The full book covers topics such as classification, clustering, text analysis, image recognition, TensorFlow, and big data. Each section comes with a practical real-world example, steps through the code in detail, and provides suggestions for you to continue your (machine) learning.

Summary

In this article, we looked at movie recommendation using Affinity Analysis: we found frequent itemsets with the Apriori algorithm and converted them into association rules ranked by confidence.

Resources for Article:

Further resources on this subject:

Expanding Your Data Mining Toolbox [article]
Data mining [article]
Big Data Analysis [article]

Understanding the Puppet Resources

Packt
16 Jun 2017
15 min read
A little learning is a dangerous thing, but a lot of ignorance is just as bad.
—Bob Edwards

In this article by John Arundel, the author of Puppet 4.10 Beginner's Guide - Second Edition, we'll go into the details of packages, files, and services to see how to exploit their power to the full. Along the way, we'll talk about the following topics:

Managing files, directories, and trees
Ownership and permissions
Symbolic links
Installing and uninstalling packages
Specific and latest versions of packages
Installing Ruby gems
Services: hasstatus and pattern
Services: hasrestart, restart, stop, and start

(For more resources related to this topic, see here.)

Files

Puppet can manage files on the server using the file resource, and the following example sets the contents of a file to a particular string using the content attribute (file_hello.pp):

file { '/tmp/hello.txt':
  content => "hello, world\n",
}

Managing whole files

While it's useful to be able to set the contents of a file to a short text string, most files we're likely to want to manage will be too large to include directly in our Puppet manifests. Ideally, we would put a copy of the file in the Puppet repo, and have Puppet simply copy it to the desired place in the filesystem. The source attribute (file_source.pp) does exactly that:

file { '/etc/motd':
  source => '/vagrant/examples/files/motd.txt',
}

To try this example with your Vagrant box, run the following commands:

sudo puppet apply /vagrant/examples/file_source.pp
cat /etc/motd
The best software in the world only sucks. The worst software is significantly worse than that. -Luke Kanies

To run such examples, just apply them using sudo puppet apply as shown in the preceding example. Why do we have to run sudo puppet apply instead of just puppet apply? Puppet has the permissions of the user who runs it, so if Puppet needs to modify a file owned by root, it must be run with root's permissions (which is what sudo does). You will usually run Puppet as root because it needs those permissions to do things such as installing packages and modifying config files owned by root.

The value of the source attribute can be a path to a file on the server, as here, or an HTTP URL, as shown in the following example (file_http.pp):

file { '/tmp/README.md':
  source => 'https://raw.githubusercontent.com/puppetlabs/puppet/master/README.md',
}

Although this is a handy feature, bear in mind that every time you add an external dependency like this to your Puppet manifest, you're adding a potential point of failure. Wherever you can, use a local copy of such a file instead of having Puppet fetch it remotely every time. This particularly applies to software which needs to be built from a tarball downloaded from a website. If possible, download the tarball and serve it from a local web server or file server. If this isn't practical, using a caching proxy server can help save time and bandwidth when you're building a large number of machines.

Ownership

On Unix-like systems, files are associated with an owner, a group, and a set of permissions to read, write, or execute the file.
Since we normally run Puppet with the permissions of the root user (via sudo), the files Puppet manages will be owned by that user:

ls -l /etc/motd
-rw-r--r-- 1 root root 109 Aug 31 04:03 /etc/motd

Often, this is just fine, but if we need the file to belong to another user (for example, if that user needs to be able to write to the file), we can express this by setting the owner attribute (file_owner.pp):

file { '/etc/owned_by_vagrant':
  ensure => present,
  owner  => 'vagrant',
}

Run the following command:

ls -l /etc/owned_by_vagrant
-rw-r--r-- 1 vagrant root 0 Aug 31 04:48 /etc/owned_by_vagrant

You can see that Puppet has created the file and its owner has been set to vagrant. You can also set the group ownership of the file using the group attribute (file_group.pp):

file { '/etc/owned_by_vagrant':
  ensure => present,
  owner  => 'vagrant',
  group  => 'vagrant',
}

Run the following command:

ls -l /etc/owned_by_vagrant
-rw-r--r-- 1 vagrant vagrant 0 Aug 31 04:48 /etc/owned_by_vagrant

This time, we didn't specify either a content or a source attribute for the file, but simply ensure => present. In this case, Puppet will create a file of zero size (useful, for example, if you want to make sure the file exists and is writeable, but doesn't need to have any contents yet).

Permissions

Files on Unix-like systems have an associated mode, which determines access permissions for the file. It governs read, write, and execute permissions for the file's owner, any user in the file's group, and other users. Puppet supports setting permissions on files using the mode attribute. This takes an octal value, with each digit representing the permissions for owner, group, and other, in that order. In the following example, we use the mode attribute to set a mode of 0644 (read and write for owner, read-only for group, read-only for other) on a file (file_mode.pp):

file { '/etc/owned_by_vagrant':
  ensure => present,
  owner  => 'vagrant',
  mode   => '0644',
}

This will be quite familiar to experienced system administrators, as the octal values for file permissions are exactly the same as those understood by the Unix chmod command. For more information, run the man chmod command.

Directories

Creating or managing permissions on a directory is a common task, and Puppet uses the file resource to do this too. If the value of the ensure attribute is directory, the file will be a directory (file_directory.pp):

file { '/etc/config_dir':
  ensure => directory,
}

As with regular files, you can use the owner, group, and mode attributes to control access to directories.

Trees of files

Puppet can copy a single file to the server, but what about a whole directory of files, possibly including subdirectories (known as a file tree)? The recurse attribute will take care of this (file_tree.pp):

file { '/etc/config_dir':
  source  => '/vagrant/examples/files/config_dir',
  recurse => true,
}

Run the following command:

ls /etc/config_dir/
1 2 3

When the recurse attribute is true, Puppet will copy all the files and directories (and their subdirectories) in the source directory (/vagrant/examples/files/config_dir in this example) to the target directory (/etc/config_dir). If the target directory already exists and has files in it, Puppet will not interfere with them, but you can change this behavior using the purge attribute. If purge is true, Puppet will delete any files and directories in the target directory which are not present in the source directory. Use this attribute with care!
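A purge example along these lines is a sketch of my own rather than one of the book's numbered examples; the resource below reuses the same config_dir paths shown above.

# Hypothetical sketch: with both recurse and purge set, /etc/config_dir becomes
# an exact mirror of the source tree, and any stray files in it are removed.
file { '/etc/config_dir':
  ensure  => directory,
  source  => '/vagrant/examples/files/config_dir',
  recurse => true,
  purge   => true,
}

Adding force => true would also allow Puppet to purge subdirectories, which makes the operation even more destructive, so try it on a scratch directory first.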
Symbolic links

Another common requirement for managing files is to create or modify a symbolic link (known as a symlink for short). You can have Puppet do this by setting ensure => link on the file resource and specifying the target attribute (file_symlink.pp):

file { '/etc/this_is_a_link':
  ensure => link,
  target => '/etc/motd',
}

Run the following command:

ls -l /etc/this_is_a_link
lrwxrwxrwx 1 root root 9 Aug 31 05:05 /etc/this_is_a_link -> /etc/motd

Packages

To install a package, use the package resource, and this is all you need to do with most packages. However, the package resource has a few extra features which may be useful.

Uninstalling packages

The ensure attribute normally takes the value installed in order to install a package, but if you specify absent instead, Puppet will remove the package if it happens to be installed. Otherwise, it will take no action. The following example will remove the apparmor package if it's installed (package_remove.pp):

package { 'apparmor':
  ensure => absent,
}

Installing specific versions

If there are multiple versions of a package available to the system's package manager, specifying ensure => installed will cause Puppet to install the default version (usually the latest). But if you need a specific version, you can specify that version string as the value of ensure, and Puppet will install that version (package_version.pp):

package { 'openssl':
  ensure => '1.0.2g-1ubuntu4.2',
}

It's a good idea to specify an exact version whenever you manage packages with Puppet, so that all servers will get the same version of a given package. Otherwise, if you use ensure => installed, they will just get whatever version was current at the time they were built, leading to a situation where different machines have different package versions. When a newer version of the package is released and you decide it's time to upgrade to it, you can update the version string specified in the Puppet manifest and Puppet will upgrade the package everywhere.

Installing the latest version

On the other hand, if you specify ensure => latest for a package, Puppet will make sure that the latest available version is installed every time it runs. When a new version of the package becomes available, it will be installed automatically on the next Puppet run. This is not generally what you want when using a package repository that's not under your control (for example, the main Ubuntu repository). It means that packages will be upgraded at unexpected times, which may break your application (or at least result in unplanned downtime). A better strategy is to tell Puppet to install a specific version that you know works, and test upgrades in a controlled environment before rolling them out to production. If you maintain your own package repository and control the release of new packages to it, ensure => latest can be a useful feature: Puppet will update a package as soon as you push a new version to the repo. If you are relying on upstream repositories, such as the Ubuntu repositories, it's better to tell Puppet to install a specific version and upgrade that as necessary.
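The book's snippets above only show absent and pinned versions; as a sketch of the latest case, the following resource assumes a hypothetical package named myapp published to an internal repository that you control.

# Hypothetical sketch: 'myapp' is assumed to come from an internal repo you
# control, so tracking the latest published version is a deliberate choice.
package { 'myapp':
  ensure => latest,
}

Because every Puppet run re-checks the available version, pushing a new myapp build to the repository is all it takes to roll it out to the whole fleet on the next run.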
Installing Ruby gems

Although the package resource is most often used to install packages using the normal system package manager (in the case of Ubuntu, that's APT), it can install other kinds of packages as well. Library packages for the Ruby programming language are known as gems. Puppet can install Ruby gems for you using the provider => gem attribute (package_gem.pp):

package { 'ruby':
  ensure => installed,
}

package { 'bundler':
  ensure   => installed,
  provider => gem,
}

In the preceding code, bundler is a Ruby gem, and therefore we have to specify provider => gem for this package so that Puppet doesn't think it's a standard system package and try to install it via APT. Since the gem provider is not available unless Ruby is installed, we install the ruby package first, and then the bundler gem.

Installing gems in Puppet's context

Puppet itself is written at least partly in Ruby, and makes use of several Ruby gems. To avoid any conflicts with the version of Ruby and gems which the server might need for other applications, Puppet packages its own version of Ruby and associated gems under the /opt/puppetlabs directory. This means you can install (or remove) whichever version of Ruby you like, and Puppet will not be affected.

However, if you need to install a gem to extend Puppet's capabilities in some way, then doing it with a package resource and provider => gem won't work. That is, the gem will be installed, but only in the system Ruby context, and it won't be visible to Puppet. Fortunately, the puppet_gem provider is available for exactly this purpose. When you use this provider, the gem will be installed in Puppet's context (and, naturally, won't be visible in the system context). The following example demonstrates how to use this provider (package_puppet_gem.pp):

package { 'hiera-eyaml':
  ensure   => installed,
  provider => puppet_gem,
}

To see the gems installed in Puppet's context, use Puppet's own version of the gem command, with the following path:

/opt/puppetlabs/puppet/bin/gem list

Services

Although services are implemented in a number of varied and complicated ways at the operating system level, Puppet does a good job of abstracting away most of this with the service resource, exposing just the two attributes of services which really matter: whether they're running (ensure) and whether they start at boot time (enable). However, you'll occasionally encounter services that don't play well with Puppet, for a variety of reasons. Sometimes, Puppet is unable to detect that the service is already running, and keeps trying to start it. At other times, Puppet may not be able to properly restart the service when a dependent resource changes. There are a few useful attributes for service resources that can help resolve these problems.

The hasstatus attribute

When a service resource has the ensure => running attribute, Puppet needs to be able to check whether the service is, in fact, running. The way it does this depends on the underlying operating system, but on Ubuntu 16 and later, for example, it runs systemctl is-active SERVICE. If the service is packaged to work with systemd, that should be just fine, but in many cases, particularly with older software, it may not respond properly. If you find that Puppet keeps attempting to start the service on every Puppet run, even though the service is running, it may be that Puppet's default service status detection isn't working. In this case, you can specify the hasstatus => false attribute for the service (service_hasstatus.pp):

service { 'ntp':
  ensure    => running,
  enable    => true,
  hasstatus => false,
}

When hasstatus is false, Puppet knows not to try to check the service status using the default system service management command, and instead will look in the process table for a running process that matches the name of the service.
If it finds one, it will infer that the service is running and take no further action.

The pattern attribute

Sometimes, when using hasstatus => false, the service name as defined in Puppet doesn't actually appear in the process table, because the command that provides the service has a different name. If this is the case, you can tell Puppet exactly what to look for using the pattern attribute (service_pattern.pp):

service { 'ntp':
  ensure    => running,
  enable    => true,
  hasstatus => false,
  pattern   => 'ntpd',
}

If hasstatus is false and pattern is specified, Puppet will search for the value of pattern in the process table to determine whether or not the service is running. To find the pattern you need, you can use the ps command to see the list of running processes:

ps ax

The hasrestart and restart attributes

When a service is notified (for example, if a file resource uses the notify attribute to tell the service its config file has changed), Puppet's default behavior is to stop the service, then start it again. This usually works, but many services implement a restart command in their management scripts. If this is available, it's usually a good idea to use it, as it may be faster or safer than stopping and starting the service. Some services take a while to shut down properly when stopped, for example, and Puppet may not wait long enough before trying to restart them, so you can end up with the service not running at all. If you specify hasrestart => true for a service, then Puppet will try to send a restart command to it, using whatever service management command is appropriate (systemctl, for example). The following example shows the use of hasrestart (service_hasrestart.pp):

service { 'ntp':
  ensure     => running,
  enable     => true,
  hasrestart => true,
}

To further complicate things, the default system service restart command may not work, or you may need to take certain special actions when the service is restarted (disabling monitoring notifications, for example). You can specify any restart command you like for the service using the restart attribute (service_custom_restart.pp):

service { 'ntp':
  ensure  => running,
  enable  => true,
  restart => '/bin/echo Restarting >>/tmp/debug.log && systemctl restart ntp',
}

In this example, the restart command writes a message to a log file before restarting the service in the usual way, but it could, of course, do anything you need it to. In the extremely rare event that the service cannot be stopped or started using the default service management command, Puppet also provides the stop and start attributes so that you can specify custom commands to stop and start the service, in just the same way as with the restart attribute. If you need to use either of these, though, it's probably safe to say that you're having a bad day.

Summary

In this article, we explored Puppet's file resource in detail, covering file sources, ownership, permissions, directories, symbolic links, and file trees. You learned how to manage packages by installing specific versions, or the latest version, and how to uninstall packages. We also covered Ruby gems, both in the system context and in Puppet's internal context. We looked at service resources, including the hasstatus, pattern, hasrestart, restart, stop, and start attributes.

Resources for Article:

Further resources on this subject:

My First Puppet Module [article]
Puppet Language and Style [article]
External Tools and the Puppet Ecosystem [article]

Exploring Functions

Packt
16 Jun 2017
12 min read
In this article by Marius Bancila, author of the book Modern C++ Programming Cookbook, we cover the following recipes:

Defaulted and deleted functions
Using lambdas with standard algorithms

(For more resources related to this topic, see here.)

Defaulted and deleted functions

In C++, classes have special members (constructors, destructor, and operators) that may be either implemented by default by the compiler or supplied by the developer. However, the rules for what can be default-implemented are a bit complicated and can lead to problems. On the other hand, developers sometimes want to prevent objects from being copied, moved, or constructed in a particular way. That is possible by implementing different tricks using these special members. The C++11 standard has simplified many of these by allowing functions to be deleted or defaulted in the manner we will see below.

Getting started

For this recipe, you need to know what special member functions are, and what copyable and movable mean.

How to do it...

Use the following syntax to specify how functions should be handled:

To default a function, use =default instead of the function body. Only special class member functions that have defaults can be defaulted.

struct foo
{
  foo() = default;
};

To delete a function, use =delete instead of the function body. Any function, including non-member functions, can be deleted.

struct foo
{
  foo(foo const &) = delete;
};

void func(int) = delete;

Use defaulted and deleted functions to achieve various design goals, such as the following examples:

To implement a class that is not copyable, and implicitly not movable, declare the copy operations as deleted.

class foo_not_copiable
{
public:
  foo_not_copiable() = default;

  foo_not_copiable(foo_not_copiable const &) = delete;
  foo_not_copiable& operator=(foo_not_copiable const&) = delete;
};

To implement a class that is not copyable, but is movable, declare the copy operations as deleted and explicitly implement the move operations (and provide any additional constructors that are needed).

class data_wrapper
{
  Data* data;
public:
  data_wrapper(Data* d = nullptr) : data(d) {}
  ~data_wrapper() { delete data; }

  data_wrapper(data_wrapper const&) = delete;
  data_wrapper& operator=(data_wrapper const &) = delete;

  data_wrapper(data_wrapper&& o) : data(std::move(o.data))
  {
    o.data = nullptr;
  }

  data_wrapper& operator=(data_wrapper&& o)
  {
    if (this != &o)
    {
      delete data;
      data = std::move(o.data);
      o.data = nullptr;
    }
    return *this;
  }
};

To ensure a function is called only with objects of a specific type, and perhaps prevent type promotion, provide deleted overloads for the function (the example below with free functions can also be applied to any class member functions).

template <typename T>
void run(T val) = delete;

void run(long val) {} // can only be called with long integers

How it works...

A class has several special members that can be implemented by default by the compiler. These are the default constructor, copy constructor, move constructor, copy assignment, move assignment, and destructor. If you don't implement them, then the compiler does it, so that instances of a class can be created, moved, copied, and destructed. However, if you explicitly provide one or more, then the compiler will not generate the others, according to the following rules:

If a user-defined constructor exists, the default constructor is not generated by default.

If a user-defined virtual destructor exists, the default constructor is not generated by default.
If a user-defined move constructor or move assignment operator exists, then the copy constructor and copy assignment operator are not generated by default.

If a user-defined copy constructor, move constructor, copy assignment operator, move assignment operator, or destructor exists, then the move constructor and move assignment operator are not generated by default.

If a user-defined copy constructor or destructor exists, then the copy assignment operator is generated by default.

If a user-defined copy assignment operator or destructor exists, then the copy constructor is generated by default.

Note that the last two are deprecated rules and may no longer be supported by your compiler.

Sometimes developers need to provide empty implementations of these special members, or hide them, in order to prevent instances of the class from being constructed in a specific manner. A typical example is a class that is not supposed to be copyable. The classical pattern for this is to provide a default constructor and hide the copy constructor and copy assignment operators. While this works, the explicitly defined default constructor means the class is no longer considered trivial, and therefore no longer a POD type (one that can be constructed with reinterpret_cast). The modern alternative is to use deleted functions, as shown in the previous section.

When the compiler encounters =default in the definition of a function, it will provide the default implementation. The rules for special member functions mentioned earlier still apply. Functions can be declared =default outside the body of a class if and only if they are inlined:

class foo
{
public:
  foo() = default;

  inline foo& operator=(foo const &);
};

inline foo& foo::operator=(foo const &) = default;

When the compiler encounters =delete in the definition of a function, it will prevent the calling of the function. However, the function is still considered during overload resolution, and only if the deleted function is the best match does the compiler generate an error. For example, given the previously defined overloads for the function run(), only calls with long integers are possible. Calls with arguments of any other type, including int, for which an automatic type promotion to long exists, would determine a deleted overload to be considered the best match, and therefore the compiler will generate an error:

run(42);  // error, matches a deleted overload
run(42L); // OK, long integer arguments are allowed

Note that previously declared functions cannot be deleted, as the =delete definition must be the first declaration in a translation unit:

void forward_declared_function();
// ...
void forward_declared_function() = delete; // error

The rule of thumb (also known as The Rule of Five) for class special member functions is: if you explicitly define any of the copy constructor, move constructor, copy assignment, move assignment, or destructor, then you must either explicitly define or default all of them. A sketch illustrating this rule follows.
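As a quick illustration of the Rule of Five (my own sketch, not a recipe from the book), here is a small resource-owning class, with invented names, that explicitly defines all five special members because it has a user-defined destructor.

#include <algorithm>
#include <cstddef>
#include <utility>

// Hypothetical example: because ~buffer() is user-defined, the other four
// special members are defined explicitly as well, following the Rule of Five.
class buffer
{
  std::size_t size_ = 0;
  int*        data_ = nullptr;
public:
  explicit buffer(std::size_t size) : size_(size), data_(new int[size]{}) {}
  ~buffer() { delete[] data_; }

  buffer(buffer const& other) : size_(other.size_), data_(new int[other.size_])
  {
    std::copy(other.data_, other.data_ + size_, data_);
  }

  buffer& operator=(buffer const& other)
  {
    if (this != &other)
    {
      buffer tmp(other); // copy-and-swap keeps assignment exception-safe
      std::swap(size_, tmp.size_);
      std::swap(data_, tmp.data_);
    }
    return *this;
  }

  buffer(buffer&& other) noexcept
    : size_(std::exchange(other.size_, 0)),
      data_(std::exchange(other.data_, nullptr)) {}

  buffer& operator=(buffer&& other) noexcept
  {
    if (this != &other)
    {
      delete[] data_;
      size_ = std::exchange(other.size_, 0);
      data_ = std::exchange(other.data_, nullptr);
    }
    return *this;
  }
};

In practice, holding the array in a std::vector or std::unique_ptr would let you default all five members instead, which is usually the better design.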
Using lambdas with standard algorithms

One of the most important modern features of C++ is lambda expressions, also referred to as lambda functions or simply lambdas. Lambda expressions enable us to define anonymous function objects that can capture variables in scope and be invoked or passed as arguments to functions. Lambdas are useful for many purposes, and in this recipe, we will see how to use them with standard algorithms.

Getting ready

In this recipe, we discuss standard algorithms that take an argument that is a function or predicate applied to the elements they iterate through. You need to know what unary and binary functions are, and what predicates and comparison functions are. You also need to be familiar with function objects, because lambda expressions are syntactic sugar for function objects.

How to do it...

Prefer to use lambda expressions to pass callbacks to standard algorithms instead of functions or function objects:

Define anonymous lambda expressions in the place of the call if you only need to use the lambda in a single place.

auto numbers = std::vector<int>{ 0, 2, -3, 5, -1, 6, 8, -4, 9 };
auto positives = std::count_if(
  std::begin(numbers), std::end(numbers),
  [](int const n) {return n > 0; });

Define a named lambda, that is, one assigned to a variable (usually with the auto specifier for the type), if you need to call the lambda in multiple places.

auto ispositive = [](int const n) {return n > 0; };
auto positives = std::count_if(
  std::begin(numbers), std::end(numbers),
  ispositive);

Use generic lambda expressions if you need lambdas that only differ in their argument types (available since C++14).

auto positives = std::count_if(
  std::begin(numbers), std::end(numbers),
  [](auto const n) {return n > 0; });

How it works...

The non-generic lambda expression shown above takes a constant integer and returns true if it is greater than 0, or false otherwise. The compiler defines an unnamed function object with the call operator having the signature of the lambda expression:

struct __lambda_name__
{
  bool operator()(int const n) const { return n > 0; }
};

The way the unnamed function object is defined by the compiler depends on the way we define the lambda expression, which can capture variables, use the mutable specifier or exception specifications, or have a trailing return type. The __lambda_name__ function object shown earlier is actually a simplification of what the compiler generates, because it also defines a default copy and move constructor, a default destructor, and a deleted assignment operator. It must be well understood that the lambda expression is actually a class. In order to call it, the compiler needs to instantiate an object of the class. The object instantiated from a lambda expression is called a lambda closure.

In the next example, we want to count the number of elements in a range that are greater than or equal to 5 and less than or equal to 10. The lambda expression, in this case, will look like this:

auto numbers = std::vector<int>{ 0, 2, -3, 5, -1, 6, 8, -4, 9 };
auto start{ 5 };
auto end{ 10 };
auto inrange = std::count_if(
  std::begin(numbers), std::end(numbers),
  [start,end](int const n) {return start <= n && n <= end;});

This lambda captures two variables, start and end, by copy (that is, by value). The resulting unnamed function object created by the compiler looks very much like the one we defined above.
With the default and deleted special members mentioned earlier, the class looks like this:

class __lambda_name_2__
{
  int start_;
  int end_;
public:
  explicit __lambda_name_2__(int const start, int const end) :
    start_(start), end_(end)
  {}

  __lambda_name_2__(const __lambda_name_2__&) = default;
  __lambda_name_2__(__lambda_name_2__&&) = default;
  __lambda_name_2__& operator=(const __lambda_name_2__&) = delete;
  ~__lambda_name_2__() = default;

  bool operator() (int const n) const
  {
    return start_ <= n && n <= end_;
  }
};

The lambda expression can capture variables by copy (or value) or by reference, and different combinations of the two are possible. However, it is not possible to capture a variable multiple times, and it is only possible to have & or = at the beginning of the capture list. A lambda can only capture variables from an enclosing function scope. It cannot capture variables with static storage duration (that is, variables declared in namespace scope or with the static or extern specifier). The following list shows the various combinations of lambda capture semantics:

[](){}          Does not capture anything
[&](){}         Captures everything by reference
[=](){}         Captures everything by copy
[&x](){}        Captures only x by reference
[x](){}         Captures only x by copy
[&x...](){}     Captures pack expansion x by reference
[x...](){}      Captures pack expansion x by copy
[&, x](){}      Captures everything by reference, except for x, which is captured by copy
[=, &x](){}     Captures everything by copy, except for x, which is captured by reference
[&, this](){}   Captures everything by reference, except for the pointer this, which is captured by copy (this is always captured by copy)
[x, x](){}      Error: x is captured twice
[&, &x](){}     Error: everything is already captured by reference; cannot specify again to capture x by reference
[=, =x](){}     Error: everything is already captured by copy; cannot specify again to capture x by copy
[&this](){}     Error: the pointer this is always captured by copy
[&, =](){}      Error: cannot capture everything both by copy and by reference

The general form of a lambda expression, as of C++17, looks like this:

[capture-list](params) mutable constexpr exception attr -> ret { body }

All parts shown in this syntax are actually optional, except for the capture list (which can, however, be empty) and the body (which can also be empty). The parameter list can be omitted if no parameters are needed. The return type does not need to be specified, as the compiler can infer it from the type of the returned expression. The mutable specifier (which tells the compiler the lambda can actually modify variables captured by copy), the constexpr specifier (which tells the compiler to generate a constexpr call operator), and the exception specifiers and attributes are all optional. The simplest possible lambda expression is []{}, though it is often written as [](){}.

There's more...

There are cases when lambda expressions only differ in the type of their arguments. In this case, the lambdas can be written in a generic way, just like templates, but using the auto specifier for the type parameters (no template syntax is involved).
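To make that concrete, here is a small sketch of my own (not a recipe from the book) that reuses a single generic lambda with containers of different element types; the data values are invented.

#include <algorithm>
#include <iostream>
#include <vector>

int main()
{
  // One generic lambda: the compiler generates a call operator template,
  // so the same closure works for both int and double elements.
  auto is_positive = [](auto const value) { return value > 0; };

  std::vector<int>    ints{ 0, 2, -3, 5, -1 };
  std::vector<double> doubles{ -1.5, 2.5, 3.0, -0.1 };

  auto positive_ints    = std::count_if(std::begin(ints), std::end(ints), is_positive);
  auto positive_doubles = std::count_if(std::begin(doubles), std::end(doubles), is_positive);

  std::cout << positive_ints << ' ' << positive_doubles << '\n'; // prints 2 2
  return 0;
}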
Summary

Functions are a fundamental concept in programming; regardless of the topic we discuss, we end up writing functions. This article contains recipes related to functions, covering modern language features related to functions and callable objects.

Resources for Article:

Further resources on this subject:

Understanding the Dependencies of a C++ Application [article]
Boost.Asio C++ Network Programming [article]
Application Development in Visual C++ - The Tetris Application [article]


Streaming and the Actor Model – Akka Streams!

Packt
16 Jun 2017
9 min read
In this article by Piyush Mishra, author of the Akka Cookbook, we will learn about streaming and the actor model with Akka Streams. (For more resources related to this topic, see here.)

Akka is a popular toolkit designed to ease the pain of dealing with concurrency and distributed systems. It provides easy APIs to create reactive, fault-tolerant, scalable, and concurrent applications, thanks to the actor model. The actor model was introduced by Carl Hewitt in the 70s, and it has been successfully implemented by different programming languages, frameworks, and toolkits, such as Erlang or Akka.

The concepts around the actor model are simple. All actors are created inside an actor system. Every actor has a unique address within the actor system, a mailbox, a state (in the case of a stateful actor), and a behavior. The only way of interacting with an actor is by sending messages to it using its address. Messages will be stored in the mailbox until the actor is ready to process them. Once it is ready, the actor will pick one message at a time and execute its behavior against the message. At this point, the actor might update its state, create new actors, or send messages to other already-created actors. Akka provides all this and many other features, thanks to the vast ecosystem around the core component, such as Akka Cluster, Akka Cluster Sharding, Akka Persistence, Akka HTTP, and Akka Streams. We will dig a bit more into the latter one.

Streaming frameworks and toolkits have been gaining momentum lately. This is motivated by the massive number of connected devices that are constantly generating new data that needs to be consumed, processed, analyzed, and stored. This is basically the idea of the Internet of Things (IoT), or the newer term Internet of Everything. Some time ago, the Akka team decided that they could build a streaming library leveraging all the power of Akka and the actor model: Akka Streams.

Akka Streams uses Akka actors as its foundation to provide a set of easy APIs to create back-pressured streams. Each stream consists of one or more sources, zero or more flows, and one or more sinks. All these different modules are also known as stages in Akka Streams terminology. The best way to understand how a stream works is to think about it as a graph. Each stage (source, flow, or sink) has zero or more input ports and zero or more output ports. For instance, a source has zero input ports and one output port. A flow has one input port and one output port. And finally, a sink has one input port and zero output ports. To have a runnable stream, we need to ensure that all ports of all our stages are connected. Only then can we run our stream to process some elements.

Akka Streams provides a rich set of predefined stages to cover the most common streaming functions. However, if a use case requires a new custom stage, it is also possible to create it from scratch or extend an existing one. The full list of predefined stages can be found at http://doc.akka.io/docs/akka/current/scala/stream/stages-overview.html.

Now that we know about the different components Akka Streams provides, it is a good moment to introduce the actor materializer. As we mentioned earlier, Akka is the foundation of Akka Streams. This means the code you define in the high-level API is eventually run inside an actor. The actor materializer is the entity responsible for creating these low-level actors. By default, all processing stages get created within the same actor. This means only one element at a time can be processed by your stream. It is also possible to indicate that you want a different actor per stage, and therefore have the possibility to process multiple elements at the same time. You can indicate this to the materializer by calling the async method on the relevant stage. There are also asynchronous predefined stages. For performance reasons, Akka Streams batches messages when pushing them to the next stage to reduce overhead.
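As a small sketch of what that looks like in practice (my own illustration, not a recipe from the book), the async call below marks an asynchronous boundary between the capitalizing flow and the sink, so they run in separate actors; the example data is invented.

import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Flow, Sink, Source}

object AsyncBoundaryExample extends App {
  implicit val actorSystem = ActorSystem()
  implicit val actorMaterializer = ActorMaterializer()

  // .async inserts an asynchronous boundary: the flow and the sink run in
  // separate actors and can work on different elements at the same time.
  val capitalizer = Flow[String].map(_.capitalize)

  Source(List("hello", "from", "akka", "streams!"))
    .via(capitalizer.async)
    .to(Sink.foreach(actorSystem.log.info))
    .run()
}

Ordering is still preserved across an asynchronous boundary; the mapAsync and mapAsyncUnordered examples further down show how to also fan work out to a pool of actors.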
After this quick introduction, let's start putting together some code to create and run a stream. We will use the Scala build tool (famously known as sbt) to retrieve the Akka dependencies and run our code. To begin with, we need a build.sbt file with the following content:

name := "akka-async-streams"

version := "1.0"

scalaVersion := "2.11.7"

libraryDependencies += "com.typesafe.akka" % "akka-actor_2.11" % "2.4.17"
libraryDependencies += "com.typesafe.akka" % "akka-stream_2.11" % "2.4.17"

Once we have the file ready, we need to run sbt update to let sbt fetch the required dependencies. Our first stream will push a list of words, capitalize each of them, and log the resulting values. This can easily be achieved by doing the following:

implicit val actorSystem = ActorSystem()
implicit val actorMaterializer = ActorMaterializer()

val stream = Source(List("hello","from","akka","streams!"))
  .map(_.capitalize)
  .to(Sink.foreach(actorSystem.log.info))

stream.run()

In this small code snippet, we can see how our stream has one source with a list of strings, one flow that capitalizes each string, and finally one sink logging the result. If we run our code, we should see the following output:

[INFO] [default-akka.actor.default-dispatcher-3] [akka.actor.ActorSystemImpl(default)] Hello
[INFO] [default-akka.actor.default-dispatcher-3] [akka.actor.ActorSystemImpl(default)] From
[INFO] [default-akka.actor.default-dispatcher-3] [akka.actor.ActorSystemImpl(default)] Akka
[INFO] [default-akka.actor.default-dispatcher-3] [akka.actor.ActorSystemImpl(default)] Streams!

The execution of this stream happens synchronously and in order. In our next example, we will build the same stream; however, we can see how all stages are modular:

implicit val actorSystem = ActorSystem()
implicit val actorMaterializer = ActorMaterializer()

val source = Source(List("hello","from","akka","streams!"))
val sink = Sink.foreach(actorSystem.log.info)
val capitalizer = Flow[String].map(_.capitalize)

val stream = source.via(capitalizer).to(sink)
stream.run()

In this code snippet, we can see how stages can be treated as immutable modules. We see that we can use the via helper method to provide a flow stage in a stream. This stream still runs synchronously. To run it asynchronously, we can take advantage of the mapAsync flow. For this, let's create a small actor that will do the capitalization for us:

class Capitalizer extends Actor with ActorLogging {
  def receive = {
    case str : String =>
      log.info(s"Capitalizing $str")
      sender ! str.capitalize
  }
}

Once we have our actor defined, we can set up our asynchronous stream. For this, we will create a round-robin pool of capitalizer actors. Then, we will use the ask pattern to send a message to an actor and wait for a response. This happens using the ? operator.
The stream definition will look something like this:

implicit val actorSystem = ActorSystem()
implicit val actorMaterializer = ActorMaterializer()
implicit val askTimeout = Timeout(5 seconds)

val capitalizer = actorSystem.actorOf(Props[Capitalizer].withRouter(RoundRobinPool(10)))

val source = Source(List("hello","from","akka","streams!"))
val sink = Sink.foreach(actorSystem.log.info)
val flow = Flow[String].mapAsync(parallelism = 5)(elem => (capitalizer ? elem).mapTo[String])

val stream = source.via(flow).to(sink)
stream.run()

If we execute this small piece of code, we can see something similar to the following:

[INFO] [default-akka.actor.default-dispatcher-16] [akka://default/user/$a/$a] Capitalizing hello
[INFO] [default-akka.actor.default-dispatcher-15] [akka://default/user/$a/$b] Capitalizing from
[INFO] [default-akka.actor.default-dispatcher-6] [akka://default/user/$a/$c] Capitalizing akka
[INFO] [default-akka.actor.default-dispatcher-14] [akka://default/user/$a/$d] Capitalizing streams!
[INFO] [default-akka.actor.default-dispatcher-14] [akka.actor.ActorSystemImpl(default)] Hello
[INFO] [default-akka.actor.default-dispatcher-14] [akka.actor.ActorSystemImpl(default)] From
[INFO] [default-akka.actor.default-dispatcher-14] [akka.actor.ActorSystemImpl(default)] Akka
[INFO] [default-akka.actor.default-dispatcher-14] [akka.actor.ActorSystemImpl(default)] Streams!

We can see how each word is processed by a different capitalizer actor ($a/$b/$c/$d) and by different threads (default-dispatcher 16, 15, 6, and 14). Even though these executions happen asynchronously in the pool of actors, the stream still maintains the order of the elements. If we do not need to maintain order and we are looking for a faster approach, where an element can be pushed to the next stage in the stream as soon as it is ready, we can use mapAsyncUnordered:

implicit val actorSystem = ActorSystem()
implicit val actorMaterializer = ActorMaterializer()
implicit val askTimeout = Timeout(5 seconds)

val capitalizer = actorSystem.actorOf(Props[Capitalizer].withRouter(RoundRobinPool(10)))

val source = Source(List("hello","from","akka","streams!"))
val sink = Sink.foreach(actorSystem.log.info)
val flow = Flow[String].mapAsyncUnordered(parallelism = 5)(elem => (capitalizer ? elem).mapTo[String])

val stream = source.via(flow).to(sink)
stream.run()

When running this code, we can see that the order is not preserved and the capitalized words arrive at the sink in a different order every time we execute our code. Consider the following example:

[INFO] [default-akka.actor.default-dispatcher-10] [akka://default/user/$a/$b] Capitalizing from
[INFO] [default-akka.actor.default-dispatcher-4] [akka://default/user/$a/$d] Capitalizing streams!
[INFO] [default-akka.actor.default-dispatcher-13] [akka://default/user/$a/$c] Capitalizing akka
[INFO] [default-akka.actor.default-dispatcher-14] [akka://default/user/$a/$a] Capitalizing hello
[INFO] [default-akka.actor.default-dispatcher-12] [akka.actor.ActorSystemImpl(default)] Akka
[INFO] [default-akka.actor.default-dispatcher-12] [akka.actor.ActorSystemImpl(default)] From
[INFO] [default-akka.actor.default-dispatcher-12] [akka.actor.ActorSystemImpl(default)] Hello
[INFO] [default-akka.actor.default-dispatcher-12] [akka.actor.ActorSystemImpl(default)] Streams!

Akka Streams also provides a graph DSL to define your stream.
In this DSL, it is possible to connect stages just by using the ~> operator:

implicit val actorSystem = ActorSystem()
implicit val actorMaterializer = ActorMaterializer()
implicit val askTimeout = Timeout(5 seconds)

val capitalizer = actorSystem.actorOf(Props[Capitalizer].withRouter(RoundRobinPool(10)))

val graph = RunnableGraph.fromGraph(GraphDSL.create() { implicit b =>
  import GraphDSL.Implicits._

  val source = Source(List("hello","from","akka","streams!"))
  val sink = Sink.foreach(actorSystem.log.info)
  val flow = Flow[String].mapAsyncUnordered(parallelism = 5)(elem => (capitalizer ? elem).mapTo[String])

  source ~> flow ~> sink

  ClosedShape
})

graph.run()

These code snippets show only a few of the many features available in the Akka Streams framework. Actors can be seamlessly integrated with streams, which brings a whole new set of possibilities for processing things in a streaming fashion. We have seen how we can preserve or relax the ordering of elements, either synchronously or asynchronously. In addition, we saw how to use the graph DSL to define our stream.

Summary

In this article, we covered the concept of the actor model and the core components of Akka. We also described the stages in Akka Streams and created example code for streams. If you want to learn more about Akka, Akka Streams, and all the other modules around them, you can find useful and handy recipes like these in the Akka Cookbook at https://www.packtpub.com/application-development/akka-cookbook.

Resources for Article:

Further resources on this subject:

Creating First Akka Application [article]
Working with Entities in Google Web Toolkit 2 [article]
Skinner's Toolkit for Plone 3 Theming (Part 1) [article]