AngularJS Performance

Packt
04 Mar 2015
20 min read
In this article by Chandermani, the author of AngularJS by Example, we focus our discussion on the performance aspect of AngularJS. For most scenarios, we can all agree that AngularJS is insanely fast. For standard-size views, we rarely see any performance bottlenecks. But many views start small and then grow over time, and sometimes the requirements dictate that we build large pages/views with a sizable amount of HTML and data. In such cases, there are things we need to keep in mind to provide an optimal user experience.

Any performance discussion about a framework requires an understanding of how the framework works internally. When it comes to Angular, we need to understand how Angular detects model changes. What are watches? What is a digest cycle? What roles do scope objects play? Without a conceptual understanding of these subjects, any performance guidance is merely a checklist that we follow without understanding the why part.

Let's look at some pointers before we begin our discussion on the performance of AngularJS:

- The live binding between the view elements and model data is set up using watches. When a model changes, one or many watches linked to the model are triggered. Angular's view binding infrastructure uses these watches to synchronize the view with the updated model value.
- Model change detection only happens when a digest cycle is triggered. Angular does not track model changes in real time; instead, on every digest cycle, it runs through every watch to compare the previous and new values of the model to detect changes.
- A digest cycle is triggered when $scope.$apply is invoked. A number of directives and services internally invoke $scope.$apply:
  - Directives such as ng-click and ng-mouse* do it on user action
  - Services such as $http and $resource do it when a response is received from the server
  - $timeout and $interval call $scope.$apply when they lapse
- A digest cycle tracks the old value of the watched expression and compares it with the new value to detect whether the model has changed. Simply put, the digest cycle is a workflow used to detect model changes.
- A digest cycle runs multiple times till the model data is stable and no watch is triggered.

Once you have a clear understanding of the digest cycle, watches, and scopes, we can look at some performance guidelines that can help us manage views as they start to grow.

Performance guidelines

When building any Angular app, performance optimization boils down to:

- Minimizing the number of binding expressions and hence watches
- Making sure that binding expression evaluation is quick
- Optimizing the number of digest cycles that take place

The next few sections provide some useful pointers in this direction. Remember, many of these optimizations may only be necessary if the view is large.

Keeping the page/view small

The sanest advice is to keep the amount of content on a page small. The user cannot interact with or process too much data on a page, so remember that screen real estate is at a premium and keep only the necessary details on a page. The less content there is, the fewer binding expressions there are; hence, fewer watches and less processing are required during the digest cycle. Remember, each watch adds to the overall execution time of the digest cycle. The time required for a single watch can be insignificant, but after combining hundreds and maybe thousands of them, they start to matter.
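To make that cost concrete, here is a minimal sketch (the module, controller, and property names are illustrative, not from the original article) of the kind of watch that every binding expression registers behind the scenes. On every digest cycle, Angular re-evaluates each such expression and dirty-checks the result against the previously stored value:

    // Roughly what a {{user.name}} binding sets up behind the scenes.
    // The watch expression is re-evaluated on every digest cycle; the
    // listener runs only when the dirty check detects a changed value.
    angular.module('demoApp', []).controller('UserCtrl', function ($scope) {
      $scope.user = { name: 'Alice' };

      $scope.$watch('user.name', function (newValue, oldValue) {
        if (newValue !== oldValue) {
          // Angular's binding infrastructure would update the DOM here.
        }
      });
    });

Every additional binding on the page adds one more entry to the list of watches that each digest cycle has to walk.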
Angular's data binding infrastructure is insanely fast and relies on a rudimentary dirty check that compares the old and the new values. Check out the Stack Overflow post (http://stackoverflow.com/questions/9682092/databinding-in-angularjs) where Misko Hevery (creator of Angular) talks about how data binding works in Angular.

Data binding also adds to the memory footprint of the application. Each watch has to track the current and previous value of a data-binding expression to compare and verify whether the data has changed.

Keeping a page/view small may not always be possible, and the view may grow. In such a case, we need to make sure that the number of bindings does not grow exponentially (linear growth is OK) with the page size. The next two tips can help minimize the number of bindings on the page and should be seriously considered for large views.

Optimizing watches for read-once data

In any Angular view, there is always content that, once bound, does not change. Any read-only data on the view falls into this category. This implies that once the data is bound to the view, we no longer need watches to track model changes, as we don't expect the model to update.

Is it possible to remove the watch after one-time binding? Angular itself does not have anything built in, but the community project bindonce (https://github.com/Pasvaz/bindonce) fills this gap. Angular 1.3 has added support for bind-and-forget in the native framework: using the syntax {{::title}}, we can achieve one-time binding. If you are on Angular 1.3, use it!

Hiding (ng-show) versus conditional rendering (ng-if/ng-switch) content

You have learned two ways to conditionally render content in Angular. The ng-show/ng-hide directives show/hide the DOM element based on the expression provided, whereas ng-if/ng-switch creates and destroys the DOM based on an expression. For some scenarios, ng-if can be really beneficial, as it can reduce the number of binding expressions/watches for DOM content that is not rendered. Consider the following example:

    <div ng-if='user.isAdmin'>
      <div ng-include="'admin-panel.html'"></div>
    </div>

The snippet renders an admin panel only if the user is an admin. With ng-if, if the user is not an admin, the ng-include directive template is neither requested nor rendered, saving us all the bindings and watches that are part of the admin-panel.html view.

From the preceding discussion, it may seem that we should get rid of all ng-show/ng-hide directives and use ng-if. Well, not really! It again depends; for small pages, ng-show/ng-hide works just fine. Also, remember that there is a cost to creating and destroying the DOM. If the expression to show/hide flips too often, this means many DOM create-and-destroy cycles, which are detrimental to the overall performance of the app.

Expressions being watched should not be slow

Since watches are evaluated very often, the expression being watched should return its result fast. The first way we can make sure of this is by using properties instead of functions in binding expressions. These expressions:

    {{user.name}}
    ng-show='user.Authorized'

are always better than these:

    {{getUserName()}}
    ng-show='isUserAuthorized(user)'

Try to minimize function expressions in bindings. If a function expression is required, make sure that the function returns a result quickly.
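As a hedged illustration of the same point (the module, controller, and service names here are hypothetical, not part of the article's code), the usual fix is to compute the value once in the controller and expose it as a plain scope property, instead of calling a function from the binding on every digest cycle:

    // Compute once (or whenever the underlying data actually changes),
    // then bind the view to the resulting property.
    angular.module('authDemo', []).controller('AuthCtrl', function ($scope, authService) {
      // authService and its methods are illustrative dependencies.
      $scope.user = authService.getCurrentUser();
      $scope.isAuthorized = authService.isUserAuthorized($scope.user);
    });

The view can then use the cheap property form shown above, for example ng-show='isAuthorized', and each digest cycle only dirty-checks a property read.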
Make sure a function being watched does not:

- Make any remote calls
- Use $timeout/$interval
- Perform sorting/filtering
- Perform DOM manipulation (this can happen inside a directive implementation)
- Perform any other time-consuming operation

Be sure to avoid such operations inside a bound function. To reiterate, Angular evaluates a watched expression multiple times during every digest cycle just to know whether the return value (a model) has changed and the view needs to be synchronized.

Minimizing the deep model watch

When using $scope.$watch to watch for model changes in controllers, be careful while setting the third $watch function parameter to true. The general syntax of a watch looks like this:

    $watch(watchExpression, listener, [objectEquality]);

In the standard scenario, Angular does an object comparison based on the reference only. But if objectEquality is true, Angular does a deep comparison between the last value and the new value of the watched expression. This can have an adverse memory and performance impact if the object is large.

Handling large datasets with ng-repeat

The ng-repeat directive is undoubtedly the most useful directive Angular has. But it can also cause the most performance-related headaches. The reason is not the directive's design, but the fact that it is the only directive that allows us to generate HTML on the fly. There is always the possibility of generating enormous HTML just by binding ng-repeat to a big model list. Some tips that can help us when working with ng-repeat are:

- Page data and use limitTo: Implement a server-side paging mechanism when the number of items returned is large. Also use the limitTo filter to limit the number of items rendered. Its syntax is as follows:

    <tr ng-repeat="user in users | limitTo:pageSize">…</tr>

  Look at modules such as ngInfiniteScroll (http://binarymuse.github.io/ngInfiniteScroll/) that provide an alternative mechanism to render large lists.

- Use the track by expression: For performance, the ng-repeat directive tries to make sure it does not unnecessarily create or delete HTML nodes when items are added, updated, deleted, or moved in the list. To achieve this, it adds a $$hashKey property to every model item, allowing it to associate the DOM node with the model item. We can override this behavior and provide our own item key using the track by expression, such as:

    <tr ng-repeat="user in users track by user.id">…</tr>

  This allows us to use our own mechanism to identify an item. Using your own track by expression has a distinct advantage over the default hash key approach. Consider an example where you make an initial AJAX call to get users:

    $scope.getUsers().then(function(users){
        $scope.users = users;
    });

  Later, you refresh the data from the server and do something similar again:

    $scope.users = users;

  With user.id as the key, Angular is able to determine which elements were added, deleted, or moved, and it creates or deletes DOM nodes only for those elements. The remaining elements are not touched by ng-repeat (their internal bindings are still evaluated). This saves a lot of CPU cycles for the browser, as fewer DOM elements are created and destroyed.

- Do not bind ng-repeat to a function expression: Using a function's return value for ng-repeat can also be problematic, depending upon how the function is implemented.
  Consider a repeat like this:

    <tr ng-repeat="user in getUsers()">…</tr>

  And consider a controller getUsers function like this:

    $scope.getUsers = function() {
        var orderBy = $filter('orderBy');
        return orderBy($scope.users, predicate);
    };

  Angular is going to evaluate this expression, and hence call this function, every time a digest cycle takes place. A lot of CPU cycles are wasted sorting the user data again and again. It is better to use scope properties and presort the data before binding.

- Minimize filters in views; filter in the controller instead: Filters defined on ng-repeat are also evaluated every time a digest cycle takes place. For large lists, if the same filtering can be implemented in the controller, we can avoid constant filter evaluation. This holds true for any filter function that is used with arrays, including filter and orderBy.

Avoiding mouse-movement tracking events

The ng-mousemove, ng-mouseenter, ng-mouseleave, and ng-mouseover directives can simply kill performance. If an expression is attached to any of these event directives, Angular triggers a digest cycle every time the corresponding event occurs, and for events such as mouse move, this can be a lot. We have already seen this behavior when working with 7 Minute Workout, when we tried to show a pause overlay on the exercise image when the mouse hovers over it. Avoid them at all costs. If we just want to trigger some style changes on mouse events, CSS is a better tool.

Avoiding calling $scope.$apply

Angular is smart enough to call $scope.$apply at appropriate times without us explicitly calling it. This can be confirmed by the fact that the only place we have seen and used $scope.$apply is within directives. The ng-click and updateOnBlur directives use $scope.$apply to transition from a DOM event handler execution to an Angular execution context. Even when wrapping a jQuery plugin, we may need to do a similar transition for an event raised by the jQuery plugin. Other than this, there is no reason to use $scope.$apply. Remember, every invocation of $apply results in the execution of a complete digest cycle.

The $timeout and $interval services take a Boolean argument, invokeApply. If it is set to false, the lapsed $timeout/$interval service does not call $scope.$apply or trigger a digest cycle. Therefore, if you are going to perform background operations that do not require $scope and the view to be updated, set the last argument to false.

Always use Angular wrappers over standard JavaScript objects/functions, such as $timeout and $interval, to avoid manually calling $scope.$apply. These wrapper functions internally call $scope.$apply.

Also, understand the difference between $scope.$apply and $scope.$digest. $scope.$apply triggers $rootScope.$digest, which evaluates all application watches, whereas $scope.$digest only performs dirty checks on the current scope and its children. If we are sure that the model changes are not going to affect anything other than the child scopes, we can use $scope.$digest instead of $scope.$apply.

Lazy-loading, minification, and creating multiple SPAs

I hope you are not assuming that the apps we have built will continue to use the numerous small script files that we created to separate modules and module artefacts (controllers, directives, filters, and services). Any modern build system has the capability to concatenate and minify these files and replace the original file reference with a unified and minified version.
Therefore, like any JavaScript library, use minified script files in production.

The problem with the Angular bootstrapping process is that it expects all Angular application scripts to be loaded before the application bootstraps. We cannot load modules, controllers, or, in fact, any other Angular construct on demand. This means we need to provide every artefact required by our app upfront. For small applications, this is not a problem, as the content is concatenated and minified; also, Angular application code itself is far more compact than that of traditional JavaScript or jQuery-based apps. But as the size of the application grows, it may start to hurt when we need to load everything upfront.

There are at least two possible solutions to this problem; the first is to break our application into multiple SPAs.

Breaking applications into multiple SPAs

This advice may seem counterintuitive, as the whole point of SPAs is to get rid of full page loads. By creating multiple SPAs, we break the app into multiple small SPAs, each supporting part of the overall app functionality. When we say app, we mean the combination of the main page (such as index.html) with ng-app, and all the scripts/libraries and partial views that the app loads over time. For example, we can break the Personal Trainer application into a Workout Builder app and a Workout Runner app. Both have their own start-up pages and scripts. Common scripts, such as the Angular framework scripts and any third-party libraries, can be referenced in both applications. Similarly, common controllers, directives, services, and filters can also be referenced in both apps. The way we have designed Personal Trainer makes it easy to achieve this objective; the segregation of what belongs where has already been done.

The advantage of breaking an app into multiple SPAs is that only the scripts relevant to each app are loaded. For a small app, this may be overkill, but for large apps it can improve performance. The challenge with this approach is identifying which parts of an application can be created as independent SPAs; it depends entirely on the usage pattern of the application.

For example, assume an application has an admin module and an end consumer/user module. Creating two SPAs, one for the admin and the other for the end customer, is a great way to keep user-specific features and admin-specific features separate. A standard user may never transition to the admin section/area, whereas an admin user can still work in both areas; however, transitioning from the admin area to a user-specific area will require a full page refresh.

If breaking the application into multiple SPAs is not possible, the other option is to lazy load modules.

Lazy-loading modules

Lazy-loading modules, or loading modules on demand, is a viable option for large Angular apps. Unfortunately, Angular itself does not have any built-in support for lazy-loading modules. Furthermore, the additional complexity of lazy loading may be unwarranted, as Angular produces far less code compared to other JavaScript framework implementations. Also, once we gzip and minify the code, the amount of code transferred over the wire is minimal.
If we still want to try our hand at lazy loading, there are two libraries that can help:

- ocLazyLoad (https://github.com/ocombe/ocLazyLoad): a library that uses script.js to load modules on the fly
- angularAMD (http://marcoslin.github.io/angularAMD): a library that uses require.js to lazy load modules

With lazy loading in place, we can delay the loading of a controller, directive, filter, or service script until the page that requires it is loaded. The overall concept of lazy loading seems great, but I'm still not sold on the idea. Before we adopt a lazy-load solution, there are things that we need to evaluate:

- Loading multiple script files lazily: When scripts are concatenated and minified, we load the complete app at once. Contrast this with lazy loading, where we do not concatenate but load scripts on demand. What we gain in terms of lazy-load flexibility we lose in terms of performance, because we now have to make a number of network requests to load individual files. Given these facts, the ideal approach is to combine lazy loading with concatenation and minification. In this approach, we identify the feature modules that can be concatenated and minified together and served on demand using lazy loading. For example, Personal Trainer scripts can be divided into three categories:
  - The common app modules: any script with common code used across the app; these can be combined together and loaded upfront
  - The Workout Runner module(s): scripts that support workout execution; these can be concatenated and minified together but loaded only when the Workout Runner pages are loaded
  - The Workout Builder module(s): similarly, scripts that support workout building; these can be combined together and served only when the Workout Builder pages are loaded

  As we can see, a decent amount of effort is required to refactor the app in a manner that makes module segregation, concatenation, and lazy loading possible.

- The effect on unit and integration testing: We also need to evaluate the effect of lazy-loading modules on unit and integration testing. The way we test is also affected with lazy loading in place. This implies that, if lazy loading is added as an afterthought, the test setup may require tweaking to make sure existing tests still run.

Given these facts, we should evaluate our options and check whether we really need lazy loading or whether we can manage by breaking a monolithic SPA into multiple smaller SPAs.

Caching remote data wherever appropriate

Caching data is one of the oldest tricks for improving the performance of any webpage/application. Analyze your GET requests and determine what data can be cached. Once such data is identified, it can be cached at a number of locations. Data can be cached outside the app in:

- Servers: The server can cache repeated GET requests to resources that do not change very often. This whole process is transparent to the client, and the implementation depends on the server stack used.
- Browsers: In this case, the browser caches the response. Browser caching depends upon the server sending HTTP cache headers such as ETag and cache-control to guide the browser about how long a particular resource can be cached. Browsers can honor these cache headers and cache data appropriately for future use.
If server and browser caching is not available, or if we also want to incorporate some caching in the client app, we have a few choices:

- Cache data in memory: A simple Angular service can cache the HTTP response in memory. Since an Angular app is an SPA, the data is not lost unless the page refreshes. This is how a service function looks when it caches data:

    var workouts;
    service.getWorkouts = function () {
        if (workouts) return $q.resolve(workouts);
        return $http.get("/workouts").then(function (response) {
            workouts = response.data;
            return workouts;
        });
    };

  The implementation caches the list of workouts in the workouts variable for future use. The first request makes an HTTP call to retrieve the data, but subsequent requests just return the cached data wrapped in a promise. The usage of $q.resolve makes sure that the function always returns a promise.

- Angular $http cache: Angular's $http service comes with a configuration option, cache. When set to true, $http caches the response of the particular GET request in a local cache (again, an in-memory cache). Here is how we cache a GET request:

    $http.get(url, { cache: true });

  Angular keeps this cache for the lifetime of the app, and clearing it is not easy. We need to get hold of the cache dedicated to caching HTTP responses and clear the cache key manually.

The caching strategy of an application is never complete without a cache invalidation strategy. With caching, there is always a possibility that the cache gets out of sync with the actual data store. We cannot affect the server-side caching behavior from the client, so let's focus on how to perform cache invalidation (clearing) for the two client-side caching mechanisms described earlier. If we use the first approach to cache data, we are responsible for clearing the cache ourselves. In the case of the second approach, the default $http service does not support clearing the cache. We either need to get hold of the underlying $http cache store and clear the cache key manually (as shown here) or implement our own cache that manages cache data and invalidates it based on some criteria:

    var cache = $cacheFactory.get('$http');
    cache.remove("http://myserver/workouts"); //full url

Using Batarang to measure performance

Batarang (a Chrome extension), as we have already seen, is an extremely handy tool for Angular applications. Using Batarang to visualize app usage is like looking at an X-ray of the app. It allows us to:

- View the scope data, the scope hierarchy, and how the scopes are linked to HTML elements
- Evaluate the performance of the application
- Check the application dependency graph, helping us understand how components are linked to each other and to other framework components

If we enable Batarang and then play around with our application, Batarang captures performance metrics for all watched expressions in the app. This data is nicely presented as a graph on the Performance tab inside Batarang. That is pretty sweet! When building an app, use Batarang to gauge the most expensive watches and take corrective measures, if required. Play around with Batarang and see what other features it has. This is a very handy tool for Angular applications.

This brings us to the end of the performance guidelines that we wanted to share in this article. Some of these guidelines are preventive measures that we should take to make sure we get optimal app performance, whereas others are there to help when the performance is not up to the mark.
Summary

In this article, we looked at the ever-so-important topic of performance, where you learned ways to optimize the performance of an Angular app.
MapReduce functions

Packt
03 Mar 2015
11 min read
In this article, by John Zablocki, author of the book Couchbase Essentials, you will become acquainted with MapReduce and how to use it to create secondary indexes for your documents. At its simplest, MapReduce is a programming pattern used to process large amounts of data that is typically distributed across several nodes in parallel. In the NoSQL world, MapReduce implementations may be found on many platforms, from MongoDB to Hadoop, and of course, Couchbase.

Even if you're new to the NoSQL landscape, it's quite possible that you've already worked with a form of MapReduce. The inspiration for MapReduce in distributed NoSQL systems was drawn from the functional programming concepts of map and reduce. While purely functional programming languages haven't quite reached mainstream status, languages such as Python, C#, and JavaScript all support map and reduce operations.

Map functions

Consider the following Python snippet:

    numbers = [1, 2, 3, 4, 5]
    doubled = map(lambda n: n * 2, numbers)
    #doubled == [2, 4, 6, 8, 10]

These two lines of code demonstrate a very simple use of a map() function. In the first line, the numbers variable is created as a list of integers. The second line applies a function to the list to create a new mapped list. In this case, the map() function is supplied as a Python lambda, which is just an inline, unnamed function. The body of the lambda multiplies each number by two.

This map() function can be made slightly more complex by doubling only odd numbers, as shown in this code:

    numbers = [1, 2, 3, 4, 5]

    def double_odd(num):
        if num % 2 == 0:
            return num
        else:
            return num * 2

    doubled = map(double_odd, numbers)
    #doubled == [2, 2, 6, 4, 10]

Map functions are implemented differently in each language or platform that supports them, but all follow the same pattern. An iterable collection of objects is passed to a map function. Each item of the collection is iterated over, with the map function applied to it. The final result is a new collection where each of the original items has been transformed by the map.

Reduce functions

Like maps, reduce functions also work by applying a provided function to an iterable data structure. The key difference between the two is that the reduce function works to produce a single value from the input iterable. Using Python's built-in reduce() function, we can see how to produce a sum of integers, as follows:

    numbers = [1, 2, 3, 4, 5]
    sum = reduce(lambda x, y: x + y, numbers)
    #sum == 15

You probably noticed that, unlike our map operation, the reduce lambda has two parameters (x and y in this case). The argument passed to x will be the accumulated value of all applications of the function so far, and y will receive the next value to be added to the accumulation. Parenthetically, the order of operations can be seen as ((((1 + 2) + 3) + 4) + 5). Alternatively, the steps are shown in the following list:

- x = 1, y = 2
- x = 3, y = 3
- x = 6, y = 4
- x = 10, y = 5
- x = 15

As this list demonstrates, the value of x is the cumulative sum of the previous x and y values. As such, reduce functions are sometimes termed accumulate or fold functions. Regardless of their name, reduce functions serve the common purpose of combining pieces of a recursive data structure to produce a single value.

Couchbase MapReduce

Creating an index (or view) in Couchbase requires creating a map function written in JavaScript.
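Since Couchbase view functions are plain JavaScript, it may help to first see the same two operations expressed with JavaScript's built-in array methods (a minimal sketch mirroring the Python snippets above; the variable names are illustrative):

    var numbers = [1, 2, 3, 4, 5];

    // map: apply a function to every element and collect the results
    var doubled = numbers.map(function (n) {
      return n * 2;
    });
    // doubled == [2, 4, 6, 8, 10]

    // reduce: fold the list into a single value; x is the running total,
    // y is the next element to accumulate
    var sum = numbers.reduce(function (x, y) {
      return x + y;
    });
    // sum == 15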
When the view is created for the first time, the map function is applied to each document in the bucket containing the view. When you update a view, only new or modified documents are indexed. This behavior is known as incremental MapReduce.

You can think of a basic map function in Couchbase as being similar to a SQL CREATE INDEX statement. Effectively, you are defining a column, or a set of columns, to be indexed by the server. Of course, these are not columns, but rather properties of the documents to be indexed.

Basic mapping

To illustrate the process of creating a view, first imagine that we have a set of JSON documents as shown here:

    var books = [
        { "id": 1, "title": "The Bourne Identity", "author": "Robert Ludlow" },
        { "id": 2, "title": "The Godfather", "author": "Mario Puzzo" },
        { "id": 3, "title": "Wiseguy", "author": "Nicholas Pileggi" }
    ];

Each document contains title and author properties. In Couchbase, to query these documents by either title or author, we'd first need to write a map function. Without considering how map functions are written in Couchbase, we're able to understand the process with vanilla JavaScript:

    books.map(function(book) {
        return book.author;
    });

In the preceding snippet, we're making use of the built-in JavaScript array's map() function. Similar to the Python snippets we saw earlier, JavaScript's map() function takes a function as a parameter and returns a new array with mapped objects. In this case, we'll have an array with each book's author, as follows:

    ["Robert Ludlow", "Mario Puzzo", "Nicholas Pileggi"]

At this point, we have a mapped collection that will be the basis for our author index. However, we haven't provided a means for the index to refer back to its original document. If we were using a relational database, we'd have effectively created an index on the author column with no way to get back to the row that contained it. With a slight modification to our map function, we are able to include the key (the id property) of each document in our index:

    books.map(function(book) {
        return [book.author, book.id];
    });

In this slightly modified version, we're including the ID with the output of each author. This way, the index has each document's key stored alongside the indexed value:

    [["Robert Ludlow", 1], ["Mario Puzzo", 2], ["Nicholas Pileggi", 3]]

We'll soon see how this structure more closely resembles the values stored in a Couchbase index.

Basic reducing

Not every Couchbase index requires a reduce component. In fact, we'll see that Couchbase already comes with built-in reduce functions that provide most of the reduce behavior you'll need. However, before relying only on those functions, it's important to understand why you'd use a reduce function in the first place. Returning to the preceding map example, let's imagine we have a few more documents in our set, as follows:

    var books = [
        { "id": 1, "title": "The Bourne Identity", "author": "Robert Ludlow" },
        { "id": 2, "title": "The Bourne Ultimatum", "author": "Robert Ludlow" },
        { "id": 3, "title": "The Godfather", "author": "Mario Puzzo" },
        { "id": 4, "title": "The Bourne Supremacy", "author": "Robert Ludlow" },
        { "id": 5, "title": "The Family", "author": "Mario Puzzo" },
        { "id": 6, "title": "Wiseguy", "author": "Nicholas Pileggi" }
    ];

We'll still create our index using the same map function, because it provides a way of accessing a book by its author.
Now imagine that we want to know how many books an author has written, or (assuming we had more data) the average number of pages written by an author. These questions are not possible to answer with a map function alone. Each application of the map function knows nothing about the previous application. In other words, there is no way for you to compare or accumulate information about one author's book relative to another book by the same author.

Fortunately, there is a solution to this problem. As you've probably guessed, it's the use of a reduce function. As a somewhat contrived example, consider this JavaScript:

    mapped = books.map(function (book) {
        return ([book.id, book.author]);
    });

    counts = {};
    reduced = mapped.reduce(function(prev, cur, idx, arr) {
        var key = cur[1];
        if (! counts[key]) counts[key] = 0;
        ++counts[key];
    }, null);

This code doesn't quite accurately reflect the way you would count books with Couchbase, but it illustrates the basic idea. You look for each occurrence of a key (author) and increment a counter when it is found. With Couchbase MapReduce, the mapped structure is supplied to the reduce() function in a better format; you won't need to keep track of items in a dictionary.

Couchbase views

At this point, you should have a general sense of what MapReduce is, where it came from, and how it affects the creation of a Couchbase Server view. So, without further ado, let's see how to write our first Couchbase view. There are, in fact, two sample buckets to choose from; the one we'll use is beer-sample. If you didn't install it, don't worry; you can add it by opening the Couchbase Console, navigating to the Settings tab, and finding the option to install the bucket there.

First, you need to understand the document structures you're working with. The following JSON object is a beer document (abbreviated for brevity):

    {
        "name": "Sundog",
        "type": "beer",
        "brewery_id": "new_holland_brewing_company",
        "description": "Sundog is an amber ale...",
        "style": "American-Style Amber/Red Ale",
        "category": "North American Ale"
    }

As you can see, beer documents have several properties. We're going to create an index that lets us query these documents by name. In SQL, the query would look like this:

    SELECT Id FROM Beers WHERE Name = ?

You might be wondering why the SQL example includes only the Id column in its projection. For now, just know that to query a document using a view in Couchbase, the property by which you're querying must be included in an index. To create that index, we'll write a map function. The simplest example of a map function to query beer documents by name is as follows:

    function(doc) {
        emit(doc.name);
    }

The body of this map function has only one line. It calls the built-in Couchbase emit() function, which signals that a value should be indexed. The output of this map function will be an array of names.

The beer-sample bucket includes brewery data as well. These documents look like the following code (abbreviated for brevity):

    {
        "name": "Thomas Hooker Brewing",
        "city": "Bloomfield",
        "state": "Connecticut",
        "website": "http://www.hookerbeer.com/",
        "type": "brewery"
    }

If we reexamine our map function, we'll see an obvious problem: both the brewery and beer documents have a name property. When this map function is applied to the documents in the bucket, it will create an index containing both brewery and beer documents. The problem is that Couchbase documents exist in a single container: the bucket.
There is no namespace for a set of related documents. The solution has typically been to include a type or docType property on each document; the value of this property is used to distinguish one document from another. In the case of the beer-sample database, beer documents have type = "beer" and brewery documents have type = "brewery". Therefore, we are easily able to modify our map function to create an index only on beer documents:

    function(doc) {
        if (doc.type == "beer") {
            emit(doc.name);
        }
    }

The emit() function actually takes two arguments. The first, as we've seen, emits a value to be indexed. The second argument is an optional value that is used by the reduce function.

Imagine that we want to count the number of beers in a particular category. In SQL, we would write the following query:

    SELECT Category, COUNT(*) FROM Beers GROUP BY Category

To achieve the same functionality with Couchbase Server, we'll need to use both map and reduce functions. First, let's write the map. It will create an index on the category property:

    function(doc) {
        if (doc.type == "beer") {
            emit(doc.category, 1);
        }
    }

The only real difference between our category index and our name index is that we're including an argument for the value parameter of the emit() function. What we'll do with that value is simply count it. This counting is done in our reduce function:

    function(keys, values) {
        return values.length;
    }

In this example, the values parameter is given to the reduce function as a list of all values associated with a particular key. In our case, for each beer category, there will be a list of ones (that is, [1, 1, 1, 1, 1, 1]). Couchbase also provides a built-in _count function, which can be used in place of the entire reduce function in the preceding example.

Now that we've seen the basic requirements for creating an actual Couchbase view, it's time to add a view to our bucket. The easiest way to do so is to use the Couchbase Console.

Summary

In this article, you learned the purpose of secondary indexes in a key/value store. We dug deep into MapReduce, both in terms of its history in functional languages and as a tool for NoSQL and big data systems.
Creating a Brick Breaking Game

Packt
03 Mar 2015
32 min read
Have you ever thought about procedurally generated levels? Have you thought about how this could be done, how their logic works, and how their resources are managed? With our example bricks game, you will get to the core of it by generating colors procedurally for each block every time the level gets loaded.

Physics has always been a huge and massively important topic in game development. A brick breaking game can be made in many ways, using many of the techniques the engine provides, but I chose to make it a physics-based game to cover the usage of the new, unique, and amazing component that Epic recently added to its engine. The Projectile component is a physics-based component with many attributes you can tweak to get a huge variety of behaviors that you can use with any game genre.

By the end of this article by Muhammad A.Moniem, the author of Learning Unreal Engine iOS Game Development, you will be able to:

- Build your first multicomponent blueprints
- Understand more about game modes
- Script touch input
- Understand the Projectile component in depth
- Build a simple emissive material
- Use dynamic material instances
- Start using construction scripts
- Detect collisions
- Start adding sound effects to the game
- Restart a level
- Have a fully functional gameplay

The project structure

For this game sample, I made a blank project template and selected the starter content so that I could get cubes, spheres, and all the other basic 3D meshes that will be used in the game. So, you will find the project in the same basic structure, and the most important folder, where you will find all the content, is called Blueprints.

Building the blueprints

The game, as you might see in the project files, contains only four blueprints. As I said earlier, a blueprint can be an object in your world or even a piece of logic without any physical representation inside the game view. The four blueprints responsible for the game are explained here:

- ball: This blueprint is responsible for the ball rendering and movement. You can consider it an entity in the game world, as it has its own representation, which is a 3D ball.
- platform: This one also has its visual representation in the game world. It is the platform that will receive the player input.
- levelLayout: This one represents the level itself and its layout, walls, blocks, and game camera.
- bricksBreakingMode: Every game or level made with Unreal Engine should have a game mode blueprint type. This defines the main player, the controller used to control the gameplay, the pawn (which works in the same way as the main player but has no input), the HUD for the main UI controller, and the game state, which is useful in multiplayer games. Even if you are using the default settings, it is better to make a placeholder one!

Gameplay mechanics

I've always been a big fan of planning the code before writing or scripting it, so I'll try to keep the same habit here: before making each game, I'll explain how the gameplay workflow should be. With such a habit, you can figure out the weak points of your logic even before you build it, and it helps you develop more quickly and efficiently. As I mentioned earlier, the game has only three working blueprints; the fourth one is used to organize the level (which is not gameplay logic and has no logic at all).
Here are the steps that the game should follow, one by one:

- At the start of the game, the levelLayout blueprint starts instantiating the bricks and sets a different color for each one.
- The levelLayout blueprint sets the rendering camera to the one we want.
- The ball blueprint starts moving the ball with a proper velocity and sets a dynamic material for the ball mesh.
- The platform blueprint starts accepting input events on a frame-by-frame basis from mouse or touch input, and sets a dynamic material for the platform mesh.
- If the ball blueprint hits any other object, it should never speed up or slow down; it should keep the same speed.
- If the ball blueprint crosses the bottom line, it should restart the level.
- If the player presses the screen or clicks the mouse, the platform blueprint should move only on the y axis to follow the finger or the mouse cursor.
- If the ball blueprint hits any brick from the levelLayout blueprint, it should destroy it.
- The ball plays some sound effects; depending on the surface it hits, it plays a different sound.

Starting a new level

As the game is based on only one level, and the engine already gives us a pretty default level with a sky dome, light effects, and some basic assets, none of that is necessary for our game. So, go to the File menu, select New Level, save it somewhere inside your project files, and give it a special name. In my case, I made a new folder named gameScene to hold my level (or any other levels if my game were a multilevel game) and named the level mainLevel.

Now, this level will never get loaded into the game without telling the engine to do so. The Unreal Editor gives you a great set of options to define which map/level is loaded by default when the game starts or when the editor runs. When you ship the game, the Unreal Editor even lets you choose which levels should be shipped and which shouldn't, to save some space. Open the Edit menu and then open Project Settings. When the window pops up, select the Maps & Modes section and set Game Default Map to the newly created level. Editor Startup Map should also be set to the same level.

Building the game mode

Although a game mode is a blueprint, I prefer to always separate its creation from the creation of the game blueprints, as it contains no logic or graphs at all. A game mode is essential for each level, not only for each game. Right-click in an empty space inside your project directory and select Blueprint under the Basic Assets section. When the Pick Parent Class window pops up, select the last type of blueprint, which is called Game Mode, and give your newly created blueprint a name, which, in my case, is bricksBreakingMode.

Now we have a game mode for the game level; this mode will not work at all without being connected to the current level (the empty level I made in the previous section). Go to World Settings by clicking on the icon in the top shelf of the editor (you need to get used to accessing World Settings, as it has many options that you will need to tweak to fit your games).

The World Settings panel will be on the right-hand side of your screen. Scroll down to the Game Mode part and select the mode you made from the Game Mode Override drop-down menu. If you cannot find the one you've made, just type its name, and the smart menu will search the project to find it.
Building the game's main material

As the game is an iOS game, we should work with caution when adding elements and code, to save the game from any performance overhead, glitches, or crashes. Although the engine can run a game with lighting on an iOS device, I always prefer to stay as far away as possible from using lights/directional lights in an iOS game, as a directional light source computed in real time means recalculating all the vertices. So, if the level has 10k vertices with two directional lights, it will be calculated as 30k vertices. The best way to avoid using a light source for a simple game like the brick breaking game is to build a special material that can emulate light emission; this is called an emissive material.

In your project panel, right-click in an empty space (perhaps inside the materials folder) and choose a material from the Basic Assets section. Give this material a name (in my case, gameEmissiveMaterial) and then double-click it to open the material editor. As you can see, the material editor for a new material is almost empty, apart from one big node that contains the material outputs, with the material preview colored black. To start adding new nodes, right-click in an empty space of the editor grid and then either select a node or search for nodes by name; both ways work fine.

The emissive material is just a material with Color and Emissive Color; you can see these names in your output list, which means you will need to connect nodes or graphs to these two sockets of the material output. Now, add the following three new nodes:

- VectorParameter: This represents the color; you can pick a color by clicking on the color area in the left-hand panel of the screen or on the Default Value parameter.
- ScalarParameter: This represents a factor that scales the color of the material; you can set its Default Value to 2, which works fine for the game.
- Multiply: This multiplies two values (the color and the scalar) to produce the value used for the emission.

With these three nodes in your graph, you can see how it works: the base color is connected to the Base Color output, and the Multiply result of the base color and the scalar is connected to the Emissive Color output of the material.

You can rename the nodes and give them special names, which will be useful later on. I named the VectorParameter node BaseColor and the Scalar node EmissiveScalar. You can check the difference between the emissive material you made and a default material by applying both to two meshes in a level without any light. The default material will render the mesh black, as it expects a light source, but the emissive one will make it colored and shiny.

Building the blueprints and components

I prefer to call all the blueprints for this game actors, as all of them will be based on the Actor class in the engine core. This class usually represents any object, with or without logic, in the level. Although blueprints based on the Actor class do not accept input by default, you will learn a way to force any actor blueprint to receive input events. In this section, you will build the different blueprints for the game and add components to each of them. Later, in another section, you will build the logic and graphs. As I always say, building and setting all the components and the default values should be the first thing you do in any game, and then adding the logic should follow. Do not work on both simultaneously!
Building the layout blueprint

The layout blueprint should include the bricks that the players are going to break, the camera that renders the level, and the walls that the ball is going to collide with. Start making it by adding an Actor blueprint in your project directory. Name it levelLayout and double-click on it to open the blueprint editor.

The blueprint editor, by default, contains the following three subeditors; you can navigate between them via the buttons in the top-right corner:

- Defaults: This is used to set the default values of the blueprint class type
- Components: This is used to add different components to build and structure the blueprint
- Graph: This is where we will add scripting logic

The majority of the time, you will be working with the Components and Graph editors only, as the default values usually work best.

Open the Components editor and start adding these components:

- Camera: This will be the component that renders the game. I added one component and left its name as Camera1; it was set as ROOT of the blueprint, so it holds all the other components as children underneath its hierarchy.
  Changed values: The only value you need to change in the camera component is Projection Mode. Set it to Orthographic, as the game will be rendered as a 2D game, and keep Ortho Width at 512, which will make the screen show all the content at a good size. Feel free to use different values based on the content of your level design. Orthographic cameras work without depth and are recommended for 2D games; a perspective camera, on the other hand, has depth and is better used with games that have 3D content.
- Static Mesh: To add meshes as boundaries or triggering areas that collide with the ball, you need to add cubes to work as collision walls, perhaps hidden walls. The best way to do this is to add four static meshes and then align and move them to build the scene stage. Renaming all of them is also a good idea. To distinguish between them, you can name them as I did: StaticMeshLeftMargin, StaticMeshRightMargin, StaticMeshTopMargin, and StaticMeshBottomMargin. The first three are the left, right, and top margins; they work as collision walls that force the ball to bounce in different directions. The bottom one works as a trigger area that restarts the level when the ball passes through it.
  Changed values: Set their Static Mesh to the cube and then scale and move them to build the scene. For the walls, add the Wall tag to the first three meshes in the Component Tags options area; for the bottom trigger, add another tag, and something like deathTrigger works fine. These tags will be used by the gameplay logic to detect whether the ball hit a wall (so a sound should play) or the death area (so the level should restart). In the Collision section for each static mesh, set both SimulationGeneratesHitEvents and GenerateOverlapEvents to True. Also, for CollisionPreset, you can select BlockAll, as this creates solid walls that block any other object from passing. Finally, from the Rendering options section, select the emissive material we made so that you can see these static meshes, and mark Hidden in Game as True to hide them during gameplay.
  Keep in mind that you can leave these objects visible in the game for debugging; once you are sure that they are in the correct place, return to this option and set Hidden in Game to True.
- Billboard: For now, you can think of the billboard component as a point in space with a representation icon, and this is how it is mostly used inside UE4, as the engine does not yet support an independent transform component. Billboards have traditionally been used to show content that always faces the camera, such as particles, text, or anything else that should always be rendered from the same angle. As the game will be generating the blocks/bricks during gameplay, you need some points to define where to build, or start building, those bricks. Add five billboard points, rename them, and arrange them to look like a column. You don't have to change any values for them, as you will be using only their positions in space! I named these five points firstRowPoint, secondRowPoint, thirdRowPoint, fourthRowPoint, and fifthRowPoint.

Building the ball blueprint

Start making the ball blueprint by adding an Actor blueprint in your project directory. Name it Ball and double-click on it to open the blueprint editor. Then, navigate to the Components subeditor if you are not there already. Start adding the following components to the blueprint:

- Sphere: The sphere will work as the collision surface for the Ball blueprint. For this reason, set its Collision options SimulationGeneratesHitEvents and GenerateOverlapEvents to True. Also, set the CollisionPreset option to BlockAll so that it acts in a manner similar to the walls from the layout blueprint. Set the SphereRadius option in the Shape section to 26.0, which gives it a size that fits the screen well.
- Static Mesh: The process for adding the static mesh is the same as before, but this time, select a sphere mesh from the standard assets that came with the project. You will also need to set its material to the project's emissive material you made earlier in this article. After selecting it, you might need to adjust its Scale to 0.5 on all three axes to fit the collision sphere size. Feel free to move the static mesh component on the x, y, and z axes till it fits the collision surface.
- Projectile Movement: The projectile movement component is the most important one for the Ball blueprint, or perhaps the most important one in this article, as it is responsible for the ball's movement, velocity, and physics behavior. After adding the component, you will need to make some tweaks so that it gives the behavior that matches the game. Keep in mind that any small change in values or variables will lead to a completely different behavior, so feel free to play with the values and test them to get some crazy ideas about what you can achieve.
  Changed values: Set Projectile Gravity Scale to 0.0 in the Projectile options; this allows the ball to fly through the air without a gravity force bringing it down (or pulling it in any other direction for a custom gravity). Under Projectile Bounces, mark Should Bounce as True; the projectile physics will then be forced to keep bouncing with the amount of bounciness you set.
  As you want the ball to keep bouncing off the walls, set the bounciness value to 1.0 to give it full bounce. From the Velocity section, you need to enter a velocity for the ball to use when the game starts; otherwise, the ball will never move. As you want the first bounce of the ball to be towards the blocks, set the Z value to a high number, such as 300. To make more level-design sense, the ball shouldn't bounce in a purely vertical line, so it is better to apply some force on the horizontal Y axis as well and move the ball in a diagonal direction. So, set Y to 300 as well.

Building the platform blueprint

Start making the platform blueprint by adding an Actor blueprint in your project directory. Name it platform and double-click on it to open the blueprint editor. Then, navigate to the Components subeditor if you are not there already. You will add only one component, and it will do everything: a Static Mesh component. This time, select the Pipe mesh; you can select whatever you want, but the pipe works best. Don't forget to set its material to the same emissive material used earlier so that you can see it in the game view, and set its Collision options SimulationGeneratesHitEvents and GenerateOverlapEvents to True. Also, CollisionPreset should be set to BlockAll to act in the same manner as the walls from the layout blueprint.

Building the graphs and logic

Now that all the blueprints have been set up with their components, it's time to start adding the gameplay logic/scripting. However, to be able to see the result of what you are going to build, you first need to drag and drop the three blueprints into your scene and organize them to look like an actual level.

As the engine is a 3D engine and there is no support yet for 2D physics, you might notice that I added two extra objects to the scene (giant cubes), which I named depthPreservingCube and depthPreservingCube2. These objects are there simply to prevent the ball from moving along the depth axis, which is X in the Unreal Editor.

One general step that you will perform for all blueprints is to set up their dynamic materials. As you know, you made only one material and applied it to both the platform and the ball. However, you want both to look different during gameplay. Changing the material color right now would change the appearance of both objects. Changing it during gameplay, via the construction script and the dynamic material instances feature, allows you to have many colors for many different objects while they still share the same material. In this step, you will do this for both the platform blueprint and the ball blueprint; I'll explain how to do it for the ball, and you can perform the same steps for the platform.

Select the ball blueprint and double-click to open the editor; this time, navigate to the Graph subeditor to start working with the nodes. You will see that there are two major tabs inside the graph; one of them is named Construction Script. This unique tab is responsible for the construction of the blueprint itself. Open the Construction Script tab, which always has a Construction Script node by default; then, drag and drop the StaticMesh component of the ball from the panel on the left-hand side. This brings up a small context menu with only two options: Get and Set.
Select Get, and this will add a reference to the static mesh. Now, drag a line from Construction Script, drop it in an empty space, add a Create Dynamic Material Instance node from the context menu, and set its Source Material option to the material we want to instance (which is the emissive material). Keep in mind, however, that if you are using a later version of the engine, Epic introduced an easier way to access the Create Dynamic Material Instance node: just drag a line from the Static Mesh ball reference inside the graph, rather than from Construction Script. Now, connect the static mesh as the target and drag a line out of Return Value of the Create Dynamic Material Instance node. From the context menu, select the first option, which is Promote to a Variable; this will add a variable to the left-panel list. Feel free to give it a name you can recognize, which, in my case, is thisColor. Now, the whole thing should look like this:

Now that you've created the dynamic material instance, you need to set a new color for it. To do this, you need to go back to the event graph and start adding the logic there. I'll add it to the ball here, and you will need to apply it again in the Event Graph of the platform blueprint. Add an Event Begin Play node, which is responsible for executing some logic when the game starts. Drag a wire out of it and select the Set Vector Parameter Value node, which is responsible for setting the value for the material. Now, add a reference to the thisColor variable and connect it to Target of the Set Vector Parameter Value node. Last but not least, enter the parameter name that you used to build the material, which, in my case, is BaseColor. Finally, set Value to a color you like; I picked yellow for the ball. Which color would you like to pick?

The layout blueprint graph

Before you start working with this section, you need to make several copies of the material we made earlier and give each one its own color. I made six different ones to give a variation of six colors to the blocks. The scripts here will be responsible for creating the blocks, changing their colors, and finally, setting the game view to the current camera. To serve this goal, you need to add several variables of several types. Here are the variables:

numberOfColumns: This is an integer variable with a default value of six, which is the total number of columns per row.
currentProgressBlockPosition: This is a vector type variable that holds the position of the last created block. It is very important because you are going to add blocks one after the other, so you want to define the position of the last block and then add spacing to it.
aBlockMaterial: This is the material that will be applied to a specific block.
materialRandomIndex: This is a random integer value used to procedurally select a color for each block.

To make things more organized, I created several custom events. You can think about them as a set of functions; each one has a block of logic to execute:

Initialize The Blocks: This Custom Event node has a set of for loops that work one by one on initializing the target blocks when the game starts. Each loop cycles six times, from index 0 to the numberOfColumns index. When it is finished, it runs the next loop. Each loop body is a custom function itself, and they all run the same set of steps, except that each uses a different row.
chooseRandomMaterial: This custom event handles picking a random material to be applied to each block as it is created.
It works by assigning a random value between 1 and 6 to the materialRandomIndex variable, and depending on the selected value, the aBlockMaterial variable will be set to a different material. This aBlockMaterial variable is the one that will be used to set the material of each created block in each iteration of the loop for each row.
addRowX: I named this X here, but in fact, there are five functions to add the rows: addRow1, addRow2, addRow3, addRow4, and addRow5. All of them are responsible for adding rows; the main difference is the starting point of the row, as each one of them uses a different billboard transform, from firstRowPoint through fifthRowPoint.

You need to connect your first node as Add Static Mesh and set its properties as for any other static mesh. You need to set its material to the emissive one. Set Static Mesh to Shape_Pipe_180, give it a brickPiece tag, and set its Collision options Simulation Generates Hit Events and Generate Overlap Events to True. Also, Collision Preset has to be set to Block All so that it acts in the same manner as the walls from the layout blueprint and receives the hit events, which will be the core of the ball detection. This created mesh will need a transform point to be instantiated at its coordinates. This is where you will need to pick the row point transform reference (depending on your row, you will select the point number), add it to a Make Transform node, and finally, set the new transform's Y Rotation to -90 and its XYZ scale to 0.7, 0.7, 0.5 to fit the correct size and flip the block so that it has a better convex look.

The second part of the addRow event should use the chooseRandomMaterial custom event that you already made to select a material from among the six random ones. Then, you can execute SetMaterial, make its Target the same mesh that was created via Add Static Mesh, and set its Material to aBlockMaterial; the material changes every time the chooseRandomMaterial event gets called. Finally, you can use SetRelativeLocation to move the billboard point that is responsible for that row to another position on the y axis, using the Make Vector and Add Int(+) nodes to add 75 units every time as spacing between every two created blocks:

Now, if you check the project files, you will find that the only difference is that there are five functions called addRow, and each of them uses a different billboard as a starting point to add the blocks. If you run the version you made or the one within the project files, you will be able to see the generated blocks, and each time you stop and run the game, you will get a completely different color variation of the blocks.

There is one last thing to completely finish this blueprint. As you might have noticed, this blueprint contains the camera in its components. This means it should be the one that holds the functionality of setting this camera as the rendering camera. So, in Event Begin Play, this functionality will be fired when the level starts. You need to connect the Set View Target With Blend node, which will set the camera as the view target: connect Get Player Controller (Player Index 0 is player number 1) to the Target socket, and connect a reference to this blueprint to the New View Target socket. Finally, you need to call the initializeTheBlocks custom event, which will call all the other functions. Congratulations! You have now built your first functional and complex blueprint, containing the main functionalities everyone must use in any game.
You also picked up the trick of randomly generating or changing things, such as the color of the blocks, to make the levels feel different every time.

The Ball blueprint graph

The main event node that will be used in the ball graph is Event Hit, which is fired automatically every time the ball collider hits another collider. If you still remember, while creating the platform, walls, and blocks, we added tags to every static mesh to identify them. Those names are used now. Using a node called Component Has Tag, we can compare the object component that the ball has hit with the value of the Component Has Tag node, and we then get either a positive or a negative result. So, this is how it should work:

Whenever the ball gets hit with another collider, check whether it is a brickPiece tagged component. If this is true, then disable the collision of the brick piece via the Set Collision Enabled node and set it to No Collision so that it stops responding to any other collisions. Then, hide the brick mesh using the Set Visibility node and keep the New Visibility option unmarked, which means that it will be hidden. Then, play a sound effect for the hit to make the gameplay more dynamic. You can play sound in many different ways, but let's use the Play Sound at Location node now, use the location of the ball itself, and use the hitBrick sound effect from the Audio folder by assigning it to the Sound slot of the Play Sound at Location node. Finally, reset the velocity of the ball using the Set Velocity node referenced by the Projectile Movement component and set it to XYZ 300, 0, 300:

If it wasn't a brickPiece tagged component, then let's check whether it has the Wall tag via Component Has Tag. If this is the case, then let's use Play Sound at Location, use the location of the ball itself, and use the hitBlockingWall sound effect from the Audio folder by assigning it to the Sound slot of the Play Sound at Location node:

If it wasn't tagged with Wall, then check whether it is finally tagged with deathTrigger. If this is the case, then the player has missed it, and the ball is now below the platform. So, you can use the Open Level node to load the level again and assign the level name as mainLevel (or any other level you want to load) to the Level Name slot:

The platform blueprint graph

The platform blueprint will be the one that receives the input from the player. You just need to define the player input to make the blueprint able to receive those events from the mouse, touch, or any other available input device. There are two ways to do this, and I always like to use both:

Enable input node: I assume that you've already added the scripting nodes inside the Event Graph to set the dynamic material color via Set Vector Parameter Value. This means you already have an Event Begin Play node, so you need to connect its network to another node called Enable Input; this node is responsible for forcing the current blueprint to accept input events. Finally, you can set its Player Controller value to a Get Player Controller node and leave Player Index as 0 for player number 1:
Autoreceive input option: By selecting the platform blueprint instance that you've dropped inside the scene from the Scene Outliner, you will see that it has many options in the Details panel on the right-hand side.
Changing the Auto Receive Input option to Player 0 under the Input section has the same effect as the previous solution:

Now, we can build the logic for the platform movement, and anything that is built can be tested directly in the editor or on the device. I prefer to break the logic into two pieces, which will make it easier than it looks:

Get the touch state: In this phase, you will use the Input Touch event, which is executed when a touch gets pressed or released. Based on the touch state, you will check via a Branch node whether the state is True or False. Your condition for this node should be the Touch 1 index, as the game will not need more than one touch. Based on the state, I would like to set a custom Boolean variable named Touched and set its value to match the touch state. Then, you can add a Gate node to control the execution of the following steps based on the touch state (Pressed or Released) by connecting the two cases to the Open gate and the Close gate execution sockets. Finally, you can set the actor location and set it to use the Self actor as its target (which is the platform actor/blueprint) to change the platform location based on touches. Defining the New Location value is the next chunk of the logic:
Actor location: Using a Make Vector node, you can construct a new point position in the world made of X, Y, and Z coordinates. As the y axis will be the horizontal position, which will be based on the player's touch, only this needs to change over time. The X and Z positions will stay the same all the time, as the platform will never move vertically or in depth. The new vector position will be based on the touch phase. If the player is pressing, then the position should match the touch input position; however, if the player is not pressing, then the position should be the same as the last point the player had pressed. I created a float variable named horizontalAxis; this variable will hold the correct Y position to be fed into the Make Vector node. If the player is pressing the screen, then you need to get the finger press position by returning Impact Point through a Break Hit Result node from a Get Hit Result Under Finger By Channel node on the currently active player. However, if the player is not touching the screen, then the horizontalAxis variable should stay the same as the last-known location of the Self actor. It is then passed as-is into the Make Vector Y position value:

Now, you can save and build all the blueprints. Don't hesitate, now or at any time during the process of building the game logic, to build or launch the game onto a real device to check where you are. The best way to learn more about the nodes and these minor changes is by building to the device all the time and changing some values every time.
Procedurally changing the look of a level is an interesting topic nowadays, and we barely scratched its surface by setting different materials every time we load the level.

Resources for Article:

Further resources on this subject: UnrealScript Game Programming Cookbook [article] Unreal Development Toolkit: Level Design HQ [article] The Unreal Engine [article]
Introducing Splunk

Packt
03 Mar 2015
14 min read
In this article by Betsy Page Sigman, author of the book Splunk Essentials, we introduce Splunk, whose name was inspired by the process of exploring caves, or spelunking. Splunk helps analysts, operators, programmers, and many others explore data from their organizations by obtaining, analyzing, and reporting on it. This multinational company, cofounded by Michael Baum, Rob Das, and Erik Swan, has a core product called Splunk Enterprise, which manages, searches, inserts, deletes, filters, and analyzes big data that is generated by machines, as well as other types of data. They also have a free version that has most of the capabilities of Splunk Enterprise and is an excellent learning tool.

(For more resources related to this topic, see here.)

Understanding events, event types, and fields in Splunk

An understanding of events and event types is important before going further.

Events

In Splunk, an event is not just one of the many local user meetings that are set up between developers to help each other out (although those can be very useful); it also refers to a record of one activity that is recorded in a log file. Each event usually has:

A timestamp indicating the date and exact time the event was created
Information about what happened on the system that is being tracked

Event types

An event type is a way to allow users to categorize similar events. It is a field defined by the user. You can define an event type in several ways, and the easiest way is by using the Splunk Web interface. One common reason for setting up an event type is to examine why a system has failed. Logins are often problematic for systems, and a search for failed logins can help pinpoint problems. For an interesting example of how to save a search on failed logins as an event type, visit http://docs.splunk.com/Documentation/Splunk/6.1.3/Knowledge/ClassifyAndGroupSimilarEvents#Save_a_search_as_a_new_event_type.

Why are events and event types so important in Splunk? Because without events, there would be nothing to search, of course. And event types allow us to make meaningful searches easily and quickly according to our needs, as we'll see later.

Sourcetypes

Sourcetypes are also important to understand, as they help define the rules for an event. A sourcetype is one of the default fields that Splunk assigns to data as it comes into the system. It determines what type of data it is so that Splunk can format it appropriately as it indexes it. This also allows the user who wants to search the data to easily categorize it. Some of the common sourcetypes are listed as follows:

access_combined, for NCSA combined format HTTP web server logs
apache_error, for standard Apache web server error logs
cisco_syslog, for the standard syslog produced by Cisco network devices (including PIX firewalls, routers, and ACS), usually via remote syslog to a central log host
websphere_core, a core file export from WebSphere

(Source: http://docs.splunk.com/Documentation/Splunk/latest/Data/Whysourcetypesmatter)

Fields

Each event in Splunk is associated with a number of fields. The core fields of host, source, sourcetype, and timestamp are key to Splunk. These fields are extracted from events at multiple points in the data processing pipeline that Splunk uses, and each of these fields includes a name and a value. The name describes the field (such as userid) and the value says what that field's value is (susansmith, for example). Some of these fields are default fields that are given because of where the event came from or what it is. When data is processed by Splunk, and when it is indexed or searched, it uses these fields. For indexing, the default fields added include those of host, source, and sourcetype. When searching, Splunk is able to select from a bevy of fields that can either be defined by the user or are very basic, such as whether an action resulted in a purchase (for a website event). Fields are essential for doing the basic work of Splunk – that is, indexing and searching.
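To make this concrete before we move on, here is a small, hypothetical illustration of how fields drive a search (the index contents, and the uri and status field names, are assumed for the sake of the example and are not taken from this article's data). A search such as the following filters events on the sourcetype and status fields and then reports on the uri field:

sourcetype=access_combined status=404 | top 10 uri

The same pattern, filtering on fields and then reporting on fields, underlies the searches we will build later in this article.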
Getting data into Splunk

It's time to spring into action now and input some data into Splunk. Adding data is simple, easy, and quick. In this section, we will use some data and tutorials created by Splunk to learn how to add data:

Firstly, to obtain your data, visit the tutorial data at http://docs.splunk.com/Documentation/Splunk/6.1.5/SearchTutorial/GetthetutorialdataintoSplunk that is readily available on Splunk. Here, download the folder tutorialdata.zip. Note that this will be a fresh dataset that has been collected over the last 7 days. Download it but don't extract the data from it just yet.
You then need to log in to Splunk, using admin as the username and then by using your password.
Once logged in, you will notice that toward the upper-right corner of your screen is the button Add Data, as shown in the following screenshot. Click on this button:

Button to Add Data

Once you have clicked on this button, you'll see a screen similar to the following screenshot:

Add Data to Splunk by Choosing a Data Type or Data Source

Notice here the different types of data that you can select, as well as the different data sources. Since the data we're going to use is a file, under Or Choose a Data Source, click on From files and directories. Once you have clicked on this, you can then click on the radio button next to Skip preview, as indicated in the following screenshot, since you don't need to preview the data now. You then need to click on Continue:

Preview data

You can download the tutorial files at: http://docs.splunk.com/Documentation/Splunk/6.1.5/SearchTutorial/GetthetutorialdataintoSplunk

As shown in the next screenshot, click on Upload and index a file, find the tutorialdata.zip file you just downloaded (it is probably in your Downloads folder), and then click on More settings, filling it in as shown in the following screenshot. (Note that you will need to select Segment in path under Host and type 1 under Segment Number.) Click on Save when you are done:

Can specify source, additional settings, and source type

Following this, you should see a screen similar to the following screenshot. Click on Start Searching; we will look at the data now:

You should see this if your data has been successfully indexed into Splunk.

You will now see a screen similar to the following screenshot. Notice that the number of events you have will be different, as will the time of the earliest event. At this point, click on Data Summary:

The Search screen

You should see the Data Summary screen like in the following screenshot. However, note that the Hosts shown here will not be the same as the ones you get. Take a quick look at what is on the Sources tab and the Sourcetypes tab. Then find the most recent data (in this case 127.0.0.1) and click on it.

Data Summary, where you can see Hosts, Sources, and Sourcetypes

After clicking on the most recent data, which in this case is bps-T341s, look at the events contained there. Later, when we use streaming data, we will see how the events at the top of this list change rapidly.
Here, you will see a listing of events, similar to those shown in the following screenshot:

Events lists for the host value

You can click on the Splunk logo in the upper-left corner of the web page to return to the home page. Under Administrator at the top-right of the page, click on Logout.

Searching Twitter data

We will start here by doing a simple search of our Twitter index, which is automatically created by the app once you have enabled Twitter input (as explained previously). In our earlier searches, we used the default index (which the tutorial data was downloaded to), so we didn't have to specify the index we wanted to use. Here, we will use just the Twitter index, so we need to specify that in the search.

A simple search

Imagine that we wanted to search for tweets containing the word coffee. We could use the code presented here and place it in the search bar:

index=twitter text=*coffee*

The preceding code searches only your Twitter index and finds all the places where the word coffee is mentioned. Note that the text field is not case sensitive, so tweets with either "coffee" or "Coffee" will be included in the search results. The asterisks are included before and after the text "coffee" because otherwise we would only get events where just "coffee" was tweeted – a rather rare occurrence, we expect. In fact, when we searched our indexed Twitter data without the asterisks around coffee, we got no results.

Examining the Twitter event

Before going further, it is useful to stop and closely examine the events that are collected as part of the search. The sample tweet shown in the following screenshot shows the large number of fields that are part of each tweet. The > was clicked to expand the event:

A Twitter event

There are several items to look closely at here:

_time: Splunk assigns a timestamp to every event. This is done in UTC (Coordinated Universal Time) format.
contributors: The value for this field is null, as are the values of many Twitter fields.
retweeted_status: Notice the {+} here; in the following event list, you will see there are a number of fields associated with this, which can be seen when the + is selected and the list is expanded. This is the case wherever you see a {+} in a list of fields:

Various retweet fields

In addition to those shown previously, there are many other fields associated with a tweet. The 140 character (maximum) text field that most people consider to be the tweet is actually a small part of the actual data collected.

The implied AND

If you want to search on more than one term, there is no need to add AND as it is already implied. If, for example, you want to search for all tweets that include both the text "coffee" and the text "morning", then use:

index=twitter text=*coffee* text=*morning*

If you don't specify text= for the second term and just put *morning*, Splunk assumes that you want to search for *morning* in any field. Therefore, you could get that word in another field of an event. This isn't very likely in this case, although coffee could conceivably be part of a user's name, such as "coffeelover". But if you were searching for other text strings, such as a computer term like log or error, such terms could be found in a number of fields. So specifying the field you are interested in would be very important.
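As a quick, hypothetical side-by-side illustration of this point (both searches assume the Twitter index described above), compare:

index=twitter text=*coffee* *morning*
index=twitter text=*coffee* text=*morning*

The first search lets Splunk match *morning* in any field of the event, so a hit could come from, say, a user's screen name; the second restricts the match to the text field, which is usually what we want here.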
The need to specify OR

Unlike AND, you must always specify the word OR. For example, to obtain all events that mention either coffee or morning, enter:

index=twitter text=*coffee* OR text=*morning*

Finding other words used

Sometimes you might want to find out what other words are used in tweets about coffee. You can do that with the following search:

index=twitter text=*coffee* | makemv text | mvexpand text | top 30 text

This search first searches for the word "coffee" in a text field, then creates a multivalued field from the tweet, and then expands it so that each word is treated as a separate piece of text. Then it takes the top 30 words that it finds. You might be asking yourself how you would use this kind of information. This type of analysis would be of interest to a marketer, who might want to use words that appear to be associated with coffee when composing the script for an advertisement. The following screenshot shows the results that appear (1 of 2 pages). From this search, we can see that the words love, good, and cold might be words worth considering:

Search of top 30 text fields found with *coffee*

When you do a search like this, you will notice that there are a lot of filler words (a, to, for, and so on) that appear. You can do two things to remedy this. You can increase the limit for top words so that you can see more of the words that come up, or you can rerun the search using the following code. "Coffee" (with a capital C) is listed (on the unshown second page) separately from "coffee". The reason for this is that while the search is not case sensitive (thus both "coffee" and "Coffee" are picked up when you search on "coffee"), the process of putting the text fields through the makemv and mvexpand commands ends up distinguishing on the basis of case. We could rerun the search, excluding some of the filler words, using the code shown here:

index=twitter text=*coffee* | makemv text | mvexpand text | search NOT text="RT" AND NOT text="a" AND NOT text="to" AND NOT text="the" | top 30 text

Using a lookup table

Sometimes it is useful to use a lookup file to avoid having to use repetitive code. It would help us to have a list of all the small words that are often found in a tweet just because of their frequent use in language, so that we might eliminate them from our quest to find words that would be relevant for use in the creation of advertising. If we had a file of such small words, we could use a command indicating not to use any of these more common, irrelevant words when listing the top 30 words associated with our search topic of interest. Thus, for our search for words associated with the text "coffee", we would be interested in words like "dark", "flavorful", and "strong", but not words like "a", "the", and "then". We can do this using a lookup command. There are three types of lookup commands, which are described as follows:

lookup: Matches a value of one field with a value of another, based on a .csv file with the two fields. Consider a lookup table named lutable that contains fields for machine_name and owner, and consider what happens when the following code snippet is used after a preceding search (indicated by . . . |):

. . . | lookup lutable owner

Splunk will use the lookup table to match the owner's name with its machine_name and add the machine_name to each event.

inputlookup: All fields in the .csv file are returned as results. If the following code snippet is used, both machine_name and owner would be searched:
. . . | inputlookup lutable

outputlookup: Outputs search results to a lookup table. The following code outputs results from the preceding search directly into a table it creates:

. . . | outputlookup newtable.csv

The command we will use here is inputlookup, because we want to reference a .csv file we can create that will include words that we want to filter out as we seek to find possible advertising words associated with coffee. Let's call the .csv file filtered_words.csv, and give it just a single text field, containing words like "is", "the", and "then". Let's rewrite the search to look like the following code:

index=twitter text=*coffee* | makemv text | mvexpand text | search NOT [inputlookup filtered_words | fields text ] | top 30 text

Using the preceding code, Splunk will search our Twitter index for *coffee*, and then expand the text field so that individual words are separated out. Then it will look for words that do NOT match any of the words in our filtered_words.csv file, and finally output the top 30 most frequently found words among those. As you can see, the lookup table can be very useful. To learn more about Splunk lookup tables, go to http://docs.splunk.com/Documentation/Splunk/6.1.5/SearchReference/Lookup.

Summary

In this article, we have learned more about how to get data into Splunk and how to search it. Splunk Enterprise Software, or Splunk, is an extremely powerful tool for searching, exploring, and visualizing data of all types. Splunk is becoming increasingly popular, as more and more businesses, both large and small, discover its ease and usefulness. Analysts, managers, students, and others can quickly learn how to use the data from their systems, networks, web traffic, and social media to make attractive and informative reports. This is a straightforward, practical, and quick introduction to Splunk that should have you making reports and gaining insights from your data in no time.

Resources for Article:

Further resources on this subject: Lookups [article] Working with Apps in Splunk [article] Loading data, creating an app, and adding dashboards and reports in Splunk [article]
Basics of Programming in Julia

Packt
03 Mar 2015
17 min read
In this article by Ivo Balbaert, author of the book Getting Started with Julia Programming, we will explore how Julia interacts with the outside world, reading from standard input and writing to standard output, files, networks, and databases. Julia provides asynchronous networking I/O using the libuv library. We will see how to handle data in Julia, and we will also discover the parallel processing model of Julia. In this article, the following topics are covered:

Working with files (including CSV files)
Using DataFrames

(For more resources related to this topic, see here.)

Working with files

To work with files, we need the IOStream type. IOStream is a type with the supertype IO and has the following characteristics:

The fields are given by names(IOStream):
4-element Array{Symbol,1}: :handle :ios :name :mark
The types are given by IOStream.types:
(Ptr{None}, Array{Uint8,1}, String, Int64)

The file handle is a pointer of the type Ptr, which is a reference to the file object. Opening and reading a line-oriented file with the name example.dat is very easy:

# code in Chapter 8\io.jl
fname = "example.dat"
f1 = open(fname)

fname is a string that contains the path to the file, using escaping of special characters with \ when necessary; for example, in Windows, when the file is in the test folder on the D: drive, this would become d:\test\example.dat. The f1 variable is now an IOStream(<file example.dat>) object.

To read all lines one after the other into an array, use data = readlines(f1), which returns 3-element Array{Union(ASCIIString,UTF8String),1}:

"this is line 1.\r\n"
"this is line 2.\r\n"
"this is line 3."

For processing line by line, only a simple loop is now needed:

for line in data
    println(line) # or process line
end
close(f1)

Always close the IOStream object to clean up and save resources. If you want to read the file into one string, use readall. Use this only for relatively small files because of the memory consumption; this can also be a potential problem when using readlines.

There is a convenient shorthand with the do syntax for opening a file, applying a function process, and closing it automatically. This goes as follows (file is the IOStream object in this code):

open(fname) do file
    process(file)
end

The do command creates an anonymous function and passes it to open. Thus, the previous code example would have been equivalent to open(process, fname). Use the same syntax for processing a file fname line by line without the memory overhead of the previous methods, for example:

open(fname) do file
    for line in eachline(file)
        print(line) # or process line
    end
end

Writing a file requires first opening it with a "w" flag, then writing strings to it with write, print, or println, and then closing the file handle, which flushes the IOStream object to the disk:

fname = "example2.dat"
f2 = open(fname, "w")
write(f2, "I write myself to a file\n") # returns 24 (bytes written)
println(f2, "even with println!")
close(f2)

Opening a file with the "w" option will clear the file if it exists. To append to an existing file, use "a". To process all the files in the current folder (or a given folder passed as an argument to readdir()), use this for loop:

for file in readdir()
    # process file
end
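Putting the pieces of this section together, here is a small, hypothetical round trip that writes a file and reads it back with the do syntax (the file name roundtrip.dat is just an example):

# write three lines, then read them back; the do form closes the stream for us
open("roundtrip.dat", "w") do f
    for i in 1:3
        println(f, "this is line $i.")
    end
end
open("roundtrip.dat") do f
    for line in eachline(f)
        print(line) # or process line
    end
end

Because the do form closes the IOStream in both cases, there is no need to call close explicitly.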
Such files are structured so that one line contains data about one data object, so we need a way to read and process the file line by line. As an example, we will use the data file Chapter 8winequality.csv that contains 1,599 sample measurements, 12 data columns, such as pH and alcohol per sample, separated by a semicolon. In the following screenshot, you can see the top 20 rows:   In general, the readdlm function is used to read in the data from the CSV files: # code in Chapter 8csv_files.jl: fname = "winequality.csv" data = readdlm(fname, ';') The second argument is the delimiter character (here, it is ;). The resulting data is a 1600x12 Array{Any,2} array of the type Any because no common type could be found:     "fixed acidity"   "volatile acidity"      "alcohol"   "quality"      7.4                        0.7                                9.4              5.0      7.8                        0.88                              9.8              5.0      7.8                        0.76                              9.8              5.0   … If the data file is comma separated, reading it is even simpler with the following command: data2 = readcsv(fname) The problem with what we have done until now is that the headers (the column titles) were read as part of the data. Fortunately, we can pass the argument header=true to let Julia put the first line in a separate array. It then naturally gets the correct datatype, Float64, for the data array. We can also specify the type explicitly, such as this: data3 = readdlm(fname, ';', Float64, 'n', header=true) The third argument here is the type of data, which is a numeric type, String or Any. The next argument is the line separator character, and the fifth indicates whether or not there is a header line with the field (column) names. If so, then data3 is a tuple with the data as the first element and the header as the second, in our case, (1599x12 Array{Float64,2}, 1x12 Array{String,2}) (There are other optional arguments to define readdlm, see the help option). In this case, the actual data is given by data3[1] and the header by data3[2]. Let's continue working with the variable data. The data forms a matrix, and we can get the rows and columns of data using the normal array-matrix syntax). For example, the third row is given by row3 = data[3, :] with data:  7.8  0.88  0.0  2.6  0.098  25.0  67.0  0.9968  3.2  0.68  9.8  5.0, representing the measurements for all the characteristics of a certain wine. The measurements of a certain characteristic for all wines are given by a data column, for example, col3 = data[ :, 3] represents the measurements of citric acid and returns a column vector 1600-element Array{Any,1}:   "citric acid" 0.0  0.0  0.04  0.56  0.0  0.0 …  0.08  0.08  0.1  0.13  0.12  0.47. If we need columns 2-4 (volatile acidity to residual sugar) for all wines, extract the data with x = data[:, 2:4]. If we need these measurements only for the wines on rows 70-75, get these with y = data[70:75, 2:4], returning a 6 x 3 Array{Any,2} outputas follows: 0.32   0.57  2.0 0.705  0.05  1.9 … 0.675  0.26  2.1 To get a matrix with the data from columns 3, 6, and 11, execute the following command: z = [data[:,3] data[:,6] data[:,11]] It would be useful to create a type Wine in the code. 
For example, if the data is to be passed around functions, it will improve the code quality to encapsulate all the data in a single data type, like this:

type Wine
    fixed_acidity::Array{Float64}
    volatile_acidity::Array{Float64}
    citric_acid::Array{Float64}
    # other fields
    quality::Array{Float64}
end

Then, we can create objects of this type to work with them, like in any other object-oriented language, for example, wine1 = Wine(data[1, :]...), where the elements of the row are splatted with the ... operator into the Wine constructor.

To write to a CSV file, the simplest way is to use the writecsv function for a comma separator, or the writedlm function if you want to specify another separator. For example, to write an array data to a file partial.dat, you need to execute the following command:

writedlm("partial.dat", data, ';')

If more control is necessary, you can easily combine the more basic functions from the previous section. For example, the following code snippet writes 10 tuples of three numbers each to a file:

# code in Chapter 8\tuple_csv.jl
fname = "savetuple.csv"
csvfile = open(fname,"w")
# writing headers:
write(csvfile, "ColName A, ColName B, ColName C\n")
for i = 1:10
    tup(i) = tuple(rand(Float64,3)...)
    write(csvfile, join(tup(i),","), "\n")
end
close(csvfile)

Using DataFrames

If you measure n variables (each of a different type) on a single object of observation, then you get a table with n columns for each object row. If there are m observations, then we have m rows of data. For example, given student grades as data, you might want to "compute the average grade for each socioeconomic group", where grade and socioeconomic group are both columns in the table, and there is one row per student. A DataFrame is the most natural representation to work with such an (m x n) table of data. DataFrames are similar to pandas DataFrames in Python or data.frame in R. A DataFrame is a more specialized tool than a normal array for working with tabular and statistical data, and it is defined in the DataFrames package, a popular Julia library for statistical work. Install it in your environment by typing Pkg.add("DataFrames") in the REPL. Then, import it into your current workspace with using DataFrames. Do the same for the packages DataArrays and RDatasets (which contains a collection of example datasets mostly used in the R literature).

A common case in statistical data is that data values can be missing (the information is not known). The DataArrays package provides us with the unique value NA, which represents a missing value and has the type NAtype. The result of computations that contain NA values mostly cannot be determined; for example, 42 + NA returns NA. (Julia v0.4 also has a new Nullable{T} type, which allows you to specify the type of a missing value.)

A DataArray{T} array is a data structure that can be n-dimensional, behaves like a standard Julia array, and can contain values of the type T, but it can also contain the missing (Not Available) values NA and can work efficiently with them. To construct them, use the @data macro:

# code in Chapter 8\dataarrays.jl
using DataArrays
using DataFrames
dv = @data([7, 3, NA, 5, 42])

This returns 5-element DataArray{Int64,1}: 7  3  NA  5  42. The sum of these numbers is given by sum(dv) and returns NA. One can also assign NA values to the array with dv[5] = NA; then, dv becomes [7, 3, NA, 5, NA]. Converting this data structure to a normal array fails: convert(Array, dv) returns ERROR: NAException.
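As a compact sketch of this NA behavior (assuming the DataArrays package is installed; the isna call is simply a convenient way to locate the missing entries):

using DataArrays
dv = @data([7, 3, NA, 5, 42])
sum(dv)      # returns NA: one missing value poisons the aggregate
dv[5] = NA   # dv is now [7, 3, NA, 5, NA]
isna(dv)     # Boolean array marking which entries are missing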
How do we get rid of these NA values, supposing we can do so safely? We can use the dropna function; for example, sum(dropna(dv)) returns 15. If you know that you can replace them with a value v, use the array function:

repl = -1
sum(array(dv, repl)) # returns 13

A DataFrame is a kind of in-memory database, versatile in the ways you can work with the data. It consists of columns with names such as Col1, Col2, Col3, and so on. Each of these columns is a DataArray that has its own type, and the data they contain can be referred to by the column names as well, so we have substantially more forms of indexing. Unlike two-dimensional arrays, columns in a DataFrame can be of different types. One column might, for instance, contain the names of students and should therefore be a string. Another column could contain their age and should be an integer. We construct a DataFrame from the program data as follows:

# code in Chapter 8\dataframes.jl
using DataFrames
# constructing a DataFrame:
df = DataFrame()
df[:Col1] = 1:4
df[:Col2] = [e, pi, sqrt(2), 42]
df[:Col3] = [true, false, true, false]
show(df)

Notice that the column headers are used as symbols. This returns the following 4 x 3 DataFrame object:

We could also have used the full constructor as follows:

df = DataFrame(Col1 = 1:4, Col2 = [e, pi, sqrt(2), 42], Col3 = [true, false, true, false])

You can refer to the columns either by an index (the column number) or by a name; both of the following expressions return the same output:

show(df[2])
show(df[:Col2])

This gives the following output:

[2.718281828459045, 3.141592653589793, 1.4142135623730951, 42.0]

To show the rows or subsets of rows and columns, use the familiar splice (:) syntax, for example:

To get the first row, execute df[1, :]. This returns:
1x3 DataFrame
| Row | Col1 | Col2    | Col3 |
|-----|------|---------|------|
| 1   | 1    | 2.71828 | true |
To get the second and third rows, execute df[2:3, :].
To get only the second column from the previous result, execute df[2:3, :Col2]. This returns [3.141592653589793, 1.4142135623730951].
To get the second and third columns from the second and third rows, execute df[2:3, [:Col2, :Col3]], which returns the following output:
2x2 DataFrame
| Row | Col2    | Col3  |
|-----|---------|-------|
| 1   | 3.14159 | false |
| 2   | 1.41421 | true  |

The following functions are very useful when working with DataFrames:

The head(df) and tail(df) functions show you the first six and the last six lines of data respectively.
The names function gives the names of the columns: names(df) returns 3-element Array{Symbol,1}: :Col1 :Col2 :Col3.
The eltypes function gives the data types of the columns: eltypes(df) gives the output 3-element Array{Type{T<:Top},1}: Int64 Float64 Bool.
The describe function tries to give some useful summary information about the data in the columns, depending on the type; for example, describe(df) gives, for column 2 (which is numeric), the min, max, median, mean, and the number and percentage of NAs:
Col2
Min      1.4142135623730951
1st Qu.  2.392264761937558
Median   2.929937241024419
Mean     12.318522011105483
3rd Qu.  12.856194490192344
Max      42.0
NAs      0
NA%      0.0%

To load in data from a local CSV file, use the method readtable.
The returned object is of type DataFrame:

# code in Chapter 8\dataframes.jl
using DataFrames
fname = "winequality.csv"
data = readtable(fname, separator = ';')
typeof(data) # DataFrame
size(data) # (1599,12)

Here is a fraction of the output:

The readtable method also supports reading in gzipped CSV files. Writing a DataFrame to a file can be done with the writetable function, which takes the filename and the DataFrame as arguments, for example, writetable("dataframe1.csv", df). By default, writetable will use the delimiter specified by the filename extension and write the column names as headers. Both readtable and writetable support numerous options for special cases; refer to the docs for more information (http://dataframesjl.readthedocs.org/en/latest/).

To demonstrate some of the power of DataFrames, here are some queries you can do:

Make a vector with only the quality information: data[:quality]
Give the wines with an alcohol percentage equal to 9.5, for example, data[data[:alcohol] .== 9.5, :]. Here, we use the .== operator, which does element-wise comparison. data[:alcohol] .== 9.5 returns an array of Boolean values (true for datapoints where :alcohol is 9.5, and false otherwise). data[boolean_array, :] selects those rows where boolean_array is true.
Count the number of wines grouped by quality with by(data, :quality, data -> size(data, 1)), which returns the following:
6x2 DataFrame
| Row | quality | x1  |
|-----|---------|-----|
| 1   | 3       | 10  |
| 2   | 4       | 53  |
| 3   | 5       | 681 |
| 4   | 6       | 638 |
| 5   | 7       | 199 |
| 6   | 8       | 18  |

The DataFrames package contains the by function, which takes in three arguments:

A DataFrame, here it takes data
A column to split the DataFrame on, here it takes quality
A function or an expression to apply to each subset of the DataFrame, here data -> size(data, 1), which gives us the number of wines for each quality value

Another easy way to get the distribution over quality is to use the hist function: hist(data[:quality]) gives the counts over the range of quality, (2.0:1.0:8.0,[10,53,681,638,199,18]). More precisely, this is a tuple with the first element corresponding to the edges of the histogram bins, and the second denoting the number of items in each bin. So there are, for example, 10 wines with quality between 2 and 3, and so on. To extract the counts as a variable count of type Vector, we can execute _, count = hist(data[:quality]); the _ means that we neglect the first element of the tuple. To obtain the quality classes as a DataArray class, we will execute the following:

class = sort(unique(data[:quality]))

We can now construct a df_quality DataFrame with the class and count columns as df_quality = DataFrame(qual=class, no=count). This gives the following output:

6x2 DataFrame
| Row | qual | no  |
|-----|------|-----|
| 1   | 3    | 10  |
| 2   | 4    | 53  |
| 3   | 5    | 681 |
| 4   | 6    | 638 |
| 5   | 7    | 199 |
| 6   | 8    | 18  |

To deepen your understanding and learn about the other features of Julia DataFrames (such as joining, reshaping, and sorting), refer to the documentation available at http://dataframesjl.readthedocs.org/en/latest/.

Other file formats

Julia can work with other human-readable file formats through specialized packages:

For JSON, use the JSON package. The parse method converts JSON strings into Dictionaries, and the json method turns any Julia object into a JSON string.
For XML, use the LightXML package
For YAML, use the YAML package
For HDF5 (a common format for scientific data), use the HDF5 package
For working with Windows INI files, use the IniFile package

Summary

In this article, we discussed the basics of working with files, CSV data, and DataFrames in Julia.

Resources for Article:

Further resources on this subject: Getting Started with Electronic Projects? [article] Getting Started with Selenium Webdriver and Python [article] Handling The Dom In Dart [article]
SciPy for Signal Processing

Packt
03 Mar 2015
14 min read
In this article by Sergio J. Rojas G. and Erik A. Christensen, authors of the book Learning SciPy for Numerical and Scientific Computing - Second Edition, we will focus on the usage of some of the most commonly used routines that are included in the SciPy modules scipy.signal, scipy.ndimage, and scipy.fftpack, which are used for signal processing, multidimensional image processing, and computing Fourier transforms, respectively.

We define a signal as data that measures either time-varying or spatially varying phenomena. Sound or electrocardiograms are excellent examples of time-varying quantities, while images embody the quintessential spatially varying cases. Moving images are treated with the techniques of both types of signals, obviously. The field of signal processing treats four aspects of this kind of data: its acquisition, quality improvement, compression, and feature extraction. SciPy has many routines to effectively treat tasks in any of the four fields. All these are included in two low-level modules (scipy.signal being the main module, with an emphasis on time-varying data, and scipy.ndimage, for images). Many of the routines in these two modules are based on the Discrete Fourier Transform of the data. SciPy has an extensive package of applications and definitions of these background algorithms, scipy.fftpack, which we will start covering first.

(For more resources related to this topic, see here.)

Discrete Fourier Transforms

The Discrete Fourier Transform (DFT from now on) transforms any signal from its time/space domain into a related signal in the frequency domain. This allows us not only to analyze the different frequencies of the data, but also to perform faster filtering operations, when used properly. It is possible to turn a signal in the frequency domain back to its time/spatial domain, thanks to the Inverse Fourier Transform. We will not go into the detail of the mathematics behind these operators, since we assume familiarity at some level with this theory. We will focus on syntax and applications instead.

The basic routines in the scipy.fftpack module compute the DFT and its inverse for discrete signals in any dimension; these are fft and ifft (one dimension), fft2 and ifft2 (two dimensions), and fftn and ifftn (any number of dimensions). All of these routines assume that the data is complex valued. If we know beforehand that a particular dataset is actually real valued, and should offer real-valued frequencies, we use rfft and irfft instead, for a faster algorithm. All these routines are designed so that composition with their inverses always yields the identity. The syntax is the same in all cases, as follows:

fft(x[, n, axis, overwrite_x])

The first parameter, x, is always the signal in any array-like form. Note that fft performs one-dimensional transforms. This means, in particular, that if x happens to be two-dimensional, fft will output another two-dimensional array where each row is the transform of each row of the original. We can change it to columns instead with the optional parameter axis. The rest of the parameters are also optional: n indicates the length of the transform, and overwrite_x gets rid of the original data to save memory and resources. We usually play with the integer n when we need to pad the signal with zeros or truncate it. For higher dimensions, n is substituted by shape (a tuple), and axis by axes (another tuple).
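As a minimal sketch of these conventions (the signal below is made up purely for illustration), the following round trip shows that composing ifft with fft recovers the original data, and that the real-valued variants behave the same way:

>>> import numpy
>>> from scipy.fftpack import fft, ifft, rfft, irfft
>>> x = numpy.cos(2*numpy.pi*numpy.arange(8)/8)   # a small real-valued signal
>>> X = fft(x)                                    # complex spectrum of length 8
>>> numpy.allclose(ifft(X).real, x)               # composition with the inverse is the identity
True
>>> numpy.allclose(irfft(rfft(x)), x)             # the faster real-valued pair round trips too
True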
To better understand the output, it is often useful to shift the zero frequencies to the center of the output arrays with fftshift. The inverse of this operation, ifftshift, is also included in the module. The following code shows some of these routines in action, when applied to a checkerboard image:

>>> import numpy
>>> from scipy.fftpack import fft, fft2, fftshift
>>> import matplotlib.pyplot as plt
>>> B=numpy.ones((4,4)); W=numpy.zeros((4,4))
>>> signal = numpy.bmat("B,W;W,B")
>>> onedimfft = fft(signal,n=16)
>>> twodimfft = fft2(signal,shape=(16,16))
>>> plt.figure()
>>> plt.gray()
>>> plt.subplot(121,aspect='equal')
>>> plt.pcolormesh(onedimfft.real)
>>> plt.colorbar(orientation='horizontal')
>>> plt.subplot(122,aspect='equal')
>>> plt.pcolormesh(fftshift(twodimfft.real))
>>> plt.colorbar(orientation='horizontal')
>>> plt.show()

Note how the first four rows of the one-dimensional transform are equal (and so are the last four), while the two-dimensional transform (once shifted) presents a peak at the origin and nice symmetries in the frequency domain. In the following screenshot (obtained from the preceding code), the left-hand side image is fft and the right-hand side image is fft2 of a 2 x 2 checkerboard signal:

The scipy.fftpack module also offers the Discrete Cosine Transform with its inverse (dct, idct) as well as many differential and pseudo-differential operators defined in terms of all these transforms: diff (for derivative/integral), hilbert and ihilbert (for the Hilbert transform), tilbert and itilbert (for the h-Tilbert transform of periodic sequences), and so on.

Signal construction

To aid in the construction of signals with predetermined properties, the scipy.signal module has a nice collection of the most frequent one-dimensional waveforms in the literature: chirp and sweep_poly (for the frequency-swept cosine generator), gausspulse (a Gaussian modulated sinusoid), and sawtooth and square (for the waveforms with those names). They all take as their main parameter a one-dimensional ndarray representing the times at which the signal is to be evaluated. Other parameters control the design of the signal, according to frequency or time constraints. Let's take a look at the following code snippet, which illustrates the use of the one-dimensional waveforms that we just discussed:

>>> import numpy
>>> from scipy.signal import chirp, sawtooth, square, gausspulse
>>> import matplotlib.pyplot as plt
>>> t=numpy.linspace(-1,1,1000)
>>> plt.subplot(221); plt.ylim([-2,2])
>>> plt.plot(t,chirp(t,f0=100,t1=0.5,f1=200))   # plot a chirp
>>> plt.subplot(222); plt.ylim([-2,2])
>>> plt.plot(t,gausspulse(t,fc=10,bw=0.5))      # Gauss pulse
>>> plt.subplot(223); plt.ylim([-2,2])
>>> t*=3*numpy.pi
>>> plt.plot(t,sawtooth(t))                     # sawtooth
>>> plt.subplot(224); plt.ylim([-2,2])
>>> plt.plot(t,square(t))                       # Square wave
>>> plt.show()

Generated by this code, the following diagram shows waveforms for chirp (upper-left), gausspulse (upper-right), sawtooth (lower-left), and square (lower-right):

The usual method of creating signals is to import them from a file. This is possible by using purely NumPy routines, for example fromfile:

fromfile(file, dtype=float, count=-1, sep='')

The file argument may point to either a file or a string, the count argument is used to determine the number of items to read, and sep indicates what constitutes a separator in the original file/string.
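As a small, hypothetical example of this routine (the file name signal.txt is made up), we can save a signal as space-separated text with the companion method tofile and load it back with fromfile:

>>> import numpy
>>> t = numpy.linspace(0, 1, 100)
>>> signal = numpy.sin(2*numpy.pi*5*t)
>>> signal.tofile("signal.txt", sep=" ")                       # plain-text, space-separated
>>> loaded = numpy.fromfile("signal.txt", dtype=float, sep=" ")
>>> numpy.allclose(loaded, signal)
True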
For images, we have the versatile routine imread in either the scipy.ndimage or scipy.misc module:

imread(fname, flatten=False)

The fname argument is a string containing the location of an image. The routine infers the type of file and reads the data into an array accordingly. In case the flatten argument is turned to True, the image is converted to gray scale. Note that, in order to work, the Python Imaging Library (PIL) needs to be installed.

It is also possible to load .wav files for analysis, with the read and write routines from the wavfile submodule in the scipy.io module. For instance, given any audio file with this format, say audio.wav, the command rate,data = scipy.io.wavfile.read("audio.wav") assigns an integer value to the rate variable, indicating the sample rate of the file (in samples per second), and a NumPy ndarray to the data variable, containing the numerical values assigned to the different notes. If we wish to write some one-dimensional ndarray data into an audio file of this kind, with the sample rate given by the rate variable, we may do so by issuing the following command:

>>> scipy.io.wavfile.write("filename.wav",rate,data)

Filters

A filter is an operation on signals that either removes features or extracts some component. SciPy has a very complete set of known filters, as well as the tools to allow construction of new ones. The complete list of filters in SciPy is long, and we encourage the reader to explore the help documents of the scipy.signal and scipy.ndimage modules for the complete picture. We will introduce in these pages, as an exposition, some of the most used filters in the treatment of audio or image processing. We start by creating a signal worth filtering:

>>> from numpy import sin, cos, pi, linspace
>>> f=lambda t: cos(pi*t) + 0.2*sin(5*pi*t+0.1) + 0.2*sin(30*pi*t) + 0.1*sin(32*pi*t+0.1) + 0.1*sin(47*pi*t+0.8)
>>> t=linspace(0,4,400); signal=f(t)

We first test the classical smoothing filter of Wiener and Kolmogorov, wiener. We present in a plot the original signal (in black) and the corresponding filtered data, with a choice of a Wiener window of size 55 samples (in red). Next, we compare the result of applying the median filter, medfilt, with a kernel of the same size as before (in blue):

>>> from scipy.signal import wiener, medfilt
>>> import matplotlib.pylab as plt
>>> plt.plot(t,signal,'k')
>>> plt.plot(t,wiener(signal,mysize=55),'r',linewidth=3)
>>> plt.plot(t,medfilt(signal,kernel_size=55),'b',linewidth=3)
>>> plt.show()

This gives us the following graph showing the comparison of smoothing filters (wiener is the one that has its starting point just below 0.5 and medfilt has its starting point just above 0.5):

Most of the filters in the scipy.signal module can be adapted to work with arrays of any dimension. But in the particular case of images, we prefer to use the implementations in the scipy.ndimage module, since they are coded with these objects in mind. For instance, to perform a median filter on an image for smoothing, we use scipy.ndimage.median_filter. Let's see an example.
We will start by loading the Lena image into an array and corrupting it with Gaussian noise (zero mean and standard deviation of 16):

>>> from scipy.stats import norm     # Gaussian distribution
>>> import matplotlib.pyplot as plt
>>> import scipy.misc
>>> import scipy.ndimage
>>> plt.gray()
>>> lena=scipy.misc.lena().astype(float)
>>> plt.subplot(221);
>>> plt.imshow(lena)
>>> lena+=norm(loc=0,scale=16).rvs(lena.shape)
>>> plt.subplot(222);
>>> plt.imshow(lena)
>>> denoised_lena = scipy.ndimage.median_filter(lena,3)
>>> plt.subplot(224);
>>> plt.imshow(denoised_lena)

The set of filters for images comes in two flavors—statistical and morphological. For example, among the filters of a statistical nature, we have the Sobel algorithm, oriented to the detection of edges (singularities along curves). Its syntax is as follows:

sobel(image, axis=-1, output=None, mode='reflect', cval=0.0)

The optional parameter, axis, indicates the dimension in which the computations are performed. By default, this is always the last axis (-1). The mode parameter, which is one of the strings 'reflect', 'constant', 'nearest', 'mirror', or 'wrap', indicates how to handle the border of the image in case there is insufficient data to perform the computations there. If the mode is 'constant', we may indicate the value to use in the border with the cval parameter. Let's look into the following code snippet, which illustrates the use of the sobel filter:

>>> from scipy.ndimage.filters import sobel
>>> import numpy
>>> lena=scipy.misc.lena()
>>> sblX=sobel(lena,axis=0); sblY=sobel(lena,axis=1)
>>> sbl=numpy.hypot(sblX,sblY)
>>> plt.subplot(223);
>>> plt.imshow(sbl)
>>> plt.show()

The following screenshot illustrates Lena (upper-left) and noisy Lena (upper-right) with the preceding two filters in action—edge map with sobel (lower-left) and median filter (lower-right):

Morphology

We also have the possibility of creating and applying filters to images based on mathematical morphology, both to binary and gray-scale images. The four basic morphological operations are opening (binary_opening), closing (binary_closing), dilation (binary_dilation), and erosion (binary_erosion). Note that the syntax for each of these filters is very simple, since we only need two ingredients—the signal to filter and the structuring element to perform the morphological operation. Let's take a look at the general syntax for these morphological operations:

binary_operation(signal, structuring_element)

We may use combinations of these four basic morphological operations to create more complex filters for the removal of holes, hit-or-miss transforms (to find the location of specific patterns in binary images), denoising, edge detection, and many more. The SciPy module also allows us to create some common filters using the preceding syntax.
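As a quick, self-contained illustration of this two-ingredient syntax (a minimal sketch with synthetic data; the 32 x 32 test image and the 3 x 3 structuring element are inventions for this example, not objects used elsewhere in this article), an opening followed by a closing can be used to clean up a noisy binary image:

import numpy
from scipy.ndimage.morphology import binary_opening, binary_closing

# A 32x32 binary image: a filled square corrupted with salt-and-pepper noise.
numpy.random.seed(0)
image = numpy.zeros((32, 32), dtype=bool)
image[8:24, 8:24] = True
noise = numpy.random.random((32, 32))
image[noise < 0.05] = True           # isolated white speckles appear outside the square
image[noise > 0.95] = False          # small black holes appear inside the square

structure = numpy.ones((3, 3), dtype=bool)    # flat 3x3 structuring element
opened = binary_opening(image, structure)     # opening removes the speckles
cleaned = binary_closing(opened, structure)   # closing fills the small holes

print(image.sum(), opened.sum(), cleaned.sum())   # pixel counts before and after

The same pattern (one signal, one structuring element) carries over to the other binary operations and to their combinations.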
For instance, to locate the letter e in a scanned text image, we could use the following command:

>>> binary_hit_or_miss(text, letterE)

For comparative purposes, let's use this command in the following code snippet:

>>> import numpy
>>> import scipy.ndimage
>>> import matplotlib.pylab as plt
>>> from scipy.ndimage.morphology import binary_hit_or_miss
>>> text = scipy.ndimage.imread('CHAP_05_input_textImage.png')
>>> letterE = text[37:53,275:291]
>>> HitorMiss = binary_hit_or_miss(text, structure1=letterE, origin1=1)
>>> eLocation = numpy.where(HitorMiss==True)
>>> x=eLocation[1]; y=eLocation[0]
>>> plt.imshow(text, cmap=plt.cm.gray, interpolation='nearest')
>>> plt.autoscale(False)
>>> plt.plot(x,y,'wo',markersize=10)
>>> plt.axis('off')
>>> plt.show()

The output for the preceding lines of code is generated as follows:

For gray-scale images, we may use a structuring element (structuring_element) or a footprint. The syntax is, therefore, a little different:

grey_operation(signal, [structuring_element, footprint, size, ...])

If we desire to use a completely flat and rectangular structuring element (all ones), then it is enough to indicate the size as a tuple. For instance, to perform gray-scale dilation with a flat element of size (15,15) on our classical image of Lena, we issue the following command:

>>> grey_dilation(lena, size=(15,15))

The last kind of morphological operations coded in the scipy.ndimage module perform distance and feature transforms. Distance transforms create a map that assigns to each pixel the distance to the nearest object. Feature transforms provide the index of the closest background element instead. These operations are used to decompose images into different labels. We may even choose different metrics, such as Euclidean distance, chessboard distance, and taxicab distance. The syntax for the distance transform (distance_transform) using a brute-force algorithm is as follows:

distance_transform_bf(signal, metric='euclidean', sampling=None,
                      return_distances=True, return_indices=False,
                      distances=None, indices=None)

We indicate the metric with strings such as 'euclidean', 'taxicab', or 'chessboard'. If we desire to obtain the feature transform instead, we switch return_distances to False and return_indices to True. Similar routines are available with more sophisticated algorithms—distance_transform_cdt (using chamfering for taxicab and chessboard distances). For Euclidean distance, we also have distance_transform_edt. All these use the same syntax.

Summary

In this article, we explored signal processing (in any dimension), including the treatment of signals in frequency space by means of their Discrete Fourier Transforms. These correspond to the fftpack, signal, and ndimage modules.

Resources for Article:

Further resources on this subject:

Signal Processing Techniques [article]
SciPy for Computational Geometry [article]
Move Further with NumPy Modules [article]

Going beyond Zabbix agents

Packt
03 Mar 2015
17 min read
In this article by Andrea Dalle Vacche and Stefano Kewan Lee, authors of Zabbix Network Monitoring Essentials, we will learn about the different possibilities Zabbix offers to the enterprising network administrator. There are certainly many advantages in using Zabbix's own agents and protocol when it comes to monitoring Windows and Unix operating systems or the applications that run on them. However, when it comes to network monitoring, the vast majority of monitored objects are network appliances of various kinds, where it's often impossible to install and run a dedicated agent of any type. This by no means implies that you'll be unable to fully leverage Zabbix's power to monitor your network. Whether it's a simple ICMP echo request, an SNMP query, an SNMP trap, netflow logging, or a custom script, there are many possibilities to extract meaningful data from your network. This section will show you how to set up these different methods of gathering data, and give you a few examples of how to use them.

(For more resources related to this topic, see here.)

Simple checks

An interesting use case is using one or more net.tcp.service items to make sure that some services are not running on a given interface. Take, for example, the case of a border router or firewall. Unless you have some very special and specific needs, you'll typically want to make sure that no admin consoles are available on the external interfaces. You might have double-checked the appliance's initial configuration, but a system update, a careless admin, or a security bug might change the aforesaid configuration and open your appliance's admin interfaces to a far wider audience than intended. A security breach like this one could pass unobserved for a long time unless you configure a few simple TCP/IP checks on your appliance's external interfaces and then set up some triggers that will report a problem if those checks report an open and responsive port. Let's take the example of the router with two production interfaces and a management interface shown in the section about host interfaces. If the router's HTTPS admin console is available on TCP port 8000, you'll want to configure a simple check item for every interface:

Item name                   Item key
management_https_console    net.tcp.service[https,192.168.1.254,8000]
zoneA_https_console         net.tcp.service[https,10.10.1.254,8000]
zoneB_https_console         net.tcp.service[https,172.16.7.254,8000]

All these checks will return 1 if the service is available, and 0 if the service is not available. What changes is how you implement the triggers on these items. For the management item, you'll have a problem if the service is not available, while for the other two, you'll have a problem if the service is indeed available, as shown in the following table:

Trigger name                    Trigger expression
Management console down         {it-1759-r1:net.tcp.service[https,192.168.1.254,8000].last()}=0
Console available from zone A   {it-1759-r1:net.tcp.service[https,10.10.1.254,8000].last()}=1
Console available from zone B   {it-1759-r1:net.tcp.service[https,172.16.7.254,8000].last()}=1

This way, you'll always be able to make sure that your device's configuration, when it comes to open or closed ports, matches your expected setup, and you'll be notified when it diverges from the standard you set. To summarize, simple checks are great for all cases where you don't need complex monitoring data from your network, as they are quite fast and lightweight; at their core they amount to little more than a quick connection attempt (plus, for some services, a basic protocol handshake), as the sketch below illustrates.
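The following Python snippet is a minimal sketch of that idea, for illustration only; it is not how Zabbix implements net.tcp.service, and the address and port are taken from the example items above:

import socket

def tcp_service_check(address, port, timeout=3.0):
    """Return 1 if a TCP connection to address:port succeeds, 0 otherwise."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        # connect_ex returns 0 on success instead of raising an exception
        return 1 if sock.connect_ex((address, port)) == 0 else 0
    finally:
        sock.close()

# Mirrors the management_https_console item defined above
print(tcp_service_check("192.168.1.254", 8000))

A trigger comparing the last value of such an item against 0 or 1 is then all that is needed to alert on an unexpectedly closed, or unexpectedly open, port.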
Because they are so fast and lightweight, simple checks could also be the preferred solution if you have to monitor availability for hundreds to thousands of hosts, as they will impart a relatively low overhead on your overall network traffic. When you do need more structure and more detail in your monitoring data, it's time to move to the bread and butter of all network monitoring solutions: SNMP.

Keeping SNMP simple

The Simple Network Management Protocol (SNMP) is an excellent, general-purpose protocol that has become widely used beyond its original purpose. When it comes to network monitoring though, it's also often the only protocol supported by many appliances, so it's often a forced, albeit natural and sensible, choice to integrate it into your monitoring scenarios. As a network administrator, you probably already know all there is to know about SNMP and how it works, so let's focus on how it's integrated into Zabbix and what you can do with it.

Mapping SNMP OIDs to Zabbix items

An SNMP value is composed of three different parts: the OID, the data type, and the value itself. When you use snmpwalk or snmpget to get values from an SNMP agent, the output looks like this:

SNMPv2-MIB::sysObjectID.0 = OID: CISCO-PRODUCTS-MIB::cisco3640
DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (83414) 0:13:54.14
SNMPv2-MIB::sysContact.0 = STRING:
SNMPv2-MIB::sysName.0 = STRING: R1
SNMPv2-MIB::sysLocation.0 = STRING: Upper floor room 13
SNMPv2-MIB::sysServices.0 = INTEGER: 78
SNMPv2-MIB::sysORLastChange.0 = Timeticks: (0) 0:00:00.00
...
IF-MIB::ifPhysAddress.24 = STRING: c4:1:22:4:f2:f
IF-MIB::ifPhysAddress.26 = STRING:
IF-MIB::ifPhysAddress.27 = STRING: c4:1:1e:c8:0:0
IF-MIB::ifAdminStatus.1 = INTEGER: up(1)
IF-MIB::ifAdminStatus.2 = INTEGER: down(2)
...

And so on. The first part, the one before the = sign, is, naturally, the OID. This will go into the SNMP OID field in the Zabbix item creation page and is the unique identifier for the metric you are interested in. Some OIDs represent a single and unique metric for the device, so they are easy to identify and address. In the above excerpt, one such OID is DISMAN-EVENT-MIB::sysUpTimeInstance. If you are interested in monitoring that OID, you'd only have to fill out the item creation form with the OID itself and then define an item name, a data type, and a retention policy, and you are ready to start monitoring it. In the case of an uptime value, time-ticks are expressed in seconds, so you'll choose a numeric decimal data type. We'll see in the next section how to choose Zabbix item data types and how to store values based on SNMP data types. You'll also want to store the value as is and optionally specify a unit of measure. This is because an uptime is already a relative value, as it expresses the time elapsed since a device's latest boot. There would be no point in calculating a further delta when getting this measurement. Finally, you'll define a polling interval and choose a retention policy. In the following example, the polling interval is shown to be 5 minutes (300 seconds), the history retention policy as 3 days, and the trend storage period as one year. These should be sensible values, as you don't normally need to store the detailed history of a value that either resets to zero or, by definition, grows linearly by one tick every second.
The following screenshot encapsulates what has been discussed in this paragraph:

Remember that the item's key value still has to be unique at the host/template level, as it will be referenced by all other Zabbix components, from calculated items to triggers, maps, screens, and so on. Don't forget to put in the right credentials for SNMPv3 if you are using this version of the protocol.

Many of the more interesting OIDs, though, are a bit more complex: multiple OIDs can be related to one another by means of the same index. Let's look at another snmpwalk output excerpt:

IF-MIB::ifNumber.0 = INTEGER: 26
IF-MIB::ifIndex.1 = INTEGER: 1
IF-MIB::ifIndex.2 = INTEGER: 2
IF-MIB::ifIndex.3 = INTEGER: 3
...
IF-MIB::ifDescr.1 = STRING: FastEthernet0/0
IF-MIB::ifDescr.2 = STRING: Serial0/0
IF-MIB::ifDescr.3 = STRING: FastEthernet0/1
...
IF-MIB::ifType.1 = INTEGER: ethernetCsmacd(6)
IF-MIB::ifType.2 = INTEGER: propPointToPointSerial(22)
IF-MIB::ifType.3 = INTEGER: ethernetCsmacd(6)
...
IF-MIB::ifMtu.1 = INTEGER: 1500
IF-MIB::ifMtu.2 = INTEGER: 1500
IF-MIB::ifMtu.3 = INTEGER: 1500
...
IF-MIB::ifSpeed.1 = Gauge32: 10000000
IF-MIB::ifSpeed.2 = Gauge32: 1544000
IF-MIB::ifSpeed.3 = Gauge32: 10000000
...
IF-MIB::ifPhysAddress.1 = STRING: c4:1:1e:c8:0:0
IF-MIB::ifPhysAddress.2 = STRING:
IF-MIB::ifPhysAddress.3 = STRING: c4:1:1e:c8:0:1
...
IF-MIB::ifAdminStatus.1 = INTEGER: up(1)
IF-MIB::ifAdminStatus.2 = INTEGER: down(2)
IF-MIB::ifAdminStatus.3 = INTEGER: down(2)
...
IF-MIB::ifOperStatus.1 = INTEGER: up(1)
IF-MIB::ifOperStatus.2 = INTEGER: down(2)
IF-MIB::ifOperStatus.3 = INTEGER: down(2)
...
IF-MIB::ifLastChange.1 = Timeticks: (1738) 0:00:17.38
IF-MIB::ifLastChange.2 = Timeticks: (1696) 0:00:16.96
IF-MIB::ifLastChange.3 = Timeticks: (1559) 0:00:15.59
...
IF-MIB::ifInOctets.1 = Counter32: 305255
IF-MIB::ifInOctets.2 = Counter32: 0
IF-MIB::ifInOctets.3 = Counter32: 0
...
IF-MIB::ifInDiscards.1 = Counter32: 0
IF-MIB::ifInDiscards.2 = Counter32: 0
IF-MIB::ifInDiscards.3 = Counter32: 0
...
IF-MIB::ifInErrors.1 = Counter32: 0
IF-MIB::ifInErrors.2 = Counter32: 0
IF-MIB::ifInErrors.3 = Counter32: 0
...
IF-MIB::ifOutOctets.1 = Counter32: 347968
IF-MIB::ifOutOctets.2 = Counter32: 0
IF-MIB::ifOutOctets.3 = Counter32: 0

As you can see, for every network interface, there are several OIDs, each one detailing a specific aspect of the interface: its name, its type, whether it's up or down, the amount of traffic coming in or going out, and so on. The different OIDs are related through their last number, the actual index of the OID. Looking at the preceding excerpt, we know that the device has 26 interfaces, of which we are showing some values for just the first three. By correlating the index numbers, we also know that interface 1 is called FastEthernet0/0, its MAC address is c4:1:1e:c8:0:0, the interface is up and has been up for just 17 seconds, and some traffic has already gone through it. Now, one way to monitor several of these metrics for the same interface is to manually correlate these values when creating the items, putting the complete OID in the SNMP OID field, and making sure that both the item key and its name reflect the right interface. This process is not only prone to errors during the setup phase, but it could also introduce some inconsistencies down the road. There is no guarantee, in fact, that the index will remain consistent across hardware or software upgrades, or even across configurations when it comes to more volatile states like the number of VLANs or routing tables instead of network interfaces.
Fortunately Zabbix provides a feature, called dynamic indexes, that allows you to actually correlate different OIDs in the same SNMP OID field so that you can define an index based on the index exposed by another OID. This means that if you want to know the admin status of FastEthernet0/0, you don't need to find the index associated with FastEthernet0/0 (in this case it would be 1) and then add that index to IF-MIB::ifAdminStatus of the base OID, hoping that it won't ever change in the future. You can instead use the following code: IF-MIB::ifAdminStatus["index", "IF-MIB::ifDescr",   "FastEthernet0/0"] Upon using the preceding code in the SNMP OID field of your item, the item will dynamically find the index of the IF-MIB::ifDescr OID where the value is FastEthernet0/0 and append it to IF-MIB::ifAdminStatus in order to get the right status for the right interface. If you organize your items this way, you'll always be sure that related items actually show the right related values for the component you are interested in and not those of another one because things changed on the device's side without your knowledge. Moreover, we'll build on this technique to develop low-level discovery of a device. You can use the same technique to get other interesting information out of a device. Consider, for example, the following excerpt: ENTITY-MIB::entPhysicalVendorType.1 = OID: CISCO-ENTITY-VENDORTYPEOID-MIB::cevChassis3640ENTITY-MIB::entPhysicalVendorType.2 = OID: CISCO-ENTITY-VENDORTYPEOID-MIB::cevContainerSlotENTITY-MIB::entPhysicalVendorType.3 = OID: CISCO-ENTITY-VENDORTYPEOID-MIB::cevCpu37452feENTITY-MIB::entPhysicalClass.1 = INTEGER: chassis(3)ENTITY-MIB::entPhysicalClass.2 = INTEGER: container(5)ENTITY-MIB::entPhysicalClass.3 = INTEGER: module(9)ENTITY-MIB::entPhysicalName.1 = STRING: 3745 chassisENTITY-MIB::entPhysicalName.2 = STRING: 3640 Chassis Slot 0ENTITY-MIB::entPhysicalName.3 = STRING: c3745 Motherboard with FastEthernet on Slot 0ENTITY-MIB::entPhysicalHardwareRev.1 = STRING: 2.0ENTITY-MIB::entPhysicalHardwareRev.2 = STRING:ENTITY-MIB::entPhysicalHardwareRev.3 = STRING: 2.0ENTITY-MIB::entPhysicalSerialNum.1 = STRING: FTX0945W0MYENTITY-MIB::entPhysicalSerialNum.2 = STRING:ENTITY-MIB::entPhysicalSerialNum.3 = STRING: XXXXXXXXXXX It should be immediately clear to you that you can find the chassis's serial number by creating an item with: ENTITY-MIB::entPhysicalSerialNum["index", "ENTITY-MIB::entPhysicalName", "3745 chassis"] Then you can specify, in the same item, that it should populate the Serial Number field of the host's inventory. This is how you can have a more automatic, dynamic population of inventory fields. The possibilities are endless as we've only just scratched the surface of what any given device can expose as SNMP metrics. Before you go and find your favorite OIDs to monitor though, let's have a closer look at the preceding examples, and let's discuss data types. Getting data types right We have already seen how an OID's value has a specific data type that is usually clearly stated with the default snmpwalk command. In the preceding examples, you can clearly see the data type just after the = sign, before the actual value. There are a number of SNMP data types—some still current and some deprecated. 
You can find the official list and documentation in RFC2578 (http://tools.ietf.org/html/rfc2578), but let's have a look at the most important ones from the perspective of a Zabbix user:

INTEGER: can have negative values and is usually used for enumerations. Suggested Zabbix item type and options: Numeric unsigned, decimal; store value as is; show with value mappings.
STRING: a regular character string that can contain new lines. Suggested: Text; store value as is.
OID: an SNMP object identifier. Suggested: Character; store value as is.
IpAddress: IPv4 only. Suggested: Character; store value as is.
Counter32: only non-negative and nondecreasing values. Suggested: Numeric unsigned, decimal; store value as delta (speed per second).
Gauge32: only non-negative values, which can decrease. Suggested: Numeric unsigned, decimal; store value as is.
Counter64: non-negative and nondecreasing 64-bit values. Suggested: Numeric unsigned, decimal; store value as delta (speed per second).
TimeTicks: non-negative, nondecreasing values. Suggested: Numeric unsigned, decimal; store value as is.

First of all, remember that the above suggestions are just that—suggestions. You should always evaluate how to store your data on a case-by-case basis, but you'll probably find that in many cases those are indeed the most useful settings. Moving on to the actual data types, remember that the command-line SNMP tools by default parse the values and show some already interpreted information. This is especially true for Timeticks values and for INTEGER values when these are used as enumerations. In other words, you see the following from the command line:

VRRP-MIB::vrrpNotificationCntl.0 = INTEGER: disabled(2)

However, what is actually passed as a request is the bare OID:

1.3.6.1.2.1.68.1.2.0

The SNMP agent will respond with just the value, which, in this case, is the value 2. This means that in the case of enumerations, Zabbix will just receive and store a number and not the string disabled(2) as seen from the command line. If you want to display monitoring values that are a bit clearer, you can apply value mappings to your numeric items. Value maps contain the mapping between numeric values and arbitrary string representations for a human-friendly representation. You can specify which one you need in the item configuration form, as follows:

Zabbix comes with a few predefined value mappings. You can create your own mappings by following the show value mappings link and, provided you have admin roles on Zabbix, you'll be taken to a page where you can configure all value mappings that will be used by Zabbix. From there, click on Create value map in the upper-right corner of the page, and you'll be able to create a new mapping. Not all INTEGER values are enumerations, but those that are used as such will be clearly recognizable from your command-line tools, as they will be defined as INTEGER values but will show a string label along with the actual value, just as in the preceding example. On the other hand, when they are not used as enumerations, they can represent different things depending on the context. As seen in the previous paragraph, they can represent the number of indexes available for a given OID. They can also represent application or protocol-specific values, such as default MTU, default TTL, route metrics, and so on. The main difference between gauges, counters, and integers is that integers can assume negative values, while gauges and counters cannot.
In addition to that, counters can only increase or wrap around and start again from the bottom of their value range once they reach the upper limits of it. From the perspective of Zabbix, this marks the difference in how you'll want to store their values. Gauges are usually employed when a value can vary within a given range, such as the speed of an interface, the amount of free memory, or any limits and timeouts you might find for notifications, the number of instances, and so on. In all of these cases, the value can increase or decrease in time, so you'll want to store them as they are because once put on a graph, they'll draw a meaningful curve. Counters, on the other hand, can only increase by definition. They are typically used to show how many packets were processed by an interface, how many were dropped, how many errors were encountered, and so on. If you store counter values as they are, you'll find in your graphs some ever-ascending curves that won't tell you very much for your monitoring or capacity planning purposes. This is why you'll usually want to track a counter's amount of change in time, more than its actual value. To do that, Zabbix offers two different ways to store deltas or differences between successive values. The delta (simple change) storage method does exactly what it says: it simply computes the difference between the currently received value and the previously received one, and stores the result. It doesn't take into consideration the elapsed time between the two measurements, nor the fact that the result can even have a negative value if the counter overflows. The fact is that most of the time, you'll be very interested in evaluating how much time has passed between two different measurements and in treating correctly any negative values that can appear as a result. The delta (speed per second) will divide the difference between the currently received value and the previously received one by the difference between the current timestamp and the previous one, as follows: (value – prev_value)/(time - prev_time) This will ensure that the scale of the change will always be constant, as opposed to the scale of the simple change delta, which will vary every time you modify the update interval of the item, giving you inconsistent results. Moreover, the speed-per-second delta will ignore any negative values and just wait for the next measurement, so you won't find any false dips in your graph due to overflowing. Finally, while SNMP uses specific data types for IP addresses and SNMP OIDs, there are no such types in Zabbix, so you'll need to map them to some kind of string item. The suggested type here is character as both values won't be bigger than 255 characters and won't contain any newlines. String values, on the other hand, can be quite long as the SNMP specification allows for 65,535-character-long texts; however, text that long would be of little practical value. Even if they are usually much shorter, string values can often contain newlines and be longer than 255 characters. Consider, for example, the following SysDescr OID for this device: NMPv2-MIB::sysDescr.0 = STRING: Cisco IOS Software, 3700 Software(C3745-ADVENTERPRISEK9_SNA-M), Version 12.4(15)T14, RELEASE SOFTWARE(fc2)^MTechnical Support: http://www.cisco.com/techsupport^MCopyright (c) 1986-2010 by Cisco Systems, Inc.^MCompiled Tue 17-Aug-10 12:56 by prod_rel_tea As you can see, the string spans multiple lines, and it's definitely longer than 255 characters. 
This is why the suggested type for string values is text as it allows text of arbitrary length and structure. On the other hand, if you're sure that a specific OID value will always be much shorter and simpler, you can certainly use the character data type for your corresponding Zabbix item. Now, you are truly ready to get the most out of your devices' SNMP agents as you are now able to find the OID you want to monitor and map them perfectly to Zabbix items, down to how to store the values, their data types, with what frequency, and with any value mapping that might be necessary. Summary In this article, you have learned the different possibilities offered by Zabbix to the enterprising network administrator. You should now be able to choose, design, and implement all the monitoring items you need, based on the methods illustrated in the preceding paragraphs. Resources for Article: Further resources on this subject: Monitoring additional servers [Article] Bar Reports in Zabbix 1.8 [Article] Using Proxies to Monitor Remote Locations with Zabbix 1.8 [Article]


Speeding Vagrant Development With Docker

Packt
03 Mar 2015
13 min read
In this article by Chad Thompson, author of Vagrant Virtual Development Environment Cookbook, we will learn that many software developers are familiar with using Vagrant (http://vagrantup.com) to distribute and maintain development environments. In most cases, Vagrant is used to manage virtual machines running in desktop hypervisor software such as VirtualBox or the VMware Desktop product suites. (VMware Fusion for OS X and VMware Desktop for Linux and Windows environments.) More recently, Docker (http://docker.io) has become increasingly popular for deploying containers—Linux processes that can run in a single operating system environment yet be isolated from one another. In practice, this means that a container includes the runtime environment for an application, down to the operating system level. While containers have been popular for deploying applications, we can also use them for desktop development. Vagrant can use Docker in a couple of ways: As a target for running a process defined by Vagrant with the Vagrant provider. As a complete development environment for building and testing containers within the context of a virtual machine. This allows you to build a complete production-like container deployment environment with the Vagrant provisioner. In this example, we'll take a look at how we can use the Vagrant provider to build and run a web server. Running our web server with Docker will allow us to build and test our web application without the added overhead of booting and provisioning a virtual machine. (For more resources related to this topic, see here.) Introducing the Vagrant Provider The Vagrant Docker provider will build and deploy containers to a Docker runtime. There are a couple of cases to consider when using Vagrant with Docker: On a Linux host machine, Vagrant will use a native (locally installed) Docker environment to deploy containers. Make sure that Docker is installed before using Vagrant. Docker itself is a technology built on top of Linux Containers (LXC) technology—so Docker itself requires an operating system with a recent version (newer than Linux 3.8 which was released in February, 2013) of the Linux kernel. Most recent Linux distributions should support the ability to run Docker. On nonLinux environments (namely OS X and Windows), the provider will require a local Linux runtime to be present for deploying containers. When running the Docker provisioner in these environments, Vagrant will download and boot a version of the boot2docker (http://boot2docker.io) environment—in this case, a repackaging of boot2docker in Vagrant box format. Let's take a look at two scenarios for using the Docker provider. In each of these examples, we'll start these environments from an OS X environment so we will see some tasks that are required for using the boot2docker environment. Installing a Docker image from a repository We'll start with a simple case: installing a Docker container from a repository (a MySQL container) and connecting it to an external tool for development (the MySQL Workbench or a client tool of your choice). We'll need to initialize the boot2docker environment and use some Vagrant tools to interact with the environment and the deployed containers. Before we can start, we'll need to find a suitable Docker image to launch. One of the unique advantages to use Docker as a development environment is its ability to select a base Docker image, then add successive build steps on top of the base image. 
In this simple example, we can find a base MySQL image on the Docker Hub registry (https://registry.hub.docker.com). The MySQL project provides an official Docker image that we can build from. We'll note from the repository the command for using the image, docker pull mysql, and note that the image name is mysql. Start with a Vagrantfile that defines the Docker provider:

# -*- mode: ruby -*-
# vi: set ft=ruby :

VAGRANTFILE_API_VERSION = "2"
ENV['VAGRANT_DEFAULT_PROVIDER'] = 'vmware_fusion'

Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
  config.vm.define "database" do |db|
    db.vm.provider "docker" do |d|
      d.image = "mysql"
    end
  end
end

An important thing to note immediately is that when we define the database machine and the provider with the Docker provider, we do not specify a box file. The Docker provider will start and launch containers into a boot2docker environment, negating the need for a Vagrant box or virtual machine definition. This will introduce a bit of a complication in interacting with the Vagrant environment in later steps. Also note the mysql image taken from the Docker Hub Registry. We'll need to launch the image with a few basic parameters. Add the following to the Docker provider block:

    db.vm.provider "docker" do |d|
      d.image = "mysql"
      d.env = {
        :MYSQL_ROOT_PASSWORD => "root",
        :MYSQL_DATABASE      => "dockertest",
        :MYSQL_USER          => "dockertest",
        :MYSQL_PASSWORD      => "d0cker"
      }
      d.ports = ["3306:3306"]
      d.remains_running = true
    end

The environment variables (d.env) are taken from the documentation on the MySQL Docker image page (https://registry.hub.docker.com/_/mysql/). This is how the image expects to receive certain parameters. In this case, our parameters will set the database root password (for the root user) and create a database with a new user that has full permissions on that database. The d.ports parameter is an array of port listings that will be forwarded from the container (the default MySQL port of 3306) to the host operating system, in this case also 3306. The contained application will, thus, behave like a natively installed MySQL installation. The port forwarding here is from the container to the operating system that hosts the container (in this case, the container host is our boot2docker image). If we are developing and hosting containers natively with Vagrant on a Linux distribution, the port forwarding will be to localhost, but boot2docker introduces something of a wrinkle in doing Docker development on Windows or OS X. We'll either need to refer to our software installation by the IP of the boot2docker container or configure a second port forwarding configuration that allows a Docker contained application to be available to the host operating system as localhost. The final parameter (d.remains_running = true) is a flag for Vagrant to note that the Vagrant run should be marked as failed if the Docker container exits on start. In the case of software that runs as a daemon process (such as the MySQL database), a Docker container that exits immediately is an error condition.

Start the container using the vagrant up --provider=docker command. A few things will happen here: if this is the first time you have started the project, you'll see some messages about booting a box named mitchellh/boot2docker. This is a Vagrant-packaged version of the boot2docker project. Once the machine boots, it becomes a host for all Docker containers managed with Vagrant.
Keep in mind that boot2docker is necessary only for non-Linux operating systems that are running Docker through a virtual machine. On a Linux system running Docker natively, you will not see information about boot2docker. After the container is booted (or if it is already running), Vagrant will display notifications about rsyncing a folder (if we are using boot2docker) and launching the image:

Docker generates unique identifiers for containers and notes any port mapping information. Let's take a look at some details on the containers that are running in the Docker host. We'll need to find a way to gain access to the Vagrant boot2docker image (and only if we are using boot2docker and not a native Linux environment), which is not quite as straightforward as a vagrant ssh; we'll need to identify the Vagrant container to access.

First, identify the Docker Vagrant machine from the global Vagrant status. Vagrant keeps track of running instances that can be accessed from Vagrant itself. In this case, we are only interested in the Vagrant instance named docker-host. The instance we're interested in can be found with the vagrant global-status command:

In this case, Vagrant identifies the instance as d381331 (a unique value for every Vagrant machine launched). We can access this instance with a vagrant ssh command:

vagrant ssh d381331

This will display an ASCII-art boot2docker logo and a command prompt for the boot2docker instance. Let's take a look at the Docker containers running on the system with the docker ps command:

The docker ps command will provide information about the running Docker containers on the system; in this case, the unique ID of the container (output during the Vagrant startup) and other information about the container.

Next, find the IP address of the boot2docker virtual machine (only if we're using boot2docker) to connect to the MySQL instance. In this case, execute the ifconfig command:

docker@boot2docker:~$ ifconfig

This will output information about the network interfaces on the machine; we are interested in the eth0 entry. In particular, we can note the IP address of the machine on the eth0 interface. Make a note of the IP address shown as the inet addr; in this case, 192.168.30.129.

Finally, connect a MySQL client to the running Docker container. In this case, we'll need to note some information for the connection:

The IP address of the boot2docker virtual machine (if using boot2docker). In this case, we'll note 192.168.30.129.
The port that the MySQL instance will respond to on the Docker host. In this case, the Docker container is forwarding port 3306 in the container to port 3306 on the host.
Information noted in the Vagrantfile for the username or password on the MySQL instance.

With this information in hand, we can configure a MySQL client. The MySQL project provides a supported GUI client named MySQL Workbench (http://www.mysql.com/products/workbench/). With the client installed on our host operating system, we can create a new connection in the Workbench client (consult the documentation for your version of Workbench, or use a MySQL client of your choice). In this case, we're connecting to the boot2docker instance. If you are running Docker natively on a Linux instance, the connection should simply forward to localhost.
If the connection is successful, the Workbench client once connected will display an empty database: Once we've connected, we can use the MySQL database as we would for any other MySQL instance that is hosted this time in a Docker container without having to install and configure the MySQL package itself. Building a Docker image with Vagrant While launching packaged Docker, applications can be useful (particularly in the case where launching a Docker container is simpler than native installation steps), Vagrant becomes even more useful when used to launch containers that are being developed. On OS X and Windows machines, the use of Vagrant can make managing the container deployment somewhat simpler through the boot2docker containers, while on Linux, using the native Docker tools could be somewhat simpler. In this example, we'll use a simple Dockerfile to modify a base image. First, start with a simple Vagrantfile. In this case, we'll specify a build directory rather than a image file: # -*- mode: ruby -*- # vi: set ft=ruby :   # Vagrantfile API/syntax version. Don't touch unless you know what you're doing! VAGRANTFILE_API_VERSION = "2" ENV['VAGRANT_DEFAULT_PROVIDER'] = 'vmware_fusion'   Vagrant.configure(VAGRANTFILE_API_VERSION) do |config| config.vm.define "nginx" do |nginx|    nginx.vm.provider "docker" do |d|      d.build_dir = "build"      d.ports = ["49153:80"]    end end end This Vagrantfile specifies a build directory as well as the ports forwarded to the host from the container. In this case, the standard HTTP port (80) forwards to port 49153 on the host machine, which in this case is the boot2docker instance. Create our build directory in the same directory as the Vagrantfile. In the build directory, create a Dockerfile. A Dockerfile is a set of instructions on how to build a Docker container. See https://docs.docker.com/reference/builder/ or James Turnbull's The Docker Book for more information on how to construct a Dockerfile. In this example, we'll use a simple Dockerfile to copy a working HTML directory to a base NGINX image: FROM nginx COPY content /usr/share/nginx/html Create a directory in our build directory named content. In the directory, place a simple index.html file that will be served from the new container: <html> <body>    <div style="text-align:center;padding-top:40px;border:dashed 2px;">      This is an NGINX build.    </div> </body> </html> Once all the pieces are in place, our working directory will have the following structure: . ├── Vagrantfile └── build ├── Dockerfile    └── content        └── index.html Start the container in the working directory with the command: vagrant up nginx --provider=docker This will start the container build and deploy process. Once the container is launched, the web server can be accessed using the IP address of the boot2docker instance (see the previous section for more information on obtaining this address) and the forwarded port. One other item to note, especially, if you have completed both steps in this section without halting or destroying the Vagrant project is that when using the Docker provider, containers are deployed to a single shared virtual machine. If the boot2docker instance is accessed and the docker ps command is executed, it can be noted that two separate Vagrant projects deploy containers to a single host. 
When using the Docker provider, the single instance has a few effects: The single virtual machine can use fewer resources on your development workstation Deploying and rebuilding containers is a process that is much faster than booting and shutting down entire operating systems Docker development with the Docker provider can be a useful technique to create and test Docker containers, although Vagrant might not be of particular help in packaging and distributing Docker containers. If you wish to publish containers, consult the documentation or The Docker Book on getting started with packaging and distributing Docker containers. See also Docker: http://docker.io boot2docker: http://boot2docker.io The Docker Book: http://www.dockerbook.com The Docker repository: https://registry.hub.docker.com Summary In this article, we learned how to use Docker provisioner with Vagrant by covering the topics mentioned in the preceding paragraphs. Resources for Article: Further resources on this subject: Going Beyond the Basics [article] Module, Facts, Types and Reporting tools in Puppet [article] Setting Up a Development Environment [article]


Time Travelling with Spring

Packt
03 Mar 2015
18 min read
This article by Sujoy Acharya, the author of the book Mockito for Spring, delves into the details Time Travelling with Spring. Spring 4.0 is the Java 8-enabled latest release of the Spring Framework. In this article, we'll discover the major changes in the Spring 4.x release and the four important features of the Spring 4 framework. We will cover the following topics in depth: @RestController AsyncRestTemplate Async tasks Caching (For more resources related to this topic, see here.) Discovering the new Spring release This section deals with the new features and enhancements in Spring Framework 4.0. The following are the features: Spring 4 supports Java 8 features such as Java lambda expressions and java.time. Spring 4 supports JDK 6 as the minimum. All deprecated packages/methods are removed. Java Enterprise Edition 6 or 7 are the base of Spring 4, which is based on JPA 2 and Servlet 3.0. Bean configuration using the Groovy DSL is supported in Spring Framework 4.0. Hibernate 4.3 is supported by Spring 4. Custom annotations are supported in Spring 4. Autowired lists and arrays can be ordered. The @Order annotation and the Ordered interface are supported. The @Lazy annotation can now be used on injection points as well as on the @Bean definitions. For the REST application, Spring 4 provides a new @RestController annotation. We will discuss this in detail in the following section. The AsyncRestTemplate feature (class) is added for asynchronous REST client development. Different time zones are supported in Spring 4.0. New spring-websocket and spring-messaging modules have been added. The SocketUtils class is added to examine the free TCP and UDP server ports on localhost. All the mocks under the org.springframework.mock.web package are now based on the Servlet 3.0 specification. Spring supports JCache annotations and new improvements have been made in caching. The @Conditional annotation has been added to conditionally enable or disable an @Configuration class or even individual @Bean methods. In the test module, SQL script execution can now be configured declaratively via the new @Sql and @SqlConfig annotations on a per-class or per-method basis. You can visit the Spring Framework reference at http://docs.spring.io/spring/docs/4.1.2.BUILD-SNAPSHOT/spring-framework-reference/htmlsingle/#spring-whats-new for more details. Also, you can watch a video at http://zeroturnaround.com/rebellabs/spring-4-on-java-8-geekout-2013-video/ for more details on the changes in Spring 4. Working with asynchronous tasks Java 7 has a feature called Future. Futures let you retrieve the result of an asynchronous operation at a later time. The FutureTask class runs in a separate thread, which allows you to perform non-blocking asynchronous operations. Spring provides an @Async annotation to make it more easier to use. We'll explore Java's Future feature and Spring's @Async declarative approach: Create a project, TimeTravellingWithSpring, and add a package, com.packt.async. We'll exercise a bank's use case, where an automated job will run and settle loan accounts. It will also find all the defaulters who haven't paid the loan EMI for a month and then send an SMS to their number. The job takes time to process thousands of accounts, so it will be good if we can send SMSes asynchronously to minimize the burden of the job. 
We'll create a service class to represent the job, as shown in the following code snippet: @Service public class AccountJob {    @Autowired    private SMSTask smsTask; public void process() throws InterruptedException, ExecutionException { System.out.println("Going to find defaulters... "); Future<Boolean> asyncResult =smsTask.send("1", "2", "3"); System.out.println("Defaulter Job Complete. SMS will be sent to all defaulter"); Boolean result = asyncResult.get(); System.out.println("Was SMS sent? " + result); } } The job class autowires an SMSTask class and invokes the send method with phone numbers. The send method is executed asynchronously and Future is returned. When the job calls the get() method on Future, a result is returned. If the result is not processed before the get() method invocation, the ExecutionException is thrown. We can use a timeout version of the get() method. Create the SMSTask class in the com.packt.async package with the following details: @Component public class SMSTask { @Async public Future<Boolean> send(String... numbers) { System.out.println("Selecting SMS format "); try { Thread.sleep(2000); } catch (InterruptedException e) { e.printStackTrace(); return new AsyncResult<>(false); } System.out.println("Async SMS send task is Complete!!!"); return new AsyncResult<>(true); } } Note that the method returns Future, and the method is annotated with @Async to signify asynchronous processing. Create a JUnit test to verify asynchronous processing: @RunWith(SpringJUnit4ClassRunner.class) @ContextConfiguration(locations="classpath:com/packt/async/          applicationContext.xml") public class AsyncTaskExecutionTest { @Autowired ApplicationContext context; @Test public void jobTest() throws Exception { AccountJob job = (AccountJob)context.getBean(AccountJob.class); job.process(); } } The job bean is retrieved from the applicationContext file and then the process method is called. When we execute the test, the following output is displayed: Going to find defaulters... Defaulter Job Complete. SMS will be sent to all defaulter Selecting SMS format Async SMS send task is Complete!!! Was SMS sent? true During execution, you might feel that the async task is executed after a delay of 2 seconds as the SMSTask class waits for 2 seconds. Exploring @RestController JAX-RS provides the functionality for Representational State Transfer (RESTful) web services. REST is well-suited for basic, ad hoc integration scenarios. Spring MVC offers controllers to create RESTful web services. In Spring MVC 3.0, we need to explicitly annotate a class with the @Controller annotation in order to specify a controller servlet and annotate each and every method with @ResponseBody to serve JSON, XML, or a custom media type. With the advent of the Spring 4.0 @RestController stereotype annotation, we can combine @ResponseBody and @Controller. The following example will demonstrate the usage of @RestController: Create a dynamic web project, RESTfulWeb. 
Modify the web.xml file and add a configuration to intercept requests with a Spring DispatcherServlet: <web-app xsi_schemaLocation="http:// java.sun.com/xml/ns/javaee http://java.sun.com/xml/ns/javaee/webapp_ 3_0.xsd" id="WebApp_ID" version="3.0"> <display-name>RESTfulWeb</display-name> <servlet> <servlet-name>dispatcher</servlet-name> <servlet-class> org.springframework.web.servlet.DispatcherServlet </servlet-class> <load-on-startup>1</load-on-startup> </servlet> <servlet-mapping> <servlet-name>dispatcher</servlet-name> <url-pattern>/</url-pattern> </servlet-mapping> <context-param> <param-name>contextConfigLocation</param-name> <param-value> /WEB-INF/dispatcher-servlet.xml </param-value> </context-param> </web-app> The DispatcherServlet expects a configuration file with the naming convention [servlet-name]-servlet.xml. Create an application context XML, dispatcher-servlet.xml. We'll use annotations to configure Spring beans, so we need to tell the Spring container to scan the Java package in order to craft the beans. Add the following lines to the application context in order to instruct the container to scan the com.packt.controller package: <context:component-scan base-package= "com.packt.controller" /> <mvc:annotation-driven /> We need a REST controller class to handle the requests and generate a JSON output. Go to the com.packt.controller package and add a SpringService controller class. To configure the class as a REST controller, we need to annotate it with the @RestController annotation. The following code snippet represents the class: @RestController @RequestMapping("/hello") public class SpringService { private Set<String> names = new HashSet<String>(); @RequestMapping(value = "/{name}", method =          RequestMethod.GET) public String displayMsg(@PathVariable String name) {    String result = "Welcome " + name;    names.add(name);    return result; } @RequestMapping(value = "/all/", method =          RequestMethod.GET) public String anotherMsg() {    StringBuilder result = new StringBuilder("We          greeted so far ");    for(String name:names){      result.append(name).append(", ");    }    return result.toString();  } } We annotated the class with @RequestMapping("/hello"). This means that the SpringService class will cater for the requests with the http://{site}/{context}/hello URL pattern, or since we are running the app in localhost, the URL can be http://localhost:8080/RESTfulWeb/hello. The displayMsg method is annotated with @RequestMapping(value = "/{name}", method = RequestMethod.GET). So, the method will handle all HTTP GET requests with the URL pattern /hello/{name}. The name can be any String, such as /hello/xyz or /hello/john. In turn, the method stores the name to Set for later use and returns a greeting message, welcome {name}. The anotherMsg method is annotated with @RequestMapping(value = "/all/", method = RequestMethod.GET), which means that the method accepts all the requests with the http://{SITE}/{Context}/hello/all/ URL pattern. Moreover, this method builds a list of all users who visited the /hello/{names} URL. Remember, the displayMsg method stores the names in Set; this method iterates Set and builds a list of names who visited the /hello/{name} URL. There is some confusion though: what will happen if you enter the /hello/all URL in the browser? When we pass only a String literal after /hello/, the displayMsg method handles it, so you will be greeted with welcome all. 
However, if you type /hello/all/ instead—note that we added a slash after all—it means that the URL does not match the /hello/{name} pattern and the second method will handle the request and show you the list of users who visited the first URL. When we run the application and access the /hello/{name} URL, the following output is displayed: When we access http://localhost:8080/RESTfulWeb/hello/all/, the following output is displayed: Therefore, our RESTful application is ready for use, but just remember that in the real world, you need to secure the URLs against unauthorized access. In a web service, development security plays a key role. You can read the Spring security reference manual for additional information. Learning AsyncRestTemplate We live in a small, wonderful world where everybody is interconnected and impatient! We are interconnected through technology and applications, such as social networks, Internet banking, telephones, chats, and so on. Likewise, our applications are interconnected; often, an application housed in India may need to query an external service hosted in Philadelphia to get some significant information. We are impatient as we expect everything to be done in seconds; we get frustrated when we make an HTTP call to a remote service, and this blocks the processing unless the remote response is back. We cannot finish everything in milliseconds or nanoseconds, but we can process long-running tasks asynchronously or in a separate thread, allowing the user to work on something else. To handle RESTful web service calls asynchronously, Spring offers two useful classes: AsyncRestTemplate and ListenableFuture. We can make an async call using the template and get Future back and then continue with other processing, and finally we can ask Future to get the result. This section builds an asynchronous RESTful client to query the RESTful web service we developed in the preceding section. The AsyncRestTemplate class defines an array of overloaded methods to access RESTful web services asynchronously. We'll explore the exchange and execute methods. The following are the steps to explore the template: Create a package, com.packt.rest.template. Add a AsyncRestTemplateTest JUnit test. Create an exchange() test method and add the following lines: @Test public void exchange(){ AsyncRestTemplate asyncRestTemplate = new AsyncRestTemplate(); String url ="http://localhost:8080/RESTfulWeb/ hello/all/"; HttpMethod method = HttpMethod.GET; Class<String> responseType = String.class; HttpHeaders headers = new HttpHeaders(); headers.setContentType(MediaType.TEXT_PLAIN); HttpEntity<String> requestEntity = new HttpEntity<String>("params", headers); ListenableFuture<ResponseEntity<String>> future = asyncRestTemplate.exchange(url, method, requestEntity, responseType); try { //waits for the result ResponseEntity<String> entity = future.get(); //prints body of the given URL System.out.println(entity.getBody()); } catch (InterruptedException e) { e.printStackTrace(); } catch (ExecutionException e) { e.printStackTrace(); } } The exchange() method has six overloaded versions. We used the method that takes a URL, an HttpMethod method such as GET or POST, an HttpEntity method to set the header, and finally a response type class. We called the exchange method, which in turn called the execute method and returned ListenableFuture. The ListenableFuture is the handle to our output; we invoked the GET method on ListenableFuture to get the RESTful service call response. 
The ResponseEntity has the getBody, getClass, getHeaders, and getStatusCode methods for extracting the web service call response. We invoked the http://localhost:8080/RESTfulWeb/hello/all/ URL and got back the following response: Now, create an execute test method and add the following lines: @Test public void execute(){ AsyncRestTemplate asyncTemp = new AsyncRestTemplate(); String url ="http://localhost:8080/RESTfulWeb /hello/reader"; HttpMethod method = HttpMethod.GET; HttpHeaders headers = new HttpHeaders(); headers.setContentType(MediaType.TEXT_PLAIN); AsyncRequestCallback requestCallback = new AsyncRequestCallback (){ @Override public void doWithRequest(AsyncClientHttpRequest request) throws IOException { System.out.println(request.getURI()); } }; ResponseExtractor<String> responseExtractor = new ResponseExtractor<String>(){ @Override public String extractData(ClientHttpResponse response) throws IOException { return response.getStatusText(); } }; Map<String,String> urlVariable = new HashMap<String, String>(); ListenableFuture<String> future = asyncTemp.execute(url, method, requestCallback, responseExtractor, urlVariable); try { //wait for the result String result = future.get(); System.out.println("Status =" +result); } catch (InterruptedException e) { e.printStackTrace(); } catch (ExecutionException e) { e.printStackTrace(); } } The execute method has several variants. We invoke the one that takes a URL, HttpMethod such as GET or POST, an AsyncRequestCallback method which is invoked from the execute method just before executing the request asynchronously, a ResponseExtractor to extract the response, such as a response body, status code or headers, and a URL variable such as a URL that takes parameters. We invoked the execute method and received a future, as our ResponseExtractor extracts the status code. So, when we ask the future to get the result, it returns the response status which is OK or 200. In the AsyncRequestCallback method, we invoked the request URI; hence, the output first displays the request URI and then prints the response status. The following is the output: Caching objects Scalability is a major concern in web application development. Generally, most web traffic is focused on some special set of information. So, only those records are queried very often. If we can cache these records, then the performance and scalability of the system will increase immensely. The Spring Framework provides support for adding caching into an existing Spring application. In this section, we'll work with EhCache, the most widely used caching solution. Download the latest EhCache JAR from the Maven repository; the URL to download version 2.7.2 is http://mvnrepository.com/artifact/net.sf.ehcache/ehcache/2.7.2. Spring provides two annotations for caching: @Cacheable and @CacheEvict. These annotations allow methods to trigger cache population or cache eviction, respectively. The @Cacheable annotation is used to identify a cacheable method, which means that for an annotate method the result is stored into the cache. Therefore, on subsequent invocations (with the same arguments), the value in the cache is returned without actually executing the method. The cache abstraction allows the eviction of cache for removing stale or unused data from the cache. The @CacheEvict annotation demarcates the methods that perform cache eviction, that is, methods that act as triggers to remove data from the cache. 
The following are the steps to build a cacheable application with EhCache: Create a serializable Employee POJO class in the com.packt.cache package to store the employee ID and name. The following is the class definition: public class Employee implements Serializable { private static final long serialVersionUID = 1L; private final String firstName, lastName, empId;   public Employee(String empId, String fName, String lName) {    this.firstName = fName;    this.lastName = lName;    this.empId = empId; //Getter methods Spring caching supports two storages: the ConcurrentMap and ehcache libraries. To configure caching, we need to configure a manager in the application context. The org.springframework.cache.ehcache.EhCacheCacheManager class manages ehcache. Then, we need to define a cache with a configurationLocation attribute. The configurationLocation attribute defines the configuration resource. The ehcache-specific configuration is read from the resource ehcache.xml. <beans   xsi:schemaLocation=" http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans- 4.1.xsd http://www.springframework.org/schema/cache http://www. springframework.org/schema/cache/spring-cache- 4.1.xsd http://www.springframework.org/schema/context http://www. springframework.org/schema/context/springcontext- 4.1.xsd "> <context:component-scan base-package= "com.packt.cache" /> <cache:annotation-driven/> <bean id="cacheManager" class="org.springframework.cache. ehcache.EhCacheCacheManager" p:cacheManager-ref="ehcache"/> <bean id="ehcache" class="org.springframework.cache. ehcache.EhCacheManagerFactoryBean" p:configLocation="classpath:com/packt/cache/ehcache.xml"/> </beans> The <cache:annotation-driven/> tag informs the Spring container that the caching and eviction is performed in annotated methods. We defined a cacheManager bean and then defined an ehcache bean. The ehcache bean's configLocation points to an ehcache.xml file. We'll create the file next. Create an XML file, ehcache.xml, under the com.packt.cache package and add the following cache configuration data: <ehcache>    <diskStore path="java.io.tmpdir"/>    <cache name="employee"            maxElementsInMemory="100"            eternal="false"            timeToIdleSeconds="120"            timeToLiveSeconds="120"            overflowToDisk="true"            maxElementsOnDisk="10000000"            diskPersistent="false"            diskExpiryThreadIntervalSeconds="120"            memoryStoreEvictionPolicy="LRU"/>   </ehcache> The XML configures many things. Cache is stored in memory, but memory has a limit, so we need to define maxElementsInMemory. EhCache needs to store data to disk when max elements in memory reaches the threshold limit. We provide diskStore for this purpose. The eviction policy is set as an LRU, but the most important thing is the cache name. The name employee will be used to access the cache configuration. Now, create a service to store the Employee objects in a HashMap. 
The following is the service: @Service public class EmployeeService { private final Map<String, Employee> employees = new ConcurrentHashMap<String, Employee>(); @PostConstruct public void init() { saveEmployee (new Employee("101", "John", "Doe")); saveEmployee (new Employee("102", "Jack", "Russell")); } @Cacheable("employee") public Employee getEmployee(final String employeeId) { System.out.println(String.format("Loading a employee with id of : %s", employeeId)); return employees.get(employeeId); } @CacheEvict(value = "employee", key = "#emp.empId") public void saveEmployee(final Employee emp) { System.out.println(String.format("Saving a emp with id of : %s", emp.getEmpId())); employees.put(emp.getEmpId(), emp); } } The getEmployee method is a cacheable method; it uses the cache employee. When the getEmployee method is invoked more than once with the same employee ID, the object is returned from the cache instead of the original method being invoked. The saveEmployee method is a CacheEvict method. Now, we'll examine caching. We'll call the getEmployee method twice; the first call will populate the cache and the subsequent call will be responded toby the cache. Create a JUnit test, CacheConfiguration, and add the following lines: @RunWith(SpringJUnit4ClassRunner.class) @ContextConfiguration(locations="classpath:com/packt/cache/ applicationContext.xml") public class CacheConfiguration { @Autowired ApplicationContext context; @Test public void jobTest() throws Exception { EmployeeService employeeService = (EmployeeService)context.getBean(EmployeeService.class); long time = System.currentTimeMillis(); employeeService.getEmployee("101"); System.out.println("time taken ="+(System.currentTimeMillis() - time)); time = System.currentTimeMillis(); employeeService.getEmployee("101"); System.out.println("time taken to read from cache ="+(System.currentTimeMillis() - time)); time = System.currentTimeMillis(); employeeService.getEmployee("102"); System.out.println("time taken ="+(System.currentTimeMillis() - time)); time = System.currentTimeMillis(); employeeService.getEmployee("102"); System.out.println("time taken to read from cache ="+(System.currentTimeMillis() - time)); employeeService.saveEmployee(new Employee("1000", "Sujoy", "Acharya")); time = System.currentTimeMillis(); employeeService.getEmployee("1000"); System.out.println("time taken ="+(System.currentTimeMillis() - time)); time = System.currentTimeMillis(); employeeService.getEmployee("1000"); System.out.println("time taken to read from cache ="+(System.currentTimeMillis() - time)); } } Note that the getEmployee method is invoked twice for each employee, and we recorded the method execution time in milliseconds. You will find from the output that every second call is answered by the cache, as the first call prints Loading a employee with id of : 101 and then the next call doesn't print the message but prints the time taken to execute. You will also find that the time taken for the cached objects is zero or less than the method invocation time. The following screenshot shows the output: Summary This article started with discovering the features of the new major Spring release 4.0, such as Java 8 support and so on. Then, we picked four Spring 4 topics and explored them one by one. The @Async section showcased the execution of long-running methods asynchronously and provided an example of how to handle asynchronous processing. 
The @RestController section showed how the @RestController annotation simplifies RESTful web service development. The AsyncRestTemplate section walked through the client code needed to invoke a RESTful web service asynchronously. Caching is essential for a high-performance, scalable web application; the caching section explained how Spring integrates with EhCache to provide a highly available caching solution. Resources for Article: Further resources on this subject: Getting Started with Mockito [article] Progressive Mockito [article] Understanding outside-in [article]

Central Air and Heating Thermostat

Packt
03 Mar 2015
15 min read
In this article by Andrew K. Dennis, author of the book Raspberry Pi Home Automation with Arduino Second Edition, you will learn how to build a thermostat device using an Arduino. You will also learn how to use the temperature data to switch relays on and off. Relays are the main components that you can use for interaction between your Arduino and high-voltage electronic devices. The thermostat will also provide a web interface so that you can connect to it and check out the temperature. (For more resources related to this topic, see here.) Introducing the thermostat A thermostat is a control device that is used to manipulate other devices based on a temperature setting. This temperature setting is known as the setpoint. When the temperature changes in relation to the setpoint, a device can be switched on or off. For example, let's imagine a system where a simple thermostat is set to switch an electric heater on when the temperature drops below 25 degrees Celsius. Within our thermostat, we have a temperature-sensing device such as a thermistor that returns a temperature reading every few seconds. When the thermistor reads a temperature below the setpoint (25 degrees Celsius), the thermostat will switch a relay on, completing the circuit between the wall plug and our electric heater and providing it with power. Thus, we can see that a simple electronic thermostat can be used to switch on a variety of devices. Warren S. Johnson, a college professor in Wisconsin, is credited with inventing the electric room thermostat in the 1880s. Johnson was known throughout his lifetime as a prolific inventor who worked in a variety of fields, including electricity. These electric room thermostats became a common feature in homes across the course of the twentieth century as larger parts of the world were hooked up the electricity grid. Now, with open hardware electronic tools such as the Arduino available, we can build custom thermostats for a variety of home projects. They can be used to control baseboard heaters, heat lamps, and air conditioner units. They can also be used for the following: Fish tank heaters Indoor gardens Electric heaters Fans Now that we have explored the uses of thermostats, let's take a look at our project. Setting up our hardware In the following examples, we will list the pins to which you need to connect your hardware. However, we recommend that when you purchase any device such as the Ethernet shield, you check whether certain pins are available or not. Due to the sheer range of hardware available, it is not possible to list every potential hardware combination. Therefore, if the pin in the example is not free, you can update the circuit and source code to use a different pin. When building the example, we also recommend using a breadboard. This will allow you to experiment with building your circuit without having to solder any components. Our first task will be to set up our thermostat device so that it has Ethernet access. Adding the Ethernet shield The Arduino Uno does not contain an Ethernet port. Therefore, you will need a way for your thermostat to be accessible on your home network. One simple solution is to purchase an Ethernet shield and connect it to your microcontroller. There are several shields in the market, including the Arduino Ethernet shield (http://arduino.cc/en/Main/ArduinoEthernetShield) and Seeed Ethernet shield (http://www.seeedstudio.com/wiki/Ethernet_Shield_V1.0). These shields are plugged into the GPIO pins on the Arduino. 
If you purchase one of these shields, then we would also recommend buying some extra GPIO headers. These are plugged into the existing headers attached to the Ethernet shield. Their purpose is to provide some extra clearance above the Ethernet port on the board so that you can connect other shields in future if you decide to purchase them. Take a board of your choice and attach it to the Arduino Uno. When you plug the USB cable into your microcontroller and into your computer, the lights on both the Uno and Ethernet shield should light up. Now our device has a medium to send and receive data over a LAN. Let's take a look at setting up our thermostat relays. Relays A relay is a type of switch controlled by an electromagnet. It allows us to use a small amount of power to control a much larger amount, for example, using a 9V power supply to switch 220V wall power. Relays are rated to work with different voltages and currents. A relay has three contact points: Normally Open, Common Connection, and Normally Closed. Two of these points will be wired up to our fan. In the context of an Arduino project, the relay will also have a pin for ground, 5V power and a data pin that is used to switch the relay on and off. A popular choice for a relay is the Pololu Basic SPDT Relay Carrier. This can be purchased from http://www.pololu.com/category/135/relay-modules. This relay has featured in some other Packt Publishing books on the Arduino, so it is a good investment. Once you have the relay, you need to wire it up to the microcontroller. Connect a wire from the relay to digital pin 5 on the Arduino, another wire to the GRD pin, and the final wire to the 5V pin. This completes the relay setup. In order to control relays though, we need some data to trigger switching them between on and off. Our thermistor device handles the task of collecting this data. Connecting the thermistor A thermistor is an electronic component that, when included in a circuit, can be used to measure temperature. The device is a type of resistor that has the property whereby its resistance varies as the temperature changes. It can be found in a variety of devices, including thermostats and electronic thermometers. There are two categories of thermistors available: Negative Thermistor Coefficient (NTC) and Positive Thermistor Coefficient (PTC). The difference between them is that as the temperature increases, the resistance decreases in the case of an NTC, and on the other hand, it increases in the case of a PTC. We are going to use a prebuilt digital device with the model number AM2303. This can be purchased at https://www.adafruit.com/products/393. This device reads both temperature and humidity. It also comes with a software library that you can use in your Arduino sketches. One of the benefits of this library is that many functions that precompute values, such as temperature in Celsius, are available and thus don't require you to write a lot of code. Take your AM203 and connect it to the GRD pin, 5V pin and digital pin 4. The following diagram shows how it should be set up: You are now ready to move on to creating the software to test for temperature readings. Setting up our software We now need to write an application in the Arduino IDE to control our new thermostat device. 
Our software will contain the following: The code responsible for collecting the temperature data Methods to switch relays on and off based on this data Code to handle accepting incoming HTTP requests so that we can view our thermostat's current temperature reading and change the setpoint A method to send our temperature readings to the Raspberry Pi The next step is to hook up our Arduino thermostat with the USB port of the device we installed the IDE on. You may need to temporarily disconnect your relay from the Arduino. This will prevent your thermostat device from drawing too much power from your computer's USB port, which may result in the port being disabled. We now need to download the DHT library that interacts with our AM2303. This can be found on GitHub, at https://github.com/adafruit/DHT-sensor-library. Click on the Download ZIP link and unzip the file to a location on your hard drive. Next, we need to install the library to make it accessible from our sketch: Open the Arduino IDE. Navigate to Sketch | Import Library. Next, click on Add library. Choose the folder on your hard drive. You can now use the library. With the library installed, we can include it in our sketch and access a number of useful functions. Let's now start creating our software. Thermostat software We can start adding some code to the Arduino to control our thermostat. Open a new sketch in the Arduino IDE and perform the following steps: Inside the sketch, we are going to start by adding the code to include the libraries we need to use. At the top of the sketch, add the following code: #include "DHT.h" // Include this if using the AM2302 #include <SPI.h> #include <Ethernet.h> Next, we will declare some variables to be used by our application. These will be responsible for defining:     The pin the AM2303 thermistor is located on     The relay pin     The IP address we want our Arduino to use, which should be unique     The Mac address of the Arduino, which should also be unique     The name of the room the thermostat is located in     The variables responsible for Ethernet communication The IP address will depend on your own home network. Check out your wireless router to see what range of IP addresses is available. Select an address that isn't in use and update the IPAddress variable as follows: #define DHTPIN 4 // The digital pin to read from #define DHTTYPE DHT22 // DHT 22 (AM2302)   unsigned char relay = 5; //The relay pins String room = "library"; byte mac[] = { 0xDE, 0xAD, 0xBE, 0xEF, 0xFE, 0xED }; IPAddress ip(192,168,3,5); DHT dht(DHTPIN, DHTTYPE); EthernetServer server(80); EthernetClient client; We can now include the setup() function. This is responsible for initializing some variables with their default values, and setting the pin to which our relay is connected to output mode: void setup() {   Serial.begin(9600);   Ethernet.begin(mac, ip);   server.begin();   dht.begin();   pinMode(relay, OUTPUT); } The next block of code we will add is the loop() function. This contains the main body of our program to be executed. Here, we will assign a value to the setpoint and grab our temperature readings: void loop() {   int setpoint = 25;   float h = dht.readHumidity();   float t = dht.readTemperature(); Following this, we check whether the temperature is above or below the setpoint and switch the relay on or off as needed. 
Paste this code below the variables you just added: if(t <setpoint) {   digitalWrite(relay,HIGH); } else {   digitalWrite(relay,LOW); } Next, we need to handle the HTTP requests to the thermostat. We start by collecting all of the incoming data. The following code also goes inside the loop() function: client = server.available(); if (client) {   // an http request ends with a blank line   booleancurrentLineIsBlank = true;   String result;   while (client.connected()) {     if (client.available()) {       char c = client.read();       result= result + c;     } With the incoming request stored in the result variable, we can examine the HTTP header to know whether we are requesting an HTML page or a JSON object. You'll learn more about JavaScript Object Notation (JSON) shortly. If we request an HTML page, this is displayed in the browser. Next, add the following code to your sketch: if(result.indexOf("text/html") > -1) {   client.println("HTTP/1.1 200 OK");   client.println("Content-Type: text/html");   client.println();   if (isnan(h) || isnan(t)) {     client.println("Failed to read from DHT sensor!");     return;   }   client.print("<b>Thermostat</b> set to: ");   client.print(setpoint);    client.print("degrees C <br />Humidity: ");   client.print(h);   client.print(" %t");   client.print("<br />Temperature: ");   client.print(t);   client.println(" degrees C ");   break; } The following code handles a request for the data to be returned in JSON format. Our Raspberry Pi will make HTTP requests to the Arduino, and then process the data returned to it. At the bottom of this last block of code is a statement adding a short delay to allow the Arduino to process the request and close the client connection. Paste this final section of code in your sketch: if( result.indexOf("application/json") > -1 ) { client.println("HTTP/1.1 200 OK"); client.println("Content-Type: application/json;charset=utf-8"); client.println("Server: Arduino"); client.println("Connnection: close"); client.println(); client.print("{"thermostat":[{"location":""); client.print(room); client.print(""},"); client.print("{"temperature":""); client.print(t); client.print(""},"); client.print("{"humidity":""); client.print(h); client.print(""},"); client.print("{"setpoint":""); client.print(setpoint); client.print(""}"); client.print("]}"); client.println(); break;           }     } delay(1); client.stop();   }  } This completes our program. We can now save it and run the Verify process. Click on the small check mark in a circle located in the top-left corner of the sketch. If you have added all of the code correctly, you should see Binary sketch size: 16,962 bytes (of a 32,256 byte maximum). Now that our code is verified and saved, we can look at uploading it to the Arduino, attaching the fan, and testing our thermostat. Testing our thermostat and fan We have our hardware set up and the code ready. Now we can test the thermostat and see it in action with a device connected to the mains electricity. We will first attach a fan and then run the sketch to switch it on and off. Attaching the fan Ensure that your Arduino is powered down and that the fan is not plugged into the wall. Using a wire stripper and cutters, cut one side of the cable that connects the plug to the fan body. Take the end of the cable attached to the plug, and attach it to the NO point on the relay. Use a screwdriver to ensure that it is fastened correctly. 
Now, take the other portion of the cut cable that is attached to the fan body, and attach this to the COM point. Once again, use a screwdriver to ensure that it is fastened securely to the relay. Your connection should look as follows: You can now reattach your Arduino to the computer via its USB cable. However, do not plug the fan into the wall yet. Starting your thermostat application With the fan connected to our relay, we can upload our sketch and test it: From the Arudino IDE, select the upload icon. Once the code has been uploaded, disconnect your Arduino board. Next, connect an Ethernet cable to your Arduino. Following this, plug the Arduino into the wall to get mains power. Finally, connect the fan to the wall outlet. You should hear the clicking sound of the relay as it switches on or off depending on the room temperature. When the relay switch is on (or off), the fan will follow suit. Using a separate laptop if you have it, or from your Raspberry Pi, access the IP address you specified in the application via a web browser, for example, http://192.168.3.5/. You should see something similar to this: Thermostat set to: 25degrees C  Humidity: 35.70 % Temperature: 14.90 degrees C You can now stimulate the thermistor using an ice cube and hair dryer, to switch the relay on and off, and the fan will follow suit. If you refresh your connection to the IP address, you should see the change in the temperature output on the screen. You can use the F5 key to do this. Let's now test the JSON response. Testing the JSON response A format useful in transferring data between applications is JavaScript Object Notation (JSON). You can read more about this on the official JSON website, at http://www.json.org/. The purpose of us generating data in JSON format is to allow the Raspberry Pi control device we are building to query the thermostat periodically and collect the data being generated. We can verify that we are getting JSON data back from the sketch by making an HTTP request using the application/json header. Load a web browser such as Google Chrome or FireFox. We are going to make an XML HTTP request directly from the browser to our thermostat. This type of request is commonly known as an Asynchronous JavaScript and XML (AJAX) request. It can be used to refresh data on a page without having to actually reload it. In your web browser, locate and open the developer tools. The following link lists the location and shortcut keys in major browsers: http://webmasters.stackexchange.com/questions/8525/how-to-open-the-javascript-console-in-different-browsers In the JavaScript console portion of the developer tools, type the following JavaScript code: var xmlhttp; xmlhttp=new XMLHttpRequest(); xmlhttp.open("POST","192.168.3.5",true); xmlhttp.setRequestHeader("Content-type","application/json"); xmlhttp.onreadystatechange = function() {//Call a function when the state changes.    if(xmlhttp.readyState == 4 &&xmlhttp.status == 200) {          console.log(xmlhttp);    } }; xmlhttp.send() Press the return key or run option to execute the code. This will fire an HTTP request, and you should see a JSON object return: "{"thermostat":     [      {"location":"library"},      {"temperature":"14.90"},      {"humidity":"29.90"},      {"setpoint":"25"}   ] }" This confirms that our application can return data to the Raspberry Pi. We have tested our software and hardware and seen that they are working. Summary In this article, we built a thermostat device. 
We looked at thermistors, and we learned how to set up an Ethernet connection. To control our thermostat, we wrote an Arduino sketch, uploaded it to the microcontroller, and then tested it with a fan plugged into the mains electricity. Resources for Article: Further resources on this subject: The Raspberry Pi and Raspbian? [article] Clusters Parallel Computing and Raspberry Pi Brief Background [article] The Arduino Mobile Robot [article]

Getting Started with PostgreSQL

Packt
03 Mar 2015
11 min read
In this article by Ibrar Ahmed, Asif Fayyaz, and Amjad Shahzad, authors of the book PostgreSQL Developer's Guide, we will come across the basic features and functions of PostgreSQL, such as writing queries using psql, data definition in tables, and data manipulation from tables. (For more resources related to this topic, see here.) PostgreSQL is widely considered to be one of the most stable database servers available today, with multiple features that include: A wide range of built-in types MVCC New SQL enhancements, including foreign keys, primary keys, and constraints Open source code, maintained by a team of developers Trigger and procedure support with multiple procedural languages Extensibility in the sense of adding new data types and the client language From the early releases of PostgreSQL (from version 6.0 that is), many changes have been made, with each new major version adding new and more advanced features. The current version is PostgreSQL 9.4 and is available from several sources and in various binary formats. Writing queries using psql Before proceeding, allow me to explain to you that throughout this article, we will use a warehouse database called warehouse_db. In this section, I will show you how you can create such a database, providing you with sample code for assistance. You will need to do the following: We are assuming here that you have successfully installed PostgreSQL and faced no issues. Now, you will need to connect with the default database that is created by the PostgreSQL installer. To do this, navigate to the default path of installation, which is /opt/PostgreSQL/9.4/bin from your command line, and execute the following command that will prompt for a postgres user password that you provided during the installation: /opt/PostgreSQL/9.4/bin$./psql -U postgres Password for user postgres: Using the following command, you can log in to the default database with the user postgres and you will be able to see the following on your command line: psql (9.4beta1) Type "help" for help postgres=# You can then create a new database called warehouse_db using the following statement in the terminal: postgres=# CREATE DATABASE warehouse_db; You can then connect with the warehouse_db database using the following command: postgres=# c warehouse_db You are now connected to the warehouse_db database as the user postgres, and you will have the following warehouse_db shell: warehouse_db=# Let's summarize what we have achieved so far. We are now able to connect with the default database postgres and created a warehouse_db database successfully. It's now time to actually write queries using psql and perform some Data Definition Language (DDL) and Data Manipulation Language (DML) operations, which we will cover in the following sections. In PostgreSQL, we can have multiple databases. Inside the databases, we can have multiple extensions and schemas. Inside each schema, we can have database objects such as tables, views, sequences, procedures, and functions. We are first going to create a schema named record and then we will create some tables in this schema. To create a schema named record in the warehouse_db database, use the following statement: warehouse_db=# CREATE SCHEMA record; Creating, altering, and truncating a table In this section, we will learn about creating a table, altering the table definition, and truncating the table. Creating tables Now, let's perform some DDL operations starting with creating tables. 
To create a table named warehouse_tbl, execute the following statements: warehouse_db=# CREATE TABLE warehouse_tbl ( warehouse_id INTEGER NOT NULL, warehouse_name TEXT NOT NULL, year_created INTEGER, street_address TEXT, city CHARACTER VARYING(100), state CHARACTER VARYING(2), zip CHARACTER VARYING(10), CONSTRAINT "PRIM_KEY" PRIMARY KEY (warehouse_id) ); The preceding statements created the table warehouse_tbl that has the primary key warehouse_id. Now, as you are familiar with the table creation syntax, let's create a sequence and use that in a table. You can create the hist_id_seq sequence using the following statement: warehouse_db=# CREATE SEQUENCE hist_id_seq; The preceding CREATE SEQUENCE command creates a new sequence number generator. This involves creating and initializing a new special single-row table with the name hist_id_seq. The user issuing the command will own the generator. You can now create the table that implements the hist_id_seq sequence using the following statement: warehouse_db=# CREATE TABLE history ( history_id INTEGER NOT NULL DEFAULT nextval('hist_id_seq'), date TIMESTAMP WITHOUT TIME ZONE, amount INTEGER, data TEXT, customer_id INTEGER, warehouse_id INTEGER, CONSTRAINT "PRM_KEY" PRIMARY KEY (history_id), CONSTRAINT "FORN_KEY" FOREIGN KEY (warehouse_id) REFERENCES warehouse_tbl(warehouse_id) ); The preceding query will create a history table in the warehouse_db database, and the history_id column uses the sequence as the default input value. In this section, we successfully learned how to create a table and also learned how to use a sequence inside the table creation syntax. Altering tables Now that we have learned how to create multiple tables, we can practice some ALTER TABLE commands by following this section. With the ALTER TABLE command, we can add, remove, or rename table columns. 
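Only the add and remove forms are demonstrated in the examples that follow; for completeness, renaming a column is a one-line statement. The sketch below renames city and then immediately renames it back so that the outputs shown later in this section still match:

warehouse_db=# ALTER TABLE warehouse_tbl RENAME COLUMN city TO city_name;
warehouse_db=# ALTER TABLE warehouse_tbl RENAME COLUMN city_name TO city;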
Firstly, with the help of the following example, we will be able to add the phone_no column in the previously created table warehouse_tbl: warehouse_db=# ALTER TABLE warehouse_tbl ADD COLUMN phone_no INTEGER; We can then verify that a column is added in the table by describing the table as follows: warehouse_db=# d warehouse_tbl            Table "public.warehouse_tbl"                  Column     |         Type         | Modifiers ----------------+------------------------+----------- warehouse_id  | integer               | not null warehouse_name | text                   | not null year_created   | integer               | street_address | text                   | city           | character varying(100) | state           | character varying(2)   | zip             | character varying(10) | phone_no       | integer               | Indexes: "PRIM_KEY" PRIMARY KEY, btree (warehouse_id) Referenced by: TABLE "history" CONSTRAINT "FORN_KEY"FOREIGN KEY  (warehouse_id) REFERENCES warehouse_tbl(warehouse_id) TABLE  "history" CONSTRAINT "FORN_KEY" FOREIGN KEY (warehouse_id)  REFERENCES warehouse_tbl(warehouse_id) To drop a column from a table, we can use the following statement: warehouse_db=# ALTER TABLE warehouse_tbl DROP COLUMN phone_no; We can then finally verify that the column has been removed from the table by describing the table again as follows: warehouse_db=# d warehouse_tbl            Table "public.warehouse_tbl"                  Column     |         Type         | Modifiers ----------------+------------------------+----------- warehouse_id   | integer               | not null warehouse_name | text                   | not null year_created   | integer               | street_address | text                   | city           | character varying(100) | state           | character varying(2)   | zip             | character varying(10) | Indexes: "PRIM_KEY" PRIMARY KEY, btree (warehouse_id) Referenced by: TABLE "history" CONSTRAINT "FORN_KEY" FOREIGN KEY  (warehouse_id) REFERENCES warehouse_tbl(warehouse_id) TABLE  "history" CONSTRAINT "FORN_KEY" FOREIGN KEY (warehouse_id)  REFERENCES warehouse_tbl(warehouse_id) Truncating tables The TRUNCATE command is used to remove all rows from a table without providing any criteria. In the case of the DELETE command, the user has to provide the delete criteria using the WHERE clause. To truncate data from the table, we can use the following statement: warehouse_db=# TRUNCATE TABLE warehouse_tbl; We can then verify that the warehouse_tbl table has been truncated by performing a SELECT COUNT(*) query on it using the following statement: warehouse_db=# SELECT COUNT(*) FROM warehouse_tbl; count -------      0 (1 row) Inserting, updating, and deleting data from tables In this section, we will play around with data and learn how to insert, update, and delete data from a table. Inserting data So far, we have learned how to create and alter a table. Now it's time to play around with some data. 
Let's start by inserting records in the warehouse_tbl table using the following command snippet: warehouse_db=# INSERT INTO warehouse_tbl ( warehouse_id, warehouse_name, year_created, street_address, city, state, zip ) VALUES ( 1, 'Mark Corp', 2009, '207-F Main Service Road East', 'New London', 'CT', 4321 ); We can then verify that the record has been inserted by performing a SELECT query on the warehouse_tbl table as follows: warehouse_db=# SELECT warehouse_id, warehouse_name, street_address               FROM warehouse_tbl; warehouse_id | warehouse_name |       street_address         ---------------+----------------+------------------------------- >             1 | Mark Corp     | 207-F Main Service Road East (1 row) Updating data Once we have inserted data in our table, we should know how to update it. This can be done using the following statement: warehouse_db=# UPDATE warehouse_tbl SET year_created=2010 WHERE year_created=2009; To verify that a record is updated, let's perform a SELECT query on the warehouse_tbl table as follows: warehouse_db=# SELECT warehouse_id, year_created FROM               warehouse_tbl; warehouse_id | year_created --------------+--------------            1 |         2010 (1 row) Deleting data To delete data from a table, we can use the DELETE command. Let's add a few records to the table and then later on delete data on the basis of certain conditions: warehouse_db=# INSERT INTO warehouse_tbl ( warehouse_id, warehouse_name, year_created, street_address, city, state, zip ) VALUES ( 2, 'Bill & Co', 2014, 'Lilly Road', 'New London', 'CT', 4321 ); warehouse_db=# INSERT INTO warehouse_tbl ( warehouse_id, warehouse_name, year_created, street_address, city, state, zip ) VALUES ( 3, 'West point', 2013, 'Down Town', 'New London', 'CT', 4321 ); We can then delete data from the warehouse.tbl table, where warehouse_name is Bill & Co, by executing the following statement: warehouse_db=# DELETE FROM warehouse_tbl WHERE warehouse_name='Bill & Co'; To verify that a record has been deleted, we will execute the following SELECT query: warehouse_db=# SELECT warehouse_id, warehouse_name FROM warehouse_tbl WHERE warehouse_name='Bill & Co'; warehouse_id | warehouse_name --------------+---------------- (0 rows) The DELETE command is used to drop a row from a table, whereas the DROP command is used to drop a complete table. The TRUNCATE command is used to empty the whole table. Summary In this article, we learned how to utilize the SQL language for a collection of everyday DBMS exercises in an easy-to-use practical way. We also figured out how to make a complete database that incorporates DDL (create, alter, and truncate) and DML (insert, update, and delete) operators. Resources for Article: Further resources on this subject: Indexes [Article] Improving proximity filtering with KNN [Article] Using Unrestricted Languages [Article]

Packaged Elegance

Packt
03 Mar 2015
24 min read
In this article by John Farrar, author of the book KnockoutJS Web development, we will see how templates drove us to a more dynamic, creative platform. The next advancement in web development was custom HTML components. KnockoutJS allows us to jump right in with some game-changing elegance for designers and developers. In this article, we will focus on: An introduction to components Bring Your Own Tags (BYOT) Enhancing attribute handling Making your own libraries Asynchronous module definition (AMD)—on demand resource loading This entire article is about packaging your code for reuse. Using these techniques, you can make your code more approachable and elegant. (For more resources related to this topic, see here.) Introduction to components The best explanation of a component is a packaged template with an isolated ViewModel. Here is the syntax we would use to declare a like component on the page: <div data-bind="component: "like"''"></div> If you are passing no parameters through to the component, this is the correct syntax. If you wish to pass parameters through, you would use a JSON style structure as follows: <div data-bind="component:{name: 'like-widget',params:{ approve: like} }"></div> This would allow us to pass named parameters through to our custom component. In this case, we are passing a parameter named approve. This would mean we had a bound viewModel variable by the name of like. Look at how this would be coded. Create a page called components.html using the _base.html file to speed things up as we have done in all our other articles. In your script section, create the following ViewModel: <script>ViewModel = function(){self = this;self.like = ko.observable(true);};// insert custom component herevm = new ViewModel();ko.applyBindings(vm);</script> Now, we will create our custom component. Here is the basic component we will use for this first component. Place the code where the comment is, as we want to make sure it is added before our applyBindings method is executed: ko.components.register('like-widget', { viewModel: function(params) {    this.approve = params.approve;    // Behaviors:    this.toggle = function(){      this.approve(!this.approve());    }.bind(this); }, template:    '<div class="approve">      <button data-bind="click: toggle">        <span data-bind="visible: approve" class="glyphicon   glyphicon-thumbs-up"></span>        <span data-bind="visible:! approve()" class="glyphicon   glyphicon-thumbs-down"></span>      </button>    </div>' }); There are two sections to our components: the viewModel and template sections. In this article, we will be using Knockout template details inside the component. The standard Knockout component passes variables to the component using the params structure. We can either use this structure or you could optionally use the self = this approach if desired. In addition to setting the variable structure, it is also possible to create behaviors for our components. If we look in the template code, we can see we have data-bound the click event to toggle the approve setting in our component. Then, inside the button, by binding to the visible trait of the span element, either the thumbs up or thumbs down image will be shown to the user. Yes, we are using a Bootstrap icon element rather than a graphic here. Here is a screenshot of the initial state: When we click on the thumb image, it will toggle between the thumbs up and the thumbs down version. 
Since we also passed in the external variable that is bound to the page ViewModel, we see that the value in the matched span text will also toggle. Here is the markup we would add to the page to produce these results in the View section of our code: <div data-bind="component:   {name: 'like-widget', params:{ approve: like} }"></div> <span data-bind="text: like"></span> You could build this type of functionality with a jQuery plugin as well, but it is likely to take a bit more code to do two-way binding and match the tight functionality we have achieved here. This doesn't mean jQuery plugins are bad, as this is also a jQuery-related technology. What it does mean is we have ways to do things even better. It is this author's opinion that features like this would still make great additions to the core jQuery library. Yet, I am not holding my breath waiting for them to adopt a Knockout-type project to the wonderful collection of projects they have at this point, and do not feel we should hold that against them. Keeping focused on what they do best is one of the reasons libraries like Knockout can provide a wider array of options. It seems the decisions are working on our behalf even if they are taking a different approach than I expected. Dynamic component selection You should have noticed when we selected the component that we did so using a quoted declaration. While at first it may seem to be more constricting, remember that it is actually a power feature. By using a variable instead of a hardcoded value, you can dynamically select the component you would like to be inserted. Here is the markup code: <div data-bind="component:  { name: widgetName, params: widgetParams }"></div> <span data-bind="text:widgetParams.approve"></span> Notice that we are passing in both widgetName as well as widgetParams. Because we are binding the structure differently, we also need to show the bound value differently in our span. Here is the script part of our code that needs to be added to our viewModel code: self.widgetName = ko.observable("like-widget"); self.widgetParams = {    approve: ko.observable(true) }; We will get the same visible results but notice that each of the like buttons is acting independent of the other. What would happen if we put more than one of the same elements on the page? If we do that, Knockout components will act independent of other components. Well, most of the time they act independent. If we bound them to the same variable they would not be independent. In your viewModel declaration code, add another variable called like2 as follows: self.like2 = ko.observable(false); Now, we will add another like button to the page by copying our first like View code. This time, change the value from like to like2 as follows: <like-widget params="approve: like2"></like-widget> <span data-bind="text: like2"></span> This time when the page loads, the other likes display with a thumbs up, but this like will display with a thumbs down. The text will also show false stored in the bound value. Any of the like buttons will act independently because each of them is bound to unique values. Here is a screenshot of the third button: Bring Your Own Tags (BYOT) What is an element? Basically, an element is a component that you reach using the tag syntax. This is the way it is expressed in the official documentation at this point and it is likely to stay that way. It is still a component under the hood. Depending on the crowd you are in, this distinction will be more or less important. 
Mostly, just be aware of the distinction in case someone feels it is important, as that will let you be on the same page in discussions. Custom tags are a part of the forthcoming HTML feature called Web Components. Knockout allows you to start using them today. Here is the View code: <like-widget params="approve: like3"></like-widget> <span data-bind="text: like3"></span> You may want to code some tags with a single tag rather than a double tag, as in an opening and closing tag syntax. Well, at this time, there are challenges getting each browser to see the custom element tags when declared as a single tag. This means custom tags, or elements, will need to be declared as opening and closing tags for now. We will also need to create our like3 bound variable for viewModel with the following code: self.like3 = ko.observable(true); Running the code gives us the same wonderful functionality as our data-bind approach, but now we are creating our own HTML tags. Has there ever been a time you wanted a special HTML tag that just didn't exist? There is a chance you could create that now using Knockout component element-style coding. Enhancing attribute handling Now, while custom tags are awesome, there is just something different about passing everything in with a single param attribute. The reason for this is that this process matches how our tags work when we are using the data-bind approach to coding. In the following example, we will look at passing things in via individual attributes. This is not meant to work as a data-bind approach, but it is focused completely on the custom tag element component. The first thing you want to do is make sure this enhancement doesn't cause any issues with the normal elements. We did this by checking the custom elements for a standard prefix. You do not need to work through this code as it is a bit more advanced. The easiest thing to do is to include our Knockout components tag with the following script tag: <script src="/share/js/knockout.komponents.js"></script> In this tag, we have this code segment to convert the tags that start with kom- to tags that use individual attributes rather than a JSON translation of the attributes. Feel free to borrow the code to create libraries of your own. We are going to be creating a standard set of libraries on GitHub for these component tags. Since the HTML tags are Knockout components, we are calling these libraries "KOmponents". The" resource can be found at https://github.com/sosensible/komponents. Now, with that library included, we will use our View code to connect to the new tag. Here is the code to use in the View: <kom-like approve="tagLike"></kom-like> <span data-bind="text: tagLike"></span> Notice that in our HTML markup, the tag starts with the library prefix. This will also require viewModel to have a binding to pass into this tag as follows: self.tagLike = ko.observable(true); The following is the code for the actual "attribute-aware version" of Knockout components. 
Do not place this in the code as it is already included in the library in the shared directory: // <kom-like /> tag ko.components.register('kom-like', { viewModel: function(params) {    // Data: value must but true to approve    this.approve = params.approve;    // Behaviors:    this.toggle = function(){      this.approve(!this.approve());    }.bind(this); }, template:    '<div class="approve">      <button data-bind="click: toggle">        <span data-bind="visible: approve" class="glyphicon   glyphicon-thumbs-up"></span>        <span data-bind="visible:! approve()" class="glyphicon   glyphicon-thumbs-down"></span>      </button>    </div>' }); The tag in the View changed as we passed the information in via named attributes and not as a JSON structure inside a param attribute. We also made sure to manage these tags by using a prefix. The reason for this is that we did not want our fancy tags to break the standard method of passing params commonly practiced with regular Knockout components. As we see, again we have another functional component with the added advantage of being able to pass the values in a style more familiar to those used to coding with HTML tags. Building your own libraries Again, we are calling our custom components KOmponents. We will be creating a number of library solutions over time and welcome others to join in. Tags will not do everything for us, as there are some limitations yet to be conquered. That doesn't mean we wait for all the features before doing the ones we can for now. In this article, we will also be showing some tags from our Bootstrap KOmponents library. First we will need to include the Bootstrap KOmponents library: <script src="/share/js/knockout.komponents.bs.js"></script> Above viewModel in our script, we need to add a function to make this section of code simpler. At times, when passing items into observables, we can pass in richer bound data using a function like this. Again, create this function above the viewModel declaration of the script, shown as follows: var listItem = function(display, students){ this.display = ko.observable(display); this.students = ko.observable(students); this.type = ko.computed(function(){    switch(Math.ceil(this.students()/5)){      case 1:      case 2:        return 'danger';        break;      case 3:        return 'warning';        break;      case 4:        return 'info';        break;      default:        return 'success';    } },this); }; Now, inside viewModel, we will declare a set of data to pass to a Bootstrap style listGroup as follows: self.listData = ko.observableArray([ new listItem("HTML5",12), new listItem("CSS",8), new listItem("JavaScript",19), new listItem("jQuery",48), new listItem("Knockout",33) ]); Each item in our array will have display, students, and type variables. We are using a number of features in Bootstrap here but packaging them all up inside our Bootstrap smart tag. This tag starts to go beyond the bare basics. It is still very implementable, but we don't want to throw too much at you to absorb at one time, so we will not go into the detailed code for this tag. What we do want to show is how much power can be wrapped into custom Knockout tags. Here is the markup we will use to call this tag and bind the correct part of viewModel for display: <kom-listgroup data="listData" badgeField="'students'"   typeField="'type'"></kom-listgroup> That is it. You should take note of a couple of special details. The data is passed in as a bound Knockout ViewModel. 
The badge field is passed in as a string name to declare the field on the data collection where the badge count will be pulled. The same string approach has been used for the type field. The type will set the colors as per standard Bootstrap types. The theme here is that if there are not enough students to hold a class, then it shows the danger color in the list group custom tag. Here is what it looks like in the browser when we run the code: While this is neat, let's jump into our browser tools console and change the value of one of the items. Let's say there was a class on some cool web technology called jQuery. What if people had not heard of it and didn't know what it was and you really wanted to take the class? Well, it would be nice to encourage a few others to check it out. How would you know whether the class was at a danger level or not? Well, we could simply use the badge and the numbers, but how awesome is it to also use the color coding hint? Type the following code into the console and see what changes: vm.listData()[3].display() Because JavaScript starts counting with zero for the first item, we will get the following result: Now we know we have the right item, so let's set the student count to nine using the following code in the browser console: vm.listData()[3].students(9) Notice the change in the jQuery class. Both the badge and the type value have updated. This screenshot of the update shows how much power we can wield with very little manual coding: We should also take a moment to see how the type was managed. Using the functional assignment, we were able to use the Knockout computed binding for that value. Here is the code for that part again: this.type = ko.computed(function(){ switch(Math.ceil(this.students()/5)){    case 1:    case 2:      return 'danger';      break;    case 3:      return 'warning';      break;    case 4:      return 'info';      break;    default:      return 'success'; } },this); While the code is outside the viewModel declaration, it is still able to bind properly to make our code run even inside a custom tag created with Knockout's component binding. Bootstrap component example Here is another example of binding with Bootstrap. The general best practice for using modal display boxes is to place them higher in the code, perhaps under the body tag, to make sure there are no conflicts with the rest of the code. Place this tag right below the body tag as shown in the following code: <kom-modal id="'komModal'" title="komModal.title()"   body="komModal.body()"></kom-modal> Again, we will need to make some declarations inside viewModel for this to work right. Enter this code into the declarations of viewModel: self.komModal = { title: ko.observable('Modal KOMponent'), body: ko.observable('This is the body of the <strong>modal   KOMponent</strong>.') }; We will also create a button on the page to call our viewModel. The button will use the binding that is part of Bootstrap. The data-toggle and data-target attributes are not Knockout binding features. Knockout works side-by-side wonderfully though. Another point of interest is the standard ID attribute, which tells how Bootstrap items, like this button, interact with the modal box. This is another reason it may be beneficial to use KOmponents or a library like it. 
Here is the markup code: <button type="button" data-toggle="modal" data-   target="#komModal">Open Modal KOmponent</button> When we click on the button, this is the requestor we see: Now, to understand the power of Knockout working with our requestor, head back over to your browser tools console. Enter the following command into the prompt: vm.komModal.body("Wow, live data binding!") The following screenshot shows the change: Who knows what type of creative modular boxes we can build using this type of technology. This brings us closer towards creating what we can imagine. Perhaps it may bring us closer to building some of the wild things our customers imagine. While that may not be your main motivation for using Knockout, it would be nice to have a few less roadblocks when we want to be creative. It would also be nice to have this wonderful ability to package and reuse these solutions across a site without using copy and paste and searching back through the code when the client makes a change to make updates. Again, feel free to look at the file to see how we made these components work. They are not extremely complicated once you get the basics of using Knockout and its components. If you are looking to build components of your own, they will help you get some insight on how to do things inside as you move your skills to the next level. Understanding the AMD approach We are going to look into the concept of what makes an AMD-style website. The point of this approach to sites is to pull content on demand. The content, or modules as they are defined here, does not need to be loaded in a particular order. If there are pieces that depend on other pieces, that is, of course, managed. We will be using the RequireJS library to manage this part of our code. We will create four files in this example, as follows: amd.html amd.config.js pick.js pick.html In our AMD page, we are going to create a configuration file for our RequireJS functionality. That will be the amd.config.js file mentioned in the aforementioned list. We will start by creating this file with the following code: // require.js settings var require = {    baseUrl: ".",    paths: {        "bootstrap":       "/share/js/bootstrap.min",        "jquery":           "/share/js/jquery.min",        "knockout":         "/share/js/knockout",        "text":             "/share/js/text"    },    shim: {        "bootstrap": { deps: ["jquery"] },        "knockout": { deps: ["jquery"] },    } }; We see here that we are creating some alias names and setting the paths these names point to for this page. The file could, of course, be working for more than one page, but in this case, it has specifically been created for a single page. The configuration in RequireJS does not need the .js extension on the file names, as you would have noted. Now, we will look at our amd.html page where we pull things together. We are again using the standard page we have used for this article, which you will notice if you preview the done file example of the code. There are a couple of differences though, because the JavaScript files do not all need to be called at the start. RequireJS handles this well for us. We are not saying this is a standard practice of AMD, but it is an introduction of the concepts. 
We will need to include the following three script files in this example: <script src="/share/js/knockout.js"></script> <script src="amd.config.js"></script> <script src="/share/js/require.js"></script> Notice that the configuration settings need to be set before calling the require.js library. With that set, we can create the code to wire Knockout binding on the page. This goes in our amd.html script at the bottom of the page: <script> ko.components.register('pick', { viewModel: { require: 'pick' }, template: { require: 'text!pick.html' } }); viewModel = function(){ this.choice = ko.observable(); } vm = new viewModel(); ko.applyBindings(vm); </script> Most of this code should look very familiar. The difference is that the external files are being used to set the content for viewModel and template in the pick component. The require setting smartly knows to include the pick.js file for the pick setting. It does need to be passed as a string, of course. When we include the template, you will see that we use text! in front of the file we are including. We also declare the extension on the file name in this case. The text method actually needs to know where the text is coming from, and you will see in our amd.config.js file that we created an alias for the inclusion of the text function. Now, we will create the pick.js file and place it in the same directory as the amd.html file. It could have been in another directory, and you would have to just set that in the component declaration along with the filename. Here is the code for this part of our AMD component: define(['knockout'], function(ko) {    function LikeWidgetViewModel(params) {        this.chosenValue = params.value;        this.land = Math.round(Math.random()) ? 'heads' : 'tails';    }    LikeWidgetViewModel.prototype.heads = function() {        this.chosenValue('heads');    };    LikeWidgetViewModel.prototype.tails = function() {        this.chosenValue('tails');    };    return LikeWidgetViewModel; }); Notice that our code starts with the define method. This is our AMD functionality in place. It is saying that before we try to execute this section of code we need to make sure the Knockout library is loaded. This allows us to do on-demand loading of code as needed. The code inside the viewModel section is the same as the other examples we have looked at with one exception. We return viewModel as you see at the end of the preceding code. We used the shorthand code to set the value for heads and tails in this example. Now, we will look at our template file, pick.html. This is the code we will have in this file: <div class="like-or-dislike" data-bind="visible: !chosenValue()"> <button data-bind="click: heads">Heads</button> <button data-bind="click: tails">Tails</button> </div> <div class="result" data-bind="visible: chosenValue">    You picked <strong data-bind="text: chosenValue"></strong>    The correct value was <strong data-bind="text:   land"></strong> </div> There is nothing special other than the code needed to make this example work. The goal is to allow a custom tag to offer up heads or tails options on the page. We also pass in a bound variable from viewModel. We will be passing it into three identical tags. The tags are actually going to load the content instantly in this example. The goal is to get familiar with how the code works. We will take it to full practice at the end of the article. 
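As an aside, the define wrapper is not limited to a single dependency. The following module is hypothetical—it is not part of the pick component—but it shows how code that needs both Knockout and jQuery would declare them using the aliases set up in amd.config.js; RequireJS loads both libraries before running the factory function:

define(['knockout', 'jquery'], function(ko, $) {
    // Both libraries are resolved before this factory runs
    function FlashWidgetViewModel(params) {
        this.label = ko.observable(params.label);
    }
    FlashWidgetViewModel.prototype.flash = function() {
        // jQuery is available under its usual $ alias
        $('.like-or-dislike').fadeOut(200).fadeIn(200);
    };
    return FlashWidgetViewModel;
});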
Right now, we will put this code in the View segment of our amd.html page:
<h2>One Choice</h2>
<pick params="value: choice"></pick><br>
<pick params="value: choice"></pick><br>
<pick params="value: choice"></pick>
Notice that we have included the pick tag three times. While we are passing in the bound choice item from viewModel, each tag will randomly choose heads or tails. When we run the code, this is what we will see:
Since we passed the same bound item into each of the three tags, when we click on any heads or tails set, it will immediately pass that value out to viewModel, which will in turn immediately pass the value back into the other two tag sets. They are all wired together because they bind to the same viewModel variable. This is the result we get if we click on Tails:
Well, that is the result we got that time. Actually, the results change pretty much every time we refresh the page. Now, we are ready to do something extra special by combining our AMD approach with Knockout modules.
Summary
This article has shown the power of templates working together with ViewModels within Knockout components. You should now have a solid foundation to do more with less code than ever before. You should know how to mingle your jQuery code with your Knockout code side by side. To review, in this article, we learned what Knockout components are. We learned how to use components to create custom HTML elements that are interactive and powerful. We learned how to enhance custom elements so that variables can be managed using the more common attributes approach. We learned how to use an AMD-style approach to coding with Knockout. We also learned how to AJAX everything and integrate jQuery to enhance Knockout-based solutions. What's next? That is up to you. One thing is for sure: the possibilities are broader using Knockout than they were before. Happy coding and congratulations on completing your study of KnockoutJS!
Resources for Article: Further resources on this subject: Top features of KnockoutJS [article] Components [article] Web Application Testing [article]
PostgreSQL as an Extensible RDBMS

Packt
03 Mar 2015
18 min read
This article by Usama Dar, the author of the book PostgreSQL Server Programming - Second Edition, explains the process of creating a new operator, overloading it, optimizing it, creating index access methods, and much more. PostgreSQL is an extensible database. I hope you've learned this much by now. It is extensible by virtue of the design that it has. As discussed before, PostgreSQL uses a catalog-driven design. In fact, PostgreSQL is more catalog-driven than most of the traditional relational databases. The key benefit here is that the catalogs can be changed or added to, in order to modify or extend the database functionality. PostgreSQL also supports dynamic loading, that is, a user-written code can be provided as a shared library, and PostgreSQL will load it as required. (For more resources related to this topic, see here.) Extensibility is critical for many businesses, which have needs that are specific to that business or industry. Sometimes, the tools provided by the traditional database systems do not fulfill those needs. People in those businesses know best how to solve their particular problems, but they are not experts in database internals. It is often not possible for them to cook up their own database kernel or modify the core or customize it according to their needs. A truly extensible database will then allow you to do the following: Solve domain-specific problems in a seamless way, like a native solution Build complete features without modifying the core database engine Extend the database without interrupting availability PostgreSQL not only allows you to do all of the preceding things, but also does these, and more with utmost ease. In terms of extensibility, you can do the following things in a PostgreSQL database: Create your own data types Create your own functions Create your own aggregates Create your own operators Create your own index access methods (operator classes) Create your own server programming language Create foreign data wrappers (SQL/MED) and foreign tables What can't be extended? Although PostgreSQL is an extensible platform, there are certain things that you can't do or change without explicitly doing a fork, as follows: You can't change or plug in a new storage engine. If you are coming from the MySQL world, this might annoy you a little. However, PostgreSQL's storage engine is tightly coupled with its executor and the rest of the system, which has its own benefits. You can't plug in your own planner/parser. One can argue for and against the ability to do that, but at the moment, the planner, parser, optimizer, and so on are baked into the system and there is no possibility of replacing them. There has been some talk on this topic, and if you are of the curious kind, you can read some of the discussion at http://bit.ly/1yRMkK7. We will now briefly discuss some more of the extensibility capabilities of PostgreSQL. We will not dive deep into the topics, but we will point you to the appropriate link where more information can be found. Creating a new operator Now, let's take look at how we can add a new operator in PostgreSQL. Adding new operators is not too different from adding new functions. In fact, an operator is syntactically just a different way to use an existing function. For example, the + operator calls a built-in function called numeric_add and passes it the two arguments. When you define a new operator, you must define the data types that the operator expects as arguments and define which function is to be called. 
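The examples that follow use a fib function that the book creates in an earlier section. If you are following along in isolation, a minimal PL/pgSQL stand-in such as the following (an assumption on our part, not the book's exact implementation) is enough to make the operator examples work:
-- A simple iterative Fibonacci function; treat this as a sketch only.
CREATE OR REPLACE FUNCTION fib(n integer)
RETURNS integer AS $$
DECLARE
    a integer := 0;
    b integer := 1;
    tmp integer;
BEGIN
    FOR i IN 1..n LOOP
        tmp := a + b;
        a := b;
        b := tmp;
    END LOOP;
    RETURN a;
END;
$$ LANGUAGE plpgsql IMMUTABLE;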
Let's take a look at how to define a simple operator. You have to use the CREATE OPERATOR command to create an operator. Let's use that function to create a new Fibonacci operator, ##, which will have an integer on its left-hand side: CREATE OPERATOR ## (PROCEDURE=fib, LEFTARG=integer); Now, you can use this operator in your SQL to calculate a Fibonacci number: testdb=# SELECT 12##;?column?----------144(1 row) Note that we defined that the operator will have an integer on the left-hand side. If you try to put a value on the right-hand side of the operator, you will get an error: postgres=# SELECT ##12;ERROR: operator does not exist: ## integer at character 8HINT: No operator matches the given name and argument type(s). Youmight need to add explicit type casts.STATEMENT: select ##12;ERROR: operator does not exist: ## integerLINE 1: select ##12;^HINT: No operator matches the given name and argument type(s). Youmight need to add explicit type casts. Overloading an operator Operators can be overloaded in the same way as functions. This means, that an operator can have the same name as an existing operator but with a different set of argument types. More than one operator can have the same name, but two operators can't share the same name if they accept the same types and positions of the arguments. As long as there is a function that accepts the same kind and number of arguments that an operator defines, it can be overloaded. Let's override the ## operator we defined in the last example, and also add the ability to provide an integer on the right-hand side of the operator: CREATE OPERATOR ## (PROCEDURE=fib, RIGHTARG=integer); Now, running the same SQL, which resulted in an error last time, should succeed, as shown here: testdb=# SELECT ##12;?column?----------144(1 row) You can drop the operator using the DROP OPERATOR command. You can read more about creating and overloading new operators in the PostgreSQL documentation at http://www.postgresql.org/docs/current/static/sql-createoperator.html and http://www.postgresql.org/docs/current/static/xoper.html. There are several optional clauses in the operator definition that can optimize the execution time of the operators by providing information about operator behavior. For example, you can specify the commutator and the negator of an operator that help the planner use the operators in index scans. You can read more about these optional clauses at http://www.postgresql.org/docs/current/static/xoper-optimization.html. Since this article is just an introduction to the additional extensibility capabilities of PostgreSQL, we will just introduce a couple of optimization options; any serious production quality operator definitions should include these optimization clauses, if applicable. Optimizing operators The optional clauses tell the PostgreSQL server about how the operators behave. These options can result in considerable speedups in the execution of queries that use the operator. However, if you provide these options incorrectly, it can result in a slowdown of the queries. Let's take a look at two optimization clauses called commutator and negator. COMMUTATOR This clause defines the commuter of the operator. An operator A is a commutator of operator B if it fulfils the following condition: x A y = y B x. It is important to provide this information for the operators that will be used in indexes and joins. As an example, the commutator for > is <, and the commutator of = is = itself. 
This helps the optimizer to flip the operator in order to use an index. For example, consider the following query: SELECT * FROM employee WHERE new_salary > salary; If the index is defined on the salary column, then PostgreSQL can rewrite the preceding query as shown: SELECT * from employee WHERE salary < new_salary This allows PostgreSQL to use a range scan on the index column salary. For a user-defined operator, the optimizer can only do this flip around if the commutator of a user-defined operator is defined: CREATE OPERATOR > (LEFTARG=integer, RIGHTARG=integer, PROCEDURE=comp, COMMUTATOR = <) NEGATOR The negator clause defines the negator of the operator. For example, <> is a negator of =. Consider the following query: SELECT * FROM employee WHERE NOT (dept = 10); Since <> is defined as a negator of =, the optimizer can simplify the preceding query as follows: SELECT * FROM employee WHERE dept <> 10; You can even verify that using the EXPLAIN command: postgres=# EXPLAIN SELECT * FROM employee WHERE NOTdept = 'WATER MGMNT';QUERY PLAN---------------------------------------------------------Foreign Scan on employee (cost=0.00..1.10 rows=1 width=160)Filter: ((dept)::text <> 'WATER MGMNT'::text)Foreign File: /Users/usamadar/testdata.csvForeign File Size: 197(4 rows) Creating index access methods Let's discuss how to index new data types or user-defined types and operators. In PostgreSQL, an index is more of a framework that can be extended or customized for using different strategies. In order to create new index access methods, we have to create an operator class. Let's take a look at a simple example. Let's consider a scenario where you have to store some special data such as an ID or a social security number in the database. The number may contain non-numeric characters, so it is defined as a text type: CREATE TABLE test_ssn (ssn text);INSERT INTO test_ssn VALUES ('222-11-020878');INSERT INTO test_ssn VALUES ('111-11-020978'); Let's assume that the correct order for this data is such that it should be sorted on the last six digits and not the ASCII value of the string. The fact that these numbers need a unique sort order presents a challenge when it comes to indexing the data. This is where PostgreSQL operator classes are useful. An operator allows a user to create a custom indexing strategy. Creating an indexing strategy is about creating your own operators and using them alongside a normal B-tree. Let's start by writing a function that changes the order of digits in the value and also gets rid of the non-numeric characters in the string to be able to compare them better: CREATE OR REPLACE FUNCTION fix_ssn(text)RETURNS text AS $$BEGINRETURN substring($1,8) || replace(substring($1,1,7),'-','');END;$$LANGUAGE 'plpgsql' IMMUTABLE; Let's run the function and verify that it works: testdb=# SELECT fix_ssn(ssn) FROM test_ssn;fix_ssn-------------0208782221102097811111(2 rows) Before an index can be used with a new strategy, we may have to define some more functions depending on the type of index. 
In our case, we are planning to use a simple B-tree, so we need a comparison function: CREATE OR REPLACE FUNCTION ssn_compareTo(text, text)RETURNS int AS$$BEGINIF fix_ssn($1) < fix_ssn($2)THENRETURN -1;ELSIF fix_ssn($1) > fix_ssn($2)THENRETURN +1;ELSERETURN 0;END IF;END;$$ LANGUAGE 'plpgsql' IMMUTABLE; It's now time to create our operator class: CREATE OPERATOR CLASS ssn_opsFOR TYPE text USING btreeASOPERATOR 1 < ,OPERATOR 2 <= ,OPERATOR 3 = ,OPERATOR 4 >= ,OPERATOR 5 > ,FUNCTION 1 ssn_compareTo(text, text); You can also overload the comparison operators if you need to compare the values in a special way, and use the functions in the compareTo function as well as provide them in the CREATE OPERATOR CLASS command. We will now create our first index using our brand new operator class: CREATE INDEX idx_ssn ON test_ssn (ssn ssn_ops); We can check whether the optimizer is willing to use our special index, as follows: testdb=# SET enable_seqscan=off;testdb=# EXPLAIN SELECT * FROM test_ssn WHERE ssn = '02087822211';QUERY PLAN------------------------------------------------------------------Index Only Scan using idx_ssn on test_ssn (cost=0.13..8.14 rows=1width=32)Index Cond: (ssn = '02087822211'::text)(2 rows) Therefore, we can confirm that the optimizer is able to use our new index. You can read about index access methods in the PostgreSQL documentation at http://www.postgresql.org/docs/current/static/xindex.html. Creating user-defined aggregates User-defined aggregate functions are probably a unique PostgreSQL feature, yet they are quite obscure and perhaps not many people know how to create them. However, once you are able to create this function, you will wonder how you have lived for so long without using this feature. This functionality can be incredibly useful, because it allows you to perform custom aggregates inside the database, instead of querying all the data from the client and doing a custom aggregate in your application code, that is, the number of hits on your website per minute from a specific country. PostgreSQL has a very simple process for defining aggregates. Aggregates can be defined using any functions and in any languages that are installed in the database. Here are the basic steps to building an aggregate function in PostgreSQL: Define a start function that will take in the values of a result set; this function can be defined in any PL language you want. Define an end function that will do something with the final output of the start function. This can be in any PL language you want. Define the aggregate using the CREATE AGGREGATE command, providing the start and end functions you just created. Let's steal an example from the PostgreSQL wiki at http://wiki.postgresql.org/wiki/Aggregate_Median. In this example, we will calculate the statistical median of a set of data. For this purpose, we will define start and end aggregate functions. Let's define the end function first, which takes an array as a parameter and calculates the median. 
We are assuming here that our start function will pass an array to the following end function: CREATE FUNCTION _final_median(anyarray) RETURNS float8 AS $$WITH q AS(SELECT valFROM unnest($1) valWHERE VAL IS NOT NULLORDER BY 1),cnt AS(SELECT COUNT(*) AS c FROM q)SELECT AVG(val)::float8FROM(SELECT val FROM qLIMIT 2 - MOD((SELECT c FROM cnt), 2)OFFSET GREATEST(CEIL((SELECT c FROM cnt) / 2.0) - 1,0)) q2;$$ LANGUAGE sql IMMUTABLE; Now, we create the aggregate as shown in the following code: CREATE AGGREGATE median(anyelement) (SFUNC=array_append,STYPE=anyarray,FINALFUNC=_final_median,INITCOND='{}'); The array_append start function is already defined in PostgreSQL. This function appends an element to the end of an array. In our example, the start function takes all the column values and creates an intermediate array. This array is passed on to the end function, which calculates the median. Now, let's create a table and some test data to run our function: testdb=# CREATE TABLE median_test(t integer);CREATE TABLEtestdb=# INSERT INTO median_test SELECT generate_series(1,10);INSERT 0 10 The generate_series function is a set returning function that generates a series of values, from start to stop with a step size of one. Now, we are all set to test the function: testdb=# SELECT median(t) FROM median_test;median--------5.5(1 row) The mechanics of the preceding example are quite easy to understand. When you run the aggregate, the start function is used to append all the table data from column t into an array using the append_array PostgreSQL built-in. This array is passed on to the final function, _final_median, which calculates the median of the array and returns the result in the same data type as the input parameter. This process is done transparently to the user of the function who simply has a convenient aggregate function available to them. You can read more about the user-defined aggregates in the PostgreSQL documentation in much more detail at http://www.postgresql.org/docs/current/static/xaggr.html. Using foreign data wrappers PostgreSQL foreign data wrappers (FDW) are an implementation of SQL Management of External Data (SQL/MED), which is a standard added to SQL in 2013. FDWs are drivers that allow PostgreSQL database users to read and write data to other external data sources, such as other relational databases, NoSQL data sources, files, JSON, LDAP, and even Twitter. You can query the foreign data sources using SQL and create joins across different systems or even across different data sources. There are several different types of data wrappers developed by different developers and not all of them are production quality. You can see a select list of wrappers on the PostgreSQL wiki at http://wiki.postgresql.org/wiki/Foreign_data_wrappers. Another list of FDWs can be found on PGXN at http://pgxn.org/tag/fdw/. Let's take look at a small example of using file_fdw to access data in a CSV file. First, you need to install the file_fdw extension. If you compiled PostgreSQL from the source, you will need to install the file_fdw contrib module that is distributed with the source. You can do this by going into the contrib/file_fdw folder and running make and make install. If you used an installer or a package for your platform, this module might have been installed automatically. 
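If you are unsure whether the module is already present on your server, you can check the catalog first; this query is an addition to the article's example:
-- Lists the extension if its control file is installed and shows
-- whether it has already been created in the current database.
SELECT name, default_version, installed_version
FROM pg_available_extensions
WHERE name = 'file_fdw';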
Once the file_fdw module is installed, you will need to create the extension in the database: postgres=# CREATE EXTENSION file_fdw;CREATE EXTENSION Let's now create a sample CSV file that uses the pipe, |, as a separator and contains some employee data: $ cat testdata.csvAARON, ELVIA J|WATER RATE TAKER|WATER MGMNT|81000.00|73862.00AARON, JEFFERY M|POLICE OFFICER|POLICE|74628.00|74628.00AARON, KIMBERLEI R|CHIEF CONTRACT EXPEDITER|FLEETMANAGEMNT|77280.00|70174.00 Now, we should create a foreign server that is pretty much a formality because the file is on the same server. A foreign server normally contains the connection information that a foreign data wrapper uses to access an external data resource. The server needs to be unique within the database: CREATE SERVER file_server FOREIGN DATA WRAPPER file_fdw; The next step, is to create a foreign table that encapsulates our CSV file: CREATE FOREIGN TABLE employee (emp_name VARCHAR,job_title VARCHAR,dept VARCHAR,salary NUMERIC,sal_after_tax NUMERIC) SERVER file_serverOPTIONS (format 'csv',header 'false' , filename '/home/pgbook/14/testdata.csv', delimiter '|', null '');''); The CREATE FOREIGN TABLE command creates a foreign table and the specifications of the file are provided in the OPTIONS section of the preceding code. You can provide the format, and if the first line of the file is a header (header 'false'), in our case there is no file header. We then provide the name and path of the file and the delimiter used in the file, which in our case is the pipe symbol |. In this example, we also specify that the null values should be represented as an empty string. Let's run a SQL command on our foreign table: postgres=# select * from employee;-[ RECORD 1 ]-+-------------------------emp_name | AARON, ELVIA Jjob_title | WATER RATE TAKERdept | WATER MGMNTsalary | 81000.00sal_after_tax | 73862.00-[ RECORD 2 ]-+-------------------------emp_name | AARON, JEFFERY Mjob_title | POLICE OFFICERdept | POLICEsalary | 74628.00sal_after_tax | 74628.00-[ RECORD 3 ]-+-------------------------emp_name | AARON, KIMBERLEI Rjob_title | CHIEF CONTRACT EXPEDITERdept | FLEET MANAGEMNTsalary | 77280.00sal_after_tax | 70174.00 Great, looks like our data is successfully loaded from the file. You can also use the d meta command to see the structure of the employee table: postgres=# d employee;Foreign table "public.employee"Column | Type | Modifiers | FDW Options---------------+-------------------+-----------+-------------emp_name | character varying | |job_title | character varying | |dept | character varying | |salary | numeric | |sal_after_tax | numeric | |Server: file_serverFDW Options: (format 'csv', header 'false',filename '/home/pg_book/14/testdata.csv', delimiter '|',"null" '') You can run explain on the query to understand what is going on when you run a query on the foreign table: postgres=# EXPLAIN SELECT * FROM employee WHERE salary > 5000;QUERY PLAN---------------------------------------------------------Foreign Scan on employee (cost=0.00..1.10 rows=1 width=160)Filter: (salary > 5000::numeric)Foreign File: /home/pgbook/14/testdata.csvForeign File Size: 197(4 rows) The ALTER FOREIGN TABLE command can be used to modify the options. More information about the file_fdw is available at http://www.postgresql.org/docs/current/static/file-fdw.html. You can take a look at the CREATE SERVER and CREATE FOREIGN TABLE commands in the PostgreSQL documentation for more information on the many options available. 
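Because a foreign table behaves like an ordinary table in queries, you can join it with local tables or aggregate it directly. The following sketch assumes a small local departments table that is not part of the original example:
-- Hypothetical local table; only the employee foreign table comes from
-- the example above.
CREATE TABLE departments (dept varchar PRIMARY KEY, location varchar);
INSERT INTO departments VALUES
    ('POLICE', 'Headquarters'),
    ('WATER MGMNT', 'Purification Plant');

-- Join the CSV-backed foreign table with the local table.
SELECT e.emp_name, e.salary, d.location
FROM employee e
JOIN departments d ON d.dept = e.dept
ORDER BY e.salary DESC;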
Each of the foreign data wrappers comes with its own documentation about how to use the wrapper. Make sure that an extension is stable enough before it is used in production. The PostgreSQL core development group does not support most of the FDW extensions. If you want to create your own data wrappers, you can find the documentation at http://www.postgresql.org/docs/current/static/fdwhandler.html as an excellent starting point. The best way to learn, however, is to read the code of other available extensions. Summary This includes the ability to add new operators, new index access methods, and create your own aggregates. You can access foreign data sources, such as other databases, files, and web services using PostgreSQL foreign data wrappers. These wrappers are provided as extensions and should be used with caution, as most of them are not officially supported. Even though PostgreSQL is very extensible, you can't plug in a new storage engine or change the parser/planner and executor interfaces. These components are very tightly coupled with each other and are, therefore, highly optimized and mature. Resources for Article: Further resources on this subject: Load balancing MSSQL [Article] Advanced SOQL Statements [Article] Running a PostgreSQL Database Server [Article]
Basic SQL Server Administration

Packt
03 Mar 2015
11 min read
 In this article by Donabel Santos, the author of PowerShell for SQL Server Essentials, we will look at how to accomplish typical SQL Server administration tasks by using PowerShell. Many of the tasks that we will see can be accomplished by using SQL Server Management Objects (SMO). As we encounter new SMO classes, it is best to verify the properties and methods of that class using Get-Help, or by directly visiting the TechNet or MSDN website. (For more resources related to this topic, see here.) Listing databases and tables Let's start out by listing the current databases. The SMO Server class has access to all the databases in that instance, so a server variable will have to be created first. To create one using Windows Authentication, you can use the following snippet: Import-Module SQLPS -DisableNameChecking #current server name $servername = "ROGUE"   #below should be a single line of code $server = New-Object "Microsoft.SqlServer.Management.  Smo.Server" $servername If you need to use SQL Server Authentication, you can set the LoginSecure property to false, and prompt the user for the database credentials: #with SQL authentication, we need #to supply the SQL Login and password $server.ConnectionContext.LoginSecure=$false; $credential = Get-Credential $server.ConnectionContext.set_Login($credential.UserName) $server.ConnectionContext.set_SecurePassword($credential.Password) Another way is to create a Microsoft.SqlServer.Management.Common.ServerConnection object and pass the database connection string: #code below is a single line $connectionString = "Server=$dataSource;uid=$username;   pwd=$passwordd;Database=$database;Integrated Security=False"   $connection = New-Object System.Data.SqlClient.SqlConnection $connection.ConnectionString = $connectionString To find out how many databases are there, you can use the Count property of the Databases property: $server.databases.Count In addition to simply displaying the number of databases in an instance, we can also find out additional information such as creation data, recovery model, number of tables, stored procedures, and user-defined functions. The following is a sample script that pulls this information: #create empty array $result = @() $server.Databases | Where-Object IsSystemObject -eq $false | ForEach-Object {     $db = $_     $object = [PSCustomObject] @{        Name          = $db.Name        CreateDate    = $db.CreateDate        RecoveryModel = $db.RecoveryModel        NumTables     = $db.Tables.Count        NumUsers      = $db.Users.Count        NumSP         = $db.StoredProcedures.Count        NumUDF        = $db.UserDefinedFunctions.Count     }     $result += $object } $result | Format-Table -AutoSize A sample result looks like the following screenshot: In this script, we have manipulated the output a little. Since we want information in a format different from the default, we created a custom object using the PSCustomObject class to store all this information. The PSCustomObject class was introduced in PowerShell V3. You can also use PSCustomObject to draw data points from different objects and pull them together in a single result set. Each line in the sample result shown in the preceding screenshot is a single PSCustomObject. All of these, in turn, are stored in the $result array, which can be piped to the Format-Table cmdlet for a little easier display. After learning these basics about PSCustomObject, you can adapt this script to increase the list of properties you are querying and change the formatting of the display. 
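For example, one small variation, shown here only as a sketch, keeps the databases that use the full recovery model and sorts them by the number of tables:
# A hypothetical tweak to the script above: filter and sort the
# collected PSCustomObject results before displaying them.
$result |
    Where-Object RecoveryModel -eq "Full" |
    Sort-Object NumTables -Descending |
    Format-Table -AutoSize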
You can also export these to a file if you need to. To find out additional properties, you can pipe $server.Databases to the Get-Member cmdlet: $server.Databases | Get-Member | Where-Object MemberType –eq "Property" Once you execute this, your resulting screen should look similar to the following screenshot: To find out which methods are available for SMO database objects, we can use a very similar snippet, but this time, we will filter based on methods: $server.Databases | Get-Member | Where-Object MemberType –eq "Method" Once you execute this, your resulting screen should look similar to the following screenshot: Listing database files and filegroups Managing databases also incorporates monitoring and managing of the files and filegroups associated with these databases. Still, using SMO, we can pull this information via PowerShell. You can start by pulling all non-system databases: $server.Databases | Where-Object IsSystemObject -eq $false The preceding snippet iterates over all the databases in the system. You can use the Foreach-Object cmdlet to do the iteration, and for each iteration, you can get a handle to the current database object. The SMO database object will have access to a Filegroups property, which you can query to find out more about the filegroups associated with each database: ForEach-Object {   $db = $_   $db.FileGroups } This FileGroups class, in turn, can access all the files in that specific filegroup. Here is the complete script that lists all files and filegroups for all databases. Note that we use Foreach-Object several times: once to loop through all databases, then to loop through all filegroups for each database, and again to loop through all files in each filegroup: Import-Module SQLPS -DisableNameChecking   #current server name $servername = "ROGUE"   $server = New-Object "Microsoft.SqlServer.Management.Smo.  Server" $servername   $result = @()   $server.Databases | Where-Object IsSystemObject -eq $false | ForEach-Object {    $db = $_    $db.FileGroups |    ForEach-Object {       $fg = $_       $fg.Files |       ForEach-Object {          $file = $_            $object = [PSCustomObject] @{                 Database = $db.Name                 FileGroup = $fg.Name                 FileName = $file.FileName | Split-Path -Leaf                 "Size(MB)" = "{0:N2}" -f ($file.Size/1024)                 "UsedSpace(MB)" = "{0:N2}" -f ($file.UsedSpace/1MB)                 }          $result += $object         }    } } $result | Format-Table -AutoSize A sample result looks like the following screenshot: We have adjusted the result to make the display a bit more readable. For the FileName property, we extracted just the actual filename and did not report the path by piping the FileName property to the Split-Path cmdlet. The -Leaf option provides the filename part of the full path: $file.FileName | Split-Path -Leaf With Size and UsedSpace, we report the value in megabytes (MB). Since the default sizes are reported in kilobytes (KB), we have to divide the value by 1024. We also display the values with two decimal places: "Size(MB)" = "{0:N2}" -f ($file.Size/1024)< "UsedSpace(MB)" = "{0:N2}" -f ($file.UsedSpace/1MB) If you simply want to get the directory where the primary datafile is stored, you can use the following command: $db.PrimaryFilePath If you want to export the results to Excel or CSV, you simply need to take $result and instead of piping it to Format-Table, use one of the Export or Convert cmdlets. 
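A minimal sketch of that export, with a purely illustrative output path, could look like this:
# Write the collected file and filegroup details to a CSV file; the
# path below is just an example.
$result |
    Sort-Object Database |
    Export-Csv -Path "C:\Reports\database_files.csv" -NoTypeInformation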
Adding files and filegroups Filegroups in SQL Server allow for a group of files to be managed together. It is almost akin to having folders on your desktop to allow you to manage, move, and save files together. To add a filegroup, you have to use the Microsoft.SqlServer.Management.Smo.Filegroup class. Assuming you already have variables that point to your server instance, you can create a variable that references the database you wish to work with, as shown in the following snippet: $dbname = "Registration" $db = $server.Databases[$dbname] Instantiating a Filegroup variable requires the handle to the SMO database object and a filegroup name. We have shown this in the following screenshot: #code below is a single line $fg = New-Object "Microsoft.SqlServer.Management.Smo.  Filegroup" $db, "FG1" When you're ready to create, invoke the Create() method: $fg.Create() Adding a datafile uses a similar approach. You need to identify which filegroup this new datafile belongs to. You will also need to identify the logical filename and actual file path of the new file. The following snippet will help you do that: #code below is a single line $datafile = New-Object "Microsoft.SqlServer.Management.Smo.DataFile" $fg, "data4"   $datafile.FileName = "C:DATAdata4.ndf" $datafile.Create() You can verify the changes visually in SQL Server Management Studio when you go to the database's properties. Under Files, you will see that the new secondary file, data4.ndf, has been added. If, at a later time, you need to increase any of the files' sizes, you can use SMO to create a handle to the file and change the Size property. The Size property is allocated by KB, so you will need to calculate accordingly. After the Size property is changed, invoke the Alter() method to persist the changes. The following is an example snippet to do this: $db = $server.Databases[$dbname] $fg = $db.FileGroups["FG1"] $file = $fg.Files["data4"] $file.Size = 2 * 1024 #2MB $file.Alter() Listing the processes SQL Server has a number of processes in the background that are needed for a normal operation. The SMO server class can access the list of processes by using the method EnumProcesses(). The following is an example script to pull current non-system processes, the programs that are using them, the databases that are using them, and the account that's configured to use/run them: Import-Module SQLPS -DisableNameChecking   #current server name $servername = "ROGUE"   $server = New-Object "Microsoft.SqlServer.Management.Smo.Server" $servername   $server.EnumProcesses() | Where-Object IsSystem -eq $false | Select-Object Spid, Database, IsSystem, Login, Status, Cpu, MemUsage, Program | Format-Table -AutoSize The result that you will get looks like the following screenshot: You can adjust this script based on your needs. For example, if you only need running queries, you can pipe it to the Where-Object cmdlet and filter by status. You can also sort the result based on the highest CPU or memory usage by piping this to the Sort-Object cmdlet. Should you need to kill any process, for example when some processes are blocked, you can use the KillProcess() method of the SMO server object. You will need to pass the SQL Server session ID (or SPID) to this method: $server.KillProcess($blockingSpid) If you want to kill all processes in a specific database, you can use the KillAllProcesses() method and pass the database name: $server.KillAllProcesses($dbname) Be careful though. Killing processes should not be done lightly. 
Before you kill a process, investigate what the process does, why you need to kill it, and what potential effects killing it will have on your database. Otherwise, killing processes could result in varying levels of system instability. Checking enabled features SQL has many features. We can find out if certain features are enabled by using SMO and PowerShell. To determine this, you need to access the object that owns that feature. For example, some features are available to be queried once you create an SMO server object: Import-Module SQLPS -DisableNameChecking   #current server name $servername = "ROGUE"   $server = New-Object "Microsoft.SqlServer.Management.Smo.Server" $servername   $server | Select-Object IsClustered, ClusterName, FilestreamLevel, IsFullTextInstalled, LinkedServers, IsHadrEnabled, AvailabilityGroups In the preceding script, we can easily find out the following parameters: Is the server clustered (IsClustered)? Does it support FileStream and to what level (FilestreamLevel)? Is FullText installed (IsFullTextInstalled)? Are there any configured linked servers in the system (LinkedServers)? Is AlwaysOn enabled (IsHadrEnabled) and are any availability groups configured (AvailabilityGroups)? There are also a number of cmdlets available with the SQLPS module that allow you to manage the AlwaysOn parameter: Replication can also be managed programmatically using the Replication Management Objects assembly. More information can be found at http://msdn.microsoft.com/en-us/library/ms146869.aspx. Summary In this article, we looked at some of the commands that can used to perform basic SQL Server administration tasks in PowerShell. Resources for Article: Further resources on this subject: Sql Server Analysis Services Administering and Monitoring Analysis Services? [article] Unleashing your Development Skills Powershell [article] The Arduino Mobile Robot [article]
Elasticsearch Administration

Packt
03 Mar 2015
28 min read
In this article by Rafał Kuć and Marek Rogoziński, author of the book Mastering Elasticsearch, Second Edition we will talk more about the Elasticsearch configuration and new features introduced in Elasticsearch 1.0 and higher. By the end of this article, you will have learned: (For more resources related to this topic, see here.) Configuring the discovery and recovery modules Using the Cat API that allows a human-readable insight into the cluster status The backup and restore functionality Federated search Discovery and recovery modules When starting your Elasticsearch node, one of the first things that Elasticsearch does is look for a master node that has the same cluster name and is visible in the network. If a master node is found, the starting node gets joined into an already formed cluster. If no master is found, then the node itself is selected as a master (of course, if the configuration allows such behavior). The process of forming a cluster and finding nodes is called discovery. The module responsible for discovery has two main purposes—electing a master and discovering new nodes within a cluster. After the cluster is formed, a process called recovery is started. During the recovery process, Elasticsearch reads the metadata and the indices from the gateway, and prepares the shards that are stored there to be used. After the recovery of the primary shards is done, Elasticsearch should be ready for work and should continue with the recovery of all the replicas (if they are present). In this section, we will take a deeper look at these two modules and discuss the possibilities of configuration Elasticsearch gives us and what the consequences of changing them are. Note that the information provided in the Discovery and recovery modules section is an extension of what we already wrote in Elasticsearch Server Second Edition, published by Packt Publishing. Discovery configuration As we have already mentioned multiple times, Elasticsearch was designed to work in a distributed environment. This is the main difference when comparing Elasticsearch to other open source search and analytics solutions available. With such assumptions, Elasticsearch is very easy to set up in a distributed environment, and we are not forced to set up additional software to make it work like this. By default, Elasticsearch assumes that the cluster is automatically formed by the nodes that declare the same cluster.name setting and can communicate with each other using multicast requests. This allows us to have several independent clusters in the same network. There are a few implementations of the discovery module that we can use, so let's see what the options are. Zen discovery Zen discovery is the default mechanism that's responsible for discovery in Elasticsearch and is available by default. The default Zen discovery configuration uses multicast to find other nodes. This is a very convenient solution: just start a new Elasticsearch node and everything works—this node will be joined to the cluster if it has the same cluster name and is visible by other nodes in that cluster. This discovery method is perfectly suited for development time, because you don't need to care about the configuration; however, it is not advised that you use it in production environments. Relying only on the cluster name is handy but can also lead to potential problems and mistakes, such as the accidental joining of nodes. Sometimes, multicast is not available for various reasons or you don't want to use it for these mentioned reasons. 
In the case of bigger clusters, the multicast discovery may generate too much unnecessary traffic, and this is another valid reason why it shouldn't be used for production. For these cases, Zen discovery allows us to use the unicast mode. When using unicast Zen discovery, a node that is not a part of the cluster will send a ping request to all the addresses specified in the configuration. By doing this, it informs all the specified nodes that it is ready to be a part of the cluster and can either be joined to an existing cluster or form a new one. Of course, after the node joins the cluster, it gets the cluster topology information, but the initial connection is only made to the specified list of hosts. Remember that even when using unicast Zen discovery, the Elasticsearch node still needs to have the same cluster name as the other nodes.
If you want to know more about the differences between multicast and unicast ping methods, refer to these URLs: http://en.wikipedia.org/wiki/Multicast and http://en.wikipedia.org/wiki/Unicast.
If you still want to learn about the configuration properties of multicast Zen discovery, let's look at them.
Multicast Zen discovery configuration
The multicast part of the Zen discovery module exposes the following settings:
discovery.zen.ping.multicast.address (the default: all available interfaces): This is the interface used for the communication, given as an address or interface name.
discovery.zen.ping.multicast.port (the default: 54328): This is the port used for communication.
discovery.zen.ping.multicast.group (the default: 224.2.2.4): This is the multicast address to send messages to.
discovery.zen.ping.multicast.buffer_size (the default: 2048): This is the size of the buffer used for multicast messages.
discovery.zen.ping.multicast.ttl (the default: 3): This is the time for which a multicast message lives. Every time a packet crosses a router, the TTL is decreased. This allows you to limit the area in which the transmission can be received. Note that routers can have threshold values assigned that are compared to the TTL, so the TTL value may not match exactly the number of routers that a packet can jump over.
discovery.zen.ping.multicast.enabled (the default: true): Setting this property to false turns off multicast. You should disable multicast if you are planning to use the unicast discovery method.
The unicast Zen discovery configuration
The unicast part of Zen discovery provides the following configuration options:
discovery.zen.ping.unicast.hosts: This is the initial list of nodes in the cluster. The list can be defined as a list or as an array of hosts. Every host can be given as a name (or an IP address) and can have a port or port range added. For example, the value of this property can look like this: ["master1", "master2:8181", "master3[80000-81000]"]. Basically, the hosts' list for the unicast discovery doesn't need to be a complete list of the Elasticsearch nodes in your cluster, because once the node is connected to one of the mentioned nodes, it will be informed about all the others that form the cluster.
discovery.zen.ping.unicast.concurrent_connects (the default: 10): This is the maximum number of concurrent connections unicast discovery will use. If you have a lot of nodes that the initial connection should be made to, it is advised that you increase the default value.
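Putting the unicast settings together, a minimal elasticsearch.yml fragment for a cluster that relies on unicast discovery could look like the following; the cluster name and host names are placeholders rather than values from this article:
# Disable multicast and point new nodes at a few well-known hosts.
cluster.name: my-production-cluster
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["master1", "master2:9300", "master3"]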
Master node
One of the main purposes of discovery, apart from connecting to other nodes, is to choose a master node, that is, a node that will take care of and manage all the other nodes. This process is called master election and is a part of the discovery module. No matter how many master eligible nodes there are, each cluster will only have a single master node active at a given time. If there is more than one master eligible node present in the cluster, they can be elected as the master when the original master fails and is removed from the cluster.
Configuring master and data nodes
By default, Elasticsearch allows every node to be a master node and a data node. However, in certain situations, you may want to have worker nodes that will only hold the data or process the queries, and master nodes that will only be used as cluster management nodes. One of these situations is handling a massive amount of data, where data nodes should be as performant as possible and there shouldn't be any delay in the master nodes' responses.
Configuring data-only nodes
To set a node to only hold data, we need to instruct Elasticsearch that we don't want such a node to be a master node. In order to do this, we add the following properties to the elasticsearch.yml configuration file:
node.master: false
node.data: true
Configuring master-only nodes
To set a node not to hold data and only to be a master node, we need to instruct Elasticsearch that we don't want such a node to hold data. In order to do that, we add the following properties to the elasticsearch.yml configuration file:
node.master: true
node.data: false
Configuring the query processing-only nodes
For large enough deployments, it is also wise to have nodes that are only responsible for aggregating query results from other nodes. Such nodes should be configured as nonmaster and nondata, so they should have the following properties in the elasticsearch.yml configuration file:
node.master: false
node.data: false
Please note that the node.master and the node.data properties are set to true by default, but we tend to include them for configuration clarity.
The master election configuration
We already wrote about the master election configuration in Elasticsearch Server Second Edition, but this topic is very important, so we decided to refresh our knowledge about it. Imagine that you have a cluster that is built of 10 nodes. Everything is working fine until, one day, your network fails and three of your nodes are disconnected from the cluster, but they still see each other. Because of the Zen discovery and the master election process, the nodes that got disconnected elect a new master and you end up with two clusters with the same name and two master nodes. Such a situation is called a split-brain and you must avoid it as much as possible. When a split-brain happens, you end up with two (or more) clusters that won't join each other until the network (or any other) problems are fixed. If you index your data during this time, you may end up with data loss and unrecoverable situations when the nodes get joined together after the network split.
In order to prevent split-brain situations, or at least minimize the possibility of their occurrence, Elasticsearch provides the discovery.zen.minimum_master_nodes property. This property defines the minimum number of master eligible nodes that should be connected to each other in order to form a cluster.
So now, let's get back to our cluster; if we set the discovery.zen.minimum_master_nodes property to 50 percent of the total nodes available plus one (which is six, in our case), we would end up with a single cluster. Why is that? Before the network failure, we would have 10 nodes, which is more than six nodes, and these nodes would form a cluster. After the disconnections of the three nodes, we would still have the first cluster up and running. However, because only three nodes disconnected and three is less than six, these three nodes wouldn't be allowed to elect a new master and they would wait for reconnection with the original cluster. Zen discovery fault detection and configuration Elasticsearch runs two detection processes while it is working. The first process is to send ping requests from the current master node to all the other nodes in the cluster to check whether they are operational. The second process is a reverse of that—each of the nodes sends ping requests to the master in order to verify that it is still up and running and performing its duties. However, if we have a slow network or our nodes are in different hosting locations, the default configuration may not be sufficient. Because of this, the Elasticsearch discovery module exposes three properties that we can change: discovery.zen.fd.ping_interval: This defaults to 1s and specifies the interval of how often the node will send ping requests to the target node. discovery.zen.fd.ping_timeout: This defaults to 30s and specifies how long the node will wait for the sent ping request to be responded to. If your nodes are 100 percent utilized or your network is slow, you may consider increasing that property value. discovery.zen.fd.ping_retries: This defaults to 3 and specifies the number of ping request retries before the target node will be considered not operational. You can increase this value if your network has a high number of lost packets (or you can fix your network). There is one more thing that we would like to mention. The master node is the only node that can change the state of the cluster. To achieve a proper cluster state updates sequence, Elasticsearch master nodes process single cluster state update requests one at a time, make the changes locally, and send the request to all the other nodes so that they can synchronize their state. The master nodes wait for the given time for the nodes to respond, and if the time passes or all the nodes are returned, with the current acknowledgment information, it proceeds with the next cluster state update request processing. To change the time, the master node waits for all the other nodes to respond, and you should modify the default 30 seconds time by setting the discovery.zen.publish_timeout property. Increasing the value may be needed for huge clusters working in an overloaded network. The Amazon EC2 discovery Amazon, in addition to selling goods, has a few popular services such as selling storage or computing power in a pay-as-you-go model. So-called Amazon Elastic Compute Cloud (EC2) provides server instances and, of course, they can be used to install and run Elasticsearch clusters (among many other things, as these are normal Linux machines). This is convenient—you pay for instances that are needed in order to handle the current traffic or to speed up calculations, and you shut down unnecessary instances when the traffic is lower. Elasticsearch works well on EC2, but due to the nature of the environment, some features may work slightly differently. 
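To tie this section together, the following elasticsearch.yml fragment shows how the ten-node cluster described earlier might combine split-brain protection with more tolerant fault detection; the exact values are illustrative, not recommendations:
# Require a majority of the 10 nodes (10 / 2 + 1 = 6) to form a cluster.
discovery.zen.minimum_master_nodes: 6

# Relax fault detection for a slow or lossy network.
discovery.zen.fd.ping_interval: 2s
discovery.zen.fd.ping_timeout: 60s
discovery.zen.fd.ping_retries: 5

# Give nodes more time to acknowledge cluster state updates.
discovery.zen.publish_timeout: 60s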
One of these features that works differently is discovery, because Amazon EC2 doesn't support multicast discovery. Of course, we can switch to unicast discovery, but sometimes, we want to be able to automatically discover nodes and, with unicast, we need to at least provide the initial list of hosts. However, there is an alternative—we can use the Amazon EC2 plugin, a plugin that combines the multicast and unicast discovery methods using the Amazon EC2 API. Make sure that during the set up of EC2 instances, you set up communication between them (on port 9200 and 9300 by default). This is crucial in order to have Elasticsearch nodes communicate with each other and, thus, cluster functioning is required. Of course, this communication depends on network.bind_host and network.publish_host (or network.host) settings. The EC2 plugin installation The installation of a plugin is as simple as with most of the plugins. In order to install it, we should run the following command: bin/plugin install elasticsearch/elasticsearch-cloud-aws/2.4.0 The EC2 plugin's generic configuration This plugin provides several configuration settings that we need to provide in order for the EC2 discovery to work: cluster.aws.access_key: Amazon access key—one of the credential values you can find in the Amazon configuration panel cluster.aws.secret_key: Amazon secret key—similar to the previously mentioned access_key setting, it can be found in the EC2 configuration panel The last thing is to inform Elasticsearch that we want to use a new discovery type by setting the discovery.type property to ec2 value and turn off multicast. Optional EC2 discovery configuration options The previously mentioned settings are sufficient to run the EC2 discovery, but in order to control the EC2 discovery plugin behavior, Elasticsearch exposes additional settings: cloud.aws.region: This region will be used to connect with Amazon EC2 web services. You can choose a region that's adequate for the region where your instance resides, for example, eu-west-1 for Ireland. The possible values can be eu-west, sa-east, us-east, us-west-1, us-west-2, ap-southeast-1, and ap-southeast-1. cloud.aws.ec2.endpoint: If you are using EC2 API services, instead of defining a region, you can provide an address of the AWS endpoint, for example, ec2.eu-west-1.amazonaws.com. cloud.aws.protocol: This is the protocol that should be used by the plugin to connect to Amazon Web Services endpoints. By default, Elasticsearch will use the HTTPS protocol (which means setting the value of the property to https). We can also change this behavior and set the property to http for the plugin to use HTTP without encryption. We are also allowed to overwrite the cloud.aws.protocol settings for each service by using the cloud.aws.ec2.protocol and cloud.aws.s3.protocol properties (the possible values are the same—https and http). cloud.aws.proxy_host: Elasticsearch allows us to define a proxy that will be used to connect to AWS endpoints. The cloud.aws.proxy_host property should be set to the address to the proxy that should be used. cloud.aws.proxy_port: The second property related to the AWS endpoints proxy allows us to specify the port on which the proxy is listening. The cloud.aws.proxy_port property should be set to the port on which the proxy listens. discovery.ec2.ping_timeout (the default: 3s): This is the time to wait for the response for the ping message sent to the other node. After this time, the nonresponsive node will be considered dead and removed from the cluster. 
Increasing this value makes sense when dealing with network issues or we have a lot of EC2 nodes. The EC2 nodes scanning configuration The last group of settings we want to mention allows us to configure a very important thing when building cluster working inside the EC2 environment—the ability to filter available Elasticsearch nodes in our Amazon Elastic Cloud Computing network. The Elasticsearch EC2 plugin exposes the following properties that can help us configure its behavior: discovery.ec2.host_type: This allows us to choose the host type that will be used to communicate with other nodes in the cluster. The values we can use are private_ip (the default one; the private IP address will be used for communication), public_ip (the public IP address will be used for communication), private_dns (the private hostname will be used for communication), and public_dns (the public hostname will be used for communication). discovery.ec2.groups: This is a comma-separated list of security groups. Only nodes that fall within these groups can be discovered and included in the cluster. discovery.ec2.availability_zones: This is array or command-separated list of availability zones. Only nodes with the specified availability zones will be discovered and included in the cluster. discovery.ec2.any_group (this defaults to true): Setting this property to false will force the EC2 discovery plugin to discover only those nodes that reside in an Amazon instance that falls into all of the defined security groups. The default value requires only a single group to be matched. discovery.ec2.tag: This is a prefix for a group of EC2-related settings. When you launch your Amazon EC2 instances, you can define tags, which can describe the purpose of the instance, such as the customer name or environment type. Then, you use these defined settings to limit discovery nodes. Let's say you define a tag named environment with a value of qa. In the configuration, you can now specify the following: discovery.ec2.tag.environment: qa and only nodes running on instances with this tag will be considered for discovery. cloud.node.auto_attributes: When this is set to true, Elasticsearch will add EC2-related node attributes (such as the availability zone or group) to the node properties and will allow us to use them, adjusting the Elasticsearch shard allocation and configuring the shard placement. Other discovery implementations The Zen discovery and EC2 discovery are not the only discovery types that are available. There are two more discovery types that are developed and maintained by the Elasticsearch team, and these are: Azure discovery: https://github.com/elasticsearch/elasticsearch-cloud-azure Google Compute Engine discovery: https://github.com/elasticsearch/elasticsearch-cloud-gce In addition to these, there are a few discovery implementations provided by the community, such as the ZooKeeper discovery for older versions of Elasticsearch (https://github.com/sonian/elasticsearch-zookeeper). The gateway and recovery configuration The gateway module allows us to store all the data that is needed for Elasticsearch to work properly. This means that not only is the data in Apache Lucene indices stored, but also all the metadata (for example, index allocation settings), along with the mappings configuration for each index. Whenever the cluster state is changed, for example, when the allocation properties are changed, the cluster state will be persisted by using the gateway module. 
Other discovery implementations
The Zen discovery and EC2 discovery are not the only discovery types that are available. There are two more discovery types that are developed and maintained by the Elasticsearch team, and these are:
Azure discovery: https://github.com/elasticsearch/elasticsearch-cloud-azure
Google Compute Engine discovery: https://github.com/elasticsearch/elasticsearch-cloud-gce
In addition to these, there are a few discovery implementations provided by the community, such as the ZooKeeper discovery for older versions of Elasticsearch (https://github.com/sonian/elasticsearch-zookeeper).
The gateway and recovery configuration
The gateway module allows us to store all the data that is needed for Elasticsearch to work properly. This means that not only is the data in the Apache Lucene indices stored, but also all the metadata (for example, the index allocation settings), along with the mappings configuration for each index. Whenever the cluster state is changed, for example, when the allocation properties are changed, the cluster state will be persisted by using the gateway module. When the cluster is started up, its state will be loaded using the gateway module and applied. One should remember that when configuring different nodes and different gateway types, indices will use the gateway type configuration present on the given node. If an index state should not be stored using the gateway module, one should explicitly set the index gateway type to none.
The gateway recovery process
To state it explicitly: the recovery process is used by Elasticsearch to load the data stored with the use of the gateway module so that Elasticsearch can work. Whenever a full cluster restart occurs, the gateway process kicks in to load all the relevant information we've mentioned: the metadata, the mappings, and of course, all the indices. When the recovery process starts, the primary shards are initialized first, and then, depending on the replica state, the replicas are initialized using the gateway data, or the data is copied from the primary shards if the replicas are out of sync.
Elasticsearch allows us to configure when the cluster data should be recovered using the gateway module. We can tell Elasticsearch to wait for a certain number of master eligible or data nodes to be present in the cluster before starting the recovery process. However, one should remember that when the cluster is not recovered, all the operations performed on it will not be allowed. This is done in order to avoid modification conflicts.
Configuration properties
Before we continue with the configuration, we would like to say one more thing. As you know, Elasticsearch nodes can play different roles: they can be data nodes (the ones that hold data), they can be master eligible, or they can be used only for request handling, which means neither holding data nor being master eligible. Keeping all this in mind, let's now look at the gateway configuration properties that we are allowed to modify:
gateway.recover_after_nodes: This is an integer number that specifies how many nodes should be present in the cluster for the recovery to happen. For example, when set to 5, at least 5 nodes (it doesn't matter whether they are data or master eligible nodes) must be present for the recovery process to start.
gateway.recover_after_data_nodes: This is an integer number that allows us to set how many data nodes should be present in the cluster for the recovery process to start.
gateway.recover_after_master_nodes: This is another gateway configuration option that allows us to set how many master eligible nodes should be present in the cluster for the recovery to start.
gateway.recover_after_time: This allows us to set how much time to wait before the recovery process starts after the conditions defined by the preceding properties are met. If we set this property to 5m, we tell Elasticsearch to start the recovery process 5 minutes after all the defined conditions are met. The default value for this property is 5m, starting from Elasticsearch 1.3.0.
Let's imagine that we have six nodes in our cluster, out of which four are data eligible. We also have an index that is built of three shards, which are spread across the cluster. The last two nodes are master eligible and they don't hold data. What we would like to configure is for the recovery process to be delayed for 3 minutes after the four data nodes are present.
Our gateway configuration could look like this:
gateway.recover_after_data_nodes: 4
gateway.recover_after_time: 3m
Expectations on nodes
In addition to the already mentioned properties, we can also specify properties that will force the recovery process of Elasticsearch. These properties are:
gateway.expected_nodes: This is the number of nodes expected to be present in the cluster for the recovery to start immediately. If you don't need the recovery to be delayed, it is advised that you set this property to the number of nodes (or at least most of them) that the cluster will be formed from, because that will guarantee that the latest cluster state will be recovered.
gateway.expected_data_nodes: This is the number of data eligible nodes expected to be present in the cluster for the recovery process to start immediately.
gateway.expected_master_nodes: This is the number of master eligible nodes expected to be present in the cluster for the recovery process to start immediately.
Now, let's get back to our previous example. We know that when all six nodes are connected and are in the cluster, we want the recovery to start. So, in addition to the preceding configuration, we would add the following property:
gateway.expected_nodes: 6
So, the whole configuration would look like this:
gateway.recover_after_data_nodes: 4
gateway.recover_after_time: 3m
gateway.expected_nodes: 6
The preceding configuration says that the recovery process will be delayed for 3 minutes once four data nodes join the cluster and will begin immediately after six nodes are in the cluster (it doesn't matter whether they are data nodes or master eligible nodes).
The local gateway
With the release of Elasticsearch 0.20 (and some of the releases from the 0.19 versions), all the gateway types, apart from the default local gateway type, were deprecated. It is advised that you do not use them, because they will be removed in future versions of Elasticsearch. This has not happened yet, but if you want to avoid full data reindexation, you should only use the local gateway type, and this is why we won't discuss all the other types.
The local gateway type uses the local storage available on a node to store the metadata, mappings, and indices. In order to use this gateway type and the local storage available on the node, there needs to be enough disk space to hold the data with no memory caching.
The persistence to the local gateway is different from the other gateways that are currently present (but deprecated). The writes to this gateway are done in a synchronous manner in order to ensure that no data will be lost during the write process.
In order to set the type of gateway that should be used, one should use the gateway.type property, which is set to local by default.
There is one additional thing regarding the local gateway of Elasticsearch that we didn't talk about: dangling indices. When a node joins a cluster, all the shards and indices that are present on the node, but are not present in the cluster, will be included in the cluster state. Such indices are called dangling indices, and we are allowed to choose how Elasticsearch should treat them. Elasticsearch exposes the gateway.local.auto_import_dangling property, which can take the value of yes (the default value, which results in importing all dangling indices into the cluster), close (this results in importing the dangling indices into the cluster state but keeps them closed by default), and no (this results in removing the dangling indices). When setting the gateway.local.auto_import_dangling property to no, we can also set the gateway.local.dangling_timeout property (it defaults to 2h) to specify how long Elasticsearch will wait before deleting the dangling indices. The dangling indices feature can be nice when we restart old Elasticsearch nodes and we don't want old indices to be included in the cluster. A small configuration sketch follows.
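For example, the following elasticsearch.yml sketch keeps dangling indices out of the cluster while giving us a grace period before they are removed; the timeout value here is only an illustration, not a recommended setting:
gateway.type: local                        # the default (and only non-deprecated) gateway type
gateway.local.auto_import_dangling: no     # do not import indices found only on a joining node
gateway.local.dangling_timeout: 4h         # example value; wait this long before deleting them
With the default value of yes, the two additional lines are unnecessary, because the dangling indices are simply imported back into the cluster.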
Low-level recovery configuration
We discussed that we can use the gateway to configure when the Elasticsearch recovery process starts, but in addition to that, Elasticsearch allows us to configure the behavior of the recovery process itself. We decided that it would be good to mention the related properties in the section dedicated to the gateway and recovery.
Cluster-level recovery configuration
The recovery configuration is mostly specified at the cluster level and allows us to set general rules for the recovery module to work with. These settings are:
indices.recovery.concurrent_streams: This defaults to 3 and specifies the number of concurrent streams that are allowed to be opened in order to recover a shard from its source. The higher the value of this property, the more pressure will be put on the networking layer; however, the recovery may be faster, depending on your network usage and throughput.
indices.recovery.max_bytes_per_sec: By default, this is set to 20MB and specifies the maximum amount of data that can be transferred per second during shard recovery. In order to disable data transfer limiting, one should set this property to 0. Similar to the number of concurrent streams, this property allows us to control the network usage of the recovery process. Setting this property to higher values may result in higher network utilization, but also in a faster recovery process.
indices.recovery.compress: This is set to true by default and allows us to define whether Elasticsearch should compress the data that is transferred during the recovery process. Setting this to false may lower the pressure on the CPU, but it will also result in more data being transferred over the network.
indices.recovery.file_chunk_size: This is the chunk size used to copy the shard data from the source shard. By default, it is set to 512KB and is compressed if the indices.recovery.compress property is set to true.
indices.recovery.translog_ops: This defaults to 1000 and specifies how many transaction log lines should be transferred between shards in a single request during the recovery process.
indices.recovery.translog_size: This is the chunk size used to copy the shard transaction log data from the source shard. By default, it is set to 512KB and is compressed if the indices.recovery.compress property is set to true.
In the versions prior to Elasticsearch 0.90.0, there was the indices.recovery.max_size_per_sec property that could be used, but it was deprecated, and it is suggested that you use the indices.recovery.max_bytes_per_sec property instead. However, if you are using an Elasticsearch version older than 0.90.0, it may be worth remembering this.
All the previously mentioned settings can be updated using the Cluster Update API, or they can be set in the elasticsearch.yml file, as shown in the sketch that follows.
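For instance, a throttling-oriented fragment of elasticsearch.yml could look like the following sketch; the concrete values are only illustrative and should be tuned to your own network and hardware:
indices.recovery.concurrent_streams: 5      # more parallel streams, more network pressure
indices.recovery.max_bytes_per_sec: 50mb    # raise the default 20MB recovery throttle
indices.recovery.compress: true             # trade CPU time for less data on the wire
The same properties can also be changed at runtime through the Cluster Update API without restarting the nodes, which is usually more convenient on a live cluster.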
Index-level recovery settings
In addition to the values mentioned previously, there is a single property that can be set on a per-index basis. The property can be set both in the elasticsearch.yml file and using the indices Update Settings API, and it is called index.recovery.initial_shards.
In general, Elasticsearch will only recover a particular shard when there is a quorum of shards present and if that quorum can be allocated. A quorum is 50 percent of the shards for the given index, plus one. By using the index.recovery.initial_shards property, we can change what Elasticsearch will take as a quorum. This property can be set to one of the following values:
quorum: 50 percent of the shards, plus one, need to be present and be allocable. This is the default value.
quorum-1: 50 percent of the shards for a given index need to be present and be allocable.
full: All of the shards for the given index need to be present and be allocable.
full-1: 100 percent minus one shard for the given index need to be present and be allocable.
integer value: Any integer, such as 1, 2, or 5, specifies the number of shards that need to be present and allocable. For example, setting this value to 2 means that at least two shards need to be present and Elasticsearch needs at least two shards to be allocable.
It is good to know about this property, but in most cases, the default value will be sufficient for most deployments.
Summary
In this article, we focused more on the Elasticsearch configuration and the new features that were introduced in Elasticsearch 1.0. We configured discovery and recovery, and we used the human-friendly Cat API. In addition to that, we used the backup and restore functionality, which allowed easy backup and recovery of our indices. Finally, we looked at what federated search is and how to search and index data to multiple clusters, while still using all the functionalities of Elasticsearch and being connected to a single node.
If you want to dig deeper, buy the book Mastering Elasticsearch, Second Edition and, in a simple step-by-step fashion, use Elasticsearch to enhance your knowledge further.
Resources for Article:
Further resources on this subject:
Downloading and Setting Up ElasticSearch [Article]
Indexing the Data [Article]
Driving Visual Analyses with Automobile Data (Python) [Article]