
Factor variables in R

Packt
01 Apr 2015
7 min read
This article by Jaynal Abedin and Kishor Kumar Das, authors of the book Data Manipulation with R, Second Edition, discusses factor variables in R. In any data analysis task, the majority of the time is dedicated to data cleaning and preprocessing. It is often said that about 80 percent of the effort is devoted to data cleaning before conducting the actual analysis. In real-world data, we also frequently work with categorical variables. A variable that takes only a limited number of distinct values is usually known as a categorical variable, and in R it is known as a factor. Working with categorical variables in R is a bit technical, and in this article we have tried to demystify the process of dealing with them.

During data analysis, the factor variable sometimes plays an important role, particularly in studying the relationship between two categorical variables. In this section, we will see some important aspects of factor manipulation. When a factor variable is first created, it stores all its levels along with the factor. If we then take any subset of that factor variable, the subset inherits all the levels of the original factor, not just the levels that actually appear in the subset. This behavior sometimes creates confusion when interpreting results.

Numeric variables are convenient during statistical analysis, but sometimes we need to create categorical (factor) variables from numeric variables. We can create a limited number of categories from a numeric variable using a series of conditional statements, but this is not an efficient way to perform the operation. In R, cut is a generic command for creating factor variables from numeric variables.

The split-apply-combine strategy

Data manipulation is an integral part of data cleaning and analysis. For large data, it is always preferable to perform the operation within subgroups of a dataset to speed up the process. In R, this type of data manipulation can be done with base functionality, but for large-scale data it requires a considerable amount of coding and eventually takes longer to process. In the case of big data, we can split the dataset, perform the manipulation or analysis, and then combine the results back into a single output. Doing this split with base R alone is not efficient, and to overcome this limitation, Wickham developed the plyr package, which efficiently implements the split-apply-combine strategy.

Often, we require similar types of operations in different subgroups of a dataset, such as group-wise summarization, standardization, and statistical modeling. This type of task requires us to break a big problem down into manageable pieces, perform operations on each piece separately, and finally combine the output of each piece into a single output. To understand the split-apply-combine strategy intuitively, we can compare it with the map-reduce strategy for processing large amounts of data, popularized by Google. In map-reduce, the map step corresponds to split and apply, and the reduce step corresponds to combine. The map-reduce approach is primarily designed for highly parallel environments where the work is done independently by hundreds or thousands of computers. The split-apply-combine strategy creates an opportunity to see similarities between problems across subgroups that were not previously connected.
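As a concrete illustration of the strategy, here is a minimal sketch in R of the same group-wise summary done first by hand with base R's split() and sapply(), and then with dplyr's grouped verbs; the built-in iris dataset and the column chosen here are stand-ins for illustration, not an example from the book.

    # Split-apply-combine by hand with base R
    pieces <- split(iris$Sepal.Length, iris$Species)     # split into subgroups
    means  <- sapply(pieces, mean)                       # apply a summary to each piece
    result <- data.frame(Species = names(means),         # combine into one output
                         mean_sepal_length = unname(means))

    # The same operation using dplyr's group_by() and summarise()
    library(dplyr)
    result_dplyr <- iris %>%
      group_by(Species) %>%
      summarise(mean_sepal_length = mean(Sepal.Length))

The three stages are explicit in the base R version; dplyr hides the bookkeeping behind group_by() and summarise().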
This strategy can be used in many existing tools, such as the GROUP BY operation in SAS, PivotTable in MS Excel, and the SQL GROUP BY operator. The plyr package works on every type of data structure, whereas the dplyr package is designed to work only on data frames. The dplyr package offers a complete set of functions to perform every kind of data manipulation we would need in the process of analysis. These functions take a data frame as input and also produce a data frame as output, hence the name dplyr. There are two different types of functions in the dplyr package: single-table functions and aggregate functions. A single-table function takes a data frame as input and performs an action such as subsetting the data frame, generating new columns, or rearranging rows. An aggregate function takes a column as input and produces a single value as output, and is mostly used to summarize columns. These functions do not perform group-wise operations on their own, but combining them with the group_by() function allows us to implement the split-apply-combine approach.

Reshaping a dataset

Reshaping data is a common and tedious task in real-life data manipulation and analysis. A dataset might come with different levels of grouping, and we need to implement some reorientation to perform certain types of analyses. A dataset's layout can be long or wide. In a long layout, multiple rows represent a single subject's record, whereas in a wide layout, a single row represents a single subject's record. Statistical analysis sometimes requires wide data and sometimes long data, so we need to be able to fluidly reshape the data to meet the requirements of the analysis. Data reshaping is just a rearrangement of the form of the data; it does not change the content of the dataset. In this article, we will show different layouts of the same dataset and see how they can be transformed from one layout to another. This article mainly highlights the melt and cast paradigm of reshaping datasets, which is implemented in the contributed reshape package. The same package was later reimplemented under a new name, reshape2, which is much more time- and memory-efficient.

A single dataset can be rearranged in many different ways, but before going into rearrangement, let's look back at how we usually perceive a dataset. Whenever we think about a dataset, we picture a two-dimensional arrangement where a row represents a subject's information (a subject could be a person, typically the respondent in a survey) for all the variables in the dataset, and a column represents the information for one characteristic across all subjects. In other words, rows indicate records and columns indicate variables, characteristics, or attributes. This is the typical layout of a dataset. In this arrangement, one or more variables might play the role of an identifier, and the others are measured characteristics. For the purpose of reshaping, we can group the variables into two groups: identifier variables and measured variables.

- The identifier variables: These help us identify the subject from whom we took information on different characteristics. Typically, identifier variables are qualitative in nature and take a limited number of unique values. In database terminology, an identifier is termed the primary key, and it can be a single variable or a composite of multiple variables.
- The measured variables: These are the characteristics whose information we took from a subject of interest. They can be qualitative, quantitative, or a mixture of both.

Now, beyond this typical structure of a dataset, we can think about it differently, where we have only identification variables and a value. The identification variables identify a subject together with which measured variable the value represents. In this new paradigm, each row represents one observation of one variable. This rearrangement is termed melting, and it produces molten data. The difference between this new layout and the typical layout is that the molten data contains only the ID variables and a new column, value, which holds the value of that observation.

Text processing

Text data is one of the most important areas in the field of data analytics. Nowadays, we produce a huge amount of text data through various media every day; for example, Twitter posts, blog writing, and Facebook posts are all major sources of text data. Text data can be used for information retrieval, sentiment analysis, and even entity recognition.

Summary

This article briefly explained factor variables, the split-apply-combine strategy, reshaping a dataset in R, and text processing.

Further resources on this subject:
- Introduction to S4 Classes
- Warming Up
- Driving Visual Analyses with Automobile Data (Python)

Optimizing JavaScript for iOS Hybrid Apps

Packt
01 Apr 2015
17 min read
In this article by Chad R. Adams, author of the book Mastering JavaScript High Performance, we are going to take a look at the process of optimizing JavaScript for iOS web apps (also known as hybrid apps). We will look at some common ways of debugging and optimizing JavaScript and page performance, both in a device's web browser and in a standalone app's web view. We'll also take a look at the Apple Web Inspector and see how to use it for iOS development. Finally, we will gain a bit of understanding about building a hybrid app and learn about the tools that help us build better JavaScript-focused apps for iOS, including a class that might help us further in this. We are going to cover the following topics in this article:

- Getting ready for iOS development
- iOS hybrid development

Getting ready for iOS development

Before starting this article with Xcode examples and using iOS Simulator, note that I will be displaying some native code and will use tools that haven't been covered in this course. Mobile app development, regardless of platform, is a book in itself. When covering the build of the iOS project, I will briefly go over the process of setting up a project and writing non-JavaScript code to get our JavaScript files into a hybrid iOS WebView for development. This is essential due to the way iOS secures its HTML5-based apps. Apps on iOS that use HTML5 can be debugged, either from a server or from an app directly, as long as the app's project is built and deployed in its debug setting on a host system (meaning the developer's machine).

Readers of this article are not expected to know how to build a native app from beginning to end, and that's completely acceptable, as you can copy and paste and follow along as I go. I will show you the code to get us to the point of testing JavaScript code, and the code used will be the smallest and fastest possible to render your content. All of these code samples will be hosted as an Xcode project solution of some type on Packt Publishing's website, but they will also be shown here if you want to follow along without relying on the code samples. Now, with that said, let's get started.

iOS hybrid development

Xcode is the IDE provided by Apple to develop apps for both iOS devices and desktop devices for Macintosh systems. As a JavaScript editor it has pretty basic functions, so Xcode should mainly be used as an addition to a JavaScript developer's project toolset. It provides basic code hinting for JavaScript, HTML, and CSS, but not much more than that.

To install Xcode, we need to start the installation process from the Mac App Store. Apple, in recent years, has moved its IDE to the Mac App Store for faster updates to developers and, subsequently, app updates for iOS and Mac applications. Installation is easy: simply log in with your Apple ID in the Mac App Store and download Xcode; you can search for it at the top or, if you look in the right rail among popular free downloads, you can find a link to the Xcode Mac App Store page. Once you reach it, click Install.

It's important to know that, for the sake of simplicity in this article, we will not deploy an app to a device, so if you are curious about that, you will need to be actively enrolled in Apple's Developer Program. The cost is 99 dollars a year, or 299 dollars for an enterprise license that allows deployment of an app outside the control of the iOS App Store.
If you're curious to learn more about deploying to a device, the code in this article will run on a device, assuming that your certificates are set up on your end. For more information on this, check out Apple's iOS Developer Center documentation online at https://developer.apple.com/library/ios/documentation/IDEs/Conceptual/AppDistributionGuide/Introduction/Introduction.html#//apple_ref/doc/uid/TP40012582.

Once it's installed, we can open up Xcode and look at iOS Simulator; we can do this by clicking Xcode, followed by Open Developer Tool, and then clicking on iOS Simulator. Upon first opening iOS Simulator, we will see what appears to be a simulation of an iOS device. Note that this is a simulation, not a real iOS device (even if it feels pretty close). A neat trick for JavaScript developers working with local HTML files outside an app is that they can quickly drag and drop an HTML file onto the simulator. When they do, the simulator opens the mobile version of Safari, the built-in browser for iPhones and iPads, and renders the page as it would on an iOS device; this is pretty helpful when testing pages before deploying them to a web server.

Setting up a simple iOS hybrid app

JavaScript performance in a built-in hybrid application can be much slower than the same page run in the mobile version of Safari. To test this, we are going to build a very simple web browser using Apple's new programming language, Swift. Swift is an iOS-ready language that JavaScript developers should feel at home with. Swift follows a syntax similar to JavaScript but, unlike JavaScript, variables and objects can be given types, allowing for stronger, more accurate coding. In that regard, Swift follows a syntax similar to what can be seen in the ECMAScript 6 and TypeScript styles of coding practice. If you are checking those newer languages out, I encourage you to check out Swift as well.

Now let's create a simple web view, also known as a UIWebView, which is the class used to create a web view in an iOS app. First, let's create a new iPhone project; we are using an iPhone to keep our app simple. Open Xcode and select Create a new Xcode project; then select the Single View Application option and click the Next button.

On the next view of the wizard, set the product name to JS_Performance, the language to Swift, and the device to iPhone; the organization name should autofill with your name based on your account name in the OS. The organization identifier is a reverse domain name unique identifier for our app; this can be whatever you deem appropriate. Once your project names are set, click the Next button and save to a folder of your choice with the Git repository option left unchecked.

When that's done, select Main.storyboard under your Project Navigator, which is found in the left panel. We should now be in the storyboard view. Let's open the Object Library, which can be found in the lower-right panel in the subtab with an icon of a square inside a circle. Search for Web View in the Object Library's bottom-right search bar, and then drag the Web View onto the square view that represents our iOS view. We need to consider two more things before we link up an HTML page using Swift; we need to set constraints, as native iOS objects will be stretched to fit various iOS device windows.
To fill the space, you can add the constraints by selecting the UIWebView object and pressing Command + Option + Shift + = on your Mac keyboard. You should then see a blue border appear briefly around your UIWebView.

Lastly, we need to connect our UIWebView to our Swift code; for this, we need to open the Assistant Editor by pressing Command + Option + Return on the keyboard. We should see ViewController.swift open up in a side panel next to our storyboard. To link this as a code variable, right-click (or option-click) the UIWebView object and, with the button held down, drag the UIWebView to line number 12 in the ViewController.swift code in our Assistant Editor. Once that's done, a popup will appear. Leave everything as it comes up, but set the name to webview; this will be the variable referencing our UIWebView. With that done, save your Main.storyboard file and navigate to your ViewController.swift file.

Now take a look at the Swift code shown in the following screenshot and copy it into the project; the important part is on line 19, which contains the filename and type loaded into the web view, which in this case is index.html. Obviously, we don't have an index.html file yet, so let's create one. Go to File, select New, and then the New File option. Next, under iOS, select Empty Application and click Next to complete the wizard. Save the file as index.html and click Create. Now open the index.html file and type the following code into the HTML page:

    <br />Hello <strong>iOS</strong>

Now click Run (the play button in Xcode's toolbar), and we should see our HTML page running inside our own app. That's nice work! We built an iOS app with Swift (even if it's a simple app). Let's create a structured HTML page; we will override our Hello iOS text with the HTML shown in the following screenshot. Here, we use the standard console.time function and print a message to our UIWebView page when finished; if we hit Run in Xcode, we will see the Loop Completed message on load. But how do we get our performance information? How can we see the result of the console.timeEnd function call on line 14 of our HTML page?

Using Safari Web Inspector for JavaScript performance

Apple does provide a Web Inspector for UIWebViews, and it's the same inspector as for desktop Safari. It's easy to use, but it has an issue: the inspector only works on iOS Simulators and devices that have been started from an Xcode project. This limitation is due to security concerns for hybrid apps that may contain sensitive JavaScript code that could be exploited if visible.

Let's check our project's embedded HTML page console. First, open desktop Safari on your Mac and enable developer mode. Launch the Preferences option. Under the Advanced tab, ensure that the Show Develop menu in menu bar option is checked. Next, let's rerun our Xcode project, start up iOS Simulator, and then rerun our page. Once our app is running with the Loop Completed result showing, open desktop Safari and click Develop, then iOS Simulator, followed by index.html. If you look closely, you will see iOS Simulator's UIWebView highlighted in blue when you place the mouse over index.html. Once we release the mouse on index.html, the Safari Web Inspector window appears, featuring our hybrid iOS app's DOM and console information.
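The timed test page itself appears only as a screenshot in the original article. As a rough illustration of the kind of page being measured, a minimal sketch might look like the following; the loop bound, the timer label, and the element id are assumptions for illustration, not the author's exact code.

    <!DOCTYPE html>
    <html>
    <head><title>JS Performance Test</title></head>
    <body>
      <p id="status">Running...</p>
      <script>
        // Start a named timer, run a busy loop, then stop the timer.
        console.time('Timer');
        var total = 0;
        for (var i = 0; i < 100000; i++) {
          total += i;
        }
        console.timeEnd('Timer'); // elapsed time shows up in the console
        // Report completion in the page itself.
        document.getElementById('status').innerHTML = 'Loop Completed';
      </script>
    </body>
    </html>

With the inspector attached, the console.timeEnd call is what produces the Timer entry discussed next.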
Safari's Web Inspector is pretty similar to Chrome's Developer Tools in terms of feature set; the panels used in the Developer Tools also exist as icons in the Web Inspector. Now let's select the Console panel in the Web Inspector. Here, we can see our full console window, including the Timer console.time function test included in the for loop. As we can see in the following screenshot, the loop took 0.081 milliseconds to process inside iOS.

Comparing UIWebView with Mobile Safari

What if we wanted to take our code and move it to Mobile Safari to test? This is easy enough; as mentioned earlier in the article, we can drag and drop the index.html file into our iOS Simulator, and the OS will then open the mobile version of Safari and load the page for us. With that ready, we need to reconnect the Safari Web Inspector to the iOS Simulator and reload the page. Once that's done, we can see that our console.time function is a bit faster; this time it's roughly 0.07 milliseconds, which is a full 0.01 milliseconds faster than in the UIWebView, as shown here. For a small app, this is a minimal difference in performance. But as an application gets larger, the delay in these JavaScript processes gets longer and longer.

We can also debug the app using the debugging inspector in the Safari Web Inspector tool. Click Debugger in the top menu panel of the Web Inspector. We can add a breakpoint to our embedded script by clicking a line number and then refreshing the page with Command + R. In the following screenshot, we can see the break occurring on page load, and we can see our scope variable displayed for reference in the right panel.

We can also check page load times using the timeline inspector. Click Timelines at the top of the Web Inspector and we will see a timeline similar to the Resources tab found in Chrome's Developer Tools. Let's refresh our page with Command + R on our keyboard; the timeline then processes the page. Notice that after a few seconds, the timeline in the Web Inspector stops when the page fully loads and all JavaScript processes stop. This is a nice feature of the Safari Web Inspector compared to Chrome's Developer Tools.

Common ways to improve hybrid performance

With hybrid apps, we can use all the usual techniques for improving performance: using a build system such as Grunt.js or Gulp.js with npm, using JSLint to better optimize our code, and writing code in an IDE to create better structure for our apps and to help check for any excess code or unused variables. We can also use performance best practices such as applying HTML to the page as strings (via the innerHTML property) rather than creating DOM objects and appending them one by one, and so on. Sadly, the fact that hybrid apps do not perform as well as native apps still holds true. Now, don't let that dismay you, as hybrid apps do have a lot of good features!
Some of these are as follows:

- They are (typically) faster to build than using native code
- They are easier to customize
- They allow for rapid prototyping of concepts for apps
- They are easier to hand off to other JavaScript developers than finding a native developer
- They are portable; they can be reused (with some modification) for Android devices, Windows Modern apps, Windows Phone apps, Chrome OS, and even Firefox OS
- They can interact with native code using helper libraries such as Cordova

At some point, however, application performance will be limited by the hardware of the device, and it's recommended you move to native code. But how do we know when to move? Well, this can be done using Color Blended Layers. The Color Blended Layers option applies an overlay that highlights slow-performing areas on the device display, for example, green for good performance and red for slow performance; the darker the color, the greater the performance impact.

Rerun your app using Xcode and, in the Mac OS toolbar for iOS Simulator, select Debug and then Color Blended Layers. Once we do that, we can see that our iOS Simulator shows a green overlay; this shows us how much memory iOS is using to process our rendered view, both native and non-native code, as shown here. Currently, we can see a mostly green overlay, with the exception of the status bar elements, which take up more render memory as they overlay the web view and have to be redrawn over that object repeatedly.

Let's make a copy of our project and call it JS_Performance_CBL, and let's update our index.html code with the code sample shown in the following screenshot. Here, we have a simple page with an empty div; we also have a button with an onclick function called start. Our start function updates the div's height continuously using the setInterval function, increasing the height every millisecond. Our empty div also has a background gradient assigned to it with an inline style tag. CSS background gradients are typically a huge performance drain on mobile devices, as they can potentially re-render themselves over and over as the DOM updates itself. Some other issues include listener events; some earlier or lower-end devices do not have enough RAM to apply an event listener to a page. Typically, it's good practice to apply onclick attributes to HTML either inline or through JavaScript.

Going back to the gradient example, let's run this in iOS Simulator and enable Color Blended Layers after clicking our HTML button to trigger the JavaScript animation. As expected, the div element that we've expanded now has a red overlay, indicating that this is a confirmed performance issue, which is unavoidable. To correct this, we would need to remove the CSS gradient background, and it would show as green again. However, if we had to include a gradient in accordance with a design spec, a native version would be required. When faced with UI issues such as these, it's important to understand tools beyond the normal developer tools and Web Inspectors, and to take advantage of the mobile platform tools that provide better analysis of our code. Now, before we wrap up this article, let's take note of something specific to iOS web views.
The WKWebView framework

At the time of writing, Apple has announced the WebKit framework, a first-party iOS library intended to replace UIWebView with more advanced and better-performing web views; this was done with the intent of replacing apps that rely on HTML5 and JavaScript with better-performing apps as a whole. The WebKit framework, also known in developer circles as WKWebView, is a newer web view that can be added to a project; WKWebView is also the base class name for this framework. The framework includes many features that native iOS developers can take advantage of, including listening for function calls that can trigger native Objective-C or Swift code. For JavaScript developers like us, it includes a faster JavaScript runtime called Nitro, which has been included with Mobile Safari since iOS 6.

Hybrid apps have always run worse than native code. But with the Nitro JavaScript runtime, HTML5 has equal footing with native apps in terms of performance, assuming that our view doesn't consume too much render memory, as shown in our Color Blended Layers example.

WKWebView does have limitations, though: it can only be used in iOS 8 or higher, and it doesn't have built-in Storyboard or XIB support like UIWebView. So using this framework may be an issue if you're new to iOS development. Storyboards are simply XML files coded in a specific way for iOS user interfaces to be rendered, while XIB files are the precursors to Storyboards. XIB files allow for only one view, whereas Storyboards allow multiple views and can link between them too. If you are working on an iOS app, I encourage you to reach out to your iOS developer lead and encourage the use of WKWebView in your projects. For more information, check out Apple's documentation of WKWebView on their developer site at https://developer.apple.com/library/IOs/documentation/WebKit/Reference/WKWebView_Ref/index.html.

Summary

In this article, we learned the basics of creating a hybrid application for iOS using HTML5 and JavaScript, and we learned how to connect the Safari Web Inspector to our HTML page while running an application in iOS Simulator. We also looked at Color Blended Layers for iOS Simulator, and saw how to test for performance from our JavaScript code when it's applied to device-rendering performance issues. Now we are down to the wire. As with all JavaScript web apps before they go live to a production site, we need to smoke-test our JavaScript and web app code and see whether we need to perform any final improvements before deployment.

Further resources on this subject:
- GUI Components in Qt 5
- The architecture of JavaScriptMVC
- JavaScript Promises – Why Should I Care?

An introduction to testing AngularJS directives

Packt
01 Apr 2015
14 min read
In this article by Simon Bailey, the author of AngularJS Testing Cookbook, we will cover the following recipes:

- Starting with testing directives
- Setting up templateUrl
- Searching elements using selectors
- Accessing basic HTML content
- Accessing repeater content

Directives are the cornerstone of AngularJS and can range in complexity, providing the foundation to many aspects of an application. Therefore, directives require comprehensive tests to ensure they are interacting with the DOM as intended. This article will guide you through some of the rudimentary steps required to embark on your journey to test directives. The focal point of many of the recipes revolves around targeting specific HTML elements and how they respond to interaction. You will learn how to test changes on scope based on a range of influences and finally begin addressing testing directives using Protractor.

Starting with testing directives

Testing a directive involves three key steps that we will address in this recipe to serve as a foundation for the duration of this article:

- Create an element.
- Compile the element and link it to a scope object.
- Simulate the scope life cycle.

Getting ready

For this recipe, you simply need a directive that applies a scope value to the element in the DOM. For example:

    angular.module('chapter5', [])
      .directive('writers', function() {
        return {
          restrict: 'E',
          link: function(scope, element) {
            element.text('Graffiti artist: ' + scope.artist);
          }
        };
      });

How to do it…

First, create three variables accessible across all tests: one for the element (var element;), one for scope (var scope;), and one for some dummy data to assign to a scope value (var artist = 'Amara Por Dios';).

Next, ensure that you load your module:

    beforeEach(module('chapter5'));

Create a beforeEach function to inject the necessary dependencies, create a new scope instance, and assign the artist to the scope:

    beforeEach(inject(function($rootScope, $compile) {
      scope = $rootScope.$new();
      scope.artist = artist;
    }));

Next, within the beforeEach function, add the following code to create an Angular element from the directive HTML string:

    element = angular.element('<writers></writers>');

Compile the element, providing our scope object:

    $compile(element)(scope);

Now, call $digest on scope to simulate the scope life cycle:

    scope.$digest();

Finally, to confirm that these steps work as expected, write a simple test that uses the text() method available on the Angular element. The text() method returns the text contents of the element, which we then match against our artist value:

    it('should display correct text in the DOM', function() {
      expect(element.text()).toBe('Graffiti artist: ' + artist);
    });

Here is what your code should look like to run the final test:

    var scope;
    var element;
    var artist;

    beforeEach(module('chapter5'));

    beforeEach(function() {
      artist = 'Amara Por Dios';
    });

    beforeEach(inject(function($rootScope, $compile) {
      scope = $rootScope.$new();
      element = angular.element('<writers></writers>');
      scope.artist = artist;
      $compile(element)(scope);
      scope.$digest();
    }));

    it('should display correct text in the DOM', function() {
      expect(element.text()).toBe('Graffiti artist: ' + artist);
    });

How it works…

In step 4, the directive HTML tag is provided as a string to the angular.element function.
The angular.element function wraps a raw DOM element or an HTML string as a jQuery element if jQuery is available; otherwise, it defaults to Angular's jqLite, which is a subset of jQuery. This wrapper exposes a range of useful jQuery methods for interacting with the element and its content (for a full list of the methods available, visit https://docs.angularjs.org/api/ng/function/angular.element). Next, the element is compiled into a template using the $compile service. The $compile service can compile HTML strings into a template and produces a template function, which can then be used to link the scope and the template together; this is exactly what we did, linking the scope object created earlier. The final step to getting our directive into a testable state is calling $digest to simulate the scope life cycle. This is usually part of the AngularJS life cycle within the browser and therefore needs to be called explicitly in a test environment such as this, as opposed to end-to-end tests using Protractor.

There's more…

One beforeEach() method containing the logic covered in this recipe can be used as a reference to work from for the rest of this article:

    beforeEach(inject(function($rootScope, $compile) {
      // Create scope
      scope = $rootScope.$new();
      // Replace with the appropriate HTML string
      element = angular.element('<deejay></deejay>');
      // Replace with test scope data
      scope.deejay = deejay;
      // Compile
      $compile(element)(scope);
      // Digest
      scope.$digest();
    }));

See also

- The Setting up templateUrl recipe
- The Searching elements using selectors recipe
- The Accessing basic HTML content recipe
- The Accessing repeater content recipe

Setting up templateUrl

It's fairly common to separate the template content into an HTML file that can then be requested on demand when the directive is invoked, using the templateUrl property. However, when testing directives that make use of the templateUrl property, we need to load and preprocess the HTML files into AngularJS templates. Luckily, the AngularJS team preempted our dilemma and provided a solution using Karma and the karma-ng-html2js-preprocessor plugin. This recipe will show you how to use Karma to test a directive that uses the templateUrl property.

Getting ready

For this recipe, you will need to ensure the following:

- You have installed Karma.
- You have installed the karma-ng-html2js-preprocessor plugin by following the instructions at https://github.com/karma-runner/karma-ng-html2js-preprocessor/blob/master/README.md#installation.
- You have configured the karma-ng-html2js-preprocessor plugin by following the instructions at https://github.com/karma-runner/karma-ng-html2js-preprocessor/blob/master/README.md#configuration (a rough configuration sketch follows this list).

Finally, you'll need a directive that loads an HTML file using templateUrl; for this example, we apply a scope value to the element in the DOM.
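The plugin configuration referenced above is only linked to, not shown, in this article. As a rough sketch (the file patterns, frameworks, and plugin list here are assumptions rather than the book's exact settings), the relevant parts of karma.conf.js might look like this:

    // karma.conf.js (excerpt) - an assumed, minimal configuration
    module.exports = function(config) {
      config.set({
        frameworks: ['jasmine'],
        // serve the framework, source, spec, and template files
        files: [
          'node_modules/angular/angular.js',
          'node_modules/angular-mocks/angular-mocks.js',
          'src/**/*.js',
          'test/**/*.spec.js',
          'template.html'
        ],
        // convert HTML templates into AngularJS modules before the tests run
        preprocessors: {
          '**/*.html': ['ng-html2js']
        },
        plugins: [
          'karma-jasmine',
          'karma-ng-html2js-preprocessor'
        ]
      });
    };

With the default options, each template is registered as a module named after its file path, which is why the recipe below can load 'template.html' as a module. With the preprocessor in place, the directive itself can be defined.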
Consider the following example:

    angular.module('chapter5', [])
      .directive('emcees', function() {
        return {
          restrict: 'E',
          templateUrl: 'template.html',
          link: function(scope, element) {
            scope.emcee = scope.emcees[0];
          }
        };
      });

An example template could be as simple as the one we use here (template.html):

    <h1>{{emcee}}</h1>

How to do it…

First, create three variables accessible across all tests: one for the element (var element;), one for the scope (var scope;), and one for some dummy data to assign to a scope value (var emcees = ['Roxanne Shante', 'Mc Lyte'];).

Next, ensure that you load your module:

    beforeEach(module('chapter5'));

We also need to load the actual template. We can do this by simply appending the filename to the beforeEach function we just created:

    beforeEach(module('chapter5', 'template.html'));

Next, create a beforeEach function to inject the necessary dependencies, create a new scope instance, and assign the emcees to the scope:

    beforeEach(inject(function($rootScope, $compile) {
      scope = $rootScope.$new();
      scope.emcees = emcees;
    }));

Within the beforeEach function, add the following code to create an Angular element from the directive HTML string:

    element = angular.element('<emcees></emcees>');

Compile the element, providing our scope object:

    $compile(element)(scope);

Call $digest on scope to simulate the scope life cycle:

    scope.$digest();

Next, create a basic test to establish that the text contained within the h1 tag is what we expect:

    it('should set the scope property id to the correct initial value', function () {});

Now, retrieve a reference to the h1 tag using the find() method on the element, providing the tag name as the selector:

    var h1 = element.find('h1');

Finally, add the expectation that the h1 tag text matches the first emcee from the array we provided:

    expect(h1.text()).toBe(emcees[0]);

You should see the passing test within your console window.

How it works…

The karma-ng-html2js-preprocessor plugin works by converting HTML files into JS strings and generating AngularJS modules, which we load in step 3. Once loaded, AngularJS makes these modules available by putting the HTML files into the $templateCache. There are libraries available to help incorporate this into your project build process, for example using Grunt or Gulp; there is a popular example specifically for Gulp at https://github.com/miickel/gulp-angular-templatecache. Now that the template is available, we can access the HTML content using the compiled element we created in step 5. In this recipe, we access the text content of the element using the find() method. Be aware that if you are using the smaller jqLite subset of jQuery, there are certain limitations compared to the full-blown jQuery version; the find() method in particular is limited to looking up elements by tag name only. To read more about the find() method, visit the jQuery API documentation at http://api.jquery.com/find.

See also

- The Starting with testing directives recipe

Searching elements using selectors

Directives, as you should know, attach special behavior to a DOM element. When AngularJS compiles and returns the element on which the directive is applied, it is wrapped by either jqLite or jQuery. This exposes an API on the element, offering many useful methods to query the element and its contents. In this recipe, you will learn how to use these methods to retrieve elements using selectors.
Getting ready

Follow the logic to define a beforeEach() function with the relevant setup for a directive, as outlined in the Starting with testing directives recipe in this article. For this recipe, you can replicate the template suggested in that recipe's There's more… section. For the purpose of this recipe, I tested against a property on scope named deejay:

    var deejay = {
      name: 'Shortee',
      style: 'turntablism'
    };

You can replace this with whatever code you have within the directive you're testing.

How to do it…

First, create a basic test to establish that the HTML code contained within an h2 tag is as we expected:

    it('should return an element using find()', function () {});

Next, retrieve a reference to the h2 tag using the find() method on the element, providing the tag name as the selector:

    var h2 = element.find('h2');

Finally, we create an expectation that the element is actually defined:

    expect(h2[0]).toBeDefined();

How it works…

In step 2, we use the find() method with the h2 selector, which we then test against in step 3's expectation. Remember, the returned element is wrapped by jqLite or jQuery. Therefore, even if the element is not found, the returned object will still have jQuery-specific properties; this means that we cannot run an expectation on the wrapped element alone being defined. A simple way to determine whether the element itself is indeed defined is to access it via jQuery's internal array of DOM objects, typically the first entry. This is why, in our recipe, we run the expectation against h2[0] as opposed to the wrapped object itself.

There's more…

Here is an example using the querySelector() method. The querySelector() method is available on the actual DOM, so we need to access it on an actual HTML element and not the jQuery-wrapped element. The following code uses a CSS class selector:

    it('should return an element using querySelector and a css selector', function() {
      var elementByClass = element[0].querySelector('.deejay-style');
      expect(elementByClass).toBeDefined();
    });

Here is another example using the querySelector() method with an id selector:

    it('should return an element using querySelector and an id selector', function() {
      var elementById = element[0].querySelector('#deejay_name');
      expect(elementById).toBeDefined();
    });

You can read more about the querySelector() method at https://developer.mozilla.org/en-US/docs/Web/API/document.querySelector.

See also

- The Starting with testing directives recipe
- The Accessing basic HTML content recipe

Accessing basic HTML content

A substantial number of directive tests will involve interacting with the HTML content within the rendered HTML template. This recipe will teach you how to test whether a directive's HTML content is as expected.

Getting ready

Follow the logic to define a beforeEach() function with the relevant setup for a directive, as outlined in the Starting with testing directives recipe in this article. For this recipe, you can replicate the template suggested in that recipe's There's more… section. For the purpose of this recipe, I will test against a property on scope named deejay:

    var deejay = {
      name: 'Shortee',
      style: 'turntablism'
    };

You can replace this with whatever code you have within the directive you're testing.
How to do it…

First, create a basic test to establish that the HTML code contained within an h2 tag is as we expected:

    it('should display correct deejay data in the DOM', function () {});

Next, retrieve a reference to the h2 tag using the find() method on the element, providing the tag name as the selector:

    var h2 = element.find('h2');

Finally, using the html() method on the element returned in step 2, we get the HTML contents within an expectation that the h2 tag's HTML matches our scope's deejay name:

    expect(h2.html()).toBe(deejay.name);

How it works…

We made heavy use of the jQuery (or jqLite) library methods available for our element. In step 2, we use the find() method with the h2 selector. This returns a match for us to further utilize in step 3, in our expectation, where we access the HTML contents of the element using the html() method this time (http://api.jquery.com/html/).

There's more…

We could also run a similar expectation for the text within our h2 element using the text() method (http://api.jquery.com/text/) on the element, for example:

    it('should retrieve text from <h2>', function() {
      var h2 = element.find('h2');
      expect(h2.text()).toBe(deejay.name);
    });

See also

- The Starting with testing directives recipe
- The Searching elements using selectors recipe

Accessing repeater content

AngularJS facilitates generating repeated content with ease using the ngRepeat directive. In this recipe, we'll learn how to access and test repeated content.

Getting ready

Follow the logic to define a beforeEach() function with the relevant setup for a directive, as outlined in the Starting with testing directives recipe in this article. For this recipe, you can replicate the template suggested in that recipe's There's more… section. For the purpose of this recipe, I tested against a property on scope named breakers:

    var breakers = [{
      name: 'China Doll'
    }, {
      name: 'Crazy Legs'
    }, {
      name: 'Frosty Freeze'
    }];

You can replace this with whatever code you have within the directive you're testing.

How to do it…

First, create a basic test to establish that the repeated content is as we expected:

    it('should display the correct breaker name', function () {});

Next, retrieve a reference to the li tags using the find() method on the element, providing the tag name as the selector:

    var list = element.find('li');

Finally, targeting the first element in the list, we retrieve the text content, expecting it to match the first item in the breakers array:

    expect(list.eq(0).text()).toBe('China Doll');

How it works…

In step 2, the find() method using li as the selector returns all the list items. In step 3, using the eq() method (http://api.jquery.com/eq/) on the element returned in step 2, we get the contents at a specific index, zero in this particular case. As the object returned from the eq() method is a jQuery object, we can call the text() method on it, which returns the text content of the element. We can then run an expectation that the first li tag's text matches the first breaker within the scope array.

See also

- The Starting with testing directives recipe
- The Searching elements using selectors recipe
- The Accessing basic HTML content recipe

Summary

In this article, you have learned to focus on testing changes within a directive based on interaction from either UI events or application updates to the model. Directives are one of the important jewels of AngularJS and can range in complexity.
They can provide the foundation to many aspects of the application and therefore require comprehensive tests.

Further resources on this subject:
- The First Step
- AngularJS Performance
- Our App and Tool Stack

WooCommerce Basics

Packt
01 Apr 2015
16 min read
In this article by Patrick Rauland, author of the book WooCommerce Cookbook, we will focus on the following topics:

- Installing WooCommerce
- Installing official WooThemes plugins
- Manually creating WooCommerce pages
- Creating a WooCommerce plugin

A few years ago, building an online store used to be an incredibly complex task. You had to install bulky software onto your own website and pay expensive developers a significant sum of money to customize even the simplest elements of your store. Luckily, nowadays, adding e-commerce functionality to your existing WordPress-powered website can be done by installing a single plugin. In this article, we'll go over the settings that you'll need to configure before launching your online store with WooCommerce. Most of the recipes in this article are simple to execute. We do, however, add a relatively complex recipe near the end of the article to show you how to create a plugin specifically for WooCommerce. If you're going to be customizing WooCommerce with code, it's definitely worth looking at that recipe to learn the best way to customize WooCommerce without affecting other parts of your site.

The recipes in this article form the very basics of setting up a store, installing plugins that enhance WooCommerce, and managing those plugins. There are recipes for official WooCommerce plugins written by WooThemes as well as a recipe for unofficial plugins. Feel free to select either one. In general, the official plugins are better supported, more up to date, and have more functionality than unofficial plugins. You could always try an unofficial plugin to see whether it meets your needs, and if it doesn't, then use an official plugin that is much more likely to meet your needs. At the end of this article, your store will be fully functional and ready to display products.

Installing WooCommerce

WooCommerce is a WordPress plugin, which means that you need to have WordPress running on your own server to add WooCommerce. The first step is to install WooCommerce. You could do this on an established website or a brand new website; it doesn't matter. Since e-commerce is more complex than your average plugin, there's more to the installation process than just installing the plugin.

Getting ready

Make sure you have the permissions necessary to install plugins on your WordPress site. The easiest way to have the correct permissions is to make sure your account on your WordPress site has the admin role.

How to do it…

There are two parts to this recipe. The first part is installing the plugin and the second part is adding the required pages to the site. Let's have a look at the following steps for further clarity:

- Log in to your WordPress site.
- Click on the Plugins menu.
- Click on the Add New menu item.
- Search for WooCommerce.
- Click on the Install Now button.
- Once the plugin has been installed, click on the Activate Plugin button.

You now have WooCommerce activated on your site, which means we're halfway there. E-commerce platforms need to have certain pages (such as a cart page, a checkout page, an account page, and so on) to function, so we need to add those to your site. Click on the Install WooCommerce Pages button, which appears after you've activated WooCommerce.
How it works…

WordPress has an infrastructure that allows any WordPress site to install a plugin hosted on WordPress.org. This is a secure process that is managed by WordPress.org. Installing the WooCommerce pages allows all of the e-commerce functionality to run. Without installing the pages, WooCommerce won't know which page is the cart page or the checkout page. Once these pages are set up, we're ready to have a basic store up and running.

If WordPress prompts you for FTP credentials when installing the plugin, that's likely to be a permissions issue with your web host. It is a huge pain if you have to enter FTP credentials every time you want to install or update a plugin, and it's something you should take care of. You can send this link to your web host provider so they know how to change their permissions: refer to http://www.chrisabernethy.com/why-wordpress-asks-connection-info/ for more information on resolving this WordPress issue.

Installing official WooThemes plugins

WooThemes doesn't just create the WooCommerce plugin. They also create standalone plugins and hundreds of extensions that add extra functionality to WooCommerce. The beauty of this system is that WooCommerce is very easy to use because users only add extra complexity when they need it. If you only need simple shipping options, you don't ever have to see the complex shipping settings. On the WooThemes website, you may browse for WooCommerce extensions, purchase them, and download and install them on your site. WooThemes has made the whole process very easy to maintain. They have built an updater similar to the one in WordPress which, once configured, allows a user to update a plugin with one click instead of having to go through the whole plugin upload process again.

Getting ready

Make sure you have the necessary permissions to install plugins on your WordPress site. You also need to have a WooThemes product. There are several free WooThemes products, including Pay with Amazon, which you can find at http://www.woothemes.com/products/pay-with-amazon/.

How to do it…

There are two parts to this recipe. The first part is installing the plugin and the second part is adding your license for future updates. Follow these steps:

- Log in to http://www.woothemes.com.
- Click on the Downloads menu.
- Find the product you wish to download and click on the Download link for the product. You will see that you get a ZIP file.
- On your WordPress site, go to the Plugins menu and click on Add New.
- Click on Upload Plugin.
- Select the file you just downloaded and click on the Install Now button.
- After the plugin has finished installing, click on the Activate Plugin link.

You now have WooCommerce as well as a WooCommerce extension activated on your site. They're both functioning and will continue to function. You will, however, want to perform a few more steps to make sure it's easy to update your extensions:

- Once you have an extension activated on your site, you'll see a link in the WordPress admin: Install the WooThemes Updater plugin. Click on that link.
- The updater will be installed automatically. Once it is installed, you need to activate the updater.
- After activation, you'll see a new link in the WordPress admin: activate your product licenses. Click that link to go straight to the page where you can enter your licenses. You could also navigate to that page manually by going to Dashboard | WooThemes Helper from the menu.
Keep your WordPress site open in one tab and log back in to your WooThemes account in another browser tab. On the WooThemes browser tab, go to My Licenses and you'll see a list of your products with a license key under the heading KEY. Copy the key, go back to your WordPress site, and enter it in the Licenses field. Click on the Activate Products button at the bottom of the page. The activation process can take a few seconds to complete. If you've successfully entered your key, you should see a message at the top of the screen saying so.

How it works…

A plugin that's not hosted on WordPress.org can't update without someone manually reuploading it. The WooThemes updater was built to make this process easier, so you can press the update button and have your website do all the heavy lifting. Some websites sell official WooCommerce plugins without a license key. These sales aren't licensed, and you won't be getting updates, bug fixes, or access to the support desk. With a regular website, it's important to stay up to date. With e-commerce, it's even more important, since you'll be handling very sensitive payment information. That's why I wouldn't ever recommend using a plugin that can't update.

Manually creating WooCommerce pages

Every e-commerce platform needs some way of creating the extra pages required for e-commerce functionality, such as a cart page, a checkout page, an account page, and so on. WooCommerce prompts you to create these pages when you first install the plugin, so if you installed it correctly, you shouldn't have to do this. But if you were trying multiple e-commerce systems and for some reason deleted some pages, you may have to recreate those pages.

How to do it…

There's a very useful Tools menu in WooCommerce. It's a bit hard to find, since you won't be needing it every day, but it has some pretty useful tools if you ever need to do some troubleshooting. One of these tools allows you to recreate your WooCommerce pages. Let's have a look at how to use it:

- Log in to the WordPress admin.
- Click on WooCommerce | System Status.
- Click on Tools.
- Click on the Install Pages button.

How it works…

WooCommerce keeps track of which pages run e-commerce functionality. When you click on the Install Pages button, it checks which pages exist, and if they don't exist, it automatically creates them for you. You could create them yourself by creating new WordPress pages and then manually assigning each page specific e-commerce functionality. You may want to do this if you already have a cart page and don't want to recreate a new cart page, but just copy the content from the old page to the new page; all you want to do is tell WooCommerce which page should have the cart functionality. Let's have a look at the manual settings:

- The Cart, Checkout, and Terms & Conditions pages can all be set by going to WooCommerce | Settings | Checkout.
- The My Account page can be set by going to WooCommerce | Settings | Accounts.

There's more…

You can manually set some pages, such as the Cart and Checkout pages, but you can't set subpages. WooCommerce uses a WordPress functionality called endpoints to create these subpages. Pages such as the Order Received page, which is displayed right after payment, can't be manually created. These endpoints are created on the fly based on the parent page. The Order Received page is part of the checkout process, so it's based on the Checkout page.
Any content on the Checkout page will appear on both the Checkout page and the Order Received page. You can't add content to the parent page without it affecting the subpage, but you can change the subpage URLs. The checkout endpoints can be configured by going to WooCommerce | Settings | Checkout | Checkout Endpoints.

Creating a WooCommerce plugin

Unlike a lot of hosted e-commerce solutions, WooCommerce is entirely customizable. That's one of the huge advantages of building on open source software: if you don't like something, you can change it. At some point, you'll probably want to change something that's not on a settings page, and that's when you may want to dig into the code. Even if you don't know how to code, you may want to look this over so that when you work with a developer, you'll know whether they're doing it the right way.

Getting ready

In addition to having admin access to a WordPress site, you'll also need FTP credentials so you can upload a plugin. You'll also need a text editor. Popular code editors include Sublime Text, Coda, Dreamweaver, and Atom. I personally use Atom. You could also use Notepad on a Windows machine or TextEdit on a Mac in a pinch.

How to do it…

We're going to create a plugin that interacts with WooCommerce. It will take existing WooCommerce functionality and change it; these are the WooCommerce basics. If you build a plugin like this correctly, then when WooCommerce isn't active it won't do anything at all and won't slow down your website. Let's create the plugin by performing the following steps.

Open your text editor and create a new file. Save the file as woocommerce-demo-plugin.php. In that file, add the opening PHP tag, which looks like this: <?php.

On the next line, add a plugin header. This allows WordPress to recognize the file as a plugin so that it can be activated. It looks something like the following:

    /**
     * Plugin Name: WooCommerce Demo Plugin
     * Plugin URI: https://gist.github.com/BFTrick/3ab411e7cec43eff9769
     * Description: A WooCommerce demo plugin
     * Author: Patrick Rauland
     * Author URI: http://speakinginbytes.com/
     * Version: 1.0
     *
     * This program is free software: you can redistribute it and/or modify
     * it under the terms of the GNU General Public License as published by
     * the Free Software Foundation, either version 3 of the License, or
     * (at your option) any later version.
     *
     * This program is distributed in the hope that it will be useful,
     * but WITHOUT ANY WARRANTY; without even the implied warranty of
     * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
     * GNU General Public License for more details.
     *
     * You should have received a copy of the GNU General Public License
     * along with this program. If not, see <http://www.gnu.org/licenses/>.
     */

Now that WordPress knows that your file is a plugin, it's time to add some functionality. The first thing a good developer does is make sure their plugin won't conflict with another plugin. To do that, we make sure no existing class has the same name as our class. I'll be using the WC_Demo_Plugin class, but you can use any class name you want. Add the following code beneath the plugin header:

    if ( class_exists( 'WC_Demo_Plugin' ) ) {
        return;
    }

    class WC_Demo_Plugin {

    }

Our class doesn't do anything yet, but at least we've written it in such a way that it won't break another plugin. There's another good practice we should add to our plugin before we add the functionality, and that's some logic to make sure another plugin won't misuse our plugin.
In the vast majority of use cases, you want to make sure there can't be two instances of your code running. In computer science, this is called the Singleton pattern. It can be enforced by tracking the instance of the plugin in a variable. Right after the class WC_Demo_Plugin { line, add the following:

protected static $instance = null;

/**
 * Return an instance of this class.
 *
 * @return object A single instance of this class.
 * @since 1.0
 */
public static function get_instance() {
    // If the single instance hasn't been set, set it now.
    if ( null == self::$instance ) {
        self::$instance = new self;
    }

    return self::$instance;
}

And get the plugin started by adding this at the end of the file, after the closing brace of the class:

add_action( 'plugins_loaded', array( 'WC_Demo_Plugin', 'get_instance' ), 0 );

At this point, we've made sure our plugin doesn't break other plugins, and we've also dummy-proofed our own plugin so that we or other developers don't misuse it. Let's add just a bit more logic so that our code doesn't run unless WooCommerce is already loaded. This will make sure that we don't accidentally break something if we turn WooCommerce off temporarily. Right after the protected static $instance = null; line, add the following:

/**
 * Initialize the plugin.
 *
 * @since 1.0
 */
private function __construct() {
    if ( class_exists( 'WooCommerce' ) ) {

    }
}

And now our plugin only runs when WooCommerce is loaded. I'm guessing that at this point, you finally want it to do something, right? After we make sure WooCommerce is running, let's add some functionality. Right after the if ( class_exists( 'WooCommerce' ) ) { line, add the following code so that we register an admin notice:

// print an admin notice to the screen.
add_action( 'admin_notices', array( $this, 'display_admin_notice' ) );

This code will call a method named display_admin_notice, but we haven't written that yet, so it's not doing anything. Let's write that method. Have a look at the __construct method, which should now look like this:

/**
 * Initialize the plugin.
 *
 * @since 1.0
 */
private function __construct() {
    if ( class_exists( 'WooCommerce' ) ) {

        // print an admin notice to the screen.
        add_action( 'admin_notices', array( $this, 'display_admin_notice' ) );

    }
}

Add the following after the preceding __construct method:

/**
 * Print an admin notice
 *
 * @since 1.0
 */
public function display_admin_notice() {
    ?>
    <div class="updated">
        <p><?php _e( 'The WooCommerce dummy plugin notice.', 'woocommerce-demo-plugin' ); ?></p>
    </div>
    <?php
}

This will print an admin notice on every single admin page, styled just like the standard messages you typically see in the WordPress admin. You could replace this admin notice method with just about any other hook in WooCommerce to provide additional customizations in other areas of WooCommerce, whether it be shipping, the product page, the checkout process, or any other area. This plugin is the easiest way to get started with WooCommerce customizations. If you'd like to see the full code sample, you can see it at https://gist.github.com/BFTrick/3ab411e7cec43eff9769.

Now that the plugin is complete, you need to upload it to your plugins folder. You can do this via the WordPress admin or, more commonly, via FTP. Once the plugin has been uploaded to your site, you'll need to activate it just like any other WordPress plugin. The end result is a notice in the WordPress admin letting us know we did everything successfully.
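For reference, here is a minimal consolidated sketch of the whole woocommerce-demo-plugin.php file as assembled in the preceding steps. It is only a sketch: the plugin header is shortened and the GPL license text is omitted for brevity, so use the full gist linked above as the canonical version.

<?php
/**
 * Plugin Name: WooCommerce Demo Plugin
 * Description: A WooCommerce demo plugin
 * Version: 1.0
 */

// Bail out if another plugin already defines this class.
if ( class_exists( 'WC_Demo_Plugin' ) ) {
    return;
}

class WC_Demo_Plugin {

    protected static $instance = null;

    // Register hooks only when WooCommerce is active.
    private function __construct() {
        if ( class_exists( 'WooCommerce' ) ) {
            add_action( 'admin_notices', array( $this, 'display_admin_notice' ) );
        }
    }

    // Return the single instance of this class.
    public static function get_instance() {
        if ( null == self::$instance ) {
            self::$instance = new self;
        }
        return self::$instance;
    }

    // Print a notice on every admin page.
    public function display_admin_notice() {
        ?>
        <div class="updated">
            <p><?php _e( 'The WooCommerce dummy plugin notice.', 'woocommerce-demo-plugin' ); ?></p>
        </div>
        <?php
    }
}

add_action( 'plugins_loaded', array( 'WC_Demo_Plugin', 'get_instance' ), 0 );

Note that if WooCommerce is deactivated, the constructor simply registers no hooks, so the plugin stays dormant without affecting the rest of the site.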
Whenever possible, use object-oriented code. That means using objects (like the WC_Demo_Plugin class) to encapsulate your code. It will prevent a lot of naming conflicts down the road. If you see some procedural code online, you can usually convert it to object-oriented code pretty easily. Summary In this article, you have learned the basic steps in installing WooCommerce, installing WooThemes plugins, manually creating WooCommerce pages, and creating a WooCommerce plugin. Resources for Article: Further resources on this subject: Creating Blog Content in WordPress [article] Tips and Tricks [article] Setting Up WooCommerce [article]

article-image-creating-games-cocos2d-x-easy-and-100-percent-free-0

Creating Games with Cocos2d-x is Easy and 100 percent Free

Packt
01 Apr 2015
5 min read
In this article by Raydelto Hernandez, the author of the book Building Android games with Cocos2d-x, we will talk about the Cocos2d-x game engine, which is widely used to create Android games. The launch of the Apple App Store back in 2008 leveraged the reach capacity of indie game developers who since its occurrence are able to reach millions of users and compete with large companies, outperforming them in some situations. This reality led the trend of creating reusable game engines, such as Cocos2d-iPhone, which is written natively using Objective-C by the Argentine iPhone developer, Ricardo Quesada. Cocos2d-iPhone allowed many independent developers to reach the top charts of downloads. (For more resources related to this topic, see here.) Picking an existing game engine is a smart choice for indies and large companies since it allows them to focus on the game logic rather than rewriting core features over and over again. Thus, there are many game engines out there with all kinds of licenses and characteristics. The most popular game engines for mobile systems right now are Unity, Marmalade, and Cocos2d-x; the three of them have the capabilities to create 2D and 3D games. Determining which one is the best in terms of ease of use and availability of tools may be debatable, but there is one objective fact, which we can mention that could be easily verified. Among these three engines, Cocos2d-x is the only one that you can use for free no matter how much money you make using it. We highlighted in this article's title that Cocos2d-x is completely free. This was emphasized because the other two frameworks also allow some free usage; nevertheless, both of these at some point require a payment for the usage license. In order to understand why Cocos2d-x is still free and open source, we need to understand how this tool was born. Ricardo, an enthusiastic Python programmer, often participated in game creation challenges that required participants to develop games from scratch within a week. Back in those days, Ricardo and his team rewrote the core engine for each game until they came up with the idea of creating a framework to encapsulate core game capabilities. These capabilities could be used on any two-dimensional game to make it open source, so contributions could be received worldwide. This is why Cocos2d was originally written for fun. With the launch of the first iPhone in 2007, Ricardo led the development of the port of the Cocos2d Python framework to the iPhone platform using its native language, Objective-C. Cocos2d-iPhone quickly became popular among indie game developers, some of them turning into Appillionaires, as Chris Stevens called these individuals and enterprises that made millions of dollars during the App Store bubble period. This phenomenon made game development companies look at this framework created by hobbyists as a tool to create their products. Zynga was one of the first big companies to adopt Cocos2d as their framework to deliver their famous Farmville game to iPhone in 2009. This company has been trading on NASDAQ since 2011 and has more than 2,000 employees. In July 2010, a C++ port of the Cocos2d iPhone called Cocos2d-x, was written in China with the objective of taking the power of this framework to other platforms, such as the Android operating system, which by that time was gaining market share at a spectacular rate. 
In 2011, this Cocos2d port was acquired by Chukong Technologies, the third largest mobile game development company in China, which later hired the original Cocos2d-iPhone author to join its team. Today, Cocos2d-x-based games dominate the top-grossing charts of Google Play and the App Store, especially in Asia. Recognized companies and leading studios, such as Konami, Zynga, Bandai Namco, Wooga, Disney Mobile, and Square Enix, are using Cocos2d-x in their games. Currently, there are 400,000 developers working on adding new functionality and making this framework as stable as possible. These include engineers from Google, ARM, Intel, BlackBerry, and Microsoft, who officially support the ports to their products, such as Windows Phone, Windows, and the Windows Metro interface, and they plan to support Cocos2d-x on the Xbox this year.

Cocos2d-x is a very straightforward engine that requires only a little learning to grasp. I teach game development courses at many universities using this framework; during the first week, the students are capable of creating a game with the complexity of the famous title Doodle Jump. This is easily achieved because the framework provides all the individual components required for a game, such as physics, audio handling, collision detection, animation, networking, data storage, user input, map rendering, scene transitions, 3D rendering, particle system rendering, font handling, menu creation, form display, thread handling, and so on. This abstracts us from the low-level logic and allows us to focus on the game logic.

Summary
In conclusion, if you want to learn how to develop games for mobile platforms, I strongly recommend that you learn and use the Cocos2d-x framework because it is easy to use, totally free, and open source. This means that you can understand it better by reading its source, you can modify it if needed, and you have the guarantee that you will never be forced to pay a license fee if your game becomes a hit. Another big advantage of this framework is its readily available documentation, including the Packt Publishing collection of Cocos2d-x game development books.

Resources for Article: Further resources on this subject: Moving the Space Pod Using Touch [article] Why should I make cross-platform games? [article] Animations in Cocos2d-x [article]

article-image-testing-our-application-ios-device

Testing our application on an iOS device

Packt
01 Apr 2015
10 min read
In this article by Michelle M. Fernandez, author of the book Corona SDK Mobile Game Development Beginner's Guide, we will upload our first Hello World application to an iOS device. Before we can do that, we need to log in to our Apple developer account so that we can create and install our signing certificates on our development machine. If you haven't created a developer account yet, do so by going to http://developer.apple.com/programs/ios/. Remember that there is a fee of $99 a year to become an Apple developer.

(For more resources related to this topic, see here.)

The Apple developer account applies only to users developing on Mac OS X. Make sure that your version of Xcode is the same as or newer than the version of the OS on your phone. For example, if you have version 5.0 of the iPhone OS installed, you will need the Xcode that is bundled with the iOS SDK version 5.0 or later.

Time for action – obtaining the iOS developer certificate
Make sure that you're signed up for the developer program; you will need to use the Keychain Access tool located in /Applications/Utilities so that you can create a certificate request. All iOS applications must be signed by a valid certificate before they can be run on an Apple device for any kind of testing. The following steps will show you how to create an iOS developer certificate:
Go to Keychain Access | Certificate Assistant | Request a Certificate From a Certificate Authority.
In the User Email Address field, type in the e-mail address you used when you registered as an iOS developer. For Common Name, enter your name or team name. Make sure that the name entered matches the information that was submitted when you registered as an iOS developer. The CA Email Address field does not need to be filled in, so you can leave it blank. We are not e-mailing the certificate to a Certificate Authority (CA). Check Saved to disk and Let me specify key pair information. When you click on Continue, you will be asked to choose a save location. Save your file at a destination where you can locate it easily, such as your desktop.
In the following window, make sure that 2048 bits is selected for the Key Size and RSA for the Algorithm, and then click on Continue. This will generate the key and save it to the location you specified. Click on Done in the next window.
Next, go to the Apple developer website at http://developer.apple.com/, click on iOS Dev Center, and log in to your developer account. Select Certificates, Identifiers & Profiles under iOS Developer Program on the right-hand side of the screen and navigate to Certificates under iOS Apps.
Select the + icon on the right-hand side of the page. Under Development, click on the iOS App Development radio button. Click on the Continue button till you reach the screen to generate your certificate.
Click on the Choose File button, locate the certificate request file that you saved to your desktop, and then click on the Generate button.
Upon hitting Generate, you will get an e-mail notification at the address you specified in the CA request form from Keychain Access, or you can download the certificate directly from the developer portal. The person who created the certificate will get this e-mail and can approve the request by hitting the Approve button.
Click on the Download button and save the certificate to a location that is easy to find. Once this is completed, double-click on the file, and the certificate will be added automatically to Keychain Access.

What just happened?
We now have a valid certificate for iOS devices.
The iOS Development Certificate is used for development purposes only and valid for about a year. The key pair is made up of your public and private keys. The private key is what allows Xcode to sign iOS applications. Private keys are available only to the key pair creator and are stored in the system keychain of the creator's machine. Adding iOS devices You are allowed to assign up to 100 devices for development and testing purposes in the iPhone Developer Program. To register a device, you will need the Unique Device Identification (UDID) number. You can find this in iTunes and Xcode. Xcode To find out your device's UDID, connect your device to your Mac and open Xcode. In Xcode, navigate to the menu bar, select Window, and then click on Organizer. The 40 hex character string in the Identifier field is your device's UDID. Once the Organizer window is open, you should see the name of your device in the Devices list on the left-hand side. Click on it and select the identifier with your mouse, copying it to the clipboard. Usually, when you connect a device to Organizer for the first time, you'll receive a button notification that says Use for Development. Select it and Xcode will do most of the provisioning work for your device in the iOS Provisioning Portal. iTunes With your device connected, open iTunes and click on your device in the device list. Select the Summary tab. Click on the Serial Number label to show the Identifier field and the 40-character UDID. Press Command + C to copy the UDID to your clipboard. Time for action – adding/registering your iOS device To add a device to use for development/testing, perform the following steps: Select Devices in the Developer Portal and click on the + icon to register a new device. Select the Register Device radio button to register one device. Create a name for your device in the Name field and put your device's UDID in the UDID field by pressing Command + V to paste the number you have saved on the clipboard. Click on Continue when you are done and click on Register once you have verified the device information. Time for action – creating an App ID Now that you have added a device to the portal, you will need to create an App ID. An App ID has a unique 10-character Apple ID Prefix generated by Apple and an Apple ID Suffix that is created by the Team Admin in the Provisioning Portal. An App ID could looks like this: 7R456G1254.com.companyname.YourApplication. To create a new App ID, use these steps: Click on App IDs in the Identifiers section of the portal and select the + icon. Fill out the App ID Description field with the name of your application. You are already assigned an Apple ID Prefix (also known as a Team ID). In the App ID Suffix field, specify a unique identifier for your app. It is up to you how you want to identify your app, but it is recommended that you use the reverse-domain style string, that is, com.domainname.appname. Click on Continue and then on Submit to create your App ID. You can create a wildcard character in the bundle identifier that you can share among a suite of applications using the same Keychain access. To do this, simply create a single App ID with an asterisk (*) at the end. You would place this in the field for the bundle identifier either by itself or at the end of your string, for example, com.domainname.*. More information on this topic can be found in the App IDs section of the iOS Provisioning Portal at https://developer.apple.com/ios/manage/bundles/howto.action. What just happened? 
Every UDID is unique to its device, and we can locate it in Xcode and iTunes. When we added a device in the iOS Provisioning Portal, we took the UDID, which consists of 40 hex characters, and made sure we created a device name so that we could identify what we're using for development.

We now have an App ID for the applications we want to install on a device. An App ID is a unique identifier that iOS uses to allow your application to connect to the Apple Push Notification service, share keychain data between applications, and communicate with external hardware accessories you wish to pair your iOS application with.

Provisioning profiles
A provisioning profile is a collection of digital entities that uniquely ties apps and devices to an authorized iOS Development Team and enables a device to be used to test a particular app. Provisioning profiles define the relationship between apps, devices, and development teams. They need to be defined for both the development and distribution aspects of an app.

Time for action – creating a provisioning profile
To create a provisioning profile, go to the Provisioning Profiles section of the Developer Portal and click on the + icon. Perform the following steps:
Select the iOS App Development radio button under the Development section and then select Continue.
Select the App ID you created for your application in the pull-down menu and click on Continue.
Select the certificate you wish to include in the provisioning profile and then click on Continue.
Select the devices you wish to authorize for this profile and click on Continue.
Create a Profile Name and click on the Generate button when you are done.
Click on the Download button. While the file is downloading, launch Xcode if it's not already open and press Shift + Command + 2 on the keyboard to open Organizer.
Under Library, select the Provisioning Profiles section. Drag your downloaded .mobileprovision file to the Organizer window. This will automatically copy your .mobileprovision file to the proper directory.

What just happened?
Devices that have permission within the provisioning profile can be used for testing as long as the certificates are included in the profile. One device can have multiple provisioning profiles installed.

Application icon
Currently, our app has no icon image to display on the device. By default, if there is no icon image set for the application, you will see a light gray box displayed along with your application name below it once the build has been loaded to your device. So, launch your preferred image editing tool and let's create a simple image.

The application icon for a standard-resolution iPad 2 or iPad mini is a 76 x 76 px PNG file. The image should always be saved as Icon.png and must be located in your current project folder. iPhone/iPod touch devices that support Retina display need an additional high-resolution 120 x 120 px icon, and Retina iPad or iPad mini devices need a 152 x 152 px icon, saved as Icon@2x.png. The contents of your current project folder should look like this:

Hello World/    name of your project folder
Icon.png        required for iPhone/iPod/iPad
Icon@2x.png     required for iPhone/iPod with Retina display
main.lua

In order to distribute your app, the App Store requires a 1024 x 1024 pixel version of the icon. It is best to create your icon at a higher resolution first.
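If you prefer to declare the icon files explicitly, the optional build.settings file in the same project folder can list them. The following is only a minimal sketch assuming the standard Corona SDK build.settings layout of that era; check the documentation for your Corona build to confirm the exact icon sizes and keys your target devices require:

-- build.settings (lives next to main.lua and the Icon*.png files)
settings =
{
    iphone =
    {
        plist =
        {
            -- Icon files bundled with the app; add more sizes as needed.
            CFBundleIconFiles =
            {
                "Icon.png",
                "Icon@2x.png",
            },
        },
    },
}

The file names listed here should match the PNG files in your project folder exactly, since they are referenced by name when the application bundle is built.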
Refer to the Apple iOS Human Interface Guidelines for the latest official App Store requirements at http://developer.apple.com/library/ios/#documentation/userexperience/conceptual/mobilehig/Introduction/Introduction.html. Creating an application icon is a visual representation of your application name. You will be able to view the icon on your device once you compile a build together. The icon is also the image that launches your application. Summary In this article, we covered how to test your app on an iOS device and register your iOS device. Resources for Article: Further resources on this subject: Linking OpenCV to an iOS project [article] Creating a New iOS Social Project [article] Sparrow iOS Game Framework - The Basics of Our Game [article]
article-image-securing-your-elastix-system

Securing your Elastix System

Packt
31 Mar 2015
19 min read
In this article by Gerardo Barajas Puente, author of Elastix Unified Communications Server Cookbook, we will discuss some topics regarding security in our Elastix Unified Communications System. We will share some recommendations to ensure our system's availability, privacy, and correct performance. Attackers' objectives may vary from damaging data, to stealing data, to telephone fraud, to denial of service. These recipes are intended to help minimize any type of attack, but remember that there is no definitive word on security; it is a constantly changing subject with new types of attacks, challenges, and opportunities.

(For more resources related to this topic, see here.)

The recipes covered in this article are as follows:
Using Elastix's embedded firewall
Using the Security Advanced Settings menu to enable security features
Recording and monitoring calls
Recording MeetMe rooms (conference rooms)
Recording queues' calls
Monitoring recordings
Upgrading our Elastix system
Generating system backups
Restoring a backup from one server to another

Using Elastix's embedded firewall
Iptables is one of the most powerful tools of the Linux kernel and is widely used in servers and devices worldwide. Elastix's security module incorporates iptables' main features into its web GUI in order to secure our Unified Communications Server. This module is available in the Security | Firewall menu. On this module's main screen, we can check the status of the firewall (Activated or Deactivated). We will also see the status of each firewall rule, with the following information:
Order: This column represents the order in which rules will be applied
Traffic: The rule will be applied to incoming or outgoing packets
Target: This option allows, rejects, or drops a packet
Interface: This represents the network interface on which the rule will be used
Source Address: The firewall will search for this source IP address and apply the rule
Destination Address: We can apply a firewall rule if the destination address is matched
Protocol: We can apply a rule depending on the IP protocol of the packet (TCP, UDP, ICMP, and so on)
Details: In this column, the details or comments regarding this rule may appear in order to remind us why this rule is being applied

By default, when the firewall is applied, Elastix will allow traffic from any device to the ports that belong to the Unified Communications services. The next image shows the state of the firewall. We can review this information in the Define Ports section, as shown in the next image.

In this section, we can delete, define a new rule (or port), or search for a specific port. If we click on the View link, we will be redirected to the editing page for the selected rule, as shown in the next picture. This is helpful whenever we would like to change the details of a rule.

How to do it…
To add a new rule, click on the Define Port link and add the following information, as shown in the next image:
Name: Name for this port.
Protocol: We can choose the IP protocol to use. The options are as follows: TCP, ICMP, IP, and UDP.
Port: We can enter a single port or a range of ports. To enter a single port, we just enter the port number in the first text field, before the ":" character. If we'd like to enter a range, we must use both text areas: the first one is for the first port of the range, and the second one is for the last port of the range.
Comment: We can enter a comment for this port.

The next image shows the creation of a new port for GSM-Solution.
This solution will use the TCP protocol from port 5000 to 5002. Having our ports defined, we proceed to activate the firewall by clicking on Save. As soon as the firewall service is activated, we will see the status of every rule. A message will be displayed, informing us that the service has been activated. When the service has been started, we will be able to edit, eliminate or change the execution order of a certain rule or rules. To add a new rule, click on the New Rule button (as shown in the next picture) and we will be redirected to a new web page. The information we need to enter is as follows:     Traffic: This option sets the rule for incoming (INPUT), outgoing (OUTPUT), or redirecting (FORWARD) packets.     Interface IN: This is the interface used for the rule. All the available network interfaces will be listed. The options ANY and LOOPBACK are also available     Source Address: We can apply a rule for any specified IP address. For example, we can block all the incoming traffic from the IP address 192.168.1.1. It is important to specify its netmask.     Destination Address: This is the destination IP address for the rule. It is important to specify its netmask.     Protocol: We can choose the protocol we would like to filter or forward. The options are TCP, UDP, ICMP, IP, and STATE.     Source Port: In this section, we can choose any option previously configured in the Port Definition section for the source port.     Destination Port: Here, we can select any option previously configured in the Port Definition section for the source port.     Target: This is the action to perform for any packet that matches any of the conditions set in the previous fields The next image shows the application of a new firewall's rule based on the ports we defined previously: We can also check the user's activity by using the Audit menu. This module can be found in the Security menu. To enhance our system's security we also recommend using Elastix's internal Port Knocking feature. Using the Security Advanced Settings menu to enable security features The Advanced Settings option will allow us to perform the following actions: Enable or disable direct access to FreePBX's webGUI. Enable or disable anonymous SIP calls. Change the database and web administration password for FreePBX. How to do it… Click on the Security | Advanced Settings menu and these options are shown as in the next screenshot. Recording and monitoring calls Whenever we have the need for recording the calls that pass through our system, Elastix, and taking advantage of FreePBX's and Asterisk's features. In this section, we will show the configuration steps to record the following types of calls: Extension's inbound and outbound calls MeetMe rooms (conference rooms) Queues Getting ready... Go to PBX | PBX Configuration | General Settings. In the section called Dialing Options, add the values w and W to the Asterisk Dial command options and the Asterisk Outbound Dial command options. These values will allow the users to start recording after pressing *1. The next screenshot shows this configuration. The next step is to set the options from the Call Recording section as follows: Extension recording override: Disabled. If enabled, this option will ignore all automatic recording settings for all extensions. Call recording format: We can choose the audio format that the recording files will have. We recommend the wav49 format because it is compact and the voice is understandable despite the audio quality. 
Here is a brief description for the audio file format: WAV: This is the most popular good quality recording format, but its size will increase by 1 MB per minute. WAV49: This format results from a GSM codec recording under the WAV encapsulation making the recording file smaller: 100 KB per minute. Its quality is similar to that of a mobile phone call. ULAW/ALAW: This is the native codec (G.711) used between TELCOS and users, but the file size is very large (1 MB per minute). SLN: SLN means SLINEAR format, which is Asterisk's native format. It is an 8-kHz, 16-bit signer linear raw format. GSM: This format is used for recording calls by using the GSM codec. The recording file size will be increased at a rate of 100 KB per minute. Recording location: We leave this option blank. This option specifies the folder where our recordings will be stored. By default, our system is configured to record calls in the /var/spool/asterisk/monitor folder. Run after record: We also leave this option blank. This is for running a script after a recording has been done. For more information about audio formats, visit: http://www.voip-info.org/wiki/view/Convert+WAV+audio+files+for+use+in+Asterisk Apply the changes. All these options are shown in the next screenshot: How to do it… To record all the calls that are generated or received from or to extensions go to the extension's details in the module: PBX | PBX Configuration. We have to click on the desired extension we would like to activate its call recording. In the Recording Options section, we have two options:     Record Incoming     Record Outgoing Depending on the type of recording, select from one of the following options:     On Demand: In this option, the user must press *1 during a call to start recording it. This option only lasts for the current call. When this call is terminated, if the user wants to record another, the digits *1 must be pressed again. If *1 is pressed during a call that is being recorded, the recording will be stopped.     Always: All the calls will be recorded automatically.     Never: This option disables all call recording. These options are shown in the next image. Recording MeetMe rooms If we need to record the calls that go to a conference room, Elastix allows us to do this. This feature is very helpful whenever we need to remember the topics discussed in a conference. How to do it… To record the calls of a conference room, enable it at the conference's details. These details are found in the menu: PBX | PBX Configuration | Conferences. Click on the conference we would like to record and set the Record Conference option to Yes. Save and apply the changes. These steps are shown in the next image. Recording queues' calls Most of the time, the calls that arrive in a queue must be recorded for quality and security purposes. In this recipe, we will show how to enable this feature. How to do it… Go to PBX | PBX Configuration | Queues. Click on a queue to record its calls. Search for the Call Recording option. Select the recording format to use (wav49, wav, gsm). Save and apply the changes. The following image shows the configuration of this feature. Monitoring recordings Now that we know how to record calls, we will show how to retrieve them in order to listen them. How to do it… To visualize the recorded calls, go to PBX | Monitoring. In this menu, we will be able to see the recordings stored in our system. 
The displayed columns are as follows:     Date: Date of call     Time: Time of call     Source: Source of call (may be an internal or external number)     Destination: Destination of call (may be an internal or external number)     Duration: Duration of call     Type: Incoming or outgoing     Message: This column sets the Listen and Download links to enable you to listen or download the recording files. To listen to a recording, just click on the Message link and a new window will popup in your web browser. This window will have the options to playback the selected recording. It is important to enable our web browser to reproduce audio. To download a recording, we click on the Download link. To delete a recording or group of recordings, just select them and click on the Delete button. To search for a recording or set of recordings, we can do it by date, source, destination, or type, by clicking on the Show Filter button. If click on the Download button, we can download the search or report of the recording files in any of the following formats: CSV, Excel, or Text. It is very important to regularly check the Hard Disk status to prevent it from getting full of recording files and therefore have insufficient space to allow the main services work efficiently. Encrypting voice calls In Elastix/Asterisk, the SIP calls can be encrypted in two ways: encrypting the SIP protocol signaling and encrypting the RTP voice flow. To encrypt the SIP protocol signal, we will use the Transport Layer Security (TLS) protocol. How to do it… Create security keys and certificates. For this example, we will store our keys and certificates in the /etc/asterisk/keys folder. To create this folder, enter the mkdir /etc/asterisk/keys command. Change the owner of the folder from the user root to the user asterisk: chown asterisk:asterisk /etc/asterisk/keys Generate the keys and certificates by going to the following folder: cd /usr/share/doc/asterisk-1.8.20.0/contrib/scripts/   ./ast_tls_cert -C 10.20.30.70 -O "Our Company" -d /etc/asterisk/keys Where the options are as follows:     -C is used to set the host (DNS name) or IP address of our Elastix server.     -O is the organizational name or description.     -d is the folder where keys will be stored. Generate a pair of keys for a pair of extensions (extension 7002 and extension 7003, for example):     For extension 7002: ./ast_tls_cert -m client -c /etc/asterisk/keys/ca.crt -k /etc/asterisk/keys/ca.key -C 10.20.31.107 -O "Elastix Company" -d /etc/asterisk/keys -o 7002     And for extension 7003 ./ast_tls_cert -m client -c /etc/asterisk/keys/ca.crt -k /etc/asterisk/keys/ca.key -C 10.20.31.106 -O "Elastix Company" -d /etc/asterisk/keys -o 7003 where:     -m client: This option sets the program to create a client certificate.     -c /etc/asterisk/keys/ca.crt: This option specifies the Certificate Authority to use (our IP-PBX).     -k /etc/asterisk/keys/ca.key: Provides the key file to the *.crt file.     -C: This option defines the hostname or IP address of our SIP device.     -O: This option defines the organizational name (same as above).     -d: This option specifies the directory where the keys and certificates will be stored.     -o: This is the name of the key and certificate we are creating. When creating the client's keys and certificates, we must enter the same password set when creating the server's certificates. Configure the IP-PBX to support TLS by editing the sip_general_custom.conf file located in the /etc/asterisk/ folder. 
Add the following lines: tlsenable=yes tlsbindaddr=0.0.0.0 tlscertfile=/etc/asterisk/keys/asterisk.pem tlscafile=/etc/asterisk/keys/ca.crt tlscipher=ALL tlsclientmethod=tlsv1 tlsdontverifyserver=yes     These lines are in charge of enabling the TLS support in our IP-PBX. They also specify the folder where the certificates and the keys are stored and set the ciphering option and client method to use. Add the line transport=tls to the extensions we would like to use TLS in the sip_custom.conf file located at /etc/asterisk/. This file should look like: [7002](+) encryption=yes transport=tls   [7003](+) encryption=yes transport=tls Reload the SIP module in the Asterisk service. This can be done by using the command: asterisk -rx 'sip reload' Configure our TLS-supporting IP phones. This configuration varies from model to model. It is important to mention that the port used for TLS and SIP is port 5061; therefore, our devices must use TCP/UDP port 5061. After our devices are registered and we can call each other, we can be sure this configuration is working. If we issue the command asterisk -rx 'sip show peer 7003', we will see that the encryption is enabled. At this point, we've just enabled the encryption at the SIP signaling level. With this, we can block any unauthorized user depending on which port the media (voice or/and video) is being transported or steal a username or password or eavesdrop a conversation. Now, we will proceed to enable the audio/video (RTP) encryption. This term is also known as Secure Real Time Protocol (SRTP). To do this, we only enable on the SIP peers the encryption=yes option. The screenshot after this shows an SRTP call between peers 7002 and 7003. This information can be displayed with the command: asterisk -rx 'sip show channel [the SIP channel of our call] The line RTP/SAVP informs us that the call is secure, and the call in the softphone shows an icon with the form of a lock confirming that the call is secure. The following screenshot shows the icon of a lock, informing us that the current call is secured through SRTP: We can have the SRTP enabled without enabling TLS, and we can even activate TLS support between SIP trunks and our Elastix system. There is more… To enable the IAX encryption in our extensions and IAX trunks, add the following line to their configuration file (/etc/asterisk/iax_general_ custom.conf): encryption=aes128 Reload the IAX module with the command: iax2 reload If we would like to see the encryption in action, configure the debug output in the logger.conf file and issue the following CLI commands: CLI> set debug 1 Core debug is at least 1 CLI> iax2 debug IAX2 Debugging Enabled Generating system backups Generating system backups is a very important activity that helps us to restore our system in case of an emergency or failure. The success of our Elastix platform depends on how quickly we can restore our system. In this recipe, we will cover the generation of backups. How to do it… To perform a backup on our Elastix UCS, go to the System | Backup/Restore menu. When entering this module, the first screen that we will see shows all the backup files available and stored in our system, the date they have been created, and the possibility to restore any of them. If we click on any of them, we can download it on to our laptop, tablet, or any device that will allow us to perform a full backup restore, in the event of a disaster. The next screenshot shows the list of backups available on a system. 
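If you prefer to double-check from the command line, the same backup files can be listed directly on the server. This is just a quick sketch assuming the default backup path that Elastix uses (/var/www/backup, the same directory referenced later when copying a backup between servers):

# List the backup files Elastix has generated, newest first
ls -lht /var/www/backup/
# Check how much disk space the backups are consuming
du -sh /var/www/backup/

Keeping an eye on this directory's size is a good habit, since old backups and call recordings are the usual culprits when the disk starts filling up.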
If we select a backup file from the main view, we can delete it by clicking on the Delete button. To create a backup, click on the Perform a Backup button. Select what modules (with their options) will be saved. Click on the Process button to start the backup process on our Elastix box. When done, a message will be displayed informing us that the process has been completed successfully. We can automate this process by clicking on Set Automatic Backup after selecting this option when this process will be started: Daily, Weekly, or Monthly. Restoring a backup from one server to another If we have a backup file, we can copy it to another recently installed Elastix Unified Communications Server, if we'd like to restore it. For example, Server A is a production server, but we'd like to use a brand new server with more resources (Server B). How to do it… After having Elastix installed in Server B, perform a backup, irrespective of whether there is no configuration in it and create a backup in Server A as well. Then, we copy the backup (*.tar file) from Server A to Server B with the console command (being in Server A's console): scp /var/www/backup/back-up-file.tar root@ip-address-of-server-b:/var/www/backup/ Log into Server B's console and change the ownership of the backup file with the command: chown asterisk:asterisk /var/www/backup/back-up-file.tar Restore the copied backup in Server B by using the System | Backup/Restore menu. When this process is being done, Elastix's webGUI will alert us of a restoring process being performed and it will show if there is any software difference between the backup and our current system. We recommend the use of the same Admin and Root passwords and the same telephony hardware in both servers. After this operation is done, we have to make sure that all configurations are working on the new server, before going on production. There is more… If we click on the FTP Backup option, we can drag and drop any selected backup to upload it to a remote FTP server or we can download it locally. We only need to set up the correct data to log us into the remote FTP server. The data to enter are as follows: Server FTP: IP address or domain name of the remote FTP server Port: FTP port User: User Password: Password Path Server FTP: Folder or directory to store the backup The next screenshot shows the FTP-Backup menu and options: Although securing systems is a very important and sometimes difficult area that requires a high level of knowledge, in this article, we discussed the most common but effective tasks that should be done in order to keep your Elastix Unified Communications System healthy and secure. Summary The main objective of this article is to give you all the necessary tools to configure and support an Elastix Unified Communications Server. We will look at these tools through Cookbook recipes, just follow the steps to get an Elastix System up and running. Although a good Linux and Asterisk background is required, this article is structured to help you grow from a beginner to an advanced user. Resources for Article: Further resources on this subject: Lync 2013 Hybrid and Lync Online [article] Creating an Apache JMeter™ test workbench [article] Innovation of Communication and Information Technologies [article]

article-image-hacking-toys-ifttt-and-spark

Hacking toys with IFTTT and Spark

David Resseguie
31 Mar 2015
6 min read
Open up even the simplest of toys and you’ll often be amazed at the number of interesting electronic components inside. This is especially true in many of the otherwise “throw away” toys found in fast food kids’ meals. I’ve tried to make it a habit of salvaging as many parts as possible from such toys so I can use them in future projects. (And I recommend you do the same!) But what if we could use the toy itself as a basis for a new project? In this post, we’ll look at one example of how we can Internet-enable a simple LED lantern toy using a wireless Spark Core device and the powerful IFTTT service. This particular LED lantern is operated by a standard on-off switch, and inside is a single LED, three coin batteries, and a simple switch mechanism for connecting and disconnecting power. Like many fast-food premiums, the lantern uses “tamper proof” triangular screws. If you don’t have the appropriate bit, you can usually make do with a small straight edge screwdriver. In addition to screws, some toys are also glued or sonic welded together, which makes it difficult to open without damaging the plastic beyond repair. Not shown in this photo is a small plastic piece that holds all the components in place. To programmatically control our lantern, we want to remove the batteries and run jumper cables to a pin on our microcontroller instead. Here is an exposed view after also removing the switch mechanism and attaching female-male jumper cables to the positive and negative leads of the LED. The next step is to hook our lantern up to the Spark Core. We choose the Spark Core for this project for two primary reasons. First, the Spark’s size is very conducive to toy hacking, especially for projects where you want to completely embed the electronics inside the finished product. Second, there is already a Spark channel on IFTTT that allows us to remotely trigger actions. More on that later! But before we go too far, let’s test our Spark setup to be sure we can power the LED. Run the jumper cable from the positive lead to pin D0 and the negative lead to GND. Now let’s write a simple Spark application that turns the LED on and off. Using Spark’s Web IDE, flash the following program onto your Spark Core. This will cause the LED to blink on and off in one second intervals. int led = D0; void setup() { pinMode(led, OUTPUT); } void loop() { digitalWrite(led, HIGH); delay(1000); digitalWrite(led, LOW); delay(1000); } But to really make our project useful, we need to hook it up to the Internet and respond to remote triggers for controlling the LED. IFTTT (pronounced like “gift” without the “g”) is a web-based service for connecting a variety of other online services and devices through “recipes”. An IFTT recipe is of the form “If [this] then [that]. The services that can be combined to fill in those blanks are called “channels”. IFTTT has dozens of channels to pick from, including email, SMS, Twitter, etc. But especially important to us: there is a Spark channel that allows Spark devices to serve as both triggers and actuators. For this project, we’ll set up our Spark as an actuator that that turns on the LED when the “if this” condition is met. To trigger our lantern, we could use any number of IFTTT channels, but for simplicity, let’s connect it up to the Yo smartphone app. Yo is a (rather silly) app that just lets you send a “yo” message to friends. The Yo channel for IFTTT allows you to trigger recipes by Yo-ing IFTTT. 
Load the app to your smartphone and add IFTTT as a contact by clicking the + button and typing “IFTTT” in the username field. If you haven’t already done so, create an IFTTT account and go to the “Channels” tab to activate the Yo and Spark channels. In both cases, you’ll have to log in to your respective accounts and authorize IFTTT. The process is straightforward and the IFTTT website walks you through the entire process. Once you’ve done this, you’re ready to create your first recipe. Click the “Create a Recipe” button found on the “My Recipes” tab. IFTTT will walk you through setting up both the trigger and action. For the “if this” condition, select your Yo channel and the “You Yo IFTTT” trigger. For the “then that” action, select the Spark channel and “Publish an event” action. Name the event (I just used “yo”) and select the “private event” option. (It doesn’t matter what you enter as the data field--we’re just going to ignore it anyway.) Name your recipe and click “Create Recipe” to finish the process. Your new recipe will now show up in your personal recipe list. Now we need to modify our Spark code to listen for our “yo” events. Back in the Spark Web IDE, change the code to the following. Now instead of turning the LED on and off in the loop() function, we instead register an event listener using Spark.subscribe() and turn the LED on for five seconds inside the callback function. int led = D0; void setup() { Spark.subscribe("yo", yoHandler, MY_DEVICES); pinMode(led, OUTPUT); } void loop() {} void yoHandler(const char *event, const char *data) { digitalWrite(led, HIGH); delay(5000); digitalWrite(led, LOW); } Once you’ve flashed this update to your Spark, it’s time to test it out! Be sure the Spark is flashing cyan (meaning it has a connection to the Spark cloud) and then use your smartphone to Yo IFTTT. The LED should light up for five seconds, then turn back off and wait again for the next “yo” event. Note that the “yo” events will be broadcast to all your Spark devices if you have more than one, so you could set up multiple hacked toys and send your greetings to several people at once. And if you choose to use public events, you could even trigger events to family and friends around the world.  All that’s left to do is package up the lantern by screwing everything back together. For a more permanent solution, instead of running the wires out to the external Spark, you could carefully fit the Spark and a small LiPo battery inside the lantern as well. I hope this post has inspired you to give new life to broken or disposable toys you have around the house. If you build something really cool, I’d love to see it. Consider sharing your project on the hackster.io Spark community. About the author David Resseguie is a member of the Computational Sciences and Engineering Division at Oak Ridge National Laboratory and lead developer for Sensorpedia. His interests include human computer interaction, Internet of Things, robotics, data visualization, and STEAM education. His current research focus is on applying social computing principles to the design of information sharing systems.

article-image-dealing-legacy-code

Dealing with Legacy Code

Packt
31 Mar 2015
16 min read
In this article by Arun Ravindran, author of the book Django Best Practices and Design Patterns, we will discuss the following topics: Reading a Django code base Discovering relevant documentation Incremental changes versus full rewrites Writing tests before changing code Legacy database integration (For more resources related to this topic, see here.) It sounds exciting when you are asked to join a project. Powerful new tools and cutting-edge technologies might await you. However, quite often, you are asked to work with an existing, possibly ancient, codebase. To be fair, Django has not been around for that long. However, projects written for older versions of Django are sufficiently different to cause concern. Sometimes, having the entire source code and documentation might not be enough. If you are asked to recreate the environment, then you might need to fumble with the OS configuration, database settings, and running services locally or on the network. There are so many pieces to this puzzle that you might wonder how and where to start. Understanding the Django version used in the code is a key piece of information. As Django evolved, everything from the default project structure to the recommended best practices have changed. Therefore, identifying which version of Django was used is a vital piece in understanding it. Change of Guards Sitting patiently on the ridiculously short beanbags in the training room, the SuperBook team waited for Hart. He had convened an emergency go-live meeting. Nobody understood the "emergency" part since go live was at least 3 months away. Madam O rushed in holding a large designer coffee mug in one hand and a bunch of printouts of what looked like project timelines in the other. Without looking up she said, "We are late so I will get straight to the point. In the light of last week's attacks, the board has decided to summarily expedite the SuperBook project and has set the deadline to end of next month. Any questions?" "Yeah," said Brad, "Where is Hart?" Madam O hesitated and replied, "Well, he resigned. Being the head of IT security, he took moral responsibility of the perimeter breach." Steve, evidently shocked, was shaking his head. "I am sorry," she continued, "But I have been assigned to head SuperBook and ensure that we have no roadblocks to meet the new deadline." There was a collective groan. Undeterred, Madam O took one of the sheets and began, "It says here that the Remote Archive module is the most high-priority item in the incomplete status. I believe Evan is working on this." "That's correct," said Evan from the far end of the room. "Nearly there," he smiled at others, as they shifted focus to him. Madam O peered above the rim of her glasses and smiled almost too politely. "Considering that we already have an extremely well-tested and working Archiver in our Sentinel code base, I would recommend that you leverage that instead of creating another redundant system." "But," Steve interrupted, "it is hardly redundant. We can improve over a legacy archiver, can't we?" "If it isn't broken, then don't fix it", replied Madam O tersely. He said, "He is working on it," said Brad almost shouting, "What about all that work he has already finished?" "Evan, how much of the work have you completed so far?" asked O, rather impatiently. "About 12 percent," he replied looking defensive. Everyone looked at him incredulously. "What? That was the hardest 12 percent" he added. O continued the rest of the meeting in the same pattern. 
Everybody's work was reprioritized and shoe-horned to fit the new deadline. As she picked up her papers, readying to leave she paused and removed her glasses. "I know what all of you are thinking... literally. But you need to know that we had no choice about the deadline. All I can tell you now is that the world is counting on you to meet that date, somehow or other." Putting her glasses back on, she left the room. "I am definitely going to bring my tinfoil hat," said Evan loudly to himself. Finding the Django version Ideally, every project will have a requirements.txt or setup.py file at the root directory, and it will have the exact version of Django used for that project. Let's look for a line similar to this: Django==1.5.9 Note that the version number is exactly mentioned (rather than Django>=1.5.9), which is called pinning. Pinning every package is considered a good practice since it reduces surprises and makes your build more deterministic. Unfortunately, there are real-world codebases where the requirements.txt file was not updated or even completely missing. In such cases, you will need to probe for various tell-tale signs to find out the exact version. Activating the virtual environment In most cases, a Django project would be deployed within a virtual environment. Once you locate the virtual environment for the project, you can activate it by jumping to that directory and running the activated script for your OS. For Linux, the command is as follows: $ source venv_path/bin/activate Once the virtual environment is active, start a Python shell and query the Django version as follows: $ python >>> import django >>> print(django.get_version()) 1.5.9 The Django version used in this case is Version 1.5.9. Alternatively, you can run the manage.py script in the project to get a similar output: $ python manage.py --version 1.5.9 However, this option would not be available if the legacy project source snapshot was sent to you in an undeployed form. If the virtual environment (and packages) was also included, then you can easily locate the version number (in the form of a tuple) in the __init__.py file of the Django directory. For example: $ cd envs/foo_env/lib/python2.7/site-packages/django $ cat __init__.py VERSION = (1, 5, 9, 'final', 0) ... If all these methods fail, then you will need to go through the release notes of the past Django versions to determine the identifiable changes (for example, the AUTH_PROFILE_MODULE setting was deprecated since Version 1.5) and match them to your legacy code. Once you pinpoint the correct Django version, then you can move on to analyzing the code. Where are the files? This is not PHP One of the most difficult ideas to get used to, especially if you are from the PHP or ASP.NET world, is that the source files are not located in your web server's document root directory, which is usually named wwwroot or public_html. Additionally, there is no direct relationship between the code's directory structure and the website's URL structure. In fact, you will find that your Django website's source code is stored in an obscure path such as /opt/webapps/my-django-app. Why is this? Among many good reasons, it is often more secure to move your confidential data outside your public webroot. This way, a web crawler would not be able to accidentally stumble into your source code directory. Starting with urls.py Even if you have access to the entire source code of a Django site, figuring out how it works across various apps can be daunting. 
It is often best to start from the root urls.py URLconf file since it is literally a map that ties every request to the respective views. With normal Python programs, I often start reading from the start of its execution—say, from the top-level main module or wherever the __main__ check idiom starts. In the case of Django applications, I usually start with urls.py since it is easier to follow the flow of execution based on various URL patterns a site has. In Linux, you can use the following find command to locate the settings.py file and the corresponding line specifying the root urls.py: $ find . -iname settings.py -exec grep -H 'ROOT_URLCONF' {} ; ./projectname/settings.py:ROOT_URLCONF = 'projectname.urls'   $ ls projectname/urls.py projectname/urls.py Jumping around the code Reading code sometimes feels like browsing the web without the hyperlinks. When you encounter a function or variable defined elsewhere, then you will need to jump to the file that contains that definition. Some IDEs can do this automatically for you as long as you tell it which files to track as part of the project. If you use Emacs or Vim instead, then you can create a TAGS file to quickly navigate between files. Go to the project root and run a tool called Exuberant Ctags as follows: find . -iname "*.py" -print | etags - This creates a file called TAGS that contains the location information, where every syntactic unit such as classes and functions are defined. In Emacs, you can find the definition of the tag, where your cursor (or point as it called in Emacs) is at using the M-. command. While using a tag file is extremely fast for large code bases, it is quite basic and is not aware of a virtual environment (where most definitions might be located). An excellent alternative is to use the elpy package in Emacs. It can be configured to detect a virtual environment. Jumping to a definition of a syntactic element is using the same M-. command. However, the search is not restricted to the tag file. So, you can even jump to a class definition within the Django source code seamlessly. Understanding the code base It is quite rare to find legacy code with good documentation. Even if you do, the documentation might be out of sync with the code in subtle ways that can lead to further issues. Often, the best guide to understand the application's functionality is the executable test cases and the code itself. The official Django documentation has been organized by versions at https://docs.djangoproject.com. On any page, you can quickly switch to the corresponding page in the previous versions of Django with a selector on the bottom right-hand section of the page: In the same way, documentation for any Django package hosted on readthedocs.org can also be traced back to its previous versions. For example, you can select the documentation of django-braces all the way back to v1.0.0 by clicking on the selector on the bottom left-hand section of the page: Creating the big picture Most people find it easier to understand an application if you show them a high-level diagram. While this is ideally created by someone who understands the workings of the application, there are tools that can create very helpful high-level depiction of a Django application. A graphical overview of all models in your apps can be generated by the graph_models management command, which is provided by the django-command-extensions package. 
As shown in the following diagram, the model classes and their relationships can be understood at a glance: Model classes used in the SuperBook project connected by arrows indicating their relationships This visualization is actually created using PyGraphviz. This can get really large for projects of even medium complexity. Hence, it might be easier if the applications are logically grouped and visualized separately. PyGraphviz Installation and Usage If you find the installation of PyGraphviz challenging, then don't worry, you are not alone. Recently, I faced numerous issues while installing on Ubuntu, starting from Python 3 incompatibility to incomplete documentation. To save your time, I have listed the steps that worked for me to reach a working setup. On Ubuntu, you will need the following packages installed to install PyGraphviz: $ sudo apt-get install python3.4-dev graphviz libgraphviz-dev pkg-config Now activate your virtual environment and run pip to install the development version of PyGraphviz directly from GitHub, which supports Python 3: $ pip install git+http://github.com/pygraphviz/pygraphviz.git#egg=pygraphviz Next, install django-extensions and add it to your INSTALLED_APPS. Now, you are all set. Here is a sample usage to create a GraphViz dot file for just two apps and to convert it to a PNG image for viewing: $ python manage.py graph_models app1 app2 > models.dot $ dot -Tpng models.dot -o models.png Incremental change or a full rewrite? Often, you would be handed over legacy code by the application owners in the earnest hope that most of it can be used right away or after a couple of minor tweaks. However, reading and understanding a huge and often outdated code base is not an easy job. Unsurprisingly, most programmers prefer to work on greenfield development. In the best case, the legacy code ought to be easily testable, well documented, and flexible to work in modern environments so that you can start making incremental changes in no time. In the worst case, you might recommend discarding the existing code and go for a full rewrite. Or, as it is commonly decided, the short-term approach would be to keep making incremental changes, and a parallel long-term effort might be underway for a complete reimplementation. A general rule of thumb to follow while taking such decisions is—if the cost of rewriting the application and maintaining the application is lower than the cost of maintaining the old application over time, then it is recommended to go for a rewrite. Care must be taken to account for all the factors, such as time taken to get new programmers up to speed, the cost of maintaining outdated hardware, and so on. Sometimes, the complexity of the application domain becomes a huge barrier against a rewrite, since a lot of knowledge learnt in the process of building the older code gets lost. Often, this dependency on the legacy code is a sign of poor design in the application like failing to externalize the business rules from the application logic. The worst form of a rewrite you can probably undertake is a conversion, or a mechanical translation from one language to another without taking any advantage of the existing best practices. In other words, you lost the opportunity to modernize the code base by removing years of cruft. Code should be seen as a liability not an asset. As counter-intuitive as it might sound, if you can achieve your business goals with a lesser amount of code, you have dramatically increased your productivity. 
Having less code to test, debug, and maintain can not only reduce ongoing costs but also make your organization more agile and flexible to change. Code is a liability, not an asset. Less code is more maintainable. Irrespective of whether you are adding features or trimming your code, you must not touch your working legacy code without tests in place. Write tests before making any changes In the book Working Effectively with Legacy Code, Michael Feathers defines legacy code as, simply, code without tests. He elaborates that with tests, one can easily modify the behavior of the code quickly and verifiably. In the absence of tests, it is impossible to gauge if the change made the code better or worse. Often, we do not know enough about legacy code to confidently write a test. Michael recommends writing tests that preserve and document the existing behavior, which are called characterization tests. Unlike the usual approach of writing tests, while writing a characterization test, you will first write a failing test with a dummy output, say X, because you don't know what to expect. When the test harness fails with an error, such as "Expected output X but got Y", then you will change your test to expect Y. So, now the test will pass, and it becomes a record of the code's existing behavior. Note that we might record buggy behavior as well. After all, this is unfamiliar code. Nevertheless, writing such tests is necessary before we start changing the code. Later, when we know the specifications and code better, we can fix these bugs and update our tests (not necessarily in that order). Step-by-step process to writing tests Writing tests before changing the code is similar to erecting scaffolding before the restoration of an old building. It provides a structural framework that helps you confidently undertake repairs. You might want to approach this process in a stepwise manner as follows: Identify the area you need to make changes to. Write characterization tests focusing on this area until you have satisfactorily captured its behavior. Look at the changes you need to make and write specific test cases for those. Prefer smaller unit tests to larger and slower integration tests. Introduce incremental changes and test in lockstep. If tests break, then try to analyze whether it was expected. Don't be afraid to break even the characterization tests if that behavior is something that was intended to change. If you have a good set of tests around your code, then you can quickly find the effect of changing your code. On the other hand, if you decide to rewrite by discarding your code but not your data, then Django can help you considerably. Legacy databases There is an entire section on legacy databases in the Django documentation, and rightly so, as you will run into them many times. Data is more important than code, and databases are the repositories of data in most enterprises. You can modernize a legacy application written in other languages or frameworks by importing its database structure into Django. As an immediate advantage, you can use the Django admin interface to view and change your legacy data. Django makes this easy with the inspectdb management command, which looks as follows: $ python manage.py inspectdb > models.py This command, if run while your settings are configured to use the legacy database, can automatically generate the Python code that would go into your models file.
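For reference, the generated models typically look something like the following sketch. Every table and field name here is hypothetical, and your output will mirror whatever exists in the legacy schema:

# A hypothetical excerpt of inspectdb output for a legacy "customers" table.
# All names below are illustrative; the real fields come from your database.
from django.db import models


class Customers(models.Model):
    customer_id = models.IntegerField(primary_key=True)
    full_name = models.CharField(max_length=100)
    signup_date = models.DateTimeField(blank=True, null=True)

    class Meta:
        managed = False        # Django will not create, alter, or drop this table
        db_table = 'customers'

Note the managed = False option in the Meta class, which tells Django's migration machinery to leave the existing table untouched.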
Here are some best practices if you are using this approach to integrate to a legacy database: Know the limitations of Django ORM beforehand. Currently, multicolumn (composite) primary keys and NoSQL databases are not supported. Don't forget to manually clean up the generated models, for example, remove the redundant 'ID' fields since Django creates them automatically. Foreign Key relationships may have to be manually defined. In some databases, the auto-generated models will have them as integer fields (suffixed with _id). Organize your models into separate apps. Later, it will be easier to add the views, forms, and tests in the appropriate folders. Remember that running the migrations will create Django's administrative tables (django_* and auth_*) in the legacy database. In an ideal world, your auto-generated models would immediately start working, but in practice, it takes a lot of trial and error. Sometimes, the data type that Django inferred might not match your expectations. In other cases, you might want to add additional meta information such as unique_together to your model. Eventually, you should be able to see all the data that was locked inside that aging PHP application in your familiar Django admin interface. I am sure this will bring a smile to your face. Summary In this article, we looked at various techniques to understand legacy code. Reading code is often an underrated skill. But rather than reinventing the wheel, we need to judiciously reuse good working code whenever possible. Resources for Article: Further resources on this subject: So, what is Django? [article] Adding a developer with Django forms [article] Introduction to Custom Template Filters and Tags [article]

article-image-performing-hand-written-digit-recognition-golearn
Alex Browne
31 Mar 2015
9 min read
Save for later

Performing hand-written digit recognition with GoLearn

Alex Browne
31 Mar 2015
9 min read
In this step-by-step post, you'll learn how to do basic recognition of hand-written digits using GoLearn, a machine learning library for Go. I'll assume you are already comfortable with Go and have a basic understanding of machine learning. To learn Go, I recommend the interactive tutorial. And to learn about machine learning, I recommend Andrew Ng's Machine Learning course on Coursera. All of the code for this tutorial is available on GitHub. Installation & Set Up To follow along with this post, you will need to install: Go version 1.2 or later The GoLearn package Also, make sure that you follow these instructions for setting up your Go work environment. In particular, you will need to have the GOPATH environment variable pointing to a directory where all of your Go code will reside. Project Structure Now is a good time to set up the directory where your code for this project will reside. Somewhere in your $GOPATH/src, create a new directory and call it whatever you want. I recommend $GOPATH/src/github.com/your-github-username/golearn-digit-recognition. Our basic project structure is going to look like this:

golearn-digit-recognition/
    data/
        mnist_train.csv
        mnist_test.csv
    main.go

The data directory is where we'll put our training and test data, and our program is going to consist of a single file: main.go. Getting the Training Data As I mentioned, in this post we're going to be using GoLearn to recognize hand-written digits. The training data we'll use comes from the popular MNIST handwritten digit database. I've already split the data into training and test sets and formatted it in the way GoLearn expects. You can simply download the CSV files and put them in your data directory: Training Data Test Data The data consists of a series of 28x28 pixel grayscale images and labels for the corresponding digit (0-9). 28x28 = 784, so there are 784 features. In the CSV files, the pixels are labeled pixel0-pixel783. Each pixel can take on a value between 0 and 255, where 0 is white and 255 is black. There are 5,000 rows in the training data, and 500 in the test data. Writing the Code Without further ado, let's write a simple program to detect hand-written digits. Open up the main.go file in your favorite text editor and add the following lines:

package main

import (
    "fmt"

    "github.com/sjwhitworth/golearn/base"
)

func main() {
    // Load and parse the data from csv files
    fmt.Println("Loading data...")
    trainData, err := base.ParseCSVToInstances("data/mnist_train.csv", true)
    if err != nil {
        panic(err)
    }
    testData, err := base.ParseCSVToInstances("data/mnist_test.csv", true)
    if err != nil {
        panic(err)
    }
}

The ParseCSVToInstances function reads the CSV file and converts it into "Instances," which is simply a data structure that GoLearn can understand and manipulate. You should run the program with go run main.go to make sure everything works so far. Next, we're going to create a linear Support Vector Classifier, which is a type of Support Vector Machine where the output is the probability that the input belongs to some class. In our case, there are 10 possible classes representing the digits 0 through 9, so our SVC will consist of 10 SVMs, each of which outputs the probability that the input belongs to a certain class. The SVC will then simply output the class with the highest probability. Modify main.go by importing the linear_models package from golearn:

import (
    // ...
    "github.com/sjwhitworth/golearn/linear_models"
)

Then add the following lines:

func main() {
    // ...

    // Create a new linear SVC with some good default values
    classifier, err := linear_models.NewLinearSVC("l1", "l2", true, 1.0, 1e-4)
    if err != nil {
        panic(err)
    }

    // Don't output information on each iteration
    base.Silent()

    // Train the linear SVC
    fmt.Println("Training...")
    classifier.Fit(trainData)
}

You can read more about the different parameters for the SVC here. I found that these parameters give pretty good results. After we've created the classifier, training it is as simple as calling classifier.Fit(). Now might be a good time to run go run main.go again to make sure everything compiles and works as expected. If you want to see some details about what's going on with the classifier, comment out or remove the base.Silent() line. Finally, we can test the accuracy of our SVC by making predictions on the test data and then comparing our predictions to the expected output. GoLearn makes it really easy to do this. Just modify main.go as follows:

package main

import (
    // ...
    "github.com/sjwhitworth/golearn/evaluation"
    // ...
)

func main() {
    // ...

    // Make predictions for the test data
    fmt.Println("Predicting...")
    predictions, err := classifier.Predict(testData)
    if err != nil {
        panic(err)
    }

    // Get a confusion matrix and print out some accuracy stats for our predictions
    confusionMat, err := evaluation.GetConfusionMatrix(testData, predictions)
    if err != nil {
        panic(fmt.Sprintf("Unable to get confusion matrix: %s", err.Error()))
    }
    fmt.Println(evaluation.GetSummary(confusionMat))
}

After making the predictions for our test data, we use the evaluation package to quickly get some stats about the accuracy of our classifier. You should run the program again with go run main.go. If everything works correctly, you should see output that looks something like this:

Loading data...
Training...
Predicting...
Reference Class  True Positives  False Positives  True Negatives  Precision  Recall  F1 Score
---------------  --------------  ---------------  --------------  ---------  ------  --------
6                42              4                447             0.9130     0.8571  0.8842
5                31              15               444             0.6739     0.7561  0.7126
8                37              7                445             0.8409     0.7708  0.8043
7                47              5                440             0.9038     0.8545  0.8785
2                51              6                434             0.8947     0.8500  0.8718
3                35              9                448             0.7955     0.8140  0.8046
1                50              5                443             0.9091     0.9615  0.9346
4                48              4                441             0.9231     0.8727  0.8972
0                41              3                455             0.9318     0.9762  0.9535
9                49              11               434             0.8167     0.8909  0.8522
Overall accuracy: 0.8620

That's about an 86% accuracy. Not too bad! And all it took was a few lines of code! Summary If you want to do even better, try playing around with the parameters for the SVC or use a different classifier. GoLearn has support for linear and logistic regression, K nearest neighbor, neural networks, and more! About the author Alex Browne is a recent college grad living in Raleigh NC with 4 years of professional software experience.
He does software contract work to make ends meet, and spends most of his free time learning new things and working on various side projects. He is passionate about open source technology and has plans to start his own company.
article-image-gui-components-qt-5
Packt
30 Mar 2015
8 min read
Save for later

GUI Components in Qt 5

Packt
30 Mar 2015
8 min read
This article by Symeon Huang, author of the book Qt 5 Blueprints, explains typical and basic GUI components in Qt 5. (For more resources related to this topic, see here.) Design UI in Qt Creator Qt Creator is the official IDE for Qt application development, and we're going to use it to design the application's UI. First, let's create a new project: Open Qt Creator. Navigate to File | New File or Project. Choose Qt Widgets Application. Enter the project's name and location. In this case, the project's name is layout_demo. You may wish to follow the wizard and keep the default values. After this creation process, Qt Creator will generate the skeleton of the project based on your choices. UI files are under the Forms directory. When you double-click on a UI file, Qt Creator will redirect you to the integrated Designer; the mode selector should have Design highlighted, and the main window should contain several sub-windows to let you design the user interface. Here we can design the UI by dragging and dropping. Qt Widgets Drag three push buttons from the widget box (widget palette) into the frame of MainWindow in the center. The default text displayed on these buttons is PushButton, but you can change the text if you want, by double-clicking on the button. In this case, I changed them to Hello, Hola, and Bonjour accordingly. Note that this operation won't affect the objectName property, and in order to keep it neat and easy to find, we need to change the objectName! The right-hand side of the UI contains two windows. The upper-right section includes Object Inspector and the lower-right includes the Property Editor. Just select a push button, and we can easily change objectName in the Property Editor. For the sake of convenience, I changed these buttons' objectName properties to helloButton, holaButton, and bonjourButton respectively. Save the changes and click on Run on the left-hand side panel; it will build the project automatically and then run it, as shown in the following screenshot: In addition to the push button, Qt provides lots of commonly used widgets for us: buttons such as the tool button, radio button, and checkbox; advanced views such as the list, tree, and table; and, of course, input widgets such as the line edit, spin box, font combo box, and date and time edit. Other useful widgets such as the progress bar, scroll bar, and slider are also in the list. Besides, you can always subclass QWidget and write your own one. Layouts A quick way to delete a widget is to select it and press the Delete button. Meanwhile, some widgets, such as the menu bar, status bar, and toolbar, can't be selected, so we have to right-click on them in Object Inspector and delete them. Since they are useless in this example, it's safe to remove them, and we can do this for good. Okay, let's understand what needs to be done after the removal. You may want to keep all these push buttons on the same horizontal axis. To do this, perform the following steps: Select all the push buttons either by clicking on them one by one while keeping the Ctrl key pressed or by just drawing an enclosing rectangle containing all the buttons. Right-click and select Layout | Lay Out Horizontally. The keyboard shortcut for this is Ctrl + H. Resize the horizontal layout and adjust its layoutSpacing by selecting it and dragging any of the points around the selection box until it fits best. Hmm…! You may have noticed that the text of the Bonjour button is longer than that of the other two buttons, and it should be wider than the others. How do you do this?
You can change the property of the horizontal layout object's layoutStretch property in Property Editor. This value indicates the stretch factors of the widgets inside the horizontal layout. They would be laid out in proportion. Change it to 3,3,4, and there you are. The stretched size definitely won't be smaller than the minimum size hint. This is how the zero factor works when there is a nonzero natural number, which means that you need to keep the minimum size instead of getting an error with a zero divisor. Now, drag Plain Text Edit just below, and not inside, the horizontal layout. Obviously, it would be neater if we could extend the plain text edit's width. However, we don't have to do this manually. In fact, we could change the layout of the parent, MainWindow. That's it! Right-click on MainWindow, and then navigate to Lay out | Lay Out Vertically. Wow! All the children widgets are automatically extended to the inner boundary of MainWindow; they are kept in a vertical order. You'll also find Layout settings in the centralWidget property, which is exactly the same thing as the previous horizontal layout. The last thing to make this application halfway decent is to change the title of the window. MainWindow is not the title you want, right? Click on MainWindow in the object tree. Then, scroll down its properties to find windowTitle. Name it whatever you want. In this example, I changed it to Greeting. Now, run the application again and you will see it looks like what is shown in the following screenshot: Qt Quick Components Since Qt 5, Qt Quick has evolved to version 2.0 which delivers a dynamic and rich experience. The language it used is so-called QML, which is basically an extended version of JavaScript using a JSON-like format. To create a simple Qt Quick application based on Qt Quick Controls 1.2, please follow following procedures: Create a new project named HelloQML. Select Qt Quick Application instead of Qt Widgets Application that we chose previously. Select Qt Quick Controls 1.2 when the wizard navigates you to Select Qt Quick Components Set. Edit the file main.qml under the root of Resources file, qml.qrc, that Qt Creator has generated for our new Qt Quick project. Let's see how the code should be. import QtQuick 2.3 import QtQuick.Controls 1.2   ApplicationWindow {    visible: true    width: 640    height: 480    title: qsTr("Hello QML")      menuBar: MenuBar {        Menu {            title: qsTr("File")            MenuItem {                text: qsTr("Exit")                shortcut: "Ctrl+Q"                onTriggered: Qt.quit()            }        }    }      Text {        id: hw        text: qsTr("Hello World")        font.capitalization: Font.AllUppercase        anchors.centerIn: parent    }      Label {        anchors { bottom: hw.top; bottomMargin: 5; horizontalCenter: hw.horizontalCenter }        text: qsTr("Hello Qt Quick")    } } If you ever touched Java or Python, then the first two lines won't be too unfamiliar for you. It simply imports the Qt Quick and Qt Quick Controls. And the number behind is the version of the library. The body of this QML source file is really in JSON style, which enables you understand the hierarchy of the user interface through the code. Here, the root item is ApplicationWindow, which is basically the same thing as QMainWindow in Qt/C++. When you run this application in Windows, you can barely find the difference between the Text item and Label item. 
But on some platforms, or when you change the system font and/or its colour, you'll find that Label follows the font and colour scheme of the system while Text doesn't. Run this application, and you'll see there is a menu bar, a text, and a label in the application window. This is exactly what we wrote in the QML file: You may miss the Design mode for traditional Qt/C++ development. Well, you can still design a Qt Quick application in Design mode! Click on Design in the mode selector when you edit the main.qml file. Qt Creator will redirect you into Design mode, where you can use the mouse to drag and drop UI components: Almost all the widgets you use in a Qt Widget application can be found here in a Qt Quick application. Moreover, you can use other modern widgets, such as the busy indicator, in Qt Quick, while there's no counterpart in a Qt Widget application. However, QML is a declarative language whose performance is obviously poorer than that of C++. Therefore, more and more developers choose to write the UI with Qt Quick in order to deliver a better visual style, while keeping core functions in Qt/C++. Summary In this article, we had brief contact with various GUI components of Qt 5 and focused on the Design mode in Qt Creator. Two small examples were used as Qt-like "Hello World" demonstrations. Resources for Article: Further resources on this subject: Code interlude – signals and slots [article] Program structure, execution flow, and runtime objects [article] Configuring Your Operating System [article]

Packt
30 Mar 2015
28 min read
Save for later

PostgreSQL – New Features

Packt
30 Mar 2015
28 min read
In this article by Jayadevan Maymala, author of the book PostgreSQL for Data Architects, you will see how to troubleshoot the initial hiccups faced by people who are new to PostgreSQL. We will look at a few useful, but not commonly used, data types. We will also cover pgbadger, a nifty third-party tool that can run through a PostgreSQL log. This tool can tell us a lot about what is happening in the cluster. Also, we will look at a few key features that are part of the PostgreSQL 9.4 release. We will cover a couple of useful extensions. (For more resources related to this topic, see here.) Interesting data types We will start with the data types. PostgreSQL does have all the common data types we see in databases. These include: The number data types (smallint, integer, bigint, decimal, numeric, real, and double) The character data types (varchar, char, and text) The binary data types The date/time data types (including date, timestamp without timezone, and timestamp with timezone) BOOLEAN data types However, this is all standard fare. Let's start off by looking at the RANGE data type. RANGE This is a data type that can be used to capture values that fall in a specific range. Let's look at a few examples of use cases. Cars can be categorized as compact, convertible, MPV, SUV, and so on. Each of these categories will have a price range. For example, the price range of a category of cars can start from $15,000 at the lower end, and the price range at the upper end can start from $40,000. We can have meeting rooms booked for different time slots. Each room is booked during different time slots and is available accordingly. Then, there are use cases that involve shift timings for employees. Each shift begins at a specific time, ends at a specific time, and involves a specific number of hours on duty. We would also need to capture the swipe-in and swipe-out time for employees. These are some use cases where we can consider range types. Range is a high-level data type; we can use int4range as the appropriate subtype for the car price range scenario. For the meeting room booking and shift use cases, we can consider tsrange or tstzrange (if we want to capture the time zone as well). It makes sense to explore the possibility of using range data types in most scenarios that involve the following features: From and to timestamps/dates for room reservations Lower and upper limits for price/discount ranges Scheduling jobs Timesheets Let's now look at an example. We have three meeting rooms. The rooms can be booked, and the entries for the reservations made go into another table (basic normalization principles). How can we find rooms that are not booked for a specific time period, say, 10:45 to 11:15?
We will look at this with and without the range data type: CREATE TABLE rooms(id serial, descr varchar(50));   INSERT INTO rooms(descr) SELECT concat('Room ', generate_series(1,3));   CREATE TABLE room_book (id serial , room_id integer, from_time timestamp, to_time timestamp , res tsrange);   INSERT INTO room_book (room_id,from_time,to_time,res) values(1,'2014-7-30 10:00:00', '2014-7-30 11:00:00', '(2014-7-30 10:00:00,2014-7-30 11:00:00)');   INSERT INTO room_book (room_id,from_time,to_time,res) values(2,'2014-7-30 10:00:00', '2014-7-30 10:40:00', '(2014-7-30 10:00,2014-7-30 10:40:00)');   INSERT INTO room_book (room_id,from_time,to_time,res) values(2,'2014-7-30 11:20:00', '2014-7-30 12:00:00', '(2014-7-30 11:20:00,2014-7-30 12:00:00)');   INSERT INTO room_book (room_id,from_time,to_time,res) values(3,'2014-7-30 11:00:00', '2014-7-30 11:30:00', '(2014-7-30 11:00:00,2014-7-30 11:30:00)'); PostgreSQL has the OVERLAPS operator. This can be used to get all the reservations that overlap with the period for which we wanted to book a room: SELECT room_id FROM room_book WHERE (from_time,to_time) OVERLAPS ('2014-07-30 10:45:00','2014-07-30 11:15:00'); If we eliminate these room IDs from the master list, we have the list of rooms available. So, we prefix the following command to the preceding SQL: SELECT id FROM rooms EXCEPT We get a room ID that is not booked from 10:45 to 11:15. This is the old way of doing it. With the range data type, we can write the following SQL statement: SELECT id FROM rooms EXCEPT SELECT room_id FROM room_book WHERE res && '(2014-07-30 10:45:00,2014-07-30 11:15:00)'; Do look up GIST indexes to improve the performance of queries that use range operators. Another way of achieving the same is to use the following command: SELECT id FROM rooms EXCEPT SELECT room_id FROM room_book WHERE '2014-07-30 10:45:00' < to_time AND '2014-07-30 11:15:00' > from_time; Now, let's look at the finer points of how a range is represented. The range values can be opened using [ or ( and closed with ] or ). [ means include the lower value and ( means exclude the lower value. The closing (] or )) has a similar effect on the upper values. When we do not specify anything, [) is assumed, implying include the lower value, but exclude the upper value. Note that the lower bound is 3 and upper bound is 6 when we mention 3,5, as shown here: SELECT int4range(3,5,'[)') lowerincl ,int4range(3,5,'[]') bothincl, int4range(3,5,'()') bothexcl , int4range(3,5,'[)') upperexcl; lowerincl | bothincl | bothexcl | upperexcl -----------+----------+----------+----------- [3,5)       | [3,6)       | [4,5)       | [3,5) Using network address types The network address types are cidr, inet, and macaddr. These are used to capture IPv4, IPv6, and Mac addresses. Let's look at a few use cases. When we have a website that is open to public, a number of users from different parts of the world access it. We may want to analyze the access patterns. Very often, websites can be used by users without registering or providing address information. In such cases, it becomes even more important that we get some insight into the users based on the country/city and similar location information. When anonymous users access our website, an IP is usually all we get to link the user to a country or city. Often, this becomes our not-so-accurate unique identifier (along with cookies) to keep track of repeat visits, to analyze website-usage patterns, and so on. 
The network address types can also be useful when we develop applications that monitor a number of systems in different networks to check whether they are up and running, to monitor resource consumption of the systems in the network, and so on. While data types (such as VARCHAR or BIGINT) can be used to store IP addresses, it's recommended to use one of the built-in types PostgreSQL provides to store network addresses. There are three data types to store network addresses. They are as follows: inet: This data type can be used to store an IPV4 or IPV6 address along with its subnet. The format in which data is to be inserted is Address/y, where y is the number of bits in the netmask. cidr: This data type can also be used to store networks and network addresses. Once we specify the subnet mask for a cidr data type, PostgreSQL will throw an error if we set bits beyond the mask, as shown in the following example: CREATE TABLE nettb (id serial, intclmn inet, cidrclmn cidr); CREATE TABLE INSERT INTO nettb (intclmn , cidrclmn) VALUES ('192.168.64.2/32', '192.168.64.2/32'); INSERT 0 1 INSERT INTO nettb (intclmn , cidrclmn) VALUES ('192.168.64.2/24', '192.168.64.2/24'); ERROR: invalid cidr value: "192.168.64.2/24" LINE 1: ...b (intclmn , cidrclmn) VALUES ('192.168.64.2/24', '192.168.6...                                                              ^ DETAIL: Value has bits set to right of mask. INSERT INTO nettb (intclmn , cidrclmn) VALUES ('192.168.64.2/24', '192.168.64.0/24'); INSERT 0 1 SELECT * FROM nettb; id |     intclmn     |   cidrclmn     ----+-----------------+----------------- 1 | 192.168.64.2   | 192.168.64.2/32 2 | 192.168.64.2/24 | 192.168.64.0/24 Let's also look at a couple of useful operators available within network address types. Does an IP fall in a subnet? This can be figured out using <<=, as shown here: SELECT id,intclmn FROM nettb ; id |   intclmn   ----+-------------- 1 | 192.168.64.2 3 | 192.168.12.2 4 | 192.168.13.2 5 | 192.168.12.4   SELECT id,intclmn FROM nettb where intclmn <<= inet'192.168.12.2/24'; id |   intclmn   3 | 192.168.12.2 5 | 192.168.12.4   SELECT id,intclmn FROM nettb where intclmn <<= inet'192.168.12.2/32'; id |   intclmn   3 | 192.168.12.2 The operator used in the preceding command checks whether the column value is contained within or equal to the value we provided. Similarly, we have the equality operator, that is, greater than or equal to, bitwise AND, bitwise OR, and other standard operators. The macaddr data type can be used to store Mac addresses in different formats. hstore for key-value pairs A key-value store available in PostgreSQL is hstore. Many applications have requirements that make developers look for a schema-less data store. They end up turning to one of the NoSQL databases (Cassandra) or the simple and more prevalent stores such as Redis or Riak. While it makes sense to opt for one of these if the objective is to achieve horizontal scalability, it does make the system a bit complex because we now have more moving parts. After all, most applications do need a relational database to take care of all the important transactions along with the ability to write SQL to fetch data with different projections. If a part of the application needs to have a key-value store (and horizontal scalability is not the prime objective), the hstore data type in PostgreSQL should serve the purpose. It may not be necessary to make the system more complex by using different technologies that will also add to the maintenance overhead. 
Sometimes, what we want is not an entirely schema-less database, but some flexibility where we are certain about most of our entities and their attributes but are unsure about a few. For example, a person is sure to have a few key attributes such as first name, date of birth, and a couple of other attributes (irrespective of his nationality). However, there could be other attributes that undergo change. A U.S. citizen is likely to have a Social Security Number (SSN); someone from Canada has a Social Insurance Number (SIN). Some countries may provide more than one identifier. There can be more attributes with a similar pattern. There is usually a master attribute table (which links the IDs to attribute names) and a master table for the entities. Writing queries against tables designed on an EAV approach can get tricky. Using hstore may be an easier way of accomplishing the same. Let's see how we can do this using hstore with a simple example. The hstore key-value store is an extension and has to be installed using CREATE EXTENSION hstore. We will model a customer table with first_name and an hstore column to hold all the dynamic attributes: CREATE TABLE customer(id serial, first_name varchar(50), dynamic_attributes hstore); INSERT INTO customer (first_name ,dynamic_attributes) VALUES ('Michael','ssn=>"123-465-798" '), ('Smith','ssn=>"129-465-798" '), ('James','ssn=>"No data" '), ('Ram','uuid=>"1234567891" , npr=>"XYZ5678", ratnum=>"Somanyidentifiers" '); Now, let's try retrieving all customers with their SSN, as shown here: SELECT first_name, dynamic_attributes FROM customer        WHERE dynamic_attributes ? 'ssn'; first_name | dynamic_attributes Michael   | "ssn"=>"123-465-798" Smith     | "ssn"=>"129-465-798" James     | "ssn"=>"No data" Also, those with a specific SSN: SELECT first_name,dynamic_attributes FROM customer        WHERE dynamic_attributes -> 'ssn'= '123-465-798'; first_name | dynamic_attributes - Michael   | "ssn"=>"123-465-798" If we want to get records that do not contain a specific SSN, just use the following command: WHERE NOT dynamic_attributes -> 'ssn'= '123-465-798' Also, replacing it with WHERE NOT dynamic_attributes ? 'ssn'; gives us the following command: first_name |                          dynamic_attributes         ------------+----------------------------------------------------- Ram       | "npr"=>"XYZ5678", "uuid"=>"1234567891", "ratnum"=>"Somanyidentifiers" As is the case with all data types in PostgreSQL, there are a number of functions and operators available to fetch data selectively, update data, and so on. We must always use the appropriate data types. This is not just for the sake of doing it right, but because of the number of operators and functions available with a focus on each data type; hstore stores only text. We can use it to store numeric values, but these values will be stored as text. We can index the hstore columns to improve performance. The type of index to be used depends on the operators we will be using frequently. json/jsonb JavaScript Object Notation (JSON) is an open standard format used to transmit data in a human-readable format. It's a language-independent data format and is considered an alternative to XML. It's really lightweight compared to XML and has been steadily gaining popularity in the last few years. PostgreSQL added the JSON data type in Version 9.2 with a limited set of functions and operators. Quite a few new functions and operators were added in Version 9.3. 
Version 9.4 adds one more data type: jsonb.json, which is very similar to JSONB. The jsonb data type stores data in binary format. It also removes white spaces (which are insignificant) and avoids duplicate object keys. As a result of these differences, JSONB has an overhead when data goes in, while JSON has extra processing overhead when data is retrieved (consider how often each data point will be written and read). The number of operators available with each of these data types is also slightly different. As it's possible to cast one data type to the other, which one should we use depends on the use case. If the data will be stored as it is and retrieved without any operations, JSON should suffice. However, if we plan to use operators extensively and want indexing support, JSONB is a better choice. Also, if we want to preserve whitespace, key ordering, and duplicate keys, JSON is the right choice. Now, let's look at an example. Assume that we are doing a proof of concept project for a library management system. There are a number of categories of items (ranging from books to DVDs). We wouldn't have information about all the categories of items and their attributes at the piloting stage. For the pilot stage, we could use a table design with the JSON data type to hold various items and their attributes: CREATE TABLE items (    item_id serial,    details json ); Now, we will add records. All DVDs go into one record, books go into another, and so on: INSERT INTO items (details) VALUES ('{                  "DVDs" :[                         {"Name":"The Making of Thunderstorms", "Types":"Educational",                          "Age-group":"5-10","Produced By":"National Geographic"                          },                          {"Name":"My nightmares", "Types":"Movies", "Categories":"Horror",                          "Certificate":"A", "Director":"Dracula","Actors":                                [{"Name":"Meena"},{"Name":"Lucy"},{"Name":"Van Helsing"}]                          },                          {"Name":"My Cousin Vinny", "Types":"Movies", "Categories":"Suspense",                          "Certificate":"A", "Director": "Jonathan Lynn","Actors":                          [{"Name":"Joe "},{"Name":"Marissa"}] }] }' ); A better approach would be to have one record for each item. Now, let's take a look at a few JSON functions: SELECT   details->>'DVDs' dvds, pg_typeof(details->>'DVDs') datatype      FROM items; SELECT   details->'DVDs' dvds ,pg_typeof(details->'DVDs') datatype      FROM items; Note the difference between ->> and -> in the following screenshot. We are using the pg_typeof function to clearly see the data type returned by the functions. Both return the JSON object field. The first function returns text and the second function returns JSON: Now, let's try something a bit more complex: retrieve all movies in DVDs in which Meena acted with the following SQL statement: WITH tmp (dvds) AS (SELECT json_array_elements(details->'DVDs') det FROM items) SELECT * FROM tmp , json_array_elements(tmp.dvds#>'{Actors}') as a WHERE    a->>'Name'='Meena'; We get the record as shown here: We used one more function and a couple of operators. The json_array_elements expands a JSON array to a set of JSON elements. So, we first extracted the array for DVDs. We also created a temporary table, which ceases to exist as soon as the query is over, using the WITH clause. In the next part, we extracted the elements of the array actors from DVDs. 
Then, we checked whether the Name element is equal to Meena. XML PostgreSQL added the xml data type in Version 8.3. Extensible Markup Language (XML) has a set of rules to encode documents in a format that is both human-readable and machine-readable. This data type is best used to store documents. XML became the standard way of data exchanging information across systems. XML can be used to represent complex data structures such as hierarchical data. However, XML is heavy and verbose; it takes more bytes per data point compared to the JSON format. As a result, JSON is referred to as fat-free XML. XML structure can be verified against XML Schema Definition Documents (XSD). In short, XML is heavy and more sophisticated, whereas JSON is lightweight and faster to process. We need to configure PostgreSQL with libxml support (./configure --with-libxml) and then restart the cluster for XML features to work. There is no need to reinitialize the database cluster. Inserting and verifying XML data Now, let's take a look at what we can do with the xml data type in PostgreSQL: CREATE TABLE tbl_xml(id serial, docmnt xml); INSERT INTO tbl_xml(docmnt ) VALUES ('Not xml'); INSERT INTO tbl_xml (docmnt)        SELECT query_to_xml( 'SELECT now()',true,false,'') ; SELECT xml_is_well_formed_document(docmnt::text), docmnt        FROM tbl_xml; Then, take a look at the following screenshot: First, we created a table with a column to store the XML data. Then, we inserted a record, which is not in the XML format, into the table. Next, we used the query_to_xml function to get the output of a query in the XML format. We inserted this into the table. Then, we used a function to check whether the data in the table is well-formed XML. Generating XML files for table definitions and data We can use the table_to_xml function if we want to dump the data from a table in the XML format. Append and_xmlschema so that the function becomes table_to_xml_and_xmlschema, which will also generate the schema definition before dumping the content. If we want to generate just the definitions, we can use table_to_xmlschema. PostgreSQL also provides the xpath function to extract data as follows: SELECT xpath('/table/row/now/text()',docmnt) FROM tbl_xml        WHERE id = 2;                xpath               ------------------------------------ {2014-07-29T16:55:00.781533+05:30} Using properly designed tables with separate columns to capture each attribute is always the best approach from a performance standpoint and update/write-options perspective. Data types such as json/xml are best used to temporarily store data when we need to provide feeds/extracts/views to other systems or when we get data from external systems. They can also be used to store documents. The maximum size for a field is 1 GB. We must consider this when we use the database to store text/document data. pgbadger Now, we will look at a must-have tool if we have just started with PostgreSQL and want to analyze the events taking place in the database. For those coming from an Oracle background, this tool provides reports similar to AWR reports, although the information is more query-centric. It does not include data regarding host configuration, wait statistics, and so on. Analyzing the activities in a live cluster provides a lot of insight. It tells us about load, bottlenecks, which queries get executed frequently (we can focus more on them for optimization). It even tells us if the parameters are set right, although a bit indirectly. 
For example, if we see that there are many temp files getting created while a specific query is getting executed, we know that we either have a buffer issue or have not written the query right. For pgbadger to effectively scan the log file and produce useful reports, we should get our logging configuration right, as follows:

log_destination = 'stderr'
logging_collector = on
log_directory = 'pg_log'
log_filename = 'postgresql-%Y-%m-%d.log'
log_min_duration_statement = 0
log_connections = on
log_disconnections = on
log_duration = on
log_line_prefix = '%t [%p]: [%l-1] user=%u,db=%d '
log_lock_waits = on
track_activity_query_size = 2048

It might be necessary to restart the cluster for some of these changes to take effect. We will also ensure that there is some load on the database using pgbench. It's a utility that ships with PostgreSQL and can be used to benchmark PostgreSQL on our servers. We can initialize the tables required for pgbench by executing the following command at the shell prompt: pgbench -i pgp This creates a few tables on the pgp database. We can log in to psql (database pgp) and check:

\dt
              List of relations
 Schema |       Name       | Type  |  Owner
--------+------------------+-------+----------
 public | pgbench_accounts | table | postgres
 public | pgbench_branches | table | postgres
 public | pgbench_history  | table | postgres
 public | pgbench_tellers  | table | postgres

Now, we can run pgbench to generate load on the database with the following command: pgbench -c 5 -T10 pgp The T option passes the duration for which pgbench should continue execution in seconds, c passes the number of clients, and pgp is the database. At the shell prompt, execute: wget https://github.com/dalibo/pgbadger/archive/master.zip Once the file is downloaded, unzip the file using the following command: unzip master.zip Use cd to go to the pgbadger-master directory as follows: cd pgbadger-master Execute the following command: ./pgbadger /pgdata/9.3/pg_log/postgresql-2014-07-31.log -o myoutput.html Replace the log file name in the command with the actual name. It will generate a myoutput.html file. The HTML file generated will have a wealth of information about what happened in the cluster, with great charts/tables. In fact, it takes quite a bit of time to go through the report. Here is a sample chart that provides the distribution of queries based on execution time: The following screenshot gives an idea about the number of performance metrics provided by the report: If our objective is to troubleshoot performance bottlenecks, the slowest individual queries and the most frequent queries under the top drop-down list are the right places to start. Once the queries are identified, locks, temporary file generation, and so on can be studied to identify the root cause. Of course, EXPLAIN is the best option when we want to refine individual queries. If the objective is to understand how busy the cluster is, the Overview section and Sessions are the right places to explore. The logging configuration used may create huge log files in systems with a lot of activity. Tweak the parameters appropriately to ensure that this does not happen. With this, we covered most of the interesting data types, an interesting extension, and a must-use tool from the PostgreSQL ecosystem. Now, let's cover a few interesting features in PostgreSQL Version 9.4. Features over time Applying filters in Versions 8.0, 9.0, and 9.4 gives us a good idea about how quickly features are getting added to the database.
Interesting features in 9.4 Each version of PostgreSQL adds many features grouped into different categories (such as performance, backend, data types, and so on). We will look at a few features that are more likely to be of interest (because they help us improve performance or they make maintenance and configuration easy). Keeping the buffer ready As we saw earlier, reads from disk have a significant overhead compared to those from memory. There are quite a few occasions when disk reads are unavoidable. Let's see a few examples. In a data warehouse, the Extract, Transform, Load (ETL) process, which may happen once a day usually, involves a lot of raw data getting processed in memory before being loaded into the final tables. This data is mostly transactional data. The master data, which does not get processed on a regular basis, may be evicted from memory as a result of this churn. Reports typically depend a lot on master data. When users refresh their reports after ETL, it's highly likely that the master data will be read from disk, resulting in a drop in the response time. If we could ensure that the master data as well as the recently processed data is in the buffer, it can really improve user experience. In a transactional system like an airline reservation system, a change to the fare rule may result in most of the fares being recalculated. This is a situation similar to the one described previously, ensuring that the fares and availability data for the most frequently searched routes in the buffer can provide a better user experience. This applies to an e-commerce site selling products also. If the product/price/inventory data is always available in memory, it can be retrieved very fast. You must use PostgreSQL 9.4 for trying out the code in the following sections. So, how can we ensure that the data is available in the buffer? A pg_prewarm module has been added as an extension to provide this functionality. The basic syntax is very simple: SELECT pg_prewarm('tablename');. This command will populate the buffers with data from the table. It's also possible to mention the blocks that should be loaded into the buffer from the table. We will install the extension in a database, create a table, and populate some data. Then, we will stop the server, drop buffers (OS), and restart the server. We will see how much time a SELECT count(*) takes. We will repeat the exercise, but we will use pg_prewarm before executing SELECT count(*) at psql: CREATE EXTENSION pg_prewarm; CREATE TABLE myt(id SERIAL, name VARCHAR(40)); INSERT INTO myt(name) SELECT concat(generate_series(1,10000),'name'); Now, stop the server using pg_ctl at the shell prompt: pg_ctl stop -m immediate Clean OS buffers using the following command at the shell prompt (will need to use sudo to do this): echo 1 > /proc/sys/vm/drop_caches The command may vary depending on the OS. Restart the cluster using pg_ctl start. Then, execute the following command: SELECT COUNT(*) FROM myt; Time: 333.115 ms We should repeat the steps of shutting down the server, dropping the cache, and starting PostgreSQL. Then, execute SELECT pg_prewarm('myt'); before SELECT count(*). The response time goes down significantly. Executing pg_prewarm does take some time, which is close to the time taken to execute the SELECT count(*) against a cold cache. However, the objective is to ensure that the user does not experience a delay. 
SELECT COUNT(*) FROM myt; count ------- 10000 (1 row) Time: 7.002 ms Better recoverability A new parameter called recovery_min_apply_delay has been added in 9.4. This will go to the recovery.conf file of the slave server. With this, we can control the replay of transactions on the slave server. We can set this to approximately 5 minutes and then the standby will replay the transaction from the master when the standby system time is 5 minutes past the time of commit at the master. This provides a bit more flexibility when it comes to recovering from mistakes. When we keep the value at 1 hour, the changes at the master will be replayed at the slave after one hour. If we realize that something went wrong on the master server, we have about 1 hour to stop the transaction replay so that the action that caused the issue (for example, accidental dropping of a table) doesn't get replayed at the slave. Easy-to-change parameters An ALTER SYSTEM command has been introduced so that we don't have to edit postgresql.conf to change parameters. The entry will go to a file named postgresql.auto.conf. We can execute ALTER SYSTEM SET work_mem='12MB'; and then check the file at psql: \! more postgresql.auto.conf # Do not edit this file manually! # It will be overwritten by ALTER SYSTEM command. work_mem = '12MB' We must execute SELECT pg_reload_conf(); to ensure that the changes are propagated. Logical decoding and consumption of changes Version 9.4 introduces physical and logical replication slots. We will look at logical slots as they let us track changes and filter specific transactions. This lets us pick and choose from the transactions that have been committed. We can grab some of the changes, decode, and possibly replay on a remote server. We do not have to have an all-or-nothing replication. As of now, we will have to do a lot of work to decode/move the changes. Two parameter changes are necessary to set this up. These are as follows: The max_replication_slots parameter (set to at least 1) and wal_level (set to logical). Then, we can connect to a database and create a slot as follows: SELECT * FROM pg_create_logical_replication_slot('myslot','test_decoding'); The first parameter is the name we give to our slot and the second parameter is the plugin to be used. Test_decoding is the sample plugin available, which converts WAL entries into text representations as follows: INSERT INTO myt(id) values (4); INSERT INTO myt(name) values ('abc'); Now, we will try retrieving the entries: SELECT * FROM pg_logical_slot_peek_changes('myslot',NULL,NULL); Then, check the following screenshot: This function lets us take a look at the changes without consuming them so that the changes can be accessed again: SELECT * FROM pg_logical_slot_get_changes('myslot',NULL,NULL); This is shown in the following screenshot: This function is similar to the peek function, but the changes are no longer available to be fetched again as they get consumed. Summary In this article, we covered a few data types that data architects will find interesting. We also covered what is probably the best utility available to parse the PostgreSQL log file to produce excellent reports. We also looked at some of the interesting features in PostgreSQL version 9.4, which will be of interest to data architects. Resources for Article: Further resources on this subject: PostgreSQL as an Extensible RDBMS [article] Getting Started with PostgreSQL [article] PostgreSQL Cookbook - High Availability and Replication [article]

article-image-getting-started-intel-galileo
Packt
30 Mar 2015
12 min read
Save for later

Getting Started with Intel Galileo

Packt
30 Mar 2015
12 min read
In this article by Onur Dundar, author of the book Home Automation with Intel Galileo, we will see how to develop home automation examples using the Intel Galileo development board along with the existing home automation sensors and devices. In the book, a good review of Intel Galileo will be provided, which will teach you to develop native C/C++ applications for Intel Galileo. (For more resources related to this topic, see here.) After a good introduction to Intel Galileo, we will review home automation's history, concepts, technology, and current trends. When we have an understanding of home automation and the supporting technologies, we will develop some examples on two main concepts of home automation: energy management and security. We will build some examples under energy management using electrical switches, light bulbs and switches, as well as temperature sensors. For security, we will use motion, water leak sensors, and a camera to create some examples. For all the examples, we will develop simple applications with C and C++. Finally, when we are done building good and working examples, we will work on supporting software and technologies to create more user friendly home automation software. In this article, we will take a look at the Intel Galileo development board, which will be the device that we will use to build all our applications; also, we will configure our host PC environment for software development. The following are the prerequisites for this article: A Linux PC for development purposes. All our work has been done on an Ubuntu 12.04 host computer, for this article and others as well. (If you use newer versions of Ubuntu, you might encounter problems with some things in this article.) An Intel Galileo (Gen 2) development board with its power adapter. A USB-to-TTL serial UART converter cable; the suggested cable is TTL-232R-3V3 to connect to the Intel Galileo Gen 2 board and your host system. You can see an example of a USB-to-TTL serial UART cable at http://www.amazon.com/GearMo%C2%AE-3-3v-Header-like-TTL-232R-3V3/dp/B004LBXO2A. If you are going to use Intel Galileo Gen 1, you will need a 3.5 mm jack-to-UART cable. You can see the mentioned cable at http://www.amazon.com/Intel-Galileo-Gen-Serial-cable/dp/B00O170JKY/. An Ethernet cable connected to your modem or switch in order to connect Intel Galileo to the local network of your workplace. A microSD card. Intel Galileo supports microSD cards up to 32 GB storage. Introducing Intel Galileo The Intel Galileo board is the first in a line of Arduino-certified development boards based on Intel x86 architecture. It is designed to be hardware and software pin-compatible with Arduino shields designed for the UNOR3. Arduino is an open source physical computing platform based on a simple microcontroller board, and it is a development environment for writing software for the board. Arduino can be used to develop interactive objects, by taking inputs from a variety of switches or sensors and controlling a variety of lights, motors, and other physical outputs. The Intel Galileo board is based on the Intel Quark X1000 SoC, a 32-bit Intel Pentium processor-class system on a chip (SoC). In addition to Arduino compatible I/O pins, Intel Galileo inherited mini PCI Express slots, a 10/100 Mbps Ethernet RJ45 port, USB 2.0 host, and client I/O ports from the PC world. The Intel Galileo Gen 1 USB host is a micro USB slot. In order to use a generation 1 USB host with USB 2.0 cables, you will need an OTG (On-the-go) cable. 
You can see an example cable at http://www.amazon.com/Cable-Matters-2-Pack-Micro-USB-Adapter/dp/B00GM0OZ4O. Another good feature of the Intel Galileo board is that it has open source hardware designed together with its software. Hardware design schematics and the bill of materials (BOM) are distributed on the Intel website. Intel Galileo runs on a custom embedded Linux operating system, and its firmware, bootloader, as well as kernel source code can be downloaded from https://downloadcenter.intel.com/Detail_Desc.aspx?DwnldID=23171. Another helpful URL to identify, locate, and ask questions about the latest changes in the software and hardware is the open source community at https://communities.intel.com/community/makers. Intel delivered two versions of the Intel Galileo development board called Gen 1 and Gen 2. At the moment, only Gen 2 versions are available. There are some hardware changes in Gen 2, as compared to Gen 1. You can see both versions in the following image: The first board (on the left-hand side) is the Intel Galileo Gen 1 version and the second one (on the right-hand side) is Intel Galileo Gen 2. Using Intel Galileo for home automation As mentioned in the previous section, Intel Galileo supports various sets of I/O peripherals. Arduino sensor shields and USB and mini PCI-E devices can be used to develop and create applications. Intel Galileo can be expanded with the help of I/O peripherals, so we can manage the sensors needed to automate our home. When we take a look at the existing home automation modules in the market, we can see that preconfigured hubs or gateways manage these modules to automate homes. A hub or a gateway is programmed to send and receive data to/from home automation devices. Similarly, with the help of a Linux operating system running on Intel Galileo and the support of multiple I/O ports on the board, we will be able to manage home automation devices. We will implement new applications or will port existing Linux applications to connect home automation devices. Connecting to the devices will enable us to collect data as well as receive and send commands to these devices. Being able to send and receive commands to and from these devices will make Intel Galileo a gateway or a hub for home automation. It is also possible to develop simple home automation devices with the help of the existing sensors. Pinout helps us to connect sensors on the board and read/write data to sensors and come up with a device. Finally, the power of open source and Linux on Intel Galileo will enable you to reuse the developed libraries for your projects. It can also be used to run existing open source projects on technologies such as Node.js and Python on the board together with our C application. This will help you to add more features and extend the board's capability, for example, serving a web user interface easily from Intel Galileo with Node.js. Intel Galileo – hardware specifications The Intel Galileo board is an open source hardware design. The schematics, Cadence Allegro board files, and BOM can be downloaded from the Intel Galileo web page. In this section, we will just take a look at some key hardware features for feature references to understand the hardware capability of Intel Galileo in order to make better decisions on software design. Intel Galileo is an embedded system with the required RAM and flash storages included on the board to boot it and run without any additional hardware. 
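To make the gateway idea above a little more concrete: because the board runs Linux and can also run Python alongside native C/C++ code, a first sensor read can be very short. The sketch below is only an illustration and rests on several assumptions, namely that the mraa I/O library is present on your Galileo image, that a TMP36-style analog temperature sensor is wired to analog pin A0, and that the ADC is read at 10-bit resolution; treat the pin number and the scaling as placeholders rather than a recipe.

import time
import mraa  # assumed to be available on the Galileo's Linux image

ADC_BITS = 10      # assumed ADC resolution
VREF = 5.0         # assumed analog reference voltage

temp_pin = mraa.Aio(0)  # analog input A0 (assumption)

def read_celsius():
    raw = temp_pin.read()                      # raw ADC value
    volts = raw * VREF / (2 ** ADC_BITS - 1)   # convert the raw reading to volts
    return (volts - 0.5) * 100.0               # TMP36: 10 mV per degree C, 500 mV offset

while True:
    print("Temperature: %.1f C" % read_celsius())
    time.sleep(5)

The point of the sketch is only to show that the board exposes its Arduino-style pins to high-level code; the examples developed in the book itself are written in C and C++, as described earlier.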
The following table summarizes the features of Intel Galileo:
Processor: 1 core, 32-bit, Intel Pentium processor-compatible ISA; Intel Quark SoC X1000 at 400 MHz; 16 KB L1 cache; 512 KB SRAM; integrated real-time clock (RTC)
Storage: 8 MB NOR flash for the firmware and bootloader; 256 MB DDR3 at 800 MT/s; SD card support up to 32 GB; 8 KB EEPROM
Power: 7 V to 15 V; Power over Ethernet (PoE) requires the PoE module to be installed
Ports and connectors: USB 2.0 host (standard type A) and client (micro USB type B); RJ45 Ethernet; 10-pin JTAG for debugging; 6-pin UART; 6-pin ICSP; 1 mini-PCI Express slot; 1 SDIO
Arduino-compatible headers: 20 digital I/O pins; 6 analog inputs; 6 PWMs with 12-bit resolution; 1 SPI master; 2 UARTs (one shared with the console UART); 1 I2C master
Intel Galileo – software specifications
Intel delivers prebuilt images and binaries along with its board support package (BSP), so you can download the source code and build all the related software on your development system. The operating system running on Intel Galileo is Linux; it is sometimes called Yocto Linux because the Linux filesystem, cross-compiled toolchain, and kernel images are created by the Yocto Project's build mechanism. The Yocto Project is an open source collaboration project that provides templates, tools, and methods to help you create custom Linux-based systems for embedded products, regardless of the hardware architecture. The following diagram shows the layers of the Intel Galileo development board: Intel Galileo is an embedded Linux product; this means you need to compile your software on your development machine with the help of a cross-compiled toolchain or software development kit (SDK). A cross-compiled toolchain/SDK can be created using the Yocto Project; we will go over the instructions in the following sections. The toolchain includes the compiler and linker needed to build C/C++ applications for the Intel Galileo board. A binary created on your host with the Intel Galileo SDK will not run on the host machine, since it is built for a different architecture. With the help of the C/C++ APIs and libraries provided with the Intel Galileo SDK, you can build any C/C++ native application for Intel Galileo, as well as port any existing native application (without a graphical user interface) to run on Intel Galileo. Intel Galileo doesn't have a graphics processing unit. You can still use libraries such as OpenCV, but the performance of matrix operations on the CPU is so poor compared to systems with a GPU that it is not wise to perform complex image processing on Intel Galileo.
Connecting and booting Intel Galileo
We can now proceed to power up Intel Galileo and connect to its terminal. Before going forward with the board connection, you need to install a modem control program on your host system in order to connect to Intel Galileo over its UART interface with minicom. Minicom is a text-based modem control and terminal emulation program for Unix-like operating systems. If you are not comfortable with text-based applications, you can use graphical serial terminals such as CuteCom or GtkTerm. To start with Intel Galileo, perform the following steps:
Install minicom:
$ sudo apt-get install minicom
Attach the USB end of your 6-pin TTL cable and start minicom for the first time with the -s option:
$ sudo minicom -s
Before going into the setup details, check that the device is connected to your host. In our case, the serial device is /dev/ttyUSB0 on our host system.
You can check it from your host's device messages (dmesg) to see the connected USB. When you start minicom with the –s option, it will prompt you. From minicom's Configuration menu, select Serial port setup to set the values, as follows: After setting up the serial device, select Exit to go to the terminal. This will prompt you with the booting sequence and launch the Linux console when the Intel Galileo serial device is connected and powered up. Next, complete connections on Intel Galileo. Connect the TTL-232R cable to your Intel Galileo board's UART pins. UART pins are just next to the Ethernet port. Make sure that you have connected the cables correctly. The black-colored cable on TTL is the ground connection. It is written on TTL pins which one is ground on Intel Galileo. We are ready to power up Intel Galileo. After you plug the power cable into the board, you will see the Intel Galileo board's boot sequence on the terminal. When the booting process is completed, it will prompt you to log in; log in with the root user, where no password is needed. The final prompt will be as follows; we are in the Intel Galileo Linux console, where you can just use basic Linux commands that already exist on the board to discover the Intel Galileo filesystem: Poky 9.0.2 (Yocto Project 1.4 Reference Distro) 1.4.2   clanton clanton login: root root@clanton:~# Your board will now look like the following image: Connecting to Intel Galileo via Telnet If you have connected Intel Galileo to a local network with an Ethernet cable, you can use Telnet to connect it without using a serial connection, after performing some simple steps: Run the following commands on the Intel Galileo terminal: root@clanton:~# ifup eth0 root@clanton:~# ifconfig root@clanton:~# telnetd The ifup command brings the Ethernet interface up, and the second command starts the Telnet daemon. You can check the assigned IP address with the ifconfig command. From your host system, run the following command with your Intel Galileo board's IP address to start a Telnet session with Intel Galileo: $ telnet 192.168.2.168 Summary In this article, we learned how to use the Intel Galileo development board, its software, and system development environment. It takes some time to get used to all the tools if you are not used to them. A little practice with Eclipse is very helpful to build applications and make remote connections or to write simple applications on the host console with a terminal and build them. Let's go through all the points we have covered in this article. First, we read some general information about Intel Galileo and why we chose Intel Galileo, with some good reasons being Linux and the existing I/O ports on the board. Then, we saw some more details about Intel Galileo's hardware and software specifications and understood how to work with them. I believe understanding the internal working of Intel Galileo in building a Linux image and a kernel is a good practice, leading us to customize and run more tools on Intel Galileo. Finally, we learned how to develop applications for Intel Galileo. First, we built an SDK and set up the development environment. There were more instructions about how to deploy the applications on Intel Galileo over a local network as well. Then, we finished up by configuring the Eclipse IDE to quicken the development process for future development. In the next article, we will learn about home automation concepts and technologies. 
Resources for Article: Further resources on this subject: Hardware configuration [article] Our First Project – A Basic Thermometer [article] Pulse width modulator [article]
article-image-basic-concepts-machine-learning-and-logistic-regression-example-mahout
Packt
30 Mar 2015
33 min read
Save for later

Basic Concepts of Machine Learning and Logistic Regression Example in Mahout

Packt
30 Mar 2015
33 min read
In this article by Chandramani Tiwary, author of the book, Learning Apache Mahout, we will discuss some core concepts of machine learning and discuss the steps of building a logistic regression classifier in Mahout. (For more resources related to this topic, see here.) The purpose of this article is to understand the core concepts of machine learning. We will focus on understanding the steps involved in, resolving different types of problems and application areas in machine learning. In particular we will cover the following topics: Supervised learning Unsupervised learning The recommender system Model efficacy A wide range of software applications today try to replace or augment human judgment. Artificial Intelligence is a branch of computer science that has long been trying to replicate human intelligence. A subset of AI, referred to as machine learning, tries to build intelligent systems by using the data. For example, a machine learning system can learn to classify different species of flowers or group-related news items together to form categories such as news, sports, politics, and so on, and for each of these tasks, the system will learn using data. For each of the tasks, the corresponding algorithm would look at the data and try to learn from it. Supervised learning Supervised learning deals with training algorithms with labeled data, inputs for which the outcome or target variables are known, and then predicting the outcome/target with the trained model for unseen future data. For example, historical e-mail data will have individual e-mails marked as ham or spam; this data is then used for training a model that can predict future e-mails as ham or spam. Supervised learning problems can be broadly divided into two major areas, classification and regression. Classification deals with predicting categorical variables or classes; for example, whether an e-mail is ham or spam or whether a customer is going to renew a subscription or not, for example a postpaid telecom subscription. This target variable is discrete, and has a predefined set of values. Regression deals with a target variable, which is continuous. For example, when we need to predict house prices, the target variable price is continuous and doesn't have a predefined set of values. In order to solve a given problem of supervised learning, one has to perform the following steps. Determine the objective The first major step is to define the objective of the problem. Identification of class labels, what is the acceptable prediction accuracy, how far in the future is prediction required, is insight more important or is accuracy of classification the driving factor, these are the typical objectives that need to be defined. For example, for a churn classification problem, we could define the objective as identifying customers who are most likely to churn within three months. In this case, the class label from the historical data would be whether a customer has churned or not, with insights into the reasons for the churn and a prediction of churn at least three months in advance. Decide the training data After the objective of the problem has been defined, the next step is to decide what training data should be used. The training data is directly guided by the objective of the problem to be solved. For example, in the case of an e-mail classification system, it would be historical e-mails, related metadata, and a label marking each e-mail as spam or ham. 
For the problem of churn analysis, different data points collected about a customer such as product usage, support case, and so on, and a target label for whether a customer has churned or is active, together form the training data. Churn Analytics is a major problem area for a lot of businesses domains such as BFSI, telecommunications, and SaaS. Churn is applicable in circumstances where there is a concept of term-bound subscription. For example, postpaid telecom customers subscribe for a monthly term and can choose to renew or cancel their subscription. A customer who cancels this subscription is called a churned customer. Create and clean the training set The next step in a machine learning project is to gather and clean the dataset. The sample dataset needs to be representative of the real-world data, though all available data should be used, if possible. For example, if we assume that 10 percent of e-mails are spam, then our sample should ideally start with 10 percent spam and 90 percent ham. Thus, a set of input rows and corresponding target labels are gathered from data sources such as warehouses, or logs, or operational database systems. If possible, it is advisable to use all the data available rather than sampling the data. Cleaning data for data quality purposes forms part of this process. For example, training data inclusion criteria should also be explored in this step. An example of this in the case of customer analytics is to decide the minimum age or type of customers to use in the training set, for example including customers aged at least six months. Feature extraction Determine and create the feature set from the training data. Features or predictor variables are representations of the training data that is used as input to a model. Feature extraction involves transforming and summarizing that data. The performance of the learned model depends strongly on its input feature set. This process is primarily called feature extraction and requires good understanding of data and is aided by domain expertise. For example, for churn analytics, we use demography information from the CRM, product adoption (phone usage in case of telecom), age of customer, and payment and subscription history as the features for the model. The number of features extracted should neither be too large nor too small; feature extraction is more art than science and, optimum feature representation can be achieved after some iterations. Typically, the dataset is constructed such that each row corresponds to one variable outcome. For example, in the churn problem, the training dataset would be constructed so that every row represents a customer. Train the models We need to try out different supervised learning algorithms. This step is called training the model and is an iterative process where you might try building different training samples and try out different combinations of features. For example, we may choose to use support vector machines or decision trees depending upon the objective of the study, the type of problem, and the available data. Machine learning algorithms can be bucketed into groups based on the ability of a user to interpret how the predictions were arrived at. If the model can be interpreted easily, then it is called a white box, for example decision tree and logistic regression, and if the model cannot be interpreted easily, they belong to the black box models, for example support vector machine (SVM). 
If the objective is to gain insight, a white box model such as decision tree or logistic regression can be used, and if robust prediction is the criteria, then algorithms such as neural networks or support vector machines can be used. While training a model, there are a few techniques that we should keep in mind, like bagging and boosting. Bagging Bootstrap aggregating, which is also known as bagging, is a technique where the data is taken from the original dataset S times to make S new datasets. The datasets are the same size as the original. Each dataset is built by randomly selecting an example from the original with replacement. By with replacement we mean that you can select the same example more than once. This property allows you to have values in the new dataset that are repeated, and some values from the original won't be present in the new set. Bagging helps in reducing the variance of a model and can be used to train different models using the same datasets. The final conclusion is arrived at after considering the output of each model. For example, let's assume our data is a, b, c, d, e, f, g, and h. By sampling our data five times, we can create five different samples as follows: Sample 1: a, b, c, c, e, f, g, h Sample 2: a, b, c, d, d, f, g, h Sample 3: a, b, c, c, e, f, h, h Sample 4: a, b, c, e, e, f, g, h Sample 5: a, b, b, e, e, f, g, h As we sample with replacement, we get the same examples more than once. Now we can train five different models using the five sample datasets. Now, for the prediction; as each model will provide the output, let's assume classes are yes and no, and the final outcome would be the class with maximum votes. If three models say yes and two no, then the final prediction would be class yes. Boosting Boosting is a technique similar to bagging. In boosting and bagging, you always use the same type of classifier. But in boosting, the different classifiers are trained sequentially. Each new classifier is trained based on the performance of those already trained, but gives greater weight to examples that were misclassified by the previous classifier. Boosting focuses new classifiers in the sequence on previously misclassified data. Boosting also differs from bagging in its approach of calculating the final prediction. The output is calculated from a weighted sum of all classifiers, as opposed to the method of equal weights used in bagging. The weights assigned to the classifier output in boosting are based on the performance of the classifier in the previous iteration. Validation After collecting the training set and extracting the features, you need to train the model and validate it on unseen samples. There are many approaches for creating the unseen sample called the validation set. We will be discussing a couple of them shortly. Holdout-set validation One approach to creating the validation set is to divide the feature set into train and test samples. We use the train set to train the model and test set to validate it. The actual percentage split varies from case to case but commonly it is split at 70 percent train and 30 percent test. It is also not uncommon to create three sets, train, test and validation set. Train and test set is created from data out of all considered time periods but the validation set is created from the most recent data. K-fold cross validation Another approach is to divide the data into k equal size folds or parts and then use k-1 of them for training and one for testing. 
The process is repeated k times so that each set is used as a validation set once and the metrics are collected over all the runs. The general standard is to use k as 10, which is called 10-fold cross-validation. Evaluation The objective of evaluation is to test the generalization of a classifier. By generalization, we mean how good the model performs on future data. Ideally, evaluation should be done on an unseen sample, separate to the validation sample or by cross-validation. There are standard metrics to evaluate a classifier against. There are a few things to consider while training a classifier that we should keep in mind. Bias-variance trade-off The first aspect to keep in mind is the trade-off between bias and variance. To understand the meaning of bias and variance, let's assume that we have several different, but equally good, training datasets for a specific supervised learning problem. We train different models using the same technique; for example, build different decision trees using the different training datasets available. Bias measures how far off in general a model's predictions are from the correct value. Bias can be measured as the average difference between a predicted output and its actual value. A learning algorithm is biased for a particular input X if, when trained on different training sets, it is incorrect when predicting the correct output for X. Variance is how greatly the predictions for a given point vary between different realizations of the model. A learning algorithm has high variance for a particular input X if it predicts different output values for X when trained on different training sets. Generally, there will be a trade-off between bias and variance. A learning algorithm with low bias must be flexible so that it can fit the data well. But if the learning algorithm is too flexible, it will fit each training dataset differently, and hence have high variance. A key aspect of many supervised learning methods is that they are able to adjust this trade-off between bias and variance. The plot on the top left is the scatter plot of the original data. The plot on the top right is a fit with high bias; the error in prediction in this case will be high. The bottom left image is a fit with high variance; the model is very flexible, and error on the training set is low but the prediction on unseen data will have a much higher degree of error as compared to the training set. The bottom right plot is an optimum fit with a good trade-off of bias and variance. The model explains the data well and will perform in a similar way for unseen data too. If the bias-variance trade-off is not optimized, it leads to problems of under-fitting and over-fitting. The plot shows a visual representation of the bias-variance trade-off. Over-fitting occurs when an estimator is too flexible and tries to fit the data too closely. High variance and low bias leads to over-fitting of data. Under-fitting occurs when a model is not flexible enough to capture the underlying trends in the observed data. Low variance and high bias leads to under-fitting of data. Function complexity and amount of training data The second aspect to consider is the amount of training data needed to properly represent the learning task. The amount of data required is proportional to the complexity of the data and learning task at hand. For example, if the features in the data have low interaction and are smaller in number, we could train a model with a small amount of data. 
In this case, a learning algorithm with high bias and low variance is better suited. But if the learning task at hand is complex and has a large number of features with higher degree of interaction, then a large amount of training data is required. In this case, a learning algorithm with low bias and high variance is better suited. It is difficult to actually determine the amount of data needed, but the complexity of the task provides some indications. Dimensionality of the input space A third aspect to consider is the dimensionality of the input space. By dimensionality, we mean the number of features the training set has. If the input feature set has a very high number of features, any machine learning algorithm will require a huge amount of data to build a good model. In practice, it is advisable to remove any extra dimensionality before training the model; this is likely to improve the accuracy of the learned function. Techniques like feature selection and dimensionality reduction can be used for this. Noise in data The fourth issue is noise. Noise refers to inaccuracies in data due to various issues. Noise can be present either in the predictor variables, or in the target variable. Both lead to model inaccuracies and reduce the generalization of the model. In practice, there are several approaches to alleviate noise in the data; first would be to identify and then remove the noisy training examples prior to training the supervised learning algorithm, and second would be to have an early stopping criteria to prevent over-fitting. Unsupervised learning Unsupervised learning deals with unlabeled data. The objective is to observe structure in data and find patterns. Tasks like cluster analysis, association rule mining, outlier detection, dimensionality reduction, and so on can be modeled as unsupervised learning problems. As the tasks involved in unsupervised learning vary vastly, there is no single process outline that we can follow. We will follow the process of some of the most common unsupervised learning problems. Cluster analysis Cluster analysis is a subset of unsupervised learning that aims to create groups of similar items from a set of items. Real life examples could be clustering movies according to various attributes like genre, length, ratings, and so on. Cluster analysis helps us identify interesting groups of objects that we are interested in. It could be items we encounter in day-to-day life such as movies, songs according to taste, or interests of users in terms of their demography or purchasing patterns. Let's consider a small example so you understand what we mean by interesting groups and understand the power of clustering. We will use the Iris dataset, which is a standard dataset used for academic research and it contains five variables: sepal length, sepal width, petal length, petal width, and species with 150 observations. The first plot we see shows petal length against petal width. Each color represents a different species. The second plot is the groups identified by clustering the data. Looking at the plot, we can see that the plot of petal length against petal width clearly separates the species of the Iris flower and in the process, it clusters the group's flowers of the same species together. Cluster analysis can be used to identify interesting patterns in data. The process of clustering involves these four steps. We will discuss each of them in the section ahead. 
Objective
Feature representation
Algorithm for clustering
A stopping criterion
Objective
What do we want to cluster? This is an important question. Let's assume we have a large customer base for some kind of e-commerce site and we want to group the customers together. How do we want to group them? Do we want to group our users according to their demography, such as age, location, income, and so on, or are we interested in grouping them by their behavior, such as purchasing patterns? A clear objective is a good start, though it is not uncommon to start without an objective and see what can be done with the available data.
Feature representation
As with any machine learning task, feature representation is important for cluster analysis too. Creating derived features, summarizing data, and converting categorical variables to continuous variables are some of the common tasks. The feature representation needs to reflect the objective of clustering. For example, if the objective is to cluster users based upon purchasing behavior, then features should be derived from purchase transactions and user demography information. If the objective is to cluster documents, then features should be extracted from the text of the documents.
Feature normalization
To compare the feature vectors, we need to normalize them. Normalization could be across rows or across columns. In most cases, both are normalized.
Row normalization
The objective of normalizing rows is to make the objects to be clustered comparable. Let's assume we are clustering organizations based upon their e-mailing behavior. Organizations can be very large or very small, but the objective is to capture the e-mailing behavior irrespective of the size of the organization. In this scenario, we need to figure out a way to normalize the rows representing each organization so that they can be compared. In this case, dividing by the user count of each respective organization could give us a good feature representation. Row normalization is mostly driven by the business domain and requires domain expertise.
Column normalization
The range of data across columns varies across datasets. The unit could be different, the range of the columns could be different, or both. There are many ways of normalizing data, and which technique to use varies from case to case and depends upon the objective. A few of them are discussed here.
Rescaling
The simplest method is to rescale the range of features to make the features independent of each other. The aim is to scale the range to [0, 1] or [-1, 1]:
x' = (x - min(x)) / (max(x) - min(x))
Here x is the original value and x' is the rescaled value; this form maps the range to [0, 1].
Standardization
Feature standardization allows the values of each feature in the data to have zero mean and unit variance. In general, we first calculate the mean and standard deviation for each feature and then subtract the mean from each feature. Then, we divide the mean-subtracted values of each feature by its standard deviation:
Xs = (X - mean(X)) / standard deviation(X)
A notion of similarity and dissimilarity
Once we have the objective defined, it leads to the idea of similarity and dissimilarity of objects or data points. Since we need to group things together based on similarity, we need a way to measure similarity. Likewise, to keep dissimilar things apart, we need a notion of dissimilarity. This idea is represented in machine learning by the idea of a distance measure. Distance measure, as the name suggests, is used to measure the distance between two objects or data points.
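Before moving on to specific distance measures, here is a small sketch of the two column-normalization schemes just described. It is plain Python with NumPy, not Mahout code, and the feature values are made up purely for illustration.

import numpy as np

ages = np.array([23.0, 35.0, 52.0, 41.0, 29.0])
incomes = np.array([32000.0, 58000.0, 91000.0, 47000.0, 39000.0])

# Rescaling (min-max): x' = (x - min(x)) / (max(x) - min(x)) maps every value into [0, 1]
ages_rescaled = (ages - ages.min()) / (ages.max() - ages.min())

# Standardization: Xs = (X - mean(X)) / standard deviation(X) gives zero mean and unit variance
incomes_standardized = (incomes - incomes.mean()) / incomes.std()

print(ages_rescaled)
print(incomes_standardized)

After normalization like this, the distance measures described next compare features on an equal footing instead of letting the column with the largest range dominate.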
Euclidean distance measure
The Euclidean distance measure is the most commonly used and intuitive distance measure:
d(p, q) = sqrt((p1 - q1)^2 + (p2 - q2)^2 + ... + (pn - qn)^2)
Squared Euclidean distance measure
The standard Euclidean distance, when squared, places progressively greater weight on objects that are farther apart as compared to the nearer objects. The equation to calculate the squared Euclidean measure is shown here:
d(p, q) = (p1 - q1)^2 + (p2 - q2)^2 + ... + (pn - qn)^2
Manhattan distance measure
The Manhattan distance measure is defined as the sum of the absolute differences of the coordinates of two points; it is the distance between two points measured along axes at right angles. In a plane with p1 at (x1, y1) and p2 at (x2, y2), it is:
|x1 - x2| + |y1 - y2|
Cosine distance measure
The cosine distance measure measures the angle between two points. When this angle is small, the vectors must be pointing in the same direction, and so in some sense the points are close. The cosine of this angle is near one when the angle is small, and decreases as the angle gets larger. The cosine distance equation subtracts the cosine value from one in order to give a proper distance, which is 0 when close and larger otherwise. The cosine distance measure doesn't account for the length of the two vectors; all that matters is that the points are in the same direction from the origin. Also note that the cosine distance measure ranges from 0.0, if the two vectors are along the same direction, to 2.0, when the two vectors are in opposite directions:
d(p, q) = 1 - (p . q) / (|p| |q|)
Tanimoto distance measure
The Tanimoto distance measure, like the cosine distance measure, measures the angle between two points, but it also takes into account the relative distance between the points:
d(p, q) = 1 - (p . q) / (|p|^2 + |q|^2 - p . q)
Apart from the standard distance measures, we can also define our own distance measure. Custom distance measures can be explored when the existing ones are not able to measure the similarity between items.
Algorithm for clustering
The type of clustering algorithm to be used is driven by the objective of the problem at hand. There are several options, and the predominant ones are density-based clustering, distance-based clustering, distribution-based clustering, and hierarchical clustering. The choice of algorithm to be used depends upon the objective of the problem.
A stopping criterion
We need to know when to stop the clustering process. The stopping criterion can be decided in different ways: one way is to stop when the cluster centroids don't move beyond a certain margin after multiple iterations, a second way is to stop when the density of the clusters has stabilized, and a third way is to stop after a fixed number of iterations, for example after 100 iterations. The stopping criterion depends upon the algorithm used, the goal being to stop when we have good enough clusters.
Logistic regression
Logistic regression is a probabilistic classification model. It provides the probability of a particular instance belonging to a class, and it is used to predict the probability of binary outcomes. Logistic regression is computationally inexpensive, relatively easy to implement, and can be interpreted easily. Logistic regression belongs to the class of discriminative models; the other class of algorithms is generative models. Let's try to understand the differences between the two. Suppose we have some input data represented by X and a target variable Y; the learning task is P(Y|X), finding the conditional probability of Y occurring given X.
A generative model concerns itself with learning the joint probability P(Y, X), whereas a discriminative model directly learns the conditional probability P(Y|X) from the training set, which is the actual objective of classification. A generative model first learns P(Y, X) and then gets to P(Y|X) by conditioning on X using Bayes' theorem. In more intuitive terms, generative models first learn the distribution of the data; they model how the data is actually generated. Discriminative models, however, don't try to learn the underlying data distribution; they are concerned with finding the decision boundaries for the classification. Since generative models learn the distribution, it is possible to generate synthetic samples of X, Y with them. This is not possible with discriminative models. Some common examples of generative and discriminative models are as follows:
Generative: naïve Bayes, Latent Dirichlet allocation
Discriminative: logistic regression, SVM, neural networks
Logistic regression belongs to the family of statistical techniques called regression. For regression problems, and a few other optimization problems, we first define a hypothesis, then define a cost function, and optimize it using an optimization algorithm such as gradient descent. The optimization algorithm tries to find the regression coefficients that best fit the data. Let's assume that the target variable is Y and the predictor variable or feature is X. Any regression problem starts by defining the hypothesis function, for example an equation of the predictor variable such as h(x) = w0 + w1*x, then defines a cost function, and then tweaks the weights; in this case, w0 and w1 are tweaked to minimize or maximize the cost function by using an optimization algorithm. For logistic regression, the predicted target needs to fall between zero and one. We start by defining the hypothesis function for it:
h(x) = f(w0 + w1*x1 + ... + wn*xn), where f(z) = 1 / (1 + e^(-z))
Here, f(z) is the sigmoid or logistic function that has a range of zero to one, x is the matrix of features, and w is the vector of weights. The next step is to define the cost function, which measures the difference between the predicted and actual values. The objective of the optimization algorithm here is to find w, that is, to fit the regression coefficients so that the difference between the predicted and actual target values is minimized. We will discuss gradient descent as the choice for the optimization algorithm shortly. To find the local minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient of that function at the current point. This will give us the optimum value of the vector w once we reach the stopping criterion. The stopping criterion is met when the change in the weight vector falls below a certain threshold, although sometimes it can be set to a predefined number of iterations. Logistic regression falls into the category of white box techniques and can be interpreted. Features or variables are of two major types, categorical and continuous, defined as follows:
Categorical variable: This is a variable or feature that can take on a limited, and usually fixed, number of possible values. For example, variables such as industry, zip code, and country are categorical variables.
Continuous variable: This is a variable that can take on any value within its range, between its minimum and maximum values. For example, variables such as age and price are continuous variables.
Mahout logistic regression command line
Mahout employs a modified version of gradient descent called stochastic gradient descent.
The gradient descent algorithm described previously uses the whole dataset on each update. This is fine with a hundred examples, but with billions of data points containing thousands of features, it is unnecessarily expensive in terms of computational resources. An alternative method is to update the weights using only one instance at a time. This is known as stochastic gradient descent, and it is an example of an online learning algorithm: we can incrementally update the classifier as new data comes in, rather than all at once. The all-at-once method is known as batch processing. We will now train and test a logistic regression algorithm using Mahout. We will also discuss both command line and code examples. The first step is to get the data and explore it.
Getting the data
The dataset required for this article is included in the code repository that comes with this book. It is present in the learningApacheMahout/data/chapter4 directory. If you wish to download the data yourself, it is available from the UCI repository, which hosts many datasets for machine learning. You can check out the other datasets available for further practice at http://archive.ics.uci.edu/ml/datasets.html. Create a folder in your home directory with the following commands:
cd $HOME
mkdir bank_data
cd bank_data
Download the data into the bank_data directory:
wget http://archive.ics.uci.edu/ml/machine-learning-databases/00222/bank-additional.zip
Unzip the file using whichever utility you like; we use unzip:
unzip bank-additional.zip
cd bank-additional
We are interested in the file bank-additional-full.csv. Copy the file to the learningApacheMahout/data/chapter4 directory. The file is semicolon-delimited, its values are enclosed in double quotes ("), and it has a header line with the column names. We will use sed to preprocess the data. The sed editor is a very powerful editor in Linux, and the command to use it is as follows:
sed -e 's/STRING_TO_REPLACE/STRING_TO_REPLACE_IT/g' fileName > Output_fileName
For in-place editing, the command is as follows:
sed -i 's/STRING_TO_REPLACE/STRING_TO_REPLACE_IT/g' fileName
The commands to replace ; with , and to remove the " characters are as follows:
sed -e 's/;/,/g' bank-additional-full.csv > input_bank_data.csv
sed -i 's/"//g' input_bank_data.csv
The dataset contains demographic and previous campaign-related data about each client, along with the outcome of whether or not the client subscribed to a term deposit. We are interested in training a model that can predict whether a client will subscribe to a term deposit, given the input data.
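Before looking at the input variables in detail, it may help to see what stochastic gradient descent does for logistic regression in the smallest possible setting. The following sketch is plain NumPy, not Mahout's OnlineLogisticRegression, and the feature values, learning rate, and number of passes are arbitrary illustrations.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data: each row is one client with two already-numeric features;
# y is 1 if the client subscribed to a term deposit, 0 otherwise.
X = np.array([[0.2, 1.0], [0.8, 0.3], [0.5, 0.9], [0.9, 0.1]])
y = np.array([1, 0, 1, 0])

w = np.zeros(X.shape[1])   # weight vector
b = 0.0                    # bias (intercept) term
rate = 0.1                 # learning rate

for _ in range(100):                 # passes over the data
    for xi, yi in zip(X, y):         # stochastic: update on one instance at a time
        error = sigmoid(np.dot(w, xi) + b) - yi
        w -= rate * error * xi
        b -= rate * error

print(sigmoid(X.dot(w) + b))         # predicted subscription probabilities

Mahout's trainer works on the same principle, but adds things such as feature hashing, regularization (the --lambda option), and an adaptive learning rate, which is why it scales to much larger datasets.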
The following table shows the various input variables along with their types:
age: the age of the client (numeric)
job: the type of job, for example entrepreneur, housemaid, or management (categorical)
marital: the marital status of the client (categorical)
education: the education level of the client (categorical)
default: whether the client has defaulted on credit (categorical)
housing: whether the client has a housing loan (categorical)
loan: whether the client has a personal loan (categorical)
contact: the contact communication type (categorical)
month: the last contact month of the year (categorical)
day_of_week: the last contact day of the week (categorical)
duration: the last contact duration, in seconds (numeric)
campaign: the number of contacts performed during this campaign (numeric)
pdays: the number of days that have passed since the last contact (numeric)
previous: the number of contacts performed before this campaign (numeric)
poutcome: the outcome of the previous marketing campaign (categorical)
emp.var.rate: the employment variation rate, a quarterly indicator (numeric)
cons.price.idx: the consumer price index, a monthly indicator (numeric)
cons.conf.idx: the consumer confidence index, a monthly indicator (numeric)
euribor3m: the euribor three-month rate, a daily indicator (numeric)
nr.employed: the number of employees, a quarterly indicator (numeric)
Model building via command line
Mahout provides a command-line implementation of logistic regression, and we will first build a model using it. Logistic regression in Mahout does not have a MapReduce implementation, but as it uses stochastic gradient descent, it is pretty fast, even for large datasets. The Mahout Java class is OnlineLogisticRegression in the org.apache.mahout.classifier.sgd package.
Splitting the dataset
To split a dataset, we can use the Mahout split command. Let's look at the split command's arguments:
mahout split --help
We need to remove the first line before running the split command, as the file contains a header line and the split command doesn't make any special allowance for header lines; the header would simply land on some arbitrary line in one of the split files. We first remove the header line from the input_bank_data.csv file:
sed -i '1d' input_bank_data.csv
mkdir input_bank
cp input_bank_data.csv input_bank
Logistic regression in Mahout is implemented for single-machine execution, so we set the variable MAHOUT_LOCAL to instruct Mahout to execute in the local mode:
export MAHOUT_LOCAL=TRUE
mahout split --input input_bank --trainingOutput train_data --testOutput test_data -xm sequential --randomSelectionPct 30
This creates different datasets, with the split based on the number passed to the --randomSelectionPct argument. The split command can run both on Hadoop and on the local file system. For the current execution, it runs in the local mode on the local file system and splits the data into two sets: 70 percent as the training set in the train_data directory and 30 percent as the test set in the test_data directory.
Next, we restore the header line to the train and test files as follows: sed -i '1s/^/age,job,marital,education,default,housing,loan,contact,month,day_of_week,duration,campaign,pdays,previous,poutcome,emp.var.rate,cons.price.idx,cons.conf.idx,euribor3m,nr.employed,yn/' train_data/input_bank_data.csv sed -i '1s/^/age,job,marital,education,default,housing,loan,contact,month,day_of_week,duration,campaign,pdays,previous,poutcome,emp.var.rate,cons.price.idx,cons.conf.idx,euribor3m,nr.employed,yn/' test_data/input_bank_data.csv Train the model command line option Let's have a look at some important and commonly used parameters and their descriptions: mahout trainlogistic ––help   --help print this list --quiet be extra quiet --input "input directory from where to get the training data" --output "output directory to store the model" --target "the name of the target variable" --categories "the number of target categories to be considered" --predictors "a list of predictor variables" --types "a list of predictor variables types (numeric, word or text)" --passes "the number of times to pass over the input data" --lambda "the amount of coeffiecient decay to use" --rate     "learningRate the learning rate" --noBias "do not include a bias term" --features "the number of internal hashed features to use"   mahout trainlogistic --input train_data/input_bank_data.csv --output model --target y --predictors age job marital education default housing loan contact month day_of_week duration campaign pdays previous poutcome emp.var.rate cons.price.idx cons.conf.idx euribor3m nr.employed --types n w w w w w w w w w n n n n w n n n n n --features 20 --passes 100 --rate 50 --categories 2 We pass the input filename and the output folder name, identify the target variable name using --target option, the predictors using the --predictors option, and the variable or predictor type using --types option. Numeric predictors are represented using 'n', and categorical variables are predicted using 'w'. Learning rate passed using --rate is used by gradient descent to determine the step size for each descent. We pass the maximum number of passes over data as 100 and categories as 2. The output is given below, which represents 'y', the target variable, as a sum of predictor variables multiplied by coefficient or weights. As we have not included the --noBias option, we see the intercept term in the equation: y ~ -990.322*Intercept Term + -131.624*age + -11.436*campaign + -990.322*cons.conf.idx + -14.006*cons.price.idx + -15.447*contact=cellular + -9.738*contact=telephone + 5.943*day_of_week=fri + -988.624*day_of_week=mon + 10.551*day_of_week=thu + 11.177*day_of_week=tue + -131.624*day_of_week=wed + -8.061*default=no + 12.301*default=unknown + -131.541*default=yes + 6210.316*duration + -17.755*education=basic.4y + 4.618*education=basic.6y + 8.780*education=basic.9y + -11.501*education=high.school + 0.492*education=illiterate + 17.412*education=professional.course + 6202.572*education=university.degree + -979.771*education=unknown + -189.978*emp.var.rate + -6.319*euribor3m + -21.495*housing=no + -14.435*housing=unknown + 6210.316*housing=yes + -190.295*job=admin. 
+ 23.169*job=blue-collar + 6202.200*job=entrepreneur + 6202.200*job=housemaid + -3.208*job=management + -15.447*job=retired + 1.781*job=self-employed + 11.396*job=services + -6.637*job=student + 6202.572*job=technician + -9.976*job=unemployed + -4.575*job=unknown + -12.143*loan=no + -0.386*loan=unknown + -197.722*loan=yes + -12.308*marital=divorced + -9.185*marital=married + -1004.328*marital=single + 8.559*marital=unknown + -11.501*month=apr + 9.110*month=aug + -1180.300*month=dec + -189.978*month=jul + 14.316*month=jun + -124.764*month=mar + 6203.997*month=may + -0.884*month=nov + -9.761*month=oct + 12.301*month=sep + -990.322*nr.employed + -189.978*pdays + -14.323*poutcome=failure + 4.874*poutcome=nonexistent + -7.191*poutcome=success + 1.698*previous Interpreting the output The output of the trainlogistic command is an equation representing the sum of all predictor variables multiplied by their respective coefficient. The coefficients give the change in the log-odds of the outcome for one unit increase in the corresponding feature or predictor variable. Odds are represented as the ratio of probabilities, and they express the relative probabilities of occurrence or nonoccurrence of an event. If we take the base 10 logarithm of odds and multiply the results by 10, it gives us the log-odds. Let's take an example to understand it better. Let's assume that the probability of some event E occurring is 75 percent: P(E)=75%=75/100=3/4 The probability of E not happening is as follows: 1-P(A)=25%=25/100=1/4 The odds in favor of E occurring are P(E)/(1-P(E))=3:1 and odds against it would be 1:3. This shows that the event is three times more likely to occur than to not occur. Log-odds would be 10*log(3). For example, a unit increase in the age will decrease the log-odds of the client subscribing to a term deposit by 97.148 times, whereas a unit increase in cons.conf.idx will increase the log-odds by 1051.996. Here, the change is measured by keeping other variables at the same value. Testing the model After the model is trained, it's time to test the model's performance by using a validation set. Mahout has the runlogistic command for the same, the options are as follows: mahout runlogistic ––help We run the following command on the command line: mahout runlogistic --auc --confusion --input train_data/input_bank_data.csv --model model   AUC = 0.59 confusion: [[25189.0, 2613.0], [424.0, 606.0]] entropy: [[NaN, NaN], [-45.3, -7.1]] To get the scores for each instance, we use the --scores option as follows: mahout runlogistic --scores --input train_data/input_bank_data.csv --model model To test the model on the test data, we will pass on the test file created during the split process as follows: mahout runlogistic --auc --confusion --input test_data/input_bank_data.csv --model model   AUC = 0.60 confusion: [[10743.0, 1118.0], [192.0, 303.0]] entropy: [[NaN, NaN], [-45.2, -7.5]] Prediction Mahout doesn't have an out of the box command line for implementation of logistic regression for prediction of new samples. Note that the new samples for the prediction won't have the target label y, we need to predict that value. There is a way to work around this, though; we can use mahout runlogistic for generating a prediction by adding a dummy column as the y target variable and adding some random values. The runlogistic command expects the target variable to be present, hence the dummy columns are added. We can then get the predicted score using the --scores option. 
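To connect the coefficients, scores, and probabilities discussed above, the short sketch below shows how a linear score from a logistic regression model maps to a probability and to odds. The intercept, coefficients, and feature values are invented for illustration; in the usual formulation, the linear score is the natural logarithm of the odds.

import math

# Hypothetical linear score for one client:
# intercept + sum(coefficient * feature value), with made-up numbers
score = -1.2 + 0.8 * 1.0 + 0.05 * 3.0

probability = 1.0 / (1.0 + math.exp(-score))   # logistic (sigmoid) function
odds = probability / (1.0 - probability)       # odds in favour of subscribing

print("linear score: %.3f" % score)
print("probability of subscribing: %.3f" % probability)
print("odds in favour: %.3f" % odds)

Reading the model this way is what makes logistic regression a white box technique: each coefficient tells you how the score, and therefore the odds, moves when its feature changes by one unit.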
Summary In this article, we covered the basic machine learning concepts. We also saw the logistic regression example in Mahout. Resources for Article:   Further resources on this subject: Implementing the Naïve Bayes classifier in Mahout [article] Learning Random Forest Using Mahout [article] Understanding the HBase Ecosystem [article]

article-image-geocoding-address-based-data
Packt
30 Mar 2015
7 min read
Save for later

Geocoding Address-based Data

Packt
30 Mar 2015
7 min read
In this article by Kurt Menke, GISP, Dr. Richard Smith Jr., GISP, Dr. Luigi Pirelli, and Dr. John Van Hoesen, GISP, authors of the book Mastering QGIS, we'll have a look at how to geocode address-based data using QGIS and MMQGIS. (For more resources related to this topic, see here.) Geocoding addresses has many applications, such as mapping the customer base for a store, members of an organization, public health records, or the incidence of crime. Once mapped, the points can be used in many ways to generate information. For example, they can be used as inputs to generate density surfaces, linked to parcels of land, and characterized by socio-economic data. They may also be an important component of a cadastral information system. An address geocoding operation typically involves the tabular address data and a street network dataset. The street network needs to have attribute fields for the address ranges on the left- and right-hand side of each road segment. You can geocode within QGIS using a plugin named MMQGIS (http://michaelminn.com/linux/mmqgis/). MMQGIS has many useful tools. For geocoding, we will use the tools found under MMQGIS | Geocode. There are two tools there: Geocode CSV with Google/OpenStreetMap and Geocode from Street Layer, as shown in the following screenshot. The first tool allows you to geocode a table of addresses using either the Google Maps API or the OpenStreetMap Nominatim web service. This tool requires an Internet connection but no local street network data, as the web services provide the street network. The second tool requires a local street network dataset with address range attributes to geocode the address data:
How address geocoding works
The basic mechanics of address geocoding are straightforward. The street network GIS data layer has attribute columns containing the address ranges on both the even and odd sides of every street segment. In the following example, you can see a piece of the attribute table for the Streets.shp sample data. The columns LEFTLOW, LEFTHIGH, RIGHTLOW, and RIGHTHIGH contain the address ranges for each street segment: In the following example, we are looking at Easy Street. On the odd side of the street, the addresses range from 101 to 199. On the even side, they range from 102 to 200. If you wanted to map 150 Easy Street, QGIS would assume that the address is located halfway down the even side of that block. Similarly, 175 Easy Street would be on the odd side of the street, three quarters of the way down the block. Address geocoding assumes that the addresses are evenly spaced along the linear network. QGIS should place the address point very close to its actual position, but due to variability in lot sizes, not every address point will be perfectly positioned. (A small Python sketch of this interpolation idea is shown after the worked example below.) Now that you've learned the basics, let's work through an example. Here we will geocode addresses using web services. The output will be a point shapefile containing all the attribute fields found in the source Addresses.csv file.
An example – geocoding using web services
Here are the steps for geocoding the Addresses.csv sample data using web services. Load the Addresses.csv and the Streets.shp sample data into QGIS Desktop. Open Addresses.csv and examine the table. These are addresses of municipal facilities. Notice that the street address (for example, 150 Easy Street) is contained in a single field. There are also fields for the city, state, and country.
Since both Google and OpenStreetMap are global services, it is wise to include such fields so that the services can narrow down the geography. Install and enable the MMQGIS plugin. Navigate to MMQGIS | Geocode | Geocode CSV with Google/OpenStreetMap. The Web Service Geocode dialog window will open. Select Input CSV File (UTF-8) by clicking on Browse… and locating the delimited text file on your system. Select the address fields by clicking on the drop-down menu and identifying the Address Field, City Field, State Field, and Country Field fields. MMQGIS may identify some or all of these fields by default if they are named with logical names such as Address or State. Choose the web service. Name the output shapefile by clicking on Browse…. Name Not Found Output List by clicking on Browse…. Any records that are not matched will be written to this file. This allows you to easily see and troubleshoot any unmapped records. Click on OK. The status of the geocoding operation can be seen in the lower-left corner of QGIS. The word Geocoding will be displayed, followed by the number of records that have been processed. The output will be a point shapefile and a CSV file listing that addresses were not matched. Two additional attribute columns will be added to the output address point shapefile: addrtype and addrlocat. These fields provide information on how the web geocoding service obtained the location. These may be useful for accuracy assessment. Addrtype is the Google <type> element or the OpenStreetMap class attribute. This will indicate what kind of address type this is (highway, locality, museum, neighborhood, park, place, premise, route, train_station, university etc.). Addrlocat is the Google <location_type> element or OpenStreetMap type attribute. This indicates the relationship of the coordinates to the addressed feature (approximate, geometric center, node, relation, rooftop, way interpolation, and so on). If the web service returns more than one location for an address, the first of the locations will be used as the output feature. Use of this plugin requires an active Internet connection. Google places both rate and volume restrictions on the number of addresses that can be geocoded within various time limits. You should visit the Google Geocoding API website: (http://code.google.com/apis/maps/documentation/geocoding/) for more details, and current information and Google's terms of service. Geocoding via these web services can be slow. If you don't get the desired results with one service, try the other. Geocoding operations rarely have 100% success. Street names in the street shapefile must match the street names in the CSV file exactly. Any discrepancies between the name of a street in the address table, and the street attribute table will lower the geocoding success rate. The following image shows the results of geocoding addresses via street address ranges. The addresses are shown with the street network used in the geocoding operation: Geocoding is often an iterative process. After the initial geocoding operation, you can review the Not Found CSV file. If it's empty then all the records were matched. If it has records in it, compare them with the attributes of the streets layer. This will help you determine why those records were not mapped. It may be due to inconsistencies in the spelling of street names. It may also be due to a street centerline layer that is not as current as the addresses. 
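As mentioned earlier, the placement of each point relies on the assumption that addresses are evenly spaced along a street segment. The following small Python sketch, which is not MMQGIS code, shows that interpolation idea with invented coordinates and the Easy Street ranges used above.

def interpolate_address(house_number, low, high, start_xy, end_xy):
    # Fraction of the way along the segment, assuming evenly spaced addresses
    fraction = 0.5 if high == low else (house_number - low) / float(high - low)
    x = start_xy[0] + fraction * (end_xy[0] - start_xy[0])
    y = start_xy[1] + fraction * (end_xy[1] - start_xy[1])
    return x, y

# Even side of Easy Street: addresses 102-200 along a 100-unit-long segment
print(interpolate_address(150, 102, 200, (0.0, 0.0), (100.0, 0.0)))
# 150 lands roughly halfway down the block, just as described earlier

Real street geocoders also have to pick the correct side of the street and offset the point from the centerline, but this even-spacing interpolation is the core idea behind street-range geocoding tools such as Geocode from Street Layer.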
Once the errors have been identified they can be corrected by editing the data, or obtaining a different street centreline dataset. The geocoding operation can be re-run on those unmatched addresses. This process can be repeated until all records are matched. Use the Identify tool to inspect the mapped points, and the roads, to ensure that the operation was successful. Never take a GIS operation for granted. Check your results with a critical eye. Summary This article introduced you to the process of address geocoding using QGIS and the MMQGIS plugin. Resources for Article: Further resources on this subject: Editing attributes [article] How Vector Features are Displayed [article] QGIS Feature Selection Tools [article]