How-To Tutorials

11 Nov 2013

2 min read

Wrapping OpenCV


(For more resources related to this topic, see here.)

Architecture overview

In this section we will examine and compare the architectures of OpenCV and Emgu CV.

OpenCV

In the hello-world project, we saw that our code depends on the bin folder of the Emgu library that we installed. The files in that folder whose names start with opencv_ are OpenCV DLLs, so Emgu CV users need some basic knowledge of OpenCV.

OpenCV is broadly structured into five main components. Four of them are described as follows:

- CV includes the computer vision and basic image-processing algorithms. All the methods for basic processes are found here.
- ML is short for Machine Learning. It contains popular machine learning algorithms, along with clustering tools and statistical classifiers.
- HighGUI is designed for building user interfaces and for loading and storing media data.
- CXCore is the most important one. This component provides all the basic data structures and contents.

The components can be seen in the following diagram:

The preceding structure map does not include the fifth component, CvAux, which covers many areas. It can be divided into two parts: defunct areas and experimental algorithms. CvAux is not particularly well documented in the Wiki, but it covers many features. Some of them may migrate to CV in the future; others probably never will.

Emgu CV

Emgu CV can be seen as two layers on top of OpenCV, which are explained as follows:

- Layer 1 is the basic layer. It includes enumeration, structure, and function mappings. The namespaces are direct wrappers of OpenCV components.
- Layer 2 is an upper layer. It takes advantage of the .NET framework and mixes the classes together. It can be seen as the bridge from OpenCV to .NET.

The architecture of Emgu CV can be seen in the following diagram, which includes more details:

After we create our new Emgu CV project, the first thing we will do is add references.
Now we can see what those DLLs are used for:

- Emgu.Util.dll: A collection of .NET utilities
- Emgu.CV.dll: Basic image-processing algorithms from OpenCV
- Emgu.CV.UI.dll: Useful tools for Emgu controls
- Emgu.CV.GPU.dll: GPU processing (Nvidia Cuda)
- Emgu.CV.ML.dll: Machine learning algorithms


Packt

08 Nov 2013

8 min read

Building a To-do List with Ajax


(For more resources related to this topic, see here.)

Creating and migrating our to-do list's database

As you know, migrations are very helpful for controlling development steps. We'll use migrations in this article. To create our first migration, type the following command:

    php artisan migrate:make create_todos_table --table=todos --create

When you run this command, Artisan will generate a migration that creates a database table named todos. Now we should edit the migration file to add the necessary database table columns. When you open the app/database/migrations/ folder with a file manager, you will see the migration file in it. Let's open and edit the file as follows:

    <?php

    use Illuminate\Database\Migrations\Migration;
    // Needed for the Blueprint type hint in the closure below.
    use Illuminate\Database\Schema\Blueprint;

    class CreateTodosTable extends Migration {

        /**
         * Run the migrations.
         *
         * @return void
         */
        public function up()
        {
            // Schema::create() already issues the CREATE TABLE command.
            Schema::create('todos', function(Blueprint $table){
                $table->increments("id");
                $table->string("title", 255);
                $table->enum('status', array('0', '1'))->default('0');
                $table->timestamps();
            });
        }

        /**
         * Reverse the migrations.
         *
         * @return void
         */
        public function down()
        {
            Schema::drop("todos");
        }
    }

To build a simple to-do list, we need five columns:

- The id column will store the ID numbers of to-do tasks
- The title column will store a to-do task's title
- The status column will store the statuses of the tasks
- The created_at and updated_at columns will store the created and updated dates of tasks

If you write $table->timestamps() in the migration file, Laravel's migration class automatically creates the created_at and updated_at columns. As you know, to apply migrations, we should run the following command:

    php artisan migrate

After the command is run, if you check your database, you will see that our todos table and its columns have been created. Now we need to write our model.

Creating a todos model

To create a model, you should open the app/models/ directory with your file manager.
Create a file named Todo.php under the directory and write the following code:

    <?php
    class Todo extends Eloquent {
        protected $table = 'todos';
    }

Let's examine the Todo.php file. As you see, our Todo class extends Eloquent, which is the ORM (Object Relational Mapper) database class of Laravel. The protected $table = 'todos'; code tells Eloquent our model's table name. If we don't set the table variable, Eloquent uses the plural version of the lowercase model name as the table name, so this line isn't technically required. Now, our application needs a template file, so let's create it.

Creating the template

Laravel uses a template engine called Blade for static and application template files. Laravel loads the template files from the app/views/ directory, so we need to create our first template under this directory. Create a file with the name index.blade.php. The file contains the following code:

    <html>
    <head>
      <title>To-do List Application</title>
      <link rel="stylesheet" href="assets/css/style.css">
    </head>
    <body>
    <div class="container">
      <section id="data_section" class="todo">
        <ul class="todo-controls">
          <li><img src="/assets/img/add.png" width="14px" onClick="show_form('add_task');" /></li>
        </ul>
        <ul id="task_list" class="todo-list">
        @foreach($todos as $todo)
          @if($todo->status)
            <li id="{{$todo->id}}" class="done">
              <a href="#" class="toggle"></a>
              <span id="span_{{$todo->id}}">{{$todo->title}}</span>
              <a href="#" onClick="delete_task('{{$todo->id}}');" class="icon-delete">Delete</a>
              <a href="#" onClick="edit_task('{{$todo->id}}', '{{$todo->title}}');" class="icon-edit">Edit</a></li>
          @else
            <li id="{{$todo->id}}">
              <a href="#" onClick="task_done('{{$todo->id}}');" class="toggle"></a>
              <span id="span_{{$todo->id}}">{{$todo->title}}</span>
              <a href="#" onClick="delete_task('{{$todo->id}}');" class="icon-delete">Delete</a>
              <a href="#" onClick="edit_task('{{$todo->id}}','{{$todo->title}}');" class="icon-edit">Edit</a></li>
          @endif
        @endforeach
        </ul>
      </section>
      <section id="form_section">
        <form id="add_task" class="todo" style="display:none">
          <input id="task_title" type="text" name="title" placeholder="Enter a task name" value=""/>
          <button name="submit">Add Task</button>
        </form>
        <form id="edit_task" class="todo" style="display:none">
          <input id="edit_task_id" type="hidden" value="" />
          <input id="edit_task_title" type="text" name="title" value="" />
          <button name="submit">Edit Task</button>
        </form>
      </section>
    </div>
    <script src="http://code.jquery.com/jquery-latest.min.js" type="text/javascript"></script>
    <script src="assets/js/todo.js" type="text/javascript"></script>
    </body>
    </html>

The preceding code may be difficult to understand if you're writing a Blade template for the first time, so let's examine it. You see a foreach loop in the file. This statement loops over our todo records. We will say more about it when we create our controller in this article. The if and else statements separate finished tasks from waiting tasks, so that each kind can be styled differently.

We need one more template file for appending new records to the task list on the fly. Create a file with the name ajaxData.blade.php under the app/views/ folder. The file contains the following code:

    @foreach($todos as $todo)
      <li id="{{$todo->id}}">
        <a href="#" onClick="task_done('{{$todo->id}}');" class="toggle"></a>
        <span id="span_{{$todo->id}}">{{$todo->title}}</span>
        <a href="#" onClick="delete_task('{{$todo->id}}');" class="icon-delete">Delete</a>
        <a href="#" onClick="edit_task('{{$todo->id}}','{{$todo->title}}');" class="icon-edit">Edit</a></li>
    @endforeach

Also, you see the /assets/ directory in the source path of static files. When you look at the app/views directory, there is no directory named assets. Laravel separates the system files from the public files. Publicly accessible files stay under the public folder in the project root, so you should create a directory under your public folder for asset files.
We recommend working with this kind of organized folder structure for developing tidy and easy-to-read code. Finally, you see that we are loading jQuery from its main website. We also recommend this way of getting the latest stable jQuery in your application.

You can style your application as you wish, so we'll not examine styling code here. We are putting our style.css file under /public/assets/css/.

For performing Ajax requests, we need some JavaScript. This code posts our add_task and edit_task forms and updates the list when our tasks are completed. Let's create a JavaScript file with the name todo.js in /public/assets/js/. The file contains the following code:

    // Mark a task as done on the server, then style it as done.
    function task_done(id){
      $.get("/done/"+id, function(data) {
        if(data=="OK"){
          $("#"+id).addClass("done");
        }
      });
    }

    // Delete a task on the server, then remove it from the list.
    function delete_task(id){
      $.get("/delete/"+id, function(data) {
        if(data=="OK"){
          var target = $("#"+id);
          target.hide('slow', function(){ target.remove(); });
        }
      });
    }

    // Hide every form, then show the requested one.
    function show_form(form_id){
      $("form").hide();
      $('#'+form_id).show("slow");
    }

    // Fill the edit form with the task's data and show it.
    function edit_task(id,title){
      $("#edit_task_id").val(id);
      $("#edit_task_title").val(title);
      show_form('edit_task');
    }

    $('#add_task').submit(function(event) {
      /* stop form from submitting normally */
      event.preventDefault();
      var title = $('#task_title').val();
      if(title){
        //ajax post the form
        $.post("/add", {title: title}).done(function(data) {
          $('#add_task').hide("slow");
          $("#task_list").append(data);
        });
      }
      else{
        alert("Please give a title to task");
      }
    });

    // The handler must receive the event object to call preventDefault().
    $('#edit_task').submit(function(event) {
      /* stop form from submitting normally */
      event.preventDefault();
      var task_id = $('#edit_task_id').val();
      var title = $('#edit_task_title').val();
      var current_title = $("#span_"+task_id).text();
      var new_title = current_title.replace(current_title, title);
      if(title){
        //ajax post the form
        $.post("/update/"+task_id, {title: title}).done(function(data) {
          $('#edit_task').hide("slow");
          $("#span_"+task_id).text(new_title);
        });
      }
      else{
        alert("Please give a title to task");
      }
    });

Let's examine the JavaScript file.
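As a starting point, the requests that todo.js sends can be summarized in one small lookup function. This is only a sketch: the function name taskRequest and the shape of the returned object are hypothetical and not part of todo.js, while the URLs and HTTP methods are the ones the file uses.

```javascript
// Hypothetical helper (not part of todo.js): maps each action performed in
// todo.js to the HTTP method and URL that the file requests.
function taskRequest(action, id, title) {
  switch (action) {
    case "done":
      return { method: "GET", url: "/done/" + id };
    case "delete":
      return { method: "GET", url: "/delete/" + id };
    case "add":
      return { method: "POST", url: "/add", data: { title: title } };
    case "update":
      return { method: "POST", url: "/update/" + id, data: { title: title } };
    default:
      throw new Error("Unknown action: " + action);
  }
}

// Example: the request sent when task 7 is renamed.
console.log(taskRequest("update", 7, "Buy milk"));
```

Reading the file with this mapping in mind makes it clear that the server side will need four routes: two that answer GET requests and two that answer POST requests.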


Packt

08 Nov 2013

8 min read

Installing Gideros


(For more resources related to this topic, see here.) About Gideros Gideros is a set of software packages created and managed by a company named Gideros Mobile. It provides developers with the ability to create 2D games for multiple platforms by reusing the same code. Games created with Gideros run as native applications, thus having all the benefits of high performance and the utilization of the hardware power of a mobile device. Gideros uses Lua as its programming language, which is a lightweight scripting language with an easy learning curve and it is quite popular in the context of game development. A few of the greatest Gideros features are as follows: Its rapid prototyping and fast development time by providing a single-click on-device testing that enables you to compile and run your game from your computer to device in an instant A clean object-oriented approach that enables you to write clean and reusable code Additionally, Gideros is not limited to its provided API and can be extended to offer virtually any native platform features through its plugin system You can use all of these to create and even publish your game for free, if you don't mind a small Gideros splash screen being shown before your game starts Installing Gideros Currently, Gideros has no registration requirements for downloading its SDK, so you can easily navigate to their download page (http://giderosmobile.com/download) and download the version that is suitable for your operating system. As Gideros can be used on Linux only using the WINE emulator, it means that even for Linux you have to download the Windows version of Gideros. So, to sum it up: Download the Windows version for Windows and Linux OS Download the Mac version for OS X Gideros consists of multiple programs providing you with a basic package needed to develop your own mobile games. 
This software package includes the following components:

- Gideros Studio: A lightweight IDE to manage Gideros projects
- Gideros Player: Fast and lightweight desktop, iOS, and Android players that run your apps with one click when testing
- Gideros Texture Packer: Used to pack multiple textures into one texture for faster texture rendering
- Gideros Font Creator: Used to create bitmap fonts from different font formats for faster font rendering
- Gideros License Manager: Used to license your downloaded copy of Gideros before exporting a project (required even for free accounts)
- An offline copy of the Gideros documentation and Reference API to get you started

Creating your first project

After you have downloaded and installed Gideros, you can try to create your first Gideros project. Although Gideros is IDE independent, and a lot of other IDEs such as Lua Glider, Zero Brane, IntelliJ IDEA, and even Sublime can support Gideros, I would recommend that first-time users choose the provided Gideros Studio. That is what we will be using in this article.

Trying out Gideros Studio

You should note that I will be using the Windows version for screenshots and explanations, but Gideros Studio on other operating systems is quite similar, if not exactly the same. Therefore, it should not cause any confusion if you are using other versions of Gideros.

When you open Gideros Studio, you will see a lot of different sections, which we will call panes. The largest pane will be the Start Page, which will provide you with the following options:

- Create New Project
- Access the offline Getting Started guide
- Access the offline Reference Manual
- Browse and try out Gideros Example Projects

Go ahead and click on Create New Project; a New Project dialog will open. Now enter the name of your project, for example, New Project. Change the location of the project if you want to, or leave it set to the default value, and click on OK when you are ready.
Note that the Start Page is automatically closed and the space occupied by the Start Page is now free. This will be your coding pane, where all the code will be displayed. But first let's draw our attention to the Project pane, where you can see your chosen project name inside. In this pane, you will manage all the files used by your app. One important thing to note is that file/folder structure in Gideros Project pane is completely independent from your filesystem. This means that you will have to add files manually to the Gideros Studio Project pane. They won't show up automatically when you copy them into the project folder. And in your filesystem, files and folders may be organized completely different than those in Gideros Studio. This feature gives you the flexibility of managing multiple projects with the same code or asset base. When you, for example, want to include specific things in the iOS version of the game, which Android won't have, you can create two different projects in the same project directory, which could reuse the same files and simultaneously have their own independent, platform-specific files. So let's see how it actually works. Right-click on your project name inside the Project pane and select Add New File.... It will pop up the Add New File dialog. Like in many Lua development environments, an application should start with a main.lua file; so name your file main.lua and click on OK. You will now see that main.lua was added to your Project pane. And if you check the directory of your project in your filesystem, you will see that it also contains the main.lua file. Now double-click on main.lua inside the Project pane and it will open this file inside the code pane, where you can write a code for it. So let's try it out. Write a simple line of code: print("Hello world!") What this line will do is simply print out the provided string (Hello world!) inside the output console. 
Now save the project by either using the File menu or a diskette icon on the toolbar and let's run this project on a local desktop player. Using the Gideros desktop player To run our app, we first need to launch Gideros Player by clicking on a small joystick icon on the toolbar. This will open up the Gideros desktop player. The default screen of Gideros Player shows the current version of Gideros used and the IP address the player is bound to. Additionally, the desktop player provides different customizations: You can make it appear on the top of every window by navigating to View | Always on Top. You can change the zoom by navigating to View | Zoom. It is helpful when running the player in high resolutions, which might not fit the screen. You can select the orientation (portrait or landscape) of the player by navigating to Hardware | Orientation, to suit the needs of your app. You can provide the resolution you want to test your app in by navigating to Hardware | Resolution. It provides the most popular resolution templates to choose from. You can also set the frame rate of your app by navigating to Hardware | Frame Rate. Resolution selected in Gideros Player settings corresponds to the physical device you want to test your application on. All these options give you the flexibility to test your app across different device configurations from within one single desktop player. Now when the player is launched, you should see that the start and stop buttons of Gideros Studio are now enabled. And to run your project, all you need to do is click on the start button. You might need to launch Gideros Player and Gideros Studio with proper permissions and even add them to your Antivirus or Firewall's exceptions list to allow them to connect. The IP address and Gideros version of the player should disappear and you should only see a white screen there. That is because we did not actually display any graphical object as image. 
But what we did was printing some information to the console. So let's check the Output pane in the Gideros Studio. As you see the Output pane, there are some information messages, like the fact that main.lua was uploaded and the uploading process to the Gideros Player was finished successfully; but it also displays any text we pass to Lua print command, as in our case it was Hello world!. The Output pane is very handy for a simple debugging process by printing out the information using the print command. It also provides the error information if something is wrong with the project and it cannot be built. Now when we know what an Output pane is, let's actually display something on the player's screen. Summary In this article, you've learned a few features about Gideros Studio, such as installing Gideros on your machine, creating your first project, how to use the Gideros Player, and trying out your first project. Resources for Article: Further resources on this subject: Getting Started with PlayStation Mobile [Article] Getting Started with Marmalade [Article] Getting Started with GameSalad [Article]


Packt

06 Nov 2013

9 min read

Dynamic POM


(For more resources related to this topic, see here.)

Case study

Our project meets the following requirements:

- It depends on org.codehaus.jedi:jedi-XXX:3.0.5, where XXX is related to the JDK version, that is, either jdk5 or jdk6.
- The project is built and run on three different environments: PRODuction, UAT, and DEVelopment.
- The underlying database differs according to the environment: PostgreSQL in PROD, MySQL in UAT, and HSQLDB in DEV.
- Besides, the connection is set in a Spring file, which can be spring-PROD.xml, spring-UAT.xml, or spring-DEV.xml, all being in the same src/main/resource folder.

The first bullet point can be easily answered using a jdk.version property. The dependency is then declared as follows:

    <dependency>
      <groupId>org.codehaus.jedi</groupId>
      <artifactId>jedi-${jdk.version}</artifactId>
      <version>${jedi.version}</version>
    </dependency>

Likewise, the fourth bullet point is resolved by specifying a resource folder:

    <resources>
      <resource>
        <directory>src/main/resource</directory>
        <includes>
          <include>**/*-${environment}.xml</include>
        </includes>
      </resource>
    </resources>

Then, we will have to run Maven, adding the property values, using one of the following commands:

    mvn clean install -Denvironment=PROD -Djdk.version=jdk6
    mvn clean install -Denvironment=DEV -Djdk.version=jdk5

By the way, we could have merged the three XML files into a single one, setting the content dynamically thanks to Maven's filter tag and mechanism. The next point to solve is the dependency on the actual JDBC drivers.
A quick and dirty solution

A quick and dirty solution is to mention the three dependencies:

    <dependency>
      <groupId>postgresql</groupId>
      <artifactId>postgresql</artifactId>
      <version>9.1-901.jdbc4</version>
      <scope>runtime</scope>
    </dependency>
    <dependency>
      <groupId>mysql</groupId>
      <artifactId>mysql-connector-java</artifactId>
      <version>5.1.25</version>
      <scope>runtime</scope>
    </dependency>
    <dependency>
      <groupId>org.hsqldb</groupId>
      <artifactId>hsqldb</artifactId>
      <version>2.3.0</version>
      <scope>runtime</scope>
    </dependency>

However, this idea has drawbacks. Even though only the actual driver (org.postgresql.Driver, com.mysql.jdbc.Driver, or org.hsqldb.jdbcDriver, as described in the Spring files) will be instantiated at runtime, the three JARs will be transitively transmitted, and possibly packaged, in a further distribution. You may argue that we can work around this problem in most situations by confining the scope to provided and embedding the actual dependency by some other means (such as relying on an artifact shipped in an application server); however, even then you should concede the dirtiness of the process.

A clean solution

Better solutions consist in using a dynamic POM. Here, too, there will be a gradient of more or less clean solutions. Once more, as a disclaimer, beware of dynamic POMs! Dynamic POMs are a powerful and tricky feature of Maven. Admittedly, modern IDEs manage dynamic POMs better than they did a few years ago. Yet their use may be dangerous for newcomers: as with generated code and AOP, for instance, what you write is not what you execute, which may result in strange or unexpected behaviors, needing long hours of debugging and an aspirin tablet for the headache. This is why you have to carefully weigh their benefit to your project before introducing them.
With properties in command lines

As a first step, let's define the dependency as follows:

    <dependency>
      <groupId>${effective.groupId}</groupId>
      <artifactId>${effective.artifactId}</artifactId>
      <version>${effective.version}</version>
    </dependency>

As you can see, the dependency is parameterized thanks to three properties: effective.groupId, effective.artifactId, and effective.version. Then, in the same way we added the -Djdk.version property earlier, we will have to add those properties in the command line, for example:

    mvn clean install -Denvironment=PROD -Djdk.version=jdk6 -Deffective.groupId=postgresql -Deffective.artifactId=postgresql -Deffective.version=9.1-901.jdbc4

Or:

    mvn clean install -Denvironment=DEV -Djdk.version=jdk5 -Deffective.groupId=org.hsqldb -Deffective.artifactId=hsqldb -Deffective.version=2.3.0

Then, the effective POM will be reconstructed by Maven and will include the right dependencies:

    <dependencies>
      <dependency>
        <groupId>org.springframework</groupId>
        <artifactId>spring-core</artifactId>
        <version>3.2.3.RELEASE</version>
        <scope>compile</scope>
      </dependency>
      <dependency>
        <groupId>org.codehaus.jedi</groupId>
        <artifactId>jedi-jdk6</artifactId>
        <version>3.0.5</version>
        <scope>compile</scope>
      </dependency>
      <dependency>
        <groupId>postgresql</groupId>
        <artifactId>postgresql</artifactId>
        <version>9.1-901.jdbc4</version>
        <scope>compile</scope>
      </dependency>
    </dependencies>

Yet, as you can imagine, writing long command lines like the preceding ones increases the risk of human error, all the more so because such lines are "write-only". These pitfalls are solved by profiles.

Profiles and settings

As an easy improvement, you can define profiles within the POM itself.
The profiles gather the information you previously wrote in the command line, for example:

    <profile>
      <id>PROD</id>
      <properties>
        <environment>PROD</environment>
        <effective.groupId>postgresql</effective.groupId>
        <effective.artifactId>postgresql</effective.artifactId>
        <effective.version>9.1-901.jdbc4</effective.version>
        <jdk.version>jdk6</jdk.version>
      </properties>
      <activation>
        <activeByDefault>true</activeByDefault>
      </activation>
    </profile>

Or:

    <profile>
      <id>DEV</id>
      <properties>
        <environment>DEV</environment>
        <effective.groupId>org.hsqldb</effective.groupId>
        <effective.artifactId>hsqldb</effective.artifactId>
        <effective.version>2.3.0</effective.version>
        <jdk.version>jdk5</jdk.version>
      </properties>
      <activation>
        <activeByDefault>false</activeByDefault>
      </activation>
    </profile>

The corresponding command lines will be shorter:

    mvn clean install

(equivalent to mvn clean install -PPROD), or:

    mvn clean install -PDEV

You can list several profiles in the same POM, and one, many, or all of them may be enabled or disabled. Nonetheless, multiplying profiles and properties hurts readability. Moreover, if your team has 20 developers, then each developer will have to deal with 20 blocks of profiles, out of which 19 are completely irrelevant for him/her.
So, in order to make things smoother, a best practice is to extract the profiles and insert them in the personal settings.xml files, with the same information:

    <?xml version="1.0" encoding="UTF-8"?>
    <settings xmlns="http://maven.apache.org/SETTINGS/1.0.0"
              xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
              xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.0.0 http://maven.apache.org/xsd/settings-1.0.0.xsd">
      <profiles>
        <profile>
          <id>PROD</id>
          <properties>
            <environment>PROD</environment>
            <effective.groupId>postgresql</effective.groupId>
            <effective.artifactId>postgresql</effective.artifactId>
            <effective.version>9.1-901.jdbc4</effective.version>
            <jdk.version>jdk6</jdk.version>
          </properties>
          <activation>
            <activeByDefault>true</activeByDefault>
          </activation>
        </profile>
      </profiles>
    </settings>

Dynamic POMs – conclusion

As a conclusion, the best practice concerning dynamic POMs is to parameterize the needed fields within the POM. Then, by order of priority:

- Set an enabled profile and corresponding properties within the settings.xml:

      mvn <goals> [-f <pom_Without_Profiles.xml>] [-s <settings_With_Enabled_Profile.xml>]

- Otherwise, include profiles and properties within the POM:

      mvn <goals> [-f <pom_With_Profiles.xml>] [-P<actual_Profile>] [-s <settings_Without_Profile.xml>]

- Otherwise, launch Maven with the properties in command lines:

      mvn <goals> [-f <pom_Without_Profiles.xml>] [-s <settings_Without_Profile.xml>] -D<property_1>=<value_1> -D<property_2>=<value_2> (...) -D<property_n>=<value_n>

Summary

In this article we learned about dynamic POMs. We worked through a case study, from a quick and dirty solution to cleaner ones.

Resources for Article:

Further resources on this subject:

- Integrating Scala, Groovy, and Flex Development with Apache Maven [Article]
- Creating a Camel project (Simple) [Article]
- Using Hive non-interactively (Simple) [Article]
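To make the idea of a parameterized POM more concrete, here is a rough JavaScript sketch of placeholder substitution. It is an illustration only and not Maven's actual implementation: the interpolate function is hypothetical, and the property names and values are the PROD ones used in this article.

```javascript
// Illustrative only: resolves ${name} placeholders from a property map.
// This mimics the effect described in the article, not Maven's real code.
function interpolate(template, properties) {
  return template.replace(/\$\{([^}]+)\}/g, function (match, name) {
    // Leave unknown placeholders untouched so a missing property stays visible.
    return Object.prototype.hasOwnProperty.call(properties, name)
      ? properties[name]
      : match;
  });
}

// Property values taken from the PROD profile shown in the article.
var prod = {
  "effective.groupId": "postgresql",
  "effective.artifactId": "postgresql",
  "effective.version": "9.1-901.jdbc4"
};

var coordinates = interpolate(
  "${effective.groupId}:${effective.artifactId}:${effective.version}",
  prod
);
console.log(coordinates); // postgresql:postgresql:9.1-901.jdbc4
```

Whether the values come from the command line, from a profile in the POM, or from settings.xml, the substitution is the same; only the place where the property map is defined changes.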


Packt

31 Oct 2013

6 min read

Downloading PyroCMS and its pre-requisites


(For more resources related to this topic, see here.)

Getting started

PyroCMS, like many other content management systems including WordPress, Typo3, or Drupal, comes with a pre-developed installation process. For PyroCMS, this installation process is easy to use and comes with a number of helpful hints just in case you hit a snag while installing the system. If, for example, your system files don't have the correct permissions profile (writeable versus write-protected), the PyroCMS installer will help you, along with all the other installation details, such as checking for required software and taking care of file permissions.

Before you can install PyroCMS (the version used for examples in this article is 2.2) on a server, there are a number of server requirements that need to be met. If you aren't sure if these requirements have been met, the PyroCMS installer will check to make sure they are available before installation is complete. Following are the software requirements for a server before PyroCMS can be installed:

- HTTP Web Server
- MySQL 5.x or higher
- PHP 5.2.x or higher
- GD2
- cURL

Among these requirements, web developers interested in PyroCMS will be glad to know that it is built on CodeIgniter, a popular MVC patterned PHP framework. I recommend that the developers looking to use PyroCMS should also have working knowledge of CodeIgniter and the MVC programming pattern. Learn more about CodeIgniter and see their excellent system documentation online at http://ellislab.com/codeigniter.

CodeIgniter

If you haven't explored the Model-View-Controller (MVC) programming pattern, you'll want to brush up before you start developing for PyroCMS. The primary reason that CodeIgniter is a good framework for a CMS is that it is a well-documented framework that, when leveraged in the way PyroCMS has done, gives developers power over how long a project will take to build and the quality with which it is built.
Add-on modules for PyroCMS, for example, follow the MVC method, a programming pattern that saves developers time and keeps their code DRY and portable. DRY and portable are two different concepts. DRY is an acronym for "don't repeat yourself". Portable code is like "plug-and-play" code: write it once so that it can be shared with other projects and used quickly.

HTTP web server

The first of the PyroCMS software requirements is, unsurprisingly, a good HTTP web server platform. Luckily, PyroCMS can run on a variety of web server platforms, including the following:

- Abyss Web Server
- Apache 2.x
- Nginx
- Uniform Server
- Zend Community Server

If you are new to web hosting and haven't worked with web hosting software before, or this is your first time installing PyroCMS, I suggest that you use Apache as an HTTP web server. It will be the system for which you will find the most documentation and support online. If you'd prefer to avoid Apache, there is also good support for running PyroCMS on Nginx, another fairly well-documented web server platform.

MySQL

Version 5 is the latest major release of MySQL, and it has been in use for quite some time. It is the primary database choice for PyroCMS and is thoroughly supported. You don't need expert-level experience with MySQL to run PyroCMS, but you'll need to be familiar with writing SQL queries and building relational databases if you plan to create add-ons for the system. You can learn more about MySQL at http://www.mysql.com.

PHP

Version 5.2 of PHP is no longer the officially supported release of PHP, which is, at the time of this article, Version 5.4. Version 5.2, which has been criticized as being a low server requirement for any CMS, is allowed with PyroCMS because it is the minimum version requirement for CodeIgniter, the framework upon which PyroCMS is built.
While future versions of PyroCMS may upgrade this minimum requirement to PHP 5.3 or higher, you can safely use PyroCMS with PHP 5.2. Also, many server operating systems, like SUSE and Ubuntu, install PHP 5.2 by default. You can, of course, upgrade PHP to the latest version without causing harm to your instance of PyroCMS. To help future-proof your installation of PyroCMS, it may be wise to install PHP 5.3 or above, to maximize your readiness for when PyroCMS more strictly adopts features found in PHP 5.3 and 5.4, such as namespacing.

GD2

GD2, a library used in the manipulation and creation of images, is used by PyroCMS to dynamically generate images (where needed) and to crop and resize images used in many PyroCMS modules and add-ons. The image-based support offered by this library is invaluable.

cURL

As described on the cURL project website, cURL is "a command line tool for transferring data with URL syntax" using a large number of methods, including HTTP(S) GET, POST, PUT, and so on. You can learn more about the project and how to use cURL on their website http://curl.haxx.se. If you've never used cURL with PHP, I recommend taking time to learn how to use it, especially if you are thinking about building a web-based API using PyroCMS.

Most popular web hosting companies meet the basic server requirements for PyroCMS.

Downloading PyroCMS

Getting your hands on a copy of PyroCMS is very simple. You can download the system files from one of two locations: the PyroCMS project website and GitHub. To download PyroCMS from the project website, visit http://www.pyrocms.com and click on the green button labeled Get PyroCMS! This will take you to a download page that gives you the choice between downloading the Community version of PyroCMS and buying the Professional version. If you are new to PyroCMS, you can start with the Community version, currently at Version 2.2.3.
The following screenshot shows the download screen:

To download PyroCMS from GitHub, visit https://github.com/pyrocms/pyrocms and click on the button labeled Download ZIP to get the latest Community version of PyroCMS, as shown in the following screenshot:

If you know how to use Git, you can also clone a fresh version of PyroCMS using the following command. A word of warning: cloning PyroCMS from GitHub will usually give you the latest stable release of the system, but it could include changes not described in this article. Make sure you check out a stable release from PyroCMS's repository.

git clone https://github.com/pyrocms/pyrocms.git

As a side note, if you've never used Git, I recommend taking some time to get started with it. PyroCMS is an open source project hosted in a Git repository on GitHub, which means that the system is open to being improved by any developer looking to contribute to the well-being of the project. It is also very common for PyroCMS developers to host their own add-on projects on GitHub and other online Git repository services.

Summary

In this article, we have covered the prerequisites for using PyroCMS, and also how to download PyroCMS.

Packt

31 Oct 2013

7 min read

Building Ladder Diagram programs (Simple)

There are several editions of RSLogix 5000 available today, much as Microsoft Windows comes in home and professional versions. The more "basic" (less expensive) editions of RSLogix 5000 have many features disabled. For example, only the full and professional editions, which are more expensive, support the editing of Function Block Diagrams, Graphical Structured Text, and Sequential Function Chart. In my experience, Ladder Logic is the most commonly used language. Refer to http://www.rockwellautomation.com/rockwellsoftware/design/rslogix5000/orderinginfo.html for more on this.

Getting ready

You will need to have added the cards and tags from the previous recipes to complete this exercise.

How to do it...

Open Controller Organizer and expand the leaf Tasks | MainTask | MainProgram.

Right-click on MainProgram and select New Routine as shown in the following screenshot:

Configure a new Ladder Logic program by setting the following values:
Name: VALVES
Description: Valve Control Program
Type: Ladder Diagram

For our newly created routine to be executed with each scan of the PLC, we need to add a reference to it in MainRoutine, which is executed with each scan of the MainTask task. Double-click on MainRoutine to display the Ladder Logic contained within it.

Next, we will add a Jump To Subroutine (JSR) element that links our newly added Ladder Diagram program to the main task and ensures that it is executed with each scan. Above the Ladder Diagram, there are tab buttons that organize Ladder Elements into Element Groups. Click on the left and right arrows on the left side of Element Groups and find the one labeled Program Control.

After clicking on the Program Control element group, you will see the JSR element. Click on the JSR element to add it to the current Ladder Logic Rung in MainRoutine.
Next, we will make some modifications to the JSR element so that it calls our newly added Ladder Diagram. Click on the Routine Name parameter of the JSR element and select the VALVES routine from the list as shown in the following screenshot:

The JSR element has two additional parameters that we are not using, and these can be removed. Select the Input Par parameter and then click on the Remove Parameter icon in the toolbar above the Ladder Diagram. This icon looks as shown in the following screenshot:

Repeat this process for the other optional parameter: Return Par.

Now that we have ensured that our newly added Ladder Logic routine will be scanned, we can add the elements to our Ladder Logic routine. Double-click on our VALVES routine in the Controller Organizer tab under the MainTask task.

Find the Timer/Counter element group and click on the TON (Timer On Delay) element to add it to our Ladder Diagram.

Now we will create the Timer object. Enter the name in the Timer field as FC1001_TON. Right-click on the TIMER object tag name we just entered and select New "FC1001_TON" (or press Ctrl + W).

In the New Tag form that appears, enter the description FAULT TIMER FOR FLOW CONTROL VALVE 1001 and click on OK to create the new TIMER tag.

Next, we will configure our TON element to count to five seconds. Double-click on the Preset parameter and enter the value 5000, which is in milliseconds.

Now, we need to add the condition that will start the TIMER object. We will add a Less Than (LES) element from the Compare element group. Be sure to add the element to the same Ladder Logic Rung as the Timer On Delay element. The LES element compares the valve position with the valve set point and returns true while the position is less than the set point.
So set the two parameters of the LES element to the following:
FC1001_PV
FC1001_SP

Now, we will add a second Ladder Logic Rung where a latched fault alarm is triggered after TIMER reaches five seconds. Right-click under the first Ladder Logic Rung and select Add Rung (or press Ctrl + R).

Find the Favorites element group and select the Examine On icon as shown in the following screenshot:

Click on ? above the Examine On element and select the TIMER object's Done property, FC1001_TON.DN, as shown in the following screenshot. Now, once the comparison has been true long enough for the TIMER to complete its count to five seconds, this Ladder Logic Rung will be activated as shown in the following screenshot:

Next, we will add an Output Latched element to this Ladder Logic Rung. Click on the Output Latched element from the Favorites element group with our new rung selected.

Click on ? above the Output Latched element and type in the name of a new base tag, FC1001_FLT. Press Enter or click on the element to complete the text entry.

Right-click on FC1001_FLT and select New "FC1001_FLT" (or press Ctrl + W). Set the following values in the New Tag form that appears:
Description: FLOW CONTROL VALVE 1001 POSITION FAULT
Type: Base
Scope: FirstController
Data Type: BOOL

Click on OK to add the new tag. Our new tag will look like the following screenshot:

It is considered bad practice to latch a bit without having the code to unlatch the bit directly below it. Create a new BOOL type tag called ALARM_RESET with the following properties:
Name: ALARM_RESET
Description: RESET ALARMS
Type: Base
Scope: FirstController
Data Type: BOOL

Click on OK to add the new tag. Then add the following coil and OTU (Output Unlatch) element to unlatch the fault when the master alarm reset is triggered.

Finally, we will add a comment so that we can see what our Ladder Diagram is doing at a glance. Right-click in the far-right area of the first Ladder Logic Rung (where the 0 is) and select Edit Rung Comment (Ctrl + D).
Enter the following helpful comment: TRIGGER FAULT IF THE SETPOINT OF THE FLOW CONTROL VALVE 1001 IS NOT EQUAL TO THE VALVE POSITION

How it works...

We have created our first Ladder Logic Diagram and linked it to the MainTask task. Now, each time that the task is scanned (executed), our Ladder Logic routine will be run from left to right and top to bottom.

There's more...

More information on Ladder Logic can be found in the Rockwell publication Logix5000 Controllers Ladder Diagram available at http://literature.rockwellautomation.com/idc/groups/literature/documents/pm/1756-pm008_-en-p.pdf. Ladder Logic is the most commonly used programming language in RSLogix 5000. This recipe describes a few more helpful hints to get you started.

Understanding Ladder Rung statuses

Did you notice the vertical column of e characters on the left-hand side of your Ladder Logic Rung? This indicates that an error is present in your Ladder Logic code. After making changes to your controller project, it is good practice to verify your project using the drop-down menu item Logic | Verify | Controller. Once Verify has been run, the error pane will appear, listing any errors that it has detected.

Element help

You can easily get detailed documentation on Ladder Logic Elements, Function Block Diagram Elements, Structured Text Code, and other element types by selecting the object and pressing F1.

Copying and pasting Ladder Logic

Ladder Logic Rungs and elements can be copied and pasted within your ladder routine. Simply select the rung or element you wish to copy and press Ctrl + C. Then, to paste the rung or element, select the location where you would like to paste it and press Ctrl + V.

Summary

This article took a first look at creating new routines using ladder logic diagrams. It introduced the concept of Tasks and showed how to link routines.
In this article, we learned how to navigate the ladder elements that are available, how to find help on each element, and how to create a simple alarm timer using ladder logic.

Packt

31 Oct 2013

20 min read

Specialized Machine Learning Topics

As you attempted to gather data, you might have realized that the information was trapped in a proprietary spreadsheet format or spread across pages on the Web. Making matters worse, after spending hours manually reformatting the data, perhaps your computer slowed to a crawl after running out of memory. Perhaps R even crashed or froze your machine. Hopefully you were undeterred; it does get easier with time.

You might find the information particularly useful if you tend to work with data that are:

Stored in unstructured or proprietary formats such as web pages, web APIs, or spreadsheets
From a domain such as bioinformatics or social network analysis, which presents additional challenges
So extremely large that R cannot store the dataset in memory or machine learning takes a very long time to complete

You're not alone if you suffer from any of these problems. There is no panacea (these issues are the bane of the data scientist as well as the reason data skills are in high demand), but through the dedicated efforts of the R community, a number of R packages provide a head start toward solving the problem. This article also provides a cookbook of such solutions. Even if you are an experienced R veteran, you may discover a package that simplifies your workflow, or perhaps one day you will author a package that makes work easier for everybody else!

Working with specialized data

Unlike the analyses in this article, real-world data are rarely packaged in a simple CSV form that can be downloaded from a website. Instead, significant effort is needed to prepare data for analysis. Data must be collected, merged, sorted, filtered, or reformatted to meet the requirements of the learning algorithm. This process is known informally as data munging.
Munging has become even more important as the size of typical datasets has grown from megabytes to gigabytes and data are gathered from unrelated and messy sources, many of which are domain-specific. Several packages and resources for working with specialized or domain-specific data are listed as follows:

Getting data from the Web with the RCurl package

The RCurl package by Duncan Temple Lang provides an R interface to the curl (client for URLs) utility, a command-line tool for transferring data over networks. The curl utility is useful for web scraping, which refers to the practice of harvesting data from websites and transforming it into a structured form. Documentation for the RCurl package can be found on the Web at http://www.omegahat.org/RCurl/.

After installing the RCurl package, downloading a page is as simple as typing:

> library(RCurl)
> webpage <- getURL("http://www.packtpub.com/")

This will save the full text of Packt Publishing's homepage (including all web markup) into the R character object named webpage. As shown in the following lines, this is not very useful as-is:

> str(webpage)
 chr "<!DOCTYPE html>\n<html ...

(The output is truncated here.) Raw markup like this generally has to be parsed into a structured form before it can be analyzed, which is the job of the XML package. More information on the XML package, including simple examples to get you started quickly, can be found at the project's website: http://www.omegahat.org/RSXML/.

Reading and writing JSON with the rjson package

The rjson package by Alex Couture-Beil can be used to read and write files in the JavaScript Object Notation (JSON) format. JSON is a standard, plaintext format, most often used for data structures and objects on the Web. The format has become popular recently due to its utility in creating web applications, but despite the name, it is not limited to web browsers. For details about the JSON format, go to http://www.json.org/.

The JSON format stores objects in plain text strings.
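To make this concrete, here is a small sketch of what such a string looks like from R's point of view. The record is invented purely for illustration, and the name json_string is just a convenient choice; no package is needed at this stage.

```r
# A hypothetical record, written as JSON inside an ordinary R string.
# Single quotes delimit the R string so the JSON double quotes need no escaping.
json_string <- '{"name": "Ada", "languages": ["R", "SQL"], "years_experience": 3}'

# Until a JSON parser is applied, R sees nothing more than one character value.
print(is.character(json_string))
print(length(json_string))

# Printing the raw text shows the JSON exactly as it would travel over the Web.
cat(json_string, "\n")
```

The nested structure (a name, a list of languages, a number) only becomes usable in R once the string has been parsed into an R object.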
After installing the rjson package, to convert from JSON to R:

> library(rjson)
> r_object <- fromJSON(json_string)

To convert from an R object to a JSON object:

> json_string <- toJSON(r_object)

Used with the RCurl package (noted previously), it is possible to write R programs that utilize JSON data directly from many online data stores.

Reading and writing Microsoft Excel spreadsheets using xlsx

The xlsx package by Adrian A. Dragulescu offers functions to read and write to spreadsheets in the Excel 2007 (or earlier) format, a common task in many business environments. The package is based on the Apache POI Java API for working with Microsoft's documents. For more information on xlsx, including a quick start document, go to https://code.google.com/p/rexcel/.

Working with bioinformatics data

Data analysis in the field of bioinformatics offers a number of challenges relative to other fields due to the unique nature of genetic data. The use of DNA and protein microarrays has resulted in datasets that are often much wider than they are long (that is, they have more features than examples). This creates problems when attempting to apply conventional visualizations, statistical tests, and machine learning methods to such data. A CRAN task view for statistical genetics/bioinformatics is available at http://cran.r-project.org/web/views/Genetics.html.

The Bioconductor project (http://www.bioconductor.org/) of the Fred Hutchinson Cancer Research Center in Seattle, Washington, provides a centralized hub for methods of analyzing genomic data. Using R as its foundation, Bioconductor adds packages and documentation specific to the field of bioinformatics. Bioconductor provides workflows for analyzing microarray data from common platforms, including Affymetrix, Illumina, Nimblegen, and Agilent. Additional functionality includes sequence annotation, multiple testing procedures, specialized visualizations, and many other functions.
Working with social network data and graph data Social network data and graph data present many challenges. These data record connections, or links, between people or objects. With N people, an N by N matrix of links is possible, which creates tremendous complexity as the number of people grows. The network is then analyzed using statistical measures and visualizations to search for meaningful patterns of relationships. The network package by Carter T. Butts, David Hunter, and Mark S. Handcock offers a specialized data structure for working with such networks. A closely-related package, sna, allows analysis and visualization of the network objects. For more information on network and sna, refer to the project website hosted by the University of Washington: http://www.statnet.org/. Improving the performance of R R has a reputation for being slow and memory inefficient, a reputation that is at least somewhat earned. These faults are largely unnoticed on a modern PC for datasets of many thousands of records, but datasets with a million records or more can push the limits of what is currently possible with consumer-grade hardware. The problem is worsened if the data have many features or if complex learning algorithms are being used. CRAN has a high performance computing task view that lists packages pushing the boundaries on what is possible in R: http://cran.r-project.org/web/views/HighPerformanceComputing.html. Packages that extend R past the capabilities of the base package are being developed rapidly. This work comes primarily on two fronts: some packages add the capability to manage extremely large datasets by making data operations faster or by allowing the size of data to exceed the amount of available system memory; others allow R to work faster, perhaps by spreading the work over additional computers or processors, by utilizing specialized computer hardware, or by providing machine learning optimized to Big Data problems. 
Some of these packages are listed as follows. Managing very large datasets Very large datasets can sometimes cause R to grind to a halt when the system runs out of memory to store the data. Even if the entire dataset can fit in memory, additional RAM is needed to read the data from disk, which necessitates a total memory size much larger than the dataset itself. Furthermore, very large datasets can take a long amount of time to process for no reason other than the sheer volume of records; even a quick operation can add up when performed many millions of times. Years ago, many would suggest performing data preparation of massive datasets outside R in another programming language, then using R to perform analyses on a smaller subset of data. However, this is no longer necessary, as several packages have been contributed to R to address these Big Data problems. Making data frames faster with data.table The data.table package by Dowle, Short, and Lianoglou provides an enhanced version of a data frame called a data table. The data.table objects are typically much faster than data frames for subsetting, joining, and grouping operations. Yet, because it is essentially an improved data frame, the resulting objects can still be used by any R function that accepts a data frame. The data.table project is found on the Web at http://datatable.r-forge.r-project.org/. One limitation of data.table structures is that like data frames, they are limited by the available system memory. The next two sections discuss packages that overcome this shortcoming at the expense of breaking compatibility with many R functions. Creating disk-based data frames with ff The ff package by Daniel Adler, Christian Glaser, Oleg Nenadic, Jens Oehlschlagel, and Walter Zucchini provides an alternative to a data frame (ffdf) that allows datasets of over two billion rows to be created, even if this far exceeds the available system memory. 
The ffdf structure has a physical component that stores the data on disk in a highly efficient form and a virtual component that acts like a typical R data frame but transparently points to the data stored in the physical component. You can imagine the ffdf object as a map that points to a location of data on a disk. The ff project is on the Web at http://ff.r-forge.r-project.org/. A downside of ffdf data structures is that they cannot be used natively by most R functions. Instead, the data must be processed in small chunks, and the results should be combined later on. The upside of chunking the data is that the task can be divided across several processors simultaneously using the parallel computing methods presented later in this article. The ffbase package by Edwin de Jonge, Jan Wijffels, and Jan van der Laan addresses this issue somewhat by adding capabilities for basic statistical analyses using ff objects. This makes it possible to use ff objects directly for data exploration. The ffbase project is hosted at http://github.com/edwindj/ffbase. Using massive matrices with bigmemory The bigmemory package by Michael J. Kane and John W. Emerson allows extremely large matrices that exceed the amount of available system memory. The matrices can be stored on disk or in shared memory, allowing them to be used by other processes on the same computer or across a network. This facilitates parallel computing methods, such as those covered later in this article. Additional documentation on the bigmemory package can be found at http://www.bigmemory.org/. Because bigmemory matrices are intentionally unlike data frames, they cannot be used directly with most of the machine learning methods covered in this book. They also can only be used with numeric data. That said, since they are similar to a typical R matrix, it is easy to create smaller samples or chunks that can be converted to standard R data structures. 
The authors also provide bigalgebra, biganalytics, and bigtabulate packages, which allow simple analyses to be performed on the matrices. Of particular note is the bigkmeans() function in the biganalytics package, which performs k-means clustering. Learning faster with parallel computing In the early days of computing, programs were entirely serial, which limited them to performing a single task at a time. The next instruction could not be performed until the previous instruction was complete. However, many tasks can be completed more efficiently by allowing work to be performed simultaneously. This need was addressed by the development of parallel computing methods, which use a set of two or more processors or computers to solve a larger problem. Many modern computers are designed for parallel computing. Even in the case that they have a single processor, they often have two or more cores which are capable of working in parallel. This allows tasks to be accomplished independently from one another. Networks of multiple computers called clusters can also be used for parallel computing. A large cluster may include a variety of hardware and be separated over large distances. In this case, the cluster is known as a grid. Taken to an extreme, a cluster or grid of hundreds or thousands of computers running commodity hardware could be a very powerful system. The catch, however, is that not every problem can be parallelized; certain problems are more conducive to parallel execution than others. You might expect that adding 100 processors would result in 100 times the work being accomplished in the same amount of time (that is, the execution time is 1/100), but this is typically not the case. The reason is that it takes effort to manage the workers; the work first must be divided into non-overlapping tasks and second, each of the workers' results must be combined into one final answer. So-called embarrassingly parallel problems are the ideal. 
These tasks are easy to reduce into non-overlapping blocks of work, and the results are easy to recombine. An example of an embarrassingly parallel machine learning task would be 10-fold cross-validation; once the samples are decided, each of the 10 evaluations is independent, meaning that its result does not affect the others. As you will soon see, this task can be sped up quite dramatically using parallel computing.

Measuring execution time

Efforts to speed up R will be wasted if it is not possible to systematically measure how much time was saved. Although you could sit and observe a clock, an easier solution is to wrap the code in question in a call to system.time(). For example, on the author's laptop, the system.time() function notes that it takes about 0.13 seconds to generate a million random numbers:

> system.time(rnorm(1000000))
   user  system elapsed
   0.13    0.00    0.13

The same function can be used to evaluate the performance improvement obtained with the methods that were just described, or with any R function.

Working in parallel with foreach

The foreach package by Steve Weston of Revolution Analytics provides perhaps the easiest way to get started with parallel computing, particularly if you are running R on the Windows operating system, as some of the other packages are platform-specific. The core of the package is a new foreach looping construct. If you have worked with other programming languages, this may be familiar. Essentially, it allows looping over a number of items in a set, without explicitly counting the number of items; in other words, for each item in the set, do something.

In addition to the foreach package, Revolution Analytics has developed high-performance, enterprise-ready R builds. Free versions are available for trial and academic use. For more information, see their website at http://www.revolutionanalytics.com/.
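The looping construct described above can be sketched as follows, assuming the foreach package has been installed. The loop bodies (squaring numbers, taking square roots) are placeholders chosen for illustration and do not come from the original text. This version runs sequentially; it shows only the shape of the loop and how system.time() wraps it.

```r
library(foreach)

# Sequential foreach loop: %do% evaluates the body once per item, in order.
# .combine = c gathers the per-iteration results into a single vector;
# without it, foreach returns a list.
squares <- foreach(i = 1:4, .combine = c) %do% {
  i^2
}
print(squares)

# Wrapping a loop in system.time() gives a baseline to compare against later.
timing <- system.time(
  roots <- foreach(i = 1:1000, .combine = c) %do% sqrt(i)
)
print(timing["elapsed"])
```

Note that there is no counter to initialize or increment: the loop simply visits each item in the set 1:4 and collects what the body returns.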
If you're thinking that R already provides a set of apply functions to loop over sets of items (for example, apply(), lapply(), sapply(), and so on), you are correct. However, the foreach loop has an additional benefit: iterations of the loop can be completed in parallel using a very simple syntax. The sister package doParallel provides a parallel backend for foreach that utilizes the parallel package included with R (Version 2.14.0 and later). The parallel package includes components of the multicore and snow packages described in the following sections.

Using a multitasking operating system with multicore

The multicore package by Simon Urbanek allows parallel processing on single machines that have multiple processors or processor cores. Because it utilizes multitasking capabilities of the operating system, it is not supported natively on Windows systems. An easy way to get started with the multicore package is using the mclapply() function, which is a parallelized version of lapply(). The multicore project is hosted at http://www.rforge.net/multicore/.

Networking multiple workstations with snow and snowfall

The snow package (simple networking of workstations) by Luke Tierney, A. J. Rossini, Na Li, and H. Sevcikova allows parallel computing on multicore or multiprocessor machines as well as on a network of multiple machines. The snowfall package by Jochen Knaus provides an easier-to-use interface for snow. For more information on snowfall, including a detailed FAQ and information on how to configure parallel computing over a network, see http://www.imbi.uni-freiburg.de/parallel/.

Parallel cloud computing with MapReduce and Hadoop

The MapReduce programming model was developed at Google as a way to process their data on a large cluster of networked computers.
MapReduce defined parallel programming as a two-step process:

A map step, in which a problem is divided into smaller tasks that are distributed across the computers in the cluster
A reduce step, in which the results of the small chunks of work are collected and synthesized into a final solution to the original problem

A popular open source alternative to the proprietary MapReduce framework is Apache Hadoop. The Hadoop software comprises the MapReduce concept plus a distributed filesystem capable of storing large amounts of data across a cluster of computers.

Packt Publishing has published quite a number of books on Hadoop.

Several R projects that provide an R interface to Hadoop are in development. One such project is RHIPE by Saptarshi Guha, which attempts to bring the divide and recombine philosophy into R by managing the communication between R and Hadoop. The RHIPE package is not yet available at CRAN, but it can be built from the source available on the Web at http://www.datadr.org.

The RHadoop project by Revolution Analytics provides an R interface to Hadoop. The project provides a package, rmr, intended to be an easy way for R developers to write MapReduce programs. Additional RHadoop packages provide R functions for accessing Hadoop's distributed data stores. At the time of publication, development of RHadoop is progressing very rapidly. For more information about the project, see https://github.com/RevolutionAnalytics/RHadoop/wiki.

GPU computing

An alternative to parallel processing uses a computer's graphics processing unit (GPU) to increase the speed of mathematical calculations. A GPU is a specialized processor that is optimized for rapidly displaying images on a computer screen.
Because a computer often needs to display complex 3D graphics (particularly for video games), many GPUs use hardware designed for parallel processing and extremely efficient matrix and vector calculations. A side benefit is that they can be used for efficiently solving certain types of mathematical problems. Where a computer processor may have on the order of 16 cores, a GPU may have thousands.

The downside of GPU computing is that it requires specific hardware that is not included with many computers. In most cases, a GPU from the manufacturer Nvidia is required, as they provide a proprietary framework called CUDA (Compute Unified Device Architecture) that makes the GPU programmable using common languages such as C++. For more information on Nvidia's role in GPU computing, go to http://www.nvidia.com/object/what-is-gpu-computing.html.

The gputools package by Josh Buckner, Mark Seligman, and Justin Wilson implements several R functions, such as matrix operations, clustering, and regression modeling, using the Nvidia CUDA toolkit. The package requires a CUDA 1.3 or higher GPU and the installation of the Nvidia CUDA toolkit.

Deploying optimized learning algorithms

Some of the machine learning algorithms covered in this book are able to work on extremely large datasets with relatively minor modifications. For instance, it would be fairly straightforward to implement naive Bayes or the Apriori algorithm using one of the Big Data packages described previously. Some types of models, such as ensembles, lend themselves well to parallelization, since the work of each model can be distributed across processors or computers in a cluster. On the other hand, some algorithms require larger changes to the data or algorithm, or need to be rethought altogether before they can be used with massive datasets.
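As a rough illustration of why ensembles parallelize well, the following sketch fits the members of a tiny bagged ensemble on separate worker processes using only the parallel package that ships with R. Everything specific here is an assumption made for the example: the built-in mtcars dataset, the mpg ~ wt formula, the helper name fit_one, and the choice of two workers and eight members.

```r
library(parallel)   # included with R; no extra installation needed

cl <- makeCluster(2)   # two local worker processes; adjust to your machine

# Hypothetical helper: fit one ensemble member on a bootstrap sample.
# Each call is independent of the others, so calls can run on any worker.
fit_one <- function(seed) {
  set.seed(seed)
  rows <- sample(nrow(mtcars), replace = TRUE)
  coef(lm(mpg ~ wt, data = mtcars[rows, ]))
}

# Distribute the eight fits across the workers, then shut the cluster down.
coefs <- parLapply(cl, 1:8, fit_one)
stopCluster(cl)

# Recombine: average the members' coefficients into one ensemble estimate.
ensemble <- Reduce(`+`, coefs) / length(coefs)
print(ensemble)
```

The division of labor mirrors the text: the members are trained independently, and only the final averaging step needs to see all of the results.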
Building bigger regression models with biglm

The biglm package by Thomas Lumley provides functions for training regression models on datasets that may be too large to fit into memory. It works by an iterative process in which the model is updated little-by-little using small chunks of data. The results will be nearly identical to what would have been obtained running the conventional lm() function on the entire dataset.

The biglm() function allows use of a SQL database in place of a data frame. The model can also be trained with chunks obtained from data objects created by the ff package described previously.

Growing bigger and faster random forests with bigrf

The bigrf package by Aloysius Lim implements the training of random forests for classification and regression on datasets that are too large to fit into memory using bigmemory objects as described earlier in this article. The package also allows faster parallel processing using the foreach package described previously. Trees can be grown in parallel (on a single computer or across multiple computers), as can forests, and additional trees can be added to the forest at any time or merged with other forests. For more information, including examples and Windows installation instructions, see the package wiki hosted at GitHub: https://github.com/aloysius-lim/bigrf.

Training and evaluating models in parallel with caret

The caret package by Max Kuhn will transparently utilize a parallel backend if one has been registered with R (for instance, using the foreach package described previously). Many of the tasks involved in training and evaluating models, such as creating random samples and repeatedly testing predictions for 10-fold cross-validation, are embarrassingly parallel. This makes caret a particularly good candidate for parallel processing. Configuration instructions and a case study of the performance improvements for enabling parallel processing in caret are available at the project's website: http://caret.r-forge.r-project.org/parallel.html.
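Registering a parallel backend, the step that caret depends on, might look like the following sketch. It assumes the doParallel package is installed (loading it also loads foreach and parallel). The loop body is only a stand-in for one cross-validation fold, not an actual model evaluation, and the worker count of two is arbitrary.

```r
library(doParallel)   # also attaches foreach and parallel

cl <- makeCluster(2)      # two local worker processes; adjust to your machine
registerDoParallel(cl)    # from here on, %dopar% loops use these workers

# Ten independent iterations, mimicking the folds of 10-fold cross-validation.
# Each placeholder "evaluation" just averages some random numbers.
fold_results <- foreach(fold = 1:10, .combine = c) %dopar% {
  set.seed(fold)
  mean(rnorm(100000))
}

stopCluster(cl)           # release the workers when finished
print(length(fold_results))
```

Compared with a sequential loop, the only changes are the registration calls and the switch from %do% to %dopar%. According to the text above, caret picks up a backend registered this way without further changes to the training code.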
Summary

It is certainly an exciting time to be studying machine learning. Ongoing work on the relatively uncharted frontiers of parallel and distributed computing offers great potential for tapping the knowledge found in the deluge of Big Data. The burgeoning data science community is supported by the free and open source R programming language, which provides a very low barrier to entry: you simply need to be willing to learn.

The topics you have learned provide the foundation for understanding more advanced machine learning methods. It is now your responsibility to keep learning and adding tools to your arsenal. Along the way, be sure to keep in mind the No Free Lunch theorem: no learning algorithm can rule them all. There will always be a human element to machine learning, adding subject-specific knowledge and the ability to match the appropriate algorithm to the task at hand.

In the coming years, it will be interesting to see how the human side changes as the line between machine learning and human learning is blurred. Services such as Amazon's Mechanical Turk provide crowd-sourced intelligence, offering a cluster of human minds ready to perform simple tasks at a moment's notice. Perhaps one day, just as we have used computers to perform tasks that human beings cannot do easily, computers will employ human beings to do the reverse; food for thought.


Packt

31 Oct 2013

7 min read

Installing Apache Karaf

Before Apache Karaf can provide you with an OSGi-based container runtime, we have to set up our environment. The process is quick and requires only minimal, routine Java setup work. In this article we'll review:

- The prerequisites for Apache Karaf
- Obtaining Apache Karaf
- Installing Apache Karaf and running it for the first time

Prerequisites

As a lightweight container, Apache Karaf has sparse system requirements. Check that the following specifications are met or exceeded:

- Operating System: Apache Karaf runs on recent versions of Windows, AIX, Solaris, HP-UX, and various Linux distributions (RedHat, Suse, Ubuntu, and so on).
- Disk space: At least 20 MB of free disk space is required. You will need more free space as additional resources are provisioned into the container. As a rule of thumb, plan to allocate 100 to 1000 MB of disk space for logging, bundle cache, and repository.
- Memory: At least 128 MB memory is required; however, more than 2 GB is recommended.
- Java Runtime Environment (JRE): JRE 1.6 or JRE 1.7 is required. The location of the JRE should be made available via the JAVA_HOME environment setting. At the time of writing, Java 1.6 is "end of life".

For our demos we'll use Apache Maven 3.0.x and Java SDK 1.7.x; these tools should be obtained for future use. However, they are not necessary to operate the base Karaf installation. Before attempting to build demos, set the MAVEN_HOME environment variable to point to your Apache Maven distribution.

After verifying that you have the prerequisite hardware, operating system, JVM, and other software packages, set up your environment variables for JAVA_HOME and MAVEN_HOME. The bin directories of both will be added to the system PATH.

Setting up JAVA_HOME Environment Variable

Apache Karaf honors the setting of JAVA_HOME in the system environment; if this is not set, it will pick up and use Java from PATH.
For users unfamiliar with setting environment variables, the following batch setup script will set up your Windows environment:

    @echo off
    REM execute setup.bat to setup environment variables.
    set JAVA_HOME=C:\Program Files\Java\jdk1.6.0_31
    set MAVEN_HOME=c:\x1\apache-maven-3.0.4
    set PATH=%JAVA_HOME%\bin;%MAVEN_HOME%\bin;%PATH%
    echo %PATH%

The script creates and sets the JAVA_HOME and MAVEN_HOME variables to point to their local installation directories, and then adds their bin directories to the system PATH. The initial echo off directive reduces console output as the script executes; the final echo command prints the value of PATH.

Managing Windows System Environment Variables

Windows environment settings can be managed via the System Properties control panel. Access to these controls varies according to the Windows release.

In a Unix-like environment, a script similar to the following one will set up your environment:

    # execute setup.sh to setup environment variables.
    JAVA_HOME=/path/to/jdk1.6.0_31
    MAVEN_HOME=/path/to/apache-maven-3.0.4
    PATH=$JAVA_HOME/bin:$MAVEN_HOME/bin:$PATH
    export PATH JAVA_HOME MAVEN_HOME
    echo $PATH

The first two directives create and set the JAVA_HOME and MAVEN_HOME environment variables, respectively. Their bin directories are added to the PATH setting, and all three variables are then made available to the environment via the export command.

Obtaining Apache Karaf distribution

As an Apache open source project, Apache Karaf is made available in both binary and source distributions. The binary distribution comes as a gzip-compressed tar archive (tar.gz) for Unix-like systems and as a ZIP file for Windows. Your selection of distribution kit will affect which set of scripts is available in Karaf's bin folder. So, if you're using Windows, select the ZIP file; on Unix-like systems choose the tar.gz file. Apache Karaf distributions may be obtained from http://karaf.apache.org/index/community/download.html.
The following screenshot shows this link:

The primary download site for Apache Karaf provides a list of available mirror sites; it is advisable to select a server near your location for faster downloads. For the purposes of this article, we will focus on Apache Karaf 2.3.x, with notes on the 3.0.x release series.

Apache Karaf 2.3.x versus 3.0.x series

The major difference between the Apache Karaf 2.3 and 3.0 lines is the core OSGi specification supported. Karaf 2.3 utilizes OSGi rev4.3, while Karaf 3.0 uses rev5.0. Karaf 3 also introduces several command name changes. There are a multitude of other internal differences between the code bases, and wherever appropriate, we'll highlight those changes that impact users throughout this text.

Installing Apache Karaf

Installing Apache Karaf only requires you to extract the tar.gz or .zip file into your desired target folder.

The following command is used in Windows:

    unzip apache-karaf-.zip

The following command is used in Unix:

    tar -zxf apache-karaf-.tar.gz

After extraction, the following folder structure will be present:

- The LICENSE, NOTICE, README, and RELEASE-NOTES files are plain text artifacts contained in each Karaf distribution. The RELEASE-NOTES file is of particular interest: it is updated with a list of changes on each major and minor release of Karaf.
- The bin folder contains the Karaf scripts for the interactive shell (Karaf), for starting and stopping the background Karaf service, a client for connecting to running Karaf instances, and additional utilities.
- The data folder is home to Karaf's logfiles, bundle cache, and various other persistent data.
- The demos folder contains an assortment of sample projects for Karaf. It is advisable that new users explore these examples to gain familiarity with the system. For the purposes of this book we strived to create new sample projects to augment those existing in the distribution.
- The instances folder will be created when you use Karaf child instances. It stores the child instance folders and files.
- The deploy folder is monitored for hot deployment of artifacts into the running container.
- The etc folder contains the base configuration files of Karaf; it is also monitored for dynamic configuration updates to the configuration admin service in the running container.
- A copy of the Karaf manual in HTML and PDF format is included in each kit.
- The lib folder contains the core libraries required for Karaf to boot upon a JVM.
- The system folder contains a simple repository of dependencies Karaf requires for operating at runtime. This repository has each library jar saved under a Maven-style directory structure, consisting of the library's Maven group ID, artifact ID, version, artifact ID-version, any classifier, and extension.

First boot!

After extracting the Apache Karaf distribution kit and setting our environment variables, we are now ready to start up the container. The container can be started by invoking the Karaf script provided in the bin directory.

On Windows, use the following command:

    bin\karaf.bat

On Unix, use the following command:

    ./bin/karaf

The following image shows the first boot screen:

Congratulations, you have successfully booted Apache Karaf! To stop the container, issue the following command in the console:

    karaf@root> shutdown -f

Including the -f or --force flag in the shutdown command instructs Karaf to skip asking for confirmation of container shutdown.

Pressing Ctrl + D will shut down Karaf when you are on the local shell; however, if you are connected remotely (using SSH), this action only logs off the SSH session and does not shut down Karaf.
Summary

We have discovered the prerequisites for installing Karaf, which distribution to obtain, how to install the container, and finally how to start it.


Packt

30 Oct 2013

4 min read

Web app penetration testing in Kali

Web apps are now a major part of today's World Wide Web. Keeping them safe and secure is the prime focus of webmasters. Building web apps from scratch can be a tedious task, and there can be small bugs in the code that can lead to a security breach. This is where web app penetration testing comes in and helps you secure your application. Web app penetration testing can be carried out on various fronts, such as the frontend interface, database, and web server. Let us leverage the power of some of the important tools of Kali that can be helpful during web app penetration testing.

WebScarab proxy

WebScarab is an HTTP and HTTPS proxy interceptor framework that allows the user to review and modify the requests created by the browser before they are sent to the server. Similarly, the responses received from the server can be modified before they are reflected in the browser. The new version of WebScarab has many more advanced features such as XSS/CSRF detection, Session ID analysis, and Fuzzing. Follow these three steps to get started with WebScarab:

1. To launch WebScarab, browse to Applications | Kali Linux | Web applications | Web application proxies | WebScarab.
2. Once the application is loaded, you will have to change your browser's network settings. Set the proxy settings for IP as 127.0.0.1 and Port as 8008.
3. Save the settings and go back to the WebScarab GUI. Click on the Proxy tab and check Intercept request. Make sure that both GET and POST requests are highlighted on the left-hand side panel. To intercept the response, check Intercept responses to begin reviewing the responses coming from the server.

Attacking the database using sqlninja

sqlninja is a popular tool used to test SQL injection vulnerabilities in Microsoft SQL servers. Databases are an integral part of web apps; hence, even a single flaw in one can lead to a mass compromise of information. Let us see how sqlninja can be used for database penetration testing.
To launch sqlninja, browse to Applications | Kali Linux | Web applications | Database Exploitation | sqlninja. This will launch the terminal window with the sqlninja parameters. The important parameter to look for is the mode parameter (-m), which specifies the type of operation we want to perform over the target database. Let us pass a basic command and analyze the output:

    root@kali:~# sqlninja -m test
    Sqlninja rel. 0.2.3-r1
    Copyright (C) 2006-2008 icesurfer
    [-] sqlninja.conf does not exist. You want to create it now ? [y/n]

This will prompt you to set up your configuration file (sqlninja.conf). You can pass the respective values and create the config file. Once you are through with it, you are ready to perform database penetration testing.

The Websploit framework

Websploit is an open source framework designed for vulnerability analysis and penetration testing of web applications. It is very similar to Metasploit and incorporates many of its plugins to add functionality.

To launch Websploit, browse to Applications | Kali Linux | Web Applications | Web Application Fuzzers | Websploit. We can begin by updating the framework. Passing the update command at the terminal will begin the updating process as follows:

    wsf>update
    [*]Updating Websploit framework, Please Wait…

Once the update is over, you can check out the available modules by passing the following command:

    wsf>show modules

Let us launch a simple directory scanner module against www.target.com as follows:

    wsf>use web/dir_scanner
    wsf:Dir_Scanner>show options
    wsf:Dir_Scanner>set TARGET www.target.com
    wsf:Dir_Scanner>run

Once the run command is executed, Websploit will launch the attack module and display the result. Similarly, we can use other modules based on the requirements of our scenarios.
Summary

In this article, we covered the following sections:

- WebScarab proxy
- Attacking the database using sqlninja
- The Websploit framework


How-To Tutorials


Packt

30 Oct 2013

11 min read

Using Location Data with PhoneGap

An introduction to Geolocation

The term geolocation refers to the process of identifying the real-world geographic location of an object. Devices that are able to detect the user's position are becoming more common each day, and we are now used to getting content based on our location (geo targeting).

Using the Global Positioning System (GPS), a space-based satellite navigation system that provides location and time information consistently across the globe, you can now get the accurate location of a device. During the early 1970s, the US military created Navstar, a defense navigation satellite system. Navstar was the system that created the basis for the GPS infrastructure used today by billions of devices. Since 1978 more than 60 GPS satellites have been successfully placed in orbit around the Earth (refer to http://en.wikipedia.org/wiki/List_of_GPS_satellite_launches for a detailed report about the past and planned launches).

The location of a device is represented through a point. This point has two components: latitude and longitude. Modern devices can determine location information in several ways, including:

- Global Positioning System (GPS)
- IP address
- GSM/CDMA cell IDs
- Wi-Fi and Bluetooth MAC address

Each approach delivers the same information; what changes is the accuracy of the device's position.

The GPS satellites continuously transmit information that a receiver can parse: for example, the general health of the GPS array, roughly where all of the satellites are in orbit, the precise orbit or path of the transmitting satellite, and the time of the transmission. The receiver calculates its own position by timing the signals sent by any of the satellites in the array that are visible.

The process of measuring the distance from a point to a group of satellites to locate a position is known as trilateration.
The distance to each satellite is determined from the signal's travel time, that is, the difference between the time the signal left the satellite and the time it was received, multiplied by the speed of light, which is treated as a constant.

The emerging trend in mobile development is GPS-based "people discovery" apps such as Highlight, Sonar, Banjo, and Foursquare. Each app has different features and has been built for different purposes, but all of them share the same killer feature: using location as a piece of metadata in order to filter information according to the user's needs.

The PhoneGap Geolocation API

The Geolocation API is not a part of the HTML5 specification but it is tightly integrated with mobile development. The PhoneGap Geolocation API and the W3C Geolocation API mirror each other; both define the same methods and relative arguments. There are several devices that already implement the W3C Geolocation API; for those devices you can use native support instead of the PhoneGap API.

As per the W3C specification, the user has to explicitly allow the website or the app to use the device's current position.

The Geolocation API is exposed through the geolocation object, a child of the navigator object, and consists of the following three methods:

- getCurrentPosition() returns the device position.
- watchPosition() watches for changes in the device position.
- clearWatch() stops the watcher for the device's position changes.

The watchPosition() and clearWatch() methods work in the same way that the setInterval() and clearInterval() methods work; in fact the first one returns an identifier that is passed in to the second one. The getCurrentPosition() and watchPosition() methods mirror each other and take the same arguments: a success and a failure callback function and an optional configuration object. The configuration object is used to specify the maximum age of a cached value of the device's position, to set a timeout after which the method will fail, and to specify whether the application requires only accurate results.
    var options = { maximumAge: 3000, timeout: 5000, enableHighAccuracy: true };
    navigator.geolocation.watchPosition(onSuccess, onFailure, options);

Only the first argument is mandatory, but it is recommended to always handle the failure case.

The success handler function receives a Position object as its argument. By accessing its properties you can read the device's coordinates and the creation timestamp of the object that stores the coordinates.

    function onSuccess(position) {
        // coords is an object, so log its fields rather than the object itself
        console.log('Coordinates: ' + position.coords.latitude + ', ' + position.coords.longitude);
        console.log('Timestamp: ' + position.timestamp);
    }

The coords property of the Position object contains a Coordinates object; by far the most important properties of this object are longitude and latitude. Using those properties it's possible to start integrating positioning information as relevant metadata in your app.

The failure handler receives a PositionError object as its argument. Using the code and the message properties of this object you can gracefully handle every possible error.

    function onFailure(error) {
        console.log('message: ' + error.message);
        console.log('code: ' + error.code);
    }

The message property returns a detailed description of the error; the code property returns an integer. The possible values are represented through the following pseudo constants:

- PositionError.PERMISSION_DENIED, the user denies the app the use of the device's current position
- PositionError.POSITION_UNAVAILABLE, the position of the device cannot be determined

If you want to recover the last available position when the POSITION_UNAVAILABLE error is returned, you have to write a custom plugin that uses the platform-specific API. Android and iOS have this feature. You can find a detailed example at http://stackoverflow.com/questions/10897081/retrieving-last-known-geolocation-phonegap.
- PositionError.TIMEOUT, the specified timeout has elapsed before the implementation could successfully acquire a new Position object

JavaScript does not support constants in the way Java and other object-oriented programming languages do. By "pseudo constants" I mean values that should never change in a JavaScript app.

One of the most common tasks to perform with the device position information is to show the device location on a map. You can quickly perform this task by integrating Google Maps in your app; the only requirement is a valid API key. To get the key, use the following steps:

1. Visit the APIs console at https://code.google.com/apis/console and log in with your Google account.
2. Click the Services link on the left-hand menu.
3. Activate the Google Maps API v3 service.

Time for action - showing device position with Google Maps

Get ready to add a map renderer to the PhoneGap default app template. Refer to the following steps.

Open the command-line tool and create a new PhoneGap project named MapSample.

    $ cordova create ~/the/path/to/your/source/mapsample com.gnstudio.pg.MapSample MapSample

Add the Geolocation API plugin using the command line.

    $ cordova plugins add https://git-wip-us.apache.org/repos/asf/cordova-plugin-geolocation.git

Go to the www folder, open the index.html file, and add a div element with the id value map inside the main div of the app, below the #deviceready one.

    <div id='map'></div>

Add a new script tag to include the Google Maps JavaScript library.

    <script type="text/javascript" src="https://maps.googleapis.com/maps/api/js?key=YOUR_API_KEY&sensor=true"></script>

Go to the css folder and define a new rule inside the index.css file to give the div element and its content an appropriate size.

    #map {
        width: 280px;
        height: 230px;
        display: block;
        margin: 5px auto;
        position: relative;
    }

Go to the js folder, open the index.js file, and define a new function named initMap.
    initMap: function(lat, long){
        // The code needed to show the map and the
        // device position will be added here
    }

In the body of the function, define an options object in order to specify how the map has to be rendered.

    var options = {
        zoom: 8,
        center: new google.maps.LatLng(lat, long),
        mapTypeId: google.maps.MapTypeId.ROADMAP
    };

Add to the body of the initMap function the code to initialize the rendering of the map and to show a marker representing the device's current position over it.

    var map = new google.maps.Map(document.getElementById('map'), options);
    var markerPoint = new google.maps.LatLng(lat, long);
    var marker = new google.maps.Marker({
        position: markerPoint,
        map: map,
        // double quotes avoid the unescaped apostrophe
        title: "Device's Location"
    });

Define a function to use as the success handler and call from its body the initMap function previously defined.

    onSuccess: function(position){
        var coords = position.coords;
        app.initMap(coords.latitude, coords.longitude);
    }

Define another function in order to have a failure handler able to notify the user that something went wrong.

    onFailure: function(error){
        navigator.notification.alert(error.message, null);
    }

Go into the deviceready function and add as the last statement the call to the Geolocation API needed to recover the device's position.

    navigator.geolocation.getCurrentPosition(app.onSuccess, app.onFailure, {timeout: 5000, enableHighAccuracy: false});

Open the command-line tool, build the app, and then run it on your testing devices.

    $ cordova build
    $ cordova run android

What just happened?

You integrated Google Maps inside an app. The map is an interactive map most users are familiar with: the most common gestures are already working and the Google Street View controls are already enabled.

To successfully load the Google Maps API on iOS, it's mandatory to whitelist the googleapis.com and gstatic.com domains.
Open the .plist file of the project as source code (right-click on the file and then Open As | Source Code) and add the following array of domains:

    <key>ExternalHosts</key>
    <array>
        <string>*.googleapis.com</string>
        <string>*.gstatic.com</string>
    </array>

Other Geolocation data

In the previous example, you only used the latitude and longitude properties of the position object that you received. There are other attributes that can be accessed as properties of the Coordinates object:

- altitude, the height of the device, in meters, above sea level.
- accuracy, the accuracy level of the latitude and longitude, in meters; it can be used to show a radius of accuracy when mapping the device's position.
- altitudeAccuracy, the accuracy of the altitude in meters.
- heading, the direction of the device in degrees clockwise from true north.
- speed, the current ground speed of the device in meters per second.

Latitude and longitude are the best supported of these properties, and the ones that will be most useful when communicating with remote APIs. The other properties are mainly useful if you're developing an application for which Geolocation is a core component of its standard functionality, such as apps that make use of this data to create a flow of information contextualized to the geolocation data. The accuracy property is the most important of these additional features, because as an application developer, you typically won't know which particular sensor is giving you the location, and you can use the accuracy property as a range in your queries to external services.

There are several APIs that allow you to discover interesting data related to a place; among these the most interesting are the Google Places API and the Foursquare API. The Google Places and Foursquare online documentation is very well organized and it's the right place to start if you want to dig deeper into these topics.
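As a rough illustration of using the accuracy property as a range, the following sketch turns a Coordinates-like object into a query for some external "nearby places" service. The helper name toNearbyQuery, the minRadius parameter, and the lat, lng, and radius field names are assumptions made for this example; they do not come from PhoneGap or from any specific remote API.

```javascript
// Hypothetical helper (not part of PhoneGap): builds a query object for an
// external "nearby" service from a Coordinates-like object.
// The field names lat, lng and radius are illustrative assumptions.
function toNearbyQuery(coords, minRadius) {
    // accuracy is expressed in meters; treat a missing value as 0
    var accuracy = typeof coords.accuracy === 'number' ? coords.accuracy : 0;
    return {
        lat: coords.latitude,
        lng: coords.longitude,
        // never search an area smaller than minRadius meters
        radius: Math.max(Math.round(accuracy), minRadius)
    };
}

// Usage with a plain object standing in for position.coords
var query = toNearbyQuery({ latitude: 45.4642, longitude: 9.19, accuracy: 65.4 }, 50);
console.log(query.radius); // 65
```

Inside a success handler, position.coords could be passed to such a helper directly. The lower bound keeps a very precise GPS fix from producing a search area too small to return anything.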
You can access the Google Places docs at https://developers.google.com/maps/documentation/javascript/places and Foursquare at https://developer.foursquare.com/.

The itinero reference app for this article implements both the APIs. In the next example, you will look at how to integrate Google Places inside the RequireJS app.

In order to include the Google Places API inside an app, all you have to do is add the libraries parameter to the Google Maps API call. The resulting URL should look similar to http://maps.google.com/maps/api/js?key=SECRET_KEY&sensor=true&libraries=places.

The itinero app lets users create and plan a trip with friends. Once the user provides the name of the trip, the name of the country to be visited, and the trip mates and dates, it's time to start selecting the travel, eat, and sleep options. When the user selects the Eat option, the Google Places data provider will return bakeries, take-out places, groceries, and so on, close to the trip's destination. The app will show on the screen a list of possible places the user can select to plan the trip.

For a complete list of the types of place searches supported by the Google API, refer to the online documentation at https://developers.google.com/places/documentation/supported_types.
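The libraries parameter mentioned above can be appended programmatically. The following is a minimal sketch, assuming the URL format quoted in this article; buildMapsUrl is a hypothetical helper name and is not part of the Google Maps or PhoneGap APIs.

```javascript
// Hypothetical helper: assembles the Google Maps script URL quoted in the
// article, optionally adding the libraries parameter (for example 'places').
function buildMapsUrl(apiKey, libraries) {
    var url = 'http://maps.google.com/maps/api/js' +
        '?key=' + encodeURIComponent(apiKey) +
        '&sensor=true';
    if (libraries && libraries.length > 0) {
        // several libraries are passed as one comma-separated value
        var encoded = libraries.map(function (name) {
            return encodeURIComponent(name);
        });
        url += '&libraries=' + encoded.join(',');
    }
    return url;
}

console.log(buildMapsUrl('SECRET_KEY', ['places']));
// http://maps.google.com/maps/api/js?key=SECRET_KEY&sensor=true&libraries=places
```

In an app, the returned string would be used as the src attribute of the script tag that loads the Maps library.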


Packt

30 Oct 2013

7 min read

Mocking static methods (Simple)

Getting ready

The use of static methods is usually considered a bad Object Oriented Programming practice, but if we end up in a project that uses a pattern such as active record (see http://en.wikipedia.org/wiki/Active_record_pattern), we will end up having a lot of static methods. In such situations, we will need to write some unit tests and PowerMock could be quite handy.

Start your favorite IDE (which we set up in the Getting and installing PowerMock (Simple) recipe), and let's fire away.

How to do it...

We will start where we left off. In the EmployeeService.java file, we need to implement the getEmployeeCount method; currently it throws an instance of UnsupportedOperationException. Let's implement the method in the EmployeeService class; the updated classes are as follows:

    /**
     * This class is responsible to handle the CRUD
     * operations on the Employee objects.
     * @author Deep Shah
     */
    public class EmployeeService {
        /**
         * This method is responsible to return
         * the count of employees in the system.
         * It does it by calling the
         * static count method on the Employee class.
         * @return Total number of employees in the system.
         */
        public int getEmployeeCount() {
            return Employee.count();
        }
    }

    /**
     * This is a model class that will hold
     * properties specific to an employee in the system.
     * @author Deep Shah
     */
    public class Employee {
        /**
         * The method that is responsible to return the
         * count of employees in the system.
         * @return The total number of employees in the system.
         * Currently this
         * method throws UnsupportedOperationException.
         */
        public static int count() {
            throw new UnsupportedOperationException();
        }
    }

The getEmployeeCount method of EmployeeService calls the static method count of the Employee class. This method in turn throws an instance of UnsupportedOperationException. To write a unit test of the getEmployeeCount method of EmployeeService, we will need to mock the static method count of the Employee class.
Let's create a file called EmployeeServiceTest.java in the test directory. This class is as follows:

    import org.junit.Assert;
    import org.junit.Test;
    import org.junit.runner.RunWith;
    import org.powermock.api.mockito.PowerMockito;
    import org.powermock.core.classloader.annotations.PrepareForTest;
    import org.powermock.modules.junit4.PowerMockRunner;

    /**
     * The class that holds all unit tests for
     * the EmployeeService class.
     * @author Deep Shah
     */
    @RunWith(PowerMockRunner.class)
    @PrepareForTest(Employee.class)
    public class EmployeeServiceTest {
        @Test
        public void shouldReturnTheCountOfEmployeesUsingTheDomainClass() {
            PowerMockito.mockStatic(Employee.class);
            PowerMockito.when(Employee.count()).thenReturn(900);
            EmployeeService employeeService = new EmployeeService();
            Assert.assertEquals(900, employeeService.getEmployeeCount());
        }
    }

If we run the preceding test, it passes. The important things to notice are the two annotations (@RunWith and @PrepareForTest) at the top of the class, and the call to the PowerMockito.mockStatic method:

- The @RunWith(PowerMockRunner.class) statement tells JUnit to execute the test using PowerMockRunner.
- The @PrepareForTest(Employee.class) statement tells PowerMock to prepare the Employee class for tests. This annotation is required when we want to mock final classes or classes with final, private, static, or native methods.
- The PowerMockito.mockStatic(Employee.class) statement tells PowerMock that we want to mock all the static methods of the Employee class.

The next statements in the code are pretty standard, and we have looked at them earlier in the Saying Hello World! (Simple) recipe. We are basically setting up the static count method of the Employee class to return 900. Finally, we are asserting that when the getEmployeeCount method on the instance of EmployeeService is invoked, we do get 900 back.

Let's look at one more example of mocking a static method; but this time, let's mock a static method that returns void. We want to add another method to the EmployeeService class that will increment the salary of all employees (wouldn't we love to have such a method in reality?).
The updated code is as follows:

    /**
     * This method is responsible to increment the salary
     * of all employees in the system by the given percentage.
     * It does this by calling the static giveIncrementOf method
     * on the Employee class.
     * @param percentage the percentage value by which
     * salaries would be increased
     * @return true if the increment was successful,
     * false if the increment failed because of an exception.
     */
    public boolean giveIncrementToAllEmployeesOf(int percentage) {
        try {
            Employee.giveIncrementOf(percentage);
            return true;
        } catch (Exception e) {
            return false;
        }
    }

The static method Employee.giveIncrementOf is as follows:

    /**
     * The method that is responsible to increment
     * salaries of all employees by the given percentage.
     * @param percentage the percentage value by which
     * salaries would be increased
     * Currently this method throws
     * UnsupportedOperationException.
     */
    public static void giveIncrementOf(int percentage) {
        throw new UnsupportedOperationException();
    }

The earlier syntax would not work for mocking a void static method.
The test case that mocks this method would look like the following:

    @RunWith(PowerMockRunner.class)
    @PrepareForTest(Employee.class)
    public class EmployeeServiceTest {

        @Test
        public void shouldReturnTrueWhenIncrementOf10PercentageIsGivenSuccessfully() {
            PowerMockito.mockStatic(Employee.class);
            // The call on the next line is recorded as the method to stub.
            PowerMockito.doNothing().when(Employee.class);
            Employee.giveIncrementOf(10);

            EmployeeService employeeService = new EmployeeService();
            Assert.assertTrue(employeeService.giveIncrementToAllEmployeesOf(10));
        }

        @Test
        public void shouldReturnFalseWhenIncrementOf10PercentageIsNotGivenSuccessfully() {
            PowerMockito.mockStatic(Employee.class);
            // The call on the next line is recorded as the method that should throw.
            PowerMockito.doThrow(new IllegalStateException()).when(Employee.class);
            Employee.giveIncrementOf(10);

            EmployeeService employeeService = new EmployeeService();
            Assert.assertFalse(employeeService.giveIncrementToAllEmployeesOf(10));
        }
    }

Notice that we still need the two annotations @RunWith and @PrepareForTest, and we still need to inform PowerMock that we want to mock the static methods of the Employee class. Notice the syntax for PowerMockito.doNothing and PowerMockito.doThrow:

The PowerMockito.doNothing method tells PowerMock to literally do nothing when a certain method is called. The statement that follows the doNothing call sets up the mock method; in this case, it's the Employee.giveIncrementOf method. This essentially means that PowerMock will do nothing when the Employee.giveIncrementOf method is called.

The PowerMockito.doThrow method tells PowerMock to throw an exception when a certain method is called. The statement that follows the doThrow call tells PowerMock about the method that should throw an exception; in this case, it is again Employee.giveIncrementOf. Hence, when the Employee.giveIncrementOf method is called, PowerMock will throw an instance of IllegalStateException.

How it works...

PowerMock uses a custom class loader and bytecode manipulation to enable mocking of static methods.
It does this by using the @RunWith and @PrepareForTest annotations.

The rule of thumb is that whenever we want to mock any method that returns a non-void value, we should use the PowerMockito.when().thenReturn() syntax. It's the same syntax for instance methods as well as static methods.

For methods that return void, the preceding syntax cannot work. Hence, we have to use PowerMockito.doNothing and PowerMockito.doThrow. This syntax for static methods looks a bit like the record-playback style.

On a mocked instance created using PowerMock, we can choose to return canned values only for a few methods; PowerMock will provide default values for all the other methods. This means that if we did not provide any canned value for a method that returns an int value, PowerMock will mock such a method and return 0 (since 0 is the default value for the int datatype) when invoked.

There's more...

The syntax of PowerMockito.doNothing and PowerMockito.doThrow can be used on instance methods as well.

.doNothing and .doThrow on instance methods

The syntax on instance methods is simpler compared to the one used for static methods. Let's say we want to mock the instance method save on the Employee class. The save method returns void, hence we have to use the doNothing and doThrow syntax. The test code to achieve this is as follows:

    /**
     * The class that holds all unit tests for
     * the Employee class.
     * @author Deep Shah
     */
    public class EmployeeTest {

        @Test
        public void shouldNotDoAnythingIfEmployeeWasSaved() {
            Employee employee = PowerMockito.mock(Employee.class);
            PowerMockito.doNothing().when(employee).save();
            try {
                employee.save();
            } catch (Exception e) {
                Assert.fail("Should not have thrown an exception");
            }
        }

        @Test(expected = IllegalStateException.class)
        public void shouldThrowAnExceptionIfEmployeeWasNotSaved() {
            Employee employee = PowerMockito.mock(Employee.class);
            PowerMockito.doThrow(new IllegalStateException()).when(employee).save();
            employee.save();
        }
    }

To inform PowerMock about the method to mock, we just have to invoke it on the return value of the when method. The line PowerMockito.doNothing().when(employee).save() essentially means do nothing when the save method is invoked on the mocked Employee instance. Similarly, PowerMockito.doThrow(new IllegalStateException()).when(employee).save() means throw IllegalStateException when the save method is invoked on the mocked Employee instance. Notice that the syntax is more fluent when we want to mock void instance methods.

Summary

In this article, we saw how easily we can mock static methods.
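The default-value behaviour described under How it works (an unstubbed method that returns int yields 0) can be illustrated without any mocking library. The following is a minimal sketch using only the JDK's dynamic proxies; it is not how PowerMock is implemented (PowerMock relies on a custom class loader and bytecode manipulation), and the EmployeeStore interface and DefaultValueMock class are hypothetical names introduced purely for illustration.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical interface standing in for a mocked collaborator. */
interface EmployeeStore {
    int count();
    boolean isEmpty();
    String describe();
    void save();
}

/** Toy mock: returns canned values where configured, type defaults otherwise. */
final class DefaultValueMock implements InvocationHandler {
    private final Map<String, Object> cannedValues = new HashMap<>();

    /** Registers a canned return value for the method with the given name. */
    DefaultValueMock willReturn(String methodName, Object value) {
        cannedValues.put(methodName, value);
        return this;
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) {
        if (cannedValues.containsKey(method.getName())) {
            return cannedValues.get(method.getName());
        }
        return defaultFor(method.getReturnType());
    }

    /** Java's default value for the given return type. */
    static Object defaultFor(Class<?> type) {
        if (type == boolean.class) return false;
        if (type == char.class) return '\0';
        if (type == byte.class) return (byte) 0;
        if (type == short.class) return (short) 0;
        if (type == int.class) return 0;
        if (type == long.class) return 0L;
        if (type == float.class) return 0f;
        if (type == double.class) return 0d;
        return null; // void and reference types
    }

    /** Creates a proxy of the given interface backed by the handler. */
    @SuppressWarnings("unchecked")
    static <T> T mock(Class<T> type, DefaultValueMock handler) {
        return (T) Proxy.newProxyInstance(
                type.getClassLoader(), new Class<?>[] {type}, handler);
    }

    public static void main(String[] args) {
        EmployeeStore store = mock(EmployeeStore.class,
                new DefaultValueMock().willReturn("count", 900));
        System.out.println(store.count());    // canned value
        System.out.println(store.isEmpty());  // default for boolean
        System.out.println(store.describe()); // default for a reference type
        store.save();                         // void method: nothing happens
    }
}
```

Only count has a canned value here; every other method falls back to the default for its return type, which mirrors the "return 0 for an int method" rule of thumb above.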


Packt

30 Oct 2013

5 min read

Creating an image gallery



Packt

30 Oct 2013

17 min read

Performance Testing and Load Balancing


Initial and on-going performance measurement

Performance measurements begin prior to system deployment. In terms of a failover cluster of Hyper-V systems, it begins prior to creating any virtual machines. Your first goal is to obtain baselines. The term baseline has different meanings in different contexts; in this case it means gathering data on a system during a known healthy period. Its purpose is to serve as a point of comparison for later data gathering operations.

The first set of performance measurements you take will be with no virtual machines. Once you have reached your target deployment level, you will obtain another. These will be your baselines. All future performance measurements will be compared to these in order to determine how your systems are working.

Microsoft provides a thorough document for performance tuning of Windows Server 2012. These concepts carry forward to R2 and many apply to Hyper-V Server as well. Download it from the following site: http://download.microsoft.com/download/0/0/B/00BE76AF-D340-4759-8ECD-C80BC53B6231/performance-tuning-guidelines-windows-server-2012.docx

General performance measurement

Baselines and ongoing performance evaluations tend to be fairly generic in nature. They can be carried out in a number of ways. This section will examine two of them. The first is the free Server Performance Advisor (SPA) provided by Microsoft. The second is the Performance Monitor tool built into Windows operating systems.

Server Performance Advisor

This tool can be run quickly to determine the performance characteristics of a new system and on a schedule to track the performance trends of an active system.

Do not install or run Server Performance Advisor directly on a Hyper-V host or any guests that are to be measured. Doing so adds a load that will make the results inaccurate.

The following instructions can be used to quickly set up SPA to run in a basic environment.
They assume that you'll be running the application with a domain account that has administrative privileges on the systems to be measured.

To scan a system that has an active firewall, run the following cmdlet on that system:

    Enable-NetFirewallRule -DisplayName "Performance Logs and Alerts (TCP-In)"

Server Performance Advisor is published on the developer center, which is accessible at http://msdn.microsoft.com/en-us/library/windows/hardware/hh367834.aspx. For best results, this tool should be run from a remote computer rather than on the host being measured. It can be run from any modern Windows system. It requires a connection to an installation of Microsoft SQL Server 2008 R2 or newer. The Express edition is perfectly acceptable. The latest version can be obtained at no charge from the Microsoft download center at http://search.microsoft.com/en-us/downloadresults.aspx?q=sql%20express.

There is another requirement that's listed on the download page but not in the included documentation. The CAB file that SPA is delivered in must be extracted with its directory structure intact. If you use Windows Explorer to open the CAB, it will not extract the files properly. Use the built-in extrac32 tool according to the directions (they're on the download page) or use another extraction application that can reproduce the proper folder structure.

The final prerequisite you must satisfy is the creation of a folder to hold the results. This folder can be in any location on the system you'll be running SPA from, and it can have any name. This folder must be shared. Determine the domain account that you'll be running SPA with and give that account full permissions to the folder and its share.

All that's left is to run SPA. In the folder where you extracted the CAB's contents, run SPAConsole.exe. When it opens, choose File and then New Project to get started. The first screen is just a basic introductory screen.
Click on Next and you'll see the following screen, which has been filled in with examples:

The previous entries direct the application to create a database on the local computer, in this case an instance of SQL Server Express. For a large environment with many systems to scan, it is recommended to use SQL Server Standard instead. The database name can be anything you like; this one has been named to reflect that it will contain data on the first Hyper-V cluster in the sample organization. Be aware that this will create a new database on the selected server. Once you have selected the database server, instance, and name, click on Next to move to the following screen:

This screen allows you to select the advisor packs that you'd like to make available in this project. Even though you only need the Hyper-V advisor and perhaps the CoreOS advisor, it's best to select all three. The interface sometimes hangs if only a subset is selected. You won't be required to use all three during a scan. Click on Next. This will bring you to the final screen:

On this screen, enter the servers that you want to scan. The File Share Location is a file share that will hold the results of the scan. As with the SQL database, it's not required to be on the same system as the scanner. Servers can be added to the list later. You can use Test Configuration to ensure that the indicated servers are reachable. Once you're happy with the entries, click on Finish.

You'll be returned to the main screen of SPA. Now, you should see the host(s) that you selected for this project. Select their checkboxes, and then press the Run Analysis button in the lower-right corner. Here, you'll be able to select the actual advisor packs that you want to use. At the bottom of the screen, you'll be able to enter how long you want the scan to run and, if you wish to collect numerous data points over a period of time, how often you want it to run.
Click on OK when you're satisfied with your selections and the data collection process will begin. Once it is complete, you can click on the small down arrow in the Analysis Result column of one of the hosts. This will show three buttons, indicated in the following screenshot:

These buttons are, from left to right:

- View Latest Report: See the report from the latest analysis. This is the screen you're most likely to be interested in after a one-time scan of a new system. It will show warnings for any items and settings it finds that might impede optimal performance of Hyper-V. It can also compare one report against another and export result sets to XML.
- Find Reports: Search through all result sets for this host according to the criteria that you choose.
- View Charts: These are detailed charts that examine and graph very specific performance metrics of the host.

The wording of the "Logical Processor count limit when Hyper-V is enabled" warning is misleading. The management operating system is restricted to using 64 logical processors, but Hyper-V itself can still schedule guest processes up to the maximum of 320 logical processors.

The first two buttons are very simple to understand and you should have no trouble navigating them on your own. Do remember to check the various tabs inside the report. The third button, View Charts, brings you to a tool that isn't as easy to decipher. You'll begin by picking a range of dates, and assuming that you've got more than one report to chart, you'll get a screen that looks something like the following screenshot:

The sheer amount of data shown can make this difficult to interpret. In the lower section, you'll notice that there is a large number of performance counters. Select only those that you're actually interested in viewing and you'll find that the chart becomes much easier to understand.

To deselect all items, select the first item and press Ctrl + A, and then press the Space bar.
The items marked as 90% remove all utilization above the 90 percent mark. These are assumed to be momentary spikes that can skew the outcome in a way that makes the data meaningless. Compare these to the same metrics marked as Max. Use the Pick Series button at the bottom of the window if you wish to reduce the number of selectable items. This button is more useful on the other two tabs; in fact, they'll have no data to display if you do not select an item. As indicated, these two tabs show the way that the selected metrics have been trending over a specified period of time. These can show you how your systems behave differently during the day or across a week. Comparing these reports against those generated by other servers can help you to determine how your guests should be load balanced.

Performance Monitor

The built-in Performance Monitor tool is much more powerful than most others, but it's up to you to choose what to measure. One of its major strengths is that there's nothing to install. All you need is a Windows system with a GUI. As with Server Performance Advisor, it is not recommended that you run this on a Hyper-V host or guest that you are going to measure.

There are two ways to run Performance Monitor. One is as a real-time tool that graphs the monitored performance counters as they occur. The second is as a collector that gathers metrics and stores them for trend analysis.

The features that differentiate Performance Monitor from Server Performance Advisor are:

- Real-time graphing
- Precise selection of metrics
- No software downloads required
- No database system required
- Performance logs can be opened on any Windows system

Performance Monitor is found in Administrative Tools. Depending on how your system is configured, Administrative Tools may be found on the Start screen or menu. It's available in the Control Panel in all versions of Windows. It's also available under the Performance node of Computer Management.
If you will be running it for real-time graphing, ensure that you start it with an account that has administrative privileges on the target system. For collectors, you'll be able to specify the account to run it under. You may also need to modify the firewall as indicated under the Server Performance Advisor section mentioned earlier.

Real-time monitoring with Performance Monitor

To start a real-time monitoring session, expand the Monitoring Tools node and click on Performance Monitor. In the center pane, click on the button with the green plus, which will open the Add Counters window. In this window, you'll want to change the counters' source to the target computer. Your screen should look like the following screenshot:

Navigate through the various counters in the upper list box. When you click on one, it will show the instances of that counter that are available to be monitored. Double-click on an instance or highlight it and click on the Add >> button to move it to the list box on the right. These are the objects that will be tracked. When you are satisfied with your selection, click on OK.

See Step 4 in the next section for a screenshot of this window and more information about its contents.

You will be returned to the main screen. The display will be updated every second. Each counter you picked will be displayed as a line of a different color. The legend will be shown at the bottom. You can uncheck an item to hide it from the running display; however, its counter will still be monitored.

Using the buttons across the top of the graph pane, you can modify the output. Most of the options are self-explanatory; change them until the display suits your needs. You can switch the graph from its default line output to a histogram or to a running digital display. Click on the Highlight button and then select a counter to make it stand out against the others.
Several of the buttons open various tabs on the Performance Monitor Properties window where you can change many settings, such as the delay between samples. Of interest here is the Source tab, which will be used in the next section.

Trend tracking with Performance Monitor

The second use for Performance Monitor is to pull performance statistics across a span of time. In active deployments, it can be used to track the performance of Hyper-V hosts. You'll create scheduled data collector sets for this. What makes Performance Monitor especially useful here is that a single collector set can gather from all the hosts in your cluster simultaneously.

Before you start, ensure that the Performance Monitor console is not connected to the target computer system as it would be for a real-time monitor. For instance, if you are using Computer Management as shown in the first screenshot in this article, the tree root should say Computer Management (Local) and not contain the name of another system. There are three reasons for this. First, running and managing the collector sets creates a small drain on the system's resources. Second, you're going to be running collectors against multiple systems and it's better to use a single remote computer for those purposes. Third, it's easiest to look at the results of performance logs on the system that took them. Otherwise, you have to move them around.

Look under the Data Collector Sets tree item. There are a number of predefined collector sets and you can add more. Just right-click on the User Defined node and choose New and Data Collector Set. The following steps will take you through the creation of a collector set:

On the first screen, come up with a name for the set, then choose to manually create the set, then click on Next:

This wizard will create a data collector named DataCollector01 which cannot be renamed.
If you wish, you can skip through the wizard to the end, delete the generic collector, and then create new ones with friendlier names.

On the second screen, you want to create performance counter data logs:

On the third screen, you can change how often the collector polls for data. As you can see in the following screenshot, the default is every 15 seconds:

Click on the Add… button in the previous screen to pick the counters that you want to poll. This is the same screen that you see when selecting counters in the real-time screen. Enter the name of the computer you want to poll data from in the Select counters from computer text box. Upon pressing Tab or Enter or clicking on another control, it will load the counters from that system. Select the counters and instances that you desire and click on Add >>. You can monitor counters from multiple computer systems in the same collector set if you like, but you may also choose to use one collector per computer per set.

Remember that you'll want to select Hyper-V related counters for CPU, memory, and networking or you'll be retrieving counters from the parent partition only. Physical disk counters are read from the management operating system. You cannot retrieve statistics for pass-through disks by setting performance counters on the management operating system.

If you click on the Add >> button and nothing happens, it is because instances are required but didn't load. Click on another counter and then back on the desired counter until the instances are displayed.

On clicking OK, you'll be returned to the previous screen that will now be populated with the counters that you chose. Ensure everything looks as you wish and click on Next.

You'll now be asked for a location to save the logs to. Although it will allow you to enter a UNC path, logfile creation is usually unsuccessful anywhere but on the local system. You may place them in a local folder that is shared for easy accessibility from other systems, if you wish.
The final screen will have you provide the credentials that the set will use. If you leave it on its default setting, it will use the Local System account, which will not have the necessary rights to run the collection on the target computer. You have two choices: you can add the computer account of the collector machine to the Performance Log Users group on all target machines or you can use an account that is a member of that group on all machines. For the purposes of this step-through, we're just going to use the domain administrator account:

Before clicking on Finish, you are encouraged to set the radio button to Open properties for this data collector set. This will allow you to jump straight to the properties window where you can schedule the scan. Alternatively, you can open the properties window by right-clicking on the completed collector set and clicking on Properties.

In the properties window, change the options as you like. The Schedule tab is where you establish the Start and End times. You can create multiple schedules for a collector set:

If you want to use a separate collector in this set for another host, right-click on the new Collector Set in the left pane and click on New and Data Collector. The wizard is very nearly identical to the one you just completed.

You aren't required to follow a schedule. You can manually start and stop collector sets by right-clicking on them in the left pane.

Once the collector has begun its work, you can go back to the real-time monitor screen and open the Performance Monitor Properties window to the Source tab. Select the logfile that you instructed the collector set to use. The display will switch to the static output of the logfile. However, it will be blank because by default, no counters are selected. Add counters with the green plus button just as you did with the real-time display. This time, you'll only be able to choose from counters that are contained in the logfiles.
You can now manipulate the log contents as you did with the live display. Note that you can view a log of an actively running collector, but the screen will not update in real time.

Selecting counters practically

If you use the exact counters as shown in the example, you'll notice that some of them aren't very useful. For instance, the number of processors in a host is highly unlikely to change during a monitoring session, although the number of virtual processors might. Not all of the available counters are well documented, but there is a Show description check box on the counter selection screen that provides a bit of information.

Also, some of the counters you can pull don't compare well from one host to another. In the sample, we instructed SV-HYPERV1 to monitor the amount of data traveling across the virtual adapter in SV-DC1. This is useful data in its own right, but probably in isolation, not as a comparator. Of course, if the virtual machine migrates to another host, it will no longer be readable. You may find the aggregate counters to be more useful than specific virtual machine counters.

The counters that are truly useful are simply too numerous to make a meaningful list out of, and not all counters are universally useful in all organizations. The four generic categories you're likely to be interested in are CPU, disk, memory, and networking. Be judicious about selecting counters that look at specific highly available virtual machines.

Alternative ways to read performance logs

Performance logs can be confusing, especially when you first encounter them. There are a number of tools on the market designed to aid you. One popular option is the Performance Analysis of Logs (PAL) Tool. It is free and open source, and is downloadable from Codeplex at http://pal.codeplex.com.
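Another option is to process the numbers yourself. The following is a minimal sketch, not part of any Microsoft tool, that assumes a performance log has already been converted to CSV (for example with the relog utility included with Windows) and that each row holds a timestamp followed by quoted counter values. It computes the maximum and a nearest-rank 90th percentile for one column, which is one plausible reading of the 90% and Max series that SPA charts. The class name, the sample rows, and the simplified CSV layout are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch: summarize one counter column from a CSV export of a performance log. */
final class CounterLogStats {

    /** Reads one numeric column, skipping the header row and blank cells. */
    static double[] parseColumn(List<String> csvLines, int column) {
        List<Double> values = new ArrayList<>();
        for (int i = 1; i < csvLines.size(); i++) {
            // Naive split: assumes no commas inside the quoted cells.
            String[] cells = csvLines.get(i).split(",");
            if (column >= cells.length) continue;
            String cell = cells[column].replace("\"", "").trim();
            if (!cell.isEmpty()) values.add(Double.parseDouble(cell));
        }
        double[] result = new double[values.size()];
        for (int i = 0; i < result.length; i++) result[i] = values.get(i);
        return result;
    }

    /** Nearest-rank percentile: the smallest sample with at least percent% of samples at or below it. */
    static double percentile(double[] samples, double percent) {
        if (samples.length == 0) throw new IllegalArgumentException("no samples");
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(percent * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }

    /** Largest sample, including momentary spikes. */
    static double max(double[] samples) {
        return Arrays.stream(samples).max()
                .orElseThrow(() -> new IllegalArgumentException("no samples"));
    }

    public static void main(String[] args) {
        // Hypothetical CPU utilization samples containing one short spike.
        List<String> lines = Arrays.asList(
                "\"Time\",\"CPU\"",
                "\"10:00:00\",\"12\"", "\"10:00:15\",\"15\"", "\"10:00:30\",\"11\"",
                "\"10:00:45\",\"14\"", "\"10:01:00\",\"13\"", "\"10:01:15\",\"16\"",
                "\"10:01:30\",\"12\"", "\"10:01:45\",\"15\"", "\"10:02:00\",\"14\"",
                "\"10:02:15\",\"98\"");
        double[] cpu = parseColumn(lines, 1);
        System.out.println("Max: " + max(cpu));
        System.out.println("90th percentile: " + percentile(cpu, 90));
    }
}
```

With the sample rows above, the single spike dominates the maximum while the 90th percentile stays close to typical load, which is why comparing the two series is informative.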


Packt

30 Oct 2013

10 min read

Highlights of Greenplum


Big Data analytics – platform requirements

Organizations are striving to become more data driven and to leverage data for competitive advantage. It is inevitable that any current business intelligence infrastructure needs to be upgraded to include Big Data technologies, and analytics needs to be embedded into every core business process. The following diagram depicts a matrix that connects requirements from low storage/cost to high storage/cost information management systems and analytics applications.

The following list covers the capabilities that an integrated platform for Big Data analytics should have:

- A data integration platform that can integrate data from any source, of any type, and of high volume. This includes efficient data extraction, data cleansing, transformation, and loading capabilities.
- A data storage platform that can hold structured, unstructured, and semi-structured data with a capability to slice and dice data to any degree, regardless of format. In short, while we store data, we should be able to use the best suited platform for a given data format (for example: structured data in a relational store, semi-structured data in a NoSQL store, and unstructured data in a file store) and still be able to join data across platforms to run analytics.
- Support for running standard analytics functions and standard analytical tools on data that has the characteristics described previously.
- Modular and elastically scalable hardware that wouldn't force changes to architecture/design as needs grow to handle bigger data and more complex processing requirements.
- A centralized management and monitoring system.
- A highly available and fault tolerant platform that can seamlessly repair itself in the event of hardware failure.
- Support for advanced visualizations to communicate insights in an effective way.
- A collaboration platform that can help end users perform the functions of loading, exploring, and visualizing data, and other workflow aspects as an end-to-end process.

Core components

The following figure depicts core software components of Greenplum UAP:

In this section, we will take a brief look at what each component is and take a deep dive into their functions in the sections to follow.

Greenplum Database

Greenplum Database is a shared nothing, massively parallel processing solution built to support next generation data warehousing and Big Data analytics processing. It stores and analyzes voluminous structured data. It comes in a software-only version that works on commodity servers (this being its unique selling point) and is also available as an appliance (DCA) that can take advantage of large clusters of powerful servers, storage, and switches. GPDB (Greenplum Database) comes with a parallel query optimizer that uses a cost-based algorithm to evaluate and select optimal query plans. Its high-speed interconnect supports continuous pipelining for data processing.

In its new distribution under Pivotal, Greenplum Database is called Pivotal (Greenplum) Database.

Shared nothing, massive parallel processing (MPP) systems, and elastic scalability

Until now, our applications have been benchmarked for a certain level of performance, and the core hardware and its architecture determined how ready they were for further scaling, which came at a cost, be it in terms of changes to the design or hardware augmentation. With growing data volumes, scalability and total cost of ownership are becoming a big challenge, and the need for elastic scalability has become paramount.

This section compares shared disk, shared memory, and shared nothing data architectures and introduces the concept of massive parallel processing.

Greenplum Database and HD components implement shared nothing data architecture with a master/worker paradigm, demonstrating massive parallel processing capabilities.
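To make the master/worker idea concrete, here is a minimal in-memory sketch of a shared nothing layout. It is not Greenplum code: the Master and Segment classes, the hash-based distribution rule, and the single "sum of amounts" query are assumptions chosen only to show rows being distributed by key, each segment aggregating its own local rows in parallel, and the master combining the partial results.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of a shared nothing, master/worker aggregate (illustrative only). */
final class SharedNothingSketch {

    /** A segment owns its rows exclusively; no other segment reads or writes them. */
    static final class Segment {
        private final List<long[]> rows = new ArrayList<>(); // each row is {key, amount}

        void insert(long key, long amount) {
            rows.add(new long[] {key, amount});
        }

        /** Partial aggregate computed only from locally stored rows. */
        long localSum() {
            long sum = 0;
            for (long[] row : rows) sum += row[1];
            return sum;
        }

        int rowCount() {
            return rows.size();
        }
    }

    /** The master routes rows to segments and merges their partial results. */
    static final class Master {
        private final List<Segment> segments = new ArrayList<>();

        Master(int segmentCount) {
            for (int i = 0; i < segmentCount; i++) segments.add(new Segment());
        }

        /** Distribution rule (an assumption): hash of the key modulo segment count. */
        void insert(long key, long amount) {
            int target = Math.floorMod(Long.hashCode(key), segments.size());
            segments.get(target).insert(key, amount);
        }

        /** Scatter the aggregate to all segments in parallel, then gather and add. */
        long totalAmount() {
            return segments.parallelStream().mapToLong(Segment::localSum).sum();
        }

        int[] segmentRowCounts() {
            return segments.stream().mapToInt(Segment::rowCount).toArray();
        }
    }

    public static void main(String[] args) {
        Master master = new Master(4);
        for (long key = 1; key <= 100; key++) master.insert(key, key);
        System.out.println("Total: " + master.totalAmount());
        System.out.println("Rows per segment: "
                + java.util.Arrays.toString(master.segmentRowCounts()));
    }
}
```

Because each segment touches only its own rows, no locks are needed between segments for this query; the quality of the distribution rule determines how evenly the work is spread.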
Shared disk data architecture

Have a look at the following figure, which gives an idea about shared disk data architecture:

Shared disk data architecture refers to an architecture where there is a data disk that holds all the data and each node in the cluster accesses this data for processing. Any data operation can be performed by any node at a given point in time. If two nodes attempt to persist/write a tuple at the same time, a disk-based lock or intended lock communication is passed on to ensure consistency, which affects performance. Further, as the number of nodes increases, contention at the database level increases. These architectures are write limited as there is a need to handle the locks across the nodes in the cluster. Even in the case of reads, partitioning should be implemented effectively to avoid complete table scans.

Shared memory data architecture

Have a look at the following figure, which gives an idea about shared memory data architecture:

In-memory data grids come under the shared memory data architecture category. In this architecture paradigm, data is held in memory that is accessible to all the nodes within the cluster. The major advantage of this architecture is that there is no disk I/O involved and data access is very quick. This advantage comes with an additional need for loading and synchronizing data in memory with the underlying data store. The memory layer seen in the figure can be distributed and local to the compute nodes or can exist as a data node.

Shared nothing data architecture

Though an old paradigm, shared nothing data architecture is gaining traction in the context of Big Data. Here the data is distributed across the nodes in the cluster and every processor operates on the data local to itself. The location where data resides is referred to as the data node and the location where the processing logic resides is called the compute node. It can happen that both nodes, compute and data, are physically one.
These nodes within the cluster are connected using high-speed interconnects. The following figure depicts two aspects of the architecture: the one on the left represents decoupled data and compute processes, and the one on the right represents co-located data and compute processes:

One of the most important aspects of shared nothing data architecture is that there is no contention and there are no locks to be addressed. Data is distributed across the nodes within the cluster using a distribution plan that is defined as a part of the schema definition. Additionally, for higher query efficiency, partitioning can be done at the node level. Any requirement for a distributed lock would bring in complexity, so an efficient distribution and partitioning strategy is a key success factor.

Reads are usually more efficient than in shared disk databases. Again, the efficiency is determined by the distribution policy. If a query needs to join data across the nodes in the cluster, users would see a temporary redistribution step that brings the required data elements together into another node before the query result is returned.

Shared nothing data architecture thus supports massive parallel processing capabilities. Some of the features of shared nothing data architecture are as follows:

- It can scale extremely well on general purpose systems
- It provides automatic parallelization in loading and querying any database
- It has optimized I/O and can scan and process nodes in parallel
- It supports linear scalability, also referred to as elastic scalability: by adding a new node to the cluster, additional storage and processing capability is gained, in terms of both load performance and query performance

The Greenplum high-availability architecture

In addition to the primary Greenplum system components, we can also optionally deploy redundant components for high availability and to avoid a single point of failure.
The following components need to be implemented for data redundancy:

- Mirror segment instances: A mirror segment always resides on a different host than its primary segment. Mirroring provides you with a replica of the database contained in a segment, which is useful in the event of disk or hardware failure. The metadata regarding the replica is stored on the master server in system catalog tables.
- Standby master host: For a fully redundant Greenplum Database system, a mirror of the Greenplum master can be deployed. A backup Greenplum master host serves as a warm standby in case the primary master host becomes unavailable. A transaction log replication process runs on the standby master and keeps it in sync with the primary master host. In the event of master host failure, the standby master is activated and reconstructed using the transaction logs.
- Dual interconnect switches: A highly available interconnect can be achieved by deploying redundant network interfaces on all Greenplum hosts and a dual Gigabit Ethernet. The default configuration is to have one network interface per primary segment instance on a segment host (both the interconnects are by default 10Gig in DCA).

External tables

External tables in Greenplum are database tables that let Greenplum Database access data from a source outside the database. We can have different external tables for different formats. Greenplum supports fast parallel as well as nonparallel data loading and unloading. An external table acts as an interface to the external data source and gives the accessing function the impression of a local data source. File-based data sources are supported by external tables.
The following data sources can be loaded through external tables:

- Regular file-based source (supports Text, CSV, and XML data formats): file:// or gpfdist:// protocol
- Web-based file source (supports Text, CSV, OS commands, and scripts): http:// protocol
- Hadoop-based file source (supports Text and custom/user-defined formats): gphdfs:// protocol

Following is the syntax for the creation and deletion of readable and writable external tables:

To create a read-only external table:

CREATE EXTERNAL (WEB) TABLE <<table name>> (<<column definitions>>)
LOCATION (<<file paths>>) | EXECUTE '<<command>>'
FORMAT '<<format name, for example TEXT>>' (DELIMITER '<<delimiter>>');

To create a writable external table:

CREATE WRITABLE EXTERNAL (WEB) TABLE <<table name>> (<<column definitions>>)
LOCATION (<<file paths>>) | EXECUTE '<<command>>'
FORMAT '<<format name, for example TEXT>>' (DELIMITER '<<delimiter>>');

To drop an external table:

DROP EXTERNAL (WEB) TABLE <<table name>>;

Following are examples of using the file:// and gphdfs:// protocols:

CREATE EXTERNAL TABLE test_load_file (
    id int,
    name text,
    date date,
    description text
)
LOCATION (
    'file://filehost:6781/data/folder1/*',
    'file://filehost:6781/data/folder2/*',
    'file://filehost:6781/data/folder3/*.csv'
)
FORMAT 'CSV' (HEADER);

In the preceding example, data is loaded from three different locations on the file host; as you can see, the wildcard notation can be different for each location. When the files are located on HDFS, the following notation needs to be used (in this example, the file is '|' delimited):

CREATE EXTERNAL TABLE test_load_file (
    id int,
    name text,
    date date,
    description text
)
LOCATION (
    'gphdfs://hdfshost:8081/data/filename.txt'
)
FORMAT 'TEXT' (DELIMITER '|');

Summary

In this article, we have learned about Greenplum UAP and Greenplum Database. This article also gives information about the core components of Greenplum UAP.


Packt

30 Oct 2013

4 min read

Working with axes (Should know)

Getting ready

We start with the same boilerplate that we used when creating basic charts.

How to do it...

The following code creates some sample data that grows exponentially. We then use the transform and tickSize settings on the Y axis to adjust how our data is displayed:

...
<script>
  var data = [], i;
  // Math.exp takes a single argument
  for (i = 1; i <= 50; i++) {
    data.push([i, Math.exp(i / 10)]);
  }
  $('#sampleChart').plot(
    [ data ],
    {
      yaxis: {
        // Leave 0 untouched, since Math.log(0) is -Infinity
        transform: function (v) {
          return v === 0 ? v : Math.log(v);
        },
        tickSize: 50
      }
    }
  );
</script>
...

Flot draws a chart with a logarithmic Y axis, so that our exponential data is easier to read:

Next, we use Flot's ability to display multiple axes on the same chart as follows:

...
var sine = [];
for (i = 0; i < Math.PI * 2; i += 0.1) {
  sine.push([i, Math.sin(i)]);
}
var cosine = [];
for (i = 0; i < Math.PI * 2; i += 0.1) {
  cosine.push([i, Math.cos(i) * 20]);
}
$('#sampleChart').plot(
  [
    { label: 'sine', data: sine },
    { label: 'cosine', data: cosine, yaxis: 2 }
  ],
  {
    yaxes: [ {}, { position: 'right' } ]
  }
);
...

Flot draws the two series overlapping each other. The Y axis for the sine series is drawn on the left by default and the Y axis for the cosine series is drawn on the right as specified:

How it works...

The transform setting expects a function that takes a value, which is the y coordinate of our data, and returns a transformed value. In this case, we calculate the logarithm of each data value, so that our exponential data is drawn as a straight line on a logarithmic scale. We also use the tickSize setting to ensure that our labels do not overlap after the axis has been transformed.

The yaxis setting under the series object is a number that specifies which axis the series should be associated with. When we specify the number 2, Flot automatically draws a second axis on the chart. We then use the yaxes setting to specify that the second axis should be positioned on the right of the chart.
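Why the second axis matters can be checked numerically before drawing anything. The following sketch uses plain JavaScript without Flot; seriesRange is a hypothetical helper written for this example, and the data is generated the same way as in the recipe.

```javascript
// Hypothetical helper, not part of Flot: returns the smallest and largest
// y value of a series given as [[x, y], ...] pairs.
function seriesRange(series) {
  var min = Infinity, max = -Infinity;
  series.forEach(function (point) {
    min = Math.min(min, point[1]);
    max = Math.max(max, point[1]);
  });
  return { min: min, max: max };
}

// Same sample data as in the recipe.
var i, sine = [], cosine = [];
for (i = 0; i < Math.PI * 2; i += 0.1) {
  sine.push([i, Math.sin(i)]);
  cosine.push([i, Math.cos(i) * 20]);
}

var sineRange = seriesRange(sine);
var cosineRange = seriesRange(cosine);

// The two ranges differ by a factor of about 20, so on a single shared
// axis the sine curve would be squeezed into a narrow band near zero.
console.log(sineRange, cosineRange);
```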
In this case, the sine data ranges from -1.0 to 1.0, whereas the cosine data ranges from -20 to 20. The cosine axis is drawn on the right and is independent of the sine axis.

There's more...

Flot doesn't have a built-in ability to interact with axes, but it does give you all the information you need to construct a solution.

Making axes interactive

Here, we use Flot's getAxes method to add interactivity to our axes as follows:

...
var showFahrenheit = false,
  temperatureFormatter = function (val, axis) {
    if (showFahrenheit) {
      val = val * 9 / 5 + 32;
    }
    return val.toFixed(1);
  },
  drawPlot = function () {
    var plot = $.plot(
      '#sampleChart',
      [[[0, 0], [1, 3], [3, 1]]],
      { yaxis: { tickFormatter: temperatureFormatter } }
    );
    var plotPlaceholder = plot.getPlaceholder();
    $.each(plot.getAxes(), function (i, axis) {
      var box = axis.box;
      var axisTarget = $('<div />');
      axisTarget
        .css({
          position: 'absolute',
          left: box.left,
          top: box.top,
          width: box.width,
          height: box.height
        })
        .click(function () {
          showFahrenheit = !showFahrenheit;
          drawPlot();
        })
        .appendTo(plotPlaceholder);
    });
  };
drawPlot();
...

First, note that we use a different way of creating a plot. Instead of calling the plot method on a jQuery collection that matches the placeholder element, we use the plot method directly from the jQuery object. This gives us immediate access to the Flot object, which we use to get the axes of our chart. You could have also used the following data method to gain access to the Flot object:

var plot = $('#sampleChart').plot(...).data('plot');

Once we have the Flot object, we use the getAxes method to retrieve a list of axis objects. We use jQuery's each method to iterate over each axis and we create a div element that acts as a target for interaction. We set the div element's CSS so that it is in the same position and size as the axis' bounding box, and we attach an event handler to the click event before appending the div element to the plot's placeholder element.
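The formatter logic can be exercised on its own, without drawing a chart. The following is a minimal sketch, assuming we prefer to keep the unit flag private instead of using a shared variable; makeTemperatureAxis is a hypothetical factory written for this example and is not part of Flot.

```javascript
// Hypothetical factory, not part of Flot: returns a tick formatter plus a
// toggle function, so the unit flag does not have to be a shared variable.
function makeTemperatureAxis() {
  var showFahrenheit = false;
  return {
    // Same arguments that Flot passes to tickFormatter: (value, axis).
    formatter: function (val, axis) {
      var shown = showFahrenheit ? val * 9 / 5 + 32 : val;
      return shown.toFixed(1);
    },
    // What a click handler on the axis target would call before redrawing.
    toggle: function () {
      showFahrenheit = !showFahrenheit;
      return showFahrenheit;
    }
  };
}

var temperatureAxis = makeTemperatureAxis();
var celsiusLabel = temperatureAxis.formatter(100);    // Celsius by default
temperatureAxis.toggle();                             // switch to Fahrenheit
var fahrenheitLabel = temperatureAxis.formatter(100);
console.log(celsiusLabel, fahrenheitLabel);
```

In the recipe, the formatter property would be passed as the tickFormatter setting and toggle would be called from the click handler.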
In this case, the event handler toggles a Boolean flag and redraws the plot. The flag determines whether the axis labels are displayed in Fahrenheit or Celsius, by changing the result of the function specified in the tickFormatter setting.

Summary

We can now customize a chart's axes, transform the shape of a graph by using a logarithmic scale, display multiple data series with their own independent axes, and make the axes interactive.
