
How-To Tutorials


Installing Red Hat CloudForms on Red Hat OpenStack

Packt
04 Sep 2015
8 min read
In this article by Sangram Rath, the author of the book Hybrid Cloud Management with Red Hat CloudForms, we take you through the steps required to install, configure, and use Red Hat CloudForms on Red Hat Enterprise Linux OpenStack. However, you should be able to install it on OpenStack running on any other Linux distribution. The following topics are covered in this article:

- System requirements
- Deploying the Red Hat CloudForms Management Engine appliance
- Configuring the appliance
- Accessing and navigating the CloudForms web console

System requirements

Installing the Red Hat CloudForms Management Engine Appliance requires an existing virtual or cloud infrastructure. The following are the latest supported platforms:

- OpenStack
- Red Hat Enterprise Virtualization
- VMware vSphere

The system requirements for installing CloudForms differ from platform to platform. Since this book covers installing it on OpenStack, we will look at the system requirements for OpenStack. You need a minimum of:

- Four vCPUs
- 6 GB RAM
- 45 GB disk space

The flavor we select to launch the CloudForms instance must meet or exceed the preceding requirements. For the system requirements of the other platforms, refer to the following links:

- System requirements for Red Hat Enterprise Virtualization: https://access.redhat.com/documentation/en-US/Red_Hat_CloudForms/3.1/html/Installing_CloudForms_on_Red_Hat_Enterprise_Virtualization/index.html
- System requirements for installing CloudForms on VMware vSphere: https://access.redhat.com/documentation/en-US/Red_Hat_CloudForms/3.1/html/Installing_CloudForms_on_VMware_vSphere/index.html

Additional OpenStack requirements

Before we can launch a CloudForms instance, we need to ensure that some additional requirements are met:

- Security group: Ensure that a rule is created to allow traffic on port 443 in the security group that will be used to launch the appliance.
- Flavor: Based on the system requirements for running the CloudForms appliance, we can either use an existing flavor, such as m1.large, or create a new flavor for the CloudForms Management Engine Appliance. To create a new flavor, click on the Create Flavor button under the Flavor option in Admin and fill in the required parameters, especially these three: at least four vCPUs, at least 6144 MB of RAM, and at least 45 GB of disk space.
- Key pair: Although you can just use the default username and password to log in to the appliance at the VNC console, it is good to have access to a key pair as well, if required, for remote SSH.

Deploying the Red Hat CloudForms Management Engine appliance

Now that we are aware of the resource and security requirements for Red Hat CloudForms, let's look at how to obtain a copy of the appliance and run it.

Obtaining the appliance

The CloudForms Management Engine appliance for OpenStack can be downloaded from your Red Hat customer portal under the Red Hat CloudForms product page. You need access to a Red Hat CloudForms subscription to be able to do so. At the time of writing this book, the direct download link is https://rhn.redhat.com/rhn/software/channel/downloads/Download.do?cid=20037. For more information on obtaining the subscription and appliance, or to request a trial, visit http://www.redhat.com/en/technologies/cloud-computing/cloudforms.

Note: If you are unable to get access to Red Hat CloudForms, ManageIQ (the open source version) can also be used for hands-on experience.
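If you prefer to script the preparation steps above instead of clicking through the Horizon dashboard, the following minimal sketch shows roughly how the flavor, the port 443 security group rule, and a key pair could be created with the openstacksdk Python library. The library, the clouds.yaml entry name (mycloud), and the resource names are assumptions made purely for illustration; they are not part of the documented CloudForms procedure described in this article.

import openstack

# Assumes a clouds.yaml entry named "mycloud" with admin credentials.
conn = openstack.connect(cloud="mycloud")

# Flavor meeting the appliance minimums: 4 vCPUs, 6144 MB RAM, 45 GB disk.
flavor = conn.create_flavor("cfme-appliance", ram=6144, vcpus=4, disk=45)

# Allow HTTPS (port 443) traffic in the security group used for the appliance.
conn.create_security_group_rule(
    "default",                     # hypothetical security group name
    port_range_min=443,
    port_range_max=443,
    protocol="tcp",
    direction="ingress",
    remote_ip_prefix="0.0.0.0/0",
)

# Optional key pair for remote SSH access, as recommended above.
keypair = conn.create_keypair("cfme-key")
print(keypair.private_key)         # keep this somewhere safe for SSH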
Creating the appliance image in OpenStack

Before launching the appliance, we need to create an image in OpenStack for it, since OpenStack requires instances to be launched from an image. Create a new Image under Project with the following parameters:

- Enter a name for the image.
- Enter the image location in Image Source (HTTP URL).
- Set the Format to QCOW2.
- Optionally, set the Minimum Disk size.
- Optionally, set Minimum RAM.
- Make it Public if required, and click on Create An Image.

Note that newer releases of OpenStack may offer some additional options, but the preceding are the ones that need to be filled in; most importantly, the download URL of the Red Hat CloudForms appliance. Wait for the Status field to show Active before launching the instance.

Launching the appliance instance

In OpenStack, under Project, select Instances and then click on Launch Instance. In the Launch Instance wizard, enter the following instance information in the Details tab:

- Select an Availability Zone.
- Enter an Instance Name.
- Select a Flavor.
- Set the Instance Count.
- Set Instance Boot Source to Boot from image.
- Select CloudForms Management Engine Appliance under Image Name.

Under the Access & Security tab, ensure that the correct Key Pair and Security Group are selected. For Networking, select the networks that will provide the required IP addresses and routing. Other options, such as Post-Creation and Advanced Options, can be left blank. Click on Launch when you are ready to start creating the instance, and wait for the instance state to change to Running before proceeding to the next step.

Note: If you are accessing the CloudForms Management Engine from the Internet, a floating IP address needs to be associated with the instance. This can be done from Project, under Access & Security, on the Floating IPs tab.

The Red Hat CloudForms web console

The web console provides a graphical user interface for working with the CloudForms Management Engine Appliance. It can be accessed from a browser on any machine that has network access to the CloudForms Management Engine server.

System requirements

The system requirements for accessing the Red Hat CloudForms web console are:

- A Windows, Linux, or Mac computer
- A modern browser, such as Mozilla Firefox, Google Chrome, or Internet Explorer 8 or above
- Adobe Flash Player 9 or above
- The CloudForms Management Engine Appliance must already be installed and activated in your enterprise environment

Accessing the Red Hat CloudForms Management Engine web console

To access the appliance, type the hostname or floating IP assigned to the instance, prefixed with https, into a supported browser. Enter the default username, admin, and the password, smartvm, to log in. You should log in to only one tab in each browser, as the console settings are saved for the active tab only. The CloudForms Management Engine also does not guarantee that the browser's Back button will produce the desired results, so use the breadcrumbs provided in the console.
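Before moving on to console navigation, here is a rough scripted equivalent of the image-creation and launch steps above, again using the openstacksdk Python library. The cloud name, file path, network, and resource names are illustrative assumptions only, and the sketch assumes the appliance QCOW2 has already been downloaded locally; the supported procedure remains the dashboard workflow described in this article.

import openstack

conn = openstack.connect(cloud="mycloud")

# Upload the appliance QCOW2 (downloaded beforehand) as a Glance image.
image = conn.create_image(
    "cfme-appliance-image",
    filename="cfme-rhos-appliance.qcow2",  # hypothetical local path
    disk_format="qcow2",
    container_format="bare",
    wait=True,
)

# Launch the instance, roughly mirroring the Launch Instance wizard.
server = conn.create_server(
    "cfme-management-engine",
    image=image,
    flavor="cfme-appliance",               # flavor created earlier
    key_name="cfme-key",
    network="my-tenant-network",           # hypothetical network name
    security_groups=["default"],
    auto_ip=True,                          # requests a floating IP if needed
    wait=True,
)

print("Server status:", server.status)
# The web console should then be reachable at https://<assigned IP>/
# with the default credentials admin / smartvm.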
Navigating the web console

The web console has a primary top-level menu that provides access to feature sets such as Insight, Control, and Automate, along with menus used to add infrastructure and cloud providers, create service catalogs, and view or raise requests. The secondary menu appears below the primary menu, and its options change based on the primary menu option selected. In certain cases, a third sublevel menu may also appear, with additional options based on the selection in the secondary menu.

The feature sets available in Red Hat CloudForms are categorized under eight menu items:

- Cloud Intelligence: This provides a dashboard view of your hybrid cloud infrastructure for the selected parameters. Whatever is displayed here can be configured as a widget. It also provides additional insights into the hybrid cloud in the form of reports, chargeback configuration and information, timeline views, and an RSS feeds section.
- Services: This provides options for creating templates and service catalogs that help in provisioning multitier workloads across providers. It also lets you create and approve requests for these service catalogs.
- Clouds: This option in the top menu lets you add cloud providers; define availability zones; and create tenants, flavors, security groups, and instances.
- Infrastructure: This option, in a way similar to Clouds, lets you add infrastructure providers; define clusters; view, discover, and add hosts; provision VMs; work with data stores and repositories; view requests; and configure PXE.
- Control: This section lets you define compliance and control policies for the infrastructure providers using events, conditions, and actions based on those conditions. You can further combine these policies into policy profiles. Another important feature is alerting the administrators, which is configured from here. You can also simulate these policies, import and export them, and view logs.
- Automate: This menu option lets you manage life cycle tasks, such as provisioning and retirement, and the automation of resources. You can create provisioning dialogs to provision hosts and virtual machines, and service dialogs to provision service catalogs. Dialog import/export, logs, and requests for automation are all managed from this menu option.
- Optimize: This menu option provides utilization, planning, and bottleneck summaries for the hybrid cloud environment. You can also generate reports for these individual metrics.
- Configure: Here, you can customize the look of the dashboard; view queued and running tasks; and check errors and warnings for VMs and the UI. It lets you configure CloudForms Management Engine appliance settings such as the database, additional worker appliances, SmartProxy, and white labelling. You can also perform maintenance tasks such as updates and manual modification of the CFME server configuration files.

Summary

In this article, we deployed the Red Hat CloudForms Management Engine Appliance in an OpenStack environment, and you learned where to configure the hostname, network settings, and time zone. We then used the floating IP of the instance to access the appliance from a web browser, and you learned where the different feature sets are and how to navigate around.


Creating a Web Server

Packt
03 Sep 2015
3 min read
In this article by Marco Schwartz, author of the book Intel Galileo Networking Cookbook, we will cover the recipe Reading pins via a web server.

Reading pins via a web server

We are now going to see how to use a web server for useful things. For example, we will see how to use a web server to read the pins of the Galileo board, and then how to display these readings on a web page.

Getting ready

For this recipe, you won't need to do much with your Galileo board, as we just want to see whether we can read the state of a pin from a web server. I simply connected pin number 7 of the Galileo board to the VCC pin.

How to do it...

We are now going to see how to read the state from pin number 7 and display this state on a web page. This is the complete code:

// Required modules
var m = require("mraa");
var util = require('util');
var express = require('express');
var app = express();

// Set input on pin 7
var myDigitalPin = new m.Gpio(7);
myDigitalPin.dir(m.DIR_IN);

// Routes
app.get('/read', function (req, res) {
  var myDigitalValue = myDigitalPin.read();
  res.send("Digital pin 7 value is: " + myDigitalValue);
});

// Start server
var server = app.listen(3000, function () {
  console.log("Express app started!");
});

You can now simply copy this code and paste it inside a blank Node.js project. Also make sure that the package.json file includes the Express module. Then, as usual, upload, build, and run the application using Intel XDK. You should see the confirmation message inside the XDK console.

Then, use a browser to access your board on port 3000, at the /read route. You should see a message with the reading from pin number 7. If you can see this, congratulations: you can now read the state of the pins on your board and display it via your web server!

How it works...

In this recipe, we combined two things that we saw in previous recipes. We again used the mraa module to read from pins, here from pin number 7 of the board. You can find out more about the mraa module at https://github.com/intel-iot-devkit/mraa.

Then, we combined this with a web server using the Express framework, and we defined a new route called /read that reads the state of the pin and sends it back so that it can be displayed inside a web browser, with this code:

app.get('/read', function (req, res) {
  var myDigitalValue = myDigitalPin.read();
  res.send("Digital pin 7 value is: " + myDigitalValue);
});

See also

You can now check the next recipe to see how to control a pin from the Node.js server running on the Galileo board.

Summary

In this recipe, we saw how to read the state from pin number 7 and display this state on a web page. If you liked this article, please buy the book Intel Galileo Networking Cookbook, Packt Publishing, to learn over 45 Galileo recipes.


Learning RSLogix 5000 – Buffering I/O Module Input and Output Values

Packt
03 Sep 2015
10 min read
In the following article by Austin Scott, the author of Learning RSLogix 5000 Programming, you will be introduced to the high-performance, asynchronous nature of the Logix family of controllers and the need to buffer the I/O module data it drives. You will learn various techniques for buffering I/O module values in RSLogix 5000 and Studio 5000 Logix Designer. You will also learn about the IEC languages that do not require input or output module buffering techniques to be applied to them. In order to understand the need for buffering, let's start by exploring the evolution of the modern line of Rockwell Automation controllers.

ControlLogix controllers

The ControlLogix controller was first launched in 1997 as a replacement for Allen Bradley's previous large-scale control platform, the PLC-5. ControlLogix represented a significant technological step forward, which included a 32-bit ARM-6 RISC-core microprocessor and the ABrisc Boolean processor combined with a bus interface on the same silicon chip. At launch, the Series 5 ControlLogix controllers (also referred to as L5 and ControlLogix 5550, now superseded by the L6 and L7 series controllers) were able to execute code three times faster than the PLC-5. The following is an illustration of the original ControlLogix L5 controller:

ControlLogix Logix L5 Controller

The L5 controller is considered to be a PAC (Programmable Automation Controller) rather than a traditional PLC (Programmable Logic Controller), due to its modern design, power, and capabilities beyond a traditional PLC (such as motion control, advanced networking, batching, and sequential control). ControlLogix represented a significant technological step forward for Rockwell Automation, but this new technology also presented new challenges for automation professionals. ControlLogix was built using a modern asynchronous operating model rather than the more traditional synchronous model used by all the previous generations of controllers. The asynchronous operating model requires a different approach to real-time software development in RSLogix 5000 (known in version 20 and higher as Studio 5000 Logix Designer).

Logix operating cycle

The entire Logix family of controllers (ControlLogix and CompactLogix) has diverged from the traditional synchronous PLC scan architecture in favor of a more efficient asynchronous operation. As in most modern computer systems, asynchronous operation allows the Logix controller to handle multiple tasks at the same time by slicing the processing time between each task. The continuous update of information in an asynchronous processor creates some programming challenges, which we will explore in this article. The following diagram illustrates the difference between synchronous and asynchronous operation:

Synchronous versus Asynchronous Processor Operation

Addressing module I/O data

Individual channels on a module can be referenced in your Logix Designer / RSLogix 5000 programs using their addresses. An address gives the controller directions to where it can find a particular piece of information about a channel on one of your modules. The following diagram illustrates the components of an address in RSLogix 5000 or Studio 5000 Logix Designer:

The components of an I/O Module Address in Logix

Module I/O tags can be viewed using the Controller Tags window, as the following screenshot illustrates:

I/O Module Tags in Studio 5000 Logix Designer Controller Tags Window

Using the module I/O tags, input and output module data can be directly accessed anywhere within a logic routine. However, it is recommended that we buffer module I/O data before we evaluate it in logic. Otherwise, due to the asynchronous tag value updates in our I/O modules, the state of our process values could change partway through logic execution, thus creating unpredictable results. In the next section, we will introduce the concept of module I/O data buffering.

Buffering module I/O data

In the olden days of the PLC-5 and SLC 500, before we had access to high-performance asynchronous controllers like the ControlLogix, SoftLogix, and CompactLogix families, program execution was sequential (synchronous) and very predictable. In asynchronous controllers, there are many activities happening at the same time. Input and output values can change in the middle of a program scan and put the program in an unpredictable state. Imagine a program starting a pump in one line of code and closing a valve directly in front of that pump in the next line of code because it detected a change in process conditions. In order to address this issue, we use a technique called buffering and, depending on the version of Logix you are developing on, there are a few different methods of achieving this.

Buffering is a technique where the program code does not directly access the real input or output tags on the modules during the execution of a program. Instead, the input and output module tags are copied at the beginning of a program's scan to a set of base tags that will not change state during the program's execution. Think of buffering as taking a snapshot of the process conditions and making decisions on those static values rather than on live values that fluctuate every millisecond. Today, there is a rule in most automation companies that requires programmers to write code that buffers I/O data to base tags that will not change during a program's execution. The two widely accepted methods of buffering are:

- Buffering to base tags
- Program parameter buffering (only available in Logix version 24 and higher)

Do not underestimate the importance of buffering a program's I/O. I worked on an expansion project for a process control system where the original programmers had failed to implement buffering. Once a month, the process would land in a strange state from which the program could not recover. The operators had attributed these problems to "gremlins" for years, until I identified and corrected the issue.

Buffering to base tags

Logic can be organized into manageable pieces and executed based on different intervals and conditions. The buffering to base tags practice takes advantage of Logix's ability to organize code into routines. The default ladder logic routine that is created in every new Logix project is called MainRoutine. The recommended best practice for buffering tags in ladder logic is to create three routines:

- One for reading input values and buffering them
- One for executing logic
- One for writing the output values from the buffered values

The following ladder logic excerpt is from the MainRoutine of a program that implements input and output buffering:

MainRoutine Ladder Logic Routine with Input and Output Buffering Subroutine Calls

The following ladder logic is taken from the BufferInputs routine and demonstrates the buffering of digital input module tag values to local tags prior to executing our PumpControl routine:

Ladder Logic Routine with Input Module Buffering

After our input module values have been buffered to local tags, we can execute our process logic in our PumpControl routine without having to worry about our values changing in the middle of the routine's execution. The following ladder logic code determines whether all the conditions are met to run a pump:

Pump Control Ladder Logic Routine

Finally, after all of our process logic has finished executing, we can write the resulting values to our digital output modules. The following BufferOutputs ladder logic routine copies the resulting RunPump value to the digital output module tag:

Ladder Logic Routine with Output Module Buffering

We have now buffered our module inputs and module outputs in order to ensure that they do not change in the middle of a program execution and potentially put our process into an undesired state.

Buffering Structured Text I/O module values

Just like in ladder logic, Structured Text I/O module values should be buffered at the beginning of a routine, or prior to executing a routine, in order to prevent the values from changing mid-execution and putting the process into a state you could not have predicted. The following is an example of the ladder logic buffering routines written in Structured Text (ST) using the non-retentive assignment operator:

(* I/O Buffering in Structured Text - Input Buffering *)
StartPump [:=] Local:2:I.Data[0].0;
HighPressure [:=] Local:2:I.Data[0].1;
PumpStartManualOverride [:=] Local:2:I.Data[0].2;

(* I/O Buffering in Structured Text - Output Buffering *)
Local:3:O.Data[0].0 [:=] RunPump;

Function Block Diagram (FBD) and Sequential Function Chart (SFC) I/O module buffering

Within Rockwell Automation's Logix platform, all of the supported IEC languages (ladder logic, structured text, function block, and sequential function chart) compile down to the same controller bytecode language. The available functions and development interface in the various Logix programming languages are, however, vastly different. Function Block Diagrams (FBD) and Sequential Function Charts (SFC) will always automatically buffer input values prior to executing logic. Once a Function Block Diagram or a Sequential Function Chart has completed the execution of its logic, it will write all output module values at the same time. There is no need to perform buffering in FBD or SFC routines, as it is handled automatically.

Buffering using program parameters

A program parameter is a powerful new feature in Logix that allows the association of dynamic values to tags and programs as parameters. The importance of program parameters is clear from the way they permeate the user interface in newer versions of Logix Designer (version 24 and higher). Program parameters are extremely powerful, but the key benefit to us is that they are automatically buffered. This means that we could have effectively created the same result in one ladder logic rung rather than the eight we created in the previous exercise. There are four types of program parameters:

- Input: This program parameter is automatically buffered and passed into a program on each scan cycle.
- Output: This program parameter is automatically updated at the end of a program (as a result of executing that program) on each scan cycle. It is similar to the way we buffered our output module value in the previous exercise.
- InOut: This program parameter is updated at the start of the program scan and at the end of the program scan. It is also important to note that, unlike the input and output parameters, the InOut parameter is passed as a pointer in memory. A pointer shares a piece of memory with other processes rather than creating a copy of it, which means that it is possible for an InOut parameter to change its value in the middle of a program scan. This makes InOut program parameters unsuitable for buffering when used on their own.
- Public: This program parameter behaves like a normal controller tag and can be connected to input, output, and InOut parameters. It is similar to the InOut parameter in that public parameters are updated globally as their values change, which makes them unsuitable for buffering when used on their own. Public program parameters are primarily used for passing large data structures between programs on a controller.

In Logix Designer version 24 and higher, a program parameter can be associated with a local tag using Parameters and Local Tags in the Controller Organizer (formerly called Program Tags). The module input channel can be associated with a base tag within your program scope using the Parameter Connections: add the module input value as a parameter connection. The previous screenshot demonstrates how we would associate the input module channel with our StartPump base tag using the Parameter Connection value.

Summary

In this article, we explored the asynchronous nature of the Logix family of controllers. We learned the importance of buffering input module and output module values for ladder logic and structured text routines. We also learned that, due to the way Function Block Diagram (FBD) and Sequential Function Chart (SFC) routines execute, there is no need to buffer input module or output module tag values. Finally, we introduced the concept of buffering tags using program parameters in version 24 and higher of Studio 5000 Logix Designer.


ArcGIS – Advanced ArcObjects

Packt
03 Sep 2015
22 min read
In this article by Hussein Nasser, author of the book ArcGIS By Example, we will discuss the following topics:

- Geodatabase editing
- Preparing the data and project
- Creating excavation features
- Viewing and editing excavation information

Geodatabase editing

YharnamCo is a construction contractor experienced in executing efficient and economical excavations for utility and telecom companies. When YharnamCo's board of directors heard of ArcGIS technology, they wanted to combine their expertise with the power of ArcGIS to come up with a solution that helps them cut costs even more. Soil type is not the only factor in an excavation; there are many factors, including the green factor, where you need to preserve trees and green areas while excavating for visual appeal. Using ArcGIS, YharnamCo can determine the soil type and green factor and calculate the cost of an excavation.

The excavation planning manager is the application you will be writing on top of ArcGIS. This application will help YharnamCo create multiple designs and scenarios for a given excavation. This way, they can compare the cost of each one and consider how many trees they could save by going through another excavation route. YharnamCo has provided us with a geodatabase of soil and tree data for one of their new projects for our development.

So far, we have learned how to view and query the geodatabase, and we were able to achieve that by opening what we called a workspace. However, changing the underlying data requires establishing an editing session. All edits that are performed during an edit session are queued, and the moment the session is saved, these edits are committed to the geodatabase. Geodatabase editing supports atomic transactions, which are referred to as operations in the geodatabase. An atomic transaction is a list of database operations that either all occur together or not at all; this ensures consistency and integrity. After this short introduction to geodatabase editing, we will prepare our data and project.

Preparing the data and project

Before we dive into the coding part, we need to do some preparation for our new project and our data.

Preparing the Yharnam geodatabase and map

The YharnamCo team has provided us with the geodatabase and map document, so we will simply copy the necessary files to your drive. Follow these steps to start your preparation of the data and map:

1. Copy the entire yharnam folder in the supporting files to C:\ArcGISByExample\.
2. Run yharnam.mxd under C:\ArcGISByExample\yharnam\Data\yharnam.mxd. This should point to the geodatabase, which is located under C:\ArcGISByExample\yharnam\Data\yharnam.gdb.
3. Note that there are three types of trees: Type1, Type2, and Type3. Also note that there are two types of soil: rocky and sand.
4. Close ArcMap and choose not to save any changes.

Preparing the Yharnam project

We will now start our project. First, we need to create our Yharnam Visual Studio extending ArcObjects project. To do so, follow these steps:

1. From the Start menu, run Visual Studio Express 2013 as administrator.
2. Go to the File menu and then click on New Project.
3. Expand the Templates node | Visual Basic | ArcGIS, and then click on Extending ArcObjects. You will see the list of projects displayed on the right. Select the Class Library (ArcMap) project.
4. In the Name field, type Yharnam, and in the location, browse to C:\ArcGISByExample\yharnam\Code. If the Code folder is not there, create it. Click on OK.
5. In the ArcGIS Project Wizard, you will be asked to select the reference libraries you will need in your project. I always recommend selecting all references and then, at the end of your project, removing the unused ones. So, go ahead and right-click on Desktop ArcMap and click on Select All.
6. Click on Finish to create the project. This will take a while, as all references are added to your project.
7. Once your project is created, you will see that one class called Class1 has been added, which we won't need, so right-click on it and choose Delete. Then, click on OK to confirm.
8. Go to File and click on Save All. Exit the Visual Studio application.

You have finished preparing your Visual Studio project with extending ArcObjects support. Move on to the next section to write some code.

Adding the new excavation tool

The new excavation tool will be used to draw a polygon on the map, which represents the geometry of the excavation, and then create a corresponding excavation feature using this geometry:

1. Open Visual Studio Express in administrator mode; we need to do this because our project writes to the registry this time, so it needs administrator permissions. To do that, right-click on Visual Studio and click on Run as administrator.
2. Go to File, then click on Open Project, browse to the Yharnam project from C:\ArcGISByExample\yharnam\Code, and click on Open.
3. Click on the Yharnam project from Solution Explorer to activate it.
4. From the Project menu, click on Add Class. Expand the ArcGIS node and then click on the Extending ArcObjects node. Select Base Tool and name it tlNewExcavation.vb. Click on Add to open ArcGIS New Item Wizard Options.
5. From ArcGIS New Item Wizard Options, select Desktop ArcMap Tool, since we will be programming against ArcMap. Click on OK. Take note of the Yharnam.tlNewExcavation ProgID, as we will be using this later to add the tool to the toolbar.
6. If necessary, double-click on tlNewExcavation.vb to edit it.
7. In the New method, update the properties of the command as follows. This will update the name, caption, and other properties of the command. There is a piece of code that loads the command icon; leave that unchanged:

Public Sub New()
    MyBase.New()
    ' TODO: Define values for the public properties
    MyBase.m_category = "Yharnam" 'localizable text
    MyBase.m_caption = "New Excavation" 'localizable text
    MyBase.m_message = "New Excavation" 'localizable text
    MyBase.m_toolTip = "New Excavation" 'localizable text
    MyBase.m_name = "Yharnam_NewExcavation"
    Try
        'TODO: change resource name if necessary
        Dim bitmapResourceName As String = Me.GetType().Name + ".bmp"
        MyBase.m_bitmap = New Bitmap(Me.GetType(), bitmapResourceName)
        MyBase.m_cursor = New System.Windows.Forms.Cursor(Me.GetType(), Me.GetType().Name + ".cur")
    Catch ex As Exception
        System.Diagnostics.Trace.WriteLine(ex.Message, "Invalid Bitmap")
    End Try
End Sub

Adding the excavation editor tool

The excavation editor is a tool that will let us click an excavation feature on the map and display the excavation information, such as depth, area, and so on. It will also allow us to edit some of the information. We will now add a tool to our project. To do that, follow these steps:

1. If necessary, open Visual Studio Express in administrator mode (Run as administrator) and open the Yharnam project from C:\ArcGISByExample\yharnam\Code.
2. Click on the Yharnam project from Solution Explorer to activate it.
3. From the Project menu, click on Add Class. Expand the ArcGIS node and then click on the Extending ArcObjects node. Select Base Tool and name it tlExcavationEditor.vb. Click on Add to open ArcGIS New Item Wizard Options.
4. From ArcGIS New Item Wizard Options, select Desktop ArcMap Tool, since we will be programming against ArcMap. Click on OK. Take note of the Yharnam.tlExcavationEditor ProgID, as we will be using this in the next section to add the tool to the toolbar.
5. If necessary, double-click on tlExcavationEditor.vb to edit it.
6. In the New method, update the properties of the command as follows. This will update the name, caption, and other properties of the command:

MyBase.m_category = "Yharnam" 'localizable text
MyBase.m_caption = "Excavation Editor" 'localizable text
MyBase.m_message = "Excavation Editor" 'localizable text
MyBase.m_toolTip = "Excavation Editor" 'localizable text
MyBase.m_name = "Yharnam_ExcavationEditor"

7. In order to display the excavation information, we will need a form. To add the Yharnam Excavation Editor form, point to Project and then click on Add Windows Form. Name the form frmExcavationEditor.vb and click on Add.
8. Use the form designer to add and set the controls shown in the following table:

Control | Name               | Properties
Label   | lblDesignID        | Text: Design ID
Label   | lblExcavationOID   | Text: Excavation ObjectID
Label   | lblExcavationArea  | Text: Excavation Area
Label   | lblExcavationDepth | Text: Excavation Depth
Label   | lblTreeCount       | Text: Number of Trees
Label   | lblTotalCost       | Text: Total Excavation Cost
Text    | txtDesignID        | Read-Only: True
Text    | txtExcavationOID   | Read-Only: True
Text    | txtExcavationArea  | Read-Only: True
Text    | txtExcavationDepth | Read-Only: False
Text    | txtTreeCount       | Read-Only: True
Text    | txtTotalCost       | Read-Only: True
Button  | btnSave            | Text: Save

9. One last change before we build our solution: we need to change the default icons of our tools. To do that, double-click on tlExcavationEditor.bmp to open the picture editor, replace the picture with C:\ArcGISByExample\yharnam\icons\excavation_editor.bmp, and save tlExcavationEditor.bmp. Similarly, change tlNewExcavation.bmp to C:\ArcGISByExample\yharnam\icons\new_excavation.bmp.
10. Save your project and move on to the next step to assign the commands to the toolbar.

Adding the excavation manager toolbar

Now that we have our two tools, we will add a toolbar to group them together. Follow these steps to add the Yharnam Excavation Planning Manager Toolbar to your project:

1. If necessary, open Visual Studio Express in administrator mode (Run as administrator) and open the Yharnam project from C:\ArcGISByExample\yharnam\Code.
2. Click on the Yharnam project from Solution Explorer to activate it.
3. From the Project menu, click on Add Class. Expand the ArcGIS node and then click on the Extending ArcObjects node. Select Base Toolbar and name it tbYharnam.vb. Click on Add to open ArcGIS New Item Wizard Options.
4. From ArcGIS New Item Wizard Options, select Desktop ArcMap, since we will be programming against ArcMap. Click on OK.
5. The Caption property is what is displayed when the toolbar loads. It currently defaults to MY VB.Net Toolbar, so change it to Yharnam Excavation Planning Manager Toolbar as follows:

Public Overrides ReadOnly Property Caption() As String
    Get
        'TODO: Replace bar caption
        Return "Yharnam Excavation Planning Manager Toolbar"
    End Get
End Property

6. Your toolbar is currently empty, which means it doesn't have buttons or tools. Go to the New method and add your tools' ProgIDs, as shown in the following code:

Public Sub New()
    AddItem("Yharnam.tlNewExcavation")
    AddItem("Yharnam.tlExcavationEditor")
End Sub

7. Now it is time to test our new toolbar. Go to Build and then click on Build Solution; make sure ArcMap is not running. If you get an error, make sure you have run Visual Studio as administrator. For a list of all ArcMap commands, you can refer to http://bit.ly/b04748_arcmapids; check the commands with the esriArcMapUI namespace.
8. Run yharnam.mxd in C:\ArcGISByExample\yharnam\Data\yharnam.mxd.
9. From ArcMap, go to the Customize menu, then to Toolbars, and then select Yharnam Excavation Planning Manager Toolbar, which we just created. You should see the toolbar pop up in ArcMap with the two added commands.
10. Close ArcMap and choose not to save any changes.

In the next section, we will do the real work of editing.

Creating excavation features

Features are nothing but records in a table. However, these special records cannot be created without a geometry shape attribute. To create a feature, we first need to learn how to draw and create geometries. We will be using the rubber band ArcObjects interface to create a polygon geometry. We can use it to create other types of geometries as well, but since our excavations are polygons, we will use the polygon rubber band.

Using the rubber band to draw geometries on the map

In this exercise, we will use the rubber band object to create a polygon geometry by clicking on multiple points on the map. We will import libraries as we need them. Follow these steps:

1. If necessary, open Visual Studio Express in administrator mode (Run as administrator) and open the Yharnam project from C:\ArcGISByExample\yharnam\Code.
2. Double-click on tlNewExcavation.vb and write the following code in OnMouseDown, that is, the code that runs when we click on the map:

Public Overrides Sub OnMouseDown(ByVal Button As Integer, ByVal Shift As Integer, ByVal X As Integer, ByVal Y As Integer)
    Dim pRubberBand As IRubberBand = New RubberPolygonClass
End Sub

3. You will get an error under the rubber band, because the library this class is located in is not imported. Simply hover over the error and import the library; in this case, it is ESRI.ArcGIS.Display.
4. We now have to call the TrackNew method on the pRubberBand object, which will allow us to draw. This requires two parameters: the screen on which you are drawing, and the symbol you are drawing with, which holds the color, size, and so on. By now, we are familiar with how to get these objects. The symbol needs to be of type FillShapeSymbol, since we are dealing with polygons. We will go with a simple black symbol for starters. Write the following code:

Public Overrides Sub OnMouseDown(ByVal Button As Integer, ByVal Shift As Integer, ByVal X As Integer, ByVal Y As Integer)
    Dim pDocument As IMxDocument = m_application.Document
    Dim pRubberBand As IRubberBand = New RubberPolygonClass
    Dim pFillSymbol As ISimpleFillSymbol = New SimpleFillSymbol
    Dim pPolygon As IGeometry = pRubberBand.TrackNew(pDocument.ActiveView.ScreenDisplay, pFillSymbol)
End Sub

5. Build your solution. If it fails, make sure you have run Visual Studio as administrator.
6. Run yharnam.mxd. Click on the New Excavation tool to activate it, and then click on three different locations on the map. You will see that a polygon forms as you click; double-click to finish drawing.
7. The polygon disappears when you finish drawing, because we didn't actually persist the polygon as a feature or a graphic. As a start, we will draw the polygon on the screen, and we will also change the color of the polygon to red. Write the following code to do so:

Public Overrides Sub OnMouseDown(ByVal Button As Integer, ByVal Shift As Integer, ByVal X As Integer, ByVal Y As Integer)
    Dim pDocument As IMxDocument = m_application.Document
    Dim pRubberBand As IRubberBand = New RubberPolygonClass
    Dim pFillSymbol As ISimpleFillSymbol = New SimpleFillSymbol
    Dim pColor As IColor = New RgbColor
    pColor.RGB = RGB(255, 0, 0)
    pFillSymbol.Color = pColor
    Dim pPolygon As IGeometry = pRubberBand.TrackNew(pDocument.ActiveView.ScreenDisplay, pFillSymbol)
    Dim pDisplay As IScreenDisplay = pDocument.ActiveView.ScreenDisplay
    pDisplay.StartDrawing(pDisplay.hDC, ESRI.ArcGIS.Display.esriScreenCache.esriNoScreenCache)
    pDisplay.SetSymbol(pFillSymbol)
    pDisplay.DrawPolygon(pPolygon)
    pDisplay.FinishDrawing()
End Sub

8. Build your solution and run yharnam.mxd. Activate the New Excavation tool and draw an excavation; you should see a red polygon displayed on the screen after you finish drawing.
9. Close ArcMap and choose not to save the changes.

Converting geometries into features

Now that we have learned how to draw a polygon, we will convert that polygon into an excavation feature. Follow these steps to do so:

1. If necessary, open Visual Studio Express in administrator mode (Run as administrator) and open the Yharnam project from C:\ArcGISByExample\yharnam\Code.
2. Double-click on tlNewExcavation.vb to edit the code. Remove the code that draws the polygon on the map. Your new code should look like the following:

Public Overrides Sub OnMouseDown(ByVal Button As Integer, ByVal Shift As Integer, ByVal X As Integer, ByVal Y As Integer)
    Dim pDocument As IMxDocument = m_application.Document
    Dim pRubberBand As IRubberBand = New RubberPolygonClass
    Dim pFillSymbol As ISimpleFillSymbol = New SimpleFillSymbol
    Dim pPolygon As IGeometry = pRubberBand.TrackNew(pDocument.ActiveView.ScreenDisplay, pFillSymbol)
End Sub

3. First, we need to open the Yharnam geodatabase located at C:\ArcGISByExample\yharnam\Data\Yharnam.gdb by establishing a workspace connection, and then open the Excavation feature class. Write the following two functions, getYharnamWorkspace and getExcavationFeatureClass, in tlNewExcavation.vb:

Public Function getYharnamWorkspace() As IWorkspace
    Dim pWorkspaceFactory As IWorkspaceFactory = New FileGDBWorkspaceFactory
    Return pWorkspaceFactory.OpenFromFile("C:\ArcGISByExample\yharnam\Data\Yharnam.gdb", m_application.hWnd)
End Function

Public Function getExcavationFeatureClass(pWorkspace As IWorkspace) As IFeatureClass
    Dim pFWorkspace As IFeatureWorkspace = pWorkspace
    Return pFWorkspace.OpenFeatureClass("Excavation")
End Function

4. To create the feature, we first need to start an editing session and a transaction, and wrap any editing code between the start and the end of the session. For editing, we utilize the IWorkspaceEdit interface. Write the following code in the OnMouseDown method:

Public Overrides Sub OnMouseDown(ByVal Button As Integer, ByVal Shift As Integer, ByVal X As Integer, ByVal Y As Integer)
    Dim pDocument As IMxDocument = m_application.Document
    Dim pRubberBand As IRubberBand = New RubberPolygonClass
    Dim pFillSymbol As ISimpleFillSymbol = New SimpleFillSymbol
    Dim pPolygon As IGeometry = pRubberBand.TrackNew(pDocument.ActiveView.ScreenDisplay, pFillSymbol)
    Dim pWorkspaceEdit As IWorkspaceEdit = getYharnamWorkspace()
    pWorkspaceEdit.StartEditing(True)
    pWorkspaceEdit.StartEditOperation()
    pWorkspaceEdit.StopEditOperation()
    pWorkspaceEdit.StopEditing(True)
End Sub

5. Now we will use the CreateFeature method to create a new feature and then populate it with attributes. The only attribute we care about for now is the geometry, or shape, which is the polygon we just drew. Write the following code to create the feature:

Dim pWorkspaceEdit As IWorkspaceEdit = getYharnamWorkspace()
pWorkspaceEdit.StartEditing(True)
pWorkspaceEdit.StartEditOperation()
Dim pExcavationFeatureClass As IFeatureClass = getExcavationFeatureClass(pWorkspaceEdit)
Dim pFeature As IFeature = pExcavationFeatureClass.CreateFeature()
pFeature.Shape = pPolygon
pFeature.Store()
pWorkspaceEdit.StopEditOperation()
pWorkspaceEdit.StopEditing(True)

6. Build and run yharnam.mxd. Click on the New Excavation tool and draw a polygon on the map. Refresh the map. You will see that a new excavation feature has been added to the map.
7. Close ArcMap and choose not to save any changes. Reopen yharnam.mxd and you will see that the features you created are still there, because they are stored in the geodatabase. Close ArcMap again and choose not to save any changes.

We have learned how to create features. Now we will learn how to edit excavations as well.

Viewing and editing the excavation information

We have created some excavation features on the map; however, these are merely polygons, and we need to extract useful information from them, and display and edit these excavations. For that, we will use the Excavation Editor tool to click on an excavation and display the Excavation Editor form with the excavation information. Then we will add the ability to edit this information. Follow these steps:

1. If necessary, open Visual Studio Express in administrator mode (Run as administrator) and open the Yharnam project from C:\ArcGISByExample\yharnam\Code.
2. Right-click on frmExcavationEditor.vb and click on View Code to view the class code.
3. Add the ArcMapApplication property as shown in the following code so that we can set it from the tool; we will need this at a later stage:

Private _application As IApplication

Public Property ArcMapApplication() As IApplication
    Get
        Return _application
    End Get
    Set(ByVal value As IApplication)
        _application = value
    End Set
End Property

4. Add another method, called PopulateExcavation, that takes a feature. This method will populate the form fields with the information we get from the excavation feature. We will pass the feature from the Excavation Editor tool at a later stage:

Public Sub PopulateExcavation(pFeature As IFeature)

End Sub

5. According to the Excavation feature class in ArcCatalog, we can populate three fields from the excavation attributes: the design ID, the depth of the excavation, and the object ID of the feature.
6. Write the following code to populate the design ID, the depth, and the object ID. Note that we use the IsDBNull function to check whether there is any value stored in those fields. We don't have to do that check for the OBJECTID field, since it should never be null:

Public Sub PopulateExcavation(pFeature As IFeature)
    Dim designID As Long = 0
    Dim dDepth As Double = 0
    If IsDBNull(pFeature.Value(pFeature.Fields.FindField("DESIGNID"))) = False Then
        designID = pFeature.Value(pFeature.Fields.FindField("DESIGNID"))
    End If
    If IsDBNull(pFeature.Value(pFeature.Fields.FindField("DEPTH"))) = False Then
        dDepth = pFeature.Value(pFeature.Fields.FindField("DEPTH"))
    End If
    txtDesignID.Text = designID
    txtExcavationDepth.Text = dDepth
    txtExcavationOID.Text = pFeature.OID
End Sub

7. What is left is the excavation area, which is a bit trickier. To get it, we take the Shape property of the feature, cast it to the IArea ArcObjects interface, and use the Area property, as follows:

    txtExcavationOID.Text = pFeature.OID
    Dim pArea As IArea = pFeature.Shape
    txtExcavationArea.Text = pArea.Area
End Sub

8. Now that our viewing capability is ready, we need to execute it. Double-click on tlExcavationEditor.vb to open the code. We will need the getYharnamWorkspace and getExcavationFeatureClass methods that are in tlNewExcavation.vb; copy them into tlExcavationEditor.vb.
9. In the OnMouseDown event, write the following code to get the feature from the mouse location. This converts the x, y mouse coordinates into a map point and then runs a spatial query to find the excavation under this point. After that, we call our Excavation Editor form and send it the feature to do the work, as follows:

Public Overrides Sub OnMouseDown(ByVal Button As Integer, ByVal Shift As Integer, ByVal X As Integer, ByVal Y As Integer)
    'TODO: Add tlExcavationEditor.OnMouseDown implementation
    Dim pMxdoc As IMxDocument = m_application.Document
    Dim pPoint As IPoint = pMxdoc.ActiveView.ScreenDisplay.DisplayTransformation.ToMapPoint(X, Y)
    Dim pSFilter As ISpatialFilter = New SpatialFilter
    pSFilter.Geometry = pPoint
    Dim pFeatureClass As IFeatureClass = getExcavationFeatureClass(getYharnamWorkspace())
    Dim pFCursor As IFeatureCursor = pFeatureClass.Search(pSFilter, False)
    Dim pFeature As IFeature = pFCursor.NextFeature
    If pFeature Is Nothing Then Return
    Dim pExcavationEditor As New frmExcavationEditor
    pExcavationEditor.ArcMapApplication = m_application
    pExcavationEditor.PopulateExcavation(pFeature)
    pExcavationEditor.Show()
End Sub

10. Build and run yharnam.mxd. Click on the Excavation Editor tool and click on one of the excavations you drew before. You should see the Excavation Editor form pop up with the excavation information; no design ID or depth is currently set.
11. Close ArcMap and choose not to save any changes.
12. We will do the final trick to edit the excavation; there is not much to edit here, only the depth. To do that, copy the getYharnamWorkspace and getExcavationFeatureClass methods from tlNewExcavation.vb into frmExcavationEditor.vb. You will get an error on m_application.hWnd, so replace it with _application.hWnd, which is the property we set.
13. Right-click on frmExcavationEditor and select View Designer. Double-click on the Save button to generate the btnSave_Click method.
14. The user will enter the new depth for the excavation in the txtExcavationDepth textbox. We will take this value and store it in the feature. But before that, we need to retrieve the feature using the object ID, start editing, save the feature, and close the session. Write the following code to do so. Note that we close the form at the end, so we can open it again to see the new value:

Private Sub btnSave_Click(sender As Object, e As EventArgs) Handles btnSave.Click
    Dim pWorkspaceEdit As IWorkspaceEdit = getYharnamWorkspace()
    Dim pFeatureClass As IFeatureClass = getExcavationFeatureClass(pWorkspaceEdit)
    Dim pFeature As IFeature = pFeatureClass.GetFeature(txtExcavationOID.Text)
    pWorkspaceEdit.StartEditing(True)
    pWorkspaceEdit.StartEditOperation()
    pFeature.Value(pFeature.Fields.FindField("DEPTH")) = txtExcavationDepth.Text
    pFeature.Store()
    pWorkspaceEdit.StopEditOperation()
    pWorkspaceEdit.StopEditing(True)
    Me.Close()
End Sub

15. Build and run yharnam.mxd. Click on the Excavation Editor tool and click on one of the excavations you drew before. Type a numeric depth value and click on Save; this will close the form. Use the Excavation Editor tool again to reopen the excavation and check whether your depth value has been stored successfully.

Summary

In this article, you started writing the excavation planning manager, code-named Yharnam. In the first part of the article, you spent time learning about geodatabase editing and preparing the project. You then learned how to use the rubber band tool, which allows you to draw geometries on the map. Using this drawn geometry, you edited the workspace and created a new excavation feature with that geometry. You then learned how to view and edit the excavation feature's attributes.


Testing Exceptional Flow

Packt
03 Sep 2015
22 min read
 In this article by Frank Appel, author of the book Testing with JUnit, we will learn that special care has to be taken when testing a component's functionality under exception-raising conditions. You'll also learn how to use the various capture and verification possibilities and discuss their pros and cons. As robust software design is one of the declared goals of the test-first approach, we're going to see how tests intertwine with the fail fast strategy on selected boundary conditions. Finally, we're going to conclude with an in-depth explanation of working with collaborators under exceptional flow and see how stubbing of exceptional behavior can be achieved. The topics covered in this article are as follows: Testing patterns Treating collaborators (For more resources related to this topic, see here.) Testing patterns Testing exceptional flow is a bit trickier than verifying the outcome of normal execution paths. The following section will explain why and introduce the different techniques available to get this job done. Using the fail statement "Always expect the unexpected"                                                                                  – Adage based on Heraclitus Testing corner cases often results in the necessity to verify that a functionality throws a particular exception. Think, for example, of a java.util.List implementation. It quits the retrieval attempt of a list's element by means of a non-existing index number with java.lang.ArrayIndexOutOfBoundsException. Working with exceptional flow is somewhat special as without any precautions, the exercise phase would terminate immediately. But this is not what we want since it eventuates in a test failure. Indeed, the exception itself is the expected outcome of the behavior we want to check. From this, it follows that we have to capture the exception before we can verify anything. As we all know, we do this in Java with a try-catch construct. The try block contains the actual invocation of the functionality we are about to test. The catch block again allows us to get a grip on the expected outcome—the exception thrown during the exercise. Note that we usually keep our hands off Error, so we confine the angle of view in this article to exceptions. So far so good, but we have to bring up to our minds that in case no exception is thrown, this has to be classified as misbehavior. Consequently, the test has to fail. JUnit's built-in assertion capabilities provide the org.junit.Assert.fail method, which can be used to achieve this. The method unconditionally throws an instance of java.lang.AssertionError if called. The classical approach of testing exceptional flow with JUnit adds a fail statement straight after the functionality invocation within the try block. The idea behind is that this statement should never be reached if the SUT behaves correctly. But if not, the assertion error marks the test as failed. It is self-evident that capturing should narrow down the expected exception as much as possible. Do not catch IOException if you expect FileNotFoundException, for example. Unintentionally thrown exceptions must pass the catch block unaffected, lead to a test failure and, therefore, give you a good hint for troubleshooting with their stack trace. We insinuated that the fetch-count range check of our timeline example would probably be better off throwing IllegalArgumentException on boundary violations. 
Let's have a look at how we can change the setFetchCountExceedsLowerBound test to verify the different behavior with the try-catch exception testing pattern (see the following listing):

  @Test
  public void setFetchCountExceedsLowerBound() {
    int tooSmall = Timeline.FETCH_COUNT_LOWER_BOUND - 1;
    try {
      timeline.setFetchCount( tooSmall );
      fail();
    } catch( IllegalArgumentException actual ) {
      String message = actual.getMessage();
      String expected = format( Timeline.ERROR_EXCEEDS_LOWER_BOUND, tooSmall );
      assertEquals( expected, message );
      assertTrue( message.contains( valueOf( tooSmall ) ) );
    }
  }

It can be clearly seen how setFetchCount, the functionality under test, is called within the try block, directly followed by a fail statement. The caught exception is narrowed down to the expected type. The test avails of the inline fixture setup to initialize the exceeds-lower-bound value in the tooSmall local variable because it is used more than once. The verification checks that the thrown message matches an expected one. Our test calculates the expectation with the aid of java.lang.String.format (static import) based on the same pattern, which is also used internally by the timeline to produce the text. Once again, we loosen encapsulation a bit to ensure that the malicious value gets mentioned correctly. Purists may prefer only the String.contains variant, which, on the other hand, would be less accurate.

Although this works fine, it looks pretty ugly and is not very readable. Besides, it blurs the separation of the exercise and verification phases a bit, and so it is no wonder that other techniques have been invented for exception testing.

Annotated expectations

After the arrival of annotations in the Java language, JUnit got a thorough overhauling. We already mentioned the @Test type used to mark a particular method as an executable test. To simplify exception testing, it has been given the expected attribute. This defines that the anticipated outcome of a unit test should be an exception, and it accepts a subclass of Throwable to specify its type. Running a test of this kind captures exceptions automatically and checks whether the caught type matches the specified one. The following snippet shows how this can be used to validate that our timeline constructor doesn't accept null as the injection parameter:

  @Test( expected = IllegalArgumentException.class )
  public void constructWithNullAsItemProvider() {
    new Timeline( null, mock( SessionStorage.class ) );
  }

Here, we've got a test whose body statements merge setup and exercise in one line for compactness. Although the verification result is specified ahead of the method's signature definition, it is, of course, evaluated last. This means that the runtime test structure isn't twisted. But it is a bit of a downside from the readability point of view, as it breaks the usual test format. However, the approach bears a real risk when used in more complex scenarios. The next listing shows an alternative of setFetchCountExceedsLowerBound using the expected attribute:

  @Test( expected = IllegalArgumentException.class )
  public void setFetchCountExceedsLowerBound() {
    Timeline timeline = new Timeline( null, null );
    timeline.setFetchCount( Timeline.FETCH_COUNT_LOWER_BOUND - 1 );
  }

On the face of it, this might look fine because the test run would apparently succeed with a green bar. But given that the timeline constructor already throws IllegalArgumentException due to the initialization with null, the virtual point of interest is never reached.
So any setFetchCount implementation will pass this test. This renders it not only useless, but it even lulls you into a false sense of security! Certainly, the approach is most hazardous when checking for runtime exceptions, because they can be thrown undeclared. Thus, they can emerge practically everywhere and overshadow the original test intention unnoticed. Not being able to validate the state of the thrown exception narrows down the reasonable operational area of this concept to simple use cases, such as the constructor parameter verification mentioned previously.

Finally, here are two more remarks on the initial example. First, it might be debatable whether IllegalArgumentException is appropriate for an argument-not-null check from a design point of view. But as this discussion is as old as the hills and probably will never be settled, we won't argue about that. IllegalArgumentException was favored over NullPointerException basically because it seemed to be an evident way to build up a comprehensible example. To specify a different behavior of the tested use case, one simply has to define another Throwable type as the expected value. Second, as a side effect, the test shows how a generated test double can make our life much easier. You've probably already noticed that the session storage stand-in created on the fly serves as a dummy. This is quite nice, as we don't have to implement one manually and as it decouples the test from storage-related signatures, which might otherwise break the test when they change in the future. But keep in mind that such a created-on-the-fly dummy lacks the implicit no-operation check. Hence, this approach might be too fragile under some circumstances.

With annotations being too brittle for most usage scenarios and the try-fail-catch pattern being too crabbed, JUnit provides a special test helper called ExpectedException, which we'll take a look at now.

Verification with the ExpectedException rule

The third possibility offered to verify exceptions is the ExpectedException class. This type belongs to a special category of test utilities called rules. For the moment, it is sufficient to know that rules allow us to embed a test method into custom pre- and post-operations at runtime. In doing so, the expected exception helper can catch the thrown instance and perform the appropriate verifications. A rule has to be defined as a non-static public field, annotated with @Rule, as shown in the following TimelineTest excerpt. See how the rule object gets set up implicitly here with a factory method:

  public class TimelineTest {

    @Rule
    public ExpectedException thrown = ExpectedException.none();

    [...]

    @Test
    public void setFetchCountExceedsUpperBound() {
      int tooLargeValue = FETCH_COUNT_UPPER_BOUND + 1;
      thrown.expect( IllegalArgumentException.class );
      thrown.expectMessage( valueOf( tooLargeValue ) );
      timeline.setFetchCount( tooLargeValue );
    }

    [...]
  }

Compared to the try-fail-catch approach, the code is easier to read and write. The helper instance supports several methods to specify the anticipated outcome. Apart from the static imports of constants used for compactness, this specification reproduces pretty much the same validations as the original test. In case you wonder, ExpectedException#expectMessage expects a substring of the actual message, and we omitted the exact formatting here for brevity. If the exercise phase of setFetchCountExceedsUpperBound does not throw an exception, the rule ensures that the test fails.
In this context, it is about time we mentioned the utility's factory method none. Its name indicates that, as long as no expectations are configured, the helper assumes that a test run should terminate normally. This means that no artificial fail has to be issued. This way, a mix of standard and exceptional flow tests can coexist in one and the same test case. Even so, the test helper has to be configured prior to the exercise phase, which still leaves room for improvement with respect to canonizing the test structure. As we'll see next, the ability of Java 8 to compact closures into lambda expressions enables us to write even leaner and cleaner structured exceptional flow tests.

Capturing exceptions with closures

When writing tests, we strive to end up with a clear representation of separated test phases in the correct order. All of the previous approaches for testing exceptional flow did a more or less poor job in this regard. Looking once more at the classical try-fail-catch pattern, we recognize, however, that it comes closest. It strikes us that if we put some work into it, we can extract exception capturing into a reusable utility method. This method would accept a functional interface—the representation of the exception-throwing functionality under test—and return the caught exception. The ThrowableCaptor test helper puts the idea into practice:

  public class ThrowableCaptor {

    @FunctionalInterface
    public interface Actor {
      void act() throws Throwable;
    }

    public static Throwable thrownBy( Actor actor ) {
      try {
        actor.act();
      } catch( Throwable throwable ) {
        return throwable;
      }
      return null;
    }
  }

We see the Actor interface that serves as a functional callback. It gets executed within the try block of the thrownBy method. If an exception is thrown, which should be the normal path of execution, it gets caught and returned as the result. Bear in mind that we have omitted the fail statement of the original try-fail-catch pattern. We consider the captor a helper for the exercise phase. Thus, we merely return null if no exception is thrown and leave it to the code that follows to deal correctly with the situation. How capturing with this helper in combination with a lambda expression works is shown by the next variant of setFetchCountExceedsUpperBound, and this time, we've achieved the clear phase separation we're in search of:

  @Test
  public void setFetchCountExceedsUpperBound() {
    int tooLarge = FETCH_COUNT_UPPER_BOUND + 1;

    Throwable actual = thrownBy( () -> timeline.setFetchCount( tooLarge ) );

    assertNotNull( actual );
    assertTrue( actual instanceof IllegalArgumentException );
    String message = actual.getMessage();
    assertTrue( message.contains( valueOf( tooLarge ) ) );
    assertEquals( format( ERROR_EXCEEDS_UPPER_BOUND, tooLarge ), message );
  }

Please note that we've added an additional not-null check compared to the verifications of the previous version. We do this as a replacement for the non-existing failure enforcement. Indeed, the following instanceof check would fail implicitly if actual was null. But this would also be misleading, since it overshadows the true failure reason. Stating that actual must not be null points out clearly the expected postcondition that has not been met. Note also that the message is only extracted after the not-null check, so that a missing exception cannot surface as a confusing NullPointerException. One of the libraries worth mentioning in this context is AssertJ. It is mainly intended to improve validation expressions, but it also provides a test helper that supports the closure pattern you've just learned to make use of. Another choice to avoid writing your own helper could be the library Fishbowl, [FISBOW].
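For comparison, the same check can also be expressed with such a library helper. The following sketch is not taken from the book's listings: it assumes AssertJ's assertThatThrownBy helper is on the test classpath, reuses the timeline fixture and constants from the listings above, and the test method name is made up for illustration.

  import static java.lang.String.format;
  import static java.lang.String.valueOf;
  import static org.assertj.core.api.Assertions.assertThatThrownBy;

  // ...

  @Test
  public void setFetchCountExceedsUpperBoundWithAssertJ() {
    int tooLarge = FETCH_COUNT_UPPER_BOUND + 1;

    // capture and verification in one chained expression
    assertThatThrownBy( () -> timeline.setFetchCount( tooLarge ) )
      .isInstanceOf( IllegalArgumentException.class )
      .hasMessageContaining( valueOf( tooLarge ) )
      .hasMessage( format( ERROR_EXCEEDS_UPPER_BOUND, tooLarge ) );
  }

If the lambda does not throw anything, assertThatThrownBy fails the test on its own, so no explicit fail statement or not-null check is needed.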
Now that we understand the available testing patterns, let's discuss a few system-spanning aspects of dealing with exceptional flow in practice.

Treating collaborators

The considerations we've made about how a software system is built upon collaborating components foreshadow that we have to take good care when modelling our overall strategy for exceptional flow. Because of this, we'll start this section with an introduction to the fail fast strategy, which is a perfect match for the test-first approach. The second part of the section will show you how to deal with checked exceptions thrown by collaborators.

Fail fast

Until now, we've learned that exceptions can serve in corner cases as an expected outcome, which we need to verify with tests. As an example, we've changed the behavior of our timeline fetch-count setter. The new version throws IllegalArgumentException if the given parameter is out of range. While we've explained how to test this, you may have wondered whether throwing an exception is actually an improvement. On the contrary, you might think, doesn't the exception make our program more fragile, as it bears the risk of an ugly error popping up or even of crashing the entire application? Aren't those things we want to prevent by all means? So, wouldn't it be better to stick with the old version and silently ignore arguments that are out of range?

At first sight, this may sound reasonable, but doing so is ostrich-like head-in-the-sand behavior, according to the motto: if we can't see them, they aren't there, and so, they can't hurt us. Ignoring an input that is obviously wrong can lead to misbehavior of our software system later on, and the reason for the problem is usually much harder to track down than an immediate failure. Generally speaking, this practice disconnects the effects of a problem from its cause. As a consequence, you often have to deal with stack traces leading to dead ends or worse.

Consider, for example, that we'd initialize the timeline fetch-count as an invariant by means of a constructor argument. Moreover, the value we use would be negative and silently ignored by the component. In addition, our application would make some item position calculations based on this value. Sure enough, the calculation results would be faulty. If we're lucky, an exception would be thrown when, for example, trying to access a particular item based on these calculations. However, the given stack trace would reveal nothing about the reason that originally led to the situation. And if we're unlucky, the misbehavior will not be detected until the software has been released to end users.

On the other hand, with the new version of setFetchCount, this kind of translocated problem can never occur. A failure trace would point directly to the initial programming mistake, hence avoiding follow-up issues. This means that failing immediately and visibly increases robustness due to short feedback cycles and pertinent exceptions. Jim Shore has given this design strategy the name fail fast, [SHOR04]. Shore points out that the heart of fail fast is assertions. Similar to the JUnit assert statements, an assertion fails on a condition that isn't met. Typical assertions might be not-null checks, in-range checks, and so on. But how do we decide if it's necessary to fail fast? While assertions on input arguments are an obvious use case, checking return values or invariants may qualify as well.
Sometimes, such conditions are described in code comments, such as // foo should never be null because..., which is a clear indication that the note should be replaced with an appropriate assertion. The next snippet demonstrates the principle:

  public void doWithAssert() {
    [...]
    boolean condition = ...; // check some invariant
    if( !condition ) {
      throw new IllegalStateException( "Condition not met." );
    }
    [...]
  }

But be careful not to overdo things, because in most cases, code will fail fast by default. You don't have to include a not-null check after each and every variable assignment, for example. Such paranoid programming styles decrease readability for no value-add at all.

A last point to consider is your overall exception-handling strategy. The intention of assertions is to reveal programming or configuration mistakes as early as possible. Because of this, we strictly make use of runtime exception types only. Catching exceptions at random somewhere up the call stack of course thwarts the whole purpose of this approach. So, beware of the absurd try-catch-log pattern that you often see scattered all over the code of scrubs, and which is demonstrated in the next listing as a deterrent only:

  private Data readSomeData() {
    try {
      return source.readData();
    } catch( Exception hardLuck ) {
      // NEVER DO THIS!
      hardLuck.printStackTrace();
    }
    return null;
  }

The sample code projects exceptional flow onto null return values and disguises the fact that something went seriously wrong. It does not get any better by using a logging framework or, even worse, by swallowing the exception completely. Analysis of an error by means of stack trace logs is cumbersome and often fault-prone. In particular, this approach usually leads to logs jammed with ignored traces, where one more or less does not attract attention. In such an environment, it's like looking for a needle in a haystack when trying to find out why a follow-up problem occurs. Instead, use a central exception handling mechanism at reasonable boundaries. You can create a bottom-level exception handler around a GUI's message loop, ensure that background threads report problems appropriately, or secure event notification mechanisms, for example. Otherwise, you shouldn't bother with exception handling in your code. As outlined in the next paragraph, securing resource management with try-finally should be sufficient most of the time.

The stubbing of exceptional behavior

Every now and then, we come across collaborators that declare checked exceptions in some or all of their method signatures. There has been a debate going on for years now about whether or not checked exceptions are evil, [HEVEEC]. However, in our daily work, we simply can't elude them, as they pop up in adapters around third-party code or are burned into legacy code we aren't able to change. So, what are the options we have in these situations?

"It is funny how people think that the important thing about exceptions is handling them. That's not the important thing about exceptions. In a well-written application there's a ratio of ten to one, in my opinion, of try finally to try catch."
                                 – Anders Hejlsberg, [HEVEEC]

Cool. This means that we also declare the exception type in question on our own method signature and let someone else up on the call stack solve the tricky things, right? Although it makes life easier for us for the moment, acting like this is probably not the brightest idea.
If everybody follows that strategy, the higher we get on the call stack, the more exception types will occur. This doesn't scale well and, even worse, it exposes details from the depths of the call hierarchy. Because of this, people sometimes simplify things by declaring java.lang.Exception as the thrown type. Indeed, this gets rid of the throws declaration tail. But it's also a pauper's oath, as it reduces the Java type concept to absurdity.

Fair enough. So, we're presumably better off dealing with checked exceptions as soon as they occur. But hey, wouldn't this contradict Hejlsberg's statement? And what shall we do with the gatecrasher, meaning, is there always a reasonable handling approach? Fortunately there is, and it absolutely conforms with the quote and the preceding fail fast discussion. We envelop the caught checked exception in an appropriate runtime exception, which we then throw instead. This way, every caller of our component's functionality can use it without worrying about exception handling. If necessary, it is sufficient to use a try-finally block to ensure the disposal or closure of open resources, for example. As described previously, we leave exception handling to bottom-line handlers around the message loop or the like.

Now that we know what we have to do, the next question is: how can we achieve this with tests? Luckily, with the knowledge about stubs, you're almost there. Normally, handling a checked exception represents a boundary condition. We can regard the thrown exception as an indirect input to our SUT. All we have to do is let the stub throw an expected exception (precondition) and check whether the envelope gets delivered properly (postcondition).

For a better understanding, let's walk through the steps in our timeline example. We assume for this section that our SessionStorage collaborator declares IOException on its methods, for whatever reason. The storage interface is shown in the next listing:

  public interface SessionStorage {
    void storeTop( Item top ) throws IOException;
    Item readTop() throws IOException;
  }

Next, we'll have to write a test that reflects our thoughts. At first, we create an IOException instance that will serve as an indirect input. Looking at the next snippet, you can see how we configure our storage stub to throw this instance on a call to storeTop. As the method does not return anything, the Mockito stubbing pattern looks a bit different than earlier. This time, it starts with the expectation definition. In addition, we use Mockito's any matcher, which ensures that the exception is thrown for those calls to storeTop where the given argument is assignment-compatible with the specified type token. After this, we're ready to exercise the fetchItems method and capture the actual outcome. We expect it to be an instance of IllegalStateException, just to keep things simple. See how we verify that the caught exception wraps the original cause and that the message matches a predefined constant on our component class:

  @Test
  public void fetchItemWithExceptionOnStoreTop() throws IOException {
    IOException cause = new IOException();
    doThrow( cause ).when( storage ).storeTop( any( Item.class ) );

    Throwable actual = thrownBy( () -> timeline.fetchItems() );

    assertNotNull( actual );
    assertTrue( actual instanceof IllegalStateException );
    assertSame( cause, actual.getCause() );
    assertEquals( Timeline.ERROR_STORE_TOP, actual.getMessage() );
  }

With the test in place, the implementation is pretty easy.
Let's assume that we have the item storage extracted to a private timeline method named storeTopItem. It gets called somewhere down the road of fetchItems and in turn calls a private method, getTopItem. Fixing the compile errors, we end up with a try-catch block, because we have to deal with the IOException thrown by storeTop. Our first error handling should be empty to ensure that our test case actually fails. The following snippet shows the ultimate version, which will make the test finally pass:

  static final String ERROR_STORE_TOP = "Unable to save top item";

  [...]

  private void storeTopItem() {
    try {
      sessionStorage.storeTop( getTopItem() );
    } catch( IOException cause ) {
      throw new IllegalStateException( ERROR_STORE_TOP, cause );
    }
  }

Of course, real-world situations can sometimes be more challenging, for example, when the collaborator throws a mix of checked and runtime exceptions. At times, this results in tedious work. But if the same type of wrapping exception can always be used, the implementation can often be simplified: first, re-throw all runtime exceptions; second, catch exceptions by their common super type and re-throw them embedded within a wrapping runtime exception. The following listing shows the principle:

  private void storeTopItem() {
    try {
      sessionStorage.storeTop( getTopItem() );
    } catch( RuntimeException rte ) {
      throw rte;
    } catch( Exception cause ) {
      throw new IllegalStateException( ERROR_STORE_TOP, cause );
    }
  }

Summary

In this article, you learned how to validate the proper behavior of an SUT with respect to exceptional flow. You experienced how to apply the various capture and verification options, and we discussed their strengths and weaknesses. Supplementary to the test-first approach, you were taught the concepts of the fail fast design strategy and saw how adopting it increases the overall robustness of applications. Last but not least, we explained how to handle collaborators that throw checked exceptions and how to stub their exceptional behavior.

Resources for Article:

Further resources on this subject:
Progressive Mockito [article]
Using Mock Objects to Test Interactions [article]
Ensuring Five-star Rating in the MarketPlace [article]


Understanding TDD

Packt
03 Sep 2015
31 min read
 In this article by Viktor Farcic and Alex Garcia, the authors of the book Test-Driven Java Development, we will go through TDD in a simple procedure of writing tests before the actual implementation. It's an inversion of a traditional approach where testing is performed after the code is written. (For more resources related to this topic, see here.) Red-green-refactor Test-driven development is a process that relies on the repetition of a very short development cycle. It is based on the test-first concept of extreme programming (XP) that encourages simple design with a high level of confidence. The procedure that drives this cycle is called red-green-refactor. The procedure itself is simple and it consists of a few steps that are repeated over and over again: Write a test. Run all tests. Write the implementation code. Run all tests. Refactor. Run all tests. Since a test is written before the actual implementation, it is supposed to fail. If it doesn't, the test is wrong. It describes something that already exists or it was written incorrectly. Being in the green state while writing tests is a sign of a false positive. Tests like these should be removed or refactored. While writing tests, we are in the red state. When the implementation of a test is finished, all tests should pass and then we will be in the green state. If the last test failed, implementation is wrong and should be corrected. Either the test we just finished is incorrect or the implementation of that test did not meet the specification we had set. If any but the last test failed, we broke something and changes should be reverted. When this happens, the natural reaction is to spend as much time as needed to fix the code so that all tests are passing. However, this is wrong. If a fix is not done in a matter of minutes, the best thing to do is to revert the changes. After all, everything worked not long ago. Implementation that broke something is obviously wrong, so why not go back to where we started and think again about the correct way to implement the test? That way, we wasted minutes on a wrong implementation instead of wasting much more time to correct something that was not done right in the first place. Existing test coverage (excluding the implementation of the last test) should be sacred. We change the existing code through intentional refactoring, not as a way to fix recently written code. Do not make the implementation of the last test final, but provide just enough code for this test to pass. Write the code in any way you want, but do it fast. Once everything is green, we have confidence that there is a safety net in the form of tests. From this moment on, we can proceed to refactor the code. This means that we are making the code better and more optimum without introducing new features. While refactoring is in place, all tests should be passing all the time. If, while refactoring, one of the tests failed, refactor broke an existing functionality and, as before, changes should be reverted. Not only that at this stage we are not changing any features, but we are also not introducing any new tests. All we're doing is making the code better while continuously running all tests to make sure that nothing got broken. At the same time, we're proving code correctness and cutting down on future maintenance costs. Once refactoring is finished, the process is repeated. It's an endless loop of a very short cycle. Speed is the key Imagine a game of ping pong (or table tennis). 
The game is very fast; sometimes it is hard to even follow the ball when professionals play. TDD is very similar. TDD veterans tend not to spend more than a minute on either side of the table (test and implementation). Write a short test and run all tests (ping), write the implementation and run all tests (pong), write another test (ping), write the implementation of that test (pong), refactor and confirm that all tests are passing (score), and then repeat—ping, pong, ping, pong, ping, pong, score, serve again. Do not try to make the perfect code. Instead, try to keep the ball rolling until you think that the time is right to score (refactor). Time between switching from tests to implementation (and vice versa) should be measured in minutes (if not seconds).

It's not about testing

The T in TDD is often misunderstood. Test-driven development is the way we approach the design. It is the way to force us to think about the implementation and about what the code needs to do before writing it. It is the way to focus on requirements and the implementation of just one thing at a time—organize your thoughts and better structure the code. This does not mean that tests resulting from TDD are useless—far from it. They are very useful and they allow us to develop with great speed without being afraid that something will be broken. This is especially true when refactoring takes place. Being able to reorganize the code while having the confidence that no functionality is broken is a huge boost to its quality. The main objective of test-driven development is testable code design, with tests as a very useful side product.

Testing

Even though the main objective of test-driven development is the approach to code design, tests are still a very important aspect of TDD and we should have a clear understanding of the two major groups of techniques:

Black-box testing
White-box testing

The black-box testing

Black-box testing (also known as functional testing) treats the software under test as a black box without knowing its internals. Tests use software interfaces and try to ensure that they work as expected. As long as the functionality of the interfaces remains unchanged, tests should pass even if the internals are changed. The tester is aware of what the program should do, but does not have knowledge of how it does it. Black-box testing is the most commonly used type of testing in traditional organizations that have testers as a separate department, especially when they are not proficient in coding and have difficulties understanding it. This technique provides an external perspective on the software under test.

Some of the advantages of black-box testing are as follows:

Efficient for large segments of code
Code access, understanding the code, and the ability to code are not required
Separation between user's and developer's perspectives

Some of the disadvantages of black-box testing are as follows:

Limited coverage, since only a fraction of test scenarios is performed
Inefficient testing due to the tester's lack of knowledge about software internals
Blind coverage, since the tester has limited knowledge about the application

If tests are driving the development, they are often written in the form of acceptance criteria that is later used as a definition of what should be developed. Automated black-box testing relies on some form of automation such as behavior-driven development (BDD).
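To make the distinction a bit more concrete, here is a minimal, self-contained sketch of a black-box-style unit test (not taken from the book's code base): it exercises java.util.ArrayList exclusively through the public java.util.List interface, so it would keep passing even if the internal data structure changed.

  import static org.junit.Assert.assertEquals;
  import static org.junit.Assert.assertTrue;

  import java.util.ArrayList;
  import java.util.List;

  import org.junit.Test;

  public class ListBlackBoxTest {

    @Test
    public void whenElementIsAddedThenItCanBeRetrievedThroughTheInterface() {
      List<String> list = new ArrayList<>(); // internals are irrelevant to the test

      list.add("TDD");

      assertEquals(1, list.size());
      assertEquals("TDD", list.get(0));
      assertTrue(list.contains("TDD"));
    }
  }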
The white-box testing White-box testing (also known as clear-box testing, glass-box testing, transparent-box testing, and structural testing) looks inside the software that is being tested and uses that knowledge as part of the testing process. If, for example, an exception should be thrown under certain conditions, a test might want to reproduce those conditions. White-box testing requires internal knowledge of the system and programming skills. It provides an internal perspective on the software under test. Some of the advantages of white-box testing are as follows: Efficient in finding errors and problems Required knowledge of internals of the software under test is beneficial for thorough testing Allows finding hidden errors Programmers introspection Helps optimizing the code Due to the required internal knowledge of the software, maximum coverage is obtained Some of the disadvantages of white-box testing are as follows: It might not find unimplemented or missing features Requires high-level knowledge of internals of the software under test Requires code access Tests are often tightly coupled to the implementation details of the production code, causing unwanted test failures when the code is refactored. White-box testing is almost always automated and, in most cases, has the form of unit tests. When white-box testing is done before the implementation, it takes the form of TDD. The difference between quality checking and quality assurance The approach to testing can also be distinguished by looking at the objectives they are trying to accomplish. Those objectives are often split between quality checking (QC) and quality assurance (QA). While quality checking is focused on defects identification, quality assurance tries to prevent them. QC is product-oriented and intends to make sure that results are as expected. On the other hand, QA is more focused on processes that assure that quality is built-in. It tries to make sure that correct things are done in the correct way. While quality checking had a more important role in the past, with the emergence of TDD, acceptance test-driven development (ATDD), and later on behavior-driven development (BDD), focus has been shifting towards quality assurance. Better tests No matter whether one is using black-box, white-box, or both types of testing, the order in which they are written is very important. Requirements (specifications and user stories) are written before the code that implements them. They come first so they define the code, not the other way around. The same can be said for tests. If they are written after the code is done, in a certain way, that code (and the functionalities it implements) is defining tests. Tests that are defined by an already existing application are biased. They have a tendency to confirm what code does, and not to test whether client's expectations are met, or that the code is behaving as expected. With manual testing, that is less the case since it is often done by a siloed QC department (even though it's often called QA). They tend to work on tests' definition in isolation from developers. That in itself leads to bigger problems caused by inevitably poor communication and the police syndrome where testers are not trying to help the team to write applications with quality built-in, but to find faults at the end of the process. The sooner we find problems, the cheaper it is to fix them. 
Tests written in the TDD fashion (including its flavors such as ATDD and BDD) are an attempt to develop applications with quality built-in from the very start. It's an attempt to avoid having problems in the first place. Mocking In order for tests to run fast and provide constant feedback, code needs to be organized in such a way that the methods, functions, and classes can be easily replaced with mocks and stubs. A common word for this type of replacements of the actual code is test double. Speed of the execution can be severely affected with external dependencies; for example, our code might need to communicate with the database. By mocking external dependencies, we are able to increase that speed drastically. Whole unit tests suite execution should be measured in minutes, if not seconds. Designing the code in a way that it can be easily mocked and stubbed, forces us to better structure that code by applying separation of concerns. More important than speed is the benefit of removal of external factors. Setting up databases, web servers, external APIs, and other dependencies that our code might need, is both time consuming and unreliable. In many cases, those dependencies might not even be available. For example, we might need to create a code that communicates with a database and have someone else create a schema. Without mocks, we would need to wait until that schema is set. With or without mocks, the code should be written in a way that we can easily replace one dependency with another. Executable documentation Another very useful aspect of TDD (and well-structured tests in general) is documentation. In most cases, it is much easier to find out what the code does by looking at tests than the implementation itself. What is the purpose of some methods? Look at the tests associated with it. What is the desired functionality of some part of the application UI? Look at the tests associated with it. Documentation written in the form of tests is one of the pillars of TDD and deserves further explanation. The main problem with (traditional) software documentation is that it is not up to date most of the time. As soon as some part of the code changes, the documentation stops reflecting the actual situation. This statement applies to almost any type of documentation, with requirements and test cases being the most affected. The necessity to document code is often a sign that the code itself is not well written.Moreover, no matter how hard we try, documentation inevitably gets outdated. Developers shouldn't rely on system documentation because it is almost never up to date. Besides, no documentation can provide as detailed and up-to-date description of the code as the code itself. Using code as documentation, does not exclude other types of documents. The key is to avoid duplication. If details of the system can be obtained by reading the code, other types of documentation can provide quick guidelines and a high-level overview. Non-code documentation should answer questions such as what the general purpose of the system is and what technologies are used by the system. In many cases, a simple README is enough to provide the quick start that developers need. Sections such as project description, environment setup, installation, and build and packaging instructions are very helpful for newcomers. From there on, code is the bible. Implementation code provides all needed details while test code acts as the description of the intent behind the production code. 
Tests are executable documentation with TDD being the most common way to create and maintain it. Assuming that some form of Continuous Integration (CI) is in use, if some part of test-documentation is incorrect, it will fail and be fixed soon afterwards. CI solves the problem of incorrect test-documentation, but it does not ensure that all functionality is documented. For this reason (among many others), test-documentation should be created in the TDD fashion. If all functionality is defined as tests before the implementation code is written and execution of all tests is successful, then tests act as a complete and up-to-date information that can be used by developers. What should we do with the rest of the team? Testers, customers, managers, and other non coders might not be able to obtain the necessary information from the production and test code. As we saw earlier, two most common types of testing are black-box and white-box testing. This division is important since it also divides testers into those who do know how to write or at least read code (white-box testing) and those who don't (black-box testing). In some cases, testers can do both types. However, more often than not, they do not know how to code so the documentation that is usable for developers is not usable for them. If documentation needs to be decoupled from the code, unit tests are not a good match. That is one of the reasons why BDD came in to being. BDD can provide documentation necessary for non-coders, while still maintaining the advantages of TDD and automation. Customers need to be able to define new functionality of the system, as well as to be able to get information about all the important aspects of the current system. That documentation should not be too technical (code is not an option), but it still must be always up to date. BDD narratives and scenarios are one of the best ways to provide this type of documentation. Ability to act as acceptance criteria (written before the code), be executed frequently (preferably on every commit), and be written in natural language makes BDD stories not only always up to date, but usable by those who do not want to inspect the code. Documentation is an integral part of the software. As with any other part of the code, it needs to be tested often so that we're sure that it is accurate and up to date. The only cost-effective way to have accurate and up-to-date information is to have executable documentation that can be integrated into your continuous integration system. TDD as a methodology is a good way to move towards this direction. On a low level, unit tests are a best fit. On the other hand, BDD provides a good way to work on a functional level while maintaining understanding accomplished using natural language. No debugging We (authors of this article) almost never debug applications we're working on! This statement might sound pompous, but it's true. We almost never debug because there is rarely a reason to debug an application. When tests are written before the code and the code coverage is high, we can have high confidence that the application works as expected. This does not mean that applications written using TDD do not have bugs—they do. All applications do. However, when that happens, it is easy to isolate them by simply looking for the code that is not covered with tests. Tests themselves might not include some cases. In that situation, the action is to write additional tests. 
With high code coverage, finding the cause of some bug is much faster through tests than spending time debugging line by line until the culprit is found. With all this in mind, let's go through the TDD best practices. Best practices Coding best practices are a set of informal rules that the software development community has learned over time, which can help improve the quality of software. While each application needs a level of creativity and originality (after all, we're trying to build something new or better), coding practices help us avoid some of the problems others faced before us. If you're just starting with TDD, it is a good idea to apply some (if not all) of the best practices generated by others. For easier classification of test-driven development best practices, we divided them into four categories: Naming conventions Processes Development practices Tools As you'll see, not all of them are exclusive to TDD. Since a big part of test-driven development consists of writing tests, many of the best practices presented in the following sections apply to testing in general, while others are related to general coding best practices. No matter the origin, all of them are useful when practicing TDD. Take the advice with a certain dose of skepticism. Being a great programmer is not only about knowing how to code, but also about being able to decide which practice, framework or style best suits the project and the team. Being agile is not about following someone else's rules, but about knowing how to adapt to circumstances and choose the best tools and practices that suit the team and the project. Naming conventions Naming conventions help to organize tests better, so that it is easier for developers to find what they're looking for. Another benefit is that many tools expect that those conventions are followed. There are many naming conventions in use, and those presented here are just a drop in the ocean. The logic is that any naming convention is better than none. Most important is that everyone on the team knows what conventions are being used and are comfortable with them. Choosing more popular conventions has the advantage that newcomers to the team can get up to speed fast since they can leverage existing knowledge to find their way around. Separate the implementation from the test code Benefits: It avoids accidentally packaging tests together with production binaries; many build tools expect tests to be in a certain source directory. Common practice is to have at least two source directories. Implementation code should be located in src/main/java and test code in src/test/java. In bigger projects, the number of source directories can increase but the separation between implementation and tests should remain as is. Build tools such as Gradle and Maven expect source directories separation as well as naming conventions. You might have noticed that the build.gradle files that we used throughout this article did not have explicitly specified what to test nor what classes to use to create a .jar file. Gradle assumes that tests are in src/test/java and that the implementation code that should be packaged into a jar file is in src/main/java. Place test classes in the same package as implementation Benefits: Knowing that tests are in the same package as the code helps finding code faster. As stated in the previous practice, even though packages are the same, classes are in the separate source directories. All exercises throughout this article followed this convention. 
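As a minimal illustration of these two conventions (the package and class names below are hypothetical and not taken from the book's repository), the production class and its test share the same package but live in different source trees:

  // src/main/java/com/example/tictactoe/TicTacToe.java
  package com.example.tictactoe;

  public class TicTacToe {
    // production code is packaged into the .jar file
  }

  // src/test/java/com/example/tictactoe/TicTacToeTest.java
  package com.example.tictactoe;

  import org.junit.Test;

  public class TicTacToeTest {

    @Test
    public void whenInstantiatedThenNoExceptionIsThrown() {
      // same package, so package-private members would be accessible if needed
      new TicTacToe();
    }
  }

Because both files declare the same package, the test can reach package-private members, while the src/test/java tree is left out of the production artifact by the Gradle and Maven defaults.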
Name test classes in a similar fashion to the classes they test

Benefits: Knowing that tests have a similar name to the classes they are testing helps in finding the classes faster.

One commonly used practice is to name tests the same as the implementation classes, with the suffix Test. If, for example, the implementation class is TickTackToe, the test class should be TickTackToeTest. However, in all cases, with the exception of those we used throughout the refactoring exercises, we prefer the suffix Spec. It helps to make a clear distinction that test methods are primarily created as a way to specify what will be developed. Testing is a great subproduct of those specifications.

Use descriptive names for test methods

Benefits: It helps in understanding the objective of tests.

Using method names that describe tests is beneficial when trying to figure out why some tests failed or when the coverage should be increased with more tests. It should be clear what conditions are set before the test, what actions are performed and what is the expected outcome. There are many different ways to name test methods and our preferred method is to name them using the Given/When/Then syntax used in the BDD scenarios. Given describes (pre)conditions, When describes actions, and Then describes the expected outcome. If some test does not have preconditions (usually set using @Before and @BeforeClass annotations), Given can be skipped. Let's take a look at one of the specifications we created for our TickTackToe application:

  @Test
  public void whenPlayAndWholeHorizontalLineThenWinner() {
    ticTacToe.play(1, 1); // X
    ticTacToe.play(1, 2); // O
    ticTacToe.play(2, 1); // X
    ticTacToe.play(2, 2); // O
    String actual = ticTacToe.play(3, 1); // X
    assertEquals("X is the winner", actual);
  }

Just by reading the name of the method, we can understand what it is about. When we play and the whole horizontal line is populated, then we have a winner. Do not rely only on comments to provide information about the test objective. Comments do not appear when tests are executed from your favorite IDE nor do they appear in reports generated by CI or build tools.

Processes

TDD processes are the core set of practices. Successful implementation of TDD depends on practices described in this section.

Write a test before writing the implementation code

Benefits: It ensures that testable code is written; ensures that every line of code gets tests written for it.

By writing or modifying the test first, the developer is focused on requirements before starting to work on the implementation code. This is the main difference compared to writing tests after the implementation is done. The additional benefit is that with the tests written first, we are avoiding the danger that the tests work as quality checking instead of quality assurance. We're trying to ensure that quality is built in as opposed to checking later whether we met quality objectives.

Only write new code when the test is failing

Benefits: It confirms that the test does not work without the implementation.

If tests are passing without the need to write or modify the implementation code, then either the functionality is already implemented or the test is defective. If new functionality is indeed missing, then the test always passes and is therefore useless. Tests should fail for the expected reason. Even though there are no guarantees that the test is verifying the right thing, with fail first and for the expected reason, confidence that verification is correct should be high.
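As a compact sketch of these two process practices in action, consider a first red-green cycle for the StringCalculator used in the later examples. The class body shown here is only an assumption about its simplest possible form, and the test name is made up for illustration; the two snippets represent separate files.

  // Test file, written first (red): it fails because StringCalculator.add does not exist yet.
  import org.junit.Assert;
  import org.junit.Test;

  public class StringCalculatorSpec {

    @Test
    public final void whenEmptyStringIsUsedThenReturnValueIs0() {
      Assert.assertEquals(0, StringCalculator.add(""));
    }
  }

  // Implementation file, written second (green): just enough code to make the failing test pass.
  public class StringCalculator {
    public static int add(String numbers) {
      return 0; // the simplest code that satisfies the current specification
    }
  }

Only after the test has failed for the expected reason is the implementation added, and the next specification (for example, add("3") returning 3) then forces the code to evolve further.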
Rerun all tests every time the implementation code changes Benefits: It ensures that there is no unexpected side effect caused by code changes. Every time any part of the implementation code changes, all tests should be run. Ideally, tests are fast to execute and can be run by the developer locally. Once code is submitted to version control, all tests should be run again to ensure that there was no problem due to code merges. This is specially important when more than one developer is working on the code. Continuous integration tools such as Jenkins (http://jenkins-ci.org/), Hudson (http://hudson-ci.org/), Travis (https://travis-ci.org/), and Bamboo (https://www.atlassian.com/software/bamboo) should be used to pull the code from the repository, compile it, and run tests. All tests should pass before a new test is written Benefits: The focus is maintained on a small unit of work; implementation code is (almost) always in working condition. It is sometimes tempting to write multiple tests before the actual implementation. In other cases, developers ignore problems detected by existing tests and move towards new features. This should be avoided whenever possible. In most cases, breaking this rule will only introduce technical debt that will need to be paid with interest. One of the goals of TDD is that the implementation code is (almost) always working as expected. Some projects, due to pressures to reach the delivery date or maintain the budget, break this rule and dedicate time to new features, leaving the task of fixing the code associated with failed tests for later. These projects usually end up postponing the inevitable. Refactor only after all tests are passing Benefits: This type of refactoring is safe. If all implementation code that can be affected has tests and they are all passing, it is relatively safe to refactor. In most cases, there is no need for new tests. Small modifications to existing tests should be enough. The expected outcome of refactoring is to have all tests passing both before and after the code is modified. Development practices Practices listed in this section are focused on the best way to write tests. Write the simplest code to pass the test Benefits: It ensures cleaner and clearer design; avoids unnecessary features. The idea is that the simpler the implementation, the better and easier it is to maintain the product. The idea adheres to the keep it simple stupid (KISS) principle. This states that most systems work best if they are kept simple rather than made complex; therefore, simplicity should be a key goal in design, and unnecessary complexity should be avoided. Write assertions first, act later Benefits: This clarifies the purpose of the requirements and tests early. Once the assertion is written, the purpose of the test is clear and the developer can concentrate on the code that will accomplish that assertion and, later on, on the actual implementation. Minimize assertions in each test Benefits: This avoids assertion roulette; allows execution of more asserts. If multiple assertions are used within one test method, it might be hard to tell which of them caused a test failure. This is especially common when tests are executed as part of the continuous integration process. If the problem cannot be reproduced on a developer's machine (as may be the case if the problem is caused by environmental issues), fixing the problem may be difficult and time consuming. When one assert fails, execution of that test method stops. 
If there are other asserts in that method, they will not be run and information that can be used in debugging is lost. Last but not least, having multiple asserts creates confusion about the objective of the test. This practice does not mean that there should always be only one assert per test method. If there are other asserts that test the same logical condition or unit of functionality, they can be used within the same method. Let's go through a few examples:

  @Test
  public final void whenOneNumberIsUsedThenReturnValueIsThatSameNumber() {
    Assert.assertEquals(3, StringCalculator.add("3"));
  }

  @Test
  public final void whenTwoNumbersAreUsedThenReturnValueIsTheirSum() {
    Assert.assertEquals(3+6, StringCalculator.add("3,6"));
  }

The preceding code contains two specifications that clearly define what the objective of those tests is. By reading the method names and looking at the assert, there should be clarity on what is being tested. Consider the following for example:

  @Test
  public final void whenNegativeNumbersAreUsedThenRuntimeExceptionIsThrown() {
    RuntimeException exception = null;
    try {
      StringCalculator.add("3,-6,15,-18,46,33");
    } catch (RuntimeException e) {
      exception = e;
    }
    Assert.assertNotNull("Exception was not thrown", exception);
    Assert.assertEquals("Negatives not allowed: [-6, -18]", exception.getMessage());
  }

This specification has more than one assert, but they are testing the same logical unit of functionality. The first assert is confirming that the exception exists, and the second that its message is correct. When multiple asserts are used in one test method, they should all contain messages that explain the failure. This way, debugging the failed assert is easier. In the case of one assert per test method, messages are welcome but not necessary, since it should be clear from the method name what the objective of the test is.

  @Test
  public final void whenAddIsUsedThenItWorks() {
    Assert.assertEquals(0, StringCalculator.add(""));
    Assert.assertEquals(3, StringCalculator.add("3"));
    Assert.assertEquals(3+6, StringCalculator.add("3,6"));
    Assert.assertEquals(3+6+15+18+46+33, StringCalculator.add("3,6,15,18,46,33"));
    Assert.assertEquals(3+6+15, StringCalculator.add("3,6\n15"));
    Assert.assertEquals(3+6+15, StringCalculator.add("//;\n3;6;15"));
    Assert.assertEquals(3+1000+6, StringCalculator.add("3,1000,1001,6,1234"));
  }

This test has many asserts. It is unclear what the functionality is, and if one of them fails, it is unknown whether the rest would work or not. It might be hard to understand the failure when this test is executed through some of the CI tools.

Do not introduce dependencies between tests

Benefits: The tests work in any order independently, whether all or only a subset is run.

Each test should be independent from the others. Developers should be able to execute any individual test, a set of tests, or all of them. Often, due to the test runner's design, there is no guarantee that tests will be executed in any particular order. If there are dependencies between tests, they might easily be broken with the introduction of new ones.

Tests should run fast

Benefits: These tests are used often.

If it takes a lot of time to run tests, developers will stop using them or run only a small subset related to the changes they are making. The benefit of fast tests, besides fostering their usage, is quick feedback. The sooner the problem is detected, the easier it is to fix it. Knowledge about the code that produced the problem is still fresh.
If the developer already started working on the next feature while waiting for the completion of the execution of the tests, he might decide to postpone fixing the problem until that new feature is developed. On the other hand, if he drops his current work to fix the bug, time is lost in context switching. Tests should be so quick that developers can run all of them after each change without getting bored or frustrated. Use test doubles Benefits: This reduces code dependency and test execution will be faster. Mocks are prerequisites for fast execution of tests and ability to concentrate on a single unit of functionality. By mocking dependencies external to the method that is being tested, the developer is able to focus on the task at hand without spending time in setting them up. In the case of bigger teams, those dependencies might not even be developed. Also, the execution of tests without mocks tends to be slow. Good candidates for mocks are databases, other products, services, and so on. Use set-up and tear-down methods Benefits: This allows set-up and tear-down code to be executed before and after the class or each method. In many cases, some code needs to be executed before the test class or before each method in a class. For this purpose, JUnit has @BeforeClass and @Before annotations that should be used as the setup phase. @BeforeClass executes the associated method before the class is loaded (before the first test method is run). @Before executes the associated method before each test is run. Both should be used when there are certain preconditions required by tests. The most common example is setting up test data in the (hopefully in-memory) database. At the opposite end are @After and @AfterClass annotations, which should be used as the tear-down phase. Their main purpose is to destroy data or a state created during the setup phase or by the tests themselves. As stated in one of the previous practices, each test should be independent from the others. Moreover, no test should be affected by the others. Tear-down phase helps to maintain the system as if no test was previously executed. Do not use base classes in tests Benefits: It provides test clarity. Developers often approach test code in the same way as implementation. One of the common mistakes is to create base classes that are extended by tests. This practice avoids code duplication at the expense of tests clarity. When possible, base classes used for testing should be avoided or limited. Having to navigate from the test class to its parent, parent of the parent, and so on in order to understand the logic behind tests introduces often unnecessary confusion. Clarity in tests should be more important than avoiding code duplication. Tools TDD, coding and testing in general, are heavily dependent on other tools and processes. Some of the most important ones are as follows. Each of them is too big a topic to be explored in this article, so they will be described only briefly. Code coverage and Continuous integration (CI) Benefits: It gives assurance that everything is tested Code coverage practice and tools are very valuable in determining that all code, branches, and complexity is tested. Some of the tools are JaCoCo (http://www.eclemma.org/jacoco/), Clover (https://www.atlassian.com/software/clover/overview), and Cobertura (http://cobertura.github.io/cobertura/). Continuous Integration (CI) tools are a must for all except the most trivial projects. 
Some of the most used tools are Jenkins (http://jenkins-ci.org/), Hudson (http://hudson-ci.org/), Travis (https://travis-ci.org/), and Bamboo (https://www.atlassian.com/software/bamboo). Use TDD together with BDD Benefits: Both developer unit tests and functional customer facing tests are covered. While TDD with unit tests is a great practice, in many cases, it does not provide all the testing that projects need. TDD is fast to develop, helps the design process, and gives confidence through fast feedback. On the other hand, BDD is more suitable for integration and functional testing, provides better process for requirement gathering through narratives, and is a better way of communicating with clients through scenarios. Both should be used, and together they provide a full process that involves all stakeholders and team members. TDD (based on unit tests) and BDD should be driving the development process. Our recommendation is to use TDD for high code coverage and fast feedback, and BDD as automated acceptance tests. While TDD is mostly oriented towards white-box, BDD often aims at black-box testing. Both TDD and BDD are trying to focus on quality assurance instead of quality checking. Summary You learned that it is a way to design the code through short and repeatable cycle called red-green-refactor. Failure is an expected state that should not only be embraced, but enforced throughout the TDD process. This cycle is so short that we move from one phase to another with great speed. While code design is the main objective, tests created throughout the TDD process are a valuable asset that should be utilized and severely impact on our view of traditional testing practices. We went through the most common of those practices such as white-box and black-box testing, tried to put them into the TDD perspective, and showed benefits that they can bring to each other. You discovered that mocks are a very important tool that is often a must when writing tests. Finally, we discussed how tests can and should be utilized as executable documentation and how TDD can make debugging much less necessary. Now that we are armed with theoretical knowledge, it is time to set up the development environment and get an overview and comparison of different testing frameworks and tools. Now we will walk you through all the TDD best practices in detail and refresh the knowledge and experience you gained throughout this article. Resources for Article: Further resources on this subject: RESTful Services JAX-RS 2.0[article] Java Refactoring in NetBeans[article] Developing a JavaFX Application for iOS [article]

Meeting SAP Lumira

Packt
02 Sep 2015
12 min read
In this article by Dmitry Anoshin, author of the book SAP Lumira Essentials, Dmitry talks about living in a century of information technology. There are a lot of electronic devices around us which generate lots of data. For example, you can surf the Internet, visit a couple of news portals, order new Nike Air Max shoes from a web store, write a couple of messages to your friend, and chat on Facebook. Your every action produces data. We can multiply that action by the amount of people who have access to the internet or just use a cell phone, and we get really BIG DATA. Of course, you have a question: how big is it? Now, it starts from terabytes or even petabytes. The volume is not the only issue; moreover, we struggle with the variety of data. As a result, it is not enough to analyze only the structured data. We should dive deep in to unstructured data, such as machine data which are generated by various machines. (For more resources related to this topic, see here.) Nowadays, we should have a new core competence—dealing with Big Data—, because these vast data volumes won't be just stored, they need to be analysed and mined for information that management can use in order to make right business decisions. This helps to make the business more competitive and efficient. Unfortunately, in modern organizations there are still many manual steps needed in order to get data and try to answer your business questions. You need the help of your IT guys, or need to wait until new data is available in your enterprise data warehouse. In addition, you are often working with an inflexible BI tool, which can only refresh a report or export it in to Excel. You definitely need a new approach, which gives you a competitive advantage, dramatically reduces errors, and accelerates business decisions. So, we can highlight some of the key points for this kind of analytics: Integrating data from heterogeneous systems Giving more access to data Using sophisticated analytics Reducing manual coding Simplifying processes Reducing time to prepare data Focusing on self-service Leveraging powerful computing resources We could continue this list with many other bullet points. If you are a fan of traditional BI tools, you may think that it is almost impossible. Yes, you are right, it is impossible. That's why we need to change the rules of the game. As the business world changes, you must change as well. Maybe you have guessed what this means, but if not, I can help you. I will focus on a new approach of doing data analytics, which is more flexible and powerful. It is called data discovery. Of course, we need the right way in order to overcome all the challenges of the modern world. That's why we have chosen SAP Lumira—one of the most powerful data discovery tools in the modern market. But before diving deep into this amazing tool, let's consider some of the challenges of data discovery that are in our path, as well as data discovery advantages. Data discovery challenges Let's imagine that you have several terabytes of data. Unfortunately, it is raw unstructured data. In order to get business insight from this data you have to spend a lot of time in order to prepare and clean the data. In addition, you are restricted by the capabilities of your machine. That's why a good data discovery tool usually is combined of software and hardware. As a result, this gives you more power for exploratory data analysis. Let's imagine that this entire Big Data store is in Hadoop or any NoSQL data store. 
You have to be at least a good programmer in order to do analytics on this data. Here we can see another benefit of a good data discovery tool: it puts a powerful tool in the hands of business users, who are less technical and may not even know SQL. Apache Hadoop is an open source software project that enables distributed processing of large data sets across clusters of commodity servers. It is designed to scale up from a single server to thousands of machines, with a very high degree of fault tolerance. Rather than relying on high-end hardware, the resilience of these clusters comes from the software's ability to detect and handle failures at the application layer. A NoSQL data store is a next-generation database that is typically non-relational, distributed, open source, and horizontally scalable. Data discovery versus business intelligence You may be confused about data discovery and business intelligence technologies; they seem very close to each other, and it may even appear that BI tools can do everything that data discovery can. So why do we need a separate data discovery tool, such as SAP Lumira? In order to better understand the difference between the two technologies, consider the following comparison. Key users: enterprise BI targets all users, while data discovery targets advanced analysts. Approach: enterprise BI is vertically oriented (top-down), built on semantic layers and requests to existing repositories, while data discovery is vertically oriented (bottom-up), based on mashups and putting data into the selected repository. Interface: enterprise BI offers reports and dashboards, while data discovery offers visualizations. Usage: enterprise BI is used for reporting, while data discovery is used for analysis. Implementation: enterprise BI is implemented by IT consultants, while data discovery is implemented by business users. Let's consider the pros and cons of data discovery. Pros: rapidly analyze data with a short shelf life; ideal for small teams; best for tactical analysis; great for answering one-off questions quickly. Cons: difficult to handle for enterprise organizations; difficult for junior users; lack of scalability. As a result, it is clear that BI and data discovery each handle their own tasks and complement each other. The role of data discovery Most organizations have a data warehouse. It was built to support daily operations and to help make business decisions. But sometimes organizations need to meet new challenges. For example, a retail company wants to improve its customer experience and decides to work closely with its customer database. Analysts try to segment customers into cohorts and analyse customer behavior. They need to handle all the customer data, which is quite big. In addition, they can use external data in order to learn more about their customers. If they use a corporate BI tool, every interaction, such as adding a new field or filter, can take 10-30 minutes. Another issue is adding a new field to an existing report. Usually, this is impossible without the help of IT staff, due to security or the complexity of the enterprise BI solution. This is unacceptable in a modern business. Analysts want to get answers to their business questions immediately, and they prefer to visualize data because, as you know, humans perceive visualizations far more readily than text. In addition, these analysts may be independent from IT: with their data discovery tool they can connect to any data source in the organization and test even their boldest hypotheses. There are hundreds of examples where BI and the DWH are weak and data discovery is strong. Introducing SAP Lumira Starting from this point, we will focus on learning SAP Lumira. First of all, we need to understand what SAP Lumira is exactly.
SAP Lumira is a family of data discovery tools that give us the opportunity to create amazing visualizations or even tell fantastic stories based on our big or small data. We can connect to most of the popular data sources, such as Relational Database Management Systems (RDBMSs), flat files, Excel spreadsheets, or SAP applications. We are able to create datasets with measures, dimensions, hierarchies, or variables. In addition, Lumira allows us to prepare, edit, and clean our data before it is processed. SAP Lumira offers us a huge arsenal of graphical charts and tables to visualize our data. In addition, we can create data stories or even infographics based on our data by grouping charts, single cells, or tables together on boards to create presentation-style dashboards. Moreover, we can add images or text in order to add details. The following are the three main products in the Lumira family offered by SAP: SAP Lumira Desktop SAP Lumira Server SAP Lumira Cloud Lumira Desktop can be either a personal edition or a standard edition. Both of them give you the opportunity to analyse data on your local machine. You can even share your visualizations or insights via PDF or XLS. Lumira Server also comes in two variations—Edge and Server. As you know, SAP BusinessObjects also has two types of license for the same software, Edge and Enterprise, and they differ only in terms of the number of users and the type of license. The Edge version is smaller; for example, it can cover the needs of a team or even a whole department. Lumira Cloud is Software as a Service (SaaS). It helps to quickly visualize large volumes of data without having to sacrifice performance or security. It is especially designed to speed time to insight. In addition, it saves time and money with flexible licensing options. Data connectors We have now met SAP Lumira for the first time, played with the interface, and adjusted its general settings. In addition, we can find an interesting menu in the middle of the window: there are several steps that help us discover our data and gain business insights. In this article we start from the first step by exploring data in SAP Lumira to create a document and acquire a dataset, which can include part or all of the original values from a data source. This is done through Acquire Data. Let's click on Acquire Data. A new window will come up with four areas: a list of possible data sources (1), where the user can connect to a data source; recently used objects (2), where the user can open previous connections or files; ordinary buttons (3), such as Previous, Next, Create, and Cancel; and a small chat box (4), which we can find on almost every page. SAP Lumira cares about the quality of the product and gives the user the opportunity to take a screenshot and send feedback to SAP. Let's go deeper and consider each connection more closely. Microsoft Excel: Excel data sheets. Flat file: CSV, TXT, LOG, PRN, or TSV. SAP HANA: two possible ways - offline (downloading data) and online (connected to SAP HANA). SAP BusinessObjects universe: UNV or UNX. SQL databases: query data via SQL from relational databases. SAP Business Warehouse: downloaded data from a BEx Query or an InfoProvider. Let's try to connect to some data sources and extract some data from them. Microsoft spreadsheets Let's start with the easiest exercise.
For example, our manager of inventory asked us to analyse flop products, which are not popular, and he sent us two excel spreadsheets, Unicorn_flop_products.xls and Unicorn_flop_price.xls. There are two different worksheets because prices and product attributes are in different systems. Both files have a unique field—SKU. As a result, it is possible to merge them by this field and analyse them as one data set. SKU or stock keeping unit is a distinct item for sale, such as a product or service, and them attributes associated with the item distinguish it from other items. For a product, these attributes include, but are not limited to, manufacturer, product description, material, size, color, packaging, and warranty terms. When a business takes inventory, it counts the quantity of each SKU. Connecting to the SAP BO universe Universe is a core thing in the SAP BusinessObjects BI platform. It is the semantic layer that isolates business users from the technical complexities of the databases where their corporate information is stored. For the ease of the end user, universes are made up of objects and classes that map to data in the database, using everyday terms that describe their business environment. Introducing Unicorn Fashion universe The Unicorn Fashion company uses the SAP BusinessObjects BI platform (BIP) as its primary BI tool. There is another Unicorn Fashion universe, which was built based on the unicorn datamart. It has a similar structure and joins as datamart. The following image shows the Unicorn Fashion universe: It unites two business processes: Sales (orange) and Stock (green) and has the following structure in business layer: Product: This specifies the attributes of an SKU, such as brand, category, ant, and so on Price: This denotes the different pricing of the SKU Sales: This specifies the sales business process Order: This denotes the order number, the shipping information, and orders measures Sales Date: This specifies the attributes of order date, such as month, year, and so on Sales Measures: This denotes various aggregated measures, such as shipped items, revenue waterfall, and so on Stock: This specifies the information about the quantity on stock Stock Date: This denotes the attributes of stock date, such as month, year, and so on Summary A step-by-step guide of learning SAP Lumira essentials starting from overview of SAP Lumira family products. We will demonstrate various data discovery techniques using real world scenarios of online ecommerce retailer. Moreover, we have detail recipes of installations, administration and customization of SAP Lumira. In addition, we will show how to work with data starting from acquiring data from various data sources, then preparing it and visualize through rich functionality of SAP Lumira. Finally, it teaches how to present data via data story or infographic and publish it across your organization or world wide web. Learn data discovery techniques, build amazing visualizations, create fantastic stories and share these visualizations through electronic medium with one of the most powerful tool – SAP Lumira. Moreover, we will focus on extracting data from different sources such as plain text, Microsoft Excel spreadsheets, SAP BusinessObjects BI Platform, SAP HANA and SQL databases. Finally, it will teach how to publish result of your painstaking work on various mediums, such as SAP BI Clients, SAP Lumira Cloud and so on. 
Resources for Article: Further resources on this subject: Creating Mobile Dashboards [article] Report Data Filtering [article] Creating Our First Universe [article]


From Code to the Real World

Packt
02 Sep 2015
22 min read
In this article by Jorge R. Castro, author of Building a Home Security System with Arduino, you will learn to read a technical specification sheet, understand the ports on the Uno and how they work, and identify the necessary elements for a proof of concept. Finally, we will extend the capabilities of our script in Python and rely on NFC technology. We will cover the following points: ProtoBoards and wiring Signals (digital and analog) Ports Datasheets (For more resources related to this topic, see here.) ProtoBoards and wiring There are many ways of working on and testing electrical components; one of the most common is to use ProtoBoards (also known as breadboards). Breadboards eliminate the need for soldering while testing components and prototyping circuit schematics: they let us reuse components, enable rapid development, and allow us to correct mistakes and even improve the circuit design. A breadboard is a rectangular case composed of internally interconnected rows and columns, and it is possible to couple several together. The holes on the face of the board are designed so that you can insert the pins of components to mount them. ProtoBoards can be the intermediate step before making our final PCB design, helping eliminate errors and reduce the associated cost. For more information visit http://en.wikipedia.org/wiki/Breadboard. We now need to know about wrong wiring techniques when using a breadboard. There are rules that must always be respected; if they are not followed, we might short out elements and irreversibly damage our circuit.   A ProtoBoard and an Arduino Uno The ProtoBoard is shown in the preceding image; we can see there are mainly two areas - the vertical columns (lines that begin with the + and - polarity sign) and the central rows (with a coordinate system of letters and numbers). Both + and - usually connect to the electrical supply of our circuit and to the ground. It is advised not to connect components directly in the vertical columns; instead, use a wire to connect them to the central rows of the breadboard. Each central row corresponds to a unique track. If we connect one of the leads of a resistor to the first pin of the first row, then the other lead should be connected to a pin in another row, and not in the same row. There is an imaginary axis that divides the breadboard vertically into symmetrical halves, which implies that they are electrically independent. We will use this feature to mount certain Integrated Circuits (ICs) astride the division.   Protoboard image obtained using Fritzing. Carefully note that in the preceding picture, the top half of the breadboard shows the components mounted incorrectly, while the lower half shows the components mounted the correct way. A chip is set correctly between the two areas; this usage is common for ICs. Fritzing is a tool that helps us create a quick circuit design of our project. Visit http://fritzing.org/download/ for more details on Fritzing. Analog and digital ports Now that we know about the correct usage of a breadboard, we come to another very important concept - ports. Our board will be composed of various inputs and outputs, the number of which varies with the kind of model we have. But what are ports? The Arduino UNO board has ports on its sides. They are connections that allow it to interact with sensors, actuators, and other devices (even with another Arduino board). The board also has ports that support digital and analog signals.
The advantage is that these ports are bidirectional, and the programmers are able to define the behavior of these ports. In the following code, the first part shows that we set the status of the ports that we are going to use. This is the setup function: //CODE void setup(){ // Will run once, and use it to // "prepare" I/0 pins of Arduino pinMode(10, OUTPUT); // put the pin 10 as an output pinMode(11, INPUT); // put the pin 11 as an input } Lets take a look at what the digital and analog ports are on the Arduino UNO. Analog ports Analog signals are signals that continuously vary with time These can be explained as voltages that have continuously varying intermediate values. For example, it may vary from 2V to 3.3V to 3.0V, or 3.333V, which means that the voltages varied progressively with time, and are all different values from each other; the figures between the two values are infinite (theoretical value). This is an interesting property, and that is what we want. For example, if we want to measure the temperature of a room, the temperature measure takes values with decimals, need an analog signal. Otherwise, we will lose data, for example, in decimal values since we are not able to store infinite decimal numbers we perform a mathematical rounding (truncating the number). There is a process called discretization, and it is used to convert analog signals to digital. In reality, in the world of microcontrollers, the difference between two values is not infinite. The Arduino UNO's ports have a range of values between 0 and 1023 to describe an analog input signal. Certain ports, marked as PWM or by a ~, can create output signals that vary between values 0 and 255. Digital ports A digital signal consists of just two values - 0s and 1s. Many electronic devices internally have a range currently established, where voltage values from 3.5 to 5V are considered as a logic 1, and voltages from 0 to 2.5V are considered as a logic 0. To better understand this point, let's see an example of a button that triggers an alarm. The only useful cases for an alarm are when the button is pressed to ring the alarm and when it's not pressed; there are only two states observed. So unlike the case of the temperature sensor, where we can have a range of linear values, here only two values exist. Logically, we use these properties for different purposes. The Arduino IDE has some very illustrative examples. You can load them onto your board to study the concepts of analog and digital signals by navigating to File | Examples | Basic | Blink. Sensors There is a wide range of sensors, and without pretending to be an accurate guide of all possible sensors, we will try to establish some small differences. All physical properties or sets of them has an associated sensor, so we can measure the force (presence of people walking on the floor), movement (physical displacement even in the absence of light), smoke, gas, pictures (yes, the cameras also are sensors) noise (sound recording), antennas (radio waves, WiFi, and NFC), and an incredible list that would need a whole book to explain all its fantastic properties. It is also interesting to note that the sensors can also be specific and concrete; however, they only measure very accurate or nonspecific properties, being able to perceive a set of values but more accurately. If you go to an electronics store or a sales portal buy a humidity sensor (and generally any other electronic item), you will see a sensitive price range. 
In general, expensive sensors indicate that they are more reliable than their cheaper counterparts, the price also indicates the kinds of conditions that it can work in, and also the duration of operation. Expensive sensors can last more than the cheaper variants. When we started looking for a component, we looked to the datasheets of a particular component (the next point will explain what this document is), you will not be surprised to find two very similar models mentioned in the datasheets. It contains operating characteristics and different prices. Some components are professionally deployed (for instance in the technological and military sectors), while others are designed for general consumer electronics. Here, we will focus on those with average quality and foremost an economic price. We proceed with an example that will serve as a liaison with the following paragraph. To do this, we will use a temperature sensor (TMP-36) and an analog port on the Arduino UNO. The following image shows the circuit schematic:   Our design - designed using Fritzing The following is the code for the preceding circuit schematic: //CODE //########################### //Author : Jorge Reyes Castro //Arduino Home Security Book //########################### // A3 <- Input // Sensor TMP36 //########################### //Global Variable int outPut = 0; float outPutToVol = 0.0; float volToTemp = 0.0; void setup(){ Serial.begin(9600); // Start the SerialPort 9600 bauds } void loop(){ outPut = analogRead(A3); // Take the value from the A3 incoming port outPutToVol = 5.0 *(outPut/1024.0); // This is calculated with the values volToTemp = 100.0 *(outPutToVol - 0.5); // obtained from the datasheet Serial.print("_____________________n"); mprint("Output of the sensor ->", outPut); mprint("Conversion to voltage ->", outPutToVol); mprint("Conversion to temperature ->", volToTemp); delay(1000); // Wait 1 Sec. } // Create our print function // smarter than repeat the code and it will be reusable void mprint(char* text, double value){ // receives a pointer to the text and a numeric value of type double Serial.print(text); Serial.print("t"); // tabulation Serial.print(value); Serial.print("n"); // new line } Now, open a serial port from the IDE or just execute the previous Python script. We can see the output from the Arduino. Enter the following command to run the code as python script: $ python SerialPython.py The output will look like this: ____________________ Output of the sensor 145.00 Conversion to voltage 0.71 Conversion to temperature 20.80 _____________________ By using Python script, which is not simple, we managed to extract some data from a sensor after the signal is processed by the Arduino UNO. It is then extracted and read by the serial interface (script in Python) from the Arduino UNO. At this point, we will be able to do what we want with the retrieved data, we can store them in a database, represent them in a function, create a HTML document, and more. As you may have noticed we make a mathematical calculation. However, from where do we get this data? The answer is in the data sheet. Component datasheets Whenever we need to work with or handle an electrical component, we have to study their properties. For this, there are official documents, of variable length, in which the manufacturer carefully describes the characteristics of that component. First of all, we, have to identify that a component can either use the unique ID or model name that it was given when it was manufactured. 
TMP 36GZ We have very carefully studied the component surface and thus extracted the model. Then, using an Internet browser, we can quickly find the datasheet. Go to http://www.google.com and search for: TMP36GZ + datasheet, or visit http://dlnmh9ip6v2uc.cloudfront.net/datasheets/Sensors/Temp/TMP35_36_37.pdf to get the datasheet. Once you have the datasheet, you can see that it has various components which look similar, but with very different presentations (this can be crucial for a project, as an error in the choice of encapsulated (coating/presentation) can lead to a bitter surprise). Just as an example, if we confuse the presentation of our circuit, we might end up buying components used in cellphones, and those can hardly be soldered by a standard user as they smaller than your pinkie fingernail). Therefore, an important property to which you must pay attention is the coating/presentation of the components. In the initial pages of the datasheet you see that it shows you the physical aspects of the component. It then show you the technical characteristics, such as the working current, the output voltage, and other characteristics which we must study carefully in order to know whether they will be consistent with our setup. In this case, we are working with a range of input voltage (V) of between 2.7 V to 5.5 V is perfect. Our Arduino has an output of 5V. Furthermore, the temperature range seems appropriate. We do not wish to measure anything below 0V and 5V. Let's study the optimal behavior of the components in temperature ranges in more detail. The behavior of the components is stable only between the desired ranges. This is a very important aspect because although the measurement can be done at the ends supported by the sensor, the error will be much higher and enormously increases the deviation to small variations in ambient temperature. Therefore, always choose or build a circuit with components having the best performance as shown in the optimal region of the graph of the datasheet. Also, always respect and observe that the input voltages do not exceed the specified limit, otherwise it will be irreversibly damaged. To know more about ADC visit http://en.wikipedia.org/wiki/Analog-to-digital_converter. Once we have the temperature sensor, we have to extract the data you need. Therefore, we make a simple calculation supported in the datasheet. As we can see, we feed 5V to the sensor, and it return values between 0 and 1023. So, we use a simple formula that allows us to convert the values obtained in voltage (analog value): Voltage = (Value from the sensor) x (5/1024) The value 1024 is correct, as we have said that the data can range from 0 (including zero as a value) to 1023. This data can be strange to some, but in the world of computer and electronics, zero is seen as a very important value. Therefore, be careful when making calculations. Once we obtain a value in volts, we proceed to convert this measurement in a data in degrees. For this, by using the formula from the datasheet, we can perform the conversion quickly. We use a variable that can store data with decimal point (double or float), otherwise we will be truncating the result and losing valuable information (this may seem strange, but this bug very common). The formula for conversion is Cº = (Voltage -0.5) x 100.0. We now have all the data we need and there is a better implementation of this sensor, in which we can eliminate noise and disturbance from the data. You can dive deeper into these issues if desired. 
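To tie these two datasheet formulas together, here is a quick sanity check of the arithmetic using the sample reading shown earlier (a raw ADC value of 145). This is only an illustrative plain-Python sketch of the same calculation that the Arduino sketch performs; it is not part of the project code itself.

raw = 145                              # value returned by analogRead() in the earlier output
voltage = 5.0 * (raw / 1024.0)         # Voltage = (value from the sensor) x (5 / 1024)
temperature = 100.0 * (voltage - 0.5)  # C = (Voltage - 0.5) x 100.0
print(round(voltage, 2), round(temperature, 2))   # prints: 0.71 20.8

The result matches the serial output we saw before (0.71 V and roughly 20.8 degrees Celsius), which is a useful way to confirm that the conversion in the sketch is wired up correctly.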
With the preceding explanation, it will not be difficult to achieve. Near Field Communication Near Field Communication (NFC) technology is based on the RFID technology. Basically, the principle consists of two devices fused onto one board, one acting as an antenna to transmit signals and the other that acts as a reader/reciever. It exchanges information using electromagnetic fields. They deal with small pieces of information. It is enough for us to make this extraordinary property almost magical. For more information on RFID and NFC, visit the following links: http://en.wikipedia.org/wiki/Near_field_communication http://en.wikipedia.org/wiki/Radio-frequency_identification Today, we find this technology in identification and access cards (a pioneering example is the Document ID used in countries like Spain), tickets for public transport, credit cards, and even mobile phones (with ability to make payment via NFC). For this project, we will use a popular module PN532 Adafruit RFID/NFC board. Feel free to use any module that suits your needs. I selected this for its value, price, and above all, for its chip. The popular PN532 is largely supported by the community, and there are vast number of publications, tools, and libraries that allow us to integrate it in many projects. For instance, you can use it in conjunction with an Arduino a Raspberry Pi, or directly with your computer (via a USB to serial adapter). Also, the PN532 board comes with a free Mifare 1k card, which we can use for our project. You can also buy tag cards or even key rings that house a small circuit inside, on which information is stored, even encrypted information. One of the benefits, besides the low price of a set of labels, is that we can block access to certain areas of information or even all the information. This allows us in maintaining our valuable data safely or avoid the cloned/duplicate of this security tag. There are a number of standards that allow us to classify the different types of existing cards, one of which is the Mifare 1K card (You can use the example presented here and make small changes to adapt it to other NFC cards). For more information on MIFARE cards, visit http://en.wikipedia.org/wiki/MIFARE. As the name suggests, this model can store up to 1KB (Kilobyte) information, and is of the EEPROM type (can be rewritten). Furthermore, it possesses a serial number unique to each card (this is very interesting because it is a perfect fingerprint that will allow us to distinguish between two cards with the same information). Internally, it is divided into 16 sectors (0-15), and is subdivided into 4 blocks (0-3) with at least 16 bytes of information. For each block, we can set a password that prevents reading your content if it does not pose. (At this point, we are not going to manipulate the key because the user can accidentally, encrypt a card, leaving it unusable). By default, all cards usually come with a default password (ffffffffffff). The reader should consult the link http://www.adafruit.com/product/789, to continue with the examples, or if you select another antenna, search the internet for the same features. The link also has tutorials to make the board compatible with other devices (Raspberry Pi), the datasheet, and the library can be downloaded from Github to use this module. We will have a look at this last point more closely later; just download the libraries for now and follow my steps. It is one of the best advantages of open hardware, that the designs can be changed and improved by anyone. 
Remember, when handling electrical components, we have to be careful and avoid electrostatic sources. As you have already seen, the module is inserted directly on the Arduino. If soldered pins do not engage directly above then you have to use Dupont male-female connectors to be accessible other unused ports. Once we have the mounted module, we will use the library that is used to control this device (top link) and install it. It is very important that you rename the library to download, eliminate possible blanks or the like. However, the import process may throw an error. Once we have this ready and have our Mifare 1k card close, let's look at a small example that will help us to better understand all this technology and get the UID (Unique Identifier) of our tag: // ########################### // Author : Jorge Reyes Castro // Arduino Home Security Book // ########################### #include <Wire.h> #include <Adafruit_NFCShield_I2C.h> // This is the Adafruit Library //MACROS #define IRQ (2) #define RESET (3) Adafruit_NFCShield_I2C nfc(IRQ, RESET); //Prepare NFC MODULE //SETUP void setup(void){ Serial.begin(115200); //Open Serial Port Serial.print("###### Serial Port Ready ######n"); nfc.begin(); //Start NFC nfc.SAMConfig(); if(!Serial){ //If the serial port don´ work wait delay(500); } } //LOOP void loop(void) { uint8_t success; uint8_t uid[] = { 0, 0, 0, 0}; //Buffer to save the read value //uint8_t ok[] = { X, X, X, X }; //Put your serial Number. * 1 uint8_t uidLength; success = nfc.readPassiveTargetID(PN532_MIFARE_ISO14443A, uid, &uidLength); // READ // If TAG PRESENT if (success) { Serial.println("Found cardn"); Serial.print(" UID Value: t"); nfc.PrintHex(uid, uidLength); //Convert Bytes to Hex. int n = 0; Serial.print("Your Serial Number: t"); // This function show the real value // 4 position runs extracting data while (n < 4){ // Copy and remenber to use in the next example * 1 Serial.print(uid[n]); Serial.print("t"); n++; } Serial.println(""); } delay(1500); //wait 1'5 sec } Now open a serial port from the IDE or just execute the previous Python script. We can see the output from the Arduino. Enter the following command to run the Python code: $ python SerialPython.py The output will look like this: ###### Serial Port Ready ###### Found card UID Value: 0xED 0x05 0xED 0x9A Your Serial Number: 237 5 237 154 So we have our own identification number, but what are those letters that appear above our number? Well, it is hexadecimal, another widely used way to represent numbers and information in the computer. It represents a small set of characters and more numbers in decimal notation than we usually use. (Remember our card with little space. We use hexadecimal to be more useful to save memory). An example is the number 15, we need two characters in tenth, while in decimal just one character is needed f (0xf, this is how the hexadecimal code, preceded by 0x, this is a thing to remember as this would be of great help later on). Open a console and the Python screen. Run the following code to obtain hexadecimal numbers, and will see transformation to decimal (you can replace them with the values previously put in the code): For more information about numbering systems, see http://en.wikipedia.org/wiki/Hexadecimal. $ python >>> 0xED 237 >>> 0x05 5 >>> 0x9A 154 You can see that they are same numbers.   RFID/NFC module and tag Once we are already familiar with the use of this technology, we proceed to increase the difficultly and create our little access control. 
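If you prefer to script this conversion instead of typing each value into the Python prompt, the following short sketch does the same job. It is purely illustrative (the variable names are made up, and the values shown are the example card's UID from the output above); it simply converts the hexadecimal bytes printed by the Arduino into the decimal serial number that the next example will expect.

uid_hex = ["0xED", "0x05", "0xED", "0x9A"]      # UID bytes as printed by the sketch
uid_dec = [int(byte, 16) for byte in uid_hex]   # convert each byte from base 16 to decimal
print(uid_dec)                                  # prints: [237, 5, 237, 154]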
Be sure to record the serial number of your access card (in decimal). We will make this little montage focus on the use of NFC cards, no authentication key. As mentioned earlier, there is a variant of this exercise and I suggest that the reader complete this on their part, thus increasing the robustness of our assembly. In addition, we add a LED and buzzer to help us improve the usability. Access control The objective is simple: we just have to let people in who have a specific card with a specific serial number. If you want to use encrypted keys, you can change the ID of an encrypted message inside the card with a key known only to the creator of the card. When a successful identification occurs, a green flash lead us to report that someone has had authorized access to the system. In addition, the user is notified of the access through a pleasant sound, for example, will invite you to push the door to open it. Otherwise, an alarm is sounded repeatedly, alerting us of the offense and activating an alert command center (a red flash that is repeated and consistent, which captures the attention of the watchman.) Feel free to add more elements. The diagram shall be as follows:   Our scheme – Image obtained using Fritzing The following is a much clearer representation of the preceding wiring diagram as it avoids the part of the antenna for easy reading:   Our design - Image obtained using Fritzing Once we clear the way to take our design, you can begin to connect all elements and then create our code for the project. The following is the code for the NFC access: //########################### //Author : Jorge Reyes Castro //Arduino Home Security Book //########################### #include <Wire.h> #include <Adafruit_NFCShield_I2C.h> // this is the Adafruit Library //MACROS #define IRQ (2) #define RESET (3) Adafruit_NFCShield_I2C nfc(IRQ, RESET); //Prepare NFC MODULE void setup(void) { pinMode(5, OUTPUT); // PIEZO pinMode(9, OUTPUT); // LED RED pinMode(10, OUTPUT); // LED GREEN okHAL(); Serial.begin(115200); //Open Serial Port Serial.print("###### Serial Port Ready ######n"); nfc.begin(); //Start NFC nfc.SAMConfig(); if(!Serial){ // If the Serial Port don't work wait delay(500); } } void loop(void) { uint8_t success; uint8_t uid[] = { 0, 0, 0, 0}; //Buffer to storage the ID from the read tag uint8_t ok[] = { 237, 5, 237, 154}; //Put your serial Number. uint8_t uidLength; success = nfc.readPassiveTargetID(PN532_MIFARE_ISO14443A, uid, &uidLength); //READ if (success) { okHAL(); // Sound "OK" Serial.println("Found cardn"); Serial.print(" UID Value: t"); nfc.PrintHex(uid, uidLength); // The Value from the UID in HEX int n = 0; Serial.print("Your Serial Number: t"); // The Value from the UID in DEC while (n < 4){ Serial.print(uid[n]); Serial.print("t"); n++; } Serial.println(""); Serial.println(" Wait..n"); // Verification int m = 0, l = 0; while (m < 5){ if(uid[m] == ok[l]){ // Compare the elements one by one from the obtained UID card that was stored previously by us. } else if(m == 4){ Serial.println("###### Authorized ######n"); // ALL OK authOk(); okHAL(); } else{ Serial.println("###### Unauthorized ######n"); // NOT EQUALS ALARM !!! 
authError(); errorHAL(); m = 6; } m++; l++; } } delay(1500); } //create a function that allows us to quickly create sounds // alarm ( "time to wait" in ms, "tone/power") void alarm(unsigned char wait, unsigned char power ){ analogWrite(5, power); delay(wait); analogWrite(5, 0); } // HAL OK void okHAL(){ alarm(200, 250); alarm(100, 50); alarm(300, 250); } // HAL ERROR void errorHAL(){ int n = 0 ; while(n< 3){ alarm(200, 50); alarm(100, 250); alarm(300, 50); n++; } } // These functions activated led when we called. // (Look at the code of the upper part where they are used) // RED - FAST void authError(){ int n = 0 ; while(n< 20){ digitalWrite(9, HIGH); delay(500); digitalWrite(9, LOW); delay(500); n++; } } // GREEN - SLOW void authOk(){ int n = 0 ; while(n< 5){ digitalWrite(10, HIGH); delay(2000); digitalWrite(10, LOW); delay(500); n++; } } // This code can be reduced to increase efficiency and speed of execution, // but has been created with didactic purposes and therefore increased readability Once you have the copy, compile it, and throw it on your board. Ensure that our creation now has a voice, whenever we start communicating or when someone tries to authenticate. In addition, we have added two simple LEDs that give us much information about our design. Finally, we have our whole system, perhaps it will be time to improve our Python script and try the serial port settings, creating a red window or alarm sound on the computer that is receiving the data. It is the turn of reader to assimilate all of the concepts seen in this day and modify them to delve deeper and deeper into the wonderful world of programming. Summary This has been an intense article, full of new elements. We learned to handle technical documentation fluently, covered the different type of signals and their main differences, found the perfect component for our needs, and finally applied it to a real project, without forgetting the NFC. This has just been introduced, laying the foundation for the reader to be able to modify and study it deeper it in the future. Resources for Article: Further resources on this subject: Getting Started with Arduino[article] Arduino Development[article] The Arduino Mobile Robot [article]


Big Data

Packt
02 Sep 2015
24 min read
 In this article by Henry Garner, author of the book Clojure for Data Science, we'll be working with a relatively modest dataset of only 100,000 records. This isn't big data (at 100 MB, it will fit comfortably in the memory of one machine), but it's large enough to demonstrate the common techniques of large-scale data processing. Using Hadoop (the popular framework for distributed computation) as its case study, this article will focus on how to scale algorithms to very large volumes of data through parallelism. Before we get to Hadoop and distributed data processing though, we'll see how some of the same principles that enable Hadoop to be effective at a very large scale can also be applied to data processing on a single machine, by taking advantage of the parallel capacity available in all modern computers. (For more resources related to this topic, see here.) The reducers library The count operation we implemented previously is a sequential algorithm. Each line is processed one at a time until the sequence is exhausted. But there is nothing about the operation that demands that it must be done in this way. We could split the number of lines into two sequences (ideally of roughly equal length) and reduce over each sequence independently. When we're done, we would just add together the total number of lines from each sequence to get the total number of lines in the file: If each Reduce ran on its own processing unit, then the two count operations would run in parallel. All the other things being equal, the algorithm would run twice as fast. This is one of the aims of the clojure.core.reducers library—to bring the benefit of parallelism to algorithms implemented on a single machine by taking advantage of multiple cores. Parallel folds with reducers The parallel implementation of reduce implemented by the reducers library is called fold. To make use of a fold, we have to supply a combiner function that will take the results of our reduced sequences (the partial row counts) and return the final result. Since our row counts are numbers, the combiner function is simply +. Reducers are a part of Clojure's standard library, they do not need to be added as an external dependency. The adjusted example, using clojure.core.reducers as r, looks like this: (defn ex-5-5 [] (->> (io/reader "data/soi.csv") (line-seq) (r/fold + (fn [i x] (inc i))))) The combiner function, +, has been included as the first argument to fold and our unchanged reduce function is supplied as the second argument. We no longer need to pass the initial value of zero—fold will get the initial value by calling the combiner function with no arguments. Our preceding example works because +, called with no arguments, already returns zero: (defn ex-5-6 [] (+)) ;; 0 To participate in folding then, it's important that the combiner function have two implementations: one with zero arguments that returns the identity value and another with two arguments that combines the arguments. Different folds will, of course, require different combiner functions and identity values. For example, the identity value for multiplication is 1. We can visualize the process of seeding the computation with an identity value, iteratively reducing over the sequence of xs and combining the reductions into an output value as a tree: There may be more than two reductions to combine, of course. The default implementation of fold will split the input collection into chunks of 512 elements. 
Our 166,000-element sequence will therefore generate 325 reductions to be combined. We're going to run out of page real estate quite quickly with a tree representation diagram, so let's visualize the process more schematically instead—as a two-step reduce and combine process. The first step performs a parallel reduce across all the chunks in the collection. The second step performs a serial reduce over the intermediate results to arrive at the final result: The preceding representation shows reduce over several sequences of xs, represented here as circles, into a series of outputs, represented here as squares. The squares are combined serially to produce the final result, represented by a star. Loading large files with iota Calling fold on a lazy sequence requires Clojure to realize the sequence into memory and then chunk the sequence into groups for parallel execution. For situations where the calculation performed on each row is small, the overhead involved in coordination outweighs the benefit of parallelism. We can improve the situation slightly by using a library called iota (https://github.com/thebusby/iota). The iota library loads files directly into the data structures suitable for folding over with reducers that can handle files larger than available memory by making use of memory-mapped files. With iota in the place of our line-seq function, our line count simply becomes: (defn ex-5-7 [] (->> (iota/seq "data/soi.csv") (r/fold + (fn [i x] (inc i))))) So far, we've just been working with the sequences of unformatted lines, but if we're going to do anything more than counting the rows, we'll want to parse them into a more useful data structure. This is another area in which Clojure's reducers can help make our code more efficient. Creating a reducers processing pipeline We already know that the file is comma-separated, so let's first create a function to turn each row into a vector of fields. All fields except the first two contain numeric data, so let's parse them into doubles while we're at it: (defn parse-double [x] (Double/parseDouble x)) (defn parse-line [line] (let [[text-fields double-fields] (->> (str/split line #",") (split-at 2))] (concat text-fields (map parse-double double-fields)))) We're using the reducers version of map to apply our parse-line function to each of the lines from the file in turn: (defn ex-5-8 [] (->> (iota/seq "data/soi.csv") (r/drop 1) (r/map parse-line) (r/take 1) (into []))) ;; [("01" "AL" 0.0 1.0 889920.0 490850.0 ...)] The final into function call converts the reducers' internal representation (a reducible collection) into a Clojure vector. The previous example should return a sequence of 77 fields, representing the first row of the file after the header. We're just dropping the column names at the moment, but it would be great if we could make use of these to return a map representation of each record, associating the column name with the field value. The keys of the map would be the column headings and the values would be the parsed fields. 
The clojure.core function zipmap will create a map out of two sequences—one for the keys and one for the values: (defn parse-columns [line] (->> (str/split line #",") (map keyword))) (defn ex-5-9 [] (let [data (iota/seq "data/soi.csv") column-names (parse-columns (first data))] (->> (r/drop 1 data) (r/map parse-line) (r/map (fn [fields] (zipmap column-names fields))) (r/take 1) (into [])))) This function returns a map representation of each row, a much more user-friendly data structure: [{:N2 1505430.0, :A19300 181519.0, :MARS4 256900.0 ...}] A great thing about Clojure's reducers is that in the preceding computation, calls to r/map, r/drop and r/take are composed into a reduction that will be performed in a single pass over the data. This becomes particularly valuable as the number of operations increases. Let's assume that we'd like to filter out zero ZIP codes. We could extend the reducers pipeline like this: (defn ex-5-10 [] (let [data (iota/seq "data/soi.csv") column-names (parse-columns (first data))] (->> (r/drop 1 data) (r/map parse-line) (r/map (fn [fields] (zipmap column-names fields))) (r/remove (fn [record] (zero? (:zipcode record)))) (r/take 1) (into [])))) The r/remove step is now also being run together with the r/map, r/drop and r/take calls. As the size of the data increases, it becomes increasingly important to avoid making multiple iterations over the data unnecessarily. Using Clojure's reducers ensures that our calculations are compiled into a single pass. Curried reductions with reducers To make the process clearer, we can create a curried version of each of our previous steps. To parse the lines, create a record from the fields and filter zero ZIP codes. The curried version of the function is a reduction waiting for a collection: (def line-formatter (r/map parse-line)) (defn record-formatter [column-names] (r/map (fn [fields] (zipmap column-names fields)))) (def remove-zero-zip (r/remove (fn [record] (zero? (:zipcode record))))) In each case, we're calling one of reducers' functions, but without providing a collection. The response is a curried version of the function that can be applied to the collection at a later time. The curried functions can be composed together into a single parse-file function using comp: (defn load-data [file] (let [data (iota/seq file) col-names (parse-columns (first data)) parse-file (comp remove-zero-zip (record-formatter col-names) line-formatter)] (parse-file (rest data)))) It's only when the parse-file function is called with a sequence that the pipeline is actually executed. Statistical folds with reducers With the data parsed, it's time to perform some descriptive statistics. Let's assume that we'd like to know the mean number of returns (column N1) submitted to the IRS by ZIP code. One way of doing this—the way we've done several times throughout the book—is by adding up the values and dividing it by the count. Our first attempt might look like this: (defn ex-5-11 [] (let [data (load-data "data/soi.csv") xs (into [] (r/map :N1 data))] (/ (reduce + xs) (count xs)))) ;; 853.37 While this works, it's comparatively slow. We iterate over the data once to create xs, a second time to calculate the sum, and a third time to calculate the count. The bigger our dataset gets, the larger the time penalty we'll pay. Ideally, we would be able to calculate the mean value in a single pass over the data, just like our parse-file function previously. It would be even better if we can perform it in parallel too. 
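Before looking at why this is trickier than it sounds, here is a minimal sketch of the idea in plain Python rather than Clojure (purely illustrative; the function names and the toy chunks are made up). By carrying a (sum, count) pair through the reduction, the mean can be computed in a single pass, and the partial results from independently processed chunks can be combined afterwards. The next section explains the property, associativity, that makes this combination step valid.

from functools import reduce

def reduce_chunk(acc, x):
    # Reduce step: accumulate a (sum, count) pair over one chunk
    total, count = acc
    return (total + x, count + 1)

def combine(acc1, acc2):
    # Combine step: merge the partial (sum, count) pairs from two chunks
    return (acc1[0] + acc2[0], acc1[1] + acc2[1])

def chunked_mean(chunks):
    partials = [reduce(reduce_chunk, chunk, (0.0, 0)) for chunk in chunks]
    total, count = reduce(combine, partials, (0.0, 0))
    return total / count

print(chunked_mean([[1, 2, 3], [4, 5], [6]]))   # prints: 3.5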
Associativity Before we proceed, it's useful to take a moment to reflect on why the following code wouldn't do what we want: (defn mean ([] 0) ([x y] (/ (+ x y) 2))) Our mean function is a function of two arities. Without arguments, it returns zero, the identity for the mean computation. With two arguments, it returns their mean: (defn ex-5-12 [] (->> (load-data "data/soi.csv") (r/map :N1) (r/fold mean))) ;; 930.54 The preceding example folds over the N1 data with our mean function and produces a different result from the one we obtained previously. If we could expand out the computation for the first three xs, we might see something like the following code: (mean (mean (mean 0 a) b) c) This is a bad idea, because the mean function is not associative. For an associative function f, the following holds true: f(f(a, b), c) = f(a, f(b, c)). Addition is associative, but multiplication and division are not. So the mean function is not associative either. Contrast the mean function with the following simple addition: (+ 1 (+ 2 3)) This yields an identical result to: (+ (+ 1 2) 3) It doesn't matter how the arguments to + are partitioned. Associativity is an important property of functions used to reduce over a set of data because, by definition, the results of a previous calculation are treated as inputs to the next. The easiest way of converting the mean function into an associative function is to calculate the sum and the count separately. Since the sum and the count are associative, they can be calculated in parallel over the data. The mean function can be calculated simply by dividing one by the other. Multiple regression with gradient descent The normal equation uses matrix algebra to very quickly and efficiently arrive at the least squares estimates. Where all data fits in memory, this is a very convenient and concise equation. Where the data exceeds the memory available to a single machine, however, the calculation becomes unwieldy. The reason for this is matrix inversion. The calculation of the inverse (XᵀX)⁻¹ is not something that can be accomplished in a fold over the data—each cell in the output matrix depends on many others in the input matrix. These complex relationships require that the matrix be processed in a nonsequential way. An alternative approach to solve linear regression problems, and many other related machine learning problems, is a technique called gradient descent. Gradient descent reframes the problem as the solution to an iterative algorithm—one that does not calculate the answer in one very computationally intensive step, but rather converges towards the correct answer over a series of much smaller steps. The gradient descent update rule Gradient descent works by the iterative application of a function that moves the parameters in the direction of their optimum values. To apply this function, we need to know the gradient of the cost function with the current parameters. Calculating the formula for the gradient involves calculus that's beyond the scope of this book. Fortunately, the resulting formula isn't terribly difficult to interpret: ∂J(β)/∂βj = (ŷ - y)xj Here, ∂J(β)/∂βj is the partial derivative, or the gradient, of our cost function J(β) for the parameter at index j. Therefore, we can see that the gradient of the cost function with respect to the parameter at index j is equal to the difference between our prediction and the true value of y multiplied by the value of x at index j. Since we're seeking to descend the gradient, we want to subtract some proportion of the gradient from the current parameter values.
Thus, at each step of gradient descent, we perform the following update:

βj := βj - α(ŷ - y)xj

Here, := is the assignment operator and α is a factor called the learning rate. The learning rate controls how large an adjustment we wish to make to the parameters at each iteration, as a fraction of the gradient. If our prediction ŷ nearly matches the actual value of y, then there would be little need to change the parameters. In contrast, a larger error will result in a larger adjustment to the parameters. This rule is called the Widrow-Hoff learning rule or the Delta rule.

The gradient descent learning rate

As we've seen, gradient descent is an iterative algorithm. The learning rate, usually represented by α, dictates the speed at which gradient descent converges to the final answer. If the learning rate is too small, convergence will happen very slowly. If it is too large, gradient descent will not find values close to the optimum and may even diverge from the correct answer.

With a small learning rate, convergence happens slowly over many iterations of the algorithm. While the algorithm does reach the minimum, it does so over many more steps than is ideal and, therefore, may take considerable time. By contrast, with a learning rate that is too large, the parameter estimates are changed so significantly between iterations that they actually overshoot the optimum values and diverge from the minimum value.

The gradient descent algorithm requires us to iterate repeatedly over our dataset. With the correct value of alpha, each iteration should successively yield better approximations of the ideal parameters. We can choose to terminate the algorithm either when the change between iterations is very small or after a predetermined number of iterations.

Feature scaling

As more features are added to the linear model, it is important to scale the features appropriately. Gradient descent will not perform very well if the features have radically different scales, since it won't be possible to pick a learning rate to suit them all. A simple scaling we can perform is to subtract the mean value from each of the values and divide by the standard deviation. This will tend to produce values with zero mean that generally vary between -3 and 3:

(defn feature-scales [features]
  (->> (prepare-data)
       (t/map #(select-keys % features))
       (t/facet)
       (t/fuse {:mean (m/mean)
                :sd   (m/standard-deviation)})))

The feature-scales function in the preceding code uses t/facet to calculate the mean value and standard deviation of all the input features:

(defn ex-5-24 []
  (let [data     (iota/seq "data/soi.csv")
        features [:A02300 :A00200 :AGI_STUB :NUMDEP :MARS2]]
    (->> (feature-scales features)
         (t/tesser (chunks data)))))

;; {:MARS2 {:sd 533.4496892658647, :mean 317.0412009748016}...}

If you run the preceding example, you'll see the different means and standard deviations returned by the feature-scales function. Since our feature scales and input records are both represented as maps, we can perform the scaling across all the features at once using Clojure's merge-with function:

(defn scale-features [factors]
  (let [f (fn [x {:keys [mean sd]}]
            (/ (- x mean) sd))]
    (fn [x] (merge-with f x factors))))

Likewise, we can perform the all-important reversal with unscale-features:

(defn unscale-features [factors]
  (let [f (fn [x {:keys [mean sd]}]
            (+ (* x sd) mean))]
    (fn [x] (merge-with f x factors))))
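As a quick, purely illustrative check of how the merge-with based scaling behaves, here it is applied to a single hypothetical record, with made-up factor values rather than the ones computed by feature-scales:

((scale-features {:A00200 {:mean 37290.0 :sd 47000.0}})
 {:A00200 84290.0})

;; => {:A00200 1.0}

Each key present in both maps is replaced by its z-score, while keys that appear in only one of the two maps pass through unchanged. Applying unscale-features with the same factors reverses the transformation, returning 84290.0.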
Let's scale our features and take a look at the very first feature. Tesser won't allow us to execute a fold without a reduce, so we'll temporarily revert to using Clojure's reducers:

(defn ex-5-25 []
  (let [data     (iota/seq "data/soi.csv")
        features [:A02300 :A00200 :AGI_STUB :NUMDEP :MARS2]
        factors  (->> (feature-scales features)
                      (t/tesser (chunks data)))]
    (->> (load-data "data/soi.csv")
         (r/map #(select-keys % features))
         (r/map (scale-features factors))
         (into [])
         (first))))

;; {:MARS2 -0.14837567114357617, :NUMDEP 0.30617757526890155,
;;  :AGI_STUB -0.714280814223704, :A00200 -0.5894942801950217,
;;  :A02300 0.031741856083514465}

This simple step will help gradient descent perform optimally on our data.

Feature extraction

Although we've used maps to represent our input data in this article, it's going to be more convenient when running gradient descent to represent our features as a matrix. Let's write a function to transform our input data into a map of xs and y. The y will be a scalar response value and xs will be a matrix of scaled feature values. We're adding a bias term to the returned matrix of features:

(defn feature-matrix [record features]
  (let [xs (map #(% record) features)]
    (i/matrix (cons 1 xs))))

(defn extract-features [fy features]
  (fn [record]
    {:y  (fy record)
     :xs (feature-matrix record features)}))

Our feature-matrix function simply accepts an input of a record and the features to convert into a matrix. We call this from within extract-features, which returns a function that we can call on each input record:

(defn ex-5-26 []
  (let [data     (iota/seq "data/soi.csv")
        features [:A02300 :A00200 :AGI_STUB :NUMDEP :MARS2]
        factors  (->> (feature-scales features)
                      (t/tesser (chunks data)))]
    (->> (load-data "data/soi.csv")
         (r/map (scale-features factors))
         (r/map (extract-features :A02300 features))
         (into [])
         (first))))

;; {:y 433.0, :xs  A 5x1 matrix
;;  -------------
;;  1.00e+00
;; -5.89e-01
;; -7.14e-01
;;  3.06e-01
;; -1.48e-01
;; }

The preceding example shows the data converted into a format suitable for performing gradient descent: a map containing the y response variable and a matrix of values, including the bias term.

Applying a single step of gradient descent

The objective of calculating the cost is to determine the amount by which to adjust each of the coefficients. Once we've calculated the average cost, as we did previously, we need to update the estimate of our coefficients β. Together, these steps represent a single iteration of gradient descent. We can return the updated coefficients in a post-combiner step that makes use of the average cost, the value of alpha, and the previous coefficients.
Let's create a utility function, update-coefficients, which will receive the coefficients and alpha and return a function that will calculate the new coefficients, given a total model cost:

(defn update-coefficients [coefs alpha]
  (fn [cost]
    (->> (i/mult cost alpha)
         (i/minus coefs))))

With the preceding function in place, we have everything we need to package up a batch gradient descent update rule:

(defn gradient-descent-fold [{:keys [fy features factors
                                     coefs alpha]}]
  (let [zeros-matrix (i/matrix 0 (count features) 1)]
    (->> (prepare-data)
         (t/map (scale-features factors))
         (t/map (extract-features fy features))
         (t/map (calculate-error (i/trans coefs)))
         (t/fold (matrix-mean (inc (count features)) 1))
         (t/post-combine (update-coefficients coefs alpha)))))

(defn ex-5-31 []
  (let [features [:A00200 :AGI_STUB :NUMDEP :MARS2]
        fcount   (inc (count features))
        coefs    (vec (replicate fcount 0))
        data     (chunks (iota/seq "data/soi.csv"))
        factors  (->> (feature-scales features)
                      (t/tesser data))
        options  {:fy :A02300 :features features :factors factors
                  :coefs coefs :alpha 0.1}]
    (->> (gradient-descent-fold options)
         (t/tesser data))))

;; A 6x1 matrix
;; -------------
;; -4.20e+02
;; -1.38e+06
;; -5.06e+07
;; -9.53e+02
;; -1.42e+06
;; -4.86e+05

The resulting matrix represents the values of the coefficients after the first iteration of gradient descent.

Running iterative gradient descent

Gradient descent is an iterative algorithm, and we will usually need to run it many times to reach convergence. With a large dataset, this can be very time-consuming. To save time, we've included a random sample of soi.csv in the data directory called soi-sample.csv. The smaller size allows us to run iterative gradient descent in a reasonable timescale. The following code runs gradient descent for 100 iterations, plotting the values of the parameters between each iteration on an xy-plot:

(defn descend [options data]
  (fn [coefs]
    (->> (gradient-descent-fold (assoc options :coefs coefs))
         (t/tesser data))))

(defn ex-5-32 []
  (let [features   [:A00200 :AGI_STUB :NUMDEP :MARS2]
        fcount     (inc (count features))
        coefs      (vec (replicate fcount 0))
        data       (chunks (iota/seq "data/soi-sample.csv"))
        factors    (->> (feature-scales features)
                        (t/tesser data))
        options    {:fy :A02300 :features features :factors factors
                    :coefs coefs :alpha 0.1}
        iterations 100
        xs         (range iterations)
        ys         (->> (iterate (descend options data) coefs)
                        (take iterations))]
    (-> (c/xy-plot xs (map first ys)
                   :x-label "Iterations"
                   :y-label "Coefficient")
        (c/add-lines xs (map second ys))
        (c/add-lines xs (map #(nth % 2) ys))
        (c/add-lines xs (map #(nth % 3) ys))
        (c/add-lines xs (map #(nth % 4) ys))
        (i/view))))

If you run the example, you should see a chart of the coefficient values plotted against the iteration number, showing how the parameters converge to relatively stable values over the course of 100 iterations.

Scaling gradient descent with Hadoop

The length of time each iteration of batch gradient descent takes to run is determined by the size of your data and by how many processors your computer has. Although several chunks of data are processed in parallel, the dataset is large and the processors are finite. We've achieved a speed gain by performing calculations in parallel, but if we double the size of the dataset, the runtime will double as well.

Hadoop is one of several systems that have emerged in the last decade which aim to parallelize work that exceeds the capabilities of a single machine.
Rather than running code across multiple processors, Hadoop takes care of running a calculation across many servers. In fact, Hadoop clusters can, and some do, consist of many thousands of servers.

Hadoop consists of two primary subsystems: the Hadoop Distributed File System (HDFS) and the job processing system, MapReduce. HDFS stores files in chunks. A given file may be composed of many chunks, and chunks are often replicated across many servers. In this way, Hadoop can store quantities of data much too large for any single server and, through replication, ensure that the data is stored reliably in the event of hardware failure too.

As the name implies, the MapReduce programming model is built around the concept of map and reduce steps. Each job is composed of at least one map step and may optionally specify a reduce step. An entire job may consist of several map and reduce steps chained together. In the respect that reduce steps are optional, Hadoop has a slightly more flexible approach to distributed calculation than Tesser.

Gradient descent on Hadoop with Tesser and Parkour

Tesser's Hadoop capabilities are available in the tesser.hadoop namespace, which we're including as h. The primary public API function in the Hadoop namespace is h/fold. The fold function expects to receive at least four arguments, representing the configuration of the Hadoop job, the input file we want to process, a working directory for Hadoop to store its intermediate files, and the fold we want to run, referenced as a Clojure var. Any additional arguments supplied will be passed as arguments to the fold when it is executed.

The reason for using a var to represent our fold is that the function call initiating the fold may happen on a completely different computer than the one that actually executes it. In a distributed setting, the var and arguments must entirely specify the behavior of the function. We can't, in general, rely on other mutable local state (for example, the value of an atom, or the value of variables closing over the function) to provide any additional context.

Parkour distributed sources and sinks

The data that we want our Hadoop job to process may exist on multiple machines too, stored in chunks distributed across HDFS. Tesser makes use of a library called Parkour (https://github.com/damballa/parkour/) to handle accessing potentially distributed data sources.

Although Hadoop is designed to be run and distributed across many servers, it can also run in local mode. Local mode is suitable for testing and enables us to interact with the local filesystem as if it were HDFS. Another namespace we'll be using from Parkour is the parkour.conf namespace. This will allow us to create a default Hadoop configuration and operate it in local mode:

(defn ex-5-33 []
  (->> (text/dseq "data/soi.csv")
       (r/take 2)
       (into [])))

In the preceding example, we use Parkour's text/dseq function to create a representation of the IRS input data. The return value implements Clojure's reducers protocol, so we can use r/take on the result.

Running a feature scale fold with Hadoop

Hadoop needs a location to write its temporary files while working on a task, and will complain if we try to overwrite an existing directory. Since we'll be executing several jobs over the course of the next few examples, let's create a little utility function that returns a new file with a randomly-generated name.
(defn rand-file [path]
  (io/file path (str (long (rand 0x100000000)))))

(defn ex-5-34 []
  (let [conf     (conf/ig)
        input    (text/dseq "data/soi.csv")
        workdir  (rand-file "tmp")
        features [:A00200 :AGI_STUB :NUMDEP :MARS2]]
    (h/fold conf input workdir #'feature-scales features)))

Parkour provides a default Hadoop configuration object with the shorthand (conf/ig). This will return an empty configuration. The default value is enough; we don't need to supply any custom configuration.

All of our Hadoop jobs will write their temporary files to a random directory inside the project's tmp directory. Remember to delete this folder later, if you're concerned about preserving disk space. If you run the preceding example now, you should get an output similar to the following:

;; {:MARS2 317.0412009748016, :NUMDEP 581.8504423822615,
;;  :AGI_STUB 3.499939975269811, :A00200 37290.58880658831}

Although the return value is identical to the values we got previously, we're now making use of Hadoop behind the scenes to process our data. In spite of this, notice that Tesser will return the response from our fold as a single Clojure data structure.

Running gradient descent with Hadoop

Since tesser.hadoop folds return Clojure data structures just like tesser.core folds, defining a gradient descent function that makes use of our scaled features is very simple:

(defn hadoop-gradient-descent [conf input-file workdir]
  (let [features [:A00200 :AGI_STUB :NUMDEP :MARS2]
        fcount   (inc (count features))
        coefs    (vec (replicate fcount 0))
        input    (text/dseq input-file)
        options  {:column-names column-names
                  :features features
                  :coefs coefs
                  :fy :A02300
                  :alpha 1e-3}
        factors  (h/fold conf input (rand-file workdir)
                         #'feature-scales
                         features)
        descend  (fn [coefs]
                   (h/fold conf input (rand-file workdir)
                           #'gradient-descent-fold
                           (merge options {:coefs coefs
                                           :factors factors})))]
    (take 5 (iterate descend coefs))))

The preceding code defines a hadoop-gradient-descent function that iterates a descend function 5 times. Each iteration of descend calculates the improved coefficients based on the gradient-descent-fold function. The final return value is a sequence of coefficients for 5 iterations of gradient descent. We run the job on the full IRS data in the following example:

(defn ex-5-35 []
  (let [workdir  "tmp"
        out-file (rand-file workdir)]
    (hadoop-gradient-descent (conf/ig) "data/soi.csv" workdir)))

After several iterations, you should see an output similar to the following:

;; ([0 0 0 0 0]
;;  (20.9839310796048 46.87214911003046 -7.363493937722712
;;   101.46736841329326 55.67860863427868)
;;  (40.918665605227744 56.55169901254631 -13.771345753228694
;;   162.1908841131747 81.23969785586247)
;;  (59.85666340457121 50.559130068258995 -19.463888245285332
;;   202.32407094149158 92.77424653758085)
;;  (77.8477613139478 38.67088624825574 -24.585818946408523
;;   231.42399118694212 97.75201693843269))

We've seen how we're able to run gradient descent using distributed techniques locally. Now, let's see how we can run this on a cluster of our own.

Summary

In this article, we learned some of the fundamental techniques of distributed data processing and saw how the functions used locally for data processing, map and reduce, are powerful ways of processing even very large quantities of data. We learned how Hadoop can scale unbounded by the capabilities of any single server by running functions on smaller subsets of the data whose outputs are themselves combined to finally produce a result.
Once you understand the tradeoffs, this "divide and conquer" approach toward processing data is a simple and very general way of analyzing data on a large scale. We saw both the power and limitations of simple folds to process data using both Clojure's reducers and Tesser. We've also begun exploring how Parkour exposes more of Hadoop's underlying capabilities.
Walking You Through Classes

Packt
02 Sep 2015
15 min read
In this article by Narayan Prusty, author of Learning ECMAScript 6, you will learn how ES6 introduces classes that provide a much simpler and clearer syntax to creating constructors and dealing with inheritance. JavaScript never had the concept of classes, although it's an object-oriented programming language. Programmers from the other programming language background often found it difficult to understand JavaScript's object-oriented model and inheritance due to lack of classes. In this article, we will learn about the object-oriented JavaScript using the ES6 classes: Creating objects the classical way What are classes in ES6 Creating objects using classes The inheritance in classes The features of classes (For more resources related to this topic, see here.) Understanding the Object-oriented JavaScript Before we proceed with the ES6 classes, let's refresh our knowledge on the JavaScript data types, constructors, and inheritance. While learning classes, we will be comparing the syntax of the constructors and prototype-based inheritance with the syntax of the classes. Therefore, it is important to have a good grip on these topics. Creating objects There are two ways of creating an object in JavaScript, that is, using the object literal, or using a constructor. The object literal is used when we want to create fixed objects, whereas constructor is used when we want to create the objects dynamically on runtime. Let's consider a case where we may need to use the constructors instead of the object literal. Here is a code example: var student = { name: "Eden", printName: function(){ console.log(this.name); } } student.printName(); //Output "Eden" Here, we created a student object using the object literal, that is, the {} notation. This works well when you just want to create a single student object. But the problem arises when you want to create multiple student objects. Obviously, you don't want to write the previous code multiple times to create multiple student objects. This is where constructors come into use. A function acts like a constructor when invoked using the new keyword. A constructor creates and returns an object. The this keyword, inside a function, when invoked as a constructor, points to the new object instance, and once the constructor execution is finished, the new object is automatically returned. Consider this example: function Student(name) { this.name = name; } Student.prototype.printName = function(){ console.log(this.name); } var student1 = new Student("Eden"); var student2 = new Student("John"); student1.printName(); //Output "Eden" student2.printName(); //Output "John" Here, to create multiple student objects, we invoked the constructor multiple times instead of creating multiple student objects using the object literals. To add methods to the instances of the constructor, we didn't use the this keyword, instead we used the prototype property of constructor. We will learn more on why we did it this way, and what the prototype property is, in the next section. Actually, every object must belong to a constructor. Every object has an inherited property named constructor, pointing to the object's constructor. When we create objects using the object literal, the constructor property points to the global Object constructor. 
Consider this example to understand this behavior: var student = {} console.log(student.constructor == Object); //Output "true" Understanding inheritance Each JavaScript object has an internal [[prototype]] property pointing to another object called as its prototype. This prototype object has a prototype of its own, and so on until an object is reached with null as its prototype. null has no prototype, and it acts as a final link in the prototype chain. When trying to access a property of an object, and if the property is not found in the object, then the property is searched in the object's prototype. If still not found, then it's searched in the prototype of the prototype object. It keeps on going until null is encountered in the prototype chain. This is how inheritance works in JavaScript. As a JavaScript object can have only one prototype, JavaScript supports only a single inheritance. While creating objects using the object literal, we can use the special __proto__ property or the Object.setPrototypeOf() method to assign a prototype of an object. JavaScript also provides an Object.create() method, with which we can create a new object with a specified prototype as the __proto__ lacked browser support, and the Object.setPrototypeOf() method seemed a little odd. Here is code example that demonstrates different ways to set the prototype of an object while creating, using the object literal: var object1 = { name: "Eden", __proto__: {age: 24} } var object2 = {name: "Eden"} Object.setPrototypeOf(object2, {age: 24}); var object3 = Object.create({age: 24}, {name: {value: "Eden"}}); console.log(object1.name + " " + object1.age); console.log(object2.name + " " + object2.age); console.log(object3.name + " " + object3.age); The output is as follows: Eden 24 Eden 24 Eden 24 Here, the {age:24} object is referred as base object, superobject, or parent object as its being inherited. And the {name:"Eden"} object is referred as the derived object, subobject, or the child object, as it inherits another object. If you don't assign a prototype to an object while creating it using the object literal, then the prototype points to the Object.prototype property. The prototype of Object.prototype is null therefore, leading to the end of the prototype chain. Here is an example to demonstrate this: var obj = { name: "Eden" } console.log(obj.__proto__ == Object.prototype); //Output "true" While creating objects using a constructor, the prototype of the new objects always points to a property named prototype of the function object. By default, the prototype property is an object with one property named as constructor. The constructor property points to the function itself. Consider this example to understand this model: function Student() { this.name = "Eden"; } var obj = new Student(); console.log(obj.__proto__.constructor == Student); //Output "true" console.log(obj.__proto__ == Student.prototype); //Output "true" To add new methods to the instances of a constructor, we should add them to the prototype property of the constructor, as we did earlier. We shouldn't add methods using the this keyword in a constructor body, because every instance of the constructor will have a copy of the methods, and this isn't very memory efficient. By attaching methods to the prototype property of a constructor, there is only one copy of each function that all the instances share. 
To understand this, consider this example: function Student(name) { this.name = name; } Student.prototype.printName = function(){ console.log(this.name); } var s1 = new Student("Eden"); var s2 = new Student("John"); function School(name) { this.name = name; this.printName = function(){ console.log(this.name); } } var s3 = new School("ABC"); var s4 = new School("XYZ"); console.log(s1.printName == s2.printName); console.log(s3.printName == s4.printName); The output is as follows: true false Here, s1 and s2 share the same printName function that reduces the use of memory, whereas s3 and s4 contain two different functions with the name as printName that makes the program use more memory. This is unnecessary, as both the functions do the same thing. Therefore, we add methods for the instances to the prototype property of the constructor. Implementing the inheritance hierarchy in the constructors is not as straightforward as we did for object literals. Because the child constructor needs to invoke the parent constructor for the parent constructor's initialization logic to take place and we need to add the methods of the prototype property of the parent constructor to the prototype property of the child constructor, so that we can use them with the objects of child constructor. There is no predefined way to do all this. The developers and JavaScript libraries have their own ways of doing this. I will show you the most common way of doing it. Here is an example to demonstrate how to implement the inheritance while creating the objects using the constructors: function School(schoolName) { this.schoolName = schoolName; } School.prototype.printSchoolName = function(){ console.log(this.schoolName); } function Student(studentName, schoolName) { this.studentName = studentName; School.call(this, schoolName); } Student.prototype = new School(); Student.prototype.printStudentName = function(){ console.log(this.studentName); } var s = new Student("Eden", "ABC School"); s.printStudentName(); s.printSchoolName(); The output is as follows: Eden ABC School Here, we invoked the parent constructor using the call method of the function object. To inherit the methods, we created an instance of the parent constructor, and assigned it to the child constructor's prototype property. This is not a foolproof way of implementing inheritance in the constructors, as there are lots of potential problems. For example—in case the parent constructor does something else other than just initializing properties, such as DOM manipulation, then while assigning a new instance of the parent constructor, to the prototype property, of the child constructor, can cause problems. Therefore, the ES6 classes provide a better and easier way to inherit the existing constructors and classes. Using classes We saw that JavaScript's object-oriented model is based on the constructors and prototype-based inheritance. Well, the ES6 classes are just new a syntax for the existing model. Classes do not introduce a new object-oriented model to JavaScript. The ES6 classes aim to provide a much simpler and clearer syntax for dealing with the constructors and inheritance. In fact, classes are functions. Classes are just a new syntax for creating functions that are used as constructors. Creating functions using the classes that aren't used as constructors doesn't make any sense, and offer no benefits. Rather, it makes your code difficult to read, as it becomes confusing. Therefore, use classes only if you want to use it for constructing objects. 
Let's have a look at classes in detail. Defining a class Just as there are two ways of defining functions, function declaration and function expression, there are two ways to define a class: using the class declaration and the class expression. The class declaration For defining a class using the class declaration, you need to use the class keyword, and a name for the class. Here is a code example to demonstrate how to define a class using the class declaration: class Student { constructor(name) { this.name = name; } } var s1 = new Student("Eden"); console.log(s1.name); //Output "Eden" Here, we created a class named Student. Then, we defined a constructor method in it. Finally, we created a new instance of the class—an object, and logged the name property of the object. The body of a class is in the curly brackets, that is, {}. This is where we need to define methods. Methods are defined without the function keyword, and a comma is not used in between the methods. Classes are treated as functions, and internally the class name is treated as the function name, and the body of the constructor method is treated as the body of the function. There can only be one constructor method in a class. Defining more than one constructor will throw the SyntaxError exception. All the code inside a class body is executed in the strict mode, by default. The previous code is the same as this code when written using function: function Student(name) { this.name = name; } var s1 = new Student("Eden"); console.log(s1.name); //Output "Eden" To prove that a class is a function, consider this code: class Student { constructor(name) { this.name = name; } } function School(name) { this.name = name; } console.log(typeof Student); console.log(typeof School == typeof Student); The output is as follows: function true Here, we can see that a class is a function. It's just a new syntax for creating a function. The class expression A class expression has a similar syntax to a class declaration. However, with class expressions, you are able to omit the class name. Class body and behavior remains the same in both the ways. Here is a code example to demonstrate how to define a class using a class expression: var Student = class { constructor(name) { this.name = name; } } var s1 = new Student("Eden"); console.log(s1.name); //Output "Eden" Here, we stored a reference of the class in a variable, and used it to construct the objects. The previous code is the same as this code when written using function: var Student = function(name) { this.name = name; } var s1 = new Student("Eden"); console.log(s1.name); //Output "Eden" The prototype methods All the methods in the body of the class are added to the prototype property of the class. The prototype property is the prototype of the objects created using class. Here is an example that shows how to add methods to the prototype property of a class: class Person { constructor(name, age) { this.name = name; this.age = age; } printProfile() { console.log("Name is: " + this.name + " and Age is: " + this.age); } } var p = new Person("Eden", 12) p.printProfile(); console.log("printProfile" in p.__proto__); console.log("printProfile" in Person.prototype); The output is as follows: Name is: Eden and Age is: 12 true true Here, we can see that the printProfile method was added to the prototype property of the class. 
The previous code is the same as this code when written using function: function Person(name, age) { this.name = name; this.age = age; } Person.prototype.printProfile = function() { console.log("Name is: " + this.name + " and Age is: " + this.age); } var p = new Person("Eden", 12) p.printProfile(); console.log("printProfile" in p.__proto__); console.log("printProfile" in Person.prototype); The output is as follows: Name is: Eden and Age is: 12 true true The get and set methods In ES5, to add accessor properties to the objects, we had to use the Object.defineProperty() method. ES6 introduced the get and set prefixes for methods. These methods can be added to the object literals and classes for defining the get and set attributes of the accessor properties. When get and set methods are used in a class body, they are added to the prototype property of the class. Here is an example to demonstrate how to define the get and set methods in a class: class Person { constructor(name) { this._name_ = name; } get name(){ return this._name_; } set name(name){ this._name_ = name; } } var p = new Person("Eden"); console.log(p.name); p.name = "John"; console.log(p.name); console.log("name" in p.__proto__); console.log("name" in Person.prototype); console.log(Object.getOwnPropertyDescriptor(p.__proto__, "name").set); console.log(Object.getOwnPropertyDescriptor(Person.prototype, "name").get); console.log(Object.getOwnPropertyDescriptor(p, "_name_").value); The output is as follows: Eden John true true function name(name) { this._name_ = name; } function name() { return this._name_; } John Here, we created an accessor property to encapsulate the _name_ property. We also logged some other information to prove that name is an accessor property, which is added to the prototype property of the class. The generator method To treat a concise method of an object literal as the generator method, or to treat a method of a class as the generator method, we can simply prefix it with the * character. The generator method of a class is added to the prototype property of the class. Here is an example to demonstrate how to define a generator method in class: class myClass { * generator_function() { yield 1; yield 2; yield 3; yield 4; yield 5; } } var obj = new myClass(); let generator = obj.generator_function(); console.log(generator.next().value); console.log(generator.next().value); console.log(generator.next().value); console.log(generator.next().value); console.log(generator.next().value); console.log(generator.next().done); console.log("generator_function" in myClass.prototype); The output is as follows: 1 2 3 4 5 true true Implementing inheritance in classes Earlier in this article, we saw how difficult it was to implement inheritance hierarchy in functions. Therefore, ES6 aims to make it easy by introducing the extends clause, and the super keyword for classes. By using the extends clause, a class can inherit static and non-static properties of another constructor (which may or may not be defined using a class). 
The super keyword is used in two ways: It's used in a class constructor method to call the parent constructor When used inside methods of a class, it references the static and non-static methods of the parent constructor Here is an example to demonstrate how to implement the inheritance hierarchy in the constructors using the extends clause, and the super keyword: function A(a) { this.a = a; } A.prototype.printA = function(){ console.log(this.a); } class B extends A { constructor(a, b) { super(a); this.b = b; } printB() { console.log(this.b); } static sayHello() { console.log("Hello"); } } class C extends B { constructor(a, b, c) { super(a, b); this.c = c; } printC() { console.log(this.c); } printAll() { this.printC(); super.printB(); super.printA(); } } var obj = new C(1, 2, 3); obj.printAll(); C.sayHello(); The output is as follows: 3 2 1 Hello Here, A is a function constructor; B is a class that inherits A; C is a class that inherits B; and as B inherits A, therefore C also inherits A. As a class can inherit a function constructor, we can also inherit the prebuilt function constructors, such as String and Array, and also the custom function constructors using the classes instead of other hacky ways that we used to use. The previous example also shows how and where to use the super keyword. Remember that inside the constructor method, you need to use super before using the this keyword. Otherwise, an exception is thrown. If a child class doesn't have a constructor method, then the default behavior will invoke the constructor method of the parent class. Summary In this article, we have learned about the basics of the object-oriented programming using ES5. Then, we jumped into ES6 classes, and learned how it makes easy for us to read and write the object-oriented JavaScript code. We also learned miscellaneous features, such as the accessor methods. Resources for Article: Further resources on this subject: An Introduction to Mastering JavaScript Promises and Its Implementation in Angular.js[article] Finding Peace in REST[article] Scaling influencers [article]
Starting with YARN Basics

Packt
01 Sep 2015
15 min read
In this article by Akhil Arora and Shrey Mehrotra, authors of the book Learning YARN, we will be discussing how Hadoop was developed as a solution to handle big data in a cost effective and easiest way possible. Hadoop consisted of a storage layer, that is, Hadoop Distributed File System (HDFS) and the MapReduce framework for managing resource utilization and job execution on a cluster. With the ability to deliver high performance parallel data analysis and to work with commodity hardware, Hadoop is used for big data analysis and batch processing of historical data through MapReduce programming. (For more resources related to this topic, see here.) With the exponential increase in the usage of social networking sites such as Facebook, Twitter, and LinkedIn and e-commerce sites such as Amazon, there was the need of a framework to support not only MapReduce batch processing, but real-time and interactive data analysis as well. Enterprises should be able to execute other applications over the cluster to ensure that cluster capabilities are utilized to the fullest. The data storage framework of Hadoop was able to counter the growing data size, but resource management became a bottleneck. The resource management framework for Hadoop needed a new design to solve the growing needs of big data. YARN, an acronym for Yet Another Resource Negotiator, has been introduced as a second-generation resource management framework for Hadoop. YARN is added as a subproject of Apache Hadoop. With MapReduce focusing only on batch processing, YARN is designed to provide a generic processing platform for data stored across a cluster and a robust cluster resource management framework. In this article, we will cover the following topics: Introduction to MapReduce v1 Shortcomings of MapReduce v1 An overview of the YARN components The YARN architecture How YARN satisfies big data needs Projects powered by YARN Introduction to MapReduce v1 MapReduce is a software framework used to write applications that simultaneously process vast amounts of data on large clusters of commodity hardware in a reliable, fault-tolerant manner. It is a batch-oriented model where a large amount of data is stored in Hadoop Distributed File System (HDFS), and the computation on data is performed as MapReduce phases. The basic principle for the MapReduce framework is to move computed data rather than move data over the network for computation. The MapReduce tasks are scheduled to run on the same physical nodes on which data resides. This significantly reduces the network traffic and keeps most of the I/O on the local disk or within the same rack. The high-level architecture of the MapReduce framework has three main modules: MapReduce API: This is the end-user API used for programming the MapReduce jobs to be executed on the HDFS data. MapReduce framework: This is the runtime implementation of various phases in a MapReduce job such as the map, sort/shuffle/merge aggregation, and reduce phases. MapReduce system: This is the backend infrastructure required to run the user's MapReduce application, manage cluster resources, schedule thousands of concurrent jobs, and so on. The MapReduce system consists of two components—JobTracker and TaskTracker. JobTracker is the master daemon within Hadoop that is responsible for resource management, job scheduling, and management. 
The responsibilities are as follows: Hadoop clients communicate with the JobTracker to submit or kill jobs and poll for jobs' progress JobTracker validates the client request and if validated, then it allocates the TaskTracker nodes for map-reduce tasks execution JobTracker monitors TaskTracker nodes and their resource utilization, that is, how many tasks are currently running, the count of map-reduce task slots available, decides whether the TaskTracker node needs to be marked as blacklisted node, and so on JobTracker monitors the progress of jobs and if a job/task fails, it automatically reinitializes the job/task on a different TaskTracker node JobTracker also keeps the history of the jobs executed on the cluster TaskTracker is a per node daemon responsible for the execution of map-reduce tasks. A TaskTracker node is configured to accept a number of map-reduce tasks from the JobTracker, that is, the total map-reduce tasks a TaskTracker can execute simultaneously. The responsibilities are as follows: TaskTracker initializes a new JVM process to perform the MapReduce logic. Running a task on a separate JVM ensures that the task failure does not harm the health of the TaskTracker daemon. TaskTracker monitors these JVM processes and updates the task progress to the JobTracker on regular intervals. TaskTracker also sends a heartbeat signal and its current resource utilization metric (available task slots) to the JobTracker every few minutes. Shortcomings of MapReducev1 Though the Hadoop MapReduce framework was widely used, the following are the limitations that were found with the framework: Batch processing only: The resources across the cluster are tightly coupled with map-reduce programming. It does not support integration of other data processing frameworks and forces everything to look like a MapReduce job. The emerging customer requirements demand support for real-time and near real-time processing on the data stored on the distributed file systems. Nonscalability and inefficiency: The MapReduce framework completely depends on the master daemon, that is, the JobTracker. It manages the cluster resources, execution of jobs, and fault tolerance as well. It is observed that the Hadoop cluster performance degrades drastically when the cluster size increases above 4,000 nodes or the count of concurrent tasks crosses 40,000. The centralized handling of jobs control flow resulted in endless scalability concerns for the scheduler. Unavailability and unreliability: The availability and reliability are considered to be critical aspects of a framework such as Hadoop. A single point of failure for the MapReduce framework is the failure of the JobTracker daemon. The JobTracker manages the jobs and resources across the cluster. If it goes down, information related to the running or queued jobs and the job history is lost. The queued and running jobs are killed if the JobTracker fails. The MapReduce v1 framework doesn't have any provision to recover the lost data or jobs. Partitioning of resources: A MapReduce framework divides a job into multiple map and reduce tasks. The nodes with running the TaskTracker daemon are considered as resources. The capability of a resource to execute MapReduce jobs is expressed as the number of map-reduce tasks a resource can execute simultaneously. The framework forced the cluster resources to be partitioned into map and reduce task slots. Such partitioning of the resources resulted in less utilization of the cluster resources. 
If you have a running Hadoop 1.x cluster, you can refer to the JobTracker web interface to view the map and reduce task slots of the active TaskTracker nodes. The link for the active TaskTracker list is as follows: http://JobTrackerHost:50030/machines.jsp?type=active Management of user logs and job resources: The user logs refer to the logs generated by a MapReduce job. Logs for MapReduce jobs. These logs can be used to validate the correctness of a job or to perform log analysis to tune up the job's performance. In MapReduce v1, the user logs are generated and stored on the local file system of the slave nodes. Accessing logs on the slaves is a pain as users might not have the permissions issued. Since logs were stored on the local file system of a slave, in case the disk goes down, the logs will be lost. A MapReduce job might require some extra resources for job execution. In the MapReduce v1 framework, the client copies job resources to the HDFS with the replication of 10. Accessing resources remotely or through HDFS is not efficient. Thus, there's a need for localization of resources and a robust framework to manage job resources. In January 2008, Arun C. Murthy logged a bug in JIRA against the MapReduce architecture, which resulted in a generic resource scheduler and a per job user-defined component that manages the application execution. You can see this at https://issues.apache.org/jira/browse/MAPREDUCE-279 An overview of YARN components YARN divides the responsibilities of JobTracker into separate components, each having a specified task to perform. In Hadoop-1, the JobTracker takes care of resource management, job scheduling, and job monitoring. YARN divides these responsibilities of JobTracker into ResourceManager and ApplicationMaster. Instead of TaskTracker, it uses NodeManager as the worker daemon for execution of map-reduce tasks. The ResourceManager and the NodeManager form the computation framework for YARN, and ApplicationMaster is an application-specific framework for application management.   ResourceManager A ResourceManager is a per cluster service that manages the scheduling of compute resources to applications. It optimizes cluster utilization in terms of memory, CPU cores, fairness, and SLAs. To allow different policy constraints, it has algorithms in terms of pluggable schedulers such as capacity and fair that allows resource allocation in a particular way. ResourceManager has two main components: Scheduler: This is a pure pluggable component that is only responsible for allocating resources to applications submitted to the cluster, applying constraint of capacities and queues. Scheduler does not provide any guarantee for job completion or monitoring, it only allocates the cluster resources governed by the nature of job and resource requirement. ApplicationsManager (AsM): This is a service used to manage application masters across the cluster that is responsible for accepting the application submission, providing the resources for application master to start, monitoring the application progress, and restart, in case of application failure. NodeManager The NodeManager is a per node worker service that is responsible for the execution of containers based on the node capacity. Node capacity is calculated based on the installed memory and the number of CPU cores. The NodeManager service sends a heartbeat signal to the ResourceManager to update its health status. The NodeManager service is similar to the TaskTracker service in MapReduce v1. 
NodeManager also sends the status to ResourceManager, which could be the status of the node on which it is running or the status of tasks executing on it. ApplicationMaster An ApplicationMaster is a per application framework-specific library that manages each instance of an application that runs within YARN. YARN treats ApplicationMaster as a third-party library responsible for negotiating the resources from the ResourceManager scheduler and works with NodeManager to execute the tasks. The ResourceManager allocates containers to the ApplicationMaster and these containers are then used to run the application-specific processes. ApplicationMaster also tracks the status of the application and monitors the progress of the containers. When the execution of a container gets complete, the ApplicationMaster unregisters the containers with the ResourceManager and unregisters itself after the execution of the application is complete. Container A container is a logical bundle of resources in terms of memory, CPU, disk, and so on that is bound to a particular node. In the first version of YARN, a container is equivalent to a block of memory. The ResourceManager scheduler service dynamically allocates resources as containers. A container grants rights to an ApplicationMaster to use a specific amount of resources of a specific host. An ApplicationMaster is considered as the first container of an application and it manages the execution of the application logic on allocated containers. The YARN architecture In the previous topic, we discussed the YARN components. Here we'll discuss the high-level architecture of YARN and look at how the components interact with each other. The ResourceManager service runs on the master node of the cluster. A YARN client submits an application to the ResourceManager. An application can be a single MapReduce job, a directed acyclic graph of jobs, a java application, or any shell script. The client also defines an ApplicationMaster and a command to start the ApplicationMaster on a node. The ApplicationManager service of resource manager will validate and accept the application request from the client. The scheduler service of resource manager will allocate a container for the ApplicationMaster on a node and the NodeManager service on that node will use the command to start the ApplicationMaster service. Each YARN application has a special container called ApplicationMaster. The ApplicationMaster container is the first container of an application. The ApplicationMaster requests resources from the ResourceManager. The RequestRequest will have the location of the node, memory, and CPU cores required. The ResourceManager will allocate the resources as containers on a set of nodes. The ApplicationMaster will connect to the NodeManager services and request NodeManager to start containers. The ApplicationMaster manages the execution of the containers and will notify the ResourceManager once the application execution is over. Application execution and progress monitoring is the responsibility of ApplicationMaster rather than ResourceManager. The NodeManager service runs on each slave of the YARN cluster. It is responsible for running application's containers. The resources specified for a container are taken from the NodeManager resources. Each NodeManager periodically updates ResourceManager for the set of available resources. The ResourceManager scheduler service uses this resource matrix to allocate new containers to ApplicationMaster or to start execution of a new application. 
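Purely as an illustration of the submission flow just described (this listing is not part of the Learning YARN text), the first few steps a client performs can be sketched with the YARN client API. The sketch below uses Clojure's Java interop, to stay consistent with the code used elsewhere in this collection, and assumes the hadoop-yarn-client library is on the classpath; it also leaves out the ContainerLaunchContext that a real client would populate to describe how the ApplicationMaster should be started:

(import '[org.apache.hadoop.yarn.conf YarnConfiguration]
        '[org.apache.hadoop.yarn.client.api YarnClient])

(defn submit-application-sketch []
  (let [conf    (YarnConfiguration.)
        ;; Create and start a client that talks to the ResourceManager.
        client  (doto (YarnClient/createYarnClient)
                  (.init conf)
                  (.start))
        ;; Ask the ResourceManager for a new application and an empty
        ;; submission context to fill in.
        app     (.createApplication client)
        context (.getApplicationSubmissionContext app)]
    (.setApplicationName context "example-app")
    ;; A real client would also attach the ApplicationMaster's
    ;; ContainerLaunchContext (command, resources, environment) here.
    (.submitApplication client context)))

Once submitApplication is called, the flow described above takes over: the scheduler allocates the first container for the ApplicationMaster, and the NodeManager on the selected node launches it.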
How YARN satisfies big data needs We talked about the MapReduce v1 framework and some limitations of the framework. Let's now discuss how YARN solves these issues: Scalability and higher cluster utilization: Scalability is the ability of a software or product to implement well under an expanding workload. In YARN, the responsibility of resource management and job scheduling / monitoring is divided into separate daemons, allowing YARN daemons to scale the cluster without degrading the performance of the cluster. With a flexible and generic resource model in YARN, the scheduler handles an overall resource profile for each type of application. This structure makes the communication and storage of resource requests efficient for the scheduler resulting in higher cluster utilization. High availability for components: Fault tolerance is a core design principle for any multitenancy platform such as YARN. This responsibility is delegated to ResourceManager and ApplicationMaster. The application specific framework, ApplicationMaster, handles the failure of a container. The ResourceManager handles the failure of NodeManager and ApplicationMaster. Flexible resource model: In MapReduce v1, resources are defined as the number of map and reduce task slots available for the execution of a job. Every resource request cannot be mapped as map/reduce slots. In YARN, a resource-request is defined in terms of memory, CPU, locality, and so on. It results in a generic definition for a resource request by an application. The NodeManager node is the worker node and its capability is calculated based on the installed memory and cores of the CPU. Multiple data processing algorithms: The MapReduce framework is bounded to batch processing only. YARN is developed with a need to perform a wide variety of data processing over the data stored over Hadoop HDFS. YARN is a framework for generic resource management and allows users to execute multiple data processing algorithms over the data. Log aggregation and resource localization: As discussed earlier, accessing and managing user logs is difficult in the Hadoop 1.x framework. To manage user logs, YARN introduced a concept of log aggregation. In YARN, once the application is finished, the NodeManager service aggregates the user logs related to an application and these aggregated logs are written out to a single log file in HDFS. To access the logs, users can use either the YARN command-line options, YARN web interface, or can fetch directly from HDFS. A container might require external resources such as jars, files, or scripts on a local file system. These are made available to containers before they are started. An ApplicationMaster defines a list of resources that are required to run the containers. For efficient disk utilization and access security, the NodeManager ensures the availability of specified resources and their deletion after use. Projects powered by YARN Efficient and reliable resource management is a basic need of a distributed application framework. YARN provides a generic resource management framework to support data analysis through multiple data processing algorithms. There are a lot of projects that have started using YARN for resource management. We've listed a few of these projects here and discussed how YARN integration solves their business requirements: Apache Giraph: Giraph is a framework for offline batch processing of semistructured graph data stored using Hadoop. 
With the Hadoop 1.x version, Giraph had no control over the scheduling policies, heap memory of the mappers, and locality awareness for the running job. Also, defining a Giraph job on the basis of mappers / reducers slots was a bottleneck. YARN's flexible resource allocation model, locality awareness principle, and application master framework ease the Giraph's job management and resource allocation to tasks. Apache Spark: Spark enables iterative data processing and machine learning algorithms to perform analysis over data available through HDFS, HBase, or other storage systems. Spark uses YARN's resource management capabilities and framework to submit the DAG of a job. The spark user can focus more on data analytics' use cases rather than how spark is integrated with Hadoop or how jobs are executed. Some other projects powered by YARN are as follows: MapReduce: https://issues.apache.org/jira/browse/MAPREDUCE-279 Giraph: https://issues.apache.org/jira/browse/GIRAPH-13 Spark: http://spark.apache.org/ OpenMPI: https://issues.apache.org/jira/browse/MAPREDUCE-2911 HAMA: https://issues.apache.org/jira/browse/HAMA-431 HBase: https://issues.apache.org/jira/browse/HBASE-4329 Storm: http://hortonworks.com/labs/storm/ A page on Hadoop wiki lists a number of projects/applications that are migrating to or using YARN as their resource management tool. You can see this at http://wiki.apache.org/hadoop/PoweredByYarn. Summary This article covered an introduction to YARN, its components, architecture, and different projects powered by YARN. It also explained how YARN solves big data needs. Resources for Article: Further resources on this subject: YARN and Hadoop[article] Introduction to Hadoop[article] Hive in Hadoop [article]
Connecting to Open Ports

Packt
31 Aug 2015
6 min read
 Miroslav Vitula, the author of the book Learning zANTI2 for Android Pentesting, penned this article on Connecting to Open Ports, focusing on cracking passwords and setting up a remote desktop connection. Let's delve into the topics. (For more resources related to this topic, see here.) Cracking passwords THC Hydra is one of the best-known login crackers, supports numerous protocols, is flexible, and very fast. Hydra supports more than 30 protocols, including HTTP GET, HTTP HEAD, Oracle, pcAnywhere, rlogin, Telnet, SSH (v1 and v2 as well), and many, many more. As you might guess, THC Hydra is also implemented in zANTI2 and it eventually becomes an integral part of the app for its high functionality and usability. The zANTI2 developers named this section Password Complexity Audit and it is located under Attack Actions after a target is selected: After selecting this option, you've probably noticed there are several types of attack. First, there are multiple dictionaries: Small, Optimized, Big, and a Huge dictionary that contains the highest amount of usernames and passwords. To clarify, a dictionary attack is a method of breaking into a password-protected computer, service, or server by entering every word in a dictionary file as a username/password. Unlike a brute force attack, where any possible combinations are tried, a dictionary attack uses only those possibilities that are deemed most likely to succeed. Files used for dictionary attacks (also called wordlists) can be found anywhere on the Internet, starting from basic ones to huge ones containing more than 900,000,000 words for WPA2 WiFi cracking. zANTI2 also lets you use a custom wordlist for the attack: Apart from dictionary attacks, there is an Incremental option, which is used for brute force attacks. This attempts to guess the right combination using a custom range of letters/numbers: To set up the method properly, ensure the cracking options are correctly set. The area of searched combinations is defined by min-max charset, where min stands for minimum length of the password, max for maximum length, and charset for character set, which in our case will be defined as lowercase letters. The Automatic Mode, as the description says, automatically matches the list of protocols with the open ports on the target. To select a custom protocol manually, simply disable the Automatic Mode and select the protocol you want to perform the attack on: In our case that would be the SSH protocol for cracking a password used to establish the connection on port 22. Since incremental is a brute force method, this might take an extremely long time to find the right combination. For instance, the password zANTI2-hacks would take about 350 thousand years for a desktop PC to crack; there are 77 character combinations and 43 sextillion possible combinations. Therefore, it is generally better to use dictionary attacks for cracking passwords that might be longer than just a few characters. However, if you have a few thousand years to spare, feel free to use the brute force method. If everything went fine, you should now be able to view the access password with the username. You can easily connect to the target by tapping the finished result using one of the installed SSH clients: When connected, it's all yours. All Linux commands can be executed using the app and you now have the power to list directories, change the password, and more. Although connecting to port 22 might sound spicy, there is more to be discovered. 
A remote desktop connection

Microsoft has made a handy feature called remote desktop. As the title suggests, it lets an ordinary user access his home computer when he is away, or it can be used for managing a server over a network. For us, it is also a sign that we can exploit an open port to set up a remote desktop connection between our mobile phone and a target.

There is, however, one requirement. Since the RDP (Remote Desktop Protocol) port 3389 isn't open by default, a user has to allow connections from other computers. This option can be set in the Windows control panel, and only then is port 3389 accessible. If the option Allow remote connections to this computer is ticked on the victim's machine, we're good to go. This leaves port 3389 open and listening for incoming connections, including the ones from malicious attackers. If we run a quick port discovery on the target, the remote desktop port 3389 will pop up. This is a good sign for us, indicating that the port is open and listening:

Tap the port (ms-wbt-server). You will be asked for login credentials once again. Tap GO. Now, if you haven't got any remote desktop clients installed, zANTI2 will redirect you to Google Play to download one: the Parallels 2X RDP client. This application, as you can tell, is capable of establishing remote desktop access from your Android device. It is stable, fast, and works very well. After downloading the application, go back to zANTI2 and connect to the port once again. You will now be redirected directly to the app and a connection will be established immediately. As you can see in the following screenshot, here's my computer; I'm currently working on the article!

Apart from a simplified Windows user interface (a basic XP look with no transparent bars and such), it is basically the same system, and you can take control of the whole machine. The Parallels 2X RDP client offers a comfortable and easy way to move the mouse and use the keyboard. However, unlike port 445, where a victim has no idea about an intruder accessing the files on his computer, connecting to this port will log the current user out of the current session. If the remote desktop is set to allow multiple sessions at once, though, it is possible for the victim to see what the attacker currently controls. The quality seems to be good, although the resolution is only 804 x 496 pixels at 32-bit color depth. Despite these conditions, it is still easy to access folders, view files, or open applications.

As we can see from this practical demonstration, service ports should be accessible only by authorized systems, not by anyone else. It is also a good reminder to secure the login credentials on your machine, to protect yourself not only from people behind your back but mainly from people on the network.

Summary

In this article, we showed how a connection to these ports is established, how to crack password-protected ports, and how to access them afterwards using tools like ConnectBot or a remote desktop client.

Resources for Article: Further resources on this subject: Saying Hello to Unity and Android [article], Speeding up Gradle builds for Android [article], Android and UDOO for Home Automation [article]
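As a small companion to the port-discovery step described above, here is a minimal Python sketch (not part of zANTI2, which performs this from the app) for checking from a laptop whether SSH or RDP is reachable on a machine you are authorized to test. The target address is hypothetical.

# Check whether common remote-access ports answer on a lab machine.
import socket

def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:          # refused, unreachable, or timed out
        return False

target = "192.168.1.10"      # hypothetical machine on your own test network
for port in (22, 3389):      # SSH and RDP
    state = "open" if is_port_open(target, port) else "closed/filtered"
    print(f"{target}:{port} is {state}")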
How It All Fits Together

Packt
31 Aug 2015
4 min read
In this article by Jonathan Hayward, author of the book Reactive Programming with JavaScript, he explains that Google Maps was a big hit when it came out, and it remains quite important, yet the new functionality it introduced was pretty much nothing. The contribution Google made with its maps site was taking things previously available only behind a steep learning cliff and giving them its easy trademark simplicity. And that was quite a lot. (For more resources related to this topic, see here.)

Similar things might be said about ReactJS. No one at Facebook invented functional reactive programming. No one at Facebook appears to have significantly expanded functional reactive programming. But ReactJS markedly lowered the bar to entry. Previously, with respect to functional reactive programming, there were repeated remarks among seasoned C++ programmers along the lines of, "I guess I'm just stupid, or at least, I don't have a PhD in computational mathematics." And it might be suggested that proficiency in C++ is no mean feat; getting something to work in Python is less of a feat than getting the same thing to work in C++, just as scaling the local park's winter sledding hill is less of an achievement than scaling Mount Everest.

ReactJS introduces enough changes that competent C++ programmers who do not have any kind of degree in math, computational or otherwise, stand a fair chance of using ReactJS and being productive in it. Perhaps they may be less effective than pure JavaScript programmers who are particularly interested in functional programming. But learning to program C++ effectively is a real achievement, and most good C++ programmers have a fair chance of usefully implementing functional reactive programming with ReactJS. The same cannot be said for following the computer math papers on Wikipedia and implementing something in the academic authors' generally preferred language of Haskell. Here we'll explore a very important topic: ReactJS as just a view, but what a view!

ReactJS is just a view, but what a view!

Paul Cézanne famously said, "Monet is just an eye, but what an eye!" Monet didn't try to show off his knowledge of structure and anatomy; he just copied what his eye saw. The consensus judgment of his work holds on to both "just an eye" and "what an eye!" And indeed, the details may be indistinct in Monet, who rebelled against artistry that tried to impress with deep knowledge of anatomy and structure far beyond what jumps out to the eye.

ReactJS is a framework rather than a library, which means that you are supposed to build a solution within the structure provided by ReactJS instead of plugging ReactJS into a solution that you structure yourself. The canonical example of a library is jQuery, where you build a solution your way and call on jQuery as it fits into a structure that you design. However, ReactJS is specialized as a view. That is not necessarily good or bad, but ReactJS is not a complete web development framework, and it does not even have the intention of being the only tool you will ever need. It focuses on being a view and, in Facebook's offering, this does not include any form of AJAX call. This is not a monumental oversight in developing ReactJS; the expectation is that you use ReactJS as a View to provide the user interface functionality, and other tools to meet other needs as appropriate.
This text hasn't covered using ReactJS together with your favorite tools, but do combine your favorite tools with ReactJS if they are not going to step on each other's toes. ReactJS may or may not collide with other Views, but it is meant to work with non-View technologies.

Summary

In this article, we looked at ReactJS as a view and also learned that ReactJS is not a complete web development framework.

Resources for Article: Further resources on this subject: An Introduction to Reactive Programming [article], Kivy – An Introduction to Mastering JavaScript Promises and Its Implementation in Angular.js [article], Object-Oriented JavaScript with Backbone Classes [article]
Building a "Click-to-Go" Robot

Packt
28 Aug 2015
16 min read
In this article by Özen Özkaya and Giray Yıllıkçı, authors of the book Arduino Computer Vision Programming, you will learn how to approach computer vision applications, how to divide an application development process into basic steps, how to realize these design steps, and how to combine a vision system with the Arduino. Now it is time to connect all the pieces into one!

In this article you will learn about building a vision-assisted robot which can go to any point you want within the boundaries of the camera's sight. In this scenario there will be a camera attached to the ceiling and, once you get the video stream and click on any place in the view, the robot will go there. This gives you an all-in-one development exercise. Before getting started, let's try to draw the application scheme and define the potential steps.

We want to build a vision-enabled robot which can be controlled via a camera attached to the ceiling and, when we click on any point in the camera view, we want our robot to go to this specific point. This operation requires a mobile robot that can communicate with the vision system. The vision system should be able to detect or recognize the robot and calculate its position and orientation. The vision system should also give us the opportunity to click on any point in the view, and it should calculate the path and the robot movements needed to get to the destination. This scheme requires a communication line between the robot and the vision controller. In the following illustration, you can see the physical scheme of the application setup on the left-hand side and the user application window on the right-hand side:

After interpreting the application scheme, the next step is to divide the application into small steps by using the computer vision approach. In the data acquisition phase, we'll only use the scene's video stream. There won't be an external sensor on the robot because, for this application, we don't need one. Camera selection is important, and the camera distance (the height from the robot plane) should be enough to see the whole area. We'll use the blue and red circles above the robot to detect the robot and calculate its orientation. We don't need smaller details. A resolution of about 640x480 pixels is sufficient for a camera distance of 120 cm. We need an RGB camera stream because we'll use the color properties of the circles. We will use the Logitech C110, which is an affordable webcam. Any other OpenCV-compatible webcam will work because this application is not very demanding in terms of vision input. If you need more cable length you can use a USB extension cable.

In the preprocessing phase, the first step is to remove the small details from the surface. Blurring is a simple and effective operation for this purpose. If you need to, you can resize your input image to reduce the image size and processing time. Do not forget that, if you resize to too small a resolution, you won't be able to extract useful information. The following picture is of the Logitech C110 webcam:

The next step is processing. There are two main steps in this phase. The first step is to detect the circles in the image. The second step is to calculate the robot orientation and the path to the destination point. The robot can then follow the path and reach its destination. Circle detection starts with color processing, in which we apply color filters to the image to get the image masks of the red circle and the blue circle, as shown in the following picture.
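As a rough illustration of this color-filtering step, the following Python/OpenCV sketch builds the red and blue masks in HSV space and takes the centroid of each. It is not the book's code; the HSV ranges and the file name are assumptions that would need tuning for your camera and lighting.

# Minimal sketch: isolate a colored circle in HSV space and find its centroid.
import cv2
import numpy as np

def circle_center(frame_bgr, lower_hsv, upper_hsv):
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv,
                       np.array(lower_hsv, dtype=np.uint8),
                       np.array(upper_hsv, dtype=np.uint8))
    mask = cv2.medianBlur(mask, 5)          # remove small speckles
    m = cv2.moments(mask)
    if m["m00"] == 0:                       # nothing detected
        return None
    return (int(m["m10"] / m["m00"]), int(m["m01"] / m["m00"]))

frame = cv2.imread("ceiling_view.jpg")      # or a frame from cv2.VideoCapture(0)
blue_center = circle_center(frame, (100, 120, 70), (130, 255, 255))
# Note: red wraps around the hue axis, so a second range near 170-180
# may also be needed in practice.
red_center = circle_center(frame, (0, 120, 70), (10, 255, 255))
print("blue:", blue_center, "red:", red_center)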
Then we can use contour detection or blob analysis to detect the circles and extract useful features. It is important to keep it simple and logical: blob analysis detects the bounding boxes of the two circles on the robot and, if we draw a line between the centers of the circles and calculate the line angle, we get the orientation of the robot itself. The mid-point of this line will be the center of the robot. If we draw a line from the center of the robot to the destination point we obtain the straightest route. The circles on the robot can also be detected by using the Hough transform for circles but, because it is a relatively slow algorithm and it is hard to extract image statistics from the results, the blob analysis-based approach is better. Another approach is to use SURF, SIFT, or ORB features, but these methods probably won't provide fast real-time behavior, so blob analysis will probably work better.

After detecting blobs, we can apply post-filtering to remove the unwanted blobs. We can use the diameter of the circles, the area of the bounding box, and the color information to filter the unwanted blobs. By using the properties of the blobs (extracted features), it is possible to detect or recognize the circles, and then the robot. To be able to check if the robot has reached the destination or not, a distance calculation from the center of the robot to the destination point would be useful. In this scenario, the robot will be detected by our vision controller. Detecting the center of the robot is sufficient to track the robot. Once we calculate the robot's position and orientation, we can combine this information with the distance and orientation to the destination point and send the robot the commands to move it!

Efficient planning algorithms can be applied in this phase, but we'll implement a simple path planning approach. Firstly, the robot will orientate itself towards the destination point by turning right or left, and then it will go forward to reach the destination. This approach will work for scenarios without obstacles. If you want to extend the application to a complex environment with obstacles, you should implement an obstacle detection mechanism and an efficient path planning algorithm. We can send commands such as Left!, Right!, Go!, or Stop! to the robot over a wireless line. RF communication is an efficient solution for this problem. In this scenario, we need two nRF24L01 modules: the first module is connected to the robot controller and the other is connected to the vision controller. The Arduino is the perfect means to control the robot and communicate with the vision controller. The vision controller can be built on any hardware platform such as a PC, tablet, or smartphone. The vision controller application can be implemented on lots of operating systems, as OpenCV is platform-independent. We preferred Windows and a laptop to run our vision controller application. As you can see, we have divided our application into small and easy-to-implement parts. Now it is time to build them all!

Building a robot

It is time to explain how to build our Click-to-Go robot. Before going any further we would like to boldly say that robotic projects can teach us the fundamental fields of science such as mechanics, electronics, and programming. As we go through the building process of our Click-to-Go robot, you will see that we have kept it as simple as possible. Moreover, instead of buying ready-to-use robot kits, we have built our own simple and robust robot. Of course, if you are planning to buy a robot kit or already have a kit available, you can simply adapt your existing robot to this project.

Our robot design is relatively simple in terms of mechanics. We will use only a box-shaped container platform, two gear motors with two individual wheels, a battery to drive the motors, one nRF24L01 Radio Frequency (RF) transceiver module, a bunch of jumper wires, an L293D IC and, of course, one Arduino Uno board. We will use one more nRF24L01 and one more Arduino Uno for the vision controller communication circuit.

Our Click-to-Go robot will be operated by a simplified version of a differential drive. A differential drive can be summarized as a relative speed change on the wheels, which assigns a direction to the robot. In other words, if both wheels spin at the same rate, the robot goes forward. To drive in reverse, the wheels spin in the opposite direction. To turn left, the left wheel turns backwards and the right wheel stays still or turns forwards. Similarly, to turn right, the right wheel turns backwards and the left stays still or turns forwards. You can get curved paths by varying the rotation speeds of the wheels. Yet, to keep this comprehensive project manageable, we will simply drive both motors forward to go forwards. To turn left, the left wheel stays still and the right wheel turns forward. Symmetrically, to turn right, the right motor stays still and the left motor runs forward. We will not run the motors in reverse to go backwards. Instead, we will change the direction of the robot by turning right or left.

Building mechanics

As we stated earlier, the mechanics of the robot are fairly simple. First of all we need a small box-shaped container to use as both a rigid surface and the storage for the battery and electronics. For this purpose, we will use a simple plywood box. We will attach the gear motors to the front of the plywood box and some kind of support surface to the bottom of the box. As can be seen in the following picture, we used a small wooden rod to support the back of the robot and level the box:

If you think that the wooden rod support is dragging, we recommend adding a small ball support similar to Pololu's ball caster, shown at https://www.pololu.com/product/950. It is not a very expensive component and it significantly improves the mobility of the robot. You may want to drill two holes next to the motor wirings to keep the platform tidy. The easiest way to attach the motors and the support rod is by using two-sided tape. Just make sure that the tape is not too thin; it is much better to use two-sided foamy tape.

The topside of the robot can be covered with a black shell to enhance the contrast with the red and blue circles. We will use these circles to ascertain the orientation of the robot during the operation, as mentioned earlier. For now, don't worry too much about this detail. Just be aware that we need to cover the top of the robot with a flat surface. We will explain in detail how these red and blue circles are used. It is worth mentioning that we used large water bottle lids. It is better to use matt surfaces instead of shiny surfaces to avoid glare in the image. The finished Click-to-Go robot should be similar to the robot shown in the following picture. The robot's head is on the side with the red circle:

As we have now covered building the mechanics of our robot, we can move on to building the electronics.

Building the electronics

We will use two separate Arduino Unos for this vision-enabled robot project, one each for the robot and the transmitter system. The electronic setup needs a little bit more attention than the mechanics. The electronic components of the robot and the transmitter units are similar; however, the robot needs more work. We have selected nRF24L01 modules for wireless communication. These modules are reliable and easy to find, both on the Internet and in local hobby stores. It is possible to use any pair of wireless connectivity modules but, for this project, we will stick with nRF24L01 modules, as shown in this picture:

For the driving motors we will need to use a quadruple half-H driver, the L293D. Again, every electronics shop should have these ICs. As a reminder, you may need to buy a couple of spare L293D ICs in case you burn the IC by mistake. Following is the picture of the L293D IC:

We will need a bunch of jumper wires to connect the components together. It is nice to have a small breadboard for the robot/receiver to wire the L293D. The transmitter part is very simple, so a breadboard is not essential.

Robot/receiver and transmitter drawings

The drawings of both the receiver and the transmitter have two common modules: the Arduino Uno and the nRF24L01 connectivity module. The connections of the nRF24L01 modules on both sides are the same. In addition to these connectivity modules, for the receiver, we need to put some effort into connecting the L293D IC and the battery to power up the motors. In the following picture, we can see a drawing of the transmitter. As it will always be connected to the OpenCV platform via the USB cable, there is no need to feed the system with an external battery:

As shown in the following picture of the receiver and the robot, it is a good idea to separate the motor battery from the battery that feeds the Arduino Uno board, because the motors may draw or create high loads, which can easily damage the Arduino board's pinouts. Another reason is to keep the Arduino working even if the motor battery has drained. Separating the feeder batteries is a very good practice to follow if you are planning to use more than one 12V battery. To keep everything safe, we fed the Arduino Uno with a 6V battery pack and the motors with a 9V battery:

Drawings of receiver systems can be a little bit confusing and lead to errors. It is a good idea to open the drawings and investigate how the connections are made by using Fritzing. You can download the Fritzing drawings of this project from https://github.com/ozenozkaya/click_to_go_robot_drawings. To download the Fritzing application, visit the Fritzing download page: http://fritzing.org/download/

Building the robot controller and communications

We are now ready to go through the software implementation of the robot and the transmitter. Basically, what we are doing here is building the connectivity required to send data to the remote robot continuously from OpenCV via a transmitter. OpenCV will send commands through a USB cable to the first Arduino board (the transmitter), which will then relay this data to the unit on the robot over the RF module. Follow these steps: Before explaining the code, we need to import the RF24 library.
To download the RF24 library, please go to the GitHub link at https://github.com/maniacbug/RF24. After downloading the library, go to Sketch | Include Library | Add .ZIP Library… to include the library in the Arduino IDE environment. After clicking Add .ZIP Library…, a window will appear. Go into the downloads directory and select the RF24-master folder that you just downloaded. Now you are ready to use the RF24 library. As a reminder, including a library in the Arduino IDE works in pretty much the same way on every platform. It is time to move on to the explanation of the code!

It is important to mention that we use the same code for both the robot and the transmitter, with a small trick: the same code behaves differently on the robot and on the transmitter. The receiver mode is selected by grounding the analog 4 pin. The idea behind this is simple: we set role_pin high through its internal pull-up resistor, so it reads high even if you don't connect it, but you can still safely connect it to ground and it will read low. Basically, the analog 4 pin reads 0 if there is a connection to a ground pin; if there is no connection to ground, the analog 4 pin stays at 1. By checking this at the beginning, we determine the role of the board and can use the same code on both sides. Here is the code:

#include <SPI.h>
#include "nRF24L01.h"
#include "RF24.h"

// L293D motor driver pins
#define MOTOR_PIN_1 3
#define MOTOR_PIN_2 5
#define MOTOR_PIN_3 6
#define MOTOR_PIN_4 7
#define ENABLE_PIN 4

// nRF24L01 SPI pins
#define SPI_ENABLE_PIN 9
#define SPI_SELECT_PIN 10

// Grounding A4 selects the receiver (robot) role; leaving it
// unconnected selects the transmitter role.
const int role_pin = A4;

typedef enum {transmitter = 1, receiver} e_role;

unsigned long motor_value[2];
String input_string = "";
boolean string_complete = false;

RF24 radio(SPI_ENABLE_PIN, SPI_SELECT_PIN);

// Radio pipe addresses for the two nodes to communicate.
const uint64_t pipes[2] = { 0xF0F0F0F0E1LL, 0xF0F0F0F0D2LL };

e_role role = receiver;

void setup() {
  pinMode(role_pin, INPUT);
  digitalWrite(role_pin, HIGH);   // enable the internal pull-up
  delay(20);

  radio.begin();
  radio.setRetries(15, 15);

  Serial.begin(9600);
  Serial.println(" Setup Finished");

  // Read the role pin to decide whether this board is the
  // transmitter (pin left high) or the receiver (pin grounded).
  if (digitalRead(role_pin)) {
    Serial.println(digitalRead(role_pin));
    role = transmitter;
  } else {
    Serial.println(digitalRead(role_pin));
    role = receiver;
  }

  if (role == transmitter) {
    radio.openWritingPipe(pipes[0]);
    radio.openReadingPipe(1, pipes[1]);
  } else {
    pinMode(MOTOR_PIN_1, OUTPUT);
    pinMode(MOTOR_PIN_2, OUTPUT);
    pinMode(MOTOR_PIN_3, OUTPUT);
    pinMode(MOTOR_PIN_4, OUTPUT);
    pinMode(ENABLE_PIN, OUTPUT);
    digitalWrite(ENABLE_PIN, HIGH);
    radio.openWritingPipe(pipes[1]);
    radio.openReadingPipe(1, pipes[0]);
  }

  radio.startListening();
}

void loop() {
  // TRANSMITTER CODE BLOCK //
  if (role == transmitter) {
    Serial.println("Transmitter");
    if (string_complete) {
      // Map the command received over USB serial to wheel speeds.
      if (input_string == "Right!") {
        motor_value[0] = 0;
        motor_value[1] = 120;
      } else if (input_string == "Left!") {
        motor_value[0] = 120;
        motor_value[1] = 0;
      } else if (input_string == "Go!") {
        motor_value[0] = 120;
        motor_value[1] = 110;
      } else {
        motor_value[0] = 0;
        motor_value[1] = 0;
      }
      input_string = "";
      string_complete = false;
    }
    radio.stopListening();
    radio.write(motor_value, 2 * sizeof(unsigned long));
    radio.startListening();
    delay(20);
  }

  // RECEIVER CODE BLOCK //
  if (role == receiver) {
    Serial.println("Receiver");
    if (radio.available()) {
      bool done = false;
      while (!done) {
        done = radio.read(motor_value, 2 * sizeof(unsigned long));
        delay(20);
      }
      Serial.println(motor_value[0]);
      Serial.println(motor_value[1]);
      // Drive the wheels with the received PWM values.
      analogWrite(MOTOR_PIN_1, motor_value[1]);
      digitalWrite(MOTOR_PIN_2, LOW);
      analogWrite(MOTOR_PIN_3, motor_value[0]);
      digitalWrite(MOTOR_PIN_4, LOW);
      radio.stopListening();
      radio.startListening();
    }
  }
}

void serialEvent() {
  while (Serial.available()) {
    // get the new byte:
    char inChar = (char)Serial.read();
    // add it to the input string:
    input_string += inChar;
    // '!' or '?' terminates a command, so set a flag
    // so the main loop can act on it:
    if (inChar == '!' || inChar == '?') {
      string_complete = true;
      Serial.print("data_received");
    }
  }
}

This example code is taken from one of the examples in the RF24 library; we have changed it to serve the needs of this project. The original example can be found in the RF24-master/Examples/pingpair directory.

Summary

We have combined everything we have learned up to now and built an all-in-one application. By designing and building the Click-to-Go robot from scratch you have embraced the concepts, and you can see that the computer vision approach works very well, even for complex applications. You now know how to divide a computer vision application into small pieces, how to design and implement each design step, and how to efficiently use the tools you have.

Resources for Article: Further resources on this subject: Getting Started with Arduino [article], Arduino Development [article], Programmable DC Motor Controller with an LCD [article]
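For completeness, here is a hedged Python sketch of how the vision-controller side might turn the two detected circle centers into the Left!, Right!, Go!, and Stop! commands that the Arduino sketch above listens for, sent to the transmitter board over USB serial using pyserial. The function names, thresholds, serial port name, and the sign convention for turning are all assumptions, not the book's implementation.

# Hypothetical vision-controller helper: choose a command from the
# detected circle centers and send it to the transmitter Arduino.
import math
import serial   # pyserial

def choose_command(red_center, blue_center, destination,
                   angle_tol_deg=15.0, stop_radius_px=20.0):
    # Robot center is the midpoint of the two circles; the red circle marks the head.
    cx = (red_center[0] + blue_center[0]) / 2.0
    cy = (red_center[1] + blue_center[1]) / 2.0
    heading = math.atan2(red_center[1] - cy, red_center[0] - cx)
    to_goal = math.atan2(destination[1] - cy, destination[0] - cx)
    # Wrap the heading error to [-180, 180] degrees.
    error = math.degrees(math.atan2(math.sin(to_goal - heading),
                                    math.cos(to_goal - heading)))
    distance = math.hypot(destination[0] - cx, destination[1] - cy)

    if distance < stop_radius_px:
        return "Stop!"
    if error > angle_tol_deg:
        return "Right!"   # turn direction sign convention is an assumption
    if error < -angle_tol_deg:
        return "Left!"
    return "Go!"

link = serial.Serial("COM3", 9600, timeout=1)   # port name is hypothetical
command = choose_command((320, 200), (300, 220), destination=(500, 400))
link.write(command.encode())                    # '!' terminator triggers serialEvent()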
How to Write Your First Fabfile

Liz Tom
26 Aug 2015
5 min read
Fabric is a Python library that makes it easy to run scripts over SSH. Fabric currently supports Python 2.5 - 2.7 but not Python 3 yet. Fabric has great documentation, so you can also check out their site.

Why Use Fabric?

Fabric is great to use because it makes executing commands over SSH super easy. I think the Fabric tutorial explains it best: Fabric is a Python (2.5-2.7) library and command-line tool for streamlining the use of SSH for application deployment or systems administration tasks. More specifically, Fabric is:

  • A tool that lets you execute arbitrary Python functions via the command line;
  • A library of subroutines (built on top of a lower-level library) to make executing shell commands over SSH easy and Pythonic.

Naturally, most users combine these two things, using Fabric to write and execute Python functions, or tasks, to automate interactions with remote servers.

What I Use Fabric For

At my job, we use Fabric as an API to interact with our servers. We can deploy apps from any of our servers using a series of fab tasks.

Installing Fabric

The first thing you'll want to do when you start building your first fabfile is to install Fabric:

$ pip install fabric

If you haven't used pip before you can find out more here. Basically, pip is a package manager for Python libraries.

Write Your First Fabfile

Ok! Let's start writing this fabfile. In your project's root directory (you can actually do this anywhere, but I'm assuming you are using the fabfile for a specific project), run:

$ touch fabfile.py

Then in fabfile.py:

def add(a, b):
    print int(a) + int(b)

In your console, run:

$ fab add:1,2

Congratulations! That's your very first fab command. One thing to notice is the way you pass arguments to the fab task. Now, in your console, run:

$ fab --list

You should see an output of the fab tasks you can run. This comes in handy when your fabfile gets larger. This isn't very interesting yet...

Write Your First More Useful Fabfile

One of the very first things I learned to do with the command line was ls. In order to run ls using a fabfile, we just do the following:

from fabric.api import run, env

def sub_list_files():
    run("ls")

Now, if I run:

$ fab -H [host_name] sub_list_files

This is the same as me doing:

$ ssh [host_name]
$ ls
$ exit

Ok, so it's not that exciting yet. But let's say I love adding and removing files and checking to make sure things happened the way I intended.

from fabric.api import run

def sub_list_files():
    run("ls")

def sub_create_file(name):
    run("touch " + name)

def sub_remove_file(name):
    run("rm " + name)

def create_file(name):
    sub_create_file(name)
    sub_list_files()

def delete_file(name):
    sub_remove_file(name)
    sub_list_files()

Instead of running:

$ ssh [host_name]
$ touch my_super_cool_file.py
$ ls
$ exit

I can just do:

$ fab -H [host_name] create_file:my_super_cool_file.py

OR:

$ fab -H [host_name] sub_create_file:my_super_cool_file.py sub_list_files

Fabric with Different Environments

So let's say I have one virtual machine that I need to SSH into often and I don't want to have to keep using the -H flag. I can set the host name in my fabfile:

from fabric.api import env, run

env.hosts = ['nameof.server']

def sub_list_files():
    run("ls")

Now instead of having to set the -H flag I can just use:

$ fab sub_list_files

Now let's say I have multiple environments. I'll need a way to differentiate between which environment I want to work in. For this example, let's say you have two servers, 'staging' and 'production', with something.staging.com and something.production.com associated with them. You'll want to be able to use:

$ fab staging sub_list_files

And:

$ fab production sub_list_files

In order to get this working we just have to add the following code to our file:

from fabric.api import env, run

env.hosts = ['staging.server', 'production.server']

def sub_list_files():
    run("ls")

Now when you run:

$ fab sub_list_files

Fabric loops over all the servers in the env.hosts array and runs ls on each of them. You probably don't want to run commands across all of your servers every time you run fab commands. In order to specify which server you'd like to communicate with, you just need to restructure slightly by replacing:

env.hosts = ['staging.server', 'production.server']

with:

def staging():
    env.hosts = ['staging.server']

def production():
    env.hosts = ['production.server']

Now, you can call:

$ fab staging create_file:my_cool_file.py

Fabric Fun

The documentation for Fabric is pretty good, so I do suggest reading through it to see what the Fabric API has to offer. One thing I found to be fun is the colors module:

from fabric.colors import red

def hello_world():
    print red("hello world!")

This will print a red 'hello world!' to your console. Neat! I encourage you to have fun with it. Try to use Fabric with anything that requires you to SSH.

About the Author

Liz Tom is a Creative Technologist at iStrategyLabs in Washington D.C. Liz's passion for full stack development and digital media makes her a natural fit at ISL. Before joining iStrategyLabs, she worked in the film industry doing everything from mopping blood off of floors to managing budgets. When she's not in the office, you can find Liz attempting parkour and going to check out interactive displays at museums.