Getting Started with Predictive Analytics

Packt
04 Jul 2017
32 min read
In this article by Ralph Winters, the author of the book Practical Predictive Analytics, we will explore how to get started with predictive analytics.

"In God we trust; all others must bring data." – Deming

I enjoy explaining predictive analytics to people because it is based upon a simple concept: predicting the probability of future events based upon historical data. Its history may date back to at least 650 BC. Some early examples include the Babylonians, who tried to predict short-term weather changes based on cloud appearances and haloes. Medicine also has a long history of needing to classify diseases. The Babylonian king Adad-apla-iddina decreed that medical records be collected to form the "Diagnostic Handbook". Some "predictions" in this corpus list treatments based on the number of days the patient had been sick, and their pulse rate: one of the first instances of bioinformatics! In later times, specialized predictive analytics were developed at the onset of the insurance underwriting industry, as a way to predict the risk associated with insuring marine vessels. At about the same time, life insurance companies began predicting the age to which a person would live in order to set the most appropriate premium rates. Although the idea of prediction was always rooted in humans' desire to understand and classify, it was not until the 20th century, and the advent of modern computing, that it really took hold. In addition to aiding the US government in the 1940s with codebreaking, Alan Turing also worked on the initial computer chess algorithms which pitted man vs. machine. Monte Carlo simulation methods originated as part of the Manhattan Project, where mainframe computers crunched numbers for days in order to estimate the behavior of nuclear chain reactions.
In the 1950s, Operations Research theory developed, in which one could optimize the shortest distance between two points. To this day, these techniques are used in logistics by companies such as UPS and Amazon. Non-mathematicians have also gotten into the act. In the 1970s, cardiologist Lee Goldman (who worked aboard a submarine) spent years developing a decision tree which diagnosed chest pain efficiently. This helped the staff determine whether or not the submarine needed to resurface in order to help the chest pain sufferer!

What many of these examples had in common was that history was used to predict the future. Along with prediction came an understanding of cause and effect, and of how the various parts of the problem were interrelated. Discovery and insight came about through methodology and adherence to the scientific method. Most importantly, the solutions came about in order to solve important, and often practical, problems of the times. That is what made them unique.

Predictive analytics adopted by many different industries

We have come a long way since then, and practical analytics solutions have furthered growth in many different industries. The internet has had a profound effect on this; it has enabled every click to be stored and analyzed. More data is being collected and stored, some with very little effort. That in itself has enabled more industries to enter predictive analytics. Marketing has always been concerned with customer acquisition and retention, and has developed predictive models involving various promotional offers and customer touch points, all geared to keeping customers and acquiring new ones. This is very pronounced in certain industries, such as wireless and online shopping carts, in which customers are always searching for the best deal. Specifically, advanced analytics can help answer questions like "If I offer a customer 10% off with free shipping, will that yield more revenue than 15% off with no free shipping?"
The 360-degree view of the customer has expanded the number of ways one can engage with the customer, which has enabled marketing mix and attribution modeling to become increasingly important. Location-based devices have enabled predictive marketing applications to incorporate real-time data and issue recommendations to the customer while in the store.

Predictive analytics in healthcare has its roots in clinical trials, which use carefully selected samples to test the efficacy of drugs and treatments. However, healthcare has been going beyond this. With the advent of sensors, data can be incorporated into predictive analytics to monitor patients with critical illness, and to send alerts to a patient when he is at risk. Healthcare companies can now predict which individual patients will comply with courses of treatment advocated by health providers. This sends early warning signs to all parties, which will prevent future complications, as well as lower the total costs of treatment.

Other examples can be found in just about every other industry. Here are just a few:

Finance: Fraud detection is a huge area. Financial institutions can monitor clients' internal and external transactions for fraud through pattern recognition, and then alert a customer concerning suspicious activity. Another is Wall Street program trading: trading algorithms will predict intraday highs and lows, and will decide when to buy and sell securities.

Sports management: Sports managers are able to predict which sports events will yield the greatest attendance and institute variable ticket pricing based upon audience interest. In baseball, a pitcher's entire game can be recorded and then digitally analyzed. Sensors can also be attached to his arm, to alert when a future injury might occur.

Higher education: Colleges can predict how many, and which kind of, students are likely to attend the next semester, and can plan resources accordingly.
Time-based assessments of online modules can enable professors to identify students' potential problem areas, and tailor individual instruction.

Government: Federal and state governments have embraced the open data concept, and have made more data available to the public. This has empowered "Citizen Data Scientists" to help solve critical social and government problems. The potential of using this data for emergency services, traffic safety, and healthcare is overwhelmingly positive.

Although these industries can be quite different, the goals of predictive analytics are typically the same: to increase revenue, decrease costs, or alter outcomes for the better.

Skills and roles which are important in Predictive Analytics

So what skills do you need to be successful in predictive analytics? I believe that there are 3 basic skills that are needed:

Algorithmic/statistical/programming skills - These are the actual technical skills needed to implement a technical solution to a problem. I bundle these together since they are typically used in tandem. Will it be a purely statistical solution, or will there need to be a bit of programming thrown in to customize an algorithm and clean the data? There are always multiple ways of doing the same task, and it will be up to you, the predictive modeler, to determine how it is to be done.

Business skills - These are the skills needed for communicating thoughts and ideas among all of the interested parties. Business and data analysts who have worked in certain industries for long periods of time, and know their business very well, are increasingly being called upon to participate in predictive analytics projects. Data science is becoming a team sport, and most projects involve working with others in the organization, so summarizing findings and having good presentation and documentation skills are important.
You will often hear the term 'domain knowledge' associated with this, since it is always valuable to know the inner workings of the industry you are working in. If you do not have the time or inclination to learn all about the inner workings of the problem at hand yourself, partner with someone who does.

Data storage / ETL skills - This can refer to specialized knowledge regarding extracting data and storing it in a relational, or non-relational NoSQL, data store. Historically, these tasks were handled exclusively within a data warehouse. But now that the age of Big Data is upon us, specialists have emerged who understand the intricacies of data storage and the best ways to organize it.

Related job skills and terms

Along with the term predictive analytics, here are some terms which are very much related:

Predictive modeling: This specifically means using a mathematical/statistical model to predict the likelihood of a dependent or target variable.

Artificial intelligence: A broader term for how machines are able to rationalize and solve problems. AI's early days were rooted in neural networks.

Machine learning: A subset of artificial intelligence. It specifically deals with how a machine learns automatically from data, usually to try to replicate human decision making or to best it. At this point, everyone knows about Watson, which beat two human opponents in "Jeopardy!"

Data science: Data science encompasses predictive analytics, but also adds algorithmic development via coding, and good presentation skills via visualization.

Data engineering: Data engineering concentrates on data extraction and data preparation processes, which allow raw data to be transformed into a form suitable for analytics. A knowledge of system architecture is important.
The data engineer will typically produce the data to be used by the predictive analysts (or data scientists).

Data analyst/business analyst/domain expert: An umbrella term for someone who is well versed in the way the business at hand works, and who is an invaluable person to learn from in terms of what may have meaning, and what may not.

Statistics: The classical form of inference, typically done via hypothesis testing.

Predictive Analytics Software

Originally, predictive analytics was performed by hand, by statisticians on mainframe computers, using a progression of various languages such as FORTRAN. Some of these languages are still very much in use today. FORTRAN, for example, is still one of the fastest-performing languages around, and operates with very little memory. Nowadays, there are so many choices of software, and many loyalists remain true to their chosen package. The reality is that, for solving a specific type of predictive analytics problem, there exists a certain amount of overlap, and certainly the goal is the same. Once you get the hang of the methodologies used for predictive analytics in one software package, it should be fairly easy to translate your skills to another.

Open Source Software

Open source emphasizes agile development and community sharing. Of course, open source software is free, but "free" must also be balanced in the context of TCO (Total Cost of Ownership).

R

The R language is derived from the "S" language, which was developed in the 1970s. However, R has grown beyond the original core packages to become an extremely viable environment for predictive analytics. Although R was developed by statisticians for statisticians, it has come a long way from its early days. The strength of R comes from its 'package' system, which allows specialized or enhanced functionality to be developed and 'linked' to the core system.
Although the original R system was sufficient for statistics and data mining, an important goal of R was to have its system enhanced via user-written contributed packages. As of this writing, the R system contains more than 8,000 packages. Some are of excellent quality, and some are of dubious quality; the goal, therefore, is to find the truly useful packages that add the most value. Most, if not all, of the R packages in use address the common predictive analytics tasks that you will encounter. If you come across a task that does not fit into any category, chances are good that someone in the R community has done something similar. And of course, there is always a chance that someone is developing a package to do exactly what you want it to do. That person could eventually be you!

Closed Source Software

Closed source software such as SAS and SPSS was at the forefront of predictive analytics, and has continued to this day to extend its reach beyond the traditional realm of statistics and machine learning. Closed source software emphasizes stability, better support, and security, with better memory management, which are important factors for some companies. There is much debate nowadays regarding which one is 'better'. My prediction is that they both will coexist peacefully, with one not replacing the other. Data sharing and common APIs will become more common. Each has its place within whatever data architecture and ecosystem is deemed correct for a company. Each company will emphasize certain factors, and both open and closed software systems are constantly improving themselves.

Other helpful tools

Man does not live by bread alone, so it would behoove you to learn additional tools beyond R, so as to advance your analytic skills.

SQL - SQL is a valuable tool to know, regardless of which language/package/environment you choose to work in.
Virtually every analytics tool will have a SQL interface, and a knowledge of how to optimize SQL queries will definitely speed up your productivity, especially if you are doing a lot of data extraction directly from a SQL database. Today's common thought is to do as much preprocessing as possible within the database, so if you will be doing a lot of extracting from databases such as MySQL, PostgreSQL, Oracle, or Teradata, it will be a good thing to learn how queries are optimized within their native frameworks. In the R language, there are several SQL packages that are useful for interfacing with various external databases. We will be using sqldf, which is a popular R package for running SQL queries against R dataframes. There are other packages which are specifically tailored to the particular database you will be working with.

Web extraction tools - Not every data source will originate from a data warehouse. Knowledge of APIs which extract data from the internet will be valuable. Some popular tools include curl and jsonlite.

Spreadsheets - Despite their problems, spreadsheets are often the fastest way to do quick data analysis and, more importantly, to share your results with others! R offers several interfaces to spreadsheets but, again, learning standalone spreadsheet skills like PivotTables and VBA will give you an advantage if you work for corporations in which these skills are heavily used.

Data visualization tools - Data visualization tools are great for adding impact to an analysis, and for concisely encapsulating complex information. Native R visualization tools are great, but not every company will be using R. Learn some third-party visualization tools such as D3.js, Google Charts, QlikView, or Tableau.

Big data (Spark, Hadoop, NoSQL databases) - It is becoming increasingly important to know a little bit about these technologies, at least from the viewpoint of having to extract and analyze data which resides within these frameworks.
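As a quick taste of the sqldf package mentioned above, the following sketch (my own illustration, not from the book, and assuming the sqldf package has been installed) runs an ordinary SQL query against a built-in R dataframe:

```r
# Install once with install.packages("sqldf") if needed
library(sqldf)

# Load a built-in dataframe and query it with SQL
data(women)
tall <- sqldf("SELECT height, weight FROM women WHERE height >= 65")
head(tall)
```

The same filter could be written in base R as women[women$height >= 65, ], but the SQL form is handy if you already think in queries.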
Many software packages have APIs which talk directly to Hadoop and can run predictive analytics directly within the native environment, or extract data and perform the analytics locally.

After you are past the basics

Given that the predictive analytics space is so huge, once you are past the basics, ask yourself what area of predictive analytics really interests you, and what you would like to specialize in. Learning all you can about everything concerning predictive analytics is good at the beginning, but ultimately you will be called upon because you are an expert in certain industries or techniques. This could be research, algorithmic development, or even managing analytics teams. As general guidance, if you are involved in, or are oriented towards, the data analytics or research portion of data science, I would suggest that you concentrate on data mining methodologies and the specific data modeling techniques which are heavily prevalent in the industries that interest you. For example, logistic regression is heavily used in the insurance industry, but social network analysis is not. Economic research is geared towards time-series analysis, but not so much cluster analysis. If you are involved more on the data engineering side, concentrate on data cleaning, being able to integrate various data sources, and the tools needed to accomplish this. If you are a manager, concentrate on model development, testing and control, metadata, and presenting results to upper management in order to demonstrate value. Of course, predictive analytics is becoming more of a team sport than a solo endeavor, and the data science team is very much alive. A lot has been written about the components of a data science team, much of which can be reduced to the 3 basic skills that I outlined earlier.
Two ways to look at predictive analytics

Depending upon how you intend to approach a particular problem, consider how two different analytical mindsets can affect the predictive analytics process.

Minimize prediction error goal: This is a very common use case within machine learning. The initial goal is to predict, using the appropriate algorithms, in order to minimize the prediction error. If done incorrectly, an algorithm will ultimately fail, and it will need to be continually optimized to come up with the "new" best algorithm. If this is performed mechanically, without regard to understanding the model, it will certainly result in failed outcomes. Certain models, especially over-optimized ones with many variables, can have a very high prediction rate but be unstable in a variety of ways. If one does not have an understanding of the model, it can be difficult to react to changes in the data inputs.

Understanding model goal: This came out of the scientific method and is tied closely to the concept of hypothesis testing. It can be done with certain kinds of models, such as regression and decision trees, and is more difficult with other kinds, such as SVMs and neural networks. In the understanding-model paradigm, understanding causation or impact becomes more important than optimizing correlations. Typically, "understanding" models have a lower prediction rate, but have the advantage of knowing more about the causations of the individual parts of the model, and how they are related. For example, industries which rely on understanding human behavior emphasize model understanding goals. A limitation of this orientation is that we might tend to discard results that are not immediately understood.

Of course, the above examples illustrate two disparate approaches. Combination models, which use the best of both worlds, are the ones we should strive for: a model which has an acceptable prediction error, is stable over time, and is simple enough to understand.
You will learn later that this is related to the bias/variance tradeoff.

R Installation

R installation is typically done by downloading the software directly from the CRAN site:

Navigate to https://cran.r-project.org/
Install the version of R appropriate for your operating system

Alternate ways of exploring R

Although installing R directly from the CRAN site is the way most people will proceed, I wanted to mention some alternative R installation methods. These methods are often good in instances when you are not always at your own computer.

Virtual environments - Here are a few methods for running R in a virtual environment:
VirtualBox or VMware - Virtual environments are good for setting up protected environments and loading preinstalled operating systems and packages. Some advantages are that they are good for isolating testing areas, and for when you do not wish to take up additional space on your own machine.
Docker - Docker resembles a virtual machine, but is a bit more lightweight, since it does not emulate an entire operating system, only the needed processes (see Rocker, the R Docker container).

Cloud based - AWS and Azure are cloud-based environments. The advantages are similar to those of a virtual machine, with the additional capability to run with very large datasets and more memory. They are not free, but both AWS and Azure offer free tiers.

Web based - Interested in running R on the web? These sites are good for trying out quick analyses: R-Fiddle is a good choice, and R-Web, ideone.com, Jupyter, DataJoy, tutorialspoint, and Anaconda Cloud are a few other examples.

Command line - If you spend most of your time in a text editor, try ESS (Emacs Speaks Statistics).

How is a predictive analytics project organized?
After you install R on your own machine, give some thought to how you want to organize your data, code, documentation, and so on. There will probably be many different kinds of projects that you will need to set up, ranging from exploratory analyses to full production-grade implementations. However, most projects will be somewhere in the middle, i.e. projects which ask a specific question or a series of related questions. Whatever its purpose, each project you work on will deserve its own project folder or directory.

Set up your project and subfolders

We will start by creating folders for our environment. Create a directory named "PracticalPredictiveAnalytics" somewhere on your computer. We will be referring to it by this name throughout this book. Projects often start with 3 subfolders, which roughly correspond to 1) data sources, 2) code-generated outputs, and 3) the code itself (in this case R). Create 3 subdirectories under this project: Data, Outputs, and R. The R directory will hold all of our data prep code, algorithms, and so on. The Data directory will contain our raw data sources, and the Outputs directory will contain anything generated by the code. This can be done natively within your own environment, e.g. you can use Windows Explorer to create these folders.

Some important points to remember about constructing projects

It is never a good idea to 'boil the ocean', or try to answer too many questions at once. Remember, predictive analytics is an iterative process. Another trap that people fall into is not making their projects reproducible. Nothing is worse than developing some analytics on a set of data, then backtracking and, oops! Different results. When organizing code, try to write it as building blocks which can be reused. For R, write code liberally as functions. Assume that anything concerning requirements, data, and outputs will change, and be prepared. Also consider the dynamic nature of the R language.
Changes in versions and packages could alter your analysis in various ways, so it is important to keep code and data in sync, by using separate folders for the different levels of code and data, or by using a version management tool such as Subversion, Git, or CVS.

GUIs

R, like many languages and knowledge discovery systems, started from the command line (one reason to learn Linux), which is still used by many. However, predictive analysts tend to prefer graphical user interfaces, and there are many choices available for each of the 3 major operating systems. Each of them has its strengths and weaknesses, and of course there is always a matter of preference. Memory is always a consideration with R, and if that is of critical concern to you, you might want to go with a simpler GUI, like the one built into R. If you want full control, and you want to add some productivity tools, you could choose RStudio, which is a full-blown GUI that allows you to implement version control repositories, and has nice features like code completion. The unique feature of RCmdr and Rattle is that they offer menus which allow guided point-and-click commands for common statistical and data mining tasks. They are also both code generators: this is good for learning, since you can learn by looking at the way code is generated. Both RCmdr and RStudio offer GUIs which are compatible with Windows, Apple, and Linux operating systems, so those are the ones I will use to demonstrate examples in this book. But bear in mind that they are only user interfaces, and not R proper, so it should be easy enough to paste code examples into other GUIs and decide for yourself which ones you like.

Getting started with RStudio

After R installation has completed, download and install the RStudio executable appropriate for your operating system. Click the RStudio icon to bring up the program. The program initially starts with 3 tiled window panes.
Before we begin to do any actual coding, we will want to set up a new project. Create a new project by following these steps:

Identify the menu bar, above the icons at the top left of the screen.
Click "File" and then "New Project".
Select "Create project from Directory".
Select "Empty Project".
Name the directory "PracticalPredictiveAnalytics", then click the Browse button to select your preferred location. This will be the directory that you created earlier.
Click "Create Project" to complete.

The R Console

Now that we have created a project, let's take a look at the R Console window. Click on the window marked "Console" and perform the following steps:

Enter getwd() and press Enter - that should echo back the current working directory.
Enter dir() - that will give you a list of everything in the current working directory.

The getwd() command is very important, since it will always tell you which directory you are in. Sometimes you will need to switch directories within the same project, or even to another project. The command you will use for that is setwd(), supplying the directory that you want to switch to within the parentheses. This is a situation we will come across later; we will not change anything right now. The point is that you should always be aware of what your current working directory is.

The Script Window

The script window is where all of the R code is written. You can have several script windows open at once. Press Ctrl+Shift+N to create a new R script. Alternatively, you can go through the menu system by selecting File/New File/R Script. A new blank script window will appear with the name "Untitled1".

Our First Predictive Model

Now that all of the preliminary things are out of the way, we will code our first, extremely simple, predictive model. Our first R script is a simple two-variable regression model which predicts women's height based upon weight.
The data set we will use is already built into the R package system, and it is not necessary to load it externally. For quick illustration of techniques, I will sometimes use sample data contained within specific R packages. Paste the following code into the "Untitled1" script that was just created:

require(graphics)
data(women)
head(women)
utils::View(women)
plot(women$height,women$weight)

Press Ctrl+Shift+Enter to run the entire code. The display should change to something similar to what is displayed below.

Code Description

What you have actually done is:

Load the women data object. The data() function will load the specified data object into memory. In our case, the data(women) statement loads the women dataframe into memory.

Display the raw data in three different ways:

utils::View(women) - This will visually display the dataframe. Although this is part of the actual R script, viewing a dataframe is a very common task, and is often issued directly as a command via the R Console. As you can see, the women data frame has 15 rows and 2 columns, named height and weight.

plot(women$height,women$weight) - This uses the native R plot function, which plots the values of the two variables against each other. It is usually the first step one takes to begin to understand the relationship between two variables. As you can see, the relationship is very linear.

head(women) - This displays the first N rows of the women data frame in the console. If you want no more than a certain number of rows, add that as a 2nd argument of the function, e.g. head(women,99) will display up to 99 rows in the console. The tail() function works similarly, but displays the last rows of data.

The very first statement in the code, require(), is just a way of saying that R needs a specific package to run. In this case, require(graphics) specifies that the graphics package is needed for the analysis, and it will be loaded into memory.
If it is not available, you will get an error message. However, graphics is a base package and should be available.

To save this script, press Ctrl+S (File Save), navigate to the PracticalPredictiveAnalytics/R folder that was created, and name it Chapter1_DataSource.

Your 2nd script

Create another R script by pressing Ctrl+Shift+N. A new blank script window will appear with the name "Untitled2". Paste the following into the new script window:

lm_output <- lm(women$height ~ women$weight)
summary(lm_output)
prediction <- predict(lm_output)
error <- women$height-prediction
plot(women$height,error)

Press Ctrl+Shift+Enter to run the entire code. The display should change to something similar to what is displayed below.

Code Description

Here are some notes and explanations for the script code that you have just run:

lm() function: The script runs a simple linear regression using the lm() function, which predicts a woman's height based upon the value of her weight. In statistical parlance, you will be 'regressing' height on weight. The line of code which accomplishes this is:

lm_output <- lm(women$height ~ women$weight)

There are two operators that you will become very familiar with when running predictive models in R:

The ~ operator (also called the tilde) is a shorthand way of separating what you want to predict from what you are using to predict. This is expressed in formula syntax. What you are predicting (the dependent or target variable) is usually on the left side of the formula, and the predictors (independent variables, or features) are on the right side. The dependent and independent variables here are height and weight and, to improve readability, I have specified them explicitly by using the data frame name together with the column name, i.e. women$height and women$weight.

The <- operator (also called assignment) says: assign whatever is produced on the right side to the object on the left side.
This will always create or replace the object, which you can then further display or manipulate. In this case, we are creating a new object called lm_output, using the function lm(), which creates a linear model based on the formula contained within the parentheses. Note that the execution of this line does not produce any displayed output. You can see whether the line was executed by checking the console. If there is any problem with running the line (or any line, for that matter), you will see an error message in the console.

summary(lm_output): This statement displays some important summary information about the object lm_output, and writes the output to the R Console:

summary(lm_output)

The results will appear in the Console window. Look at the lines marked (Intercept) and women$weight, which appear under the Coefficients line in the console. The Estimate column shows the formula needed to derive height from weight. Like any linear regression formula, it includes coefficients for each independent variable (in our case, only one variable), as well as an intercept. For our example, the English rule would be "multiply weight by 0.2872 and add 25.7235 to obtain height".

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  25.723456   1.043746   24.64 2.68e-12 ***
women$weight  0.287249   0.007588   37.85 1.09e-14 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.44 on 13 degrees of freedom
Multiple R-squared: 0.991, Adjusted R-squared: 0.9903
F-statistic: 1433 on 1 and 13 DF, p-value: 1.091e-14

We have already assigned the output of the lm() function to the lm_output object. Let's apply another function to lm_output as well. The predict() function "reads" the output of the lm() function and predicts (or scores) the value, based upon the linear regression equation.
In the code we have assigned the output of this function to a new object named "prediction". Switch over to the console area, and type "prediction" to see the predicted values for the 15 women. The following should appear in the console:

> prediction
       1        2        3        4        5        6        7
58.75712 59.33162 60.19336 61.05511 61.91686 62.77861 63.64035
       8        9       10       11       12       13       14
64.50210 65.65110 66.51285 67.66184 68.81084 69.95984 71.39608
      15
72.83233

There are 15 predictions. Just to verify that we have one for each of our original observations, we will use the nrow() function to count the number of rows. At the command prompt in the console area, enter the command:

nrow(women)

The following should appear:

> nrow(women)
[1] 15

The error object is a vector that was computed by taking the difference between the actual height and the predicted height. These are also known as the residual errors, or just residuals. Since the error object is a vector, you cannot use the nrow() function to get its size, but you can use the length() function:

> length(error)
[1] 15

In all of the above cases, the counts all compute as 15, so all is good.

plot(women$height,error): This plots height against the residuals. It shows how much each prediction was 'off' from the original value. You can see that the errors show a non-random pattern. This is not good. In an ideal regression model, you expect to see the prediction errors randomly scattered around the 0 point on the y axis.

Some important points to be made regarding this first example:

The R-squared for this model is artificially high. Regression is often used in an exploratory fashion to explore the relationship between height and weight. This does not mean a causal one. As we all know, weight is caused by many other factors, and it is expected that taller people will be heavier.
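Before moving on, the "English rule" from the summary output can be sanity-checked by hand. The short Python snippet below is an illustrative aside (the chapter itself works entirely in R); it assumes that the first weight in R's built-in women dataset is 115 lb, and reproduces the first predicted value shown above (58.75712):

```python
# Hand-check of the fitted line: height = 0.287249 * weight + 25.723456
# Assumption (not stated in the text): the first weight in R's "women"
# dataset is 115 lb.
intercept = 25.723456
slope = 0.287249

def predict_height(weight):
    """Apply the regression 'English rule' from the R summary output."""
    return slope * weight + intercept

first_prediction = predict_height(115)
print(round(first_prediction, 5))  # very close to 58.75712, the first value of `prediction`
```

If the assumed weight is right, the hand computation agrees with R's predict() output to four decimal places, which is a quick way to convince yourself of what lm() actually fitted.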
A predictive modeler who is examining the relationship between height and weight would probably want to introduce additional variables into the model, at the expense of a lower R-squared. R-squareds can be deceiving, especially when they are artificially high.

After you are done, press Ctrl-S (File Save), navigate to the PracticalPredictiveAnalytics/R folder that was created, and name it Chapter1_LinearRegression.

Installing a package

Sometimes the amount of information output by statistics packages can be overwhelming. Sometimes we want to reduce the amount of output and reformat it so it is easier on the eyes. Fortunately, there is an R package which reformats and simplifies some of the more important statistics. One package I will be using is named "stargazer".

Create another R script by pressing Ctrl + Shift + N. Enter the following lines and then press Ctrl+Shift+Enter to run the entire script:

install.packages("stargazer")
library(stargazer)
stargazer(lm_output, title="Lm Regression on Height", type="text")

After the script has been run, the following should appear in the Console:

Code Description

install.packages("stargazer"): This line will install the package to the default package directory on your machine. Make sure you choose a CRAN mirror before you download.

library(stargazer): This line loads the stargazer package.

stargazer(lm_output, title="Lm Regression on Height", type="text"): The reformatted results will appear in the R Console. As you can see, the output written to the console is much cleaner and easier to read.

After you are done, press Ctrl-S (File Save), navigate to the PracticalPredictiveAnalytics/Outputs folder that was created, and name it Chapter1_LinearRegressionOutput.

Installing other packages

The rest of the book will concentrate on what I think are the core packages used for predictive modeling. There are always new packages coming out.
I tend to favor packages which have been on CRAN for a long time and have a large user base. When installing something new, I will try to reference the results against other packages which do similar things. Speed is another reason to consider adopting a new package.

Summary

In this article we have learned a little about what predictive analytics is and how it can be used in various industries. We learned some things about data and how it can be organized into projects. Finally, we installed RStudio, ran a simple linear regression, and installed and used our first package. We learned that it is always good practice to examine data after it has been brought into memory, and that a lot can be learned from simply displaying and plotting the data.

Resources for Article:

Further resources on this subject:
Stata as Data Analytics Software [article]
Metric Analytics with Metricbeat [article]
Big Data Analytics [article]
Writing Your First Cucumber Appium Test

Packt
27 Jun 2017
12 min read
In this article, by Nishant Verma, author of the book Mobile Test Automation with Appium, you will learn about creating a new Cucumber-Appium Java project in IntelliJ. Next, you will learn to write a sample feature and automate it, thereby learning how to start an appium server session with an app using the appium app, find locators using the appium inspector, and write Java classes for each step implementation in the feature file. We will also discuss how to write a test for the mobile web and use the Chrome Developer Tools to find the locators. Let's get started! In this article, we will discuss the following topics:

Create a sample Java project (using gradle)
Introduction to Cucumber
Writing first appium test
Starting appium server session and finding locators
Write a test for mobile web

(For more resources related to this topic, see here.)

Create a sample Java project (using gradle)

Let's create a sample appium Java project in IntelliJ. The below steps will help you do the same:

Launch IntelliJ and click Create New Project on the Welcome screen.
On the New Project screen, select Gradle from the left pane. Project SDK should get populated with the Java version. Click on Next, enter the GroupId as com.test and ArtifactId as HelloAppium. Version would already be populated. Click on Next.
Check the option Use Auto-Import and make sure Gradle JVM is populated. Click on Next.
The Project name field would be auto-populated with what you gave as ArtifactId. Choose a Project location and click on Finish.
IntelliJ would be running the background task (Gradle build), which can be seen in the status bar. We should have a project created with the default structure.
Open the build.gradle file. You would see a message as shown below; click on "Ok, apply suggestion!"
Enter the below two lines in build.gradle. This would add appium and cucumber-jvm under dependencies.
compile group: 'info.cukes', name: 'cucumber-java', version: '1.2.5'
compile group: 'io.appium', name: 'java-client', version: '5.0.0-BETA6'

Below is how the gradle file should look:

group 'com.test'
version '1.0-SNAPSHOT'

apply plugin: 'java'

sourceCompatibility = 1.5

repositories {
    mavenCentral()
}

dependencies {
    testCompile group: 'junit', name: 'junit', version: '4.11'
    compile group: 'info.cukes', name: 'cucumber-java', version: '1.2.5'
    compile group: 'io.appium', name: 'java-client', version: '5.0.0-BETA6'
}

Once done, navigate to View -> Tools Window -> Gradle and click on the Refresh all gradle projects icon. This would pull all the dependencies into External Libraries.

Navigate to Preferences -> Plugins, search for "Cucumber for Java" and click on Install (if it's not previously installed). Repeat the above step for Gherkin and install the same. Once done, restart IntelliJ if it prompts.

We are now ready to write our first sample feature file, but before that let's try to understand a brief about Cucumber.

Introduction to Cucumber

Cucumber is a test framework which supports behaviour driven development (or BDD). The core idea behind BDD is a domain specific language (known as DSL), where the tests are written in normal English, expressing how the application or system has to behave. A DSL is an executable test, which starts with a known state, performs some action, and verifies the expected state. For example:

Feature: Registration with Facebook/Google

  Scenario: Registration Flow Validation via App
    As a user I should be able to see the Facebook/Google button when I try to register myself in Quikr.
    Given I launch the app
    When I click on Register
    Then I should see register with Facebook and Google

Dan North (creator of BDD) defined behaviour-driven development in 2009 as:

BDD is a second-generation, outside-in, pull-based, multiple-stakeholder, multiple-scale, high-automation, agile methodology.
It describes a cycle of interactions with well-defined outputs, resulting in the delivery of working, tested software that matters.

Cucumber feature files serve as living documentation which can be implemented in many languages. It was first implemented in Ruby and later extended to Java. Some of the basic features of Cucumber are:

The core of cucumber is text files called features, which contain scenarios. These scenarios express the system or application behaviour.
Scenario files comprise steps which are written following the syntax of Gherkin.
Each step has a step implementation, which is the code behind that interacts with the application.

So in the above example, Feature, Scenario, Given, When, and Then are keywords.

Feature: Cucumber tests are grouped into features. We use this name because we want engineers to describe the features that a user will be able to use.

Scenario: A scenario expresses the behaviour we want; each feature contains several scenarios. Each scenario is an example of how the system should behave in a particular situation. The expected behaviour of the feature is the sum of its scenarios; for a feature to pass, all scenarios must pass.

Test Runner: There are different ways to run the feature file; however, we will be using the JUnit runner initially and then move on to the gradle command for command line execution.

So I am hoping we now have a brief idea of what cucumber is. Further details can be read on their site (https://cucumber.io/). In the coming section, we will create a feature file, write a scenario, implement the code behind, and execute it.

Writing first appium test

Till now we have created a sample Java project and added the appium dependency. Next we need to add a cucumber feature file and implement the code behind. Let's start:

Under the project folder, create the directory structure src/test/java/features.
Right click on the features folder, select New -> File and enter the name Sample.feature.

In the Sample.feature file, let's write a scenario as shown below, which is about logging in using Google:

Feature: Hello World

  Scenario: Registration Flow Validation via App
    As a user I should be able to see my google account when I try to register myself in Quikr.
    When I launch Quikr app
    And I choose to log in using Google
    Then I see account picker screen with my email address "testemail@gmail.com"

Right click on the java folder in IntelliJ, select New -> Package and enter the name steps.

The next step is to implement the cucumber steps. Click on the first line in the Sample.feature file, "When I launch Quikr app", press Alt+Enter, then select the option Create step definition. It will present you with a pop up to enter the file name, file location and file type. We need to enter the below values:

File name: HomePageSteps
File location: browse to the steps folder created above
File type: Java

The idea is that the steps belong to a page, and each page would typically have its own step implementation class. Once you click on OK, it will create a sample template in the HomePageSteps class file. Now we need to implement these methods and write the code behind to launch the Quikr app on the emulator.

Starting appium server session and finding locators

The first thing we need to do is download a sample app (the Quikr apk in this case):

Download the Quikr app (version 9.16).
Create a folder named app under the HelloAppium project and copy the downloaded apk under that folder.
Launch the appium GUI app.
Launch the emulator or connect your device (assuming you have Developer Options enabled).
On the appium GUI app, click on the android icon and select the below options:
App Path - browse to the .apk location under the app folder.
Platform Name - Android.
Automation Name - Appium.
Platform Version - select the version which matches the emulator from the dropdown; it allows you to edit the value.
Device Name - enter any string, e.g. Nexus6.

Once the above settings are done, click on the General Settings icon and choose the below mentioned settings. Once the setup is done, click on the icon again to close the pop up:

Select Prelaunch Application
Select Strict Capabilities
Select Override Existing Sessions
Select Kill Processes Using Server Port Before Launch
Select New Command Timeout and enter the value 7200

Click on Launch. This would start the appium server session. Once you click on Appium Inspector, it will install the app on the emulator and launch it. If you click on the Record button, it will generate the boilerplate code which has the Desired Capabilities respective to the run environment and app location.

We can copy the above lines and put them into the code template generated for the step "When I launch Quikr app". This is how the code should look after copying it into the method:

@When("^I launch Quikr app$")
public void iLaunchQuikrApp() throws Throwable {
    DesiredCapabilities capabilities = new DesiredCapabilities();
    capabilities.setCapability("appium-version", "1.0");
    capabilities.setCapability("platformName", "Android");
    capabilities.setCapability("platformVersion", "5.1");
    capabilities.setCapability("deviceName", "Nexus6");
    capabilities.setCapability("app", "/Users/nishant/Development/HelloAppium/app/quikr.apk");
    AppiumDriver wd = new AppiumDriver(new URL("http://0.0.0.0:4723/wd/hub"), capabilities);
    wd.manage().timeouts().implicitlyWait(60, TimeUnit.SECONDS);
}

Now the above code only sets the Desired Capabilities; the appium server is yet to be started. For now, we can start it from outside, for example from the terminal (or command prompt), by running the command appium. We can close the appium inspector and stop the appium server by clicking on Stop in the appium GUI app.

To run the above test, we need to do the following:

Start the appium server via the command line (command: appium --session-override).
In IntelliJ, right click on the feature file and choose the option "Run...".

Since the scope of AppiumDriver is local to the method, we can refactor and extract appiumDriver as a field. To continue automating the other steps, we can use the appium inspector to find the element handles. We can launch the appium inspector using the above mentioned steps, then click on the element whose locator we want to find, as shown in the below mentioned screen.

Once we have the locator, we can use the appium API (as shown below) to click it:

appiumDriver.findElement(By.id("sign_in_button")).click();

This way we can implement the remaining steps.

Write a small test for mobile web

To automate a mobile web app, we don't need to install the app on the device. We need a browser and the app URL, which is sufficient to start the test automation. We can tweak the above written code by adding a desired capability, browserName. We can write a similar scenario and make it mobile-web specific:

Scenario: Registration Flow Validation via web
  As a user I want to verify that I get the option of choosing Facebook when I choose to register.
  When I launch Quikr mobile web
  And I choose to register
  Then I should see an option to register using Facebook

So the method for mobile web would look like:

@When("^I launch Quikr mobile web$")
public void iLaunchQuikrMobileWeb() throws Throwable {
    DesiredCapabilities desiredCapabilities = new DesiredCapabilities();
    desiredCapabilities.setCapability("platformName", "Android");
    desiredCapabilities.setCapability("deviceName", "Nexus");
    desiredCapabilities.setCapability("browserName", "Browser");
    URL url = new URL("http://127.0.0.1:4723/wd/hub");
    appiumDriver = new AppiumDriver(url, desiredCapabilities);
    appiumDriver.get("http://m.quikr.com");
}

So in the above code, we don't need platformVersion, and we need a valid value for the browserName parameter.
Possible values for browserName are:

Chrome - for the Chrome browser on Android
Safari - for the Safari browser on iOS
Browser - for the stock browser on Android

We can follow the same steps as above to run the test.

Finding locators in a mobile web app

To implement the remaining steps of the above mentioned feature, we need to find locators for the elements we want to interact with. Once the locators are found, we need to perform the desired operation, which could be click, send keys, etc. Below mentioned are the steps which will help us find the locators for a mobile web app:

Launch the Chrome browser on your machine and navigate to the mobile site (in our case: http://m.quikr.com).
Select More Tools -> Developer Tools from the Chrome menu.
In the Developer Tools toolbar, click on the Toggle device toolbar icon. Once done, the page will be displayed in a mobile layout format.
In order to find the locator of any UI element, click on the first icon of the dev tool bar and then click on the desired element. The HTML in the dev tool layout will change to highlight the selected element. Refer to the below screenshot which shows the same.

In the highlighted panel on the right side, we can see the properties name=query and id=query. We can choose to use the id and implement the step as:

appiumDriver.findElement(By.id("query")).click();

Using the above approach, we can find the locators of the various elements we need to interact with and proceed with our test automation.

Summary

So in this article, we briefly described how we would go about writing tests for a native app as well as the mobile web. We discussed how to create a project in IntelliJ and write a sample feature file. We also learned how to start the appium inspector and look for locators. We learned about the Chrome dev tools and how we can use them to find locators for the mobile web.
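As a closing aside, the step-binding mechanic that cucumber relies on throughout this article — matching plain-English step text such as "When I launch Quikr app" to code via patterns — can be sketched in a few lines. The following is a toy Python illustration of that idea only (it is not cucumber-jvm, and all function names here are made up for the sketch):

```python
import re

# A toy model of cucumber's step registry: step definitions register a
# regex pattern, and plain-English steps from a feature file are
# dispatched to the matching function (cucumber-jvm does the same with
# @When/@Then annotations).
STEPS = []

def step(pattern):
    def register(fn):
        STEPS.append((re.compile(pattern), fn))
        return fn
    return register

@step(r"^I launch (\w+) app$")
def launch_app(name):
    return f"launching {name}"

@step(r'^I see account picker screen with my email address "([^"]+)"$')
def see_account_picker(email):
    return f"asserting picker shows {email}"

def run_step(text):
    # Find the first registered pattern that matches, pass captured
    # groups as arguments — this is how step parameters like the email
    # address above reach the Java method.
    for pattern, fn in STEPS:
        match = pattern.match(text)
        if match:
            return fn(*match.groups())
    raise LookupError(f"no step definition matches: {text!r}")

print(run_step("I launch Quikr app"))  # -> launching Quikr
```

An unmatched step raises an error, which mirrors cucumber reporting a missing step definition.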
Resources for Article:

Further resources on this subject:
Appium Essentials [article]
Ensuring Five-star Rating in the MarketPlace [article]
Testing in Agile Development and the State of Agile Adoption [article]
Exploring Compilers

Packt
23 Jun 2017
17 min read
In this article by Gabriele Lanaro, author of the book Python High Performance - Second Edition, we will see that Python is a mature and widely used language, and there is large interest in improving its performance by compiling functions and methods directly to machine code rather than executing instructions in the interpreter. In this article, we will explore two projects--Numba and PyPy--that approach compilation in slightly different ways. Numba is a library designed to compile small functions on the fly. Instead of transforming Python code to C, Numba analyzes and compiles Python functions directly to machine code. PyPy is a replacement interpreter that works by analyzing the code at runtime and optimizing the slow loops automatically.

(For more resources related to this topic, see here.)

Numba

Numba was started in 2012 by Travis Oliphant, the original author of NumPy, as a library for compiling individual Python functions at runtime using the Low-Level Virtual Machine (LLVM) toolchain. LLVM is a set of tools designed for writing compilers. LLVM is language agnostic and is used to write compilers for a wide range of languages (an important example is the clang compiler). One of the core aspects of LLVM is the intermediate representation (the LLVM IR), a very low-level, platform-agnostic language similar to assembly, that can be compiled to machine code for the specific target platform.

Numba works by inspecting Python functions and by compiling them, using LLVM, to the IR. As we have already seen in the last article, speed gains can be obtained when we introduce types for variables and functions. Numba implements clever algorithms to guess the types (this is called type inference) and compiles type-aware versions of the functions for fast execution.

Note that Numba was developed to improve the performance of numerical code. The development efforts often prioritize the optimization of applications that intensively use NumPy arrays.
Numba is evolving really fast and can have substantial improvements between releases and, sometimes, backward-incompatible changes. To keep up, ensure that you refer to the release notes for each version. In the rest of this article, we will use Numba version 0.30.1; ensure that you install the correct version to avoid any errors. The complete code examples in this article can be found in the Numba.ipynb notebook.

First steps with Numba

Getting started with Numba is fairly straightforward. As a first example, we will implement a function that calculates the sum of squares of an array. The function definition is as follows:

def sum_sq(a):
    result = 0
    N = len(a)
    for i in range(N):
        result += a[i]**2
    return result

To set up this function with Numba, it is sufficient to apply the nb.jit decorator:

import numba as nb

@nb.jit
def sum_sq(a):
    ...

The nb.jit decorator won't do much when applied. However, when the function is invoked for the first time, Numba will detect the type of the input argument, a, and compile a specialized, performant version of the original function.

To measure the performance gain obtained by the Numba compiler, we can compare the timings of the original and the specialized functions. The original, undecorated function can be easily accessed through the py_func attribute. The timings for the two functions are as follows:

import numpy as np
x = np.random.rand(10000)

# Original
%timeit sum_sq.py_func(x)
100 loops, best of 3: 6.11 ms per loop

# Numba
%timeit sum_sq(x)
100000 loops, best of 3: 11.7 µs per loop

You can see how the Numba version is orders of magnitude faster than the Python version. We can also compare how this implementation stacks up against NumPy standard operators:

%timeit (x**2).sum()
10000 loops, best of 3: 14.8 µs per loop

In this case, the Numba compiled function is marginally faster than the NumPy vectorized operation.
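Independent of Numba, it is worth convincing ourselves that the pure-Python sum_sq does what its name says — it is the loop equivalent of the (x**2).sum() NumPy expression used for comparison. A quick stdlib-only check (an aside of ours, not from the original text):

```python
def sum_sq(a):
    # Pure-Python sum of squares, same logic as the chapter's example
    result = 0
    N = len(a)
    for i in range(N):
        result += a[i] ** 2
    return result

values = [1.0, 2.0, 3.0, 4.0]
print(sum_sq(values))  # 1 + 4 + 9 + 16 = 30.0
```

The same result is returned for any sequence supporting len() and indexing, which is why the jitted version can be applied to both NumPy arrays and plain lists.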
The reason for the extra speed of the Numba version is likely that the NumPy version allocates an extra array before performing the sum, in comparison with the in-place operations performed by our sum_sq function.

As we didn't use array-specific methods in sum_sq, we can also try to apply the same function on a regular Python list of floating point numbers. Interestingly, Numba is able to obtain a substantial speed up even in this case, as compared to a list comprehension:

x_list = x.tolist()
%timeit sum_sq(x_list)
1000 loops, best of 3: 199 µs per loop

%timeit sum([x**2 for x in x_list])
1000 loops, best of 3: 1.28 ms per loop

Considering that all we needed to do was apply a simple decorator to obtain an incredible speed up over different data types, it's no wonder that what Numba does looks like magic. In the following sections, we will dig deeper and understand how Numba works, and evaluate the benefits and limitations of the Numba compiler.

Type specializations

As shown earlier, the nb.jit decorator works by compiling a specialized version of the function once it encounters a new argument type. To better understand how this works, we can inspect the decorated function in the sum_sq example.

Numba exposes the specialized types using the signatures attribute. Right after the sum_sq definition, we can inspect the available specializations by accessing sum_sq.signatures, as follows:

sum_sq.signatures
# Output:
# []

If we call this function with a specific argument, for instance, an array of float64 numbers, we can see how Numba compiles a specialized version on the fly.
If we also apply the function on an array of float32, we can see how a new entry is added to the sum_sq.signatures list:

x = np.random.rand(1000).astype('float64')
sum_sq(x)
sum_sq.signatures
# Result:
# [(array(float64, 1d, C),)]

x = np.random.rand(1000).astype('float32')
sum_sq(x)
sum_sq.signatures
# Result:
# [(array(float64, 1d, C),), (array(float32, 1d, C),)]

It is possible to explicitly compile the function for certain types by passing a signature to the nb.jit function. An individual signature can be passed as a tuple that contains the types we would like to accept. Numba provides a great variety of types that can be found in the nb.types module, and they are also available in the top-level nb namespace. If we want to specify an array of a specific type, we can use the slicing operator, [:], on the type itself. In the following example, we demonstrate how to declare a function that takes an array of float64 as its only argument:

@nb.jit((nb.float64[:],))
def sum_sq(a):

Note that when we explicitly declare a signature, we are prevented from using other types, as demonstrated in the following example. If we try to pass an array, x, as float32, Numba will raise a TypeError:

sum_sq(x.astype('float32'))
# TypeError: No matching definition for argument type(s) array(float32, 1d, C)

Another way to declare signatures is through type strings. For example, a function that takes a float64 as input and returns a float64 as output can be declared with the "float64(float64)" string. Array types can be declared using a [:] suffix. To put this together, we can declare a signature for our sum_sq function, as follows:

@nb.jit("float64(float64[:])")
def sum_sq(a):

You can also pass multiple signatures by passing a list:

@nb.jit(["float64(float64[:])", "float64(float32[:])"])
def sum_sq(a):

Object mode versus native mode

So far, we have shown how Numba behaves when handling a fairly simple function.
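To build intuition for what the signatures list represents, here is a toy pure-Python sketch of per-type specialization — a cache keyed by argument types, loosely analogous to (but vastly simpler than) what nb.jit does. Everything in this snippet is our own illustration, not Numba code:

```python
def specialize(fn):
    """Toy JIT-style dispatcher: keep one 'specialization' per argument-type tuple."""
    cache = {}

    def wrapper(*args):
        key = tuple(type(a) for a in args)
        if key not in cache:
            # A real JIT would compile a type-specific machine-code version
            # here; we just record the signature and reuse the original.
            cache[key] = fn
        return cache[key](*args)

    wrapper.signatures = cache  # mimic Numba's .signatures attribute
    return wrapper

@specialize
def add(a, b):
    return a + b

add(1, 2)        # first call with (int, int): new "specialization"
add(1.0, 2.0)    # first call with (float, float): another one
print(list(add.signatures))  # one entry per encountered type signature
```

As with Numba, the cache starts empty and grows only when a call arrives with a type combination that has not been seen before.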
In this case, Numba worked exceptionally well, and we obtained great performance on arrays and lists. The degree of optimization obtainable from Numba depends on how well Numba is able to infer the variable types and how well it can translate those standard Python operations to fast type-specific versions. If this happens, the interpreter is side-stepped and we can get performance gains similar to those of Cython.

When Numba cannot infer variable types, it will still try to compile the code, reverting to the interpreter when the types can't be determined or when certain operations are unsupported. In Numba, this is called object mode and is in contrast to the interpreter-free scenario, called native mode.

Numba provides a function, called inspect_types, that helps us understand how effective the type inference was and which operations were optimized. As an example, we can take a look at the types inferred for our sum_sq function:

sum_sq.inspect_types()

When this function is called, Numba will print the type inferred for each specialized version of the function. The output consists of blocks that contain information about variables and the types associated with them. For example, we can examine the N = len(a) line:

# --- LINE 4 ---
# a = arg(0, name=a) :: array(float64, 1d, A)
# $0.1 = global(len: <built-in function len>) :: Function(<built-in function len>)
# $0.3 = call $0.1(a) :: (array(float64, 1d, A),) -> int64
# N = $0.3 :: int64

N = len(a)

For each line, Numba prints a thorough description of variables, functions, and intermediate results. In the preceding example, you can see (second line) that the argument a is correctly identified as an array of float64 numbers. At LINE 4, the input and return type of the len function are also correctly identified (and likely optimized) as taking an array of float64 numbers and returning an int64. If you scroll through the output, you can see how all the variables have a well-defined type.
Therefore, we can be certain that Numba is able to compile the code quite efficiently. This form of compilation is called native mode.

As a counter-example, we can see what happens if we write a function with unsupported operations. For example, as of version 0.30.1, Numba has limited support for string operations. We can implement a function that concatenates a series of strings, and compile it as follows:

@nb.jit
def concatenate(strings):
    result = ''
    for s in strings:
        result += s
    return result

Now, we can invoke this function with a list of strings and inspect the types:

concatenate(['hello', 'world'])
concatenate.signatures
# Output: [(reflected list(str),)]

concatenate.inspect_types()

Numba will return the output of the function for the reflected list(str) type. We can, for instance, examine how line 3 gets inferred. The output of concatenate.inspect_types() is reproduced here:

# --- LINE 3 ---
# strings = arg(0, name=strings) :: pyobject
# $const0.1 = const(str, ) :: pyobject
# result = $const0.1 :: pyobject
# jump 6
# label 6

result = ''

You can see how, this time, each variable or function is of the generic pyobject type rather than a specific one. This means that, in this case, Numba is unable to compile this operation without the help of the Python interpreter. Most importantly, if we time the original and compiled functions, we note that the compiled function is about three times slower than the pure Python counterpart:

x = ['hello'] * 1000
%timeit concatenate.py_func(x)
10000 loops, best of 3: 111 µs per loop

%timeit concatenate(x)
1000 loops, best of 3: 317 µs per loop

This is because the Numba compiler is not able to optimize the code and adds some extra overhead to the function call. As you may have noted, Numba compiled the code without complaint even though it is inefficient. The main reason for this is that Numba can still compile other sections of the code in an efficient manner while falling back to the Python interpreter for other parts of the code.
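Independently of Numba, the behaviour of concatenate can be verified in plain Python; this quick check (an aside of ours) also recalls that ''.join is the interpreter's own idiomatic fast path for the same operation:

```python
def concatenate(strings):
    # Same logic as the chapter's example, pure Python only
    result = ''
    for s in strings:
        result += s
    return result

words = ['hello', 'world']
print(concatenate(words))                      # helloworld
print(''.join(words) == concatenate(words))    # True
```

When string performance matters in interpreted code, ''.join(strings) is generally preferred over repeated += precisely because it avoids building intermediate strings.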
This compilation strategy is called object mode.

It is possible to force the use of native mode by passing the nopython=True option to the nb.jit decorator. If, for example, we apply this decorator to our concatenate function, we observe that Numba throws an error on the first invocation:

@nb.jit(nopython=True)
def concatenate(strings):
    result = ''
    for s in strings:
        result += s
    return result

concatenate(x)
# Exception:
# TypingError: Failed at nopython (nopython frontend)

This feature is quite useful for debugging and ensuring that all the code is fast and correctly typed.

Numba and NumPy

Numba was originally developed to easily increase the performance of code that uses NumPy arrays. Currently, many NumPy features are implemented efficiently by the compiler.

Universal functions with Numba

Universal functions are special functions defined in NumPy that are able to operate on arrays of different sizes and shapes according to the broadcasting rules. One of the best features of Numba is the implementation of fast ufuncs.

We have already seen some ufunc examples in article 3, Fast Array Operations with NumPy and Pandas. For instance, the np.log function is a ufunc because it can accept scalars and arrays of different sizes and shapes. Also, universal functions that take multiple arguments still work according to the broadcasting rules. Examples of universal functions that take multiple arguments are np.add and np.subtract.

Universal functions can be defined in standard NumPy by implementing the scalar version and using the np.vectorize function to enhance the function with the broadcasting feature. As an example, we will see how to write the Cantor pairing function. A pairing function is a function that encodes two natural numbers into a single natural number so that you can easily interconvert between the two representations.
The Cantor pairing function can be written as follows:

import numpy as np

def cantor(a, b):
    return int(0.5 * (a + b)*(a + b + 1) + b)

As already mentioned, it is possible to create a ufunc in pure Python using the np.vectorize decorator:

@np.vectorize
def cantor(a, b):
    return int(0.5 * (a + b)*(a + b + 1) + b)

cantor(np.array([1, 2]), 2)
# Result:
# array([ 8, 12])

Except for the convenience, defining universal functions in pure Python is not very useful, as it requires a lot of function calls affected by interpreter overhead. For this reason, ufunc implementation is usually done in C or Cython, but Numba beats all these methods by its convenience. All that is needed to perform the conversion is using the equivalent decorator, nb.vectorize. We can compare the speed of the standard np.vectorize version (which, in the following code, is called cantor_py) and the same function implemented using standard NumPy operations:

# Pure Python
%timeit cantor_py(x1, x2)
100 loops, best of 3: 6.06 ms per loop

# Numba
%timeit cantor(x1, x2)
100000 loops, best of 3: 15 µs per loop

# NumPy
%timeit (0.5 * (x1 + x2)*(x1 + x2 + 1) + x2).astype(int)
10000 loops, best of 3: 57.1 µs per loop

You can see how the Numba version beats all the other options by a large margin! Numba works extremely well because the function is simple and type inference is possible.

An additional advantage of universal functions is that, since they depend on individual values, their evaluation can also be executed in parallel. Numba provides an easy way to parallelize such functions by passing the target="parallel" or target="cuda" keyword argument to the nb.vectorize decorator.

Generalized universal functions

One of the main limitations of universal functions is that they must be defined on scalar values. A generalized universal function, abbreviated gufunc, is an extension of universal functions to procedures that take arrays. A classic example is matrix multiplication.
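Returning to the Cantor pairing function for a moment: the pairing property itself — that (a, b) can be recovered from the single encoded number — is easy to check in pure Python. The inverse below follows the standard Cantor inversion formula and is an addition of ours, not shown in the original text:

```python
import math

def cantor(a, b):
    # Scalar Cantor pairing, as in the chapter
    return int(0.5 * (a + b) * (a + b + 1) + b)

def cantor_inverse(z):
    # Standard inversion: recover (a, b) from z.
    # w = a + b is found from the triangular-number part of z.
    w = (math.isqrt(8 * z + 1) - 1) // 2
    t = w * (w + 1) // 2
    b = z - t
    a = w - b
    return a, b

print(cantor(1, 2), cantor(2, 2))  # 8 12 (matching the array([ 8, 12]) result above)
print(cantor_inverse(8))           # (1, 2)
```

Round-tripping every pair in a small grid through cantor and cantor_inverse is a convenient way to verify the bijection before trusting the vectorized version.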
In NumPy, matrix multiplication can be applied using the np.matmul function, which takes two 2D arrays and returns another 2D array. An example usage of np.matmul is as follows:

a = np.random.rand(3, 3)
b = np.random.rand(3, 3)
c = np.matmul(a, b)
c.shape
# Result:
# (3, 3)

As we saw in the previous subsection, a ufunc broadcasts the operation over arrays of scalars; its natural generalization is to broadcast over an array of arrays. If, for instance, we take two arrays of 3 by 3 matrices, we expect np.matmul to match the matrices pairwise and take their products. In the following example, we take two arrays containing 10 matrices of shape (3, 3). If we apply np.matmul, the product will be applied matrix-wise to obtain a new array containing the 10 results (which are, again, (3, 3) matrices):

a = np.random.rand(10, 3, 3)
b = np.random.rand(10, 3, 3)
c = np.matmul(a, b)
c.shape
# Output
# (10, 3, 3)

The usual rules for broadcasting work in a similar way. For example, if we have an array of (3, 3) matrices with a shape of (10, 3, 3), we can use np.matmul to calculate the matrix multiplication of each element with a single (3, 3) matrix. According to the broadcasting rules, the single matrix is repeated to obtain a size of (10, 3, 3):

a = np.random.rand(10, 3, 3)
b = np.random.rand(3, 3)  # Broadcasted to shape (10, 3, 3)
c = np.matmul(a, b)
c.shape
# Result:
# (10, 3, 3)

Numba supports the implementation of efficient generalized universal functions through the nb.guvectorize decorator. As an example, we will implement a function that computes the Euclidean distance between two arrays as a gufunc. To create a gufunc, we have to define a function that takes the input arrays, plus an output array where we will store the result of our calculation.
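Before moving on, the batched matmul broadcasting described above is easy to sanity-check by comparing it against an explicit Python loop (a quick sketch, not from the original text):

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((10, 3, 3))
b = rng.random((3, 3))   # broadcast against all 10 matrices in a

batched = np.matmul(a, b)                                   # shape (10, 3, 3)
looped = np.stack([np.matmul(a[i], b) for i in range(10)])  # one product at a time

print(batched.shape)                 # (10, 3, 3)
print(np.allclose(batched, looped))  # True
```

The two results agree element-wise, confirming that the single (3, 3) matrix is indeed multiplied against every matrix in the stack.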
The nb.guvectorize decorator requires two arguments:

The types of the input and output: two 1D arrays as input and a scalar as output
The so-called layout string, which is a representation of the input and output sizes; in our case, we take two arrays of the same size (denoted arbitrarily by n), and we output a scalar

In the following example, we show the implementation of the euclidean function using the nb.guvectorize decorator:

@nb.guvectorize(['float64[:], float64[:], float64[:]'], '(n),(n)->()')
def euclidean(a, b, out):
    N = a.shape[0]
    out[0] = 0.0
    for i in range(N):
        out[0] += (a[i] - b[i])**2

There are a few very important points to be made. Predictably, we declared the types of the inputs a and b as float64[:], because they are 1D arrays. However, what about the output argument? Wasn't it supposed to be a scalar? Yes; however, Numba treats scalar arguments as arrays of size 1. That's why it was declared as float64[:]. Similarly, the layout string indicates that we have two arrays of size (n) and the output is a scalar, denoted by empty brackets: (). However, the array out will be passed as an array of size 1. Also, note that we don't return anything from the function; all the output has to be written in the out array. The letter n in the layout string is completely arbitrary; you may choose to use k or other letters of your liking. Also, if you want to combine arrays of uneven sizes, you can use layout strings such as (n),(m). Our brand new euclidean function can be conveniently used on arrays of different shapes, as shown in the following example:

a = np.random.rand(2)
b = np.random.rand(2)
c = euclidean(a, b)  # Shape: (1,)

a = np.random.rand(10, 2)
b = np.random.rand(10, 2)
c = euclidean(a, b)  # Shape: (10,)

a = np.random.rand(10, 2)
b = np.random.rand(2)
c = euclidean(a, b)  # Shape: (10,)

How does the speed of euclidean compare to standard NumPy?
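Note that the gufunc above accumulates squared differences without taking a square root, so it actually returns the squared Euclidean distance. The NumPy equivalent that is benchmarked next can be wrapped up as a small reference function (my sketch, useful for checking results), with the same broadcasting behaviour over the last axis:

```python
import numpy as np

def euclidean_ref(a, b):
    # Reference for the gufunc in the text; like the gufunc, this
    # returns the *squared* Euclidean distance (no square root).
    return ((a - b) ** 2).sum(axis=-1)

a = np.array([[3.0, 4.0], [0.0, 0.0]])
b = np.array([0.0, 0.0])   # broadcast against both rows of a

print(euclidean_ref(a, b))  # [25.  0.]
```

Using known inputs like these makes it easy to verify that a compiled gufunc and its pure-NumPy counterpart produce identical results before comparing their speed.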
In the following code, we benchmark a NumPy vectorized version against our previously defined euclidean function:

a = np.random.rand(10000, 2)
b = np.random.rand(10000, 2)

%timeit ((a - b)**2).sum(axis=1)
1000 loops, best of 3: 288 µs per loop

%timeit euclidean(a, b)
10000 loops, best of 3: 35.6 µs per loop

The Numba version, again, beats the NumPy version by a large margin!

Summary

Numba is a tool that compiles fast, specialized versions of Python functions at runtime. In this article, we learned how to compile, inspect, and analyze functions compiled by Numba. We also learned how to implement fast NumPy universal functions that are useful in a wide array of numerical applications. Tools such as PyPy allow us to run Python programs unchanged to obtain significant speed improvements. We demonstrated how to set up PyPy, and we assessed the performance improvements on our particle simulator application.

Resources for Article:

Further resources on this subject:

Getting Started with Python Packages [article]
Python for Driving Hardware [article]
Python Data Science Up and Running [article]

Packt
23 Jun 2017
19 min read

Setting up your Raspberry Pi

In this article by Pradeeka Seneviratne and John Sirach, the authors of the book Raspberry Pi 3 Projects for Java Programmers, we will cover the following topics:

Getting started with the Raspberry Pi
Installing Raspbian

(For more resources related to this topic, see here.)

Getting started with the Raspberry Pi

With the release of the Raspberry Pi 3, the Raspberry Pi Foundation has made a very big step in the history of the Raspberry Pi. The current hardware architecture is now based on a 1.2 GHz 64-bit ARMv8 processor. This latest release of the Raspberry Pi also includes support for wireless networking and has an onboard Bluetooth 4.1 chip available. To get started with the Raspberry Pi, you will need the following components:

Keyboard and mouse: Having both a keyboard and mouse present will greatly help with the installation of the Raspbian distribution. Almost any keyboard or mouse will work.

Display: You can attach any compatible HDMI display, which can be a computer display or a television. The Raspberry Pi also has composite output shared with the audio connector. You will need an A/V cable if you want to use this output.

Power adapter: Because of all the enhancements made, the Raspberry Pi Foundation recommends a 5V adapter capable of delivering 2.5 A. You would be able to use a lower-rated one, but I strongly advise against this if you are planning to use all the available USB ports. The device is powered through a micro USB connector.

MicroSD card: The Raspberry Pi 3 uses a microSD card. I would advise using at least an 8 GB class 10 card. This will allow you to use the additional space to install applications, and as our projects will log data, you won't be running out of space soon.

The Raspberry Pi 3: Last but not least, a Raspberry Pi 3. Some of our projects will use the onboard Bluetooth chip, and this is also the version this article focuses on.

Our first step will be preparing an SD card for use with the Raspberry Pi.
You will need a MicroSD card, as the Raspberry Pi 3 only supports this format. The preparation of the SD card is done on a normal PC, so it is wise to purchase one with an adapter fitting a full-size SD card slot. There are webshops selling pre-formatted SD cards with the NOOBS installer already present on the card. If you have bought one of these pre-formatted cards, you can skip to the Installing Raspbian section.

Get a compatible SD card

There is a large number of SD cards available. The Raspberry Pi Foundation advises an 8 GB card, which leaves space to install different kinds of applications and supplies enough space for us to write any log data. When you buy an SD card, it is wise to keep your eyes open for the quality of these cards. Buying them from well-known and established manufacturers often provides better quality than the counterfeit ones. SD cards are sold with different class definitions. These classes state the minimum sustained write speeds: class 6 should provide at least 6 MB (megabytes) per second, and class 10 cards should provide at least 10 MB/s. There is a good online resource available which provides tested results of SD cards used with the Raspberry Pi. If you need any resource to check for compatible SD cards, I would advise you to go to the embedded Linux page at http://elinux.org/RPi_SD_cards.

Preparing and formatting the SD card

To be able to use the SD card, it first needs to be formatted. Most cards are already formatted with the FAT32 file system, which the Raspberry Pi NOOBS installer requires; if you have bought a large SD card, however, it may be formatted with the exFAT file system. Such cards should also be formatted as FAT32. To format the SD card, we will use the SD Association's SDFormatter utility, which you can download from the SD Association's website, as the default formatters supplied with operating systems do not always provide optimal results.
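Since a class rating is a minimum sustained write speed (class N guarantees at least N MB/s), you can put a rough upper bound on how long writing a full card image should take. A back-of-the-envelope sketch (the 8 GB figure matches the card size recommended above):

```python
# Worst-case time to write an 8 GB image at the minimum speed
# guaranteed by each SD speed class (class N = at least N MB/s).
CARD_MB = 8 * 1000  # 8 GB, counting 1 GB as 1000 MB as card vendors do

for speed_class in (4, 6, 10):
    seconds = CARD_MB / speed_class
    print("class %2d: at most %.0f minutes" % (speed_class, seconds / 60))
```

Real cards usually write faster than their minimum rating, so treat these numbers as a ceiling, not an estimate.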
In the below screenshot, the SDFormatter for the Mac is shown. This utility is also available for Windows and has the same options. If you are using Linux, you can use GParted. Make sure when using GParted that you use FAT32 as the formatting option. As in the screenshot, select the Overwrite format option and give the SD card a label. The example shows RPI3JAVA, but this can be a personal label of your choice to quickly recognize the card when inserted:

Press the Format button to start formatting the SD card. Depending on the size of the SD card, this can take some time, enabling you to get a cup of coffee. The utility will show a done message in the form of Card Format complete when the formatting is done. You will now have a usable SD card. To be able to use the NOOBS installer, you will need to follow these steps:

Download the NOOBS installer from https://www.raspberrypi.org/downloads/.
Unzip the file with your favorite unzip utility. Most operating systems already have one installed.
Copy the contents of the unzipped file into the SD card's root directory.

When selecting the NOOBS download, only select the lite version if you do not mind installing Raspbian over the Raspberry Pi's network connection. Now that we have copied the required files onto the SD card, we can start installing the Raspbian Operating System.

Installing Raspbian

To install Raspbian, we need to get the Raspberry Pi ready for use. As the Raspberry Pi has no power button, powering the Raspberry Pi will be done as the last step:

At the bottom of the Raspberry Pi, on the side, you will see a slot for your MicroSD card. Insert the SD card with the connectors pointing to the board. Next, connect the HDMI or the composite connector and your keyboard and mouse. You won't need a network cable, as we will be using the wireless functionality built into the Raspberry Pi.
We will now connect the Raspberry Pi to the micro USB power supply. When the Raspberry Pi boots up, you will be presented with the installation options of the Operating Systems available to be installed. Depending on the NOOBS download you chose, you will be able to see whether the Raspbian Operating System is already available on the SD card or whether it will be installed by downloading it. This is visualized by showing an SD card image or a network image behind the Operating System name. In the below screenshot, you see the NOOBS installer with the Raspbian image available on the SD card. At the bottom of the installation screen, you will find the Language and Keyboard drop-down menus. Make sure you select the appropriate language and keyboard, otherwise it will become quite difficult to enter correct characters on the command line and in other tools requiring text input. Select the Raspbian [RECOMMENDED] option and click the Install (i) button to start installing the Operating System:

You will be prompted with a popup confirming the installation, as it will overwrite any existing installed Operating Systems. As we are using a clean SD card, we will not be overwriting any, so it is safe to press Yes to start the installation. The installation will take a couple of minutes, so it is a good time to go for a second cup of coffee. When the installation is done, you can press OK in the popup which appears, and the Raspberry Pi will reboot. Because Raspbian is a Linux OS, you will see text scrolling by for the services which are being started by the OS. When all services are started, the Raspberry Pi will start the default graphical environment called LXDE, which is one of the Linux window managers.

Configuring Raspbian

Now that we have installed Raspbian and have it booting into the graphical environment, we can start configuring the Raspberry Pi for our purposes.
To be able to configure the Raspberry Pi, the graphical environment has a utility installed which eases up the configuration, called Raspberry Pi Configuration. To open this tool, use the mouse and click on the Menu button on the top left, navigate to Preferences, and press the Raspberry Pi Configuration menu option, as shown in the screenshot:

When you click on the Raspberry Pi Configuration tool menu option, a popup will appear with the graphical version of the well-known raspi-config command-line tool. In the graphical popup, we see four tabs covering different parts of the possible configuration options. We first focus on the System tab, which allows us to:

Change the Password
Change the Hostname, which helps to identify the Raspberry Pi on the network
Change the Boot method to the desktop or to the CLI, which is the command-line interface
Set the Network at Boot option

With the system newly installed, the default username is pi and the password is set to raspberry. Because these are the default settings, it is recommended that we change the password to a new one. Press the Change Password button and enter a newly chosen password twice: one time to set the password and the second time to make sure we have entered the new password correctly. Press OK when the password has been entered twice. Try to come up with a password which contains capital letters, numbers, and some special characters, as this will make it more difficult to guess. Now that we have set a new password, we are going to change the hostname of the Raspberry Pi. With the hostname, we are able to identify the device on the network. I have changed the hostname to RASPI3JAVA, which helps me to identify this Raspberry Pi as the one used for the article. The hostname is shown on the command-line interface, so you will immediately identify this Raspberry Pi when you log in. By default, the Raspberry Pi boots into the graphical user interface which you are looking at right now.
Because a future project will require us to make use of a display with our application, we will choose to boot into CLI mode. Click on the radio button which says To CLI. The next time we reboot, we will be shown the command-line interface. Because we will be using the integrated Wi-Fi connection on this Raspberry Pi, we are going to change the Network at Boot option to have it wait for the network. Tick the box which says Wait for network. We are now done with the default settings, which set some primary options to help us identify the Raspberry Pi, and we have changed the boot mode. We will now change some advanced settings which enable us to make use of the hardware provided by the Raspberry Pi. Click on the Interfaces tab, which will give us a list of the available hardware. This list consists of:

Camera: The official Raspberry Pi camera interface
SSH: To be able to log in from remote locations
SPI: Serial Peripheral Interface bus for communicating with hardware
I2C: Serial communication bus, mostly used between chips
Serial: The serial communication interface
1-Wire: Low-data-rate bus interface that can also supply power, conceptually based on I2C
Remote GPIO

A future project we will be working on will require some kind of camera interface. This project will be able to use both the locally attached official Raspberry Pi camera module as well as a USB-connected webcam. If you have the camera module, tick the Enabled radio box behind Camera. We will be deploying our applications immediately from the editor, which means we need to enable the SSH option. By default, it already is enabled, so we leave the setting as it is. If the SSH option is not enabled, tick the radio button Enabled behind the SSH option. For now, you can leave the other interfaces disabled, as we will only enable them when we need them. Now that we have enabled the default interfaces we are going to need, we are going to do some performance tweaking.
Click on the Performance tab to open the performance options. We will not need to overclock the Raspberry Pi, so you can leave this option as it is. Later on in the article, we will be interfacing with the Raspberry Pi's display and doing some neat tricks with it. For this, we need some amount of memory for the Graphics Processing Unit, the GPU. By default, this is set to 64 MB. We will ask for the largest amount of memory possible to be assigned to the GPU, which is 512 MB. Put 512 behind the GPU Memory option; there is no need to enter the text "MB". The memory on the Raspberry Pi is shared between the system and the GPU. Having this option set to 512 MB results in only 512 MB available for the system; I can assure you this is more than sufficient. Now that we are done with the system configuration, we will make sure we can work with the Raspberry Pi. Click on the Localisation tab to show the options applicable to the location where the Raspberry Pi resides. We have the following options:

Set Locale: Where you set your locale settings
Timezone: The time zone you are currently in
Keyboard: The layout of the keyboard
Wi-Fi Country: The country in which you will be making the Wi-Fi connection

This article is focused on US English with a broad character set. Unless you prefer to continue with your own personal preferences, change the following by pressing the Set Locale button:

Language to en (English)
Country to US (USA)
Character Set to UTF-8

Press the OK button to continue. As this needs to build up the locale settings, it can take about 20 seconds, and you will be notified with a small window until this process is finished. The next step is to set the timezone. This is needed as we want times and dates shown correctly. Click on the Set Timezone button and select your appropriate Area and Location from the drop-down menus. When done, press the OK button.
To make sure that the text we enter in any input field is correct, we are going to set the layout of our keyboard. There are a lot of layouts available, so you need to check yours. The Raspberry Pi is quite helpful in presenting the keyboard options. Press the Set Keyboard button to open up a popup showing the keyboard options. Here you are able to select your country and the keyboard layouts available for this country. In my case, I have to select United States as the Country and the Variant as English (US, with euro on 5). After you have made the selection, you can test your keyboard setup in the input field below the Country and Variant selection lists. Press the OK button when you are satisfied with your selection. Unless you are connecting your Raspberry Pi with a wired network connection, we are going to set the country in which we will make the Wi-Fi connection, so we are able to connect remotely to the Raspberry Pi. Press the Set Wi-Fi Country button to show the Wi-Fi Country Code popup, which provides the list of available countries for the Wi-Fi connection. Press the OK button after you have made your selection. We are now done with the minimal Raspberry Pi system configuration. Press OK in the settings window to have all our settings stored, and press No in the popup that follows, which says a reboot is needed to have the settings applied, as we are not completely done yet. Our final step is to set up the local Wi-Fi chip on the Raspberry Pi. Unless you want your Raspberry Pi connected with a network cable, in which case you can skip this section and head over to the Set fixed IP section:

To set up the Wi-Fi, click on the Network icon which is shown at the top of the screen, between the Bluetooth and Speaker Volume icons. When you click this button, the Raspberry Pi will start to scan for available wireless networks. Give it a couple of seconds if your network does not appear immediately.
When you see your network appearing, click on it and, if your network is secured, you will be asked to supply the credentials to be able to connect to your wireless network. If you have trouble connecting to your wireless network without any message appearing, log in to your router and change the Wi-Fi channel to a channel lower than channel 11. When you have entered your credentials and pressed the OK button, you will see the icon change from the two networked computers to the wireless icon trying to connect to the wireless network. Now that the Raspberry Pi has rebooted and we have configured the wireless network, we will make sure the wireless network keeps its connection. As the Raspberry Pi is an embedded device targeting low power consumption, the Wi-Fi connection is possibly put into sleep mode after a specific period of no network usage. To make sure the Wi-Fi does not go into power-save mode, we will change a setting through the command line. To open a command-line interface, we need to open a terminal. A terminal is a window which shows the command prompt, where we are able to enter commands. When you look at the graphical interface, you will notice a small computer screen icon at the top in the menu bar. When we hover over this icon, it shows the text Terminal. Press this icon to open up a terminal. A popup will open with a large black screen showing the command prompt, as shown in the screenshot:

Do you notice the hostname we set earlier? This is the same prompt as we will see when we log in remotely. Now that we have a command line open, we need to enter a command to make sure the wireless network will not go to sleep after a period of no network activity. Enter the following in the terminal:

sudo iw dev wlan0 set power_save off

Press Enter. This command sets the power-save mode to off, so the wlan0 (wireless) device won't enter power-save mode and stays connected to the network.
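Note that the iw setting above does not survive a reboot. One way to reapply it at boot, assuming a stock Raspbian image where /etc/rc.local is still executed at startup, is to add the command just before the final exit 0 (a sketch; the interface name wlan0 is taken from the text):

```shell
# /etc/rc.local (excerpt) -- runs as root at the end of the boot sequence.
# Reapply the Wi-Fi power-save setting on every boot; the interface
# name wlan0 matches the one used in the text.
iw dev wlan0 set power_save off

exit 0
```

After editing the file, reboot and run iw dev wlan0 get power_save to confirm the setting reads off.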
We are almost done with setting up the Raspberry Pi. To be able to connect to the Raspberry Pi from a remote location, we need to know the IP address of the Raspberry Pi. This final configuration step involves setting a fixed IP address in the Raspberry Pi settings. To open the settings for a fixed IP configuration, right-click the wireless network icon and press the Wi-Fi Networks (dhcpcdui) Settings option. A popup will appear providing the settings we can change. As we will only change the settings of the Wi-Fi connection, we select interface next to the Configure option. When interface is selected, we are able to select the wlan0 option in the drop-down menu next to the interface selection. If you have chosen to use a wired connection instead of the wireless one, you can select the eth0 option next to the interface option. We now have a couple of options available to enter IP address-related information. Please refer to the documentation of your router to find out which IP address is available for you to use. My advice is to only enter the IP address in the available fields, which leaves the other options automatically configured, as in the screenshot below. Note that the entered IP address applies only to my configuration, which could differ from yours:

After you have entered the IP address, you can click Apply and Close. It is now time to restart our Raspberry Pi and have it boot to the CLI. While rebooting, you will see a lot of text scrolling by, showing the services starting, and at the end, instead of the graphical interface, we are now shown the text-based command-line interface, as shown in the screenshot:

If you want to return to the graphical interface, just type in:

startx

Press Enter and wait a couple of seconds for the graphical user interface to appear again. We are now ready to install the Oracle JDK, which we will be using to run our Java applications.
Summary

In this article, we learned how to get started with the Raspberry Pi and how to install Raspbian.

Resources for Article:

Further resources on this subject:

The Raspberry Pi and Raspbian
Raspberry Pi Gaming Operating Systems
Sending Notifications using Raspberry Pi Zero

Packt
23 Jun 2017
12 min read

Spatial Data

In this article by Dominik Mikiewicz, the author of the book Mastering PostGIS, we will look at exporting data from PostgreSQL/PostGIS to files or other data sources. Sharing data via the Web is no less important, but it has its own specific process. There may be different reasons for having to export data from a database, but certainly sharing it with others is among the most popular ones. Backing the data up or transferring it to other software packages for further processing are other common reasons for learning the export techniques. (For more resources related to this topic, see here.) In this article, we'll have a closer look at the following:

Exporting data using COPY (and \copy)
Exporting vector data using pgsql2shp
Exporting vector data using ogr2ogr
Exporting data using GIS clients
Outputting rasters using GDAL
Outputting rasters using psql
Using the PostgreSQL backup functionality

We just do the steps the other way round as when importing. In other words, this article may give you a bit of a déjà vu feeling.

Exporting data using COPY in psql

When we were importing data, we used the psql \copy FROM command to copy data from a file to a table. This time, we'll do it the other way round, from a table to a file, using the \copy TO command. \copy TO can not only copy a full table, but also the result of a SELECT query, which means we can output filtered subsets of the source tables. Similarly to the method we used to import, we can execute COPY or \copy in different scenarios: we'll use psql in interactive and non-interactive mode, and we'll also do the very same thing in PgAdmin. It is worth remembering that COPY can only read/write files that can be accessed by an instance of the server, so usually files that reside on the same machine as the database server.
For detailed information on COPY syntax and parameters, type:

\h copy

Exporting data in psql interactively

In order to export the data in interactive mode, we first need to connect to the database using psql:

psql -h localhost -p 5434 -U postgres

Then type the following:

\c mastering_postgis

Once connected, we can execute a simple command:

\copy data_import.earthquakes_csv TO earthquakes.csv WITH DELIMITER ';' CSV HEADER

The preceding command exported the data_import.earthquakes_csv table to a file named earthquakes.csv, with ';' as a column separator. A header in the form of column names has also been added to the beginning of the file. The output should be similar to the following:

COPY 50

Basically, the database told us how many records have been exported. The content of the exported file should exactly resemble the content of the table we exported from:

time;latitude;longitude;depth;mag;magtype;nst;gap;dmin;rms;net;id;updated;place;type;horizontalerror;deptherror;magerror;magnst;status;locationsource;magsource
2016-10-08 14:08:08.71+02;36.3902;-96.9601;5;2.9;mb_lg;;60;0.029;0.52;us;us20007csd;2016-10-08 14:27:58.372+02;15km WNW of Pawnee, Oklahoma;earthquake;1.3;1.9;0.1;26;reviewed;us;us

As mentioned, \copy can also output the results of a SELECT query. This means we can tailor the output to very specific needs, as required. In the next example, we'll export data from a 'spatialized' earthquakes table, but the geometry will be converted to a WKT (well-known text) representation. We'll also export only a subset of columns:

\copy (select id, ST_AsText(geom) FROM data_import.earthquakes_subset_with_geom) TO earthquakes_subset.csv WITH CSV DELIMITER '|' FORCE QUOTE * HEADER

Once again, the output just specifies the number of records exported:

COPY 50

The executed command exported only the id column and a WKT-encoded geometry column. The export force-wrapped the data in quote symbols, with a pipe (|) symbol used as a delimiter.
The file has a header:

id|st_astext
"us20007csd"|"POINT(-96.9601 36.3902)"
"us20007csa"|"POINT(-98.7058 36.4314)"

Exporting data in psql non-interactively

If you're still in psql, you can execute a script by simply typing the following:

\i path/to/the/script.sql

For example:

\i code/psql_export.sql

The output will not surprise us, as it will simply state the number of records outputted:

COPY 50

If you have already quit psql, the command-line equivalent of \i is the -f switch, so the command should look like this:

psql -h localhost -p 5434 -U postgres -d mastering_postgis -f code/psql_export.sql

Not surprisingly, the output is once again the following:

COPY 50

Exporting data in PgAdmin

In PgAdmin, the command is COPY rather than \copy. The rest of the code remains the same. Another difference is that we need to use an absolute path, while in psql we can use paths relative to the directory we started psql in. So the first psql query 'translated' to the PgAdmin SQL version looks like this:

copy data_import.earthquakes_csv TO 'F:\mastering_postgis\chapter06\earthquakes.csv' WITH DELIMITER ';' CSV HEADER

The second query looks like this:

copy (select id, ST_AsText(geom) FROM data_import.earthquakes_subset_with_geom) TO 'F:\mastering_postgis\chapter06\earthquakes_subset.csv' WITH CSV DELIMITER '|' FORCE QUOTE * HEADER

Both produce a similar output, but this time it is logged in the 'Messages' tab of PgAdmin's query output pane:

Query returned successfully: 50 rows affected, 55 msec execution time.

It is worth remembering that COPY is executed as part of an SQL command, so it is effectively the DB server that tries to write to the file. Therefore, it may be the case that the server is not able to access a specified directory. If your DB server is on the same machine as the directory that you are trying to write to, relaxing the directory access permissions should help.
Exporting vector data using pgsql2shp

pgsql2shp is a command-line tool that can be used to output PostGIS data into shapefiles. Similarly to COPY TO, it can either export a full table or the result of a query, which gives us flexibility when we only need a subset of data to be outputted and we do not want to either modify the source tables or create temporary, intermediate ones.

pgsql2shp command line

In order to get some help with the tool, just type the following in the console:

pgsql2shp

The general syntax for the tool is as follows:

pgsql2shp [<options>] <database> [<schema>.]<table>
pgsql2shp [<options>] <database> <query>

Shapefile is a format that is made up of a few files. The minimum set is SHP, SHX, and DBF. If PostGIS is able to determine the projection of the data, it will also export a PRJ file containing the SRS information, which should be understood by software able to consume a shapefile. If a table does not have a geometry column, then only a DBF file, the equivalent of the table data, will be exported. Let's export a full table first:

pgsql2shp -h localhost -p 5434 -u postgres -f full_earthquakes_dataset mastering_postgis data_import.earthquakes_subset_with_geom

The following output should be expected:

Initializing...
Done (postgis major version: 2).
Output shape: Point
Dumping: X [50 rows]

Now let's do the same, but this time with the result of a query:

pgsql2shp -h localhost -p 5434 -u postgres -f full_earthquakes_dataset mastering_postgis "select * from data_import.earthquakes_subset_with_geom limit 1"

To avoid being prompted for a password, try providing it within the command via the -P switch. The output will be very similar to what we have already seen:

Initializing...
Done (postgis major version: 2).
Output shape: Point
Dumping: X [1 rows]

In the data we previously imported, we do not have examples that would manifest shapefile limitations. It is worth knowing about them, though.
You will find a decent description at https://en.wikipedia.org/wiki/Shapefile#Limitations. The most important ones are as follows:
Column name length limit: The shapefile can only handle column names with a maximum length of 10 characters; pgsql2shp will not produce duplicate columns, though; if there were column names that would result in duplicates when truncated, then the tool will add a sequence number.
Maximum field length: The maximum field length is 255; pgsql2shp will simply truncate the data upon exporting.
In order to demonstrate the preceding limitations, let's quickly create a test PostGIS dataset. First, create the export schema if it does not exist:
CREATE SCHEMA IF NOT EXISTS data_export;

CREATE TABLE IF NOT EXISTS data_export.bad_bad_shp (
    id character varying,
    "time" timestamp with time zone,
    depth numeric,
    mag numeric,
    very_very_very_long_column_that_holds_magtype character varying,
    very_very_very_long_column_that_holds_place character varying,
    geom geometry
);

INSERT INTO data_export.bad_bad_shp
SELECT * FROM data_import.earthquakes_subset_with_geom LIMIT 1;

UPDATE data_export.bad_bad_shp
SET very_very_very_long_column_that_holds_magtype = 'Lorem ipsum dolor sit amet, consectetur adipiscing elit. Fusce id mauris eget arcu imperdiet tristique eu sed est. Quisque suscipit risus eu ante vestibulum hendrerit ut sed nulla. Nulla sit amet turpis ipsum. Curabitur nisi ante, luctus nec dignissim ut, imperdiet id tortor. In egestas, tortor ac condimentum sollicitudin, nisi lacus porttitor nibh, a tempus ex tellus in ligula. Donec pharetra laoreet finibus. Donec semper aliquet fringilla. Etiam faucibus felis ac neque facilisis vestibulum. Vivamus scelerisque at neque vel tincidunt.
Phasellus gravida, ipsum vulputate dignissim laoreet, augue lacus congue diam, at tempus augue dolor vitae elit.';
Having prepared a deliberately problematic dataset, let's now export it to SHP to see if our shapefile warnings were right:
pgsql2shp -h localhost -p 5434 -u postgres -f bad_bad_shp mastering_postgis data_export.bad_bad_shp
When you now open the exported shapefile in a GIS client of your choice, you will see our very, very long column names renamed to VERY_VERY_ and VERY_VE_01. The content of the very_very_very_long_column_that_holds_magtype field has also been truncated to 255 characters, and is now 'Lorem ipsum dolor sit amet, consectetur adipiscing elit. Fusce id mauris eget arcu imperdiet tristique eu sed est. Quisque suscipit risus eu ante vestibulum hendrerit ut sed nulla. Nulla sit amet turpis ipsum. Curabitur nisi ante, luctus nec dignissim ut'.
For the sake of completeness, we'll also export a table without geometry, so we are certain that pgsql2shp exports only a DBF file:
pgsql2shp -h localhost -p 5434 -u postgres -f a_lonely_dbf mastering_postgis "select id, place from data_import.earthquakes_subset_with_geom limit 1"
pgsql2shp GUI
We have already seen PgAdmin's GUI for importing shapefiles. As you surely remember, that GUI also has an Export tab. If you happen to encounter difficulties locating it in pgAdmin 4, try calling it from the shell/command line by executing shp2pgsql-gui. If for some reason it is not recognized, try to locate the utility in your DB directory under bin/postgisgui/shp2pgsql-gui.exe.
In order to export a shapefile from PostGIS, go to Plugins | PostGIS Shapefile and DBF Loader 2.2 (the version may vary); then you have to switch to the Export tab:
It is worth mentioning that you have some options to choose from when exporting. They are rather self-explanatory:
When you press the Export button, you can choose the output destination.
The log is displayed in the 'Log Window' area of the exporter GUI:
Exporting vector data using ogr2ogr
We have already seen a little preview of ogr2ogr exporting data when we made sure that our KML import had actually brought in the proper data. This time we'll expand on the subject a bit and also export a few more formats to give you an idea of how versatile a tool ogr2ogr is. In order to get some information on the tool, simply type the following in the console:
ogr2ogr
Alternatively, if you would like to get some more descriptive info, visit http://www.gdal.org/ogr2ogr.html. You could also type the following:
ogr2ogr --long-usage
The nice thing about ogr2ogr is that the tool is very flexible and offers some options that allow us to export exactly what we are after:
You can specify what data you would like to select by specifying the columns in a -select parameter.
The -where parameter lets you specify the filtering for your dataset in case you want to output only a subset of data.
Should you require more sophisticated output preparation logic, you can use a -sql parameter.
This is obviously not all there is. The usual gdal/ogr2ogr parameters are available too. You can reproject the data on the fly using the -t_srs parameter, and if, for some reason, the SRS of your data has not been clearly defined, you can use -s_srs to instruct ogr2ogr what the source coordinate system of the dataset being processed is. There are obviously advanced options too. Should you wish to clip your dataset to a specified bounding box, polygon, or coordinate system, have a look at the -clipsrc and -clipdst parameters and their variations. The last important parameter to know is -dsco, the dataset creation options parameter. It accepts values in the form of NAME=VALUE. When you want to pass more than one option this way, simply repeat the parameter.
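Putting several of these switches together, here is a hypothetical invocation assembled in Python (the output path, column names, filter, and the placeholder creation option are my own assumptions for illustration; valid NAME=VALUE pairs depend on the output driver):

```python
# Sketch: assemble an ogr2ogr command that exports a filtered, reprojected
# subset of a PostGIS table to GeoJSON, with a -dsco creation option.
def build_ogr2ogr_cmd(out_path, table, columns, where, target_srs):
    return [
        "ogr2ogr",
        "-f", "GeoJSON", out_path,
        "PG:host=localhost port=5434 user=postgres dbname=mastering_postgis",
        table,
        "-select", ",".join(columns),   # export only these columns
        "-where", where,                # filter rows on the fly
        "-t_srs", target_srs,           # reproject during export
        "-dsco", "SOME_OPTION=VALUE",   # placeholder; depends on the driver
    ]

cmd = build_ogr2ogr_cmd(
    "earthquakes.geojson",
    "data_import.earthquakes_subset_with_geom",
    ["id", "mag"],
    "mag > 5",
    "EPSG:4326",
)
print(" ".join(cmd))
# The list can be handed to subprocess.run(cmd) once ogr2ogr is on the PATH.
```

Building the command as a list of arguments (rather than one shell string) keeps quoting of the -where expression and the PG connection string unambiguous.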
The actual dataset creation options depend on the format used, so it is advised that you consult the appropriate format information pages available via the ogr2ogr website.
Summary
There are many ways of getting data out of a database. Some are PostgreSQL specific, some are PostGIS specific. The point is that you can use and mix any tools you prefer. There will be scenarios where simple data extraction procedures will do just fine; some other cases will require a more specialized setup, SQL or psql, or even writing custom code in external languages. I do hope this article gives you a toolset you can use with confidence in your daily activities.
Resources for Article:
Further resources on this subject:
Spatial Data Services [article]
Data Around Us [article]
R ─ Classification and Regression Trees [article]
Use in Real World Application

Packt
23 Jun 2017
11 min read
In this article by Giorgio Zarrelli, author of the book Mastering Bash, we are taking a step into the real world, creating something that can turn out handy for your daily routine; during this process we will have a look at the common pitfalls in coding and how to make our scripts reliable. Be it a short or a long script, we must always ask ourselves the same questions:
What do we really want to accomplish?
How much time do we have?
Do we have all the resources needed?
Do we have the knowledge required for the task?
(For more resources related to this topic, see here.)
We will start coding with a Nagios plugin, which will give us a broad understanding of how this monitoring system works and of how to make a script dynamically interact with other programs.
What is Nagios
Nagios is one of the most widely adopted open source IT infrastructure monitoring tools, and its most interesting feature is the fact that it does not know how to monitor anything. Well, it may sound like a joke, but Nagios can actually be defined as an evaluating core, which takes some information as input and reacts accordingly. How is this information gathered? That is not the main concern of this tool, and this leads us to an interesting point: Nagios leaves the task of getting the monitored data to an external plugin, which:
Knows how to connect to the monitored services
Knows how to collect the data from the monitored services
Knows how to evaluate the data
Informs Nagios if the values gathered are beyond or within the boundaries that raise an alarm
So, a plugin does a lot of things, and one might ask: what does Nagios itself do, then?
Imagine it as an exchange pod where information flows in and out and decisions are taken based on the configuration set; the core triggers the plugin to monitor a service, the plugin returns some information, and Nagios takes a decision about:
Whether to raise an alarm
Whether to send a notification
Whom to notify
For how long
Which action, if any, is taken in order to get back to normality
The core Nagios program does everything except actually knock at the door of a service, ask for information, and decide whether this information shows some issues or not. Planning must be done, but it can be fun.
Active and passive checks
To understand how to code a plugin, we first have to grasp how, on a broad scale, a Nagios check works. There are two different kinds of checks:
Active check
Based on a time range, or manually triggered, an active check sees a plugin actively connecting to a service and collecting information. A typical example could be a plugin to check the disk space: once invoked, it interfaces with (usually) the operating system, executes a df command, works on the output, extracts the value related to the disk space, evaluates it against some thresholds, and reports back a status, like OK, WARNING, CRITICAL, or UNKNOWN.
Passive check
In this case, Nagios does not trigger anything but waits to be contacted, by some means, by the service that must be monitored. It may seem quite confusing, so let's take a real-life example. How would you monitor whether a disk backup has completed successfully? One quick answer would be: knowing when the backup task starts and how long it lasts, we can define a time and invoke a script to check the task at that given hour. Nice, but when we plan something we must have a full understanding of how real life goes, and a backup is not our little pet in the living room; it's rather a beast that does what it wants. A backup can last a variable amount of time depending on unpredictable factors.
For instance, your typical backup task would copy 1 TB of data in 2 hours, starting at 03:00, out of a 6 TB disk. So, the next backup task would start at 03:00+02:00=05:00 AM, give or take some minutes; you set up an active check for it at 05:30 and it works well for a couple of months. Then, one early morning you receive a notification on your smartphone: the backup is CRITICAL. You wake up, connect to the backup console, and see that at 06:00 in the morning the backup task has not even been started by the console. Then you have to wait until 08:00 AM, when one of your colleagues shows up at the office, to find out that the day before, the disk you back up was filled with 2 extra TB of data due to an unscheduled data transfer. So, the backup task preceding the one you are monitoring lasted not a couple of hours but 6 hours, and the task you are monitoring then started at 09:30 AM. Long story short, your active check was fired too early; that is why it failed. Maybe you are tempted to move your schedule some hours ahead, but simply do not do it: these time slots are not sliding frames. If you move your check ahead, you should then move all the checks for the subsequent tasks ahead too. You do it; then in one week the project manager asks someone to delete the 2 TB in excess (they are of no more use for the project), your schedules are 2 hours ahead, and your monitoring is useless again. So, as we stressed before, planning and analyzing the context are the key factors in making a good script and, in this case, a good plugin. We have a service that does not run 24/7 like a web service or a mail service; what is specific to the backup is that it runs periodically, but we do not know exactly when. The best approach to this kind of monitoring is letting the service itself notify us when it has finished its task and what its outcome was.
That is usually accomplished using the ability of most backup programs to send a Simple Network Management Protocol (SNMP) trap to a destination to inform it of the outcome; in our case that destination would be the Nagios server, which would have been configured to receive the trap and analyze it. Add to this an event horizon, so that if we do not receive that specific trap in, let's say, 24 hours, we raise an alarm anyway, and you are covered: whenever the backup task gets completed, or when it times out, we receive a notification.
Nagios notifications flowchart
Return codes and thresholds
Before coding a plugin we must face some concepts that will be the stepping stones of our Nagios plugin coding, one of these being the return codes of the plugin itself. As we already discussed, once the plugin collects the data about how the service is doing, it evaluates this data and determines whether the situation falls under one of the following statuses:

Return code | Status   | Description
0           | OK       | The plugin checked the service and the results are inside the acceptable range
1           | WARNING  | The plugin checked the service and the results are above a warning threshold. We must keep an eye on the service
2           | CRITICAL | The plugin checked the service and the results are above a critical threshold, or the service is not responding. We must react now
3           | UNKNOWN  | Either we passed the wrong arguments to the plugin or there is some internal error in it

So, our plugin will check a service, evaluate the results, and, based on a threshold, will return to Nagios one of the values listed in the table and a meaningful message, as we can see in the description column in the following image. Notice the service check in red and the message in the figure above. In the image we can see that some checks are green, meaning OK, and they have an explicative message in the description section: what we see in this section is the output of the plugin written to stdout, and it is what we will craft as a response to Nagios.
Pay attention to the ssh check: it is red, and it is failing because it is checking the service at the default port, which is 22, but on this server the ssh daemon is listening on a different port. This leads us to a consideration: our plugin will need a command-line parser able to receive some configuration options and some threshold limits as well, because we need to know what to check, where to check it, and what the acceptable working limits for a service are:
Where: In Nagios there can be a host without service checks (except for the implicit host-alive check carried out by a ping), but no services without a host to be performed on. So any plugin must receive on the command line an indication of the host it must be run against; be it even a dummy host, there must be one.
How: This is where our coding comes in; we will have to write the lines of code that instruct the plugin how to connect to the server, query it, and collect and parse the answer.
What: We must instruct the plugin, usually with some meaningful options on the command line, on what the acceptable working limits are, so that it can evaluate them and decide whether to reply with an OK, WARNING, or CRITICAL message.
That is all for our script; whom to notify, when, how, for how many times, and so forth are tasks carried out by the core. A Nagios plugin is unaware of all of this. What it really must know for effective monitoring is what the correct values are that identify a working service. We can pass our script two different kinds of values:
Range: A series of numeric values with a starting and ending point, like from 3 to 7, or from one number to infinity.
Threshold: A range with an associated alert level.
So, when our plugin performs its check, it collects a numeric value that is within or outside a range; based on the threshold we impose and on the evaluation, it replies to Nagios with a return code and a message. How do we specify ranges on the command line?
Essentially in the following way:
[@]start_value:end_value
If the range starts from 0, the part to the left of the : can be omitted.
start_value must always be a lower number than end_value.
If the range is given as start_value: alone, it means from that number to infinity.
Negative infinity can be specified using ~.
An alert is generated when the collected value resides outside the specified range, endpoints included.
If @ is specified, the alert is generated if the value resides inside the range.
Let's see some practical examples of how we would call our script, imposing some thresholds:

Plugin call               | Meaning
./my_plugin -c 10         | CRITICAL if less than 0 or higher than 10
./my_plugin -w 10:20      | WARNING if less than 10 or higher than 20
./my_plugin -w ~:15 -c 16 | WARNING if higher than 15, CRITICAL if less than 0 or higher than 16
./my_plugin -c 35:        | CRITICAL if the value collected is below 35
./my_plugin -w @100:200   | WARNING if the value is from 100 to 200, OK otherwise

We covered the basic requirements for our plugin, which in its simplest form should be called with the following syntax:
./my_plugin -h hostaddress|hostname -w value -c value
We already talked about the need to relate a check to a host, and we can do this using either a host name or a host address. It is up to us which to use, but we will not fill in this piece of information ourselves, because it will be drawn from the service configuration as a standard macro. We just introduced a new concept, the service configuration, which is essential in making our script work in Nagios.
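To make the range grammar and the return codes concrete, here is a hedged sketch in Python (Nagios plugins may be written in any language that can print a status line to stdout and exit with the right code; the function names and the hard-coded sample value are mine for illustration, and a real plugin would finish with sys.exit(code)):

```python
# Nagios return codes, as listed in the table above
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

def parse_range(spec):
    """Parse a Nagios-style range: [@]start:end; a bare number N means 0:N."""
    inside = spec.startswith("@")
    if inside:
        spec = spec[1:]
    if ":" in spec:
        start_s, end_s = spec.split(":", 1)
    else:
        start_s, end_s = "0", spec            # bare number means 0..N
    start = float("-inf") if start_s == "~" else float(start_s or 0)
    end = float("inf") if end_s == "" else float(end_s)
    return inside, start, end

def alert(value, spec):
    """True if the collected value should raise an alert for this range."""
    inside, start, end = parse_range(spec)
    in_range = start <= value <= end          # endpoints included
    return in_range if inside else not in_range

def check(value, warn_spec, crit_spec):
    """Evaluate a collected value against -w and -c thresholds, Nagios style."""
    if alert(value, crit_spec):
        return CRITICAL, "CRITICAL - value is %s" % value
    if alert(value, warn_spec):
        return WARNING, "WARNING - value is %s" % value
    return OK, "OK - value is %s" % value

# ./my_plugin -w 10:20 -c 0:30 with a collected value of 25:
code, message = check(25, "10:20", "0:30")
print(code, message)  # → 1 WARNING - value is 25
```

The message printed to stdout is what shows up in the description column of the Nagios console, while the exit code is what the core acts upon.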
Getting Started with Python and Machine Learning

Packt
23 Jun 2017
34 min read
In this article by Ivan Idris, Yuxi (Hayden) Liu, and Shoahoa Zhang, authors of the book Python Machine Learning By Example, we cover basic machine learning concepts. If you need more information, then your local library, the Internet, and Wikipedia in particular should help you further. The topics that will be covered in this article are as follows:
(For more resources related to this topic, see here.)
What is machine learning and why do we need it?
A very high level overview of machine learning
Generalizing with data
Overfitting and the bias variance trade off
Dimensions and features
Preprocessing, exploration, and feature engineering
Combining models
Installing software and setting up
Troubleshooting and asking for help
What is machine learning and why do we need it?
Machine learning is a term coined around 1960, composed of two words: machine, corresponding to a computer, robot, or other device, and learning, an activity which most humans are good at. So we want machines to learn, but why not leave learning to the humans? It turns out that there are many problems involving huge datasets, for instance, where it makes sense to let computers do all the work. In general, of course, computers and robots don't get tired, don't have to sleep, and may be cheaper. There is also an emerging school of thought called active learning, or human-in-the-loop, which advocates combining the efforts of machine learners and humans. The idea is that there are routine boring tasks more suitable for computers, and creative tasks more suitable for humans. According to this philosophy, machines are able to learn, but not everything. Machine learning doesn't involve the traditional type of programming that uses business rules. A popular myth says that the majority of the code in the world has to do with simple rules, possibly programmed in Cobol, which cover the bulk of all the possible scenarios of client interactions. So why can't we just hire many coders and continue programming new rules?
One reason is that the cost of defining and maintaining rules becomes very expensive over time. A lot of the rules involve matching strings or numbers, and change frequently. It's much easier and more efficient to let computers figure out everything themselves from data. Also, the amount of data is increasing, and actually the rate of growth is itself accelerating. Nowadays the floods of textual, audio, image, and video data are hard to fathom. The Internet of Things is a recent development of a new kind of Internet, which interconnects everyday devices. The Internet of Things will bring data from household appliances and autonomous cars to the forefront. The average company these days has mostly human clients, but, for instance, social media companies tend to have many bot accounts. This trend is likely to continue, and we will have more machines talking to each other. An application of machine learning that you may be familiar with is the spam filter, which filters e-mails considered to be spam. Another is online advertising, where ads are served automatically based on information advertisers have collected about us. Yet another application of machine learning is search engines. Search engines extract information about web pages, so that you can efficiently search the Web. Many online shops and retail companies have recommender engines, which recommend products and services using machine learning. The list of applications is very long and also includes fraud detection, medical diagnosis, sentiment analysis, and financial market analysis. In the 1983 movie War Games, a computer made life-and-death decisions that could have resulted in World War III. As far as I know, technology wasn't able to pull off such feats at the time. However, in 1997 the Deep Blue supercomputer did manage to beat a world chess champion. In 2005, a Stanford self-driving car drove by itself for more than 130 kilometers in a desert.
In 2007, the car of another team drove through regular traffic for more than 50 kilometers. In 2011, the Watson computer won a quiz against human opponents. In 2016, the AlphaGo program beat one of the best Go players in the world. If we assume that computer hardware is the limiting factor, then we can try to extrapolate into the future. Ray Kurzweil did just that, and according to him, we can expect human-level intelligence around 2029.
A very high level overview of machine learning
Machine learning is a subfield of artificial intelligence, a field of computer science concerned with creating systems which mimic human intelligence. Software engineering is another field of computer science, and we can label Python programming as a type of software engineering. Machine learning is also closely related to linear algebra, probability theory, statistics, and mathematical optimization. We use optimization and statistics to find the best models which explain our data. If you are not feeling confident about your mathematical knowledge, you are probably wondering how much time you should spend learning or brushing up. Fortunately, to get machine learning to work for you, most of the time you can ignore the mathematical details, as they are implemented by reliable Python libraries. You do need to be able to program. As far as I know, if you want to study machine learning, you can enroll in computer science, artificial intelligence, and, more recently, data science master's programs. There are various data science bootcamps; however, the selection for those is stricter and the course duration is often just a couple of months. You can also opt for the free massive open online courses available on the Internet. Machine learning is not only a skill, but also a bit of a sport. You can compete in several machine learning competitions, sometimes for decent cash prizes.
However, to win these competitions, you may need to utilize techniques which are only useful in the context of competitions, and not in the context of trying to solve a business problem. A machine learning system requires inputs; these can be numerical, textual, visual, or audiovisual data. The system usually has outputs, for instance, a floating point number, an integer representing a category (also called a class), or the acceleration of a self-driving car. We need to define (or have defined for us by an algorithm) an evaluation function called a loss or cost function, which tells us how well we are learning. In this setup, we create an optimization problem with the goal of learning in the most efficient way. For instance, if we fit data to a line, also called linear regression, the goal is to have our line be as close as possible to the data points on average. We can have unlabeled data, which we want to group or explore; this is called unsupervised learning. Unsupervised learning can be used to detect anomalies, such as fraud or defective equipment. We can also have labeled examples to use for training; this is called supervised learning. The labels are usually provided by human experts, but if the problem is not too hard, they may also be produced by members of the public through crowdsourcing, for instance. Supervised learning is very common, and we can further subdivide it into regression and classification. Regression trains on continuous target variables, while classification attempts to find the appropriate class label. If not all examples are labeled, but some still are, we have semi-supervised learning. A chess playing program usually applies reinforcement learning: a type of learning where the program evaluates its own performance by, for instance, playing against itself or earlier versions of itself.
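The line-fitting objective mentioned above can be made concrete with a few lines of pure Python; this is the closed-form least-squares solution for a single feature (the data points are made up for illustration):

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b (closed form, one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The slope that minimizes the average squared distance to the points
    a = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    b = mean_y - a * mean_x
    return a, b

a, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])   # points lying on y = 2x + 1
print(a, b)  # → 2.0 1.0
```

In practice a library such as scikit-learn would do this (and much more) for us, but the underlying idea is exactly this minimization of a loss function.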
We can roughly categorize machine learning algorithms into logic-based learning, statistical learning, artificial neural networks, and genetic algorithms. In fact, we have a whole zoo of algorithms with popularity varying over time. The logic-based systems were the first to be dominant. They used basic rules specified by human experts, and with these rules the systems tried to reason using formal logic. In the mid-1980s, artificial neural networks came to the foreground, only to be pushed aside by statistical learning systems in the 1990s. Artificial neural networks imitate animal brains, and consist of interconnected neurons that are in turn an imitation of biological neurons. Genetic algorithms were pretty popular in the 1990s (or at least that was my impression). Genetic algorithms mimic the biological process of evolution. We are currently (2016) seeing a revolution in deep learning, which we may consider to be a rebranding of neural networks. The term deep learning was coined around 2006, and it refers to deep neural networks with many layers. The breakthrough in deep learning is, among other things, caused by the availability of graphics processing units (GPUs), which speed up computation considerably. GPUs were originally intended to render video games, and are very good at parallel matrix and vector algebra. It is believed that deep learning resembles the way humans learn, and therefore it may be able to deliver on the promise of sentient machines. You may have heard of Moore's law, an empirical law which claims that computer hardware improves exponentially with time. The law was first formulated by Gordon Moore, the co-founder of Intel, in 1965. According to the law, the number of transistors on a chip should double every two years. In the following graph, you can see that the law holds up nicely (the size of the bubbles corresponds to the average transistor count in GPUs): The consensus seems to be that Moore's law should continue to be valid for a couple of decades.
This gives some credibility to Ray Kurzweil's prediction of achieving true machine intelligence in 2029. We will encounter many of the types of machine learning later in this book. Most of the content is about supervised learning, with some examples of unsupervised learning. The popular Python libraries support all these types of learning.
Generalizing with data
The good thing about data is that we have a lot of it in the world. The bad thing is that it is hard to process. The challenges stem from the diversity and noisiness of the data. We as humans usually process data coming into our ears and eyes. These inputs are transformed into electrical or chemical signals. On a very basic level, computers and robots also work with electrical signals. These electrical signals are then translated into ones and zeroes. However, we program in Python in this book, and on that level we normally represent the data as numbers, images, or text. Actually, images and text are not very convenient, so we need to transform them into numerical values. Especially in the context of supervised learning, we have a scenario similar to studying for an exam. We have a set of practice questions and the actual exam. We should be able to answer exam questions without knowing the answers to them. This is called generalization: we learn something from our practice questions and hopefully are able to apply this knowledge to other similar questions. Finding good representative examples is not always an easy task, depending on the complexity of the problem we are trying to solve and how generic the solution needs to be. An old-fashioned programmer would talk to a business analyst or another expert, and then implement a rule that adds a certain value multiplied by another value, corresponding, for instance, to tax rules. In a machine learning setup, we can give the computer example input values and example output values.
Or, if we are more ambitious, we can feed the program the actual tax texts and let the machine process the data further, just like an autonomous car doesn't need a lot of human input. This means implicitly that there is some function, for instance a tax formula, that we are trying to find. In physics we have almost the same situation. We want to know how the universe works, and formulate laws in a mathematical language. Since we don't know the actual function, all we can do is measure what our error is, and try to minimize it. In supervised learning we compare our results against the expected values. In unsupervised learning we measure our success with related metrics. For instance, we want clusters of data to be well defined. In reinforcement learning, a program evaluates its moves, for example in a chess game, using some predefined function.
Overfitting and the bias variance trade off
Overfitting (one word) is such an important concept that I decided to start discussing it very early in this book. If you go through many practice questions for an exam, you may start to find ways to answer questions which have nothing to do with the subject material. For instance, you may find that if you have the word potato in the question, the answer is A, even if the subject has nothing to do with potatoes. Or, even worse, you may memorize the answers to each question verbatim. We can then score high on the practice questions, which are called the train set in machine learning. However, we will score very low on the exam questions, which are called the test set in machine learning. This phenomenon of memorization leads to overfitting. Overfitting almost always means that we are trying too hard to extract information from the data (its random quirks), and using more training data will not help. The opposite scenario is called underfitting. When we underfit, we don't have a very good result on the train set, but we also don't have a very good result on the test set.
Underfitting may indicate that we are not using enough data, or that we should try harder to do something useful with the data. Obviously, we want to avoid both scenarios. Machine learning errors are the sum of bias and variance. Variance measures how much the error varies. Imagine that we are trying to decide what to wear based on the weather outside. If you have grown up in a country with a tropical climate, you may be biased towards always wearing summer clothes. If you lived in a cold country, you may decide to go to a beach in Bali wearing winter clothes. Both decisions have high bias. You may also just wear random clothes that are not appropriate for the weather outside; this is an outcome with high variance. It turns out that when we try to minimize bias or variance, we tend to increase the other: a concept known as the bias-variance trade-off. The expected error is given by the following equation:
Expected error = Bias^2 + Variance + Irreducible error
The last term is the irreducible error.
Cross-validation
Cross-validation is a technique which helps to avoid overfitting. Just like we would have a separation of practice questions and exam questions, we also have training sets and test sets. It's important to keep the data separated and only use the test set in the final stage. There are many cross-validation schemes in use. The most basic setup splits the data at a specified percentage, for instance 75% train data and 25% test data. We can also leave out a fixed number of observations in multiple rounds, so that these items are in the test set and the rest of the data is in the train set. For instance, we can apply leave-one-out cross-validation (LOOCV) and let each datum be in the test set once. For a large dataset, LOOCV requires as many rounds as there are data items, and can therefore be too slow. k-fold cross-validation is computationally cheaper than LOOCV; it randomly splits the data into k (a positive integer) folds. One of the folds becomes the test set, and the rest of the data becomes the train set.
We repeat this process k times, with each fold being the designated test set once. Finally, we average the k train and test errors for the purpose of evaluation. Common values for k are five and ten. The following table illustrates the setup for five folds:

Iteration  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5
1          Test    Train   Train   Train   Train
2          Train   Test    Train   Train   Train
3          Train   Train   Test    Train   Train
4          Train   Train   Train   Test    Train
5          Train   Train   Train   Train   Test

We can also randomly split the data into train and test sets multiple times. The problem with this scheme is that some items may never end up in the test set, while others may be selected for the test set multiple times. Nested cross-validation is a combination of cross-validations and consists of the following parts:

- The inner cross-validation performs optimization to find the best fit, and can be implemented as k-fold cross-validation.
- The outer cross-validation is used to evaluate performance and perform statistical analysis.

Regularization

Regularization, like cross-validation, is a general technique to fight overfitting. Regularization adds extra terms to the error function we are trying to minimize, in order to penalize complex models. For instance, if we are trying to fit a curve with a high-order polynomial, we may use regularization to reduce the influence of the higher degrees, thereby effectively reducing the order of the polynomial. Simpler methods are to be preferred, at least according to the principle of Occam's razor. William of Ockham was a monk and philosopher who, around 1320, came up with the idea that the simplest hypothesis that fits the data should be preferred. One justification is that there are fewer simple models than complex models. For instance, intuitively we know that there are more higher-order polynomial models than linear ones. The reason is that a line (f(x) = ax + b) is governed by only two numbers – the intercept b and the slope a.
The possible coefficients for a line span a two-dimensional space. A quadratic polynomial adds an extra coefficient for the quadratic term, so its coefficients span a three-dimensional space. Therefore, it is less likely that we find a linear model than a more complex model, because the search space for linear models is much smaller (although it is infinite). And of course, simpler models are just easier to use and require less computation time.

We can also stop a program early as a form of regularization. If we give a machine learner less time, it is more likely to produce a simpler model, and we hope it is less likely to overfit. Of course, we have to do this in an ordered fashion, which means that the algorithm should be aware of the possibility of early termination.

Dimensions and features

We typically represent the data as a grid of numbers (a matrix). Each column represents a variable, which we call a feature in machine learning. In supervised learning one of the variables is actually not a feature, but the label that we are trying to predict, and each row is an example that we can use for training or testing. The number of features corresponds to the dimensionality of the data. Our machine learning approach depends on the number of dimensions versus the number of examples. For instance, text and image data are very high-dimensional, while stock market data has relatively few dimensions. Fitting high-dimensional data is computationally expensive and is also prone to overfitting due to the high complexity. Higher dimensions are also impossible to visualize, so we can't use simple diagnostic methods. Not all features are useful – some may only add randomness to our results. It is therefore often important to do good feature selection. Feature selection is a form of dimensionality reduction. Some machine learning algorithms are actually able to perform feature selection automatically.
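As a sketch of what such automatic selection can look like, scikit-learn's recursive feature elimination (RFE) repeatedly fits a model and drops the weakest features. The dataset here is synthetic, generated so that only three of the ten features carry information:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Synthetic data: 10 features, of which only 3 are informative.
X, y = make_classification(n_samples=200, n_features=10,
                           n_informative=3, n_redundant=0,
                           random_state=0)

selector = RFE(LogisticRegression(max_iter=1000), n_features_to_select=3)
selector.fit(X, y)
print(selector.support_)  # boolean mask of the features that were kept
```

Because RFE wraps a full model, it evaluates features in combination rather than judging each feature's correlation with the target in isolation.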
We should be careful not to omit features that do contain information. Some practitioners evaluate the correlation of a single feature with the target variable. In my opinion you shouldn't rely on that, because the correlation of one variable with the target doesn't tell you much in itself; instead, use an off-the-shelf algorithm and evaluate the results of different feature subsets. In principle, feature selection boils down to multiple binary decisions about whether to include a feature or not. For n features we get 2^n feature sets, which can be a very large number. For example, for 10 features we have 1,024 possible feature sets (for instance, if we are deciding what clothes to wear, the features can be temperature, rain, the weather forecast, where we are going, and so on). At a certain point brute-force evaluation becomes infeasible. We will discuss better methods in this book. Basically, we have two options: we either start with all the features and remove features iteratively, or we start with a minimal set of features and add features iteratively. We then take the best feature sets from each iteration and compare them.

Another common dimensionality reduction approach is to transform high-dimensional data into a lower-dimensional space. This transformation leads to information loss, but we can keep the loss to a minimum. We will cover this in more detail later on.

Preprocessing, exploration, and feature engineering

Data mining, a buzzword in the 1990s, is the predecessor of data science (the science of data). One of the methodologies popular in the data mining community is called the cross-industry standard process for data mining (CRISP-DM). CRISP-DM was created in 1996 and is still used today. I am not endorsing CRISP-DM; however, I like its general framework.
CRISP-DM consists of the following phases, which are not mutually exclusive and can occur in parallel:

- Business understanding: This phase is often taken care of by specialized domain experts. Usually a business person formulates a business problem, such as selling more units of a certain product.
- Data understanding: This phase may also require input from domain experts; however, a technical specialist often needs to get involved more than in the business understanding phase. The domain expert may be proficient with spreadsheet programs, but have trouble with complicated data. In this book I usually call this phase exploration.
- Data preparation: This is also a phase where a domain expert with only Excel know-how may not be able to help you. This is the phase where we create our training and test datasets. In this book I usually call this phase preprocessing.
- Modeling: This is the phase that most people associate with machine learning. In this phase we formulate a model and fit our data.
- Evaluation: In this phase, we evaluate the model and the data to check whether we were able to solve our business problem.
- Deployment: This phase usually involves setting up the system in a production environment (it is considered good practice to have a separate production system). Typically this is done by a specialized team.

When we learn, we require high-quality learning material. We can't learn from gibberish, so we automatically ignore anything that doesn't make sense. A machine learning system isn't able to recognize gibberish, so we need to help it by cleaning the input data. It is often claimed that cleaning the data forms a large part of machine learning. Sometimes cleaning is already done for us, but you shouldn't count on it. To decide how to clean the data, we need to be familiar with the data. There are some projects that try to automatically explore the data and do something intelligent, like producing a report.
For now, unfortunately, we don't have a solid solution, so you need to do some manual work. We can do two things, which are not mutually exclusive: first, scan the data, and second, visualize the data. This also depends on the type of data we are dealing with – whether we have a grid of numbers, images, audio, text, or something else. In the end, a grid of numbers is the most convenient form, and we will always work towards having numerical features. We want to know whether features miss values, how the values are distributed, and what types of features we have. Values can approximately follow a normal distribution, a binomial distribution, a Poisson distribution, or another distribution altogether. Features can be binary: either yes or no, positive or negative, and so on. They can also be categorical, pertaining to a category – for instance, continents (Africa, Asia, Europe, Latin America, North America, and so on). Categorical variables can also be ordered, for instance high, medium, and low. Features can also be quantitative, for example temperature in degrees or price in dollars.

Feature engineering is the process of creating or improving features. It is more of a dark art than a science. Features are often created based on common sense, domain knowledge, or prior experience. There are certain common techniques for feature creation; however, there is no guarantee that creating new features will improve your results. We are sometimes able to use the clusters found by unsupervised learning as extra features. Deep neural networks are often able to create features automatically.

Missing values

Quite often we are missing values for certain features. This can happen for various reasons: it can be inconvenient, expensive, or even impossible to always have a value. Maybe we were not able to measure a certain quantity in the past because we didn't have the right equipment, or we just didn't know that the feature was relevant. However, we are stuck with missing values from the past.
Sometimes it's easy to figure out that we are missing values: we can discover this just by scanning the data, or by counting the number of values we have for a feature and comparing it to the number of values we expect based on the number of rows. Certain systems encode missing values with a special value, for example 999999. This makes sense if the valid values are much smaller than 999999. If you are lucky, you will have information about the features provided by whoever created the data, in the form of a data dictionary or metadata. Once we know that we are missing values, the question arises of how to deal with them. The simplest answer is to just ignore them. However, some algorithms can't deal with missing values, and the program will just refuse to continue. In other circumstances, ignoring missing values will lead to inaccurate results. The second solution is to substitute missing values with a fixed value – this is called imputing. We can impute the arithmetic mean, median, or mode of the valid values of a certain feature. Ideally, we will have a somewhat reliable relation between features, or within a variable. For instance, we may know the seasonal averages of temperature for a certain location and be able to impute guesses for missing temperature values given a date.

Label encoding

Humans are able to deal with various types of values. Machine learning algorithms, with some exceptions, need numerical values. If we offer a string such as Ivan, the program will not know what to do unless we are using specialized software. In this example we are dealing with a categorical feature – names, probably. We can consider each unique value to be a label. (In this particular example, we also need to decide what to do with the case – is Ivan the same as ivan?) We can then replace each label with an integer – label encoding. This approach can be problematic, because the learner may conclude that there is an ordering.
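A minimal label encoding sketch with scikit-learn (the names here are made up for illustration, and we resolve the case question by lowercasing first):

```python
from sklearn.preprocessing import LabelEncoder

names = ['Ivan', 'Anna', 'ivan', 'Anna']  # note the case difference
names = [n.lower() for n in names]        # decide: 'Ivan' equals 'ivan'

encoder = LabelEncoder()
encoded = encoder.fit_transform(names)
print(list(encoder.classes_))  # ['anna', 'ivan']
print(list(encoded))           # [1, 0, 1, 0]
```

Note the pitfall mentioned above: a learner fed these integers may conclude that anna < ivan, even though no such ordering exists.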
One-hot encoding

The one-of-K or one-hot encoding scheme uses dummy variables to encode categorical features. Originally it was applied to digital circuits. The dummy variables have binary values like bits, so they take the value zero or one (equivalent to true or false). For instance, if we want to encode continents, we will have dummy variables such as is_asia, which will be true if the continent is Asia and false otherwise. In general, we need as many dummy variables as there are unique labels minus one. We can determine one of the labels automatically from the dummy variables, because the dummy variables are exclusive: if the dummy variables all have a false value, then the correct label is the label for which we don't have a dummy variable. The following table illustrates the encoding for continents:

                is_africa  is_asia  is_europe  is_south_america  is_north_america
Africa          True       False    False      False             False
Asia            False      True     False      False             False
Europe          False      False    True       False             False
South America   False      False    False      True              False
North America   False      False    False      False             True
Other           False      False    False      False             False

The encoding produces a matrix (grid of numbers) with lots of zeroes (false values) and occasional ones (true values). This type of matrix is called a sparse matrix. The sparse matrix representation is handled well by the SciPy package and shouldn't be an issue. We will discuss the SciPy package later in this article.

Scaling

Values of different features can differ by orders of magnitude. Sometimes this means that the larger values dominate the smaller values. Whether this is a problem depends on the algorithm we are using; for certain algorithms to work properly, we are required to scale the data. There are several common strategies that we can apply. Standardization removes the mean of a feature and divides by the standard deviation. If the feature values are normally distributed, we will get a Gaussian centered around zero with a variance of one.
If the feature values are not normally distributed, we can instead remove the median and divide by the interquartile range, which is the range between the first and third quartiles (the 25th and 75th percentiles). Another common strategy is scaling features to a range, typically between zero and one.

Polynomial features

If we have two features, a and b, we can suspect that there is a polynomial relation, such as a² + ab + b². We can consider each term in the sum to be a feature – in this example we have three features. The product ab in the middle is called an interaction. An interaction doesn't have to be a product – although this is the most common choice, it can also be a sum, a difference, or a ratio. If we are using a ratio, then to avoid dividing by zero we should add a small constant to the divisor and dividend. The number of features and the order of the polynomial for a polynomial relation are not limited. However, if we follow Occam's razor, we should avoid higher-order polynomials and interactions of many features. In practice, complex polynomial relations tend to be more difficult to compute and don't add much value, but if you really need better results, they may be worth considering.

Power transformations

Power transforms are functions that we can use to transform numerical features into a more convenient form, for instance to conform better to a normal distribution. A very common transform for values that vary by orders of magnitude is to take the logarithm. Taking the logarithm of zero or of negative values is not defined, so we may need to add a constant to all the values of the related feature before taking the logarithm. We can also take the square root of positive values, square the values, or compute any other power we like. Another useful transform is the Box-Cox transform, named after its creators. The Box-Cox transform attempts to find the best power needed to transform the original data into data that is closer to the normal distribution.
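These transformations are available off the shelf; a sketch assuming scikit-learn and SciPy are installed (RobustScaler implements the median/interquartile-range variant described above, and scipy.stats.boxcox searches for the best power automatically):

```python
import numpy as np
from scipy.stats import boxcox
from sklearn.preprocessing import MinMaxScaler, RobustScaler, StandardScaler

X = np.array([[1.0], [2.0], [4.0], [8.0], [1000.0]])  # values spanning orders of magnitude

print(StandardScaler().fit_transform(X).ravel())  # zero mean, unit variance
print(RobustScaler().fit_transform(X).ravel())    # median removed, scaled by IQR
print(MinMaxScaler().fit_transform(X).ravel())    # scaled to the [0, 1] range

# Box-Cox requires strictly positive values; it returns the transformed
# data and the power parameter it found.
transformed, best_lambda = boxcox(X.ravel())
print(best_lambda)
```

Each scaler is fitted on the data first (computing means, medians, or ranges) and then applied, so the same fitted scaler can later transform test data consistently.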
The transform is defined as follows, for strictly positive values y and a power parameter λ:

y(λ) = (y^λ − 1) / λ   if λ ≠ 0
y(λ) = ln(y)           if λ = 0

Binning

Sometimes it's useful to separate feature values into several bins. For example, we may only be interested in whether it rained on a particular day. Given the precipitation values, we can binarize them, so that we get a true value if the precipitation value is not zero, and a false value otherwise. We can also use statistics to divide values into high, medium, and low bins. The binning process inevitably leads to loss of information. However, depending on your goals, this may not be an issue, and it may actually reduce the chance of overfitting. Certainly there will be improvements in speed and in memory or storage requirements.

Combining models

In (high) school we sit together with other students and learn together, but we are not supposed to work together during the exam. The reason is, of course, that teachers want to know what we have learned, and if we just copy exam answers from friends, we may have not learned anything. Later in life we discover that teamwork is important. For example, this book is the product of a whole team, or possibly a group of teams. Clearly a team can produce better results than a single person. However, this goes against Occam's razor, since a single person can come up with simpler theories than what a team will produce. In machine learning we nevertheless prefer to have our models cooperate, with the following schemes:

- Bagging
- Boosting
- Stacking
- Blending
- Voting and averaging

Bagging

Bootstrap aggregating, or bagging, is an algorithm introduced by Leo Breiman in 1994, which applies bootstrapping to machine learning problems. Bootstrapping is a statistical procedure that creates datasets from existing data by sampling with replacement. Bootstrapping can be used to analyze the possible values that the arithmetic mean, the variance, or another quantity can assume.
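Bootstrapping the arithmetic mean, for instance, takes only a few lines of NumPy (the sample here is synthetic):

```python
import numpy as np

rng = np.random.RandomState(42)
data = rng.normal(loc=5.0, scale=2.0, size=100)  # original sample

# Draw 1,000 bootstrap samples (sampling with replacement, same size
# as the original sample) and record the mean of each.
means = [rng.choice(data, size=len(data), replace=True).mean()
         for _ in range(1000)]

low, high = np.percentile(means, [2.5, 97.5])
print(low, high)  # a rough 95% interval for the mean
```

The spread of the bootstrap means tells us how much the estimate would vary if we could collect the sample again.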
The algorithm aims to reduce the chance of overfitting with the following steps:

1. Generate new training sets from the input training data by sampling with replacement.
2. Fit a model to each generated training set.
3. Combine the results of the models by averaging or majority voting.

Boosting

In the context of supervised learning, we define weak learners as learners that are just a little better than a baseline, such as randomly assigning classes or average values. Although weak learners are weak individually, like ants, together they can do amazing things, just as ants can. It makes sense to take into account the strength of each individual learner using weights. This general idea is called boosting. There are many boosting algorithms, which differ mostly in their weighting scheme. If you have studied for an exam, you may have applied a similar technique by identifying the types of practice questions you had trouble with and focusing on the hard problems. Face detection in images is based on a specialized framework that also uses boosting. Detecting faces in images or videos is a supervised learning problem: we give the learner examples of regions containing faces. There is an imbalance, since we usually have far more regions (about ten thousand times more) that don't contain faces. A cascade of classifiers progressively filters out negative image areas stage by stage. In each progressive stage, the classifiers use progressively more features on fewer image windows. The idea is to spend the most time on image patches that do contain faces. In this context boosting is used to select features and combine results.

Stacking

Stacking takes the outputs of machine learning estimators and uses those as inputs for another algorithm. You can, of course, feed the output of the higher-level algorithm to yet another predictor. It is possible to use any arbitrary topology but, for practical reasons, you should try a simple setup first, as also dictated by Occam's razor.
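A minimal stacking sketch with scikit-learn (the choice of base models is arbitrary, and for brevity the meta-model is fitted and scored on the same held-out data; in practice you would keep a separate test set for the final evaluation):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, random_state=0)
X_train, X_holdout, y_train, y_holdout = train_test_split(
    X, y, test_size=0.5, random_state=0)

# Fit the base models on one half of the data.
base_models = [DecisionTreeClassifier(random_state=0),
               KNeighborsClassifier()]
for model in base_models:
    model.fit(X_train, y_train)

# The base models' class probabilities on the other half become
# the input features of the higher-level model.
meta_features = np.column_stack(
    [model.predict_proba(X_holdout)[:, 1] for model in base_models])

meta_model = LogisticRegression().fit(meta_features, y_holdout)
print(meta_model.score(meta_features, y_holdout))
```

Splitting the data this way matters: if the base models produced their predictions on the same data they were trained on, the meta-model would learn from overly optimistic inputs.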
Blending

Blending was introduced by the winners of the one-million-dollar Netflix prize. Netflix organized a contest with the challenge of finding the best model to recommend movies to their users. Netflix users can rate a movie with one to five stars. Obviously, each user wasn't able to rate each movie, so the user-movie matrix is sparse. Netflix published an anonymized training and test set. Later, researchers found a way to correlate the Netflix data with IMDb data, so for privacy reasons the Netflix data is no longer available. The grand prize was won in 2009 by a group of teams combining their models. Blending is a form of stacking; the final estimator in blending, however, trains only on a small portion of the training data.

Voting and averaging

We can arrive at our final answer through majority voting or averaging. It's also possible to assign different weights to each model in the ensemble. For averaging, we can also use the geometric mean or the harmonic mean instead of the arithmetic mean. Usually, combining the results of models that are highly correlated with each other doesn't lead to spectacular improvements. It's better to somehow diversify the models by using different features or different algorithms. If we find that two models are strongly correlated, we may, for example, decide to remove one of them from the ensemble and increase the weight of the other model proportionally.

Installing software and setting up

For most projects in this book we need scikit-learn (see http://scikit-learn.org/stable/install.html) and matplotlib (see http://matplotlib.org/users/installing.html). Both packages require NumPy, but we also need SciPy for sparse matrices, as mentioned before. The scikit-learn library is a machine learning package which is optimized for performance, as a lot of the code runs almost as fast as equivalent C code. The same statement is true for NumPy and SciPy.
There are various ways to speed up the code; however, they are out of the scope of this book, so if you want to know more, please consult the documentation. Matplotlib is a plotting and visualization package. We can also use the seaborn package for visualization; seaborn uses matplotlib under the hood. There are several other Python visualization packages that cover different usage scenarios. Matplotlib and seaborn are mostly useful for the visualization of small to medium datasets. The NumPy package offers the ndarray class and various useful array functions. The ndarray class is an array that can be one-dimensional or multi-dimensional. This class also has several subclasses, representing matrices, masked arrays, and heterogeneous record arrays. In machine learning we mainly use NumPy arrays to store feature vectors or matrices composed of feature vectors. SciPy uses NumPy arrays and offers a variety of scientific and mathematical functions. We also require the pandas library for data wrangling.

In this book we will use Python 3. As you may know, Python 2 will no longer be supported after 2020, so I strongly recommend switching to Python 3. If you are stuck with Python 2, you should still be able to modify the example code to work for you. In my opinion, the Anaconda Python 3 distribution is the best option. Anaconda is a free Python distribution for data analysis and scientific computing. It has its own package manager, conda. The distribution includes more than 200 Python packages, which makes it very convenient. For casual users, the Miniconda distribution may be the better choice; Miniconda contains the conda package manager and Python. The procedures to install Anaconda and Miniconda are similar; obviously, Anaconda takes more disk space. Follow the instructions from the Anaconda website at http://conda.pydata.org/docs/install/quick.html. First, you have to download the appropriate installer for your operating system and Python version.
Sometimes you can choose between a GUI and a command-line installer. I used the Python 3 installer, although my system Python version is 2.7; this is possible since Anaconda comes with its own Python. On my machine the Anaconda installer created an anaconda directory in my home directory and required about 900 MB. The Miniconda installer installs a miniconda directory in your home directory. Installation instructions for NumPy are at http://docs.scipy.org/doc/numpy/user/install.html. Alternatively, install NumPy with pip as follows:

$ [sudo] pip install numpy

The command for Anaconda users is:

$ conda install numpy

To install the other dependencies, substitute NumPy with the appropriate package. Please read the documentation carefully; not all options work equally well for each operating system. The pandas installation documentation is at http://pandas.pydata.org/pandas-docs/dev/install.html.

Troubleshooting and asking for help

Currently the best forum is at http://stackoverflow.com. You can also reach out on mailing lists or IRC chat. The following is a list of mailing lists:

- scikit-learn: https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
- NumPy and SciPy: https://www.scipy.org/scipylib/mailing-lists.html

IRC channels:

- #scikit-learn @ freenode
- #scipy @ freenode

Summary

In this article we covered the basic concepts of machine learning: a high-level overview, generalizing with data, overfitting, dimensions and features, preprocessing, combining models, installing the required software, and some places where you can ask for help.
To Optimize Scans

Packt
23 Jun 2017
20 min read
In this article by Paulino Calderon Pale, author of the book Nmap Network Exploration and Security Auditing Cookbook, Second Edition, we will explore the following topics:

- Skipping phases to speed up scans
- Selecting the correct timing template
- Adjusting timing parameters
- Adjusting performance parameters

(For more resources related to this topic, see here.)

One of my favorite things about Nmap is how customizable it is. If configured properly, Nmap can be used to scan from single targets to millions of IP addresses in a single run. However, we need to be careful, understand the configuration options and scanning phases that can affect performance and, most importantly, really think about our scan objective beforehand. Do we need the information from the reverse DNS lookup? Do we know all targets are online? Is the network congested? Do targets respond fast enough? These and many more aspects can really add to your scanning time. Therefore, optimizing scans is important and can save us hours if we are working with many targets. This article starts by introducing the different scanning phases, timing, and performance options. Unless we have a solid understanding of what goes on behind the curtains during a scan, we won't be able to completely optimize our scans. Timing templates are designed to work in common scenarios, but we want to go further and shave off those extra seconds per host during our scans. Remember that this can not only improve performance but accuracy as well. Maybe those targets marked as offline were just too slow to respond to the probes after all.

Skipping phases to speed up scans

Nmap scans can be broken into phases. When we are working with many hosts, we can save time by skipping tests or phases that return information we don't need or that we already have. By carefully selecting our scan flags, we can significantly improve the performance of our scans.
This explains the process that takes place behind the curtains when scanning, and how to skip certain phases to speed up scans.

How to do it...

To perform a full port scan with the timing template set to aggressive, and without the reverse DNS resolution (-n) or ping (-Pn), use the following command:

# nmap -T4 -n -Pn -p- 74.207.244.221

Note the scanning time at the end of the report:

Nmap scan report for 74.207.244.221
Host is up (0.11s latency).
Not shown: 65532 closed ports
PORT     STATE SERVICE
22/tcp   open ssh
80/tcp   open http
9929/tcp open nping-echo
Nmap done: 1 IP address (1 host up) scanned in 60.84 seconds

Now, compare this with the running time we get if we don't skip any tests:

# nmap -p- scanme.nmap.org

Nmap scan report for scanme.nmap.org (74.207.244.221)
Host is up (0.11s latency).
Not shown: 65532 closed ports
PORT     STATE SERVICE
22/tcp   open ssh
80/tcp   open http
9929/tcp open nping-echo
Nmap done: 1 IP address (1 host up) scanned in 77.45 seconds

Although the time difference isn't very drastic, it really adds up when you work with many hosts. I recommend that you think about your objectives and the information you need, and consider the possibility of skipping some of the scanning phases described next.

How it works...

Nmap scans are divided into several phases. Some of them require arguments to be set in order to run, but others, such as the reverse DNS resolution, are executed by default. Let's review the phases that can be skipped and their corresponding Nmap flags:

- Target enumeration: In this phase, Nmap parses the target list. This phase can't exactly be skipped, but you can save DNS forward lookups by using only IP addresses as targets.
- Host discovery: This is the phase where Nmap establishes whether the targets are online and in the network. By default, Nmap sends an ICMP echo request and some additional probes, but it supports several host discovery techniques that can even be combined.
To skip the host discovery phase (no ping), use the flag -Pn. We can easily see which probes we skipped by comparing the packet traces of the two scans:

$ nmap -Pn -p80 -n --packet-trace scanme.nmap.org
SENT (0.0864s) TCP 106.187.53.215:62670 > 74.207.244.221:80 S ttl=46 id=4184 iplen=44 seq=3846739633 win=1024 <mss 1460>
RCVD (0.1957s) TCP 74.207.244.221:80 > 106.187.53.215:62670 SA ttl=56 id=0 iplen=44 seq=2588014713 win=14600 <mss 1460>
Nmap scan report for scanme.nmap.org (74.207.244.221)
Host is up (0.11s latency).
PORT   STATE SERVICE
80/tcp open http
Nmap done: 1 IP address (1 host up) scanned in 0.22 seconds

For scanning without skipping host discovery, we use the command:

$ nmap -p80 -n --packet-trace scanme.nmap.org
SENT (0.1099s) ICMP 106.187.53.215 > 74.207.244.221 Echo request (type=8/code=0) ttl=59 id=12270 iplen=28
SENT (0.1101s) TCP 106.187.53.215:43199 > 74.207.244.221:443 S ttl=59 id=38710 iplen=44 seq=1913383349 win=1024 <mss 1460>
SENT (0.1101s) TCP 106.187.53.215:43199 > 74.207.244.221:80 A ttl=44 id=10665 iplen=40 seq=0 win=1024
SENT (0.1102s) ICMP 106.187.53.215 > 74.207.244.221 Timestamp request (type=13/code=0) ttl=51 id=42939 iplen=40
RCVD (0.2120s) ICMP 74.207.244.221 > 106.187.53.215 Echo reply (type=0/code=0) ttl=56 id=2147 iplen=28
SENT (0.2731s) TCP 106.187.53.215:43199 > 74.207.244.221:80 S ttl=51 id=34952 iplen=44 seq=2609466214 win=1024 <mss 1460>
RCVD (0.3822s) TCP 74.207.244.221:80 > 106.187.53.215:43199 SA ttl=56 id=0 iplen=44 seq=4191686720 win=14600 <mss 1460>
Nmap scan report for scanme.nmap.org (74.207.244.221)
Host is up (0.10s latency).
PORT   STATE SERVICE
80/tcp open http
Nmap done: 1 IP address (1 host up) scanned in 0.41 seconds

- Reverse DNS resolution: Host names often reveal additional information by themselves, and Nmap uses reverse DNS lookups to obtain them. This step can be skipped by adding the argument -n to your scan arguments.
Let's see the traffic generated by the two scans, with and without reverse DNS resolution. First, let's skip reverse DNS resolution by adding -n to the command:

$ nmap -n -Pn -p80 --packet-trace scanme.nmap.org
SENT (0.1832s) TCP 106.187.53.215:45748 > 74.207.244.221:80 S ttl=37 id=33309 iplen=44 seq=2623325197 win=1024 <mss 1460>
RCVD (0.2877s) TCP 74.207.244.221:80 > 106.187.53.215:45748 SA ttl=56 id=0 iplen=44 seq=3220507551 win=14600 <mss 1460>
Nmap scan report for scanme.nmap.org (74.207.244.221)
Host is up (0.10s latency).
PORT   STATE SERVICE
80/tcp open http
Nmap done: 1 IP address (1 host up) scanned in 0.32 seconds

Now let's try the same command without skipping reverse DNS resolution:

$ nmap -Pn -p80 --packet-trace scanme.nmap.org
NSOCK (0.0600s) UDP connection requested to 106.187.36.20:53 (IOD #1) EID 8
NSOCK (0.0600s) Read request from IOD #1 [106.187.36.20:53] (timeout: -1ms) EID 18
NSOCK (0.0600s) UDP connection requested to 106.187.35.20:53 (IOD #2) EID 24
NSOCK (0.0600s) Read request from IOD #2 [106.187.35.20:53] (timeout: -1ms) EID 34
NSOCK (0.0600s) UDP connection requested to 106.187.34.20:53 (IOD #3) EID 40
NSOCK (0.0600s) Read request from IOD #3 [106.187.34.20:53] (timeout: -1ms) EID 50
NSOCK (0.0600s) Write request for 45 bytes to IOD #1 EID 59 [106.187.36.20:53]: =............221.244.207.74.in-addr.arpa.....
NSOCK (0.0600s) Callback: CONNECT SUCCESS for EID 8 [106.187.36.20:53]
NSOCK (0.0600s) Callback: WRITE SUCCESS for EID 59 [106.187.36.20:53]
NSOCK (0.0600s) Callback: CONNECT SUCCESS for EID 24 [106.187.35.20:53]
NSOCK (0.0600s) Callback: CONNECT SUCCESS for EID 40 [106.187.34.20:53]
NSOCK (0.0620s) Callback: READ SUCCESS for EID 18 [106.187.36.20:53] (174 bytes)
NSOCK (0.0620s) Read request from IOD #1 [106.187.36.20:53] (timeout: -1ms) EID 66
NSOCK (0.0620s) nsi_delete() (IOD #1)
NSOCK (0.0620s) msevent_cancel() on event #66 (type READ)
NSOCK (0.0620s) nsi_delete() (IOD #2)
NSOCK (0.0620s) msevent_cancel() on event #34 (type READ)
NSOCK (0.0620s) nsi_delete() (IOD #3)
NSOCK (0.0620s) msevent_cancel() on event #50 (type READ)
SENT (0.0910s) TCP 106.187.53.215:46089 > 74.207.244.221:80 S ttl=42 id=23960 iplen=44 seq=1992555555 win=1024 <mss 1460>
RCVD (0.1932s) TCP 74.207.244.221:80 > 106.187.53.215:46089 SA ttl=56 id=0 iplen=44 seq=4229796359 win=14600 <mss 1460>
Nmap scan report for scanme.nmap.org (74.207.244.221)
Host is up (0.10s latency).
PORT   STATE SERVICE
80/tcp open http
Nmap done: 1 IP address (1 host up) scanned in 0.22 seconds

Port scanning: In this phase, Nmap determines the state of the ports. By default, it uses SYN or TCP Connect scanning depending on the user's privileges, but several other port scanning techniques are supported. Although this may not be so obvious, Nmap can do a few different things with targets without port scanning them, such as resolving their DNS names or checking whether they are online.
For this reason, this phase can be skipped with the argument -sn:

$ nmap -sn -R --packet-trace 74.207.244.221
SENT (0.0363s) ICMP 106.187.53.215 > 74.207.244.221 Echo request (type=8/code=0) ttl=56 id=36390 iplen=28
SENT (0.0364s) TCP 106.187.53.215:53376 > 74.207.244.221:443 S ttl=39 id=22228 iplen=44 seq=155734416 win=1024 <mss 1460>
SENT (0.0365s) TCP 106.187.53.215:53376 > 74.207.244.221:80 A ttl=46 id=36835 iplen=40 seq=0 win=1024
SENT (0.0366s) ICMP 106.187.53.215 > 74.207.244.221 Timestamp request (type=13/code=0) ttl=50 id=2630 iplen=40
RCVD (0.1377s) TCP 74.207.244.221:443 > 106.187.53.215:53376 RA ttl=56 id=0 iplen=40 seq=0 win=0
NSOCK (0.1660s) UDP connection requested to 106.187.36.20:53 (IOD #1) EID 8
NSOCK (0.1660s) Read request from IOD #1 [106.187.36.20:53] (timeout: -1ms) EID 18
NSOCK (0.1660s) UDP connection requested to 106.187.35.20:53 (IOD #2) EID 24
NSOCK (0.1660s) Read request from IOD #2 [106.187.35.20:53] (timeout: -1ms) EID 34
NSOCK (0.1660s) UDP connection requested to 106.187.34.20:53 (IOD #3) EID 40
NSOCK (0.1660s) Read request from IOD #3 [106.187.34.20:53] (timeout: -1ms) EID 50
NSOCK (0.1660s) Write request for 45 bytes to IOD #1 EID 59 [106.187.36.20:53]: [............221.244.207.74.in-addr.arpa.....
NSOCK (0.1660s) Callback: CONNECT SUCCESS for EID 8 [106.187.36.20:53]
NSOCK (0.1660s) Callback: WRITE SUCCESS for EID 59 [106.187.36.20:53]
NSOCK (0.1660s) Callback: CONNECT SUCCESS for EID 24 [106.187.35.20:53]
NSOCK (0.1660s) Callback: CONNECT SUCCESS for EID 40 [106.187.34.20:53]
NSOCK (0.1660s) Callback: READ SUCCESS for EID 18 [106.187.36.20:53] (174 bytes)
NSOCK (0.1660s) Read request from IOD #1 [106.187.36.20:53] (timeout: -1ms) EID 66
NSOCK (0.1660s) nsi_delete() (IOD #1)
NSOCK (0.1660s) msevent_cancel() on event #66 (type READ)
NSOCK (0.1660s) nsi_delete() (IOD #2)
NSOCK (0.1660s) msevent_cancel() on event #34 (type READ)
NSOCK (0.1660s) nsi_delete() (IOD #3)
NSOCK (0.1660s) msevent_cancel() on event #50 (type READ)
Nmap scan report for scanme.nmap.org (74.207.244.221)
Host is up (0.10s latency).
Nmap done: 1 IP address (1 host up) scanned in 0.17 seconds

In the previous example, we can see that an ICMP echo request and a reverse DNS lookup were performed (we forced DNS lookups with the option -R), but no port scanning was done.

There's more...

I recommend that you also run a couple of test scans to measure the speeds of different DNS servers. I've found that ISPs tend to have the slowest DNS servers, but you can make Nmap use different DNS servers by specifying the argument --dns-servers. For example, to use Google's DNS servers, use the following command:

# nmap -R --dns-servers 8.8.8.8,8.8.4.4 -O scanme.nmap.org

You can test your DNS server speed by comparing the scan times. The following command tells Nmap not to ping or port scan the host, and only perform a reverse DNS lookup:

$ nmap -R -Pn -sn 74.207.244.221
Nmap scan report for scanme.nmap.org (74.207.244.221)
Host is up.
Nmap done: 1 IP address (1 host up) scanned in 1.01 seconds

To further customize your scans, it is important that you understand the scan phases of Nmap. See Appendix, Scanning Phases, for more information.
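When comparing scan phases, it can help to tally the SENT and RCVD lines of the --packet-trace output instead of reading them one by one. Here is a small Python sketch, an illustration written for this recipe rather than part of Nmap, that counts trace lines by direction and protocol:

```python
import re
from collections import Counter

def count_probes(trace):
    """Tally --packet-trace lines by (direction, protocol)."""
    counts = Counter()
    for line in trace.splitlines():
        # Trace lines look like: SENT (0.0864s) TCP 1.2.3.4:62670 > 5.6.7.8:80 ...
        m = re.match(r"(SENT|RCVD) \([\d.]+s\) (TCP|UDP|ICMP)", line.strip())
        if m:
            counts[(m.group(1), m.group(2))] += 1
    return counts

# Two lines taken from the -Pn scan shown earlier in this recipe:
trace = """\
SENT (0.0864s) TCP 106.187.53.215:62670 > 74.207.244.221:80 S ttl=46 id=4184 iplen=44
RCVD (0.1957s) TCP 74.207.244.221:80 > 106.187.53.215:62670 SA ttl=56 id=0 iplen=44
"""
print(count_probes(trace))  # one SENT and one RCVD TCP packet
```

Feeding it the full trace of the default scan versus the -Pn scan makes the extra discovery probes (ICMP echo, ICMP timestamp, and the TCP pings) immediately visible.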
Selecting the correct timing template

Nmap includes six templates that set different timing and performance arguments to optimize your scans based on network conditions. Even though Nmap automatically adjusts some of these values, it is recommended that you set the correct timing template to hint to Nmap about the speed of your network connection and the target's response time. The following will teach you about Nmap's timing templates and how to choose the most appropriate one.

How to do it...

Open your terminal and type the following command to use the aggressive timing template (-T4). Let's also use debugging (-d) to see what the option -T4 sets:

# nmap -T4 -d 192.168.4.20
--------------- Timing report ---------------
hostgroups: min 1, max 100000
rtt-timeouts: init 500, min 100, max 1250
max-scan-delay: TCP 10, UDP 1000, SCTP 10
parallelism: min 0, max 0
max-retries: 6, host-timeout: 0
min-rate: 0, max-rate: 0
---------------------------------------------
<Scan output removed for clarity>

You may use the integers between 0 and 5, that is, -T[0-5].

How it works...

The option -T is used to set the timing template in Nmap. Nmap provides six timing templates to help users tune the timing and performance arguments.
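Before looking at each template in detail, note that their configuration values can be collected into a small lookup table. The following Python sketch is an informal reference; the values are transcribed from the timing reports shown in this recipe, and the authoritative source remains Nmap's own -d output:

```python
# Key settings per -T level, transcribed from the timing reports in this recipe.
# Times are in milliseconds; a parallelism of 0 means Nmap adjusts it dynamically.
TIMING_TEMPLATES = {
    0: {"name": "paranoid",   "init_rtt": 300000, "max_retries": 10, "max_tcp_scan_delay": 1000},
    1: {"name": "sneaky",     "init_rtt": 15000,  "max_retries": 10, "max_tcp_scan_delay": 1000},
    2: {"name": "polite",     "init_rtt": 1000,   "max_retries": 10, "max_tcp_scan_delay": 1000},
    3: {"name": "normal",     "init_rtt": 1000,   "max_retries": 10, "max_tcp_scan_delay": 1000},
    4: {"name": "aggressive", "init_rtt": 500,    "max_retries": 6,  "max_tcp_scan_delay": 10},
    5: {"name": "insane",     "init_rtt": 250,    "max_retries": 2,  "max_tcp_scan_delay": 5},
}

def template_flag(level):
    """Return the Nmap flag for a timing level, for example 4 -> '-T4'."""
    if level not in TIMING_TEMPLATES:
        raise ValueError("timing level must be 0-5")
    return "-T%d" % level

print(template_flag(4), TIMING_TEMPLATES[4]["init_rtt"])  # -T4 500
```

The table makes the trade-off plain: each step up shortens the initial RTT timeout and the scan delay, and from -T4 onward it also cuts the retry budget.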
The available timing templates and their initial configuration values are as follows:

Paranoid (-T0): This template is useful for avoiding detection systems, but it is painfully slow because only one port is scanned at a time, and the timeout between probes is 5 minutes:
--------------- Timing report ---------------
hostgroups: min 1, max 100000
rtt-timeouts: init 300000, min 100, max 300000
max-scan-delay: TCP 1000, UDP 1000, SCTP 1000
parallelism: min 0, max 1
max-retries: 10, host-timeout: 0
min-rate: 0, max-rate: 0
---------------------------------------------

Sneaky (-T1): This template is useful for avoiding detection systems, but is still very slow:
--------------- Timing report ---------------
hostgroups: min 1, max 100000
rtt-timeouts: init 15000, min 100, max 15000
max-scan-delay: TCP 1000, UDP 1000, SCTP 1000
parallelism: min 0, max 1
max-retries: 10, host-timeout: 0
min-rate: 0, max-rate: 0
---------------------------------------------

Polite (-T2): This template is used when scanning is not supposed to interfere with the target system; it is a very conservative and safe setting:
--------------- Timing report ---------------
hostgroups: min 1, max 100000
rtt-timeouts: init 1000, min 100, max 10000
max-scan-delay: TCP 1000, UDP 1000, SCTP 1000
parallelism: min 0, max 1
max-retries: 10, host-timeout: 0
min-rate: 0, max-rate: 0
---------------------------------------------

Normal (-T3): This is Nmap's default timing template, which is used when the argument -T is not set:
--------------- Timing report ---------------
hostgroups: min 1, max 100000
rtt-timeouts: init 1000, min 100, max 10000
max-scan-delay: TCP 1000, UDP 1000, SCTP 1000
parallelism: min 0, max 0
max-retries: 10, host-timeout: 0
min-rate: 0, max-rate: 0
---------------------------------------------

Aggressive (-T4): This is the recommended timing template for broadband and Ethernet connections:
--------------- Timing report ---------------
hostgroups: min 1, max 100000
rtt-timeouts: init 500, min 100, max 1250
max-scan-delay: TCP 10, UDP 1000, SCTP 10
parallelism: min 0, max 0
max-retries: 6, host-timeout: 0
min-rate: 0, max-rate: 0
---------------------------------------------

Insane (-T5): This timing template sacrifices accuracy for speed:
--------------- Timing report ---------------
hostgroups: min 1, max 100000
rtt-timeouts: init 250, min 50, max 300
max-scan-delay: TCP 5, UDP 1000, SCTP 5
parallelism: min 0, max 0
max-retries: 2, host-timeout: 900000
min-rate: 0, max-rate: 0
---------------------------------------------

There's more...

An interactive mode in Nmap allows users to press keys to dynamically change runtime variables, such as verbose, debugging, and packet tracing. Although the discussion of including timing and performance options in the interactive mode has come up a few times on the development mailing list, this hasn't been implemented yet. However, there is an unofficial patch submitted in June 2012 that allows you to change the minimum and maximum packet rate values (--max-rate and --min-rate) dynamically. If you would like to try it out, it's located at http://seclists.org/nmap-dev/2012/q2/883.

Adjusting timing parameters

Nmap not only adjusts itself to different network and target conditions while scanning, but it can also be fine-tuned using timing options to improve performance. Nmap automatically calculates packet round trip, timeout, and delay values, but these values can also be set manually through specific settings. The following describes the timing parameters supported by Nmap.

How to do it...
Enter the following command to adjust the initial round trip timeout, the delay between probes, and a timeout for each scanned host:

# nmap -T4 --scan-delay 1s --initial-rtt-timeout 150ms --host-timeout 15m -d scanme.nmap.org
--------------- Timing report ---------------
hostgroups: min 1, max 100000
rtt-timeouts: init 150, min 100, max 1250
max-scan-delay: TCP 1000, UDP 1000, SCTP 1000
parallelism: min 0, max 0
max-retries: 6, host-timeout: 900000
min-rate: 0, max-rate: 0
---------------------------------------------

How it works...

Nmap supports different timing arguments that can be customized. However, setting these values incorrectly will most likely hurt performance rather than improve it. Let's examine each timing parameter more closely and learn its Nmap option name.

The Round Trip Time (RTT) value is used by Nmap to know when to give up on or retransmit a probe. Nmap estimates this value by analyzing previous responses, but you can set the initial RTT timeout with the argument --initial-rtt-timeout, as shown in the following command:

# nmap -A -p- --initial-rtt-timeout 150ms <target>

In addition, you can set the minimum and maximum RTT timeout values with --min-rtt-timeout and --max-rtt-timeout, respectively, as shown in the following command:

# nmap -A -p- --min-rtt-timeout 200ms --max-rtt-timeout 600ms <target>

Another very important setting we can control in Nmap is the waiting time between probes. Use the arguments --scan-delay and --max-scan-delay to set the waiting time and the maximum amount of time allowed to wait between probes, respectively, as shown in the following commands:

# nmap -A --max-scan-delay 10s scanme.nmap.org
# nmap -A --scan-delay 1s scanme.nmap.org

Note that the arguments shown previously are very useful for avoiding detection mechanisms. Be careful not to set --max-scan-delay too low, because it will most likely cause you to miss ports that are open.

There's more...
If you would like Nmap to give up on a host after a certain amount of time, you can set the argument --host-timeout:

# nmap -sV -A -p- --host-timeout 5m <target>

Estimating round trip times with Nping

To use Nping to estimate the round trip time between you and the target, the following command can be used:

# nping -c30 <target>

This will make Nping send 30 ICMP echo request packets, and after it finishes, it will show the average, minimum, and maximum RTT values obtained:

# nping -c30 scanme.nmap.org
...
SENT (29.3569s) ICMP 50.116.1.121 > 74.207.244.221 Echo request (type=8/code=0) ttl=64 id=27550 iplen=28
RCVD (29.3576s) ICMP 74.207.244.221 > 50.116.1.121 Echo reply (type=0/code=0) ttl=63 id=7572 iplen=28
Max rtt: 10.170ms | Min rtt: 0.316ms | Avg rtt: 0.851ms
Raw packets sent: 30 (840B) | Rcvd: 30 (840B) | Lost: 0 (0.00%)
Tx time: 29.09096s | Tx bytes/s: 28.87 | Tx pkts/s: 1.03
Rx time: 30.09258s | Rx bytes/s: 27.91 | Rx pkts/s: 1.00
Nping done: 1 IP address pinged in 30.47 seconds

Examine the round trip times and use the maximum to set the correct --initial-rtt-timeout and --max-rtt-timeout values. The official documentation recommends using double the maximum RTT value for --initial-rtt-timeout, and as high as four times the maximum round trip time for --max-rtt-timeout.

Displaying the timing settings

Enable debugging to make Nmap inform you about the timing settings before scanning:

$ nmap -d <target>
--------------- Timing report ---------------
hostgroups: min 1, max 100000
rtt-timeouts: init 1000, min 100, max 10000
max-scan-delay: TCP 1000, UDP 1000, SCTP 1000
parallelism: min 0, max 0
max-retries: 10, host-timeout: 0
min-rate: 0, max-rate: 0
---------------------------------------------

To further customize your scans, it is important that you understand the scan phases of Nmap. See Appendix, Scanning Phases, for more information.
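The rule of thumb just mentioned (double the maximum RTT for --initial-rtt-timeout, and up to four times it for --max-rtt-timeout) can be sketched as a small helper. This Python snippet is an illustration based on that recommendation, not an official Nmap utility:

```python
def rtt_timeout_args(max_rtt_ms):
    """Suggest --initial-rtt-timeout / --max-rtt-timeout values from a measured
    maximum round trip time, following the rule of thumb in this recipe."""
    initial = int(round(2 * max_rtt_ms))   # double the maximum RTT
    maximum = int(round(4 * max_rtt_ms))   # up to four times the maximum RTT
    return "--initial-rtt-timeout %dms --max-rtt-timeout %dms" % (initial, maximum)

# Using the maximum RTT reported by the Nping run above (10.170 ms):
print(rtt_timeout_args(10.170))  # --initial-rtt-timeout 20ms --max-rtt-timeout 41ms
```

For the scanme.nmap.org measurements above this yields timeouts far below Nmap's defaults, which is exactly why measuring first pays off on fast links.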
Adjusting performance parameters

Nmap not only adjusts itself to different network and target conditions while scanning, but it also supports several parameters that affect its behavior, such as the number of hosts scanned concurrently, the number of retries, and the number of allowed probes. Learning how to adjust these parameters properly can greatly reduce your scanning time. The following explains the Nmap parameters that can be adjusted to improve performance.

How to do it...

Enter the following command, adjusting the values for your target condition:

$ nmap --min-hostgroup 100 --max-hostgroup 500 --max-retries 2 <target>

How it works...

The command shown previously tells Nmap to scan and report by grouping no fewer than 100 (--min-hostgroup 100) and no more than 500 hosts (--max-hostgroup 500). It also tells Nmap to retry only twice before giving up on any port (--max-retries 2):

# nmap --min-hostgroup 100 --max-hostgroup 500 --max-retries 2 <target>

It is important to note that setting these values incorrectly will most likely hurt performance or accuracy rather than improve them.

Nmap sends many probes during its port scanning phase due to the ambiguity of a lack of response: either the packet got lost, the service is filtered, or the service is not open. By default, Nmap adjusts the number of retries based on the network conditions, but you can set this value with the argument --max-retries. By increasing the number of retries, we can improve Nmap's accuracy, but keep in mind that this sacrifices speed:

$ nmap --max-retries 10 <target>

The arguments --min-hostgroup and --max-hostgroup control the number of hosts that we probe concurrently. Keep in mind that reports are also generated based on this value, so adjust it depending on how often you would like to see the scan results.
Larger groups are optimal for improving performance, but you may prefer smaller host groups on slow networks:

# nmap -A -p- --min-hostgroup 100 --max-hostgroup 500 <target>

There is also a very important argument that can be used to limit the number of packets sent per second by Nmap. The arguments --min-rate and --max-rate need to be used carefully to avoid undesirable effects. These rates are set automatically by Nmap if the arguments are not present:

# nmap -A -p- --min-rate 50 --max-rate 100 <target>

Finally, the arguments --min-parallelism and --max-parallelism can be used to control the number of probes for a host group. By setting these arguments, Nmap will no longer adjust the values dynamically:

# nmap -A --max-parallelism 1 <target>
# nmap -A --min-parallelism 10 --max-parallelism 250 <target>

There's more...

If you would like Nmap to give up on a host after a certain amount of time, you can set the argument --host-timeout, as shown in the following command:

# nmap -sV -A -p- --host-timeout 5m <target>

The interactive mode in Nmap allows users to press keys to dynamically change runtime variables, such as verbose, debugging, and packet tracing. Although the discussion of including timing and performance options in the interactive mode has come up a few times on the development mailing list, this hasn't been implemented yet. However, there is an unofficial patch submitted in June 2012 that allows you to change the minimum and maximum packet rate values (--max-rate and --min-rate) dynamically. If you would like to try it out, it's located at http://seclists.org/nmap-dev/2012/q2/883.

To further customize your scans, it is important that you understand the scan phases of Nmap. See Appendix, Scanning Phases, for more information.

Summary

In this article, we learned how to implement and optimize scans, and how distributing Nmap scans among several clients allows us to save time and take advantage of extra bandwidth and CPU resources.
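When driving Nmap from scripts, the performance flags covered in this recipe can be assembled programmatically. The following Python sketch is a hypothetical helper (the flag names are from this recipe; the function itself is our own) that builds an argument list and omits unset values so Nmap keeps its dynamic defaults:

```python
def performance_args(min_hostgroup=None, max_hostgroup=None, max_retries=None,
                     min_rate=None, max_rate=None):
    """Build a list of Nmap performance arguments from the given settings.
    Any setting left as None is omitted so Nmap keeps adjusting it dynamically."""
    flags = {
        "--min-hostgroup": min_hostgroup,
        "--max-hostgroup": max_hostgroup,
        "--max-retries": max_retries,
        "--min-rate": min_rate,
        "--max-rate": max_rate,
    }
    args = []
    for flag, value in flags.items():
        if value is not None:
            args += [flag, str(value)]
    return args

# Reproduces the example from this recipe:
print(" ".join(performance_args(min_hostgroup=100, max_hostgroup=500, max_retries=2)))
# --min-hostgroup 100 --max-hostgroup 500 --max-retries 2
```

A list like this can be passed straight to subprocess alongside the nmap binary and the target, which avoids shell-quoting mistakes.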
This article is short but full of tips for optimizing your scans. Prepare to dig deep into Nmap's internals and its timing and performance parameters!

Resources for Article:

Further resources on this subject:
Introduction to Network Security
Implementing OpenStack Networking and Security
Communication and Network Security

Packt
23 Jun 2017
5 min read

Getting Started with Ansible 2

In this article by Jonathan McAllister, author of the book Implementing DevOps with Ansible 2, we will learn what Ansible is, how users can leverage it, its architecture, and the key differentiators that set Ansible apart from other configuration management tools. We will also see the organizations that have successfully been able to leverage Ansible. (For more resources related to this topic, see here.)

What is Ansible?

Ansible is a relatively new addition to the DevOps and configuration management space. Its simplicity, structured automation format, and development paradigm have caught the eyes of small and large corporations alike. Organizations as large as Twitter have successfully managed to leverage Ansible for highly scaled deployments and configuration management across thousands of servers simultaneously. And Twitter isn't the only organization that has managed to implement Ansible at scale; other well-known organizations that have successfully leveraged it include Logitech, NASA, NEC, Microsoft, and hundreds more. As it stands today, Ansible is in use by major players around the world, managing thousands of deployments and configuration management solutions worldwide.

Ansible's Automation Architecture

Ansible was created with an incredibly flexible and scalable automation engine. It allows users to leverage it in many diverse ways and can be conformed to the way that best suits your specific needs. Since Ansible is agentless (meaning there is no permanently running daemon on the systems it manages or executes from), it can be used locally to control a single system (without any network connectivity), or leveraged to orchestrate and execute automation against many systems via a control server. In addition to the aforementioned architectures, Ansible can also be leveraged via Vagrant or Docker to provision infrastructure automatically.
This type of solution basically allows the Ansible user to bootstrap their hardware or infrastructure provisioning by running one or more Ansible playbooks. If you happen to be a Vagrant user, there are instructions for Ansible provisioning in HashiCorp's documentation, located at https://www.vagrantup.com/docs/provisioning/ansible.html.

Ansible is open source, module based, pluggable, and agentless. These key differentiators from other configuration management solutions give Ansible a significant edge. Let's take a look at each of these differentiators in detail and see what it actually means for Ansible developers and users:

Open source: It is no secret that successful open source solutions are usually extraordinarily feature rich. This is because instead of a simple 8-person (or even 100-person) engineering team, there are potentially thousands of developers. Each development and enhancement has been designed to fit a unique need. As a result, the end product provides the consumers of Ansible with a very well-rounded solution that can be adapted or leveraged in numerous ways.

Module based: Ansible has been developed with the intention of integrating with numerous other open and closed source software solutions. This means that Ansible is currently compatible with multiple flavors of Linux, Windows, and cloud providers. Aside from its OS-level support, Ansible currently integrates with hundreds of other software solutions, including EC2, Jira, Jenkins, Bamboo, Microsoft Azure, DigitalOcean, Docker, Google, and many, many more.

For a complete list of Ansible modules, please consult the official module support list located at http://docs.ansible.com/ansible/modules_by_category.html.

Agentless: One of the key differentiators that gives Ansible an edge over the competition is the fact that it is 100% agentless.
This means there are no daemons that need to be installed on remote machines, no firewall ports that need to be opened (besides traditional SSH), no monitoring that needs to be performed on the remote machines, and no management that needs to be performed on the infrastructure fleet. In effect, this makes Ansible very self-sufficient.

Since Ansible can be implemented in a few different ways, the aim of this section is to highlight these options and help us get familiar with the architecture types that Ansible supports. Generally, the architecture of Ansible can be categorized into three distinct architecture types. These are described next.

Pluggable: While Ansible comes out of the box with a wide spectrum of software integration support, it is oftentimes a requirement to integrate the solution with a company's internal software, or with a solution that has not already been integrated into Ansible's robust playbook suite. The answer to such a requirement would be to create a plugin-based solution for Ansible, thus providing the custom functionality necessary.

Summary

In this article, we discussed the architecture of Ansible and the key differentiators that set Ansible apart from other configuration management tools. We learned that Ansible can also be leveraged via Vagrant or Docker to provision infrastructure automatically. We also saw that Ansible has been successfully leveraged by large organizations like Twitter, Microsoft, and many more.

Resources for Article:

Further resources on this subject:
Getting Started with Ansible
Introduction to Ansible
System Architecture and Design of Ansible

Packt
23 Jun 2017
9 min read

Working with Basic Elements – Threads and Runnables

In this article by Javier Fernández González, the author of the book Mastering Concurrency Programming with Java 9 - Second Edition, we will see that execution threads are the core of concurrent applications. When you implement a concurrent application, no matter the language, you have to create different execution threads that run in parallel in a non-deterministic order unless you use a synchronization element (such as a semaphore). In Java, you can create execution threads in two ways:

Extending the Thread class
Implementing the Runnable interface

In this article, you will learn how to use these elements to implement concurrent applications in Java. (For more resources related to this topic, see here.)

Introduction

Nowadays, computer users (and mobile and tablet users too) use different applications at the same time when they work with their computers. They can be writing a document with a word processor while reading the news, posting on a social network, and listening to music. They can do all these things at the same time because modern operating systems support multiprocessing; they can execute different tasks at the same time. Inside an application, you can also do different things at the same time. For example, if you're working with your word processor, you can save the file while you're applying a bold style to some text. You can do this because the modern programming languages used to write those applications allow programmers to create multiple execution threads inside an application. Each execution thread executes a different task, so you can do different things at the same time. Java implements execution threads using the Thread class.
You can create an execution thread in your application using the following mechanisms:

You can extend the Thread class and override the run() method
You can implement the Runnable interface and pass an object of that class to the constructor of a Thread object

In both cases, you will have a Thread object, but the second approach is recommended over the first one. Its main advantages are:

Runnable is an interface: You can implement other interfaces and extend another class. With the Thread class, you can only extend that class.
Runnable objects can be executed not only with threads, but also with other Java concurrency objects such as executors. This gives you more flexibility to change your concurrent applications.
You can use the same Runnable object with different threads.

Once you have a Thread object, you must use the start() method to create a new execution thread and execute the run() method of the Thread. If you call the run() method directly, you will be calling a normal Java method and no new execution thread will be created. Let's see the most important characteristics of threads in the Java programming language.

Threads in Java: characteristics and states

The first thing we have to say about threads in Java is that all Java programs, concurrent or not, have one thread called the main thread. As you may know, a Java SE program starts its execution with the main() method. When you execute such a program, the Java Virtual Machine (JVM) creates a new Thread and executes the main() method in that thread. This is the only thread in non-concurrent applications and the first one in concurrent ones. In Java, as with other programming languages, threads share all the resources of the application, including memory and open files. This is a powerful tool because they can share information in a fast and easy way, but it must be done using adequate synchronization elements to avoid data race conditions. All the threads in Java have a priority.
It's an integer value between the Thread.MIN_PRIORITY and Thread.MAX_PRIORITY values (actually, 1 and 10). By default, all threads are created with the priority Thread.NORM_PRIORITY (actually, its value is 5). You can use the setPriority() method to change the priority of a Thread (it can throw a SecurityException if you are not allowed to perform that operation) and the getPriority() method to get the priority of a Thread. This priority is a hint to the Java Virtual Machine and to the underlying operating system about the preference between the threads, but it's not a contract. There's no guarantee about the order of execution of the threads. Normally, threads with a higher priority will be executed before threads with a lower priority but, as mentioned before, there's no guarantee about this.

You can create two kinds of threads in Java:

Daemon threads
Non-daemon threads

The difference between them is how they affect the end of a program. A Java program ends its execution when one of the following circumstances occurs:

The program executes the exit() method of the Runtime class and the user has authorization to execute that method
All the non-daemon threads of the application have ended their execution, no matter whether there are daemon threads still running

With these characteristics, daemon threads are usually used to execute auxiliary tasks in applications, such as garbage collectors or cache managers. You can use the isDaemon() method to check whether a thread is a daemon thread or not, and the setDaemon() method to establish a thread as a daemon one. Take into account that you must call this method before the thread starts its execution with the start() method. Finally, threads can pass through different states depending on the situation. All the possible states are defined in the Thread.State class, and you can use the getState() method to get the status of a Thread.
Obviously, you can't change the status of a thread directly. These are the possible statuses of a thread:

NEW: The Thread has been created but hasn't started its execution yet
RUNNABLE: The Thread is running in the Java Virtual Machine
BLOCKED: The Thread is waiting for a lock
WAITING: The Thread is waiting for the action of another thread
TIMED_WAITING: The Thread is waiting for the action of another thread, but this waiting has a time limit
TERMINATED: The Thread has finished its execution

Now that we know the most important characteristics of threads in the Java programming language, let's see the most important methods of the Runnable interface and the Thread class.

The Thread class and the Runnable interface

As we mentioned before, you can create a new execution thread using one of the following two mechanisms:

Extend the Thread class and override its run() method
Implement the Runnable interface and pass an instance of that object to the constructor of a Thread object

Java good practices recommend the second approach over the first one, and that will be the approach we use in this article and in the whole book. The Runnable interface defines only one method: the run() method. This is the main method of every thread. When you create a new execution thread with the start() method, it will call the run() method (of the Thread class, or of the Runnable object passed as a parameter in the constructor of the Thread class). The Thread class, on the contrary, has a lot of different methods. It has a run() method that you must override if you implement your thread by extending the Thread class, and the start() method that you must call to create a new execution thread. These are other interesting methods of the Thread class:

Methods to get and set information about a Thread:

getId(): This method returns the identifier of the Thread. The thread identifier is a positive integer number assigned when a thread is created. It is unique during its lifetime and can't be changed.
getName()/setName(): These methods allow you to get or set the name of the Thread. This name is a String that can also be established in the constructor of the Thread class.
getPriority()/setPriority(): You can use these methods to obtain and establish the priority of the Thread. We explained earlier in this article how Java manages the priority of its threads.
isDaemon()/setDaemon(): These methods allow you to obtain and establish the daemon condition of the Thread. We explained how this condition works earlier.
getState(): This method returns the state of the Thread. We explained all the possible states of a Thread earlier.
interrupt()/interrupted()/isInterrupted(): The first method is used to indicate to a Thread that you're requesting the end of its execution. The other two methods can be used to check the interrupt status. The main difference between those methods is that one clears the value of the interrupted flag when it's called and the other one does not. A call to the interrupt() method doesn't end the execution of a Thread. It is the responsibility of the Thread to check the status of that flag and respond accordingly.
sleep(): This method allows you to suspend the execution of the Thread for a period of time. It receives a long value, that is, the number of milliseconds that you want to suspend the execution of the Thread for.
join(): This method suspends the execution of the thread that makes the call until the end of the execution of the Thread used to call the method. You can use this method to wait for the finalization of another Thread.
setUncaughtExceptionHandler(): This method is used to establish the controller of unchecked exceptions that can occur while you're executing the threads.
currentThread(): This is a static method of the Thread class that returns the Thread object that is actually executing this code.
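To tie these methods together, here is a minimal example (a sketch for illustration; the class and variable names are our own) that creates two threads from a single Runnable object, starts them with start(), and waits for them with join():

```java
import java.util.concurrent.atomic.AtomicInteger;

public class RunnableDemo {

    // Two threads share this counter; AtomicInteger keeps the increments safe.
    static final AtomicInteger counter = new AtomicInteger(0);

    public static int runDemo() throws InterruptedException {
        counter.set(0);  // reset so the demo can be run repeatedly

        // A single Runnable shared by both threads, as discussed above.
        Runnable task = () -> counter.incrementAndGet();

        // The recommended approach: pass the Runnable to the Thread constructor.
        Thread t1 = new Thread(task, "worker-1");
        Thread t2 = new Thread(task, "worker-2");

        t1.start();  // start() creates a new execution thread that runs run()
        t2.start();

        t1.join();   // wait for both threads to finish their execution
        t2.join();

        return counter.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("Counter after both threads: " + runDemo());
    }
}
```

Because both Thread objects execute the same Runnable, the shared counter is an AtomicInteger to avoid the data race conditions mentioned earlier in this article.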
Summary In this article, you learned about threads in Java and how the Thread class and the Runnable interface work. Resources for Article: Further resources on this subject: Thread synchronization and communication [article] Multithreading with Qt [article] Concurrency and Parallelism with Swift 2 [article]
User Story Map – The First User Experience Map in a Product’s Life

Packt
23 Jun 2017
18 min read
In this article by Peter W. Szabo, the author of the book User Experience Mapping, we will explore user story maps. User story maps solve the user's problems in the form of a discussion. Your job as a product manager or user experience consultant should be to make the world better through user-centric products: essentially, solving the user's problems. Contrary to popular belief, user story maps are not just cash cows for agile experts. They will help a product to succeed by increasing the team's understanding of the system. Not just what's inside it, but what will happen to the world as a result. By focusing on the opportunity and outcomes, the team can prioritize development. In reality, this often means stopping the proliferation of features, and underdoing your competition. Wait a minute, did you just read underdoing? As in, fewer features, not making bold promises and significantly less customizability and options? Yes indeed. The founders of Basecamp (formerly 37signals) are the champions of building less. In their book ReWork: Change the Way You Work Forever, they tell Basecamp's success story while giving vital advice to anyone trying to build a product or a startup: "When things aren't working, the natural inclination is to throw more at the problem. More people, time, and money. All that ends up doing is making the problem bigger. The right way to go is the opposite direction: Cut back. So do less. Your project won't suffer nearly as much as you fear. In fact, there's a good chance it'll end up even better." (Jason Fried) User story maps will help you to throw less at the problem, chopping down extras until you reach an awesome product, which is actually done. One of the problems with long product backlogs or nightmarish requirement documents is that they never get done. Literally never. Once I had to work on improving the user experience of a bank's backend.
It was a gargantuan task, as this backend was a large collection of distributed microservices, which meant hundreds of different forms with hard-to-understand functions and a badly designed multi-level menu that connected them together. I knew almost nothing about banking, and they knew almost nothing about UX, so this was a match made in heaven. They gave me a twelve-page document. That was just the non-disclosure agreement. The project had many 100+ page documents, detailing various things and how they are done, complete with business processes and banking jargon. They wanted us to compile an expert review on what needed to be redesigned and create a detailed strategy for that. I found a better use of their money than wasting time on expert reviews and redesign strategies at that stage. Recording or even watching bank employees while they used the system during their work was out of the question. So we went for the quick win and did user story mapping in the first week of the project. Among the attendees of the user story mapping sessions, there were a few non-manager level bank employees, who used the backend extensively. One of them was quite new to her job, but fortunately quite talkative about it. It was immediately evident that most employees almost never used at least 95% of the functionality. Those functions were reserved for specially trained people, usually managers. After creating the user story map with the most essential and frequently used features, I suggested a backend interface which at first contained only about 1% of the functionality of the old system, with the mention of other features to be added later. (As a UX consultant you should avoid saying no; instead, try saying later. It has the same effect for the project but keeps everyone happy.) No one in the room believed that such a crazy idea would go through senior management, although they supported the proposal. Quite the contrary: it went extremely well with senior management.
The senior managers understood that by creating a simple and fast backend user interface, they would be able to reduce the queues without hiring new employees. Moreover, if they needed to hire people, training would be easier and faster. The new UI could also reduce the number of human errors. Almost all of the old backend was still online two years later, although used only by a few employees. This made both the product and the information security team happy, not to mention HR. The functionality of the new application extended only slightly in 24 months. Nobody complained, and the bank's customers were happy with smaller queues. All this was achieved with a pack of colored sticky notes, some markers and, much more importantly, a discussion and shared understanding. This is just one example of how a simple technique like user story mapping can save millions of dollars for a company. (For more resources related to this topic, see here.) Just tell the story Drawing a map, any map, will lead you towards solving the problem. User story maps aim to replace document hand-overs with discussions and collaboration. Enterprises tend to have some sort of formal approval process, usually with a sign-off. That's perfectly fine, and most of the time unavoidable. Just make sure that the sign-off happens after the mapping and story discussions. Ideally, right after the discussion, not days or weeks later. There is a reason why product managers, UX experts and all stakeholders love stories: they are humans. As such, we all have a natural tendency to love an emotionally satisfying tale. Most of our entertainment revolves around stories, and we want to hear good stories. A great story revolves around conflicts in a memorable and exciting way. How to tell a story? Telling a story is an easy task. We all did that as kids, yet we tend to forget that skill we possess when we get into a serious product management discussion. How to tell a great story?
There are a few rules to consider; the most important one is that you should talk about something that captivates the audience. The audience You should focus on the audience. What are their problems? What would make them listen actively, instead of texting or catching Pokémon, while at a user story discussion? Even if the project is about scratching your own itch, you should spin the story so it's their itch that is scratched. Engaging the audience can indeed be a challenge. Once upon a time I wrote a sci-fi novel. Actually, it was published in 2000, with the title Tűzsugár, in Hungarian. The English title would be Ray of Fire, but fortunately for my future writing career, it was never translated into English. The book had everything my 15-16-year-old self considered fun: for instance, a great space war or passionate love between members of different sapient spacefaring races. The characters were miscellaneous human and non-human life-forms stuck in a spaceship for most of the story. Some of my characters had a keen resemblance to miscellaneous video-game characters, from games like Mortal Kombat 2 or Might & Magic VI. They certainly lacked emotional struggles over insignificant things like mass murder or the end of the universe. As I certainly hope you will never read that book, I will spoil the ending for you. A whole planet died, hinting that the entire galaxy might share the same fate, with a faint hope for salvation. This could have led to a sequel, but fortunately for all sci-fi lovers, I stopped writing the sequel after nine chapters. The book seemed to be a success. A national TV channel did an interview with me, if that's any measure of success. More importantly, I had lots of fun writing it. But the book itself was hard to understand and probably impossible to appreciate. My biggest mistake was writing only what I considered fun. To be honest, I still write for fun, but now I have an audience in mind.
I tell the story of my passion for user experience mapping to a great audience: you. I try to find things that are fun to write and still interesting to my target audience. As a true UX geek, I create the main persona of my audience before writing anything and tell a story to her. This article's main persona is called Maya, and she shares many traits with my daughter. Could I say I'm writing this book to my daughter? Of course I do, but I keep in mind all the other personas. Hopefully one of them is a bit like you. Before a talk at a conference, I always ask the organizers about the audience. Even if the topic stays the same, the audience completely changes the story and the way I present it. I might tell the same user story differently to one of my team members, to the leader of another team, or to a chief executive. Differently internally, to a client or to a third party. When telling a story, contrary to a written story, you will see immediate feedback, or the lack of it, from your audience. You should go even further and shape the story based on this feedback. Telling a story is an interactive experience. Engage with the audience. Ask them questions, and let them ask you questions as a start, then turn this into a shared storytelling experience, where the story is being told by all participants together (not at the same time, though, unless you turn the workshop into a UX carol). When you tell a fairy tale to your daughter, she might ask you why the princess can't escape using her sharp wits and cunning, instead of relying on a prince challenging the dragon to a bloody duel. (Then you might start appreciating the story of My Little Pony, where the girl ponies solve challenges mostly non-violently while working together as a team of friends, instead of acting as a prize to be won.) So why not spin a tale of heroic princesses with fluffy pink cats?
Start with action Beginning in medias res, as in starting the narrative in the midst of action, is a technique used by masters such as Shakespeare or Homer, and it is also a powerful tool in your user story arsenal. While telling a story, always try to add as little background as possible, and start with drama or something to catch the attention of the audience, whenever possible. At the beginning of The Odyssey, quite a few unruly men want to marry Telemachus' mother, while his father has still not returned home from the Trojan War. There is no lengthy introduction explaining how those men ended up in Ithaca, or why the goddess, flashing-eyed Athena, cares about Odysseus. The poem was composed in an oral tradition and was more likely to be heard than read at the time of composition. While literacy has skyrocketed since Homer's time, you want to tell and discuss your user stories. Therefore you should consider a similar start. (Maybe not mentioning the user's mother or her rascally suitors.) Simplify In literary fiction, a complex story can be entertaining. A Game of Thrones and its sequels in the A Song of Ice and Fire series are a good example of that. The thing is, George R. R. Martin writes those novels, and he certainly has no intention of discussing them during sprint planning meetings with stakeholders. User story maps are more similar to sagas, folktales and other stories formed in an oral tradition. They develop in a discussion, and their understandability is granted by their simplicity. We need to create a map as simple and small as possible, with as few story cards as possible. So how big should the story map be? Jim Shore defines story card hell as something that happens when you have 300 story cards and you have to keep track of them. Madness, huh? This is not Sparta! Sorry Jim for the bad pun, but you are absolutely right: in the 300 range, you will not understand the map, and the whole discussion part will completely fail.
The user stories will be lost, and the audience will not even try to understand what's happening. There is no ideal number of cards in a story map, but aim low. Then eliminate most of the cards. Clutter will destroy the conversation. In most card games you will have from two to seven cards in hand, with some rare exceptions. The most popular card game both online and offline is Texas Hold 'em Poker. In that game, each player is dealt only two cards. This is because human thought processes and discussions work best with a small number of objects. Sometimes the number of objects in the real world is high. Our mind is good at simplifying, classifying and grouping things into manageable units. With that said, most books and conference presentations about user story mapping show us a photo of a wall covered with sticky notes. The viewer will have absolutely no idea what's on them, but one thing is certain: it looks like a complex project. I have bad news for you: projects with a complex user story map never get finished, and if they do get finished to a degree, they will fail. The abundance of sticky notes means that the communication and simplification process needs one or more iterations. Throw away most of the sticky notes! To do that, you need to understand the problem better. Tell the story of your passion Seriously. Find someone, and tell her the user story of the next big thing. The app or hardware which will change the world. Try it now. Be bold and let your imagination flow. I believe that in this century we will be able to digitalize human beings. This will be the key to both humankind's survival as a species and our exploration of space. The digital society would have no famine, no plagues and no poverty. This would solve all the major problems we face today. Digital humans would even defeat death. Sounds like a wild sci-fi idea? It is, but then again, smartphones were also a wild sci-fi idea a few decades ago.
Now I will tell you the story of something we can build today. The grocery surplus webshop We will create the user story map for a grocery surplus webshop. Using this eCommerce site, we will sell clearance food and drink at a discount price. This means food that would be thrown away at a regular store, for example food past its expiry date or with damaged packaging. This idea is popular in developed countries, like Denmark or the UK, and it might help cut down on the amount of food wasted every year, totaling 1.3 billion metric tonnes worldwide. We are trying to create the online-only version of WeFood (https://donate.danchurchaid.org/wefood). Our users can be environmentally conscious shoppers or low-income families with limited budgets, just to give two examples. In this article I will not introduce personas and treat them separately, so for now, we will only think about them as shoppers. The opportunity to scratch your own itch Mapping will help you to achieve the most with as little as possible. In other words: maximize the opportunity, while minimizing the outputs. To use the mapping lingo: the outputs are products of the map's usage, while the outcomes are the results. The opportunity is the desired outcome we plan to achieve with the aid of the map. This is how we want to change the world. We should start the map with the opportunity. The opportunity should not be defined as selling surplus food and drink to our visitors. If you approach a project or a business without solving the users' problem, the project might become a failure. The best way to find out what our users want is through valid user research, remote and lab-based user experience testing. Sometimes we need to settle for the second-best solution, which happens to be free. That's solving your own problem; in other words, scratch your own itch.
Probably the best summary of this mantra comes from Jason Fried, the founder and CEO of Basecamp: "When you solve your own problem, you create a tool that you're passionate about. And passion is key. Passion means you'll truly use it and care about it. And that's the best way to get others to feel passionate about it too." (Getting Real: The Smarter, Faster, Easier Way to Build a Successful Web Application) We will create the web store we would love to use. Although, as the cliché goes, there is no I in team, there is certainly an I in writer. My ideal eCommerce site could be different from yours. When following the examples of this article, please try to think of your itch, your ideal web store, and use my words only as guidance. You can create the user story map for any other project, ideally something you are passionate about. I would encourage you to pick something that's not a webshop, or maybe not even a digital product, if you feel adventurous. You need to tell the story of your passion. (No, not that passion. This is not an adults-only website.) My passion is reducing food waste (that's also the poor excuse I'm using when looking at the bathroom scale). Here is my attempt to phrase the opportunity. The opportunity: Our shoppers want to save money while reducing global food waste. They understand and accept what surplus food and drink means, and they are happy to shop with us. Actually, the first sentence would be enough. Remember, you want to have a simple one- or two-sentence opportunity definition. I ended up working for two tapestry webshops as a consultant. Not at the same time, though, and the second company approached me mostly as a result of how successful the first one was. It's a relatively small industry in Europe, and business owners and decision-makers know each other by name. I still recall the pleasant experience I had meeting the owners of the first webshop.
They invited me to dinner at a homely restaurant in Budapest. We had a great discussion and they shared their passion. They were an elderly couple, so they must have spent most of their lives in the communist era. In the early 90s they decided to start a business, selling tapestry in a brick and mortar store. Obviously, they had no background in management or running a capitalist business, but that didn't matter; they only wanted to help people make their homes beautiful. They loved tapestry, so they started importing and selling it. When I visited their physical store I saw them talking to a customer. They spent more than an hour discussing interior decoration with someone who had just popped by to ask the square meter price of tapestry. Tapestry is not sold per square meter, but they did the math for the customer, among many other things. They showed her many different patterns and types, and discussed application methods. After leaving the shop the customer knew more about tapestry than most other people ever will. Fast forward to the second contract. I only talked to the client on Skype, and that's perfectly fine, because most of my clients don't invite me to dinner. I saw many differences between this client's approach and the previous one's. At some point, I asked him, "Why do you sell tapestry? Is tapestry your passion?" He was a bit startled by the question, but he promptly replied: "To make money, why else? You need to be pretty crazy to have tapestry as a passion." Seven years later the second business no longer exists, yet the first one is still successful. Treating your work as your passion works wonders. Passion is an important contributor to the success of an idea. Whenever possible, pour your passion into a product and summarize it as the opportunity. What's next? If you buy my new book, User Experience Mapping, you will find more about user story maps in the second chapter.
In that chapter, we will explore user story maps, and how they help you to create requirements through collaboration (and a few sticky notes): We will create user stories and arrange them as a user story map. We will discuss the reasons behind creating them. We will learn how to tell a story. The grocery surplus webshop's user story map will be the example I will create in that chapter. To do this, we will explore user story templates, the characteristics of a good user story (INVEST) and epics. With the 3 Cs (Card, Conversation and Confirmation) process we will turn the stories into reality. We will create a user story map on a wall with sticky notes, then digitally using StoriesOnBoard. And that's just the second chapter; each of the eleven chapters contains different user experience maps. The book reveals two advanced mapping techniques for the first time in print: the behavioural change map and the 4D UX map. You will also explore user story maps, task models and journey maps. You will create wireflows, mental model maps, ecosystem maps and solution maps. In this book, we will show you how to use insights from real users to create and improve your maps and your product. Start mapping your products now to change your users' lives! Resources for Article: Further resources on this subject: Learning D3.js Mapping [article] Data Acquisition and Mapping [article] Creating User Interfaces [article]
Getting Started with Metasploit

Packt
22 Jun 2017
10 min read
In this article by Nipun Jaswal, the author of the book Metasploit Bootcamp, we will be covering the following topics:
Fundamentals of Metasploit
Benefits of using Metasploit
(For more resources related to this topic, see here.)
Penetration testing is the art of performing a deliberate attack on a network, web application, server or any device that requires a thorough check-up from the security perspective. The idea of a penetration test is to uncover flaws while simulating real-world threats. A penetration test is performed to figure out vulnerabilities and weaknesses in systems so that vulnerable systems can stay immune to threats and malicious activities. Achieving success in a penetration test largely depends on using the right set of tools and techniques. A penetration tester must choose the right set of tools and methodologies in order to complete a test. While talking about the best tools for penetration testing, the first one that comes to mind is Metasploit. It is considered one of the most practical tools to carry out penetration testing today. Metasploit offers a wide variety of exploits, a great exploit development environment, information gathering and web testing capabilities, and much more.
The fundamentals of Metasploit
Now that we have completed the setup of Kali Linux, let us talk about the big picture: Metasploit. Metasploit is a security project that provides exploits and tons of reconnaissance features to aid a penetration tester. Metasploit was created by H. D. Moore back in 2003, and since then its rapid development has led it to be recognized as one of the most popular penetration testing tools. Metasploit is an entirely Ruby-driven project and offers a great deal of exploits, payloads, encoding techniques, and loads of post-exploitation features.
Metasploit comes in various editions, as follows:
Metasploit Pro: This is a commercial edition. It offers tons of great features, such as web application scanning and exploitation and automated exploitation, and is quite suitable for professional penetration testers and IT security teams. The Pro edition is used for advanced penetration tests and enterprise security programs.
Metasploit Express: The Express edition is used for baseline penetration tests. Features in this version of Metasploit include smart exploitation, automated brute forcing of credentials, and much more. This version is quite suitable for IT security teams in small to medium-sized companies.
Metasploit Community: This is a free version with reduced functionality compared to the Express edition. However, for students and small businesses, this edition is a favorable choice.
Metasploit Framework: This is a command-line version with all manual tasks, such as manual exploitation, third-party import, and so on. This release is entirely suitable for developers and security researchers.
You can download Metasploit from the following link: https://www.rapid7.com/products/metasploit/download/editions/
We will be using the Metasploit Community and Framework versions. Metasploit also offers various types of user interfaces, as follows:
The graphical user interface (GUI): This has all the options available at the click of a button. It offers a user-friendly interface that helps to provide cleaner vulnerability management.
The console interface: This is the most preferred interface and the most popular one as well. This interface provides an all-in-one approach to all the options offered by Metasploit. It is also considered one of the most stable interfaces.
The command-line interface: This is the more potent interface that supports the launching of exploits and activities such as payload generation.
However, remembering each and every command while using the command-line interface is a difficult job.
Armitage: Armitage by Raphael Mudge added a neat hacker-style GUI interface to Metasploit. Armitage offers easy vulnerability management, built-in NMAP scans, exploit recommendations, and the ability to automate features using the Cortana scripting language.
Basics of the Metasploit framework
Before we put our hands on the Metasploit framework, let us understand the basic terminology used in Metasploit. However, the following are not just terminologies but modules that are the heart and soul of the Metasploit project:
Exploit: This is a piece of code which, when executed, will trigger the vulnerability at the target.
Payload: This is a piece of code that runs at the target after successful exploitation. It defines the type of access and actions we need to gain on the target system.
Auxiliary: These are modules that provide additional functionalities such as scanning, fuzzing, sniffing, and much more.
Encoder: Encoders are used to obfuscate modules to avoid detection by a protection mechanism such as an antivirus or a firewall.
Meterpreter: This is a payload that uses in-memory stagers based on DLL injection. It provides a variety of functions to perform at the target, which makes it a popular choice.
Architecture of Metasploit
Metasploit comprises various components, such as extensive libraries, modules, plugins, and tools. A diagrammatic view of the structure of Metasploit is as follows:
Let's see what these components are and how they work. It is best to start with the libraries that act as the heart of Metasploit.
Let's understand the use of the various libraries, as explained in the following table:
REX: Handles almost all core functions, such as setting up sockets, connections, formatting, and all other raw functions
MSF CORE: Provides the underlying API and the actual core that describes the framework
MSF BASE: Provides friendly API support to modules
We have many types of modules in Metasploit, and they differ regarding their functionality. We have payload modules for creating access channels to exploited systems. We have auxiliary modules to carry out operations such as information gathering, fingerprinting, fuzzing an application, and logging into various services. Let's examine the basic functionality of these modules, as shown in the following table:
Payloads: Payloads are used to carry out operations such as connecting to or from the target system after exploitation, or performing a particular task such as installing a service, and so on. Payload execution is the next step after the system is exploited successfully.
Auxiliary: Auxiliary modules are a special kind of module that performs specific tasks such as information gathering, database fingerprinting, scanning the network to find a particular service, enumeration, and so on.
Encoders: Encoders are used to encode payloads and attack vectors to (or intending to) evade detection by antivirus solutions or firewalls.
NOPs: NOP generators are used for alignment, which results in making exploits stable.
Exploits: The actual code that triggers a vulnerability.
Metasploit framework console and commands
Having gathered knowledge of the architecture of Metasploit, let us now run Metasploit to get hands-on knowledge of the commands and the different modules. To start Metasploit, we first need to establish a database connection so that everything we do can be logged into the database. The usage of databases also speeds up Metasploit's load time by making use of cache and indexes for all modules.
Therefore, let us start the postgresql service by typing the following command at the terminal:
root@beast:~# service postgresql start
Now, to initialize Metasploit's database, let us initialize msfdb as shown in the following screenshot:
It is clearly visible in the preceding screenshot that we have successfully created the initial database schema for Metasploit. Let us now start Metasploit's database using the following command:
root@beast:~# msfdb start
We are now ready to launch Metasploit. Let us issue msfconsole in the terminal to start Metasploit, as shown in the following screenshot:
Welcome to the Metasploit console. Let us run the help command to see what other commands are available to us:
The commands in the preceding screenshot are core Metasploit commands, which are used to set/get variables, load plugins, route traffic, unset variables, print the version, find the history of commands issued, and much more. These commands are pretty general. Let's see the module-based commands, as follows:
Everything related to a particular module in Metasploit comes under the module controls section of the help menu. Using the preceding commands, we can select a particular module, load modules from a particular path, get information about a module, show core and advanced options related to a module, and even edit a module inline.
Let us learn some basic commands in Metasploit and familiarize ourselves with their syntax and semantics:
use [auxiliary/exploit/payload/encoder]: To select a particular module. Examples: msf> use exploit/unix/ftp/vsftpd_234_backdoor, msf> use auxiliary/scanner/portscan/tcp
show [exploits/payloads/encoders/auxiliary/options]: To see the list of available modules of a particular type. Examples: msf> show payloads, msf> show options
set [option/payload]: To set a value for a particular object. Examples: msf> set payload windows/meterpreter/reverse_tcp, msf> set LHOST 192.168.10.118, msf> set RHOST 192.168.10.112, msf> set LPORT 4444, msf> set RPORT 8080
setg [option/payload]: To assign a value to a particular object globally, so the value does not change when a module is switched. Example: msf> setg RHOST 192.168.10.112
run: To launch an auxiliary module after all the required options are set. Example: msf> run
exploit: To launch an exploit. Example: msf> exploit
back: To unselect a module and move back. Example: msf(ms08_067_netapi)> back
info: To list the information related to a particular exploit/module/auxiliary. Examples: msf> info exploit/windows/smb/ms08_067_netapi, msf(ms08_067_netapi)> info
search: To find a particular module. Example: msf> search hfs
check: To check whether a particular target is vulnerable to the exploit or not. Example: msf> check
sessions: To list the available sessions. Example: msf> sessions [session number]
The following are some useful Meterpreter commands:
sysinfo: To list the system information of the compromised host. Example: meterpreter> sysinfo
ifconfig: To list the network interfaces on the compromised host. Examples: meterpreter> ifconfig, meterpreter> ipconfig (Windows)
arp: To list the IP and MAC addresses of hosts connected to the target. Example: meterpreter> arp
background: To send an active session to the background. Example: meterpreter> background
shell: To drop a cmd shell on the target. Example: meterpreter> shell
getuid: To get the current user details. Example: meterpreter> getuid
getsystem: To escalate privileges and gain system access. Example: meterpreter> getsystem
getpid: To get the process ID of the Meterpreter session. Example: meterpreter> getpid
ps: To list all the processes running on the target. Example: meterpreter> ps
If you are using Metasploit for the very first time, refer to http://www.offensive-security.com/metasploit-unleashed/Msfconsole_Commands for more information on basic commands.
Benefits of using Metasploit
Metasploit is an excellent choice when compared to traditional manual techniques because of certain factors, which are listed as follows:
The Metasploit framework is open source
Metasploit supports large testing networks by making use of CIDR identifiers
Metasploit offers quick generation of payloads, which can be changed or switched on the fly
Metasploit leaves the target system stable in most cases
The GUI environment provides a fast and user-friendly way to conduct penetration testing
Summary
Throughout this article, we learned the basics of Metasploit. We learned about the syntax and semantics of various Metasploit commands. We also learned the benefits of using Metasploit. Resources for Article: Further resources on this subject: Approaching a Penetration Test Using Metasploit [article] Metasploit Custom Modules and Meterpreter Scripting [article] So, what is Metasploit? [article]

Packt
22 Jun 2017
21 min read

String Encryption and Decryption

In this article by Brenton J.W. Blawat, author of the book Enterprise PowerShell Scripting Bootcamp, we will learn about string encryption and decryption. Large enterprises often have very strict security standards that are required by industry-specific regulations. When you are creating your Windows server scanning script, you will need to approach the script carefully with certain security concepts in mind. One of the most common situations you may encounter is the need to leverage sensitive data, such as credentials, in your script. While you could prompt for sensitive data during runtime, most enterprises want to automate the full script using zero-touch automation.

(For more resources related to this topic, see here.)

Zero-touch automation requires that the scripts are self-contained and have all of the required credentials and components to successfully run. The problem with incorporating sensitive data in the script, however, is that the data can be obtained in clear text. The usage of clear text passwords in scripts is a bad practice, and violates many regulatory and security standards. As a result, PowerShell scripters need a method to securely store and retrieve sensitive data for use in their scripts. One of the popular methods to secure sensitive data is to encrypt the sensitive strings. This article explores RijndaelManaged symmetric encryption, and how to use it to encrypt and decrypt strings using PowerShell. In this article, we will cover the following topics:

- Learn about RijndaelManaged symmetric encryption
- Understand the salt, init, and password for the encryption algorithm
- Script a method to create randomized salt, init, and password values
- Encrypt and decrypt strings using RijndaelManaged encryption
- Create an encoding and data separation security mechanism for encryption passwords

The examples in this article build upon each other. You will need to execute the scripts sequentially to have the final script in this article work properly. 
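Before looking at encryption itself, it is worth seeing why simple encoding is no substitute for it. The short Python sketch below (a cross-language illustration with a made-up secret, not part of this article's PowerShell scripts) shows that a Base64-"protected" value can be recovered by anyone who has the script, with no password involved:

```python
import base64

# A secret "protected" only by encoding -- no key or password is involved.
secret = "SuperSecretPassword!"
encoded = base64.b64encode(secret.encode("utf-8")).decode("ascii")

# Anyone holding the encoded value can reverse it just as easily.
recovered = base64.b64decode(encoded).decode("utf-8")
print(recovered == secret)  # True -- encoding alone offers no real protection
```

This is precisely why the rest of the article relies on a keyed algorithm rather than on encoding alone.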
RijndaelManaged encryption

When you are creating your scripts, it is best practice to leverage some sort of obfuscation or encryption for sensitive data. There are many different strategies that you can use to secure your data. One is leveraging string and script encoding. Encoding takes your human-readable string or script and scrambles it to make it more difficult for someone to see what the actual code is. The downsides of encoding are that you must decode the script to make changes to it, and that decoding does not require the use of a password or passphrase. Thus, someone can easily decode your sensitive data using the same method you would use to decode the script. The alternative to encoding is leveraging an encryption algorithm. Encryption algorithms provide multiple mechanisms to secure your scripts and strings. While you can encrypt your entire script, encryption is most commonly used to protect the sensitive data in the scripts themselves, or in answer files. One of the most popular encryption algorithms to use with PowerShell is RijndaelManaged. RijndaelManaged is a symmetric block cipher algorithm that was selected by the United States National Institute of Standards and Technology (NIST) for its implementation of the Advanced Encryption Standard (AES). When used as the implementation of AES, RijndaelManaged supports 128-bit, 192-bit, and 256-bit encryption. In contrast to encoding, encryption algorithms require additional information to be able to properly encrypt and decrypt the string. When implementing RijndaelManaged in PowerShell, the algorithm requires a salt, a password, and an initialization vector (IV). The salt is typically a randomized value that changes each time you leverage the encryption algorithm. The purpose of salt in a traditional encryption scenario is to change the encrypted value each time the encryption function is used. This is important in scenarios where you are encrypting multiple passwords or strings with the same value. 
If two users are using the same password, the encrypted value in the database would also be the same. By changing the salt each time, the passwords, though the same value, would have different encrypted values in the database. In this article, we will be leveraging a static salt value. The password is typically a value that is manually entered by a user, or fed into the script using a parameter block. You can also derive the password value from a certificate, Active Directory attribute values, or a multitude of other sources. In this article, we will be leveraging three sources for the password. The initialization vector (IV) is a hash generated from the IV string and is used with the encryption key. The IV string is also typically a randomized value that changes each time you leverage the encryption algorithm. The purpose of the IV string is to strengthen the hash created by the encryption algorithm. It was created to thwart rainbow attacks, which rely on hash tables precalculated with no IV strings or with commonly used strings. Since you are setting the IV string, the number of hash combinations increases exponentially, which reduces the effectiveness of a rainbow attack. In this article, we will be using a static initialization vector value. The randomization of the salt and initialization vector strings becomes more important in scenarios where you are encrypting a large set of data. An attacker can intercept hundreds of thousands of packets, or strings, which reveal an increasing amount of information about your IV. With this, the attacker can guess the IV and derive the password. The most notable hack of IVs was with the Wired Equivalent Privacy (WEP) wireless protocol, which used a weak, or small, initialization vector. After capturing enough packets, an IV hash could be guessed and a hacker could easily obtain the passphrase used on the wireless network. 
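The effect of the salt described above is easy to demonstrate. The following Python sketch (a cross-language illustration with hypothetical salt values; Python's standard pbkdf2_hmac plays a role comparable to, though not byte-compatible with, .NET's PasswordDeriveBytes) derives keys from the same password using two different salts, and the resulting keys differ:

```python
import hashlib

password = b"A_Complex_Password_With_A_Lot_Of_Characters"

# Same password, two different salts, 50000 iterations, 32-byte (256-bit) keys.
key_a = hashlib.pbkdf2_hmac("sha1", password, b"SaltNumberOne", 50000, dklen=32)
key_b = hashlib.pbkdf2_hmac("sha1", password, b"SaltNumberTwo", 50000, dklen=32)

print(key_a != key_b)  # True -- identical passwords no longer encrypt identically
print(len(key_a) * 8)  # 256 -- 32 bytes is a 256-bit key
```

The same principle is why two users with the same password end up with different stored values once each record carries its own salt.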
Creating random salt, initialization vector, and passwords

As you are creating your scripts, you will want to make sure you use complex random values for the salt, IV string, and password. This is to prevent dictionary attacks, where an individual may use common passwords and phrases to guess the salt, IV string, and password. When you create your salt and IV strings, make sure they are a minimum of 10 random characters each. It is also recommended that you use a minimum of 30 random characters for the password. To create random passwords in PowerShell, you can do the following:

Function create-password {
    # Declare password variable outside of loop.
    $password = ""
    # For numbers between 33 and 126
    For ($a=33; $a -le 126; $a++) {
        # Add the ASCII text for the ASCII number referenced.
        $ascii += ,[char][byte]$a
    }
    # Generate a random character from the $ascii character set.
    # Repeat 30 times, or create 30 random characters.
    1..30 | ForEach { $password += $ascii | get-random }
    # Return the password
    return $password
}

# Create four 30 character passwords
create-password
create-password
create-password
create-password

The output of this command would look like the following:

This function will create a string with 30 random characters for use with random password creation. You first start by declaring the create-password function. You then declare the $password variable for use within the function by setting it equal to "". The next step is creating a For loop to iterate through a set of numbers. These numbers represent the ASCII character numbers that you can select from for the password. You create the For loop by writing For ($a=33; $a -le 126; $a++). This means: start at the number 33, increase the value by one ($a++), and continue while the number is less than or equal to 126. You then declare the $ascii variable and build it up using the += assignment operator. As the For loop goes through its iterations, it adds a character to the array values. 
The script then leverages the [char], or character value, of the [byte] number contained in $a. After this section, the $ascii array will contain all the ASCII characters with byte values between 33 and 126. You then continue to the random character generation. You declare the 1..30 command, which means: for the numbers 1 to 30, repeat the following command. You pipe this to ForEach {, which executes the script block for each of the 30 iterations. You then call the $ascii array and pipe it to the | get-random cmdlet. The get-random cmdlet will randomly select one of the characters in the $ascii array. This value is then joined to the existing values in the $password string using the assignment operator +=. After the 30 iterations, there will be 30 random values in the $password variable. Lastly, you leverage return $password to return this value to the script. After declaring the function, you call the function four times using create-password. This creates four random passwords for use. To create strings that are fewer than 30 random characters in length, you can modify the 1..30 range to be any value that you want. If you want a 15 random character salt and initialization vector, you would use 1..15 instead.

Encrypting and decrypting strings

To start using RijndaelManaged encryption, you need to import the .NET System.Security assembly into your script. Much like importing a module to provide additional cmdlets, using .NET assemblies provides an extension to a variety of classes you wouldn't normally have access to in PowerShell. Importing the assembly isn't persistent. This means you will need to import the assembly each time you want to use it in a PowerShell session, or each time you want to run the script. To load the .NET assembly, you can use the Add-Type cmdlet with the -AssemblyName parameter and the System.Security argument. Since the cmdlet doesn't actually output anything to the screen, you may choose to print the successful importing of the assembly to the screen. 
To import the System.Security assembly with display information, you can do the following:

Write-host "Loading the .NET System.Security Assembly For Encryption"
Add-Type -AssemblyName System.Security -ErrorAction SilentlyContinue -ErrorVariable err
if ($err) {
    Write-host "Error Importing the .NET System.Security Assembly."
    PAUSE
    EXIT
}
# if err is not set, it was successful.
if (!$err) {
    Write-host "Successfully loaded the .NET System.Security Assembly For Encryption"
}

The output from this command looks like the following:

In this example, you successfully import the .NET System.Security assembly for use with PowerShell. You first start by writing "Loading the .NET System.Security Assembly For Encryption" to the screen using the Write-host cmdlet. You then leverage the Add-Type cmdlet with the -AssemblyName parameter set to the System.Security argument, the -ErrorAction parameter set to the SilentlyContinue argument, and the -ErrorVariable parameter set to the err argument. You then create an if statement to see whether $err contains data. If it does, it will use the Write-host cmdlet to print "Error Importing the .NET System.Security Assembly." to the screen. It will PAUSE the script so the error can be read. Finally, it will exit the script. If $err is $null, designated by if (!$err) {, it will use the Write-host cmdlet to print "Successfully loaded the .NET System.Security Assembly For Encryption" to the screen. At this point, the script or PowerShell window is ready to leverage the System.Security assembly.

After you load the System.Security assembly, you can start creating the encryption function. RijndaelManaged encryption requires a four-step process to encrypt the strings, which is represented in the preceding diagram. The RijndaelManaged encryption process is as follows: The process starts by creating the encryptor. The encryptor is derived from the encryption key (password and salt) and the initialization vector. 
After you define the encryptor, you need to create a new memory stream using the IO.MemoryStream object. A memory stream is what stores values in memory for use by the encryption assembly. Once the memory stream is open, you define a System.Security.Cryptography.CryptoStream object. The CryptoStream is the mechanism that uses the memory stream and the encryptor to transform the unencrypted data into encrypted data. In order to leverage the CryptoStream, you need to write data to it. The final step is to use the IO.StreamWriter object to write the unencrypted value into the CryptoStream. The output from this transformation is placed into the MemoryStream. To access the encrypted value, you read the data in the memory stream. To learn more about the System.Security.Cryptography.RijndaelManaged class, you can view the following MSDN article: https://msdn.microsoft.com/en-us/library/system.security.cryptography.rijndaelmanaged(v=vs.110).aspx.

To create a script that encrypts strings using RijndaelManaged encryption, you would perform the following:

Add-Type -AssemblyName System.Security

function Encrypt-String {
    param($String, $Pass, $salt="CreateAUniqueSalt", $init="CreateAUniqueInit")
    try {
        $r = new-Object System.Security.Cryptography.RijndaelManaged
        $pass = [Text.Encoding]::UTF8.GetBytes($pass)
        $salt = [Text.Encoding]::UTF8.GetBytes($salt)
        $init = [Text.Encoding]::UTF8.GetBytes($init)
        $r.Key = (new-Object Security.Cryptography.PasswordDeriveBytes $pass, $salt, "SHA1", 50000).GetBytes(32)
        $r.IV = (new-Object Security.Cryptography.SHA1Managed).ComputeHash($init)[0..15]
        $c = $r.CreateEncryptor()
        $ms = new-Object IO.MemoryStream
        $cs = new-Object Security.Cryptography.CryptoStream $ms,$c,"Write"
        $sw = new-Object IO.StreamWriter $cs
        $sw.Write($String)
        $sw.Close()
        $cs.Close()
        $ms.Close()
        $r.Clear()
        [byte[]]$result = $ms.ToArray()
    }
    catch {
        $err = "Error Occurred Encrypting String: $_"
    }
    if($err) {
        # Report Back Error
        return $err
    }
    else {
        return [Convert]::ToBase64String($result)
    }
}

Encrypt-String "Encrypt This String" "A_Complex_Password_With_A_Lot_Of_Characters"

The output of this script would look like the following:

This function displays how to encrypt a string leveraging the RijndaelManaged encryption algorithm. You first start by importing the System.Security assembly by leveraging the Add-Type cmdlet, using the -AssemblyName parameter with the System.Security argument. You then declare the Encrypt-String function. You include a parameter block to accept and set values in the function. The first value is $String, which is the unencrypted text. The second value is $Pass, which is used for the encryption key. The third is a predefined $salt variable set to "CreateAUniqueSalt". You then define the $init variable, which is set to "CreateAUniqueInit". After the parameter block, you declare try { to handle any errors in the .NET assembly. The first step is to declare the encryption class using the new-Object cmdlet with the System.Security.Cryptography.RijndaelManaged argument. You place this object inside the $r variable. You then convert the $pass, $salt, and $init values to the UTF8 character encoding standard and store the character byte values in each variable. This is done by specifying [Text.Encoding]::UTF8.GetBytes($pass) for the $pass variable, [Text.Encoding]::UTF8.GetBytes($salt) for the $salt variable, and [Text.Encoding]::UTF8.GetBytes($init) for the $init variable. After setting the proper character encoding, you proceed to create the encryption key for the RijndaelManaged encryption algorithm. This is done by setting the RijndaelManaged $r.Key attribute to the object created by (new-Object Security.Cryptography.PasswordDeriveBytes $pass, $salt, "SHA1", 50000).GetBytes(32). This object leverages the Security.Cryptography.PasswordDeriveBytes class and creates a key using the $pass variable, the $salt variable, the "SHA1" hash name, and 50000 iterations of the derivation. 
Each iteration of this class generates a different key value, making it more complex to guess the key. You then leverage the .GetBytes(32) method to return the 32-byte value of the key. The RijndaelManaged 256-bit encryption is a derivative of the 32 bytes in the key: 32 bytes times 8 bits per byte is 256 bits. To create the initialization vector for the algorithm, you set the RijndaelManaged $r.IV attribute to the object created by (new-Object Security.Cryptography.SHA1Managed).ComputeHash($init)[0..15]. This section of the code leverages Security.Cryptography.SHA1Managed and computes the hash based on the $init value. When you invoke the [0..15] range operator, it will obtain the first 16 bytes of the hash and place them into the $r.IV attribute. The RijndaelManaged default block size for the initialization vector is 128 bits: 16 bytes times 8 bits per byte is 128 bits. After setting up the required attributes, you are now ready to start encrypting data. You first start by leveraging the $r RijndaelManaged object with the $r.Key and $r.IV attributes defined. You use the $r.CreateEncryptor() method to generate the encryptor. Once you've generated the encryptor, you have to create a memory stream to do the encryption in memory. This is done by declaring the new-Object cmdlet with the IO.MemoryStream class, and placing the memory stream object in the $ms variable. Next, you create the CryptoStream. The CryptoStream is used to transform the unencrypted data into the encrypted data. You first declare the new-Object cmdlet with the Security.Cryptography.CryptoStream argument. You also define the memory stream of $ms, the encryptor of $c, and the operator of "Write" to tell the class to write unencrypted data to the encryption stream in memory. After creating the CryptoStream, you are ready to write the unencrypted data into it. This is done using the IO.StreamWriter class. You declare a new-Object cmdlet with the IO.StreamWriter argument, and define the CryptoStream of $cs for writing. Lastly, you take the unencrypted string stored in the $String variable and pass it into the StreamWriter $sw with $sw.Write($String). The encrypted value is now stored in the memory stream. To stop the writing of data to the CryptoStream and MemoryStream, you close the StreamWriter with $sw.Close(), close the CryptoStream with $cs.Close(), and close the memory stream with $ms.Close(). For security purposes, you also clear out the encryptor data by declaring $r.Clear(). After the encryption process is done, you need to export the memory stream to a byte array. This is done by calling the $ms.ToArray() method and setting it to the $result variable with the [byte[]] data type. The contents are stored in a byte array in $result. Next, you declare the catch { statement. If there were any errors in the encryption process, the script will execute this section. You declare the $err variable with the "Error Occurred Encrypting String: $_" argument. The $_ will be the pipeline error that occurred during the try {} section. You then create an if statement to determine whether there is data in the $err variable. If there is data in $err, it returns the error string to the script. If there were no errors, the script will enter the else { section of the script. It will convert the $result byte array to a Base64 string by leveraging [Convert]::ToBase64String($result). This converts the byte array to a string for use in your scripts. After defining the encryption function, you call the function for use. You first start by calling Encrypt-String followed by "Encrypt This String". You also declare the second argument as the password for the encryptor, which is "A_Complex_Password_With_A_Lot_Of_Characters". After execution, this example receives the encrypted value of hK7GHaDD1FxknHu03TYAPxbFAAZeJ6KTSHlnSCPpJ7c= generated from the function. 
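The key and IV sizes used above, and the final Base64 conversion, can be checked with a short Python sketch (a cross-language illustration with hypothetical values; it mirrors the sizes and the Base64 round trip only, not the exact bytes produced by the .NET classes):

```python
import base64
import hashlib

password, salt, init = b"pass", b"CreateAUniqueSalt", b"CreateAUniqueInit"

# A 32-byte key (256 bits) and the first 16 bytes (128 bits) of a SHA1 hash.
key = hashlib.pbkdf2_hmac("sha1", password, salt, 50000, dklen=32)
iv = hashlib.sha1(init).digest()[:16]
print(len(key) * 8, len(iv) * 8)  # 256 128

# Encrypted bytes travel as a Base64 string, as [Convert]::ToBase64String does.
ciphertext = b"\x01\x02\x03"  # stand-in bytes, not real AES output
as_text = base64.b64encode(ciphertext).decode("ascii")
print(base64.b64decode(as_text) == ciphertext)  # True
```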
Your results will vary depending on the salt, init, and password you use for the encryption algorithm.

Decrypting strings

The decryption of strings is very similar to the process you performed for encrypting strings. Instead of writing data to the memory stream, the function reads the data in the memory stream. Also, instead of using the .CreateEncryptor() method, the decryption process leverages the .CreateDecryptor() method. To create a script that decrypts strings encrypted with RijndaelManaged encryption, you would perform the following:

Add-Type -AssemblyName System.Security

function Decrypt-String {
    param($Encrypted, $pass, $salt="CreateAUniqueSalt", $init="CreateAUniqueInit")
    if($Encrypted -is [string]){
        $Encrypted = [Convert]::FromBase64String($Encrypted)
    }
    $r = new-Object System.Security.Cryptography.RijndaelManaged
    $pass = [System.Text.Encoding]::UTF8.GetBytes($pass)
    $salt = [System.Text.Encoding]::UTF8.GetBytes($salt)
    $init = [Text.Encoding]::UTF8.GetBytes($init)
    $r.Key = (new-Object Security.Cryptography.PasswordDeriveBytes $pass, $salt, "SHA1", 50000).GetBytes(32)
    $r.IV = (new-Object Security.Cryptography.SHA1Managed).ComputeHash($init)[0..15]
    $d = $r.CreateDecryptor()
    $ms = new-Object IO.MemoryStream @(,$Encrypted)
    $cs = new-Object Security.Cryptography.CryptoStream $ms,$d,"Read"
    $sr = new-Object IO.StreamReader $cs
    try {
        $result = $sr.ReadToEnd()
        $sr.Close()
        $cs.Close()
        $ms.Close()
        $r.Clear()
        Return $result
    }
    Catch {
        Write-host "Error Occurred Decrypting String: Wrong String Used In Script."
    }
}

Decrypt-String "hK7GHaDD1FxknHu03TYAPxbFAAZeJ6KTSHlnSCPpJ7c=" "A_Complex_Password_With_A_Lot_Of_Characters"

The output of this script would look like the following:

This function displays how to decrypt a string leveraging the RijndaelManaged encryption algorithm. You first start by importing the System.Security assembly by leveraging the Add-Type cmdlet, using the -AssemblyName parameter with the System.Security argument. 
You then declare the Decrypt-String function. You include a parameter block to accept and set values for the function. The first value is $Encrypted, which is the encrypted text. The second value is $pass, which is used for the encryption key. The third is a predefined $salt variable set to "CreateAUniqueSalt". You then define the $init variable, which is set to "CreateAUniqueInit". After the parameter block, you check to see whether the encrypted value is formatted as a string by using if ($Encrypted -is [string]) {. If this evaluates to True, you convert the string to bytes using [Convert]::FromBase64String($Encrypted) and place the decoded value in the $Encrypted variable. Next, you declare the decryption class using the new-Object cmdlet with the System.Security.Cryptography.RijndaelManaged argument. You place this object inside the $r variable. You then convert the $pass, $salt, and $init values to the UTF8 character encoding standard and store the character byte values in each variable. This is done by specifying [Text.Encoding]::UTF8.GetBytes($pass) for the $pass variable, [Text.Encoding]::UTF8.GetBytes($salt) for the $salt variable, and [Text.Encoding]::UTF8.GetBytes($init) for the $init variable. After setting the proper character encoding, you proceed to create the encryption key for the RijndaelManaged encryption algorithm. This is done by setting the RijndaelManaged $r.Key attribute to the object created by (new-Object Security.Cryptography.PasswordDeriveBytes $pass, $salt, "SHA1", 50000).GetBytes(32). This object leverages the Security.Cryptography.PasswordDeriveBytes class and creates a key using the $pass variable, the $salt variable, the "SHA1" hash name, and 50000 iterations of the derivation. Each iteration of this class generates a different key value, making it more complex to guess the key. You then leverage the .GetBytes(32) method to return the 32-byte value of the key. To create the initialization vector for the algorithm, you set the RijndaelManaged $r.IV attribute to the object created by (new-Object Security.Cryptography.SHA1Managed).ComputeHash($init)[0..15]. This section of the code leverages the Security.Cryptography.SHA1Managed class and computes the hash based on the $init value. When you invoke the [0..15] range operator, the first 16 bytes of the hash are obtained and placed into the $r.IV attribute. After setting up the required attributes, you are now ready to start decrypting data. You first start by leveraging the $r RijndaelManaged object with the $r.Key and $r.IV attributes defined. You use the $r.CreateDecryptor() method to generate the decryptor. Once you've generated the decryptor, you have to create a memory stream to do the decryption in memory. This is done by declaring the new-Object cmdlet with the IO.MemoryStream class argument. You then reference the $Encrypted values to place in the memory stream object with @(,$Encrypted), and store the populated memory stream in the $ms variable. Next, you create the CryptoStream, which is used to transform the encrypted data into the decrypted data. You first declare the new-Object cmdlet with the Security.Cryptography.CryptoStream class argument. You also define the memory stream of $ms, the decryptor of $d, and the operator of "Read" to tell the class to read the encrypted data from the encryption stream in memory. After creating the CryptoStream, you are ready to read the decrypted data from it. This is done using the IO.StreamReader class. You declare a new-Object cmdlet with the IO.StreamReader class argument, and define the CryptoStream of $cs to read from. At this point, you use try { to catch any error messages that are generated from reading the data in the StreamReader. You call $sr.ReadToEnd(), which reads the complete decrypted value from the StreamReader and places the data in the $result variable. To stop the reading of data from the CryptoStream and MemoryStream, you close the StreamReader with $sr.Close(), close the CryptoStream with $cs.Close(), and close the memory stream with $ms.Close(). For security purposes, you also clear out the decryptor data by declaring $r.Clear(). If the decryption is successful, you return the value of $result to the script. After defining the decryption function, you call the function for use. You first start by calling Decrypt-String followed by "hK7GHaDD1FxknHu03TYAPxbFAAZeJ6KTSHlnSCPpJ7c=". You also declare the second argument as the password for the decryptor, which is "A_Complex_Password_With_A_Lot_Of_Characters". After execution, you will receive the decrypted value of "Encrypt This String" generated from the function.

Summary

In this article, we learned about RijndaelManaged 256-bit encryption. We first started with the basics of the encryption process. Then, we proceeded to learn how to create randomized salt, init, and password values in scripts. We ended the article by learning how to encrypt and decrypt strings.

Resources for Article:

Further resources on this subject:

- WLAN Encryption Flaws [article]
- Introducing PowerShell Remoting [article]
- SQL Server with PowerShell [article]
Packt
22 Jun 2017
4 min read

Inbuilt Data Types in Python

This article by Benjamin Baka, author of the book Python Data Structures and Algorithms, explains the inbuilt data types in Python. Python data types can be divided into three categories: numeric, sequence, and mapping. There is also the None object, which represents a Null, or absence of a value. It should not be forgotten that other objects, such as classes, files, and exceptions, can also properly be considered types; however, they will not be considered here.

(For more resources related to this topic, see here.)

Every value in Python has a data type. Unlike many programming languages, in Python you do not need to explicitly declare the type of a variable. Python keeps track of object types internally. Python's inbuilt data types are outlined in the following table:

- None: None (the null object)
- Numeric: int (integer), float (floating point number), complex (complex number), bool (Boolean: True, False)
- Sequences: str (string of characters), list (list of arbitrary objects), tuple (group of arbitrary items), range (a range of integers)
- Mapping: dict (dictionary of key-value pairs), set (mutable, unordered collection of unique items), frozenset (immutable set)

None type

The None type is immutable and has one value, None. It is used to represent the absence of a value. It is returned by objects that do not explicitly return a value, and it evaluates to False in Boolean expressions. It is often used as the default value for optional arguments, to allow the function to detect whether the caller has passed a value.

Numeric types

All numeric types, apart from bool, are signed and they are all immutable. Booleans have two possible values, True and False. These values are mapped to 1 and 0 respectively. The integer type, int, represents whole numbers of unlimited range. Floating point numbers are represented by the native double-precision floating point representation of the machine. Complex numbers are represented by two floating point numbers. 
They are assigned using the j operator to signify the imaginary part of the complex number. For example:

a = 2+3j

We can access the real and imaginary parts with a.real and a.imag respectively.

Representation error

It should be noted that the native double-precision representation of floating point numbers leads to some unexpected results. For example, consider the following:

In [14]: 1-0.9
Out[14]: 0.09999999999999998

In [15]: 1-0.9 == 0.1
Out[15]: False

This is a result of the fact that most decimal fractions are not exactly representable as a binary fraction, which is how most underlying hardware represents floating point numbers. For algorithms or applications where this may be an issue, Python provides a decimal module. This module allows for the exact representation of decimal numbers and facilitates greater control over properties such as rounding behaviour, the number of significant digits, and precision. It defines two objects: a Decimal type, representing decimal numbers, and a Context type, representing various computational parameters such as precision, rounding, and error handling. An example of its usage can be seen in the following:

In [1]: import decimal
In [2]: x = decimal.Decimal(3.14); y = decimal.Decimal(2.74)
In [3]: x*y
Out[3]: Decimal('8.60360000000001010036498883')
In [4]: decimal.getcontext().prec = 4
In [5]: x * y
Out[5]: Decimal('8.604')

Here we have created a global context and set the precision to 4. The Decimal object can be treated pretty much as you would treat an int or a float. It is subject to all the same mathematical operations, can be used as a dictionary key, placed in a set, and so on. In addition, Decimal objects also have several methods for mathematical operations, such as natural exponents, x.exp(), natural logarithms, x.ln(), and base 10 logarithms, x.log10(). Python also has a fractions module that implements a rational number type. 
The following shows several ways to create fractions:

```python
In [62]: import fractions

In [63]: fractions.Fraction(3, 4)   # creates the fraction 3/4
Out[63]: Fraction(3, 4)

In [64]: fractions.Fraction(0.5)    # creates a fraction from a float
Out[64]: Fraction(1, 2)

In [65]: fractions.Fraction('.25')  # creates a fraction from a string
Out[65]: Fraction(1, 4)
```

It is also worth mentioning the NumPy extension here. This has types for mathematical objects such as arrays, vectors, and matrices, and capabilities for linear algebra, calculation of Fourier transforms, eigenvectors, logical operations, and much more.

Summary

We have looked at the built-in data types and some internal Python modules, most notably the collections module. There are also a number of external libraries, such as the SciPy stack.

Resources for Article:

Further resources on this subject:

- Python Data Structures [article]
- Getting Started with Python Packages [article]
- An Introduction to Python Lists and Dictionaries [article]
Packt
22 Jun 2017
19 min read

Understanding Microservices

This article by Tarek Ziadé, author of the book Python Microservices Development, explains the benefits and implementation of microservices with Python. While the microservices architecture looks more complicated than its monolithic counterpart, its advantages are multiple. It offers the following benefits.

(For more resources related to this topic, see here.)

Separation of concerns

First of all, each microservice can be developed independently by a separate team. For instance, building a reservation service can be a full project on its own. The team in charge can build it in whatever programming language and with whatever database they want, as long as it has a well-documented HTTP API. That also means the evolution of the app is more under control than with monoliths. For example, if the payment system changes its underlying interactions with the bank, the impact is localized inside that service, and the rest of the application stays stable and under control. This loose coupling considerably improves the overall project velocity, as we are applying at the service level a philosophy similar to the single responsibility principle.

The single responsibility principle was defined by Robert Martin to explain that a class should have only one reason to change; in other words, each class should provide a single, well-defined feature. Applied to microservices, it means that we want to make sure that each microservice focuses on a single role.

Smaller projects

The second benefit is breaking down the complexity of the project. When you are adding a feature to an application, like PDF reporting, even if you are doing it cleanly, you are making the code base bigger, more complicated, and sometimes slower. Building that feature in a separate application avoids this problem and makes it easier to write it with whatever tools you want. You can refactor it often, shorten your release cycles, and stay on top of things. The growth of the application remains under your control.
Dealing with a smaller project also reduces risks when improving the application: if a team wants to try out the latest programming language or framework, they can iterate quickly on a prototype that implements the same microservice API, try it out, and decide whether or not to stick with it. One real-life example is the Firefox Sync storage microservice. There are currently some experiments to switch from the current Python+MySQL implementation to a Go-based one that stores users' data in standalone SQLite databases. That prototype is highly experimental, but since we have isolated the storage feature in a microservice with a well-defined HTTP API, it is easy enough to give it a try with a small subset of the user base.

Scaling and deployment

Lastly, having your application split into components makes it easier to scale depending on your constraints. Let's say you are starting to get a lot of customers who are booking hotels daily, and the PDF generation is starting to heat up the CPUs. You can deploy that specific microservice on servers with bigger CPUs. Another typical example is RAM-consuming microservices, like the ones that interact with in-memory databases such as Redis or Memcache. You could tweak your deployments accordingly by deploying them on servers with less CPU and a lot more RAM.

To summarize, microservices bring the following benefits:

- A team can develop each microservice independently, and use whatever technological stack makes sense. They can define a custom release cycle. The tip of the iceberg is its language-agnostic HTTP API.
- Developers break the application complexity into logical components. Each microservice focuses on doing one thing well.
- Since microservices are standalone applications, there's finer control over deployments, which makes scaling easier.

Microservices architectures are good at solving a lot of the problems that may arise once your application starts to grow.
However, we need to be aware of some of the new issues they also bring in practice.

Implementing microservices with Python

Python is an amazingly versatile language. As you probably already know, it's used to build many different kinds of applications, from simple system scripts that perform tasks on a server to large object-oriented applications that run services for millions of users. According to a study conducted by Philip Guo in 2014, published on the Association for Computing Machinery (ACM) website, Python has surpassed Java in top U.S. universities and is the most popular language for learning Computer Science. This trend is also true in the software industry. Python now sits in the top 5 languages in the TIOBE index (http://www.tiobe.com/tiobe-index/), and it's probably even bigger in web development, since languages like C are rarely used as main languages to build web applications. However, some developers criticize Python for being slow and unfit for building efficient web services. Python is slow, and this is undeniable. But it is still a language of choice for building microservices, and many major companies are happily using it. This section will give you some background on the different ways you can write microservices using Python, some insights on asynchronous versus synchronous programming, and conclude with some details on Python performance. It is composed of five parts:

- The WSGI standard
- Greenlet & Gevent
- Twisted & Tornado
- asyncio
- Language performances

The WSGI standard

What strikes web developers starting with Python the most is how easy it is to get a web application up and running. The Python web community has created a standard, inspired by the Common Gateway Interface (CGI), called the Web Server Gateway Interface (WSGI). It greatly simplifies how you can write a Python application whose goal is to serve HTTP requests.
When your code is using that standard, your project can be executed by standard web servers like Apache or NGinx, using WSGI extensions like uwsgi or mod_wsgi. Your application just has to deal with incoming requests and send back JSON responses, and Python includes all that goodness in its standard library. You can create a fully functional microservice that returns the server's local time with a vanilla Python module of fewer than ten lines:

```python
import json
import time

def application(environ, start_response):
    headers = [('Content-type', 'application/json')]
    start_response('200 OK', headers)
    # WSGI expects an iterable of bytestrings
    return [bytes(json.dumps({'time': time.time()}), 'utf8')]
```

Since its introduction, the WSGI protocol has become an essential standard, and the Python web community has widely adopted it. Developers wrote middlewares, which are functions you can hook before or after the WSGI application function itself, to do something within the environment. Some web frameworks, like Bottle (http://bottlepy.org), were created specifically around that standard, and soon enough, every framework out there could be used through WSGI in one way or another. The biggest problem with WSGI, though, is its synchronous nature. The application function you see above is called exactly once per incoming request, and when the function returns, it has to send back the response. That means that every time you call the function, it will block until the response is ready. And writing microservices means your code will be waiting for responses from various network resources all the time. In other words, your application will idle and just block the client until everything is ready. That's an entirely okay behavior for HTTP APIs; we're not talking about building bidirectional applications like WebSocket-based ones. But what happens when you have several incoming requests calling your application at the same time? WSGI servers will let you run a pool of threads to serve several requests concurrently.
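A middleware of the kind described above is just a callable that wraps the application function. The sketch below (the names are illustrative, not from the book) adds request timing around the time service, and shows that the whole thing can be exercised by hand, without any web server:

```python
import json
import time

def application(environ, start_response):
    headers = [('Content-type', 'application/json')]
    start_response('200 OK', headers)
    return [bytes(json.dumps({'time': time.time()}), 'utf8')]

def timing_middleware(app):
    # Hooks before and after the wrapped WSGI application,
    # storing the elapsed time in the WSGI environ.
    def wrapper(environ, start_response):
        start = time.time()
        result = app(environ, start_response)
        environ['x-elapsed'] = time.time() - start
        return result
    return wrapper

application = timing_middleware(application)

# Calling the WSGI callable directly, as a server would
def start_response(status, headers):
    print(status)

environ = {'PATH_INFO': '/'}
body = b''.join(application(environ, start_response))
print(json.loads(body.decode('utf8')))
print('elapsed: %.6fs' % environ['x-elapsed'])
```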
But you can't run thousands of them, and as soon as the pool is exhausted, the next request will block, even if your microservice is doing nothing but idling and waiting for backend services' responses. That's one of the reasons why non-WSGI frameworks like Twisted and Tornado, and in JavaScript land Node.js, became very successful: they are fully async. When you're coding a Twisted application, you can use callbacks to pause and resume the work done to build a response. That means you can accept new requests and start to treat them. That model dramatically reduces the idling time in your process. It can serve thousands of concurrent requests. Of course, that does not mean the application will return each single response faster. It just means one process can accept more concurrent requests and juggle between them as the data is getting ready to be sent back. There's no simple way with the WSGI standard to introduce something similar, and the community has debated for years to come up with a consensus, and failed. The odds are that the community will eventually drop the WSGI standard for something else. In the meantime, building microservices with synchronous frameworks is still possible and completely fine, if your deployments take into account the one request == one thread limitation of the WSGI standard. There is, however, one trick to boost synchronous web applications: greenlets.

Greenlet & Gevent

The general principle of asynchronous programming is that the process deals with several concurrent execution contexts to simulate parallelism. Asynchronous applications use an event loop that pauses and resumes execution contexts when an event is triggered; only one context is active at a time, and they take turns. An explicit instruction in the code tells the event loop where it can pause the execution. When that occurs, the process will look for some other pending work to resume.
Eventually, the process will come back to your function and continue it where it stopped; moving from one execution context to another is called switching. The Greenlet project (https://github.com/python-greenlet/greenlet) is a package based on the Stackless project, a particular CPython implementation, and provides greenlets. Greenlets are pseudo-threads that are very cheap to instantiate, unlike real threads, and that can be used to call Python functions. Within those functions, you can switch and give back control to another function. The switching is done with an event loop, and allows you to write an asynchronous application using a thread-like interface paradigm. Here's an example adapted from the Greenlet documentation:

```python
from greenlet import greenlet

def test1(x, y):
    z = gr2.switch(x + y)
    print(z)

def test2(u):
    print(u)
    gr1.switch(42)

gr1 = greenlet(test1)
gr2 = greenlet(test2)
gr1.switch("hello", " world")
```

The two greenlets explicitly switch from one to the other. For building microservices based on the WSGI standard, if the underlying code used greenlets, we could accept several concurrent requests and just switch from one to another when we know a call is going to block the request, like performing a SQL query. However, switching from one greenlet to another has to be done explicitly, and the resulting code can quickly become messy and hard to understand. That's where Gevent becomes very useful. The Gevent project (http://www.gevent.org/) is built on top of Greenlet and offers, among other things, an implicit and automatic way of switching between greenlets. It provides a cooperative version of the socket module that uses greenlets to automatically pause and resume execution when some data is made available in the socket. There's even a monkey patch feature that automatically replaces the standard library socket with Gevent's version. That makes your standard synchronous code magically asynchronous every time it uses sockets, with just one extra line:
```python
from gevent import monkey; monkey.patch_all()

def application(environ, start_response):
    headers = [('Content-type', 'application/json')]
    start_response('200 OK', headers)
    # ...do something with sockets here...
    return result
```

This implicit magic comes with a price, though. For Gevent to work well, all the underlying code needs to be compatible with the patching Gevent is doing. Some packages from the community will continue to block, or even have unexpected results, because of this; in particular, if they use C extensions that bypass some of the features of the standard library Gevent patched. But for most cases, it works well. Projects that play well with Gevent are dubbed "green", and when a library is not functioning well and the community asks its authors to "make it green", it usually happens. That's what was used to scale the Firefox Sync service at Mozilla, for instance.

Twisted and Tornado

If you are building microservices where increasing the number of concurrent requests you can hold is important, it's tempting to drop the WSGI standard and just use an asynchronous framework like Tornado (http://www.tornadoweb.org/) or Twisted (https://twistedmatrix.com/trac/). Twisted has been around for ages.
To implement the same microservice, you need to write slightly more verbose code:

```python
import json
import time

from twisted.web import server, resource
from twisted.internet import reactor, endpoints

class Simple(resource.Resource):
    isLeaf = True

    def render_GET(self, request):
        request.responseHeaders.addRawHeader(b"content-type",
                                             b"application/json")
        return bytes(json.dumps({'time': time.time()}), 'utf8')

site = server.Site(Simple())
endpoint = endpoints.TCP4ServerEndpoint(reactor, 8080)
endpoint.listen(site)
reactor.run()
```

While Twisted is an extremely robust and efficient framework, it suffers from a few problems when building HTTP microservices:

- You need to implement each endpoint in your microservice with a class derived from a Resource class, which implements each supported method. For a few simple APIs, that adds a lot of boilerplate code.
- Twisted code can be hard to understand and debug due to its asynchronous nature.
- It's easy to fall into callback hell when you chain too many functions that get triggered successively one after the other, and the code can get messy.
- Properly testing your Twisted application is hard, and you have to use a Twisted-specific unit testing model.

Tornado is based on a similar model, but does a better job in some areas. It has a lighter routing system and does everything possible to make the code closer to plain Python. Tornado also uses a callback model, so debugging can be hard. But both frameworks are working hard at bridging the gap to rely on the new async features introduced in Python 3.

asyncio

When Guido van Rossum started to work on adding async features in Python 3, part of the community pushed for a Gevent-like solution, because it made a lot of sense to write applications in a synchronous, sequential fashion rather than having to add explicit callbacks like in Tornado or Twisted. But Guido picked the explicit technique and experimented in a project called Tulip, which Twisted inspired.
Eventually, asyncio was born out of that side project and added into Python. In hindsight, implementing an explicit event loop mechanism in Python instead of going the Gevent way makes a lot of sense. The way the Python core developers coded asyncio, and how they elegantly extended the language with the async and await keywords to implement coroutines, made asynchronous applications built with vanilla Python 3.5+ code look very elegant and close to synchronous programming. By doing this, Python did a great job of avoiding the callback syntax mess we sometimes see in Node.js or Twisted (Python 2) applications. And beyond coroutines, Python 3 has introduced a full set of features and helpers in the asyncio package to build asynchronous applications; see https://docs.python.org/3/library/asyncio.html. Python is now as expressive as languages like Lua for creating coroutine-based applications, and there are now a few emerging frameworks that have embraced those features and will only work with Python 3.5+ to benefit from them. KeepSafe's aiohttp (http://aiohttp.readthedocs.io) is one of them, and building the same microservice with it, fully asynchronous, would simply need these few elegant lines:

```python
from aiohttp import web
import time

async def handle(request):
    return web.json_response({'time': time.time()})

if __name__ == '__main__':
    app = web.Application()
    app.router.add_get('/', handle)
    web.run_app(app)
```

In this small example, we're very close to how we would implement a synchronous app. The only hint that we're async is the async keyword, marking the handle function as a coroutine. And that's what's going to be used at every level of an async Python app going forward. Here's another example using aiopg, a PostgreSQL library for asyncio.
From the project documentation:

```python
import asyncio
import aiopg

dsn = 'dbname=aiopg user=aiopg password=passwd host=127.0.0.1'

async def go():
    pool = await aiopg.create_pool(dsn)
    async with pool.acquire() as conn:
        async with conn.cursor() as cur:
            await cur.execute("SELECT 1")
            ret = []
            async for row in cur:
                ret.append(row)
            assert ret == [(1,)]

loop = asyncio.get_event_loop()
loop.run_until_complete(go())
```

With a few async and await prefixes, the function that performs a SQL query and sends back the result looks a lot like a synchronous function. But asynchronous frameworks and libraries based on Python 3 are still emerging, and if you are using asyncio or a framework like aiohttp, you will need to stick with particular asynchronous implementations for each feature you need. If you need to use a library that is not asynchronous in your code, using it from your asynchronous code means you will need to go through some extra and challenging work if you want to prevent blocking the event loop. If your microservices deal with a limited number of resources, it could be manageable. But it's probably a safer bet at this point (2017) to stick with a synchronous framework that's been around for a while, rather than an asynchronous one. Let's enjoy the existing ecosystem of mature packages, and wait until the asyncio ecosystem gets more sophisticated. And there are many great synchronous frameworks to build microservices with Python, like Bottle, Pyramid with Cornice, or Flask.

Language performances

In the previous sections, we've been through the two different ways to write microservices, asynchronous and synchronous, and whatever technique you use, the speed of Python directly impacts the performance of your microservice. Of course, everyone knows Python is slower than Java or Go, but execution speed is not always the top priority.
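One way to get a feel for the cost of interpreted bytecode is the standard library's timeit module. This micro-benchmark (illustrative only, and timings vary per machine) compares a plain Python loop with the equivalent builtin, which runs as C code inside the interpreter:

```python
import timeit

# An interpreted loop: each iteration goes through the
# bytecode evaluation loop.
py_loop = timeit.timeit(
    'total = 0\n'
    'for i in range(1000):\n'
    '    total += i',
    number=2000)

# The same work done by a builtin implemented in C.
builtin = timeit.timeit('sum(range(1000))', number=2000)

print('interpreted loop: %.4fs' % py_loop)
print('builtin sum:      %.4fs' % builtin)
```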
A microservice is often a thin layer of code that spends most of its life waiting for network responses from other services. Its core speed is usually less important than how fast your SQL queries return from your Postgres server, because the latter will represent most of the time spent building the response. But wanting an application that's as fast as possible is legitimate. One controversial topic in the Python community around speeding up the language is how the Global Interpreter Lock (GIL) mutex can ruin performance, because multi-threaded applications cannot fully use several cores. The GIL has good reasons to exist. It protects non-thread-safe parts of the CPython interpreter, and it exists in other languages like Ruby. And all attempts to remove it so far have failed to produce a faster CPython implementation. Larry Hastings is working on a GIL-free CPython project called Gilectomy (https://github.com/larryhastings/gilectomy); its minimal goal is to come up with a GIL-free implementation that can run a single-threaded application as fast as CPython. As of today (2017), this implementation is still slower than CPython. But it's interesting to follow this work and see if it reaches speed parity one day. That would make a GIL-free CPython very appealing. For microservices, besides preventing the usage of multiple cores in the same process, the GIL will slightly degrade performance under high load, because of the system call overhead introduced by the mutex. However, all the scrutiny around the GIL has had one beneficial impact: work has been done in the past years to reduce its contention in the interpreter, and in some areas, Python's performance has improved a lot. Bear in mind that even if the core team removes the GIL, Python is an interpreted language, and the produced code will never be very efficient at execution time. Python provides the dis module if you are interested in seeing how the interpreter decomposes a function.
In the example below, the interpreter decomposes a simple function that yields incremented values from a sequence in no fewer than 29 steps!

```python
>>> def myfunc(data):
...     for value in data:
...         yield value + 1
...
>>> import dis
>>> dis.dis(myfunc)
  2           0 SETUP_LOOP              23 (to 26)
              3 LOAD_FAST                0 (data)
              6 GET_ITER
        >>    7 FOR_ITER                15 (to 25)
             10 STORE_FAST               1 (value)

  3          13 LOAD_FAST                1 (value)
             16 LOAD_CONST               1 (1)
             19 BINARY_ADD
             20 YIELD_VALUE
             21 POP_TOP
             22 JUMP_ABSOLUTE            7
        >>   25 POP_BLOCK
        >>   26 LOAD_CONST               0 (None)
             29 RETURN_VALUE
```

A similar function written in a statically compiled language would dramatically reduce the number of operations required to produce the same result. There are ways to speed up Python execution, though. One is to write part of your code as compiled code, by building C extensions or using a static extension of the language like Cython (http://cython.org/), but that makes your code more complicated. Another solution, which is the most promising one, is simply running your application with the PyPy interpreter (http://pypy.org/). PyPy implements a Just-In-Time (JIT) compiler. This compiler replaces, at run time, pieces of Python with machine code that can be used directly by the CPU. The whole trick for the JIT is to detect in real time, ahead of the execution, when and how to do it. Even if PyPy is always a few Python versions behind CPython, it has reached a point where you can use it in production, and its performance can be quite amazing. In one of our projects at Mozilla that needs fast execution, the PyPy version was almost as fast as the Go version, and we decided to use Python there instead. The PyPy Speed Center website is a great place to look at how PyPy compares to CPython: http://speed.pypy.org/. However, if your program uses C extensions, you will need to recompile them for PyPy, and that can be a problem, in particular if other developers maintain some of the extensions you are using.
But if you are building your microservice with a standard set of libraries, the chances are that it will work out of the box with the PyPy interpreter, so it's worth a try. In any case, for most projects, the benefits of Python and its ecosystem largely surpass the performance issues described in this section, because the overhead in a microservice is rarely a problem.

Summary

In this article, we saw that Python is considered to be one of the best languages to write web applications, and therefore microservices; for the same reasons, it's a language of choice in other areas, and also because it provides tons of mature frameworks and packages to do the work.

Resources for Article:

Further resources on this subject:

- Inbuilt Data Types in Python [article]
- Getting Started with Python Packages [article]
- Layout Management for Python GUI [article]