How to Optimize PUT Requests

Packt
04 Sep 2015
8 min read
In this article, Naoya Hashimoto, author of the book Amazon S3 Cookbook, explains how to optimize PUT requests. To optimize PUT requests, it is effective to use multipart upload, because it aggregates throughput by parallelizing PUT requests and uploading a large object in parts. It is recommended that each part be between 25 and 50 MB on higher-bandwidth networks and around 10 MB on mobile networks.

Amazon S3 is a highly scalable, reliable, and low-latency data storage service at a very low cost, designed for mission-critical and primary data storage. It provides the Amazon S3 APIs to simplify your programming tasks. S3 performance optimization is composed of several factors, for example, which region to choose to reduce latency, how the object naming scheme is designed, and how the PUT and GET operations are optimized.

Multipart upload is a three-step process: first you initiate the upload, then you upload the object parts, and finally, after uploading all the parts, you complete the multipart upload. (A minimal sketch of these three calls appears just before the instructions below.) The following methods are currently supported to upload objects with multipart upload:

- AWS SDK for Android
- AWS SDK for iOS
- AWS SDK for Java
- AWS SDK for JavaScript
- AWS SDK for PHP
- AWS SDK for Python
- AWS SDK for Ruby
- AWS SDK for .NET
- REST API
- AWS CLI

In order to try multipart upload and see how much it aggregates throughput, we use the AWS SDK for Node.js and the s3 package installed via npm (the package manager for Node.js). The AWS CLI also supports multipart upload: when you use the AWS CLI s3 subcommand to upload a large object, it is automatically uploaded via multipart requests.

Getting ready

You need to complete the following setup in advance:

- Sign up on AWS and be able to access S3 with your IAM credentials
- Install and set up the AWS CLI on your PC, or use an Amazon Linux AMI
- Install Node.js

It is recommended that you run the benchmark from your local PC or, if you use an EC2 instance, that you launch the instance and create the S3 bucket in different regions. For example, if you launch an instance in the Asia Pacific Tokyo region, you should create the S3 bucket in the US Standard region. The reason is that the latency between EC2 and S3 in the same region is very low, and it would otherwise be hard to see the difference.

How to do it…

We upload a 300 MB file to an S3 bucket over HTTP in two ways, one with multipart upload and the other without, to compare the time taken. To clearly see how the performance differs, I launched an instance and created an S3 bucket in different regions as follows:

- EC2 instance: Asia Pacific Tokyo region (ap-northeast-1)
- S3 bucket: US Standard region (us-east-1)

First, we install the s3 Node.js module via npm, create a dummy file, and upload the object into a bucket using a sample Node.js script with multipart upload enabled (the default); we then do the same with multipart upload effectively disabled, so that we can see how multipart upload performs the operation.
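Before moving on to the recipe, here is a minimal, illustrative sketch of what those three steps look like when called directly through the AWS SDK for JavaScript (the low-level calls that the s3 package used below wraps for you). The bucket name and file name are placeholders, the region and credentials are assumed to be configured in your environment, the parts are uploaded sequentially to keep the example short, and error handling is minimal; treat it as a sketch of the API flow rather than production code:

// Load the AWS SDK and read the local file into memory
var AWS = require('aws-sdk');
var fs = require('fs');

var s3 = new AWS.S3({ region: 'us-east-1' });
var bucket = 'your-bucket-name';   // placeholder
var key = '300mb.dmp';             // placeholder
var partSize = 15 * 1024 * 1024;   // 15 MB parts (minimum is 5 MB, except the last part)
var buffer = fs.readFileSync(key);

// Step 1: initiate the multipart upload and receive an UploadId
s3.createMultipartUpload({ Bucket: bucket, Key: key }, function (err, multipart) {
  if (err) { return console.error('Initiate failed:', err); }
  var completedParts = [];
  var offsets = [];
  for (var start = 0; start < buffer.length; start += partSize) {
    offsets.push(start);
  }

  // Step 2: upload each part (done sequentially here; parallelizing this
  // step is what gives multipart upload its throughput advantage)
  function uploadPart(index) {
    if (index === offsets.length) { return completeUpload(); }
    var start = offsets[index];
    var body = buffer.slice(start, Math.min(start + partSize, buffer.length));
    s3.uploadPart({
      Bucket: bucket,
      Key: key,
      UploadId: multipart.UploadId,
      PartNumber: index + 1,
      Body: body
    }, function (err, part) {
      if (err) { return console.error('Part upload failed:', err); }
      completedParts.push({ ETag: part.ETag, PartNumber: index + 1 });
      uploadPart(index + 1);
    });
  }

  // Step 3: complete the multipart upload by sending the list of part ETags
  function completeUpload() {
    s3.completeMultipartUpload({
      Bucket: bucket,
      Key: key,
      UploadId: multipart.UploadId,
      MultipartUpload: { Parts: completedParts }
    }, function (err, data) {
      if (err) { return console.error('Complete failed:', err); }
      console.log('Multipart upload finished:', data.Location);
    });
  }

  uploadPart(0);
});

Nothing becomes visible in the bucket until the complete call succeeds, which is also why the charging rules discussed in the There's more… section distinguish between completed and aborted multipart uploads.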
Now, let's move on to the instructions:

1. Install s3 via the npm command:

$ cd aws-nodejs-sample/
$ npm install s3

2. Create a 300 MB dummy file:

$ file=300mb.dmp
$ dd if=/dev/zero of=${file} bs=10M count=30

3. Put the following script into a file and save it as s3_upload.js:

// Load the SDK
var AWS = require('aws-sdk');
var s3 = require('s3');
var conf = require('./conf');

// Load parameters
var client = s3.createClient({
  maxAsyncS3: conf.maxAsyncS3,
  s3RetryCount: conf.s3RetryCount,
  s3RetryDelay: conf.s3RetryDelay,
  multipartUploadThreshold: conf.multipartUploadThreshold,
  multipartUploadSize: conf.multipartUploadSize,
});

var params = {
  localFile: conf.localFile,
  s3Params: {
    Bucket: conf.Bucket,
    Key: conf.localFile,
  },
};

// Upload objects
console.log("## s3 Parameters");
console.log(conf);
console.log("## Begin uploading.");

var uploader = client.uploadFile(params);
uploader.on('error', function(err) {
  console.error("Unable to upload:", err.stack);
});
uploader.on('progress', function() {
  console.log("Progress", uploader.progressMd5Amount, uploader.progressAmount, uploader.progressTotal);
});
uploader.on('end', function() {
  console.log("## Finished uploading.");
});

4. Create a configuration file and save it as conf.js in the same directory as s3_upload.js:

exports.maxAsyncS3 = 20;                     // default value
exports.s3RetryCount = 3;                    // default value
exports.s3RetryDelay = 1000;                 // default value
exports.multipartUploadThreshold = 20971520; // default value
exports.multipartUploadSize = 15728640;      // default value
exports.Bucket = "your-bucket-name";
exports.localFile = "300mb.dmp";
exports.Key = "300mb.dmp";

How it works…

First of all, let's try uploading a 300 MB object using multipart upload, and then upload the same file without using multipart upload. You can upload an object and see how long it takes by typing the following command:

$ time node s3_upload.js
## s3 Parameters
{ maxAsyncS3: 20,
  s3RetryCount: 3,
  s3RetryDelay: 1000,
  multipartUploadThreshold: 20971520,
  multipartUploadSize: 15728640,
  localFile: './300mb.dmp',
  Bucket: 'bucket-sample-us-east-1',
  Key: './300mb.dmp' }
## Begin uploading.
Progress 0 16384 314572800
Progress 0 32768 314572800
…
Progress 0 314572800 314572800
Progress 0 314572800 314572800
## Finished uploading.

real 0m16.111s
user 0m4.164s
sys  0m0.884s

As it took about 16 seconds to upload the object, the transfer rate was 18.75 MB/sec.

Then, let's change the following parameters in the configuration (conf.js) and see the result. The 300 MB object is now uploaded through only one S3 client (exports.maxAsyncS3 = 1;), and multipart upload is effectively disabled by raising the threshold above the object size (exports.multipartUploadThreshold = 2097152000;):

exports.maxAsyncS3 = 1;
exports.s3RetryCount = 3;                      // default value
exports.s3RetryDelay = 1000;                   // default value
exports.multipartUploadThreshold = 2097152000;
exports.multipartUploadSize = 15728640;        // default value
exports.Bucket = "your-bucket-name";
exports.localFile = "300mb.dmp";
exports.Key = "300mb.dmp";

Let's see the result after changing the parameters in the configuration (conf.js):

$ time node s3_upload.js
## s3 Parameters
…
## Begin uploading.
Progress 0 16384 314572800
…
Progress 0 314572800 314572800
## Finished uploading.

real 0m41.887s
user 0m4.196s
sys  0m0.728s

As it took about 42 seconds to upload the object, the transfer rate was 7.14 MB/sec.

Now, let's quickly check each parameter, and then get to the conclusion.

maxAsyncS3 defines the maximum number of simultaneous requests that the S3 client opens to Amazon S3. The default value is 20.
s3RetryCount defines the number of retries when a request fails. The default value is 3.

s3RetryDelay defines how many milliseconds the S3 client will wait before retrying when a request fails. The default value is 1000.

multipartUploadThreshold defines the object size above which objects are uploaded via multipart requests: an object larger than the size you specify will be uploaded via multipart requests. The default value is 20 MB, the minimum is 5 MB, and the maximum is 5 GB.

multipartUploadSize defines the size of each part when uploading via multipart requests. The default value is 15 MB, the minimum is 5 MB, and the maximum is 5 GB.

The following table shows the speed test scores with different parameters:

maxAsyncS3               | 1          | 20       | 20       | 40       | 30
s3RetryCount             | 3          | 3        | 3        | 3        | 3
s3RetryDelay             | 1000       | 1000     | 1000     | 1000     | 1000
multipartUploadThreshold | 2097152000 | 20971520 | 20971520 | 20971520 | 20971520
multipartUploadSize      | 15728640   | 15728640 | 31457280 | 15728640 | 10728640
Time (seconds)           | 41.88      | 16.11    | 17.41    | 16.37    | 9.68
Transfer rate (MB/sec)   | 7.51       | 19.53    | 18.07    | 19.22    | 32.50

In conclusion, multipart upload is effective for optimizing the PUT operation by aggregating throughput. However, you need to benchmark your own scenario and evaluate the retry count, retry delay, multipart upload threshold, and part size based on the networks that your application runs on.

There's more…

Multipart upload specification

There are limits to using multipart upload. The following table shows the multipart upload specification:

Maximum object size: 5 TB
Maximum number of parts per upload: 10,000
Part numbers: 1 to 10,000 (inclusive)
Part size: 5 MB to 5 GB; the last part can be less than 5 MB
Maximum number of parts returned for a list parts request: 1,000
Maximum number of multipart uploads returned in a list multipart uploads request: 1,000

Multipart upload and charging

If you initiate a multipart upload and abort the request, Amazon S3 deletes the upload artifacts and any parts you have uploaded, and you are not charged for them. However, you are charged for all storage, bandwidth, and requests for the multipart upload requests and the associated parts of an object once the operation is completed. The point is that you are charged when a multipart upload is completed (not aborted).

See also

- Multipart Upload Overview: https://docs.aws.amazon.com/AmazonS3/latest/dev/mpuoverview.html
- AWS SDK for Node.js: http://docs.aws.amazon.com/AWSJavaScriptSDK/guide/node-intro.htm
- Node.js s3 package (npm): https://www.npmjs.com/package/s3
- Amazon Simple Storage Service: Introduction to Amazon S3: http://docs.aws.amazon.com/AmazonS3/latest/dev/Introduction.html
- (PFC403) Maximizing Amazon S3 Performance, AWS re:Invent 2014 (slides): http://www.slideshare.net/AmazonWebServices/pfc403-maximizing-amazon-s3-performance-aws-reinvent-2014
- AWS re:Invent 2014 | (PFC403) Maximizing Amazon S3 Performance (video): https://www.youtube.com/watch?v=_FHRzq7eHQc

Summary

In this article, we learned how to optimize PUT requests by uploading a large object in parts with multipart upload.

Resources for Article:

Further resources on this subject:
- Amazon Web Services [article]
- Achieving High-Availability on AWS Cloud [article]
- Architecture and Component Overview [article]


Creating the POI ListView layout

Packt
04 Sep 2015
27 min read
In this article, Nilanchala Panigrahy, author of the book Xamarin Mobile Application Development for Android, Second Edition, walks you through the activities related to creating and populating a ListView, which includes the following topics:

- Creating the POIApp activity layout
- Creating a custom list row item layout
- The ListView and ListAdapter classes

It is technically possible to create and attach the user interface elements to your activity using C# code. However, it is a bit of a mess. We will go with the most common approach by declaring an XML-based layout.

Rather than deleting the files generated by the default Xamarin Studio template, let's give them more appropriate names and remove the unnecessary content as follows:

1. Select the Main.axml file in Resources | Layout and rename it to POIList.axml.
2. Double-click on the POIList.axml file to open it in a layout designer window. Currently, the POIList.axml file contains the layout that was created as part of the default Xamarin Studio template. As per our requirement, we need to add a ListView widget that takes the complete screen width and a ProgressBar in the middle of the screen. The indeterminate progress bar will be displayed to the user while the data is being downloaded from the server. Once the download is complete and the data is ready, the indeterminate progress bar will be hidden before the POI data is rendered on the list view.
3. Now, open the Document Outline tab in the designer window and delete both the Button and the LinearLayout.
4. Now, in the designer Toolbox, search for RelativeLayout and drag it onto the designer layout preview window.
5. Search for ListView in the Toolbox search field and drag it over the layout designer preview window. Alternatively, you can drag and drop it over RelativeLayout in the Document Outline tab.

We have just added a ListView widget to POIList.axml. Let's now open the Properties pad in the designer window and edit some of its attributes. There are five buttons at the top of the pad that switch the set of properties being edited. The @+id notation notifies the compiler that a new resource ID needs to be created to identify the widget in API calls, and listView1 identifies the name of the constant. Now, perform the following steps:

1. Change the ID name to poiListView and save the changes.
2. Switch back to the Document Outline pad and notice that the ListView ID is updated.
3. Again, switch back to the Properties pad and click on the Layout button.
4. Under the View Group section of the layout properties, set both the Width and Height properties to match_parent.

The match_parent value for the Height and Width properties tells us that the ListView can use the entire content area provided by the parent, excluding any margins specified. In our case, the parent would be the top-level RelativeLayout.

Prior to API level 8, fill_parent was used instead of match_parent to accomplish the same effect. In API level 8, fill_parent was deprecated and replaced with match_parent for clarity. Currently, both constants are defined as the same value, so they have exactly the same effect. However, fill_parent may be removed from future releases of the API, so, going forward, match_parent should be used.

So far, we have added a ListView to RelativeLayout. Let's now add a ProgressBar to the center of the screen. Search for Progress Bar in the Toolbox search field. You will notice that several types of progress bars are listed, including horizontal, large, normal, and small.
1. Drag the normal progress bar onto RelativeLayout. By default, the ProgressBar widget is aligned to the top left of its parent layout. To align it to the center of the screen, select the progress bar in the Document Outline tab, switch to the Properties view, and click on the Layout tab. Now select the Center In Parent checkbox, and you will notice that the progress bar is aligned to the center of the screen and appears on top of the list view.
2. Currently, the progress bar is visible in the center of the screen. By default, it should be hidden in the layout and made visible only while the data is being downloaded. Change the ProgressBar ID to progressBar and save the changes.
3. To hide the ProgressBar in the layout, click on the Behavior tab in the Properties view. From the Visibility box, select gone.

This behavior can also be controlled from code through the View.Visibility property (the Xamarin.Android equivalent of calling setVisibility()) by passing any of the following values. The property is based on the ViewStates enum, which defines the following values:

Gone: This value tells the parent ViewGroup to treat the View as though it does not exist, so no space will be allocated for it in the layout.
Invisible: This value tells the parent ViewGroup to hide the content of the View; however, it still occupies the layout space.
Visible: This value tells the parent ViewGroup to display the content of the View.

Click on the Source tab to switch the IDE context from the visual designer to code, and see what we have built so far. Notice that the following code is generated for the POIList.axml layout:

<?xml version="1.0" encoding="utf-8"?>
<RelativeLayout
    p1:layout_width="match_parent"
    p1:layout_height="match_parent"
    p1:id="@+id/relativeLayout1">
  <ListView
      p1:minWidth="25px"
      p1:minHeight="25px"
      p1:layout_width="match_parent"
      p1:layout_height="match_parent"
      p1:id="@+id/poiListView" />
  <ProgressBar
      p1:layout_width="wrap_content"
      p1:layout_height="wrap_content"
      p1:id="@+id/progressBar"
      p1:layout_centerInParent="true"
      p1:visibility="gone" />
</RelativeLayout>

Creating POIListActivity

When we created the POIApp solution, along with the default layout, a default activity (MainActivity.cs) was created. Let's rename the MainActivity.cs file to POIListActivity.cs:

1. Select the MainActivity.cs file from the Solution Explorer and rename it to POIListActivity.cs.
2. Open the POIListActivity.cs file in the code editor and rename the class to POIListActivity.
3. The POIListActivity class currently contains the code that was created automatically while creating the solution using Xamarin Studio. We will write our own activity code, so let's remove all the code from the POIListActivity class.
4. Override the OnCreate() activity life cycle callback method. This method will be used to attach the activity layout, instantiate the views, and write other activity initialization logic.
5. Add the following code blocks to the POIListActivity class:

namespace POIApp
{
    [Activity (Label = "POIApp", MainLauncher = true, Icon = "@drawable/icon")]
    public class POIListActivity : Activity
    {
        protected override void OnCreate (Bundle savedInstanceState)
        {
            base.OnCreate (savedInstanceState);
        }
    }
}

Now let's set the activity content layout by calling the SetContentView(layoutId) method. This method places the layout content directly into the activity's view hierarchy. Let's provide the reference to the POIList layout created in the previous steps.
At this point, the POIListActivity class looks as follows:

namespace POIApp
{
    [Activity (Label = "POIApp", MainLauncher = true, Icon = "@drawable/icon")]
    public class POIListActivity : Activity
    {
        protected override void OnCreate (Bundle savedInstanceState)
        {
            base.OnCreate (savedInstanceState);
            SetContentView (Resource.Layout.POIList);
        }
    }
}

Notice that in the preceding code snippet, the POIListActivity class uses some of the [Activity] attributes, such as Label, MainLauncher, and Icon. During the build process, Xamarin.Android uses these attributes to create an entry in the AndroidManifest.xml file. Xamarin makes this easier by allowing all of the manifest properties to be set using attributes, so that you never have to modify AndroidManifest.xml manually.

So far, we have declared an activity and attached the layout to it. At this point, if you run the app on your Android device or emulator, you will notice that a blank screen is displayed.

Creating the POI list row layout

We now turn our attention to the layout for each row in the ListView widget. The Android platform provides a number of default layouts out of the box that can be used with a ListView widget:

SimpleListItem1: A single line with a single caption field
SimpleListItem2: A two-line layout with a larger font and a brighter text color for the first field
TwoLineListItem: A two-line layout with an equal-sized font for both lines and a brighter text color for the first line
ActivityListItem: A single line of text with an image view

All of the preceding layouts provide a pretty standard design, but for more control over content layout, a custom layout can also be created, which is what is needed for poiListView. To create a new layout, perform the following steps:

1. In the Solution pad, navigate to Resources | Layout, right-click on it, and navigate to Add | New File.
2. Select Android from the list on the left-hand side, select Android Layout from the template list, enter POIListItem in the name column, and click on New.

Before we proceed to lay out the design for each of the row items in the list, we should sketch it on a piece of paper and analyze how the UI will look. In our example, the POI data will be organized as follows. There are a number of ways to achieve this layout, but we will use RelativeLayout to achieve the result. There is a lot going on in this layout. Let's break it down as follows:

- A RelativeLayout view group is used as the top-level container; it provides a number of flexible options for positioning content relative to its edges or to other content.
- An ImageView widget is used to display a photo of the POI, and it is anchored to the left-hand side of the RelativeLayout utility.
- Two TextView widgets are used to display the POI name and address information. They need to be anchored to the right-hand side of the ImageView widget and centered within the parent RelativeLayout utility. The easiest way to accomplish this is to place both TextView classes inside another layout; in this case, a LinearLayout widget with the orientation set to vertical.
- An additional TextView widget is used to display the distance, and it is anchored to the right-hand side of the RelativeLayout view group and centered vertically.

Now, our task is to get this definition into POIListItem.axml. The next few sections describe how to accomplish this using the Content view of the designer when feasible and the Source view when required.
Adding a RelativeLayout view group

The RelativeLayout layout manager allows its child views to be positioned relative to each other or relative to their container. In our case, for building the row layout described previously, we can use RelativeLayout as the top-level view group.

When the POIListItem.axml layout file was created, a top-level LinearLayout was added by default. First, we need to change the top-level ViewGroup to RelativeLayout. The following section will take you through the steps to complete the layout design for the POI list row:

1. With POIListItem.axml opened in the Content mode, select the entire layout by clicking on the content area. You should see a blue outline going around the edge. Press Delete. The LinearLayout view group will be deleted, and you will see a message indicating that the layout is empty. Alternatively, you can also select the LinearLayout view group from the Document Outline tab and press Delete.
2. Locate the RelativeLayout view group in the Toolbox and drag it onto the layout.
3. Select the RelativeLayout view group from the Document Outline. Open the Properties pad and change the following properties:
   - The Padding option to 5dp
   - The Layout Height option to wrap_content
   - The Layout Width option to match_parent

The padding property controls how much space will be placed around each item as a margin, and the height determines the height of each list row. Setting the Layout Width option to match_parent will cause the POIListItem content to consume the entire width of the screen, while setting the Layout Height option to wrap_content will cause each row to be only as tall as its tallest control.

Switch to the Code view to see what has been added to the layout. Notice that the following lines of code have been added for RelativeLayout:

<RelativeLayout
    p1:minWidth="25px"
    p1:minHeight="25px"
    p1:layout_width="match_parent"
    p1:layout_height="wrap_content"
    p1:id="@+id/relativeLayout1"
    p1:padding="5dp"/>

Android runs on a variety of devices that offer different screen sizes and densities. When specifying dimensions, you can use a number of different units, including pixels (px), inches (in), and density-independent pixels (dp). Density-independent pixels are abstract units based on 1 dp being 1 pixel on a 160 dpi screen. At runtime, Android will scale the actual size up or down based on the actual screen density. It is a best practice to specify dimensions using density-independent pixels.

Adding an ImageView widget

The ImageView widget in Android is used to display arbitrary images from different sources. In our case, we will download the images from the server and display them in the list. Let's add an ImageView widget to the left-hand side of the layout and set the following configurations:

1. Locate the ImageView widget in the Toolbox and drag it onto RelativeLayout.
2. With the ImageView widget selected, use the Properties pad to set the ID to poiImageView.
3. Now, click on the Layout tab in the Properties pad and set the Height and Width values to 65dp.
4. In the property grouping named RelativeLayout, set Center Vertical to true. Simply clicking on the checkbox does not seem to work, but you can click on the small icon that looks like an edit box, which is to the right-hand side, and just enter true. If everything else fails, just switch to the Source view and enter the following code:

p1:layout_centerVertical="true"

5. In the property grouping named ViewGroup, set Margin Right to 5dp. This brings some space between the POI image and the POI name.
Switch to the Code view to see what has been added to the layout. Notice the following lines of code added for ImageView:

<ImageView
    p1:src="@android:drawable/ic_menu_gallery"
    p1:layout_width="65dp"
    p1:layout_height="65dp"
    p1:layout_marginRight="5dp"
    p1:id="@+id/poiImageView" />

Adding a LinearLayout widget

LinearLayout is one of the most basic layout managers; it organizes its child views either horizontally or vertically based on the value of its orientation property. Let's add a LinearLayout view group that will be used to lay out the POI name and address data as follows:

1. Locate the LinearLayout (vertical) view group in the Toolbox. Adding this widget is a little trickier because we want it anchored to the right-hand side of the ImageView widget. Drag the LinearLayout view group to the right-hand side of the ImageView widget until the edge turns into a blue dashed line, and then drop the LinearLayout view group. It will be aligned with the right-hand side of the ImageView widget.
2. In the property grouping named RelativeLayout of the Layout section, set Center Vertical to true. As before, you will need to enter true in the edit box or add it manually in the Source view.
3. Switch to the Code view to see what has been added to the layout. Notice the following lines of code added for LinearLayout:

<LinearLayout
    p1:orientation="vertical"
    p1:minWidth="25px"
    p1:minHeight="25px"
    p1:layout_width="wrap_content"
    p1:layout_height="wrap_content"
    p1:layout_toRightOf="@id/poiImageView"
    p1:id="@+id/linearLayout1"
    p1:layout_centerVertical="true" />

Adding the name and address TextView classes

Add the TextView classes to display the POI name and address:

1. Locate TextView in the Toolbox and add a TextView to the layout. This TextView needs to be added within the LinearLayout view group we just added, so drag the TextView over the LinearLayout view group until it turns blue and then drop it.
2. Name the TextView ID nameTextView and set the text size to 20sp. The text size can be set in the Style section of the Properties pad; you will need to expand the Text Appearance group by clicking on the ellipsis (...) button on the right-hand side.

Scale-independent pixels (sp) are like dp units, but they are also scaled by the user's font size preference. Android allows users to select a font size in the Accessibility section of Settings. When font sizes are specified using sp, Android will not only take into account the screen density when scaling text, but will also consider the user's accessibility settings. It is recommended that you specify font sizes using sp.

3. Add another TextView to the LinearLayout view group using the same technique, except dragging the new widget to the bottom edge of nameTextView until it changes to a blue dashed line, and then drop it. This will cause the second TextView to be added below nameTextView. Set the font size to 14sp.
4. Change the ID of the newly added TextView to addrTextView.
5. Now change the sample text for nameTextView and addrTextView to POI Name and City, State, Postal Code, respectively. To edit the text shown in a TextView, just double-tap the widget on the content panel. This enables a small editor that allows you to enter the text directly. Alternatively, you can change the text by entering a value for the Text property in the Widget section of the Properties pad.

It is a good design practice to declare all your static strings in the Resources/values/string.xml file.
By declaring the strings in the strings.xml file, you can easily translate your whole app to support other languages. Let's add the following strings to string.xml:

<string name="poi_name_hint">POI Name</string>
<string name="address_hint">City, State, Postal Code.</string>

You can now change the Text property of both nameTextView and addrTextView by selecting the ellipsis (…) button, which is next to the Text property in the Widget section of the Properties pad. Notice that this opens a dialog window that lists all the strings declared in the string.xml file. Select the appropriate strings for both TextView objects. Now let's switch to the Code view to see what has been added to the layout. Notice the following lines of code added inside LinearLayout:

<TextView
    p1:layout_width="match_parent"
    p1:layout_height="wrap_content"
    p1:id="@+id/nameTextView"
    p1:textSize="20sp"
    p1:text="@string/app_name" />
<TextView
    p1:text="@string/address_hint"
    p1:layout_width="match_parent"
    p1:layout_height="wrap_content"
    p1:id="@+id/addrTextView"
    p1:textSize="14sp" />

Adding the distance TextView

Add a TextView to show the distance from the POI:

1. Locate the TextView in the Toolbox and add a TextView to the layout. This TextView needs to be anchored to the right-hand side of the RelativeLayout view group, but there is no way to accomplish this visually, so we will use a multistep process. Initially, align the TextView with the right-hand edge of the LinearLayout view group by dragging it to the left-hand side until the edge changes to a dashed blue line, and then drop it.
2. In the Widget section of the Properties pad, name the widget distanceTextView and set the font size to 14sp.
3. In the Layout section of the Properties pad, set Align Parent Right to true, set Center Vertical to true, and clear out the linearLayout1 view group name in the To Right Of layout property.
4. Change the sample text to 204 miles. To do this, let's add a new string entry to string.xml and set the Text property from the Properties pad.

The following screenshot depicts what should be seen from the Content view at this point. Switch back to the Source tab in the layout designer, and notice the following code generated for the POIListItem.axml layout:

<?xml version="1.0" encoding="utf-8"?>
<RelativeLayout
    p1:minWidth="25px"
    p1:minHeight="25px"
    p1:layout_width="match_parent"
    p1:layout_height="wrap_content"
    p1:id="@+id/relativeLayout1"
    p1:padding="5dp">
  <ImageView
      p1:src="@android:drawable/ic_menu_gallery"
      p1:layout_width="65dp"
      p1:layout_height="65dp"
      p1:layout_marginRight="5dp"
      p1:id="@+id/poiImageView" />
  <LinearLayout
      p1:orientation="vertical"
      p1:layout_width="wrap_content"
      p1:layout_height="wrap_content"
      p1:layout_toRightOf="@id/poiImageView"
      p1:id="@+id/linearLayout1"
      p1:layout_centerVertical="true">
    <TextView
        p1:layout_width="match_parent"
        p1:layout_height="wrap_content"
        p1:id="@+id/nameTextView"
        p1:textSize="20sp"
        p1:text="@string/app_name" />
    <TextView
        p1:text="@string/address_hint"
        p1:layout_width="match_parent"
        p1:layout_height="wrap_content"
        p1:id="@+id/addrTextView"
        p1:textSize="14sp" />
  </LinearLayout>
  <TextView
      p1:text="@string/distance_hint"
      p1:layout_width="wrap_content"
      p1:layout_height="wrap_content"
      p1:id="@+id/textView1"
      p1:layout_centerVertical="true"
      p1:layout_alignParentRight="true" />
</RelativeLayout>

Creating the PointOfInterest app's entity class

The first class that is needed is the one that represents the primary focus of the application, a PointOfInterest class.
POIApp will allow the following attributes to be captured for each point of interest:

- Id
- Name
- Description
- Address
- Latitude
- Longitude
- Image

The POI entity class can be nothing more than a simple .NET class that houses these attributes. To create the POI entity class, perform the following steps:

1. Select the POIApp project from the Solution Explorer in Xamarin Studio. Select the POIApp project, not the solution, which is the top-level node in the Solution pad.
2. Right-click on it and select New File.
3. On the left-hand side of the New File dialog box, select General.
4. At the top of the template list, in the middle of the dialog box, select Empty Class (C#).
5. Enter the name PointOfInterest and click on OK. The class will be created in the POIApp project folder.
6. Change the visibility of the class to public and fill in the attributes based on the list previously identified. The following code snippet is from POIApp\POIApp\PointOfInterest.cs in the code bundle available for this article:

public class PointOfInterest
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string Description { get; set; }
    public string Address { get; set; }
    public string Image { get; set; }
    public double? Latitude { get; set; }
    public double? Longitude { get; set; }
}

Note that the Latitude and Longitude attributes are marked as nullable. In the case of latitude and longitude, (0, 0) is actually a valid location, so a null value indicates that the attributes have never been set.

Populating the ListView item

All the adapter views, such as ListView and GridView, use an Adapter that acts as a bridge between the data and the views. The Adapter iterates through the content and generates Views for each data item in the list. The Android SDK provides three different adapter implementations: ArrayAdapter, CursorAdapter, and SimpleAdapter. An ArrayAdapter expects an array or a list as input, a CursorAdapter accepts an instance of a Cursor, and a SimpleAdapter maps static data defined in the resources. The type of adapter that suits your app's needs is purely based on the input data type.

The BaseAdapter is the generic implementation for all three adapter types, and it implements the IListAdapter, ISpinnerAdapter, and IDisposable interfaces. This means that the BaseAdapter can be used for a ListView, GridView, or Spinner. For POIApp, we will create a subtype of BaseAdapter<T>, as it meets our specific needs, works well in many scenarios, and allows the use of our custom layout.

Creating POIListViewAdapter

In order to create the POIListViewAdapter, we will start by creating a custom adapter as follows:

1. Create a new class named POIListViewAdapter.
2. Open the POIListViewAdapter class file, make the class public, and specify that it inherits from BaseAdapter<PointOfInterest>.

Now that the adapter class has been created, we need to provide a constructor and implement four abstract methods.

Implementing a constructor

Let's implement a constructor that accepts all the information we will need to work with to populate the list. Typically, you need to pass at least two parameters: an instance of an activity, because we need the activity context while accessing the standard common resources, and an input data list that can be enumerated to populate the ListView.
The following code shows the constructor from the code bundle:

private readonly Activity context;
private List<PointOfInterest> poiListData;

public POIListViewAdapter (Activity _context, List<PointOfInterest> _poiListData) : base()
{
    this.context = _context;
    this.poiListData = _poiListData;
}

Implementing Count { get }

The BaseAdapter<T> class provides an abstract definition for a read-only Count property. In our case, we simply need to provide the count of POIs as provided in poiListData. The following code example demonstrates the implementation from the code bundle:

public override int Count
{
    get { return poiListData.Count; }
}

Implementing GetItemId()

The BaseAdapter<T> class provides an abstract definition for a method that returns a long ID for a row in the data source. We can use the position parameter to access a POI object in the list and return the corresponding ID. The following code example demonstrates the implementation from the code bundle:

public override long GetItemId (int position)
{
    return position;
}

Implementing the index getter method

The BaseAdapter<T> class provides an abstract definition for an index getter method that returns a typed object based on a position parameter passed in as an index. We can use the position parameter to access the POI object from poiListData and return an instance. The following code example demonstrates the implementation from the code bundle:

public override PointOfInterest this[int index]
{
    get { return poiListData [index]; }
}

Implementing GetView()

The BaseAdapter<T> class provides an abstract definition for GetView(), which returns a view instance that represents a single row in the ListView. As in other scenarios, you can choose to construct the view entirely in code or to inflate it from a layout file. We will use the layout file we previously created. The following code example demonstrates inflating a view from a layout file:

view = context.LayoutInflater.Inflate (Resource.Layout.POIListItem, null, false);

The first parameter of Inflate is a resource ID and the second is a root ViewGroup, which in this case can be left null, since the view will be added to the ListView when it is returned.

Reusing row Views

The GetView() method is called for each row in the source dataset. For datasets with a large number of rows, hundreds or even thousands, it would require a great deal of resources to create a separate view for each row, and it would seem wasteful since only a few rows are visible at any given time. The AdapterView architecture addresses this need by placing row Views into a queue so that they can be reused as they scroll out of the user's view. The GetView() method accepts a parameter named convertView, which is of type View. When a view is available for reuse, convertView will contain a reference to that view; otherwise, it will be null and a new view should be created. The following code example depicts the use of convertView to facilitate the reuse of row Views:

var view = convertView;
if (view == null) {
    view = context.LayoutInflater.Inflate (Resource.Layout.POIListItem, null);
}

Populating row Views

Now that we have an instance of the view, we need to populate the fields. The View class defines a method named FindViewById<T>, which returns a typed instance of a widget contained in the view. You pass in the resource ID defined in the layout file to specify the control you wish to access.
The following code returns access to nameTextView and sets its Text property:

PointOfInterest poi = this [position];
view.FindViewById<TextView>(Resource.Id.nameTextView).Text = poi.Name;

Populating addrTextView is slightly more complicated, because we only want to use the portions of the address we have, and we want to hide the TextView if none of the address components are present. The View.Visibility property allows you to control the visibility of a view. In our case, we want to use the ViewStates.Gone value if none of the components of the address are present. The following code shows the logic in GetView:

if (String.IsNullOrEmpty (poi.Address)) {
    view.FindViewById<TextView> (Resource.Id.addrTextView).Visibility = ViewStates.Gone;
}
else {
    view.FindViewById<TextView>(Resource.Id.addrTextView).Text = poi.Address;
}

Populating the value for the distance text view requires an understanding of the location services. We need to do some calculation using the user's current location together with the POI's latitude and longitude.

Populating the list thumbnail image

Image downloading and processing is a complex task. You need to consider various aspects, such as the network logic to download images from the server, caching downloaded images for performance, and resizing images to avoid out-of-memory conditions. Instead of writing our own logic for doing all the previously mentioned tasks, we can use UrlImageViewHelper, which is a free component available in the Xamarin Component Store. The Xamarin Component Store provides a set of reusable components, including both free and premium components, that can be easily plugged into any Xamarin-based application.

Using UrlImageViewHelper

The following steps will walk you through the process of adding a component from the Xamarin Component Store:

1. To include the UrlImageViewHelper component in POIApp, you can either double-click on the Components folder in the Solution pad, or right-click on it and select Edit Components. Notice that the component manager will be loaded with the already downloaded components and a Get More Components button that allows you to open the Component Store from the window. Note that to access the component manager, you need to log in to your Xamarin account.
2. Search for UrlImageViewHelper in the components search box available in the left-hand side pane.
3. Now click on the download button to add it to your Xamarin Studio solution.

Now that we have added the UrlImageViewHelper component, let's go back to the GetView() method in the POIListViewAdapter class and take a look at the following section of the code (note that the original listing checked poi.Address here; checking poi.Image is the intended logic, since it is the image URL that is being loaded):

var imageView = view.FindViewById<ImageView> (Resource.Id.poiImageView);
if (!String.IsNullOrEmpty (poi.Image)) {
    Koush.UrlImageViewHelper.SetUrlDrawable (imageView, poi.Image, Resource.Drawable.ic_placeholder);
}

Let us examine how the preceding code snippet works:

- The SetUrlDrawable() method defined in the UrlImageViewHelper component provides the logic to download an image using a single line of code. It accepts three parameters: an instance of imageView, where the image is to be displayed after the download; the image source URL; and the placeholder image.
- Add a new image, ic_placeholder.png, to the drawable Resources directory. While the image is being downloaded, the placeholder image will be displayed on imageView.

Downloading the image over the network requires Internet permissions. The following section will walk you through the steps involved in defining permissions in your AndroidManifest.xml file.
Adding Internet permissions

Android apps must be granted permissions to access certain features, such as downloading data from the Internet, saving an image in storage, and so on. You must specify the permissions that an app requires in the AndroidManifest.xml file. This allows the installer to show potential users the set of permissions an app requires at the time of installation. To set the appropriate permissions, perform the following steps:

1. Double-click on AndroidManifest.xml in the Properties directory in the Solution pad. The file will open in the manifest editor. There are two tabs, Application and Source, at the bottom of the screen that can be used to toggle between viewing a form for editing the file and the raw XML.
2. In the Required permissions list, check Internet and navigate to File | Save.
3. Switch to the Source view to view the resulting XML.

Summary

In this article, we covered a lot about how to create user interface elements using different layout managers and widgets such as TextView, ImageView, ProgressBar, and ListView.

Resources for Article:

Further resources on this subject:
- Code Sharing Between iOS and Android [article]
- XamChat – a Cross-platform App [article]
- Heads up to MvvmCross [article]


First Look and Blinking Lights

Packt
04 Sep 2015
19 min read
This article, by Tony Olsson, the author of the book Arduino Wearable Projects, explains the Arduino platform in terms of three different aspects: software, hardware, and the Arduino philosophy.

The hardware is the Arduino board, and there are multiple versions available for different needs. Here, we will be focusing on Arduino boards that were made with wearables in mind. The software used to program the boards is also known as the Arduino IDE. IDE stands for Integrated Development Environment, a program used to write programs in programming code. The programs written for the board are known as sketches, because the IDE works much like a sketchpad: if you have an idea, you can quickly try it out in code. This is also part of the Arduino philosophy. Arduino is based on the open source philosophy, which also reflects how we learn about Arduino. Arduino has a large community, and there are tons of projects to learn from.

First, we have the Arduino hardware, which we will use to build all the examples along with different additional electronic components. When the Arduino project started back in 2005, there was only one piece of hardware to speak of, which was the serial Arduino board. Since then, there have been several iterations of this board, and it has inspired new designs of Arduino hardware to fit different needs. If you have been familiar with Arduino for a while, you probably started out with the standard Arduino board. Today, there are different Arduino boards that fit different needs, and there are countless clones available for specific purposes. In this article, we will be using specialized Arduino boards, in particular the FLORA board.

The Arduino software, that is, the Arduino IDE, is what we will use to program our projects. The IDE is the software used to write programs for the hardware. Once a program is compiled in the IDE, the IDE will upload it to the Arduino board, and the processor on the board will do whatever your program says. Arduino programs are also known as sketches. The name sketches is borrowed from another open source project and software called Processing. Processing was developed as a tool for digital artists, where the idea was to use Processing as a digital sketchpad. The idea behind sketches and other aspects of Arduino is what we call the Arduino philosophy, and this is the third thing that makes up Arduino.

Arduino is based on open source, which is a type of licensing model where you are free to develop your own designs based on the original Arduino board. This is one of the reasons why you can find so many different models and clones of the Arduino boards. Open source is also a philosophy that allows ideas and knowledge to be shared freely. The Arduino community has grown strong, and there are many great resources to be found and Arduino friends to be made. The only problem may be where to start.

Each project will take you from the start all the way to a finished "prototype". I call all the projects prototypes because these are not finished products. As your knowledge progresses, you can develop new sketches to run on your prototypes, develop new functions, or change the physical appearance to fit your needs and preferences.
In this article, you will have a look at:

- Installing the IDE
- Working with the IDE and writing sketches
- The FLORA board layout
- Connecting the FLORA board to the computer
- Controlling and connecting LEDs to the FLORA board

Wearables

This is all about wearables, which are defined as computational devices that are worn on the body. A computational device is something that can make calculations of any sort. Some consider mechanical clocks to be the first computers, since they make calculations on time. According to this definition, wearables have been around for centuries, if you think about it. Pocket watches were invented in the 16th century, and a watch is basically a small device that calculates time. Glasses are another example of technology worn on the body, and they have also been around for a long time. Even if glasses do not fit our more specific definition of wearables, they serve as a good example of how humans have modified materials and adapted their bodies to gain new functionality. If we are cold, we dress in clothing to keep us warm; if we break a leg, we use crutches to get around; and if an organ fails, we can implant a device that replicates its functionality. Humans have a long tradition of developing technology to extend the functionality of the human body. With the development of technology for the army, health care, and professional sport, wearables have a long tradition. But in recent years, more and more devices have been developed for the consumer market. Today, we have smart watches, smart glasses, and different types of smart clothing. Here, we will carry on this ancient tradition and develop some wearable projects for you to learn about electronics and programming. Some of these projects are just for fun and some have a specific application. If you are already familiar with Arduino, you can pick any project and get started.

Installing and using the software

The projects will be based on different boards made by the company Adafruit. Later in this article, we will take a look at one of these boards, called the FLORA, and explain its different parts. These boards come with a modified version of the Arduino IDE, which we will be using in this article. The Adafruit IDE looks exactly the same as the Arduino IDE. The FLORA board, for example, is based on the same microprocessor as the Arduino Leonardo board and can be used with the standard Arduino IDE by programming it with the Leonardo board option selected; with the Adafruit version of the IDE, the FLORA board appears under its proper name. The Adafruit version of the IDE also comes preloaded with the necessary libraries for programming these boards, so there is no need to install them separately.

For downloading and installation instructions for the IDE, head over to the Adafruit website and follow the steps there: https://learn.adafruit.com/getting-started-with-flora/download-software

Make sure to download the software corresponding to your operating system. The process for installing the software depends on your operating system, and these instructions may change over time and may differ between versions of the operating system. The installation is a very straightforward process if you are working with OS X. On Windows, you will need to install some additional USB drivers. The process for installing on Linux depends on which distribution you are using. For the latest instructions, take a look at the Arduino website for the different operating systems.
The Arduino IDE

On the following website, you can find the original Arduino IDE if you need it in the future. Here, you will be fine sticking with the Adafruit version of the IDE, since the most common original Arduino boards are also supported. The following is the link for downloading the Arduino software: https://www.arduino.cc/en/Main/Software.

First look at the IDE

The IDE is where we will be doing all of our programming. The first time you open up the IDE, it should look like Figure 1.1:

Figure 1.1: The Arduino IDE

The main white area of the IDE is blank when you open a new sketch, and this is the area of the IDE where we will write our code later on. First, we need to get familiar with the functionality of the IDE. At the top left of the IDE, you will find five buttons. The first one, which looks like a check sign, is the compile button. When you press this button, the IDE will try to compile the code in your sketch, and if it succeeds, you will get a message in the black window at the bottom of your IDE that should look similar to this:

Figure 1.2: The compile message window

When writing code in the IDE, we will be using what is known as a third-level programming language. The problem with the microprocessors on Arduino boards is that they are very hard to communicate with using their native language, and this is why third-level languages with human-readable commands have been developed. The code you will see later needs to be translated into code that the Arduino board understands, and this is what is done when we compile the code. The compile button also makes a logical check of your code so that it does not contain any errors. If you have any errors, the text in the black box in the IDE will appear in red, indicating the line of code that is wrong by highlighting it in yellow. Don't worry about errors. They are usually misspelling errors, and they happen a lot, even to the most experienced programmers. One of the error messages can be seen in the following screenshot:

Figure 1.3: Error message in the compile window

Adjacent to the compile button, you will find the Upload button. Once this button is pressed, it does the same thing as the compile button, but if your sketch is free from errors, it will also send the code from your computer to the board:

Figure 1.4: The quick buttons

The next three buttons are quick buttons for opening a new sketch, opening an old sketch, or saving your sketch. Make sure to save your sketches once in a while when working on them. If something happens and the IDE closes unexpectedly, it does not autosave, so manually saving once in a while is always a good idea. At the far right of the IDE, you will find a button that looks like a magnifying glass. This is used to open the Serial monitor, a window that lets you see the communication to and from the computer and the board.

At the top of the screen, you will find a classic application menu, which may look a bit different depending on your operating system but will follow the same structure. Under File, you will find the menu for opening your previous sketches and the different example sketches that come with the IDE, as shown in Figure 1.5. Under Edit, you will find different options and quick commands for editing your code. In Sketch, you can find the same functions as the buttons in the IDE window:

Figure 1.5: The File menu

Under Tools, you will find two menus that are very important to keep track of when uploading sketches to your board.
Navigate to Tools | Board and you will find many different types of Arduino boards. In this menu, you need to select the type of board you are working with. Under Tools | Serial port, you need to select the USB port to which your board is connected. Depending on your operating system, the ports will be named differently. In Windows, they are named COM*. On OS X, they are named /dev/tty.****:

Figure 1.6: The Tools menu

Since there may be other things inside your computer also connected to a port, these will also show up in the list. The easiest way to figure out which port is connected to your board is to:

1. Plug your board in to your computer using a USB cable.
2. Check the Serial port list and remember which ports are occupied.
3. Unplug the board and check the list again. The port missing from the list is the one your board is connected to.
4. Plug your board back in and select that port in the list.

Each Arduino board connected to your computer will be given its own port. In most cases, when your sketch will not upload to your board, you have either selected the wrong board type or the wrong serial port in the Tools menu.

Getting to know your board

As mentioned earlier, we will not be using the standard Uno Arduino board, which is the board most people think of when they hear "Arduino board". Most Arduino variations and clones use the same microprocessors as the standard Arduino boards, and it is the microprocessor that is the heart of the board. As long as they use the same microprocessors, they can be programmed as normal by selecting the corresponding standard Arduino board in the Tools menu. In our case, we will be using a modified version of the Arduino IDE, which features the types of boards we will be using. What sets other boards apart from the standard Uno Arduino board is usually the form factor of the board and the pin layout.

We will be using a board called the FLORA. This board was created with wearables in mind. The FLORA is based on the same chip used in the Arduino Leonardo board, but it uses a much smaller form factor and has been made round to ease its use in a wearable context. You can complete all the projects using most Arduino boards and clones, but remember that the code and construction of the project may need some modification.

The FLORA board

In the following Figure 1.7 you will find the FLORA board:

Figure 1.7: The FLORA board

The biggest difference from normal Arduino boards, besides the form factor, is the number of pins available. The pins are the copper-coated areas at the edge of the FLORA. The form factor of the pins on FLORA boards is also a bit different from other Arduino boards: the pin holes and soldering pads are made bigger so they can be easily sewn into garments, which is common when making wearable projects. The larger pins also make it easier to prototype with alligator clips.
The pins available on the FLORA are as follows, starting from the right of the USB connector, which is located at the top of the board in Figure 1.7:

- 3.3V: Regulated 3.3 volt output at a 100 mA max
- D10: Both digital pin 10 and analog pin 10, with PWM
- D9: Both digital pin 9 and analog pin 9, with PWM
- GND: Ground pin
- D6: Both digital pin 6 and analog pin 7, with PWM
- D12: Both digital pin 12 and analog pin 11
- VBATT: Raw battery voltage; can be used as a battery power output
- GND: Ground pin
- TX: Transmit communication pin, or digital pin 1
- RX: Receive communication pin, or digital pin 0
- 3.3V: Regulated 3.3 volt output at a 100 mA max
- SDA: Communication pin, or digital pin 2
- SCL: Clock pin, or digital pin 3, with PWM

As you can see, most of the pins have more than one function. The most interesting pins are the D* pins. These are the pins we will use to connect to other components. These pins can act as either digital pins or analog pins. Digital pins operate only in 1 or 0, which means that they can only be on or off. You can also receive information on these pins, but again, only in terms of on or off. The pins marked PWM have a special function, which is called Pulse Width Modulation; on these pins, we can control the output voltage level (we will try this out with a short fade sketch after the blink example later in this article). The analog pins, however, can handle information in the range from 0 to 1023.

The 3.3V pins are used to power any components connected to the board. In this case, an electronic circuit needs to be completed, and that's why there are two GND pins. In order to make an electronic circuit, power always needs to go back to where it came from. For example, if you want to power a motor, you need power from a power source connected via a cable, with another cable directing the power back to the power source, or the motor will not spin. TX, RX, SDA, and SCL are pins used for communication when dealing with more complex sensors. The VBATT pin can be used to output the same voltage as your power source, which you connect to the connector located at the bottom of the FLORA board shown in Figure 1.7.

Other boards

In Figure 1.8 you will find the other board types we will be using:

Figure 1.8: The Gemma, Trinket, and Trinket Pro boards

In Figure 1.8, the first one from the left is the Gemma board. In the middle, you will find the Trinket board, and to the right, you have the Trinket Pro board. Both the Gemma and the Trinket board are based on the ATtiny85 microprocessor, which is a much smaller and cheaper processor, but one that comes with limitations. These boards only have three programmable pins, but what they lack in functionality, they make up for in size. The difference between the Gemma and the Trinket board is the form factor, but the Trinket board also lacks a battery connector. The Trinket Pro board runs on an ATmega328 chip, which is the same chip used as the main processor on the standard Arduino board. This chip has 20 programmable pins, but the board also lacks a battery connector. The reason for using different types of boards is that different projects require different functionality, and in some cases, the space for adding components will be limited. Don't worry though, since all of them can be programmed in the same way.
Connecting and testing your board

In order to make sure that you have installed your IDE correctly and to ensure that your board is working, we need to connect it to your computer using a USB to USB micro cable, as shown in Figure 1.9:

Figure 1.9: USB to USB micro cable

The small connector of the cable connects to your board, and the larger connector connects to your computer. As long as your board is connected to your computer, the USB port on the computer will power your board. Once your board is connected to the computer, open up your IDE and enter the following code. Follow the basic structure of writing sketches: first, declare your variables at the top of the sketch. The setup function is the first segment of code that runs when the board is powered up. Then, add the loop function, which is the second segment of code that runs and will keep on looping until the board is powered off:

int led = 7;               // the on-board LED on the FLORA is connected to digital pin 7

void setup() {
  pinMode(led, OUTPUT);    // configure the pin as an output so we can control it
}

void loop() {
  digitalWrite(led, HIGH); // turn the LED on
  delay(1000);             // wait for 1000 milliseconds
  digitalWrite(led, LOW);  // turn the LED off
  delay(1000);             // wait again before the loop starts over
}

The first line of code declares pin number 7 as an integer and gives it the name led. An integer is a data type, and declaring the variable with the int keyword allows you to store whole numbers in memory. On the FLORA board, there is a small on-board LED connected to digital pin 7.

The next part is void setup(), which is one of the functions that always needs to be in your sketch in order for it to compile. All functions use curly brackets to indicate where the function starts and ends. The { bracket indicates the start, and the } bracket indicates the end of the function. In void setup(), we have declared the mode of the pin we are using. All digital pins can be used as either an input or an output. An input is used for reading the state of anything connected to it, and an output is used to control anything connected to the pin. In this case, we are using pin 7, which is connected to the on-board LED. In order to control this pin, we need to declare it as an output. If you are using a different board, remember to change the pin number in your code. On most other Arduino boards, the on-board LED is connected to pin 13.

The void loop() function is where the magic happens. This is where we put the actual commands that operate the pins on the board. In the preceding code, the first thing we do is turn the led pin HIGH by using the digitalWrite() command. The digitalWrite() function is a built-in function that takes two parameters. The first is the number of the pin; in this case, we pass in the variable led, which has the value 7. The second parameter is the state of the pin, and we can use the HIGH or LOW shortcuts to turn the pin on or off, respectively. Then, we pause the program using the delay() command. The delay() command takes one parameter, which is the number of milliseconds you want to pause your program for. After this, we use the same command as before to control the state of the pin, but this time we turn it LOW, which is the same as turning the pin off. Then we wait for an additional 1000 milliseconds. Once the sketch reaches the end of the loop function, it will start over from the beginning of the same function and keep on looping until a new sketch is uploaded, the reset button on the FLORA board is pressed, or the power is disconnected. Now that we have the sketch ready, you can press the upload button. If everything goes as planned, the on-board LED should start to blink with a 1 second delay.
The sketch you have uploaded will stay on the board even if the board is powered off, until you upload a new sketch that overwrites the old one. If you run into problems with uploading the code, remember to perform the following checks:

Check your code for errors or misspellings
Check your connections and USB cable
Make sure you have the right board type selected
Make sure you have the right USB port selected

Summary

In this article, we had a look at the different parts of the FLORA board and how to install the IDE. We also made some small sketches to work with the on-board LED and made our first electronic circuit using an external LED.

Resources for Article: Further resources on this subject: Dealing with Interrupts [article] Programmable DC Motor Controller with an LCD [article] Prototyping Arduino Projects using Python [article]

Introduction to Odoo

Packt
04 Sep 2015
12 min read
 In this article by Greg Moss, author of Working with Odoo, he explains that Odoo is a very feature-filled business application framework with literally hundreds of applications and modules available. We have done our best to cover the most essential features of the Odoo applications that you are most likely to use in your business. Setting up an Odoo system is no easy task. Many companies get into trouble believing that they can just install the software and throw in some data. Inevitably, the scope of the project grows and what was supposed to be a simple system ends up being a confusing mess. Fortunately, Odoo's modular design will allow you to take a systematic approach to implementing Odoo for your business. (For more resources related to this topic, see here.) What is an ERP system? An Enterprise Resource Planning (ERP) system is essentially a suite of business applications that are integrated together to assist a company in collecting, managing, and reporting information throughout core business processes. These business applications, typically called modules, can often be independently installed and configured based on the specific needs of the business. As the needs of the business change and grow, additional modules can be incorporated into an existing ERP system to better handle the new business requirements. This modular design of most ERP systems gives companies great flexibility in how they implement the system. In the past, ERP systems were primarily utilized in manufacturing operations. Over the years, the scope of ERP systems have grown to encompass a wide range of business-related functions. Recently, ERP systems have started to include more sophisticated communication and social networking features. Common ERP modules The core applications of an ERP system typically include: Sales Orders Purchase Orders Accounting and Finance Manufacturing Resource Planning (MRP) Customer Relationship Management (CRM) Human Resources (HR) Let's take a brief look at each of these modules and how they address specific business needs. Selling products to your customer Sales Orders, commonly abbreviated as SO, are documents that a business generates when they sell products and services to a customer. In an ERP system, the Sales Order module will usually allow management of customers and products to optimize efficiency for data entry of the sales order. Many sales orders begin as customer quotes. Quotes allow a salesperson to collect order information that may change as the customer makes decisions on what they want in their final order. Once a customer has decided exactly what they wish to purchase, the quote is turned into a sales order and is confirmed for processing. Depending on the requirements of the business, there are a variety of methods to determine when a customer is invoiced or billed for the order. This preceding screenshot shows a sample sales order in Odoo. Purchasing products from suppliers Purchase Orders, often known as PO, are documents that a business generates when they purchase products from a vendor. The Purchase Order module in an ERP system will typically include management of vendors (also called suppliers) as well as management of the products that the vendor carries. Much like sales order quotes, a purchase order system will allow a purchasing department to create draft purchase orders before they are finalized into a specific purchasing request. 
Often, a business will configure the Sales Order and Purchase Order modules to work together to streamline business operations. When a valid sales order is entered, most ERP systems will allow you to configure the system so that a purchase order can be automatically generated if the required products are not in stock to fulfill the sales order. ERP systems will allow you to set minimum quantities on-hand or order limits that will automatically generate purchase orders when inventory falls below a predetermined level. When properly configured, a purchase order system can save a significant amount of time in purchasing operations and assist in preventing supply shortages. Managing your accounts and financing in Odoo Accounting and finance modules integrate with an ERP system to organize and report business transactions. In many ERP systems, the accounting and finance module is known as GL for General Ledger. All accounting and finance modules are built around a structure known as the chart of accounts. The chart of accounts organizes groups of transactions into categories such as assets, liabilities, income, and expenses. ERP systems provide a lot of flexibility in defining the structure of your chart of accounts to meet the specific requirements for your business. Accounting transactions are grouped by date into periods (typically by month) for reporting purposes. These reports are most often known as financial statements. Common financial statements include balance sheets, income statements, cash flow statements, and statements of owner's equity. Handling your manufacturing operations The Manufacturing Resource Planning (MRP) module manages all the various business operations that go into the manufacturing of products. The fundamental transaction of an MRP module is a manufacturing order, which is also known as a production order in some ERP systems. A manufacturing order describes the raw products or subcomponents, steps, and routings required to produce a finished product. The raw products or subcomponents required to produce the finished product are typically broken down into a detailed list called a bill of materials or BOM. A BOM describes the exact quantities required of each component and are often used to define the raw material costs that go into manufacturing the final products for a company. Often an MRP module will incorporate several submodules that are necessary to define all the required operations. Warehouse management is used to define locations and sublocations to store materials and products as they move through the various manufacturing operations. For example, you may receive raw materials in one warehouse location, assemble those raw materials into subcomponents and store them in another location, then ultimately manufacture the end products and store them in a final location before delivering them to the customer. Managing customer relations in Odoo In today's business environment, quality customer service is essential to being competitive in most markets. A Customer Relationship Management (CRM) module assists a business in better handling the interactions they may have with each customer. Most CRM systems also incorporate a presales component that will manage opportunities, leads, and various marketing campaigns. Typically, a CRM system is utilized the most by the sales and marketing departments within a company. For this reason, CRM systems are often considered to be sales force automation tools or SFA tools. 
Sales personnel can set up appointments, schedule call backs, and employ tools to manage their communication. More modern CRM systems have started to incorporate social networking features to assist sales personnel in utilizing these newly emerging technologies. Configuring human resource applications in Odoo Human Resource modules, commonly known as HR, manage the workforce- or employee-related information in a business. Some of the processes ordinarily covered by HR systems are payroll, time and attendance, benefits administration, recruitment, and knowledge management. Increased regulations and complexities in payroll and benefits have led to HR modules becoming a major component of most ERP systems. Modern HR modules typically include employee kiosk functions to allow employees to self-administer many tasks such as putting in a leave request or checking on their available vacation time. Finding additional modules for your business requirements In addition to core ERP modules, Odoo has many more official and community-developed modules available. At the time of this article's publication, the Odoo application repository had 1,348 modules listed for version 7! Many of these modules provide small enhancements to improve usability like adding payment type to a sales order. Other modules offer e-commerce integration or complete application solutions, such as managing a school or hospital. Here is a short list of the more common modules you may wish to include in an Odoo installation: Point of Sale Project Management Analytic Accounting Document Management System Outlook Plug-in Country-Specific Accounting Templates OpenOffice Report Designer You will be introduced to various Odoo modules that extend the functionality of the base Odoo system. You can find a complete list of Odoo modules at http://apps.Odoo.com/. This preceding screenshot shows the module selection page in Odoo. Getting quickly into Odoo Do you want to jump in right now and get a look at Odoo 7 without any complex installations? Well, you are lucky! You can access an online installation of Odoo, where you can get a peek at many of the core modules right from your web browser. The installation is shared publicly, so you will not want to use this for any sensitive information. It is ideal, however, to get a quick overview of the software and to get an idea for how the interface functions. You can access a trial version of Odoo at https://www.Odoo.com/start. Odoo – an open source ERP solution Odoo is a collection of business applications that are available under an open source license. For this reason, Odoo can be used without paying license fees and can be customized to suit the specific needs of a business. There are many advantages to open source software solutions. We will discuss some of these advantages shortly. Free your company from expensive software license fees One of the primary downsides of most ERP systems is they often involve expensive license fees. Increasingly, companies must pay these license fees on an annual basis just to receive bug fixes and product updates. Because ERP systems can require companies to devote great amounts of time and money for setup, data conversion, integration, and training, it can be very expensive, often prohibitively so, to change ERP systems. For this reason, many companies feel trapped as their current ERP vendors increase license fees. Choosing open source software solutions, frees a company from the real possibility that a vendor will increase license fees in the years ahead. 
Modify the software to meet your business needs With proprietary ERP solutions, you are often forced to accept the software solution the vendor provides chiefly "as is". While you may have customization options and can sometimes pay the company to make specific changes, you rarely have the freedom to make changes directly to the source code yourself. The advantages to having the source code available to enterprise companies can be very significant. In a highly competitive market, being able to develop solutions that improve business processes and give your company the flexibility to meet future demands can make all the difference. Collaborative development Open source software does not rely on a group of developers who work secretly to write proprietary code. Instead, developers from all around the world work together transparently to develop modules, prepare bug fixes, and increase software usability. In the case of Odoo, the entire source code is available on Launchpad.net. Here, developers submit their code changes through a structure called branches. Changes can be peer reviewed, and once the changes are approved, they are incorporated into the final source code product. Odoo – AGPL open source license The term open source covers a wide range of open source licenses that have their own specific rights and limitations. Odoo and all of its modules are released under the Affero General Public License (AGPL) version 3. One key feature of this license is that any custom-developed module running under Odoo must be released with the source code. This stipulation protects the Odoo community as a whole from developers who may have a desire to hide their code from everyone else. This may have changed or has been appended recently with an alternative license. You can find the full AGPL license at http://www.gnu.org/licenses/agpl-3.0.html. A real-world case study using Odoo The goal is to do more than just walk through the various screens and reports of Odoo. Instead, we want to give you a solid understanding of how you would implement Odoo to solve real-world business problems. For this reason, this article will present a real-life case study in which Odoo was actually utilized to improve specific business processes. Silkworm, Inc. – a mid-sized screen printing company Silkworm, Inc. is a highly respected mid-sized silkscreen printer in the Midwest that manufactures and sells a variety of custom apparel products. They have been kind enough to allow us to include some basic aspects of their business processes as a set of real-world examples implementing Odoo into a manufacturing operation. Using Odoo, we will set up the company records (or system) from scratch and begin by walking through their most basic sales order process, selling T-shirts. From there, we will move on to manufacturing operations, where custom art designs are developed and then screen printed onto raw materials for shipment to customers. We will come back to this real-world example so that you can see how Odoo can be used to solve real-world business solutions. Although Silkworm is actively implementing Odoo, Silkworm, Inc. does not directly endorse or recommend Odoo for any specific business solution. Every company must do their own research to determine whether Odoo is a good fit for their operation. Summary In this article, we have learned about the ERP system and common ERP modules. An introduction about Odoo and features of it. 
Resources for Article: Further resources on this subject: Getting Started with Odoo Development[article] Machine Learning in IPython with scikit-learn [article] Making Goods with Manufacturing Resource Planning [article]

Installing Red Hat CloudForms on Red Hat OpenStack

Packt
04 Sep 2015
8 min read
In this article by Sangram Rath, the author of the book Hybrid Cloud Management with Red Hat CloudForms, this article takes you through the steps required to install, configure, and use Red Hat CloudForms on Red Hat Enterprise Linux OpenStack. However, you should be able to install it on OpenStack running on any other Linux distribution. The following topics are covered in this article: System requirements Deploying the Red Hat CloudForms Management Engine appliance Configuring the appliance Accessing and navigating the CloudForms web console (For more resources related to this topic, see here.) System requirements Installing the Red Hat CloudForms Management Engine Appliance requires an existing virtual or cloud infrastructure. The following are the latest supported platforms: OpenStack Red Hat Enterprise Virtualization VMware vSphere The system requirements for installing CloudForms are different for different platforms. Since this book talks about installing it on OpenStack, we will see the system requirements for OpenStack. You need a minimum of: Four VCPUs 6 GB RAM 45 GB disk space The flavor we select to launch the CloudForms instance must meet or exceed the preceding requirements. For a list of system requirements for other platforms, refer to the following links: System requirements for Red Hat Enterprise Virtualization: https://access.redhat.com/documentation/en-US/Red_Hat_CloudForms/3.1/html/Installing_CloudForms_on_Red_Hat_Enterprise_Virtualization/index.html System requirements for installing CloudForms on VMware vSphere: https://access.redhat.com/documentation/en-US/Red_Hat_CloudForms/3.1/html/Installing_CloudForms_on_VMware_vSphere/index.html Additional OpenStack requirements Before we can launch a CloudForms instance, we need to ensure that some additional requirements are met: Security group: Ensure that a rule is created to allow traffic on port 443 in the security group that will be used to launch the appliance. Flavor: Based on the system requirements for running the CloudForms appliance, we can either use an existing flavor, such as m1.large, or create a new flavor for the CloudForms Management Engine Appliance. To create a new flavor, click on the Create Flavor button under the Flavor option in Admin and fill in the required parameters, especially these three: At least four VCPUs At least 6144 MB of RAM At least 45 GB of disk space Key pair: Although, at the VNC console, you can just use the default username and password to log in to the appliance, it is good to have access to a key pair as well, if required, for remote SSH. Deploying the Red Hat CloudForms Management Engine appliance Now that we are aware of the resource and security requirements for Red Hat CloudForms, let's look at how to obtain a copy of the appliance and run it. Obtaining the appliance The CloudForms Management appliance for OpenStack can be downloaded from your Red Hat customer portal under the Red Hat CloudForms product page. You need access to a Red Hat CloudForms subscription to be able to do so. At the time of writing this book, the direct download link for this is https://rhn.redhat.com/rhn/software/channel/downloads/Download.do?cid=20037. For more information on obtaining the subscription and appliance, or to request a trial, visit http://www.redhat.com/en/technologies/cloud-computing/cloudforms. Note If you are unable to get access to Red Hat CloudForms, ManageIQ (the open source version) can also be used for hands-on experience. 
Creating the appliance image in OpenStack Before launching the appliance, we need to create an image in OpenStack for the appliance, since OpenStack requires instances to be launched from an image. You can create a new Image under Project with the following parameters (see the screenshot given for assistance): Enter a name for the image. Enter the image location in Image Source (HTTP URL). Set the Format as QCOW2. Optionally, set the Minimum Disk size. Optionally, set Minimum Ram. Make it Public if required and Create An Image. Note that if you have a newer release of OpenStack, there may be some additional options, but the preceding are what need to be filled in—most importantly the download URL of the Red Hat CloudForms appliance. Wait for the Status field to reflect as Active before launching the instance, as shown in this screenshot: Launching the appliance instance In OpenStack, under Project, select Instances and then click on Launch Instance. In the Launch Instance wizard enter the following instance information in the Details tab: Select an Availabilty Zone. Enter an Instance Name. Select Flavor. Set Instance Count. Set Instance Boot Source as Boot from image. Select CloudForms Management Engine Appliance under Image Name. The final result should appear similar to the following figure: Under the Access & Security tab, ensure that the correct Key Pair and Security Group tab are selected, like this: For Networking, select the proper networks that will provide the required IP addresses and routing, as shown here: Other options, such as Post-Creation and Advanced Options, are optional and can be left blank. Click on Launch when ready to start creating the instance. Wait for the instance state to change to Running before proceeding to the next step. Note If you are accessing the CloudForms Management Engine from the Internet, a Floating IP address needs to be associated with the instance. This can be done from Project, under Access & Security and then the Floating IPs tab. The Red Hat CloudForms web console The web console provides a graphical user interface for working with the CloudForms Management Engine Appliance. The web console can be accessed from a browser on any machine that has network access to the CloudForms Management Engine server. System requirements The system requirements for accessing the Red Hat CloudForms web console are: A Windows, Linux, or Mac computer A modern browser, such as Mozilla Firefox, Google Chrome, and Internet Explorer 8 or above Adobe Flash Player 9 or above The CloudForms Management Engine Appliance must already be installed and activated in your enterprise environment Accessing the Red Hat CloudForms Management Engine web console Type the hostname or floating IP assigned to the instance prefixed by https in a supported browser to access the appliance. Enter default username as admin and the password as smartvm to log in to the appliance, as shown in this screenshot: You should log in to only one tab in each browser, as the console settings are saved for the active tab only. The CloudForms Management Engine also does not guarantee that the browser's Back button will produce the desired results. Use the breadcrumbs provided in the console. Navigating the web console The web console has a primary top-level menu that provides access to feature sets such as Insight, Control, and Automate, along with menus used to add infrastructure and cloud providers, create service catalogs and view or raise requests. 
The secondary menu appears below the top primary menu, and its options change based on the primary menu option selected. In certain cases, a third-sublevel menu may also appear for additional options based on the selection in the secondary menu. The feature sets available in Red Hat CloudForms are categorized under eight menu items: Cloud Intelligence: This provides a dashboard view of your hybrid cloud infrastructure for the selected parameters. Whatever is displayed here can be configured as a widget. It also provides additional insights into the hybrid cloud in the form of reports, chargeback configuration and information, timeline views, and an RSS feeds section. Services: This provides options for creating templates and service catalogs that help in provisioning multitier workloads across providers. It also lets you create and approve requests for these service catalogs. Clouds: This option in the top menu lets you add cloud providers; define availability zones; and create tenants, flavors, security groups, and instances. Infrastructure: This option, in a way similar to clouds, lets you add infrastructure providers; define clusters; view, discover, and add hosts; provision VMs; work with data stores and repositories; view requests; and configure the PXE. Control: This section lets you define compliance and control policies for the infrastructure providers using events, conditions, and actions based on the conditions. You can further combine these policies into policy profiles. Another important feature is alerting the administrators, which is configured from here. You can also simulate these policies, import and export them, and view logs. Automate: This menu option lets you manage life cycle tasks such as provisioning and retirement, and automation of resources. You can create provisioning dialogs to provision hosts and virtual machines and service dialogs to provision service catalogs. Dialog import/export, logs, and requests for automation are all managed from this menu option. Optimize: This menu option provides utilization, planning, and bottleneck summaries for the hybrid cloud environment. You can also generate reports for these individual metrics. Configure: Here, you can customize the look of the dashboard; view queued, running tasks and check errors and warnings for VMs and the UI. It let's you configure the CloudForms Management Engine appliance settings such as database, additional worker appliances, SmartProxy, and white labelling. One can also perform tasks maintenance tasks such as updates and manual modification of the CFME server configuration files. Summary In this article, we deployed the Red Hat CloudForms Management Engine Appliance in an OpenStack environment, and you learned where to configure the hostname, network settings, and time zone. We then used the floating IP of the instance to access the appliance from a web browser, and you learned where the different feature sets are and how to navigate around. Resources for Article: Further resources on this subject: Introduction to Microsoft Azure Cloud Services[article] Apache CloudStack Architecture[article] Using OpenStack Swift [article]

Introduction to WatchKit

Packt
04 Sep 2015
7 min read
In this article by Hossam Ghareeb, author of the book Application Development with Swift, we will talk about a new technology, WatchKit, and a new era of wearable technologies. Now technology is a part of all aspects of our lives, even wearable objects. You can see smart watches such as the new Apple watch or glasses such as Google glass. We will go through the new WatchKit framework to learn how to extend your iPhone app functionalities to your wrist. (For more resources related to this topic, see here.) Apple watch Apple watch is a new device on your wrist that can be used to extend your iPhone app functionality; you can access the most important information and respond in easy ways using the watch. The watch is now available in most countries in different styles and models so that everyone can find a watch that suits them. When you get your Apple watch, you can pair it with your iPhone. The watch can't be paired with the iPad; it can only be paired with your iPhone. To run third-party apps on your watch, iPhone should be paired with the watch. Once paired, when you install any app on your iPhone that has an extension for the watch, the app will be installed on the watch automatically and wirelessly. WatchKit WatchKit is a new framework to build apps for Apple watch. To run third-party apps on Apple watch, you need the watch to be connected to the iPhone. WatchKit helps you create apps for Apple watch by creating two targets in Xcode: The WatchKit app: The WatchKit app is an executable app to be installed on your watch, and you can install or uninstall it from your iPhone. The WatchKit app contains the storyboard file and resources files. It doesn't contain any source code, just the interface and resource files. The WatchKit extension: This extension runs on the iPhone and has the InterfaceControllers file for your storyboard. This extension just contains the model and controller classes. The actions and outlets from the previous WatchKit app will be linked to these controller files in the WatchKit extension. These bundles—the WatchKit extension and WatchKit app—are put together and packed inside the iPhone application. When the user installs the iPhone app, the system will prompt the user to install the WatchKit app if there is a paired watch. Using WatchKit, you can extend your iOS app in three different ways: The WatchKit app As we mentioned earlier, the WatchKit app is an app installed on Apple watch and the user can find it in the list of Watch apps. The user can launch, control, and interact with the app. Once the app is launched, the WatchKit extension on the iPhone app will run in the background to update a user interface, perform any logic required, and respond to user actions. Note that the iPhone app can't launch or wake up the WatchKit extension or the WatchKit app. However, the WatchKit extension can ask the system to launch the iPhone app and this will be performed in the background. Glances Glances are single interfaces that the user can navigate between. The glance view is just read-only information, which means that you can't add any interactive UI controls such as buttons and switches. Apps should use glances to display very important and timely information. The glance view is a nonscrolling view, so your glance view should fit the watch screen. Avoid using tables and maps in interface controllers and focus on delivering the most important information in a nice way. Once the user clicks on the glance view, the watch app will be launched. 
The glance view is optional in your app. The glance interface and its interface controller files are a part of your WatchKit extension and WatchKit app. The glance interface resides in a storyboard, which resides in the WatchKit app. The interface controller that is responsible for filling the view with the timely important information is located in the WatchKit extension, which runs in the background in the iPhone app, as we said before. Actionable notifications For sure, you can handle and respond to local and remote notifications in an easy and fast way using Apple watch. WatchKit helps you build user interfaces for the notification that you want to handle in your WatchKit app. WatchKit helps you add actionable buttons so that the user can take action based on the notification. For example, if a notification for an invitation is sent to you, you can take action to accept or reject the notification from your wrist. You can respond to these actions easily in interface controllers in WatchKit extension. Working with WatchKit Enough talking about theory, lets see some action. Go to our lovely Xcode and create a new single-view application and name it WatchKitDemo. Don't forget to select Swift as the app language. Then navigate to File | New | Target to create a new target for the WatchKit app: After you select the target, in the pop-up window, from the left side under iOS choose Apple Watch and select WatchKit App. Check the following screenshot: After you click on Next, it will ask you which application to embed the target in and which scenes to include. Please check the Include Notification Scene and Include Glance Scene options, as shown in the following screenshot: Click on Finish, and now you have an iPhone app with the built-in WatchKit extension and WatchKit app. Xcode targets Now your project should be divided into three parts. Check the following screenshot and let's explain these parts: As you see in this screenshot, the project files are divided into three sections. In section 1, you can see the iPhone app source files, interface files or storyboard, and resources files. In section 2, you can find the WatchKit extension, which contains only interface controllers and model files. Again, as we said before, this extension also runs in iPhone in the background. In section 3, you can see the WatchKit app, which runs in Apple watch itself. As we see, it contains the storyboard and resources files. No source code can be added in this target. Interface controllers In the WatchKit extension of your Xcode project, open InterfaceController.swift. You will find the interface controller file for the scene that exists in Interface.storyboard in the WatchKit app. The InterfaceController file extends from WKInterfaceController, which is the base class for interface controllers. Forget the UI classes that you were using in the iOS apps from the UIKit framework, as it has different interface controller classes in WatchKit and they are very limited in configuration and customization. In the InterfaceController file, you can find three important methods that explain the lifecycle of your controller: awakeWithContext, willActivate, and didDeactivate. Another important method that can be overridden for the lifecycle is called init, but it's not implemented in the controller file. Let's now explain the four lifecycle methods: init: You can consider this as your first chance to update your interface elements. 
awakeWithContext: This is called after the init method and contains context data that can be used to update your interface elements or to perform some logical operations on these data. Context data is passed between interface controllers when you push or present another controller and you want to pass some data. willActivate: Here, your scene is about to be visible onscreen, and its your last chance to update your UI. Try to put simple UI changes here in this method so as not to cause freezing in UI. didDeactivate: Your scene is about to be invisible and, if you want to clean up your code, it's time to stop animations or timers. Summary In this article, we covered a very important topic: how to develop apps for the new wearable technology, Apple watch. We first gave a quick introduction about the new device and how it can communicate with paired iPhones. We then talked about WatchKit, the new framework, that enables you to develop apps for Apple watch and design its interface. Apple has designed the watch to contain only the storyboard and resources files. All logic and operations are performed in the iPhone app in the background. Resources for Article: Further resources on this subject: Flappy Swift [article] Playing with Swift [article] Using OpenStack Swift [article]

Testing Exceptional Flow

Packt
03 Sep 2015
22 min read
 In this article by Frank Appel, author of the book Testing with JUnit, we will learn that special care has to be taken when testing a component's functionality under exception-raising conditions. You'll also learn how to use the various capture and verification possibilities and discuss their pros and cons. As robust software design is one of the declared goals of the test-first approach, we're going to see how tests intertwine with the fail fast strategy on selected boundary conditions. Finally, we're going to conclude with an in-depth explanation of working with collaborators under exceptional flow and see how stubbing of exceptional behavior can be achieved. The topics covered in this article are as follows: Testing patterns Treating collaborators (For more resources related to this topic, see here.) Testing patterns Testing exceptional flow is a bit trickier than verifying the outcome of normal execution paths. The following section will explain why and introduce the different techniques available to get this job done. Using the fail statement "Always expect the unexpected"                                                                                  – Adage based on Heraclitus Testing corner cases often results in the necessity to verify that a functionality throws a particular exception. Think, for example, of a java.util.List implementation. It quits the retrieval attempt of a list's element by means of a non-existing index number with java.lang.ArrayIndexOutOfBoundsException. Working with exceptional flow is somewhat special as without any precautions, the exercise phase would terminate immediately. But this is not what we want since it eventuates in a test failure. Indeed, the exception itself is the expected outcome of the behavior we want to check. From this, it follows that we have to capture the exception before we can verify anything. As we all know, we do this in Java with a try-catch construct. The try block contains the actual invocation of the functionality we are about to test. The catch block again allows us to get a grip on the expected outcome—the exception thrown during the exercise. Note that we usually keep our hands off Error, so we confine the angle of view in this article to exceptions. So far so good, but we have to bring up to our minds that in case no exception is thrown, this has to be classified as misbehavior. Consequently, the test has to fail. JUnit's built-in assertion capabilities provide the org.junit.Assert.fail method, which can be used to achieve this. The method unconditionally throws an instance of java.lang.AssertionError if called. The classical approach of testing exceptional flow with JUnit adds a fail statement straight after the functionality invocation within the try block. The idea behind is that this statement should never be reached if the SUT behaves correctly. But if not, the assertion error marks the test as failed. It is self-evident that capturing should narrow down the expected exception as much as possible. Do not catch IOException if you expect FileNotFoundException, for example. Unintentionally thrown exceptions must pass the catch block unaffected, lead to a test failure and, therefore, give you a good hint for troubleshooting with their stack trace. We insinuated that the fetch-count range check of our timeline example would probably be better off throwing IllegalArgumentException on boundary violations. 
Let's have a look at how we can change the setFetchCountExceedsLowerBound test to verify different behaviors with the try-catch exception testing pattern (see the following listing): @Test public void setFetchCountExceedsLowerBound() { int tooSmall = Timeline.FETCH_COUNT_LOWER_BOUND - 1; try { timeline.setFetchCount( tooSmall ); fail(); } catch( IllegalArgumentException actual ) { String message = actual.getMessage(); String expected = format( Timeline.ERROR_EXCEEDS_LOWER_BOUND, tooSmall ); assertEquals( expected, message ); assertTrue( message.contains( valueOf( tooSmall ) ) ); } } It can be clearly seen how setFetchCount, the functionality under test, is called within the try block, directly followed by a fail statement. The caught exception is narrowed down to the expected type. The test avails of the inline fixture setup to initialize the exceeds-lower-bound value in the tooSmall local variable because it is used more than once. The verification checks that the thrown message matches an expected one. Our test calculates the expectation with the aid of java.lang.String.format (static import) based on the same pattern, which is also used internally by the timeline to produce the text. Once again, we loosen encapsulation a bit to ensure that the malicious value gets mentioned correctly. Purists may prefer only the String.contains variant, which, on the other hand would be less accurate. Although this works fine, it looks pretty ugly and is not very readable. Besides, it blurs a bit the separation of the exercise and verification phases, and so it is no wonder that there have been other techniques invented for exception testing. Annotated expectations After the arrival of annotations in the Java language, JUnit got a thorough overhauling. We already mentioned the @Test type used to mark a particular method as an executable test. To simplify exception testing, it has been given the expected attribute. This defines that the anticipated outcome of a unit test should be an exception and it accepts a subclass of Throwable to specify its type. Running a test of this kind captures exceptions automatically and checks whether the caught type matches the specified one. The following snippet shows how this can be used to validate that our timeline constructor doesn't accept null as the injection parameter: @Test( expected = IllegalArgumentException.class ) public void constructWithNullAsItemProvider() { new Timeline( null, mock( SessionStorage.class ) ); } Here, we've got a test, the body statements of which merge setup and exercise in one line for compactness. Although the verification result is specified ahead of the method's signature definition, of course, it gets evaluated at last. This means that the runtime test structure isn't twisted. But it is a bit of a downside from the readability point of view as it breaks the usual test format. However, the approach bears a real risk when using it in more complex scenarios. The next listing shows an alternative of setFetchCountExceedsLowerBound using the expected attribute: @Test( expected = IllegalArgumentException.class ) public void setFetchCountExceedsLowerBound() { Timeline timeline = new Timeline( null, null ); timeline.setFetchCount( Timeline.FETCH_COUNT_LOWER_BOUND - 1 ); } On the face of it, this might look fine because the test run would succeed apparently with a green bar. But given that the timeline constructor already throws IllegalArgumentException due to the initialization with null, the virtual point of interest is never reached. 
So any setFetchCount implementation will pass this test. This renders it not only useless, but it even lulls you into a false sense of security! Certainly, the approach is most hazardous when checking for runtime exceptions because they can be thrown undeclared. Thus, they can emerge practically everywhere and overshadow the original test intention unnoticed. Not being able to validate the state of the thrown exception narrows down the reasonable operational area of this concept to simple use cases, such as the constructor parameter verification mentioned previously. Finally, here are two more remarks on the initial example. First, it might be debatable whether IllegalArgumentException is appropriate for an argument-not-null-check from a design point of view. But as this discussion is as old as the hills and probably will never be settled, we won't argue about that. IllegalArgumentException was favored over NullPointerException basically because it seemed to be an evident way to build up a comprehensible example. To specify a different behavior of the tested use case, one simply has to define another Throwable type as the expected value. Second, as a side effect, the test shows how a generated test double can make our life much easier. You've probably already noticed that the session storage stand-in created on the fly serves as a dummy. This is quite nice as we don't have to implement one manually and as it decouples the test from storage-related signatures, which may break the test in future when changing. But keep in mind that such a created-on-the-fly dummy lacks the implicit no-operation-check. Hence, this approach might be too fragile under some circumstances. With annotations being too brittle for most usage scenarios and the try-fail-catch pattern being too crabbed, JUnit provides a special test helper called ExpectedException, which we'll take a look at now. Verification with the ExpectedException rule The third possibility offered to verify exceptions is the ExpectedException class. This type belongs to a special category of test utilities. For the moment, it is sufficient to know that rules allow us to embed a test method into custom pre- and post-operations at runtime. In doing so, the expected exception helper can catch the thrown instance and perform the appropriate verifications. A rule has to be defined as a nonstatic public field, annotated with @Rule, as shown in the following TimelineTest excerpt. See how the rule object gets set up implicitly here with a factory method: public class TimelineTest { @Rule public ExpectedException thrown = ExpectedException.none(); [...] @Test public void setFetchCountExceedsUpperBound() { int tooLargeValue = FETCH_COUNT_UPPER_BOUND + 1; thrown.expect( IllegalArgumentException.class ); thrown.expectMessage( valueOf( tooLargeValue ) ); timeline.setFetchCount( tooLargeValue ); } [...] } Compared to the try-fail-catch approach, the code is easier to read and write. The helper instance supports several methods to specify the anticipated outcome. Apart from the static imports of constants used for compactness, this specification reproduces pretty much the same validations as the original test. ExpectedException#expectedMessage expects a substring of the actual message in case you wonder, and we omitted the exact formatting here for brevity. In case the exercise phase of setFetchCountExceedsUpperBound does not throw an exception, the rule ensures that the test fails. 
In this context, it is about time we mentioned the utility's factory method none. Its name indicates that as long as no expectations are configured, the helper assumes that a test run should terminate normally. This means that no artificial fail has to be issued. This way, a mix of standard and exceptional flow tests can coexist in one and the same test case. Even so, the test helper has to be configured prior to the exercise phase, which still leaves room for improvement with respect to canonizing the test structure. As we'll see next, the possibility of Java 8 to compact closures into lambda expressions enables us to write even leaner and cleaner structured exceptional flow tests. Capturing exceptions with closures When writing tests, we strive to end up with a clear representation of separated test phases in the correct order. All of the previous approaches for testing exceptional flow did more or less a poor job in this regard. Looking once more at the classical try-fail-catch pattern, we recognize, however, that it comes closest. It strikes us that if we put some work into it, we can extract exception capturing into a reusable utility method. This method would accept a functional interface—the representation of the exception-throwing functionality under test—and return the caught exception. The ThrowableCaptor test helper puts the idea into practice: public class ThrowableCaptor { @FunctionalInterface public interface Actor { void act() throws Throwable; } public static Throwable thrownBy( Actor actor ) { try { actor.act(); } catch( Throwable throwable ) { return throwable; } return null; } } We see the Actor interface that serves as a functional callback. It gets executed within a try block of the thrownBy method. If an exception is thrown, which should be the normal path of execution, it gets caught and returned as the result. Bear in mind that we have omitted the fail statement of the original try-fail-catch pattern. We consider the capturer as a helper for the exercise phase. Thus, we merely return null if no exception is thrown and leave it to the afterworld to deal correctly with the situation. How capturing using this helper in combination with a lambda expression works is shown by the next variant of setFetchCountExceedsUpperBound, and this time, we've achieved the clear phase separation we're in search of: @Test public void setFetchCountExceedsUpperBound() { int tooLarge = FETCH_COUNT_UPPER_BOUND + 1; Throwable actual = thrownBy( ()-> timeline.setFetchCount( tooLarge ) ); String message = actual.getMessage(); assertNotNull( actual ); assertTrue( actual instanceof IllegalArgumentException ); assertTrue( message.contains( valueOf( tooLarge ) ) ); assertEquals( format( ERROR_EXCEEDS_UPPER_BOUND, tooLarge ), message ); } Please note that we've added an additional not-null-check compared to the verifications of the previous version. We do this as a replacement for the non-existing failure enforcement. Indeed, the following instanceof check would fail implicitly if actual was null. But this would also be misleading since it overshadows the true failure reason. Stating that actual must not be null points out clearly the expected post condition that has not been met. One of the libraries presented there will be AssertJ. The latter is mainly intended to improve validation expressions. But it also provides a test helper, which supports the closure pattern you've just learned to make use of. Another choice to avoid writing your own helper could be the library Fishbowl, [FISBOW]. 
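To give an impression of what such a library helper looks like in practice, here is a minimal sketch of the upper-bound test rewritten with AssertJ's assertThatThrownBy method. It assumes AssertJ 3.x is on the test classpath, the message check is simplified to a substring assertion, and the ItemProvider type name is an assumption made only for this sketch (the excerpt does not name the first constructor parameter's type):

import static java.lang.String.valueOf;
import static org.assertj.core.api.Assertions.assertThatThrownBy;
import static org.mockito.Mockito.mock;

import org.junit.Test;

public class TimelineAssertJExampleTest {

  // ItemProvider is an assumed type name; SessionStorage is the
  // collaborator interface used throughout this article
  private final Timeline timeline
    = new Timeline( mock( ItemProvider.class ), mock( SessionStorage.class ) );

  @Test
  public void setFetchCountExceedsUpperBound() {
    int tooLarge = Timeline.FETCH_COUNT_UPPER_BOUND + 1;

    // AssertJ captures the exception thrown by the lambda and returns a
    // fluent assertion object for the verification phase
    assertThatThrownBy( () -> timeline.setFetchCount( tooLarge ) )
      .isInstanceOf( IllegalArgumentException.class )
      .hasMessageContaining( valueOf( tooLarge ) );
  }
}

As with the hand-rolled ThrowableCaptor, capture and verification collapse into a single fluent statement while the test phases stay in their natural order.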
Now that we understand the available testing patterns, let's discuss a few system-spanning aspects when dealing with exceptional flow in practice. Treating collaborators Considerations we've made about how a software system can be built upon collaborating components, foreshadows that we have to take good care when modelling our overall strategy for exceptional flow. Because of this, we'll start this section with an introduction of the fail fast strategy, which is a perfect match to the test-first approach. The second part of the section will show you how to deal with checked exceptions thrown by collaborators. Fail fast Until now, we've learned that exceptions can serve in corner cases as an expected outcome, which we need to verify with tests. As an example, we've changed the behavior of our timeline fetch-count setter. The new version throws IllegalArgumentException if the given parameter is out of range. While we've explained how to test this, you may have wondered whether throwing an exception is actually an improvement. On the contrary, you might think, doesn't the exception make our program more fragile as it bears the risk of an ugly error popping up or even of crashing the entire application? Aren't those things we want to prevent by all means? So, wouldn't it be better to stick with the old version and silently ignore arguments that are out of range? At first sight, this may sound reasonable, but doing so is ostrich-like head-in-the-sand behavior. According to the motto: if we can't see them, they aren't there, and so, they can't hurt us. Ignoring an input that is obviously wrong can lead to misbehavior of our software system later on. The reason for the problem is probably much harder to track down compared to an immediate failure. Generally speaking, this practice disconnects the effects of a problem from its cause. As a consequence, you often have to deal with stack traces leading to dead ends or worse. Consider, for example, that we'd initialize the timeline fetch-count as an invariant employed by a constructor argument. Moreover, the value we use would be negative and silently ignored by the component. In addition, our application would make some item position calculations based on this value. Sure enough, the calculation results would be faulty. If we're lucky, an exception would be thrown, when, for example, trying to access a particular item based on these calculations. However, the given stack trace would reveal nothing about the reason that originally led to the situation. However, if we're unlucky, the misbehavior will not be detected until the software has been released to end users. On the other hand, with the new version of setFetchCount, this kind of translocated problem can never occur. A failure trace would point directly to the initial programming mistake, hence avoiding follow-up issues. This means failing immediately and visibly increases robustness due to short feedback cycles and pertinent exceptions. Jim Shore has given this design strategy the name fail fast, [SHOR04]. Shore points out that the heart of fail fast are assertions. Similar to the JUnit assert statements, an assertion fails on a condition that isn't met. Typical assertions might be not-null-checks, in-range-checks, and so on. But how do we decide if it's necessary to fail fast? While assertions of input arguments are apparently a potential use case scenario, checking of return values or invariants may also be so. 
Sometimes, such conditions are described in code comments, such as // foo should never be null because..., which is a clear indication that suggests to replace the note with an appropriate assertion. See the next snippet demonstrating the principle: public void doWithAssert() { [...] boolean condition = ...; // check some invariant if( !condition ) { throw new IllegalStateException( "Condition not met." ) } [...] } But be careful not to overdo things because in most cases, code will fail fast by default. So, you don't have to include a not-null-check after each and every variable assignment for example. Such paranoid programming styles decrease readability for no value-add at all. A last point to consider is your overall exception-handling strategy. The intention of assertions is to reveal programming or configuration mistakes as early as possible. Because of this, we strictly make use of runtime exception types only. Catching exceptions at random somewhere up the call stack of course thwarts the whole purpose of this approach. So, beware of the absurd try-catch-log pattern that you often see scattered all over the code of scrubs, and which is demonstrated in the next listing as a deterrent only: private Data readSomeData() { try { return source.readData(); } catch( Exception hardLuck ) { // NEVER DO THIS! hardLuck.printStackTrace(); } return null; } The sample code projects exceptional flow to null return values and disguises the fact that something seriously went wrong. It surely does not get better using a logging framework or even worse, by swallowing the exception completely. Analysis of an error by means of stack trace logs is cumbersome and often fault-prone. In particular, this approach usually leads to logs jammed with ignored traces, where one more or less does not attract attention. In such an environment, it's like looking for a needle in a haystack when trying to find out why a follow-up problem occurs. Instead, use the central exception handling mechanism at reasonable boundaries. You can create a bottom level exception handler around a GUI's message loop. Ensure that background threads report problems appropriately or secure event notification mechanisms for example. Otherwise, you shouldn't bother with exception handling in your code. As outlined in the next paragraph, securing resource management with try-finally should most of the time be sufficient. The stubbing of exceptional behavior Every now and then, we come across collaborators, which declare checked exceptions in some or all of their method signatures. There is a debate going on for years now whether or not checked exceptions are evil, [HEVEEC]. However, in our daily work, we simply can't elude them as they pop up in adapters around third-party code or get burnt in legacy code we aren't able to change. So, what are the options we have in these situations? "It is funny how people think that the important thing about exceptions is handling them. That's not the important thing about exceptions. In a well-written application there's a ratio of ten to one, in my opinion, of try finally to try catch."                                                                            – Anders Hejlsberg, [HEVEEC] Cool. This means that we also declare the exception type in question on our own method signature and let someone else up on the call stack solve the tricky things, right? Although it makes life easier for us for at the moment, acting like this is probably not the brightest idea. 
If everybody follows that strategy, the higher we get on the call stack, the more exception types accumulate. This doesn't scale well and, even worse, it exposes details from the depths of the call hierarchy. Because of this, people sometimes simplify things by declaring java.lang.Exception as the thrown type. Indeed, this gets rid of the growing tail of throws declarations. But it's also an admission of defeat, as it reduces the Java type concept to absurdity.

Fair enough. So, we're presumably better off dealing with checked exceptions as soon as they occur. But hey, wouldn't this contradict Hejlsberg's statement? And what shall we do with such a gatecrasher: is there always a reasonable way to handle it? Fortunately there is, and it conforms with both the quote and the preceding fail fast discussion. We wrap the caught checked exception in an appropriate runtime exception, which we then throw instead. This way, every caller of our component's functionality can use it without worrying about exception handling. If necessary, a try-finally block is sufficient to ensure the disposal or closure of open resources, for example. As described previously, we leave exception handling to bottom-level handlers around the message loop or the like.

Now that we know what we have to do, the next question is: how can we achieve this with tests? Luckily, with the knowledge about stubs, you're almost there. Normally, handling a checked exception represents a boundary condition. We can regard the thrown exception as an indirect input to our SUT. All we have to do is let the stub throw an expected exception (precondition) and check whether the wrapper gets delivered properly (postcondition).

For a better understanding, let's walk through the steps in our timeline example. For this section, we assume that our SessionStorage collaborator declares IOException on its methods, for whatever reason. The storage interface is shown in the next listing:

public interface SessionStorage {
  void storeTop( Item top ) throws IOException;
  Item readTop() throws IOException;
}

Next, we'll have to write a test that reflects our thoughts. First, we create an IOException instance that will serve as an indirect input. Looking at the next snippet, you can see how we configure our storage stub to throw this instance on a call to storeTop. As the method does not return anything, the Mockito stubbing pattern looks a bit different than before: this time, it starts with the expectation definition. In addition, we use Mockito's any matcher, so the exception is thrown for every call to storeTop whose argument is assignment-compatible with the specified type token. After this, we're ready to exercise the fetchItems method and capture the actual outcome. We expect it to be an instance of IllegalStateException, just to keep things simple. See how we verify that the caught exception wraps the original cause and that the message matches a predefined constant on our component class:

@Test
public void fetchItemWithExceptionOnStoreTop() throws IOException {
  IOException cause = new IOException();
  doThrow( cause ).when( storage ).storeTop( any( Item.class ) );

  Throwable actual = thrownBy( () -> timeline.fetchItems() );

  assertNotNull( actual );
  assertTrue( actual instanceof IllegalStateException );
  assertSame( cause, actual.getCause() );
  assertEquals( Timeline.ERROR_STORE_TOP, actual.getMessage() );
}

With the test in place, the implementation is pretty easy.
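Before we turn to that implementation, a quick note on the thrownBy helper used in the test above: it is not part of JUnit or Mockito, and its definition is not shown here. A minimal sketch of such a helper, under the assumption that it simply captures whatever Throwable a lambda throws, could look like this:

public class ThrowableCaptor {

  @FunctionalInterface
  public interface Actor {
    void act() throws Throwable;
  }

  // Runs the given actor and returns the Throwable it threw, or null if nothing was thrown
  public static Throwable thrownBy( Actor actor ) {
    try {
      actor.act();
    } catch( Throwable throwable ) {
      return throwable;
    }
    return null;
  }
}

With a static import of ThrowableCaptor.thrownBy, the test can capture the exception without needing a try-catch block of its own.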
Let's assume that we have the item storage extracted to a private timeline method named storeTopItem. It gets called somewhere down the road of fetchItem and again calls a private method, getTopItem. Fixing the compile errors, we end up with a try-catch block because we have to deal with IOException thrown by storeTop. Our first error handling should be empty to ensure that our test case actually fails. The following snippet shows the ultimate version, which will make the test finally pass: static final String ERROR_STORE_TOP = "Unable to save top item"; [...] private void storeTopItem() { try { sessionStorage.storeTop( getTopItem() ); } catch( IOException cause ) { throw new IllegalStateException( ERROR_STORE_TOP, cause ); } } Of course, real-world situations can sometimes be more challenging, for example, when the collaborator throws a mix of checked and runtime exceptions. At times, this results in tedious work. But if the same type of wrapping exception can always be used, the implementation can often be simplified. First, re-throw all runtime exceptions; second, catch exceptions by their common super type and re-throw them embedded within a wrapping runtime exception (the following listing shows the principle): private void storeTopItem() { try { sessionStorage.storeTop( getTopItem() ); } catch( RuntimeException rte ) { throw rte; } catch( Exception cause ) { throw new IllegalStateException( ERROR_STORE_TOP, cause ); } } Summary In this article, you learned how to validate the proper behavior of an SUT with respect to exceptional flow. You experienced how to apply the various capture and verification options, and we discussed their strengths and weaknesses. Supplementary to the test-first approach, you were taught the concepts of the fail fast design strategy and recognized how adapting it increases the overall robustness of applications. Last but not least, we explained how to handle collaborators that throw checked exceptions and how to stub their exceptional bearing. Resources for Article: Further resources on this subject: Progressive Mockito[article] Using Mock Objects to Test Interactions[article] Ensuring Five-star Rating in the MarketPlace [article]
Creating a Web Server

Packt
03 Sep 2015
3 min read
 In this article by Marco Schwartz, author of the book Intel Galileo Networking Cookbook, we will cover the recipe, Reading pins via a web server. (For more resources related to this topic, see here.) Reading pins via a web server We are now going to see how to use a web server for useful things. For example, we will see here how to use a web server to read the pins of the Galileo board, and then how to display these readings on a web page. Getting ready For this chapter, you won't need to do much with your Galileo board, as we just want to see if we can read the state of a pin from a web server. I simply connected pin number 7 of the Galileo board to the VCC pin, as shown in this picture: How to do it... We are now going to see how to read the state from pin number 7, and display this state on a web page. This is the complete code: // Required modules var m = require("mraa"); var util = require('util'); var express = require('express'); var app = express(); // Set input on pin 7 var myDigitalPin = new m.Gpio(7); myDigitalPin.dir(m.DIR_IN); // Routes app.get('/read', function (req, res) { var myDigitalValue = myDigitalPin.read(); res.send("Digital pin 7 value is: " + myDigitalValue); }); // Start server var server = app.listen(3000, function () { console.log("Express app started!"); }); You can now simply copy this code and paste it inside a blank Node.js project. Also make sure that the package.json file includes the Express module. Then, as usual, upload, build, and run the application using Intel XDK. You should see the confirmation message inside the XDK console. Then, use a browser to access your board on port 3000, at the /read route. You should see the following message, which is the reading from pin number 7: If you can see this, congratulations, you can now read the state of the pins on your board, and display this on your web server! How it works... In this recipe, we combined two things that we saw in previous recipes. We again used the mraa module to read from pins, here from pin number 7 of the board. You can find out more about the mraa module at: https://github.com/intel-iot-devkit/mraa Then, we combined this with a web server using the Express framework, and we defined a new route called /read that reads the state of the pin, and sends it back so that it can be displayed inside a web browser, with this code: app.get('/read', function (req, res) { var myDigitalValue = myDigitalPin.read(); res.send("Digital pin 7 value is: " + myDigitalValue); }); See also You can now check the next recipe to see how to control a pin from the Node.js server running on the Galileo board. Summary In this recipe, we saw how to read the state from pin number 7, and display this state on a web page. If you liked this article please buy the book Intel Galileo Networking Cookbook, Packt Publishing, to learn over 45 Galileo recipes. Resources for Article: Further resources on this subject: Arduino Development[article] Integrating with Muzzley[article] Getting Started with Intel Galileo [article]
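As the See also note above points out, the natural next step is controlling a pin from the same server. The following sketch, which is not part of the recipe itself, adds a hypothetical /write route that drives pin 8 as an output; the pin number and route name are assumptions, while the mraa and Express calls mirror the ones used in the recipe:

// Set output on pin 8 (assumed wiring, for illustration only)
var myOutputPin = new m.Gpio(8);
myOutputPin.dir(m.DIR_OUT);

// Route to set the pin state, for example /write?state=1
app.get('/write', function (req, res) {
  var state = parseInt(req.query.state, 10) ? 1 : 0;
  myOutputPin.write(state);
  res.send("Digital pin 8 set to: " + state);
});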
Learning RSLogix 5000 – Buffering I/O Module Input and Output Values

Packt
03 Sep 2015
10 min read
 In the following article by Austin Scott, the author of Learning RSLogix 5000 Programming, you will be introduced to the high performance, asynchronous nature of the Logix family of controllers and the requirement for the buffering I/O module data it drives. You will learn various techniques for the buffering I/O module values in RSLogix 5000 and Studio 5000 Logix Designer. You will also learn about the IEC Languages that do not require the input or output module buffering techniques to be applied to them. In order to understand the need for buffering, let's start by exploring the evolution of the modern line of Rockwell Automation Controllers. (For more resources related to this topic, see here.) ControlLogix controllers The ControlLogix controller was first launched in 1997 as a replacement for Allen Bradley's previous large-scale control platform. The PLC-5. ControlLogix represented a significant technological step forward, which included a 32-bit ARM-6 RISC-core microprocessor and the ABrisc Boolean processor combined with a bus interface on the same silicon chip. At launch, the Series 5 ControlLogix controllers (also referred to as L5 and ControlLogix 5550, which has now been superseded by the L6 and L7 series controllers) were able to execute code three times faster than PLC-5. The following is an illustration of the original ControlLogix L5 Controller: ControlLogix Logix L5 Controller The L5 controller is considered to be a PAC (Programmable Automation Controller) rather than a traditional PLC (Programmable Logic Controller), due to its modern design, power, and capabilities beyond a traditional PLC (such as motion control, advanced networking, batching and sequential control). ControlLogix represented a significant technological step forward for Rockwell Automation, but this new technology also presented new challenges for automation professionals. ControlLogix was built using a modern asynchronous operating model rather than the more traditional synchronous model used by all the previous generations of controllers. The asynchronous operating model requires a different approach to real-time software development in RSLogix 5000 (now known in version 20 and higher as Studio 5000 Logix Designer). Logix operating cycle The entire Logix family of controllers (ControlLogix and CompactLogix) have diverged from the traditional synchronous PLC scan architecture in favor of a more efficient asynchronous operation. Like most modern computer systems, asynchronous operation allows the Logix controller to handle multiple Tasks at the same time by slicing the processing time between each task. The continuous update of information in an asynchronous processor creates some programming challenges, which we will explore in this article. The following diagram illustrates the difference between synchronous and asynchronous operation. Synchronous versus Asynchronous Processor Operation Addressing module I/O data Individual channels on a module can be referenced in your Logix Designer / RSLogix 5000 programs using it's address. An address gives the controller directions to where it can find a particular piece of information about a channel on one of your modules. The following diagram illustrates the components of an address in RsLogix 5000 or Studio 5000 Logix Designer: The components of an I/O Module Address in Logix Module I/O tags can be viewed using the Controller Tags window, as the following screen shot illustrates. 
I/O Module Tags in Studio 5000 Logix Designer Controller Tags Window Using the module I/O tags, input and output module data can be directly accessed anywhere within a logic routine. However, it is recommended that we buffer module I/O data before we evaluate it in Logic. Otherwise, due to the asynchronous tag value updates in our I/O modules, the state of our process values could change part way through logic execution, thus creating unpredictable results. In the next section, we will introduce the concept of module I/O data buffering. Buffering module I/O data In the olden days of PLC5s and SLC500s, before we had access to high-performance asynchronous controllers like the ControlLogix, SoftLogix and CompactLogix families, program execution was sequential (synchronous) and very predictable. In asynchronous controllers, there are many activities happening at the same time. Input and output values can change in the middle of a program scan and put the program in an unpredictable state. Imagine a program starting a pump in one line of code and closing a valve directly in front of that pump in the next line of code, because it detected a change in process conditions. In order to address this issue, we use a technique call buffering and, depending on the version of Logix you are developing on, there are a few different methods of achieving this. Buffering is a technique where the program code does not directly access the real input or output tags on the modules during the execution of a program. Instead, the input and output module tags are copied at the beginning of a programs scan to a set of base tags that will not change state during the program's execution. Think of buffering as taking a snapshot of the process conditions and making decisions on those static values rather than live values that are fluctuating every millisecond. Today, there is a rule in most automation companies that require programmers to write code that "Buffers I/O" data to base tags that will not change during a programs execution. The two widely accepted methods of buffering are: Buffering to base tags Program parameter buffering (only available in the Logix version 24 and higher) Do not underestimate the importance of buffering a program's I/O. I worked on an expansion project for a process control system where the original programmers had failed to implement buffering. Once a month, the process would land in a strange state, which the program could not recover from. The operators had attributed these problem to "Gremlins" for years, until I identified and corrected the issue. Buffering to base tags Logic can be organized into manageable pieces and executed based on different intervals and conditions. The buffering to base tags practice takes advantage of Logix's ability to organize code into routines. The default ladder logic routine that is created in every new Logix project is called MainRoutine. 
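Before looking at the recommended routine structure, here is a small Structured Text sketch of the kind of hazard described above (making two decisions from a live input that can change mid-scan) and of its buffered alternative. The tag names and module addresses are borrowed from the listings later in this article and are assumptions for illustration only:

(* Hazard: evaluating the live input tag more than once in a single scan *)
IF Local:2:I.Data[0].1 THEN        (* high-pressure switch, first read *)
    RunPump := 0;
END_IF;
(* ...more logic; the module tag can update asynchronously here... *)
IF NOT Local:2:I.Data[0].1 THEN    (* second read may disagree with the first *)
    RunPump := 1;
END_IF;

(* Buffered version: snapshot once, decide on the snapshot, write the output once *)
HighPressure := Local:2:I.Data[0].1;
IF NOT HighPressure THEN
    RunPump := 1;
ELSE
    RunPump := 0;
END_IF;
Local:3:O.Data[0].0 := RunPump;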
The recommended best practice for buffering tags in ladder logic is to create three routines: One for reading input values and buffering them One for executing logic One for writing the output values from the buffered values The following ladder logic excerpt is from MainRoutine of a program that implements Input and Output Buffering: MainRoutine Ladder Logic Routine with Input and Output Buffering Subroutine Calls The following ladder logic is taken from the BufferInputs routine and demonstrates the buffering of digital input module tag values to Local tags prior to executing our PumpControl routine: Ladder Logic Routine with Input Module Buffering After our input module values have been buffered to Local tags, we can execute our processlogic in our PumpControl routine without having to worry about our values changing in the middle of the routine's execution. The following ladder logic code determines whether all the conditions are met to run a pump: Pump Control Ladder Logic Routine Finally, after all of our Process Logic has finished executing, we can write the resulting values to our digital output modules. The following ladder logic BufferOutputs, routine copies the resulting RunPump value to the digital output module tag. Ladder Logic Routine with Output Module Buffering We have now buffered our module inputs and module outputs in order to ensure they do not change in the middle of a program execution and potentially put our process into an undesired state. Buffering Structured Text I/O module values Just like ladder logic, Structured Text I/O module values should be buffered at the beginning of a routine or prior to executing a routine in order to prevent the values from changing mid-execution and putting the process into a state you could not have predicted. Following is an example of the ladder logic buffering routines written in Structured Text (ST)and using the non-retentive assignment operator: (* I/O Buffering in Structured Text Input Buffering *) StartPump [:=] Local:2:I.Data[0].0; HighPressure [:=] Local:2:I.Data[0].1; PumpStartManualOverride [:=] Local:2:I.Data[0].2; (* I/O Buffering in Structured Text Output Buffering *) Local:3:O.Data[0].0 [:=] RunPump; Function Block Diagram (FBD) and Sequential Function Chart (SFC) I/O module buffering Within Rockwell Automation's Logix platform, all of the supported IEC languages (ladder logic, structured text, function block, and sequential function chart) will compile down to the same controller bytecode language. The available functions and development interface in the various Logix programming languages are vastly different. Function Block Diagrams (FBD) and Sequential Function Charts (SFC) will always automatically buffer input values prior to executing Logic. Once a Function Block Diagram or a Sequential Function Chart has completed the execution of their logic, they will write all Output Module values at the same time. There is no need to perform Buffering on FBD or SFC routines, as it is automatically handled. Buffering using program parameters A program parameter is a powerful new feature in Logix that allows the association of dynamic values to tags and programs as parameters. The importance of program parameters is clear by the way they permeate the user interface in newer versions of Logix Designer (version 24 and higher). Program parameters are extremely powerful, but the key benefit to us for using them is that they are automatically buffered. 
This means that we could have effectively created the same result in one ladder logic rung rather than the eight we created in the previous exercise. There are four types of program parameters: Input: This program parameter is automatically buffered and passed into a program on each scan cycle. Output: This program parameter is automatically updated at the end of a program (as a result of executing that program) on each scan cycle. It is similar to the way we buffered our output module value in the previous exercise. InOut: This program parameter is updated at the start of the program scan and the end of the program scan. It is also important to note that, unlike the input and output parameters; the InOut parameter is passed as a pointer in memory. A pointer shares a piece of memory with other processes rather than creating a copy of it, which means that it is possible for an InOut parameter to change its value in the middle of a program scan. This makes InOut program parameters unsuitable for buffering when used on their own. Public: This program parameterbehaves like a normal controller tag and can be connected to input, output, and InOut parameters. it is similar to the InOut parameter, public parameters that are updated globally as their values are changed. This makes program parameters unsuitable for buffering, when used on their own. Primarily public program parameters are used for passing large data structures between programs on a controller. In Logix Designer version 24 and higher, a program parameter can be associated with a local tag using the Parameters and Local Tags in the Control Organizer (formally called "Program Tags"). The module input channel can be associated with a base tag within your program scope using the Parameter Connections. Add the module input value as a parameter connection. The previous screenshot demonstrates how we would associate the input module channel with our StartPump base tag using the Parameter Connection value. Summary In this article, we explored the asynchronous nature of the Logix family of controllers. We learned the importance of buffering input module and output module values for ladder logic routines and structured text routines. We also learned that, due to the way Function Block Diagrams (FBD/FB) and Sequential Function Chart (SFC) Routines execute, there is no need to buffer input module or output module tag values. Finally, we introduced the concept of buffering tags using program parameters in version 24 and high of Studio 5000 Logix Designer. Resources for Article: Further resources on this subject: vROps – Introduction and Architecture[article] Ensuring Five-star Rating in the MarketPlace[article] Building Ladder Diagram programs (Simple) [article]
ArcGIS – Advanced ArcObjects

Packt
03 Sep 2015
22 min read
 In this article by Hussein Nasser, author of the book ArcGIS By Example we will discuss the following topics: Geodatabase editing Preparing the data and project Creating excavation features Viewing and editing excavation information (For more resources related to this topic, see here.) Geodatabase editing YharanamCo is a construction contractor experienced in executing efficient and economical excavations for utility and telecom companies. When YharanamCo's board of directors heard of ArcGIS technology, they wanted to use their expertise with the power of ArcGIS to come up with a solution that helps them cut costs even more. Soil type is not the only factor in the excavation, there are many factors including the green factor where you need to preserve the trees and green area while excavating for visual appeal. Using ArcGIS, YharanamCo can determine the soil type and green factor and calculate the cost of an excavation. The excavation planning manager is the application you will be writing on top of ArcGIS. This application will help YharanamCo to create multiple designs and scenarios for a given excavation. This way they can compare the cost for each one and consider how many trees they could save by going through another excavation route. YharanamCo has provided us with the geodatabase of a soil and trees data for one of their new projects for our development. So far we learned how to view and query the geodatabase and we were able to achieve that by opening what we called a workspace. However, changing the underlying data requires establishing an editing session. All edits that are performed during an edit sessions are queued, and the moment the session is saved, these edits are committed to the geodatabase. Geodatabase editing supports atomic transactions, which are referred to as operations in the geodatabase. Atomic transaction is a list of database operations that either all occur together or none. This is to ensure consistency and integrity. After this short introduction to geodatabase editing, we will prepare our data and project. Preparing the data and project Before we dive into the coding part, we need to do some preparation for our new project and our data. Preparing the Yharnam geodatabase and map The YharnamCo team has provided us with the geodatabase and map document, so we will simply copy the necessary files to your drive. Follow these steps to start your preparation of the data and map: Copy the entire yharnam folder in the supporting to C:\ArcGISByExample\. Run yharnam.mxd under C:\ArcGISByExample\yharnam\Data\yharnam.mxd. This should point to the geodatabase, which is located under C:\ArcGISByExample\yharnam\Data\yharnam.gdb, as illustrated in the following screenshot: Note there are three types of trees: Type1, Type2, and Type3. Also note there are two types of soil: rocky and sand. Close ArcMap and choose not to save any changes. Preparing the Yharnam project We will now start our project. First we need to create our Yharnam Visual Studio extending ArcObjects project. To do so, follow these steps: From the Start menu, run Visual Studio Express 2013 as administrator. Go to the File menu and then click on New Project. Expand the Templates node | Visual Basic | ArcGIS, and then click on Extending ArcObjects. You will see the list of projects displayed on the right. Select the Class Library (ArcMap) project. In the Name field, type Yharnam, and in the location, browse to C:\ArcGISByExample\yharnam\Code. If the Code folder is not there, create it. Click on OK. 
In the ArcGIS Project Wizard, you will be asked to select the references libraries you will need in your project. I always recommend selecting all referencing, and then at the end of your project, remove the unused ones. So, go ahead and right-click on Desktop ArcMap and click on Select All, as shown in the following screenshot: Click on Finish to create the project. This will take a while to add all references to your project. Once your project is created, you will see that one class called Class1 is added, which we won't need, so right-click on it and choose Delete. Then, click on OK to confirm. Go to File and click on Save All. Exit the Visual Studio application. You have finished preparing your Visual Studio with extending ArcObjects support. Move to the next section to write some code. Adding the new excavation tool The new excavation tool will be used to draw a polygon on the map, which represents the geometry of the excavation. Then, this will create a corresponding excavation feature using this geometry: If necessary, open Visual Studio Express in administrator mode; we need to do this since our project is actually writing to the registry this time, so it needs administrator permissions. To do that, right-click on Visual Studio and click on Run as administrator. Go to File, then click on Open Project, browse to the Yharnam project from C:\ArcGISByExample\yharnam\Code, and click on Open. Click on the Yharnam project from Solution Explorer to activate it. From the Project menu, click on Add Class. Expand the ArcGIS node and then click on the Extending ArcObjects node. Select Base Tool and name it tlNewExcavation.vb. Click on Add to open ArcGIS New Item Wizard Options. From ArcGIS New Item Wizard Options, select Desktop ArcMap tool since we will be programming against ArcMap. Click on OK. Take note of the Yharnam.tlNewExcavation Progid as we will be using this in the next section to add the tool to the toolbar. If necessary, double-click on tlNewExcavation.vb to edit it. In the New method, update the properties of the command as follows. This will update the name and caption and other properties of the command. There is a piece of code that loads the command icon. Leave that unchanged: Public Sub New() MyBase.New() ' TODO: Define values for the public properties ' TODO: Define values for the public properties MyBase.m_category = "Yharnam" 'localizable text MyBase.m_caption = "New Excavation" 'localizable text MyBase.m_message = "New Excavation" 'localizable text MyBase.m_toolTip = "New Excavation" 'localizable text MyBase.m_name = "Yharnam_NewExcavation" Try 'TODO: change resource name if necessary Dim bitmapResourceName As String = Me.GetType().Name + ".bmp" MyBase.m_bitmap = New Bitmap(Me.GetType(), bitmapResourceName) MyBase.m_cursor = New System.Windows.Forms.Cursor(Me.GetType(), Me.GetType().Name + ".cur") Catch ex As Exception System.Diagnostics.Trace.WriteLine(ex.Message, "Invalid Bitmap") End Try End Sub Adding the excavation editor tool The excavation editor is a tool that will let us click an excavation feature on the map and display the excavation information such as depth, area, and so on. It will also allow us to edit some of the information. We will now add a tool to our project. To do that, follow these steps: If necessary, open Visual Studio Express in administrator mode; we need to do this since our project is actually writing to the registry this time, so it needs administrator permissions. To do that, right-click on Visual Studio and click on Run as administrator. 
Go to File, then click on Open Project, browse to the Yharnam project from C:\ArcGISByExample\yharnam\Code, and click on Open. Click on the Yharnam project from Solution Explorer to activate it. From the Project menu, click on Add Class. Expand the ArcGIS node and then click on the Extending ArcObjects node. Select Base Tool and name it tlExcavationEditor.vb. Click on Add to open ArcGIS New Item Wizard Options. From ArcGIS New Item Wizard Options, select Desktop ArcMap Tool since we will be programming against ArcMap. Click on OK. Take note of the Yharnam.tlExcavationEditor Progid as we will be using this in the next section to add the tool to the toolbar. If necessary, double-click on tlExcavationEditor.vb to edit it. In the New method, update the properties of the command as follows. This will update the name and caption and other properties of the command: MyBase.m_category = "Yharnam" 'localizable text MyBase.m_caption = "Excavation Editor" 'localizable text MyBase.m_message = "Excavation Editor" 'localizable text MyBase.m_toolTip = "Excavation Editor" 'localizable text MyBase.m_name = "Yharnam_ExcavationEditor" In order to display the excavation information, we will need a form. To add the Yharnam Excavation Editor form, point to Project and then click on Add Windows Form. Name the form frmExcavationEditor.vb and click on Add. Use the form designer to add and set the controls shown in the following table: Control Name Properties Label lblDesignID Text: Design ID Label lblExcavationOID Text: Excavation ObjectID Label lblExcavationArea Text: Excavation Area Label lblExcavationDepth Text: Excavation Depth Label lblTreeCount Text: Number of Trees Label lblTotalCost Text: Total Excavation Cost Text txtDesignID Read-Only: True Text txtExcavationOID Read-Only: True Text txtExcavationArea Read-Only: True Text txtExcavationDepth Read-Only: False Text txtTreeCount Read-Only: True Text txtTotalCost Read-Only: True Button btnSave Text: Save Your form should look like the following screenshot: One last change before we build our solution: we need to change the default icons of our tools. To do that, double-click on tlExcavationEditor.bmp to open the picture editor and replace the picture with yharnam.bmp, which can be found under C:\ArcGISByExample\yharnam\icons\excavation_editor.bmp, and save tlExcavationEditor.bmp. Change tlNewExcavation.bmp to C:\ArcGISByExample\yharnam\icons\new_excavation.bmp. Save your project and move to the next step to assign the command to the toolbar. Adding the excavation manager toolbar Now that we have our two tools, we will add a toolbar to group them together. Follow these steps to add the Yharnam Excavation Planning Manager Toolbar to your project: If necessary, open Visual Studio Express in administrator mode; we need to do this since our project is actually writing to the registry this time, so it needs administrator permissions. To do that, right-click on Visual Studio and click on Run as administrator. Go to File, then click on Open Project, browse to the Yharnam project from C:\ArcGISByExample\yharnam\Code, and click on Open. Click on the Yharnam project from Solution Explorer to activate it. From the Project menu, click on Add Class. Expand the ArcGIS node and then click on the Extending ArcObjects node. Select Base Toolbar and name it tbYharnam.vb. Click on Add to open ArcGIS New Item Wizard Options. From ArcGIS New Item Wizard Options, select Desktop ArcMap since we will be programming against ArcMap. Click on OK. 
The property Caption is what is displayed when the toolbar loads. It currently defaults to MY VB.Net Toolbar, so change it to Yharnam Excavation Planning Manager Toolbar as follows: Public Overrides ReadOnly Property Caption() As String Get 'TODO: Replace bar caption Return "YharnamExcavation Planning Manager" End Get End Property Your toolbar is currently empty, which means it doesn't have buttons or tools. Go to the New method and add your tools prog ID, as shown in the following code: Public Sub New() AddItem("Yharnam.tlNewExcavation") AddItem("Yharnam.tlExcavationEditor") End Sub Now it is time to test our new toolbar. Go to Build and then click on Build Solution; make sure ArcMap is not running. If you get an error, make sure you have run the Visual Studio as administrator. For a list of all ArcMap commands, you can refer to http://bit.ly/b04748_arcmapids. Check the commands with namespace esriArcMapUI. Run yharnam.mxd in C:\ArcGISByExample\yharnam\Data\yharnam.mxd. From ArcMap, go to the Customize menu, then to Toolbars, and then select Yharnam Excavation Planning Manager Toolbar, which we just created. You should see the toolbar pop up on ArcMap with the two added commands, as shown in the following screenshot: Close ArcMap and choose not to save any changes. In the next section, we will do the real work of editing. Creating excavation features Features are nothing but records in a table. However, these special records cannot be created without a geometry shape attribute. To create a feature, we need first to learn how to draw and create geometries. We will be using the rubber band ArcObjects interface to create a polygon geometry. We can use it to create other types of geometries as well, but since our excavations are polygons, we will use the polygon rubber band. Using the rubber band to draw geometries on the map In this exercise, we will use the rubber band object to create a polygon geometry by clicking on multiple points on the map. We will import libraries as we need them. Follow these steps: If necessary, open Visual Studio Express in administrator mode; we need to do this since our project is actually writing to the registry this time, so it needs administrator permissions. To do that, right-click on Visual Studio and click on Run as administrator. Go to File, then click on Open Project, browse to the Yharnam project from C:\ArcGISByExample\yharnam\Code, and click on Open. Double-click on tlNewExcavation.vb and write the following code in onMouseDown that is when we click on the map: Public Overrides Sub OnMouseDown(ByVal Button As Integer, ByVal Shift As Integer, ByVal X As Integer, ByVal Y As Integer) Dim pRubberBand As IRubberBand = New RubberPolygonClass End Sub You will get an error under rubber band and that is because the library that this class is located in is not imported. Simply hover over the error and import the library; in this case, it is Esri.ArcGIS.Display, as illustrated in the following screenshot: We now have to call the TrackNew method on the pRubberband object that will allow us to draw. This requires two pieces of parameters. First, the screen on which you are drawing and the symbol you are drawing with, which have the color, size, and so on. By now we are familiar with how we can get these objects. The symbol needs to be of type FillShapeSymbol since we are dealing with polygons. We will go with a simple black symbol for starters. 
Write the following code: Public Overrides Sub OnMouseDown(ByVal Button As Integer, ByVal Shift As Integer, ByVal X As Integer, ByVal Y As Integer) Dim pDocument As IMxDocument = m_application.Document Dim pRubberBand As IRubberBand = New RubberPolygonClass Dim pFillSymbol As ISimpleFillSymbol = New SimpleFillSymbol Dim pPolygon as IGeometry=pRubberBand.TrackNew(pDocument.ActiveView.ScreenDisplay, pFillSymbol) End Sub Build your solution. If it fails, make sure you have run the solution as administrator. Run yharnam.mxd. Click on the New Excavation tool to activate it, and then click on three different locations on the map. You will see that a polygon is forming as you click; double-click to finish drawing, as illustrated in the following screenshot: The polygon disappears when you finish drawing and the reason is that we didn't actually persist the polygon into a feature or a graphic. As a start, we will draw the polygon on the screen and we will also change the color of the polygon to red. Write the following code to do so: Public Overrides Sub OnMouseDown(ByVal Button As Integer, ByVal Shift As Integer, ByVal X As Integer, ByVal Y As Integer) Dim pDocument As IMxDocument = m_application.Document Dim pRubberBand As IRubberBand = New RubberPolygonClass Dim pFillSymbol As ISimpleFillSymbol = New SimpleFillSymbol Dim pColor As IColor = New RgbColor pColor.RGB = RGB(255, 0, 0) pFillSymbol.Color = pColor Dim pPolygon As IGeometry = pRubberBand.TrackNew(pDocument.ActiveView.ScreenDisplay, pFillSymbol) Dim pDisplay As IScreenDisplay = pDocument.ActiveView.ScreenDisplay pDisplay.StartDrawing(pDisplay.hDC, ESRI.ArcGIS.Display.esriScreenCache.esriNoScreenCache) pDisplay.SetSymbol(pFillSymbol) pDisplay.DrawPolygon(pPolygon) pDisplay.FinishDrawing() End Sub Build your solution and run yharnam.mxd. Activate the New Excavation tool and draw an excavation; you should see a red polygon is displayed on the screen after you finish drawing, as shown in the following screenshot: Close ArcMap and choose not to save the changes. Converting geometries into features Now that we have learned how to draw a polygon, we will convert that polygon into an excavation feature. Follow these steps to do so: If necessary, open Visual Studio Express in administrator mode; we need to do this since our project is actually writing to the registry this time, so it needs administrator permissions. To do that, right-click on Visual Studio and click on Run as administrator. Go to File, then click on Open Project, browse to the Yharnam project from C:\ArcGISByExample\yharnam\Code, and click on Open. Double-click on tlNewExcavation.vb to edit the code. Remove the code that draws the polygon on the map. Your new code should look like the following: Public Overrides Sub OnMouseDown(ByVal Button As Integer, ByVal Shift As Integer, ByVal X As Integer, ByVal Y As Integer) Dim pDocument As IMxDocument = m_application.Document Dim pRubberBand As IRubberBand = New RubberPolygonClass Dim pFillSymbol As ISimpleFillSymbol = New SimpleFillSymbol Dim pPolygon As IGeometry = pRubberBand.TrackNew(pDocument.ActiveView.ScreenDisplay, pFillSymbol) End Sub First we need to open the Yharnam geodatabase located on C:\ArcGISByExample\yharnam\Data\Yharnam.gdb by establishing a workspace connection and then we will open the Excavation feature class. 
Write the following two functions in tlNewExcavation.vb, getYharnamWorkspace, and getExcavationFeatureclass: Public Function getYharnamWorkspace() As IWorkspace Dim pWorkspaceFactory As IWorkspaceFactory = New FileGDBWorkspaceFactory Return pWorkspaceFactory.OpenFromFile("C:\ArcGISByExample\yharnam\Data\Yharnam.gdb", m_application.hWnd) End Function Public Function getExcavationFeatureClass(pWorkspace As IWorkspace) As IFeatureClass Dim pFWorkspace As IFeatureWorkspace = pWorkspace Return pFWorkspace.OpenFeatureClass("Excavation") End Function To create the feature, we need first to start an editing session and a transaction, and wrap and code between the start and the end of the session. To use editing, we utilize the IWorkspaceEdit interface. Write the following code in the onMouseDown method: Public Overrides Sub OnMouseDown(ByVal Button As Integer, ByVal Shift As Integer, ByVal X As Integer, ByVal Y As Integer) Dim pDocument As IMxDocument = m_application.Document Dim pRubberBand As IRubberBand = New RubberPolygonClass Dim pFillSymbol As ISimpleFillSymbol = New SimpleFillSymbol Dim pPolygon As IGeometry = pRubberBand.TrackNew(pDocument.ActiveView.ScreenDisplay, pFillSymbol) Dim pWorkspaceEdit As IWorkspaceEdit = getYharnamWorkspace() pWorkspaceEdit.StartEditing(True) pWorkspaceEdit.StartEditOperation() pWorkspaceEdit.StopEditOperation() pWorkspaceEdit.StopEditing(True) End Sub Now we will use the CreateFeature method in order to create a new feature and then populate it with the attributes. The only attribute that we care about now is the geometry or the shape. The shape is actually the polygon we just drew. Write the following code to create the feature: Dim pWorkspaceEdit As IWorkspaceEdit = getYharnamWorkspace() pWorkspaceEdit.StartEditing(True) pWorkspaceEdit.StartEditOperation() Dim pExcavationFeatureClass As IFeatureClass = getExcavationFeatureClass(pWorkspaceEdit) Dim pFeature As IFeature = pExcavationFeatureClass.CreateFeature() pFeature.Shape = pPolygon pFeature.Store() pWorkspaceEdit.StopEditOperation() pWorkspaceEdit.StopEditing(True) Build and run yharnam.mxd. Click on the New Excavation tool and draw a polygon on the map. Refresh the map. You will see that a new excavation feature is added to the map, as shown in the following screenshot: Close ArcMap and choose not to save any changes. Reopen yharnam.mxd and you will see that the features you created are still there because they are stored in the geodatabase. Close ArcMap and choose not to save any changes. We have learned how to create features. Now we will learn how to edit excavations as well. View and edit the excavation information We have created some excavation features on the map; however, these are merely polygons and we need to extract useful information from them, and display and edit these excavations. For that, we will use the Excavation Editor tool to click on an excavation and display the Excavation Editor form with the excavation information. Then we will give the ability to edit this information. Follow these steps: If necessary, open Visual Studio Express in administrator mode; we need to do this since our project is actually writing to the registry this time, so it needs administrator permissions. To do that, right-click on Visual Studio and click on Run as administrator. Go to File, then click on Open Project, browse to the Yharnam project from C:\ArcGISByExample\yharnam\Code, and click on Open. Right-click on frmExcavationEditor.vb and click on View Code to view the class code. 
Add the ArcMapApplication property as shown in the following code so that we can set it from the tool, we will need this at a later stage: Private _application As IApplication Public Property ArcMapApplication() As IApplication Get Return _application End Get Set(ByVal value As IApplication) _application = value End Set End Property Add another method called PopulateExcavation that takes a feature. This method will populate the form fields with the information we get from the excavation feature. We will pass the feature from the Excavation Editor tool at a later stage: Public Sub PopulateExcavation(pFeature As IFeature) End Sub According to the following screenshot of the Excavation feature class from ArcCatalog, we can populate three fields from the excavation attributes, these are the design ID, the depth of excavation, and the object ID of the feature: Write the following code to populate the design ID, the depth, and the object ID. Note that we used the isDBNull function to check if there is any value stored in those fields. Note that we don't have to do that check for the OBJECTID field since it should never be null: Public Sub PopulateExcavation(pFeature As IFeature) Dim designID As Long = 0 Dim dDepth As Double = 0 If IsDBNull(pFeature.Value(pFeature.Fields.FindField("DESIGNID"))) = False Then designID = pFeature.Value(pFeature.Fields.FindField("DESIGNID")) End If If IsDBNull(pFeature.Value(pFeature.Fields.FindField("DEPTH"))) = False Then dDepth = pFeature.Value(pFeature.Fields.FindField("DEPTH")) End If txtDesignID.Text = designID txtExcavationDepth.Text = dDepth txtExcavationOID.Text= pFeature.OID End Sub What left is the excavation area, which is a bit tricky. To do that, we need to get it from the Shape property of the feature by casting it to the IAreaarcobjects interface and use the area property as follows: ….. txtExcavationOID.Text = pFeature.OID Dim pArea As IArea = pFeature.Shape txtExcavationArea.Text = pArea.Area End Sub Now our viewing capability is ready, we need to execute it. Double-click on tlExcavationEditor.vb in order to open the code. We will need the getYharnamWorkspace and getExcavationFeatureClass methods that are in tlNewExcavation.vb. Copy them in tlExcavationEditor.vb. In the onMouseDown event, write the following code to get the feature from the mouse location. This will convert the x, y mouse coordinate into a map point and then does a spatial query to find the excavation under this point. After that, we will basically call our excavation editor form and send it the feature to do the work as follows: Public Overrides Sub OnMouseDown(ByVal Button As Integer, ByVal Shift As Integer, ByVal X As Integer, ByVal Y As Integer) 'TODO: Add tlExcavationEditor.OnMouseDown implementation Dim pMxdoc As IMxDocument = m_application.Document Dim pPoint As IPoint = pMxdoc.ActiveView.ScreenDisplay.DisplayTransformation.ToMapPoint(X, Y) Dim pSFilter As ISpatialFilter = New SpatialFilter pSFilter.Geometry = pPoint Dim pFeatureClass As IFeatureClass = getExcavationFeatureClass(getYharnamWorkspace()) Dim pFCursor As IFeatureCursor = pFeatureClass.Search(pSFilter, False) Dim pFeature As IFeature = pFCursor.NextFeature If pFeatureIs Nothing Then Return Dim pExcavationEditor As New frmExcavationEditor pExcavationEditor.ArcMapApplication = m_application pExcavationEditor.PopulateExcavation(pFeature) pExcavationEditor.Show() End Sub Build and run yharnam.mxd. Click on the Excavation Editor tool and click on one of the excavations you drew before. 
You should see that the Excavation Editor form pops up with the excavation information; no design ID or depth is currently set, as you can see in the following screenshot: Close ArcMap and choose not to save any changes. We will do the final trick to edit the excavation; there is not much to edit here, only the depth. To do that, copy the getYharnamWorkspace and getExcavationFeatureClass methods that are in tlNewExcavation.vb. Copy them in frmExcavationEditor.vb. You will get an error in the m_application.hwnd, so replace it with _application.hwnd, which is the property we set. Right-click on frmExcavationEditor and select View Designer. Double-click on the Save button to generate the btnSave_Click method. The user will enter the new depth for the excavation in the txtExcavationDepthtextbox. We will use this value and store it in the feature. But before that, we need to retrieve that feature using the object ID, start editing, save the feature, and close the session. Write the following code to do so. Note that we have closed the form at the end of the code, so we can open it again to get the new value: Private Sub btnSave_Click(sender As Object, e As EventArgs) Handles btnSave.Click Dim pWorkspaceEdit As IWorkspaceEdit = getYharnamWorkspace() Dim pFeatureClass As IFeatureClass = getExcavationFeatureClass(pWorkspaceEdit) Dim pFeature As IFeature = pFeatureClass.GetFeature(txtExcavationOID.Text) pWorkspaceEdit.StartEditing(True) pWorkspaceEdit.StartEditOperation() pFeature.Value(pFeature.Fields.FindField("DEPTH")) = txtExcavationDepth.Text pFeature.Store() pWorkspaceEdit.StopEditOperation() pWorkspaceEdit.StopEditing(True) Me.Close End Sub Build and run yharnam.mxd. Click on the Excavation Editor tool and click on one of the excavations you drew before. Type a numeric depth value and click on Save; this will close the form. Use the Excavation Editor tool again to open back the excavation and check if your depth value has been stored successfully. Summary In this article, you started writing the excavation planning manager, code named Yharnam. In the first part of the article, you spent time learning to use the geodatabase editing and preparing the project. You then learned how to use the rubber band tool, which allows you to draw geometries on the map. Using this drawn geometry, you edited the workspace and created a new excavation feature with that geometry. You then learned how to view and edit the excavation feature with attributes. Resources for Article: Further resources on this subject: ArcGIS Spatial Analyst[article] Enterprise Geodatabase[article] Server Logs [article]
Understanding TDD

Packt
03 Sep 2015
31 min read
 In this article by Viktor Farcic and Alex Garcia, the authors of the book Test-Driven Java Development, we will go through TDD in a simple procedure of writing tests before the actual implementation. It's an inversion of a traditional approach where testing is performed after the code is written. (For more resources related to this topic, see here.) Red-green-refactor Test-driven development is a process that relies on the repetition of a very short development cycle. It is based on the test-first concept of extreme programming (XP) that encourages simple design with a high level of confidence. The procedure that drives this cycle is called red-green-refactor. The procedure itself is simple and it consists of a few steps that are repeated over and over again: Write a test. Run all tests. Write the implementation code. Run all tests. Refactor. Run all tests. Since a test is written before the actual implementation, it is supposed to fail. If it doesn't, the test is wrong. It describes something that already exists or it was written incorrectly. Being in the green state while writing tests is a sign of a false positive. Tests like these should be removed or refactored. While writing tests, we are in the red state. When the implementation of a test is finished, all tests should pass and then we will be in the green state. If the last test failed, implementation is wrong and should be corrected. Either the test we just finished is incorrect or the implementation of that test did not meet the specification we had set. If any but the last test failed, we broke something and changes should be reverted. When this happens, the natural reaction is to spend as much time as needed to fix the code so that all tests are passing. However, this is wrong. If a fix is not done in a matter of minutes, the best thing to do is to revert the changes. After all, everything worked not long ago. Implementation that broke something is obviously wrong, so why not go back to where we started and think again about the correct way to implement the test? That way, we wasted minutes on a wrong implementation instead of wasting much more time to correct something that was not done right in the first place. Existing test coverage (excluding the implementation of the last test) should be sacred. We change the existing code through intentional refactoring, not as a way to fix recently written code. Do not make the implementation of the last test final, but provide just enough code for this test to pass. Write the code in any way you want, but do it fast. Once everything is green, we have confidence that there is a safety net in the form of tests. From this moment on, we can proceed to refactor the code. This means that we are making the code better and more optimum without introducing new features. While refactoring is in place, all tests should be passing all the time. If, while refactoring, one of the tests failed, refactor broke an existing functionality and, as before, changes should be reverted. Not only that at this stage we are not changing any features, but we are also not introducing any new tests. All we're doing is making the code better while continuously running all tests to make sure that nothing got broken. At the same time, we're proving code correctness and cutting down on future maintenance costs. Once refactoring is finished, the process is repeated. It's an endless loop of a very short cycle. Speed is the key Imagine a game of ping pong (or table tennis). 
The game is very fast; sometimes it is hard to even follow the ball when professionals play the game. TDD is very similar. TDD veterans tend not to spend more than a minute on either side of the table (test and implementation). Write a short test and run all tests (ping), write the implementation and run all tests (pong), write another test (ping), write implementation of that test (pong), refactor and confirm that all tests are passing (score), and then repeat—ping, pong, ping, pong, ping, pong, score, serve again. Do not try to make the perfect code. Instead, try to keep the ball rolling until you think that the time is right to score (refactor). Time between switching from tests to implementation (and vice versa) should be measured in minutes (if not seconds). It's not about testing T in TDD is often misunderstood. Test-driven development is the way we approach the design. It is the way to force us to think about the implementation and to what the code needs to do before writing it. It is the way to focus on requirements and implementation of just one thing at a time—organize your thoughts and better structure the code. This does not mean that tests resulting from TDD are useless—it is far from that. They are very useful and they allow us to develop with great speed without being afraid that something will be broken. This is especially true when refactoring takes place. Being able to reorganize the code while having the confidence that no functionality is broken is a huge boost to the quality. The main objective of test-driven development is testable code design with tests as a very useful side product. Testing Even though the main objective of test-driven development is the approach to code design, tests are still a very important aspect of TDD and we should have a clear understanding of two major groups of techniques as follows: Black-box testing White-box testing The black-box testing Black-box testing (also known as functional testing) treats software under test as a black-box without knowing its internals. Tests use software interfaces and try to ensure that they work as expected. As long as functionality of interfaces remains unchanged, tests should pass even if internals are changed. Tester is aware of what the program should do, but does not have the knowledge of how it does it. Black-box testing is most commonly used type of testing in traditional organizations that have testers as a separate department, especially when they are not proficient in coding and have difficulties understanding it. This technique provides an external perspective on the software under test. Some of the advantages of black-box testing are as follows: Efficient for large segments of code Code access, understanding the code, and ability to code are not required Separation between user's and developer's perspectives Some of the disadvantages of black-box testing are as follows: Limited coverage, since only a fraction of test scenarios is performed Inefficient testing due to tester's lack of knowledge about software internals Blind coverage, since tester has limited knowledge about the application If tests are driving the development, they are often done in the form of acceptance criteria that is later used as a definition of what should be developed. Automated black-box testing relies on some form of automation such as behavior-driven development (BDD). 
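Staying with the document's Java context, the following sketch shows what a black-box style unit test can look like: it exercises a small, hypothetical Friendships class purely through its public interface and never relies on how the data is stored internally (the class is included only so the sketch compiles):

import static org.junit.Assert.assertTrue;

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

import org.junit.Test;

public class FriendshipsTest {

  // Minimal hypothetical production class, included only to make the sketch self-contained
  static class Friendships {
    private final Map<String, Set<String>> friends = new HashMap<>();

    void makeFriends(String person1, String person2) {
      friends.computeIfAbsent(person1, key -> new HashSet<>()).add(person2);
      friends.computeIfAbsent(person2, key -> new HashSet<>()).add(person1);
    }

    Set<String> getFriendsList(String person) {
      return friends.getOrDefault(person, new HashSet<>());
    }
  }

  @Test
  public void friendshipIsMutual() {
    Friendships friendships = new Friendships();

    friendships.makeFriends("Joe", "Audrey");

    // Black-box style: only the public interface is exercised and verified
    assertTrue(friendships.getFriendsList("Joe").contains("Audrey"));
    assertTrue(friendships.getFriendsList("Audrey").contains("Joe"));
  }
}

If the internal Map were later replaced by another data structure, this test would keep passing as long as the public behavior stays the same, which is exactly the point of the black-box perspective.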
The white-box testing White-box testing (also known as clear-box testing, glass-box testing, transparent-box testing, and structural testing) looks inside the software that is being tested and uses that knowledge as part of the testing process. If, for example, an exception should be thrown under certain conditions, a test might want to reproduce those conditions. White-box testing requires internal knowledge of the system and programming skills. It provides an internal perspective on the software under test. Some of the advantages of white-box testing are as follows: Efficient in finding errors and problems Required knowledge of internals of the software under test is beneficial for thorough testing Allows finding hidden errors Programmers introspection Helps optimizing the code Due to the required internal knowledge of the software, maximum coverage is obtained Some of the disadvantages of white-box testing are as follows: It might not find unimplemented or missing features Requires high-level knowledge of internals of the software under test Requires code access Tests are often tightly coupled to the implementation details of the production code, causing unwanted test failures when the code is refactored. White-box testing is almost always automated and, in most cases, has the form of unit tests. When white-box testing is done before the implementation, it takes the form of TDD. The difference between quality checking and quality assurance The approach to testing can also be distinguished by looking at the objectives they are trying to accomplish. Those objectives are often split between quality checking (QC) and quality assurance (QA). While quality checking is focused on defects identification, quality assurance tries to prevent them. QC is product-oriented and intends to make sure that results are as expected. On the other hand, QA is more focused on processes that assure that quality is built-in. It tries to make sure that correct things are done in the correct way. While quality checking had a more important role in the past, with the emergence of TDD, acceptance test-driven development (ATDD), and later on behavior-driven development (BDD), focus has been shifting towards quality assurance. Better tests No matter whether one is using black-box, white-box, or both types of testing, the order in which they are written is very important. Requirements (specifications and user stories) are written before the code that implements them. They come first so they define the code, not the other way around. The same can be said for tests. If they are written after the code is done, in a certain way, that code (and the functionalities it implements) is defining tests. Tests that are defined by an already existing application are biased. They have a tendency to confirm what code does, and not to test whether client's expectations are met, or that the code is behaving as expected. With manual testing, that is less the case since it is often done by a siloed QC department (even though it's often called QA). They tend to work on tests' definition in isolation from developers. That in itself leads to bigger problems caused by inevitably poor communication and the police syndrome where testers are not trying to help the team to write applications with quality built-in, but to find faults at the end of the process. The sooner we find problems, the cheaper it is to fix them. 
Tests written in the TDD fashion (including its flavors such as ATDD and BDD) are an attempt to develop applications with quality built-in from the very start. It's an attempt to avoid having problems in the first place. Mocking In order for tests to run fast and provide constant feedback, code needs to be organized in such a way that the methods, functions, and classes can be easily replaced with mocks and stubs. A common word for this type of replacements of the actual code is test double. Speed of the execution can be severely affected with external dependencies; for example, our code might need to communicate with the database. By mocking external dependencies, we are able to increase that speed drastically. Whole unit tests suite execution should be measured in minutes, if not seconds. Designing the code in a way that it can be easily mocked and stubbed, forces us to better structure that code by applying separation of concerns. More important than speed is the benefit of removal of external factors. Setting up databases, web servers, external APIs, and other dependencies that our code might need, is both time consuming and unreliable. In many cases, those dependencies might not even be available. For example, we might need to create a code that communicates with a database and have someone else create a schema. Without mocks, we would need to wait until that schema is set. With or without mocks, the code should be written in a way that we can easily replace one dependency with another. Executable documentation Another very useful aspect of TDD (and well-structured tests in general) is documentation. In most cases, it is much easier to find out what the code does by looking at tests than the implementation itself. What is the purpose of some methods? Look at the tests associated with it. What is the desired functionality of some part of the application UI? Look at the tests associated with it. Documentation written in the form of tests is one of the pillars of TDD and deserves further explanation. The main problem with (traditional) software documentation is that it is not up to date most of the time. As soon as some part of the code changes, the documentation stops reflecting the actual situation. This statement applies to almost any type of documentation, with requirements and test cases being the most affected. The necessity to document code is often a sign that the code itself is not well written.Moreover, no matter how hard we try, documentation inevitably gets outdated. Developers shouldn't rely on system documentation because it is almost never up to date. Besides, no documentation can provide as detailed and up-to-date description of the code as the code itself. Using code as documentation, does not exclude other types of documents. The key is to avoid duplication. If details of the system can be obtained by reading the code, other types of documentation can provide quick guidelines and a high-level overview. Non-code documentation should answer questions such as what the general purpose of the system is and what technologies are used by the system. In many cases, a simple README is enough to provide the quick start that developers need. Sections such as project description, environment setup, installation, and build and packaging instructions are very helpful for newcomers. From there on, code is the bible. Implementation code provides all needed details while test code acts as the description of the intent behind the production code. 
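A short sketch can tie the last two ideas together: a test double keeps the test fast and isolated from external dependencies, while a descriptive test name documents the intent. It assumes Mockito is available on the classpath; StudentRepository and StudentService are hypothetical names used only for illustration:
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;

import org.junit.Assert;
import org.junit.Test;

public class StudentServiceSpec {

    // Hypothetical collaborator that would normally talk to a database.
    interface StudentRepository {
        String findNameById(int id);
    }

    // Hypothetical class under test.
    static class StudentService {
        private final StudentRepository repository;
        StudentService(StudentRepository repository) { this.repository = repository; }
        String greet(int id) { return "Hello, " + repository.findNameById(id); }
    }

    @Test
    public void whenStudentExistsThenGreetingContainsStudentName() {
        // The repository is replaced with a mock, so no database setup is needed.
        StudentRepository repository = mock(StudentRepository.class);
        when(repository.findNameById(42)).thenReturn("Eden");

        StudentService service = new StudentService(repository);

        Assert.assertEquals("Hello, Eden", service.greet(42));
    }
}
Because the repository is mocked, the test runs in milliseconds, and its name still reads as a one-line description of the expected behavior.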
Tests are executable documentation with TDD being the most common way to create and maintain it. Assuming that some form of Continuous Integration (CI) is in use, if some part of test-documentation is incorrect, it will fail and be fixed soon afterwards. CI solves the problem of incorrect test-documentation, but it does not ensure that all functionality is documented. For this reason (among many others), test-documentation should be created in the TDD fashion. If all functionality is defined as tests before the implementation code is written and execution of all tests is successful, then tests act as a complete and up-to-date information that can be used by developers. What should we do with the rest of the team? Testers, customers, managers, and other non coders might not be able to obtain the necessary information from the production and test code. As we saw earlier, two most common types of testing are black-box and white-box testing. This division is important since it also divides testers into those who do know how to write or at least read code (white-box testing) and those who don't (black-box testing). In some cases, testers can do both types. However, more often than not, they do not know how to code so the documentation that is usable for developers is not usable for them. If documentation needs to be decoupled from the code, unit tests are not a good match. That is one of the reasons why BDD came in to being. BDD can provide documentation necessary for non-coders, while still maintaining the advantages of TDD and automation. Customers need to be able to define new functionality of the system, as well as to be able to get information about all the important aspects of the current system. That documentation should not be too technical (code is not an option), but it still must be always up to date. BDD narratives and scenarios are one of the best ways to provide this type of documentation. Ability to act as acceptance criteria (written before the code), be executed frequently (preferably on every commit), and be written in natural language makes BDD stories not only always up to date, but usable by those who do not want to inspect the code. Documentation is an integral part of the software. As with any other part of the code, it needs to be tested often so that we're sure that it is accurate and up to date. The only cost-effective way to have accurate and up-to-date information is to have executable documentation that can be integrated into your continuous integration system. TDD as a methodology is a good way to move towards this direction. On a low level, unit tests are a best fit. On the other hand, BDD provides a good way to work on a functional level while maintaining understanding accomplished using natural language. No debugging We (authors of this article) almost never debug applications we're working on! This statement might sound pompous, but it's true. We almost never debug because there is rarely a reason to debug an application. When tests are written before the code and the code coverage is high, we can have high confidence that the application works as expected. This does not mean that applications written using TDD do not have bugs—they do. All applications do. However, when that happens, it is easy to isolate them by simply looking for the code that is not covered with tests. Tests themselves might not include some cases. In that situation, the action is to write additional tests. 
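For instance, instead of reaching for the debugger, a reported defect can first be captured as a test. The PriceCalculator class below is hypothetical; an earlier version of it is assumed to have performed the discount division on integers, which is exactly the kind of bug such a test pins down:
import org.junit.Assert;
import org.junit.Test;

public class PriceCalculatorSpec {

    // Hypothetical production code. An earlier version is assumed to have divided
    // on integers (discountPercentage / 100), so any discount below 100% was ignored.
    static class PriceCalculator {
        double applyDiscount(double price, int discountPercentage) {
            return price - price * (discountPercentage / 100.0);
        }
    }

    // The bug report was first turned into this test. It failed against the old
    // implementation for the expected reason, the fix was applied, and the test
    // now stays in the suite as a guard against regressions.
    @Test
    public void whenDiscountIsTwentyFivePercentThenPriceIsReduced() {
        PriceCalculator calculator = new PriceCalculator();
        Assert.assertEquals(75.0, calculator.applyDiscount(100.0, 25), 0.0001);
    }
}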
With high code coverage, finding the cause of some bug is much faster through tests than spending time debugging line by line until the culprit is found. With all this in mind, let's go through the TDD best practices. Best practices Coding best practices are a set of informal rules that the software development community has learned over time, which can help improve the quality of software. While each application needs a level of creativity and originality (after all, we're trying to build something new or better), coding practices help us avoid some of the problems others faced before us. If you're just starting with TDD, it is a good idea to apply some (if not all) of the best practices generated by others. For easier classification of test-driven development best practices, we divided them into four categories: Naming conventions Processes Development practices Tools As you'll see, not all of them are exclusive to TDD. Since a big part of test-driven development consists of writing tests, many of the best practices presented in the following sections apply to testing in general, while others are related to general coding best practices. No matter the origin, all of them are useful when practicing TDD. Take the advice with a certain dose of skepticism. Being a great programmer is not only about knowing how to code, but also about being able to decide which practice, framework or style best suits the project and the team. Being agile is not about following someone else's rules, but about knowing how to adapt to circumstances and choose the best tools and practices that suit the team and the project. Naming conventions Naming conventions help to organize tests better, so that it is easier for developers to find what they're looking for. Another benefit is that many tools expect that those conventions are followed. There are many naming conventions in use, and those presented here are just a drop in the ocean. The logic is that any naming convention is better than none. Most important is that everyone on the team knows what conventions are being used and are comfortable with them. Choosing more popular conventions has the advantage that newcomers to the team can get up to speed fast since they can leverage existing knowledge to find their way around. Separate the implementation from the test code Benefits: It avoids accidentally packaging tests together with production binaries; many build tools expect tests to be in a certain source directory. Common practice is to have at least two source directories. Implementation code should be located in src/main/java and test code in src/test/java. In bigger projects, the number of source directories can increase but the separation between implementation and tests should remain as is. Build tools such as Gradle and Maven expect source directories separation as well as naming conventions. You might have noticed that the build.gradle files that we used throughout this article did not have explicitly specified what to test nor what classes to use to create a .jar file. Gradle assumes that tests are in src/test/java and that the implementation code that should be packaged into a jar file is in src/main/java. Place test classes in the same package as implementation Benefits: Knowing that tests are in the same package as the code helps finding code faster. As stated in the previous practice, even though packages are the same, classes are in the separate source directories. All exercises throughout this article followed this convention. 
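Here is a minimal sketch of that layout, shown as two files in separate source trees; the package name com.packtpub.tddjava is only an assumption for illustration:
// src/main/java/com/packtpub/tddjava/TickTackToe.java (implementation source tree)
package com.packtpub.tddjava;

public class TickTackToe {
    // production code lives here
}

// src/test/java/com/packtpub/tddjava/TickTackToeTest.java (test source tree)
package com.packtpub.tddjava;

import org.junit.Test;

public class TickTackToeTest {
    // Same package as the implementation, but a different source directory,
    // so the test is never packaged into the production artifact.
    @Test
    public void whenInstantiatedThenNoExceptionIsThrown() {
        new TickTackToe();
    }
}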
Name test classes in a similar fashion to the classes they test Benefits: Knowing that tests have a similar name to the classes they are testing helps in finding the classes faster. One commonly used practice is to name tests the same as the implementation classes, with the suffix Test. If, for example, the implementation class is TickTackToe, the test class should be TickTackToeTest. However, in all cases, with the exception of those we used throughout the refactoring exercises, we prefer the suffix Spec. It helps to make a clear distinction that test methods are primarily created as a way to specify what will be developed. Testing is a great subproduct of those specifications. Use descriptive names for test methods Benefits: It helps in understanding the objective of tests. Using method names that describe tests is beneficial when trying to figure out why some tests failed or when the coverage should be increased with more tests. It should be clear what conditions are set before the test, what actions are performed and what is the expected outcome. There are many different ways to name test methods and our preferred method is to name them using the Given/When/Then syntax used in the BDD scenarios. Given describes (pre)conditions, When describes actions, and Then describes the expected outcome. If some test does not have preconditions (usually set using @Before and @BeforeClass annotations), Given can be skipped. Let's take a look at one of the specifications we created for our TickTackToe application:   @Test public void whenPlayAndWholeHorizontalLineThenWinner() { ticTacToe.play(1, 1); // X ticTacToe.play(1, 2); // O ticTacToe.play(2, 1); // X ticTacToe.play(2, 2); // O String actual = ticTacToe.play(3, 1); // X assertEquals("X is the winner", actual); } Just by reading the name of the method, we can understand what it is about. When we play and the whole horizontal line is populated, then we have a winner. Do not rely only on comments to provide information about the test objective. Comments do not appear when tests are executed from your favorite IDE nor do they appear in reports generated by CI or build tools. Processes TDD processes are the core set of practices. Successful implementation of TDD depends on practices described in this section. Write a test before writing the implementation code Benefits: It ensures that testable code is written; ensures that every line of code gets tests written for it. By writing or modifying the test first, the developer is focused on requirements before starting to work on the implementation code. This is the main difference compared to writing tests after the implementation is done. The additional benefit is that with the tests written first, we are avoiding the danger that the tests work as quality checking instead of quality assurance. We're trying to ensure that quality is built in as opposed to checking later whether we met quality objectives. Only write new code when the test is failing Benefits: It confirms that the test does not work without the implementation. If tests are passing without the need to write or modify the implementation code, then either the functionality is already implemented or the test is defective. If new functionality is indeed missing, then the test always passes and is therefore useless. Tests should fail for the expected reason. Even though there are no guarantees that the test is verifying the right thing, with fail first and for the expected reason, confidence that verification is correct should be high. 
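Here is a minimal sketch of that red-green sequence with a hypothetical Greeter class: the test is written first and fails (it does not even compile while Greeter is missing), and only then is the simplest passing implementation added:
import org.junit.Assert;
import org.junit.Test;

public class GreeterSpec {

    // Step 1 (red): written before any implementation exists. The failure (here,
    // a compilation error) confirms that the test does not pass by accident.
    @Test
    public void whenGreetIsCalledWithNameThenGreetingContainsThatName() {
        Greeter greeter = new Greeter();
        Assert.assertEquals("Hello, Eden!", greeter.greet("Eden"));
    }
}

// Step 2 (green): the simplest implementation that makes the test pass.
class Greeter {
    public String greet(String name) {
        return "Hello, " + name + "!";
    }
}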
Rerun all tests every time the implementation code changes Benefits: It ensures that there is no unexpected side effect caused by code changes. Every time any part of the implementation code changes, all tests should be run. Ideally, tests are fast to execute and can be run by the developer locally. Once code is submitted to version control, all tests should be run again to ensure that there was no problem due to code merges. This is specially important when more than one developer is working on the code. Continuous integration tools such as Jenkins (http://jenkins-ci.org/), Hudson (http://hudson-ci.org/), Travis (https://travis-ci.org/), and Bamboo (https://www.atlassian.com/software/bamboo) should be used to pull the code from the repository, compile it, and run tests. All tests should pass before a new test is written Benefits: The focus is maintained on a small unit of work; implementation code is (almost) always in working condition. It is sometimes tempting to write multiple tests before the actual implementation. In other cases, developers ignore problems detected by existing tests and move towards new features. This should be avoided whenever possible. In most cases, breaking this rule will only introduce technical debt that will need to be paid with interest. One of the goals of TDD is that the implementation code is (almost) always working as expected. Some projects, due to pressures to reach the delivery date or maintain the budget, break this rule and dedicate time to new features, leaving the task of fixing the code associated with failed tests for later. These projects usually end up postponing the inevitable. Refactor only after all tests are passing Benefits: This type of refactoring is safe. If all implementation code that can be affected has tests and they are all passing, it is relatively safe to refactor. In most cases, there is no need for new tests. Small modifications to existing tests should be enough. The expected outcome of refactoring is to have all tests passing both before and after the code is modified. Development practices Practices listed in this section are focused on the best way to write tests. Write the simplest code to pass the test Benefits: It ensures cleaner and clearer design; avoids unnecessary features. The idea is that the simpler the implementation, the better and easier it is to maintain the product. The idea adheres to the keep it simple stupid (KISS) principle. This states that most systems work best if they are kept simple rather than made complex; therefore, simplicity should be a key goal in design, and unnecessary complexity should be avoided. Write assertions first, act later Benefits: This clarifies the purpose of the requirements and tests early. Once the assertion is written, the purpose of the test is clear and the developer can concentrate on the code that will accomplish that assertion and, later on, on the actual implementation. Minimize assertions in each test Benefits: This avoids assertion roulette; allows execution of more asserts. If multiple assertions are used within one test method, it might be hard to tell which of them caused a test failure. This is especially common when tests are executed as part of the continuous integration process. If the problem cannot be reproduced on a developer's machine (as may be the case if the problem is caused by environmental issues), fixing the problem may be difficult and time consuming. When one assert fails, execution of that test method stops. 
If there are other asserts in that method, they will not be run and information that can be used in debugging is lost. Last but not least, having multiple asserts creates confusion about the objective of the test. This practice does not mean that there should always be only one assert per test method. If there are other asserts that test the same logical condition or unit of functionality, they can be used within the same method. Let's go through few examples: @Test public final void whenOneNumberIsUsedThenReturnValueIsThatSameNumber() { Assert.assertEquals(3, StringCalculator.add("3")); } @Test public final void whenTwoNumbersAreUsedThenReturnValueIsTheirSum() { Assert.assertEquals(3+6, StringCalculator.add("3,6")); } The preceding code contains two specifications that clearly define what the objective of those tests is. By reading the method names and looking at the assert, there should be clarity on what is being tested. Consider the following for example: @Test public final void whenNegativeNumbersAreUsedThenRuntimeExceptionIsThrown() { RuntimeException exception = null; try { StringCalculator.add("3,-6,15,-18,46,33"); } catch (RuntimeException e) { exception = e; } Assert.assertNotNull("Exception was not thrown", exception); Assert.assertEquals("Negatives not allowed: [-6, -18]", exception.getMessage()); } This specification has more than one assert, but they are testing the same logical unit of functionality. The first assert is confirming that the exception exists, and the second that its message is correct. When multiple asserts are used in one test method, they should all contain messages that explain the failure. This way debugging the failed assert is easier. In the case of one assert per test method, messages are welcome, but not necessary since it should be clear from the method name what the objective of the test is. @Test public final void whenAddIsUsedThenItWorks() { Assert.assertEquals(0, StringCalculator.add("")); Assert.assertEquals(3, StringCalculator.add("3")); Assert.assertEquals(3+6, StringCalculator.add("3,6")); Assert.assertEquals(3+6+15+18+46+33, StringCalculator.add("3,6,15,18,46,33")); Assert.assertEquals(3+6+15, StringCalculator.add("3,6n15")); Assert.assertEquals(3+6+15, StringCalculator.add("//;n3;6;15")); Assert.assertEquals(3+1000+6, StringCalculator.add("3,1000,1001,6,1234")); } This test has many asserts. It is unclear what the functionality is, and if one of them fails, it is unknown whether the rest would work or not. It might be hard to understand the failure when this test is executed through some of the CI tools. Do not introduce dependencies between tests Benefits: The tests work in any order independently, whether all or only a subset is run Each test should be independent from the others. Developers should be able to execute any individual test, a set of tests, or all of them. Often, due to the test runner's design, there is no guarantee that tests will be executed in any particular order. If there are dependencies between tests, they might easily be broken with the introduction of new ones. Tests should run fast Benefits: These tests are used often. If it takes a lot of time to run tests, developers will stop using them or run only a small subset related to the changes they are making. The benefit of fast tests, besides fostering their usage, is quick feedback. The sooner the problem is detected, the easier it is to fix it. Knowledge about the code that produced the problem is still fresh. 
If the developer already started working on the next feature while waiting for the completion of the execution of the tests, he might decide to postpone fixing the problem until that new feature is developed. On the other hand, if he drops his current work to fix the bug, time is lost in context switching. Tests should be so quick that developers can run all of them after each change without getting bored or frustrated. Use test doubles Benefits: This reduces code dependency and test execution will be faster. Mocks are prerequisites for fast execution of tests and ability to concentrate on a single unit of functionality. By mocking dependencies external to the method that is being tested, the developer is able to focus on the task at hand without spending time in setting them up. In the case of bigger teams, those dependencies might not even be developed. Also, the execution of tests without mocks tends to be slow. Good candidates for mocks are databases, other products, services, and so on. Use set-up and tear-down methods Benefits: This allows set-up and tear-down code to be executed before and after the class or each method. In many cases, some code needs to be executed before the test class or before each method in a class. For this purpose, JUnit has @BeforeClass and @Before annotations that should be used as the setup phase. @BeforeClass executes the associated method before the class is loaded (before the first test method is run). @Before executes the associated method before each test is run. Both should be used when there are certain preconditions required by tests. The most common example is setting up test data in the (hopefully in-memory) database. At the opposite end are @After and @AfterClass annotations, which should be used as the tear-down phase. Their main purpose is to destroy data or a state created during the setup phase or by the tests themselves. As stated in one of the previous practices, each test should be independent from the others. Moreover, no test should be affected by the others. Tear-down phase helps to maintain the system as if no test was previously executed. Do not use base classes in tests Benefits: It provides test clarity. Developers often approach test code in the same way as implementation. One of the common mistakes is to create base classes that are extended by tests. This practice avoids code duplication at the expense of tests clarity. When possible, base classes used for testing should be avoided or limited. Having to navigate from the test class to its parent, parent of the parent, and so on in order to understand the logic behind tests introduces often unnecessary confusion. Clarity in tests should be more important than avoiding code duplication. Tools TDD, coding and testing in general, are heavily dependent on other tools and processes. Some of the most important ones are as follows. Each of them is too big a topic to be explored in this article, so they will be described only briefly. Code coverage and Continuous integration (CI) Benefits: It gives assurance that everything is tested Code coverage practice and tools are very valuable in determining that all code, branches, and complexity is tested. Some of the tools are JaCoCo (http://www.eclemma.org/jacoco/), Clover (https://www.atlassian.com/software/clover/overview), and Cobertura (http://cobertura.github.io/cobertura/). Continuous Integration (CI) tools are a must for all except the most trivial projects. 
Some of the most used tools are Jenkins (http://jenkins-ci.org/), Hudson (http://hudson-ci.org/), Travis (https://travis-ci.org/), and Bamboo (https://www.atlassian.com/software/bamboo).
Use TDD together with BDD
Benefits: Both developer unit tests and functional, customer-facing tests are covered.
While TDD with unit tests is a great practice, in many cases it does not provide all the testing that projects need. TDD is fast to develop, helps the design process, and gives confidence through fast feedback. On the other hand, BDD is more suitable for integration and functional testing, provides a better process for requirements gathering through narratives, and is a better way of communicating with clients through scenarios. Both should be used; together they provide a full process that involves all stakeholders and team members. TDD (based on unit tests) and BDD should be driving the development process. Our recommendation is to use TDD for high code coverage and fast feedback, and BDD as automated acceptance tests. While TDD is mostly oriented towards white-box testing, BDD often aims at black-box testing. Both TDD and BDD try to focus on quality assurance instead of quality checking.
Summary
You learned that TDD is a way to design code through a short and repeatable cycle called red-green-refactor. Failure is an expected state that should not only be embraced, but enforced throughout the TDD process. The cycle is so short that we move from one phase to another with great speed. While code design is the main objective, the tests created throughout the TDD process are a valuable asset that should be utilized, and they should reshape our view of traditional testing practices. We went through the most common of those practices, such as white-box and black-box testing, tried to put them into the TDD perspective, and showed the benefits they can bring to each other. You discovered that mocks are a very important tool that is often a must when writing tests. Finally, we discussed how tests can and should be utilized as executable documentation, and how TDD can make debugging much less necessary. We also walked through the TDD best practices in detail to reinforce the knowledge and experience you gained throughout this article. Now that we are armed with theoretical knowledge, it is time to set up the development environment and get an overview and comparison of different testing frameworks and tools.
Resources for Article: Further resources on this subject: RESTful Services JAX-RS 2.0[article] Java Refactoring in NetBeans[article] Developing a JavaFX Application for iOS [article]

article-image-walking-you-through-classes
Packt
02 Sep 2015
15 min read
Save for later

Walking You Through Classes

Packt
02 Sep 2015
15 min read
In this article by Narayan Prusty, author of Learning ECMAScript 6, you will learn how ES6 introduces classes that provide a much simpler and clearer syntax to creating constructors and dealing with inheritance. JavaScript never had the concept of classes, although it's an object-oriented programming language. Programmers from the other programming language background often found it difficult to understand JavaScript's object-oriented model and inheritance due to lack of classes. In this article, we will learn about the object-oriented JavaScript using the ES6 classes: Creating objects the classical way What are classes in ES6 Creating objects using classes The inheritance in classes The features of classes (For more resources related to this topic, see here.) Understanding the Object-oriented JavaScript Before we proceed with the ES6 classes, let's refresh our knowledge on the JavaScript data types, constructors, and inheritance. While learning classes, we will be comparing the syntax of the constructors and prototype-based inheritance with the syntax of the classes. Therefore, it is important to have a good grip on these topics. Creating objects There are two ways of creating an object in JavaScript, that is, using the object literal, or using a constructor. The object literal is used when we want to create fixed objects, whereas constructor is used when we want to create the objects dynamically on runtime. Let's consider a case where we may need to use the constructors instead of the object literal. Here is a code example: var student = { name: "Eden", printName: function(){ console.log(this.name); } } student.printName(); //Output "Eden" Here, we created a student object using the object literal, that is, the {} notation. This works well when you just want to create a single student object. But the problem arises when you want to create multiple student objects. Obviously, you don't want to write the previous code multiple times to create multiple student objects. This is where constructors come into use. A function acts like a constructor when invoked using the new keyword. A constructor creates and returns an object. The this keyword, inside a function, when invoked as a constructor, points to the new object instance, and once the constructor execution is finished, the new object is automatically returned. Consider this example: function Student(name) { this.name = name; } Student.prototype.printName = function(){ console.log(this.name); } var student1 = new Student("Eden"); var student2 = new Student("John"); student1.printName(); //Output "Eden" student2.printName(); //Output "John" Here, to create multiple student objects, we invoked the constructor multiple times instead of creating multiple student objects using the object literals. To add methods to the instances of the constructor, we didn't use the this keyword, instead we used the prototype property of constructor. We will learn more on why we did it this way, and what the prototype property is, in the next section. Actually, every object must belong to a constructor. Every object has an inherited property named constructor, pointing to the object's constructor. When we create objects using the object literal, the constructor property points to the global Object constructor. 
Consider this example to understand this behavior: var student = {} console.log(student.constructor == Object); //Output "true" Understanding inheritance Each JavaScript object has an internal [[prototype]] property pointing to another object called as its prototype. This prototype object has a prototype of its own, and so on until an object is reached with null as its prototype. null has no prototype, and it acts as a final link in the prototype chain. When trying to access a property of an object, and if the property is not found in the object, then the property is searched in the object's prototype. If still not found, then it's searched in the prototype of the prototype object. It keeps on going until null is encountered in the prototype chain. This is how inheritance works in JavaScript. As a JavaScript object can have only one prototype, JavaScript supports only a single inheritance. While creating objects using the object literal, we can use the special __proto__ property or the Object.setPrototypeOf() method to assign a prototype of an object. JavaScript also provides an Object.create() method, with which we can create a new object with a specified prototype as the __proto__ lacked browser support, and the Object.setPrototypeOf() method seemed a little odd. Here is code example that demonstrates different ways to set the prototype of an object while creating, using the object literal: var object1 = { name: "Eden", __proto__: {age: 24} } var object2 = {name: "Eden"} Object.setPrototypeOf(object2, {age: 24}); var object3 = Object.create({age: 24}, {name: {value: "Eden"}}); console.log(object1.name + " " + object1.age); console.log(object2.name + " " + object2.age); console.log(object3.name + " " + object3.age); The output is as follows: Eden 24 Eden 24 Eden 24 Here, the {age:24} object is referred as base object, superobject, or parent object as its being inherited. And the {name:"Eden"} object is referred as the derived object, subobject, or the child object, as it inherits another object. If you don't assign a prototype to an object while creating it using the object literal, then the prototype points to the Object.prototype property. The prototype of Object.prototype is null therefore, leading to the end of the prototype chain. Here is an example to demonstrate this: var obj = { name: "Eden" } console.log(obj.__proto__ == Object.prototype); //Output "true" While creating objects using a constructor, the prototype of the new objects always points to a property named prototype of the function object. By default, the prototype property is an object with one property named as constructor. The constructor property points to the function itself. Consider this example to understand this model: function Student() { this.name = "Eden"; } var obj = new Student(); console.log(obj.__proto__.constructor == Student); //Output "true" console.log(obj.__proto__ == Student.prototype); //Output "true" To add new methods to the instances of a constructor, we should add them to the prototype property of the constructor, as we did earlier. We shouldn't add methods using the this keyword in a constructor body, because every instance of the constructor will have a copy of the methods, and this isn't very memory efficient. By attaching methods to the prototype property of a constructor, there is only one copy of each function that all the instances share. 
To understand this, consider this example: function Student(name) { this.name = name; } Student.prototype.printName = function(){ console.log(this.name); } var s1 = new Student("Eden"); var s2 = new Student("John"); function School(name) { this.name = name; this.printName = function(){ console.log(this.name); } } var s3 = new School("ABC"); var s4 = new School("XYZ"); console.log(s1.printName == s2.printName); console.log(s3.printName == s4.printName); The output is as follows: true false Here, s1 and s2 share the same printName function that reduces the use of memory, whereas s3 and s4 contain two different functions with the name as printName that makes the program use more memory. This is unnecessary, as both the functions do the same thing. Therefore, we add methods for the instances to the prototype property of the constructor. Implementing the inheritance hierarchy in the constructors is not as straightforward as we did for object literals. Because the child constructor needs to invoke the parent constructor for the parent constructor's initialization logic to take place and we need to add the methods of the prototype property of the parent constructor to the prototype property of the child constructor, so that we can use them with the objects of child constructor. There is no predefined way to do all this. The developers and JavaScript libraries have their own ways of doing this. I will show you the most common way of doing it. Here is an example to demonstrate how to implement the inheritance while creating the objects using the constructors: function School(schoolName) { this.schoolName = schoolName; } School.prototype.printSchoolName = function(){ console.log(this.schoolName); } function Student(studentName, schoolName) { this.studentName = studentName; School.call(this, schoolName); } Student.prototype = new School(); Student.prototype.printStudentName = function(){ console.log(this.studentName); } var s = new Student("Eden", "ABC School"); s.printStudentName(); s.printSchoolName(); The output is as follows: Eden ABC School Here, we invoked the parent constructor using the call method of the function object. To inherit the methods, we created an instance of the parent constructor, and assigned it to the child constructor's prototype property. This is not a foolproof way of implementing inheritance in the constructors, as there are lots of potential problems. For example—in case the parent constructor does something else other than just initializing properties, such as DOM manipulation, then while assigning a new instance of the parent constructor, to the prototype property, of the child constructor, can cause problems. Therefore, the ES6 classes provide a better and easier way to inherit the existing constructors and classes. Using classes We saw that JavaScript's object-oriented model is based on the constructors and prototype-based inheritance. Well, the ES6 classes are just new a syntax for the existing model. Classes do not introduce a new object-oriented model to JavaScript. The ES6 classes aim to provide a much simpler and clearer syntax for dealing with the constructors and inheritance. In fact, classes are functions. Classes are just a new syntax for creating functions that are used as constructors. Creating functions using the classes that aren't used as constructors doesn't make any sense, and offer no benefits. Rather, it makes your code difficult to read, as it becomes confusing. Therefore, use classes only if you want to use it for constructing objects. 
Let's have a look at classes in detail. Defining a class Just as there are two ways of defining functions, function declaration and function expression, there are two ways to define a class: using the class declaration and the class expression. The class declaration For defining a class using the class declaration, you need to use the class keyword, and a name for the class. Here is a code example to demonstrate how to define a class using the class declaration: class Student { constructor(name) { this.name = name; } } var s1 = new Student("Eden"); console.log(s1.name); //Output "Eden" Here, we created a class named Student. Then, we defined a constructor method in it. Finally, we created a new instance of the class—an object, and logged the name property of the object. The body of a class is in the curly brackets, that is, {}. This is where we need to define methods. Methods are defined without the function keyword, and a comma is not used in between the methods. Classes are treated as functions, and internally the class name is treated as the function name, and the body of the constructor method is treated as the body of the function. There can only be one constructor method in a class. Defining more than one constructor will throw the SyntaxError exception. All the code inside a class body is executed in the strict mode, by default. The previous code is the same as this code when written using function: function Student(name) { this.name = name; } var s1 = new Student("Eden"); console.log(s1.name); //Output "Eden" To prove that a class is a function, consider this code: class Student { constructor(name) { this.name = name; } } function School(name) { this.name = name; } console.log(typeof Student); console.log(typeof School == typeof Student); The output is as follows: function true Here, we can see that a class is a function. It's just a new syntax for creating a function. The class expression A class expression has a similar syntax to a class declaration. However, with class expressions, you are able to omit the class name. Class body and behavior remains the same in both the ways. Here is a code example to demonstrate how to define a class using a class expression: var Student = class { constructor(name) { this.name = name; } } var s1 = new Student("Eden"); console.log(s1.name); //Output "Eden" Here, we stored a reference of the class in a variable, and used it to construct the objects. The previous code is the same as this code when written using function: var Student = function(name) { this.name = name; } var s1 = new Student("Eden"); console.log(s1.name); //Output "Eden" The prototype methods All the methods in the body of the class are added to the prototype property of the class. The prototype property is the prototype of the objects created using class. Here is an example that shows how to add methods to the prototype property of a class: class Person { constructor(name, age) { this.name = name; this.age = age; } printProfile() { console.log("Name is: " + this.name + " and Age is: " + this.age); } } var p = new Person("Eden", 12) p.printProfile(); console.log("printProfile" in p.__proto__); console.log("printProfile" in Person.prototype); The output is as follows: Name is: Eden and Age is: 12 true true Here, we can see that the printProfile method was added to the prototype property of the class. 
The previous code is the same as this code when written using function: function Person(name, age) { this.name = name; this.age = age; } Person.prototype.printProfile = function() { console.log("Name is: " + this.name + " and Age is: " + this.age); } var p = new Person("Eden", 12) p.printProfile(); console.log("printProfile" in p.__proto__); console.log("printProfile" in Person.prototype); The output is as follows: Name is: Eden and Age is: 12 true true The get and set methods In ES5, to add accessor properties to the objects, we had to use the Object.defineProperty() method. ES6 introduced the get and set prefixes for methods. These methods can be added to the object literals and classes for defining the get and set attributes of the accessor properties. When get and set methods are used in a class body, they are added to the prototype property of the class. Here is an example to demonstrate how to define the get and set methods in a class: class Person { constructor(name) { this._name_ = name; } get name(){ return this._name_; } set name(name){ this._name_ = name; } } var p = new Person("Eden"); console.log(p.name); p.name = "John"; console.log(p.name); console.log("name" in p.__proto__); console.log("name" in Person.prototype); console.log(Object.getOwnPropertyDescriptor(p.__proto__, "name").set); console.log(Object.getOwnPropertyDescriptor(Person.prototype, "name").get); console.log(Object.getOwnPropertyDescriptor(p, "_name_").value); The output is as follows: Eden John true true function name(name) { this._name_ = name; } function name() { return this._name_; } John Here, we created an accessor property to encapsulate the _name_ property. We also logged some other information to prove that name is an accessor property, which is added to the prototype property of the class. The generator method To treat a concise method of an object literal as the generator method, or to treat a method of a class as the generator method, we can simply prefix it with the * character. The generator method of a class is added to the prototype property of the class. Here is an example to demonstrate how to define a generator method in class: class myClass { * generator_function() { yield 1; yield 2; yield 3; yield 4; yield 5; } } var obj = new myClass(); let generator = obj.generator_function(); console.log(generator.next().value); console.log(generator.next().value); console.log(generator.next().value); console.log(generator.next().value); console.log(generator.next().value); console.log(generator.next().done); console.log("generator_function" in myClass.prototype); The output is as follows: 1 2 3 4 5 true true Implementing inheritance in classes Earlier in this article, we saw how difficult it was to implement inheritance hierarchy in functions. Therefore, ES6 aims to make it easy by introducing the extends clause, and the super keyword for classes. By using the extends clause, a class can inherit static and non-static properties of another constructor (which may or may not be defined using a class). 
The super keyword is used in two ways: It's used in a class constructor method to call the parent constructor When used inside methods of a class, it references the static and non-static methods of the parent constructor Here is an example to demonstrate how to implement the inheritance hierarchy in the constructors using the extends clause, and the super keyword: function A(a) { this.a = a; } A.prototype.printA = function(){ console.log(this.a); } class B extends A { constructor(a, b) { super(a); this.b = b; } printB() { console.log(this.b); } static sayHello() { console.log("Hello"); } } class C extends B { constructor(a, b, c) { super(a, b); this.c = c; } printC() { console.log(this.c); } printAll() { this.printC(); super.printB(); super.printA(); } } var obj = new C(1, 2, 3); obj.printAll(); C.sayHello(); The output is as follows: 3 2 1 Hello Here, A is a function constructor; B is a class that inherits A; C is a class that inherits B; and as B inherits A, therefore C also inherits A. As a class can inherit a function constructor, we can also inherit the prebuilt function constructors, such as String and Array, and also the custom function constructors using the classes instead of other hacky ways that we used to use. The previous example also shows how and where to use the super keyword. Remember that inside the constructor method, you need to use super before using the this keyword. Otherwise, an exception is thrown. If a child class doesn't have a constructor method, then the default behavior will invoke the constructor method of the parent class. Summary In this article, we have learned about the basics of the object-oriented programming using ES5. Then, we jumped into ES6 classes, and learned how it makes easy for us to read and write the object-oriented JavaScript code. We also learned miscellaneous features, such as the accessor methods. Resources for Article: Further resources on this subject: An Introduction to Mastering JavaScript Promises and Its Implementation in Angular.js[article] Finding Peace in REST[article] Scaling influencers [article]

article-image-code-real-world
Packt
02 Sep 2015
22 min read
Save for later

From Code to the Real World

Packt
02 Sep 2015
22 min read
In this article by Jorge R. Castro, author of Building a Home Security System with Arduino, you will learn to read a technical specification sheet, understand the ports on the Uno and how they work, and identify the elements necessary for a proof of concept. Finally, we will extend the capabilities of our script in Python and rely on NFC technology. We will cover the following points:
ProtoBoards and wiring
Signals (digital and analog)
Ports
Datasheets
(For more resources related to this topic, see here.)
ProtoBoards and wiring
There are many ways of working with and testing electrical components; one of the ways developers use is ProtoBoards (also known as breadboards). Breadboards eliminate the need for soldering while testing components and prototyping circuit schematics; they let us reuse components, enable rapid development, and allow us to correct mistakes and even improve the circuit design. A breadboard is a rectangular case composed of internally interconnected rows and columns, and it is possible to couple several together. The holes on the face of the board are designed so that you can insert the pins of components to mount them. ProtoBoards may be the intermediate step before making our final PCB design, helping eliminate errors and reduce the associated cost. For more information visit http://en.wikipedia.org/wiki/Breadboard.
We now need to know more about wrong wiring techniques when using a breadboard. There are rules that must always be respected; if they are not followed, elements might short out, which can irreversibly damage our circuit.
A ProtoBoard and an Arduino Uno
The ProtoBoard is shown in the preceding image; we can see there are mainly two areas: the vertical columns (lines that begin with the + and - polarity signs), and the central rows (with a coordinate system of letters and numbers). The + and - columns usually connect to the electrical supply of our circuit and to the ground. It is advised not to connect components directly in the vertical columns, but instead to use a wire to connect them to the central rows of the breadboard. Each central row corresponds to a unique track. If we connect one lead of a resistor to the first pin of the first row, then the other lead should be connected to a pin in another row, not in the same row. There is an imaginary axis that divides the breadboard vertically into symmetrical halves, which implies that they are electrically independent. We will use this feature to mount certain Integrated Circuits (ICs) astride the division.
Protoboard image obtained using Fritzing.
Carefully note that in the preceding picture, the top half of the breadboard shows the components mounted incorrectly, while the lower half shows the components mounted the correct way. A chip is set correctly between the two areas; this usage is common for ICs. Fritzing is a tool that helps us create a quick circuit design of our project. Visit http://fritzing.org/download/ for more details on Fritzing.
Analog and digital ports
Now that we know about the correct usage of a breadboard, we come to another very important concept: ports. Our board is composed of various inputs and outputs, the number of which varies with the model we have. But what are ports? The Arduino UNO board has ports on its sides. They are connections that allow it to interact with sensors, actuators, and other devices (even with another Arduino board). The board has ports that support digital and analog signals.
The advantage is that these ports are bidirectional, and programmers are able to define their behavior. In the following code, the first part shows how we set the status of the ports that we are going to use. This is the setup function:
//CODE
void setup(){
  // Will run once, and we use it to
  // "prepare" the I/O pins of the Arduino
  pinMode(10, OUTPUT); // set pin 10 as an output
  pinMode(11, INPUT);  // set pin 11 as an input
}
Let's take a look at what the digital and analog ports are on the Arduino UNO.
Analog ports
Analog signals are signals that vary continuously with time. They can be understood as voltages that take continuously varying intermediate values. For example, a voltage may move from 2V to 3.3V, passing through 3.0V or 3.333V; the value varies progressively with time, and the number of intermediate values between any two points is infinite (in theory). This is an interesting property, and it is exactly what we want in some cases. For example, if we want to measure the temperature of a room, the measurement takes values with decimals, so we need an analog signal. Otherwise we would lose information; since we cannot store infinitely many decimal places, we perform a mathematical rounding (truncating the number). There is a process called discretization, and it is used to convert analog signals to digital ones. In the world of microcontrollers, the difference between two values is not actually infinite. The Arduino UNO's ports use a range of values between 0 and 1023 to describe an analog input signal. Certain ports, marked as PWM or with a ~, can create output signals that vary between 0 and 255.
Digital ports
A digital signal consists of just two values: 0s and 1s. Internally, many electronic devices have an established voltage range, where values from 3.5V to 5V are considered a logic 1, and values from 0V to 2.5V are considered a logic 0. To better understand this point, let's see the example of a button that triggers an alarm. The only useful cases for an alarm are when the button is pressed to ring the alarm and when it is not pressed; only two states are observed. So unlike the case of the temperature sensor, where we can have a range of linear values, here only two values exist. Logically, we use these properties for different purposes. The Arduino IDE has some very illustrative examples. You can load them onto your board to study the concepts of analog and digital signals by navigating to File | Examples | Basic | Blink.
Sensors
There is a wide range of sensors, and without pretending to be an exhaustive guide to all possible sensors, we will try to establish some small differences. Every physical property, or combination of properties, has an associated sensor, so we can measure force (the presence of people walking on the floor), movement (physical displacement even in the absence of light), smoke, gas, pictures (yes, cameras are also sensors), noise (sound recording), antennas (radio waves, WiFi, and NFC), and an incredible list that would need a whole book to explain all its fantastic properties. It is also interesting to note that sensors can be more or less specific: some measure a single, very precise property, while others perceive a broader set of values with varying accuracy. If you go to an electronics store or a sales portal to buy a humidity sensor (and generally any other electronic item), you will see a wide range of prices.
In general, expensive sensors indicate that they are more reliable than their cheaper counterparts, the price also indicates the kinds of conditions that it can work in, and also the duration of operation. Expensive sensors can last more than the cheaper variants. When we started looking for a component, we looked to the datasheets of a particular component (the next point will explain what this document is), you will not be surprised to find two very similar models mentioned in the datasheets. It contains operating characteristics and different prices. Some components are professionally deployed (for instance in the technological and military sectors), while others are designed for general consumer electronics. Here, we will focus on those with average quality and foremost an economic price. We proceed with an example that will serve as a liaison with the following paragraph. To do this, we will use a temperature sensor (TMP-36) and an analog port on the Arduino UNO. The following image shows the circuit schematic:   Our design - designed using Fritzing The following is the code for the preceding circuit schematic: //CODE //########################### //Author : Jorge Reyes Castro //Arduino Home Security Book //########################### // A3 <- Input // Sensor TMP36 //########################### //Global Variable int outPut = 0; float outPutToVol = 0.0; float volToTemp = 0.0; void setup(){ Serial.begin(9600); // Start the SerialPort 9600 bauds } void loop(){ outPut = analogRead(A3); // Take the value from the A3 incoming port outPutToVol = 5.0 *(outPut/1024.0); // This is calculated with the values volToTemp = 100.0 *(outPutToVol - 0.5); // obtained from the datasheet Serial.print("_____________________n"); mprint("Output of the sensor ->", outPut); mprint("Conversion to voltage ->", outPutToVol); mprint("Conversion to temperature ->", volToTemp); delay(1000); // Wait 1 Sec. } // Create our print function // smarter than repeat the code and it will be reusable void mprint(char* text, double value){ // receives a pointer to the text and a numeric value of type double Serial.print(text); Serial.print("t"); // tabulation Serial.print(value); Serial.print("n"); // new line } Now, open a serial port from the IDE or just execute the previous Python script. We can see the output from the Arduino. Enter the following command to run the code as python script: $ python SerialPython.py The output will look like this: ____________________ Output of the sensor 145.00 Conversion to voltage 0.71 Conversion to temperature 20.80 _____________________ By using Python script, which is not simple, we managed to extract some data from a sensor after the signal is processed by the Arduino UNO. It is then extracted and read by the serial interface (script in Python) from the Arduino UNO. At this point, we will be able to do what we want with the retrieved data, we can store them in a database, represent them in a function, create a HTML document, and more. As you may have noticed we make a mathematical calculation. However, from where do we get this data? The answer is in the data sheet. Component datasheets Whenever we need to work with or handle an electrical component, we have to study their properties. For this, there are official documents, of variable length, in which the manufacturer carefully describes the characteristics of that component. First of all, we, have to identify that a component can either use the unique ID or model name that it was given when it was manufactured. 
TMP 36GZ We have very carefully studied the component surface and thus extracted the model. Then, using an Internet browser, we can quickly find the datasheet. Go to http://www.google.com and search for: TMP36GZ + datasheet, or visit http://dlnmh9ip6v2uc.cloudfront.net/datasheets/Sensors/Temp/TMP35_36_37.pdf to get the datasheet. Once you have the datasheet, you can see that it has various components which look similar, but with very different presentations (this can be crucial for a project, as an error in the choice of encapsulated (coating/presentation) can lead to a bitter surprise). Just as an example, if we confuse the presentation of our circuit, we might end up buying components used in cellphones, and those can hardly be soldered by a standard user as they smaller than your pinkie fingernail). Therefore, an important property to which you must pay attention is the coating/presentation of the components. In the initial pages of the datasheet you see that it shows you the physical aspects of the component. It then show you the technical characteristics, such as the working current, the output voltage, and other characteristics which we must study carefully in order to know whether they will be consistent with our setup. In this case, we are working with a range of input voltage (V) of between 2.7 V to 5.5 V is perfect. Our Arduino has an output of 5V. Furthermore, the temperature range seems appropriate. We do not wish to measure anything below 0V and 5V. Let's study the optimal behavior of the components in temperature ranges in more detail. The behavior of the components is stable only between the desired ranges. This is a very important aspect because although the measurement can be done at the ends supported by the sensor, the error will be much higher and enormously increases the deviation to small variations in ambient temperature. Therefore, always choose or build a circuit with components having the best performance as shown in the optimal region of the graph of the datasheet. Also, always respect and observe that the input voltages do not exceed the specified limit, otherwise it will be irreversibly damaged. To know more about ADC visit http://en.wikipedia.org/wiki/Analog-to-digital_converter. Once we have the temperature sensor, we have to extract the data you need. Therefore, we make a simple calculation supported in the datasheet. As we can see, we feed 5V to the sensor, and it return values between 0 and 1023. So, we use a simple formula that allows us to convert the values obtained in voltage (analog value): Voltage = (Value from the sensor) x (5/1024) The value 1024 is correct, as we have said that the data can range from 0 (including zero as a value) to 1023. This data can be strange to some, but in the world of computer and electronics, zero is seen as a very important value. Therefore, be careful when making calculations. Once we obtain a value in volts, we proceed to convert this measurement in a data in degrees. For this, by using the formula from the datasheet, we can perform the conversion quickly. We use a variable that can store data with decimal point (double or float), otherwise we will be truncating the result and losing valuable information (this may seem strange, but this bug very common). The formula for conversion is Cº = (Voltage -0.5) x 100.0. We now have all the data we need and there is a better implementation of this sensor, in which we can eliminate noise and disturbance from the data. You can dive deeper into these issues if desired. 
With the preceding explanation, it will not be difficult to achieve.

Near Field Communication

Near Field Communication (NFC) technology is based on RFID technology. Basically, the principle consists of two devices fused onto one board: one acts as an antenna to transmit signals and the other acts as a reader/receiver. They exchange information using electromagnetic fields and deal with only small pieces of information, but that is enough to make this extraordinary property seem almost magical. For more information on RFID and NFC, visit the following links:

http://en.wikipedia.org/wiki/Near_field_communication
http://en.wikipedia.org/wiki/Radio-frequency_identification

Today, we find this technology in identification and access cards (a pioneering example is the national ID document used in countries like Spain), tickets for public transport, credit cards, and even mobile phones (with the ability to make payments via NFC). For this project, we will use a popular module, the PN532 Adafruit RFID/NFC board. Feel free to use any module that suits your needs; I selected this one for its value, its price, and, above all, its chip. The popular PN532 is widely supported by the community, and there are a vast number of publications, tools, and libraries that allow us to integrate it into many projects. For instance, you can use it in conjunction with an Arduino, a Raspberry Pi, or directly with your computer (via a USB-to-serial adapter). Also, the PN532 board comes with a free Mifare 1K card, which we can use for our project. You can also buy tag cards or even key rings that house a small circuit inside, on which information is stored, even encrypted information. One of the benefits, besides the low price of a set of tags, is that we can block access to certain areas of the information or even to all of it. This allows us to keep our valuable data safe and to avoid the cloning or duplication of the security tag. There are a number of standards that classify the different types of existing cards, one of which covers the Mifare 1K card (you can use the example presented here and make small changes to adapt it to other NFC cards). For more information on MIFARE cards, visit http://en.wikipedia.org/wiki/MIFARE. As the name suggests, this model can store up to 1 KB (kilobyte) of information, and it is of the EEPROM type (it can be rewritten). Furthermore, it possesses a serial number unique to each card (this is very interesting because it is a perfect fingerprint that will allow us to distinguish between two cards holding the same information). Internally, it is divided into 16 sectors (0-15), each subdivided into 4 blocks (0-3) of 16 bytes of information. For each block, we can set a password that prevents its content from being read if the password is not presented. (At this point, we are not going to manipulate the keys, because a user could accidentally encrypt a card and leave it unusable.) By default, all cards usually come with a default password (ffffffffffff). The reader should consult the link http://www.adafruit.com/product/789 to continue with the examples, or, if you select another board, search the Internet for one with the same features. The link also has tutorials for making the board work with other devices (such as the Raspberry Pi), the datasheet, and the library that can be downloaded from GitHub to use this module. We will look at this last point more closely later; just download the libraries for now and follow my steps. It is one of the best advantages of open hardware that the designs can be changed and improved by anyone.
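Coming back to the card layout described above, here is a quick back-of-the-envelope check in plain Python (the helper names are ours, not part of any NFC library): 16 sectors of 4 blocks of 16 bytes each give exactly the 1 KB of storage mentioned, and commands on Mifare Classic cards typically address blocks by an absolute number from 0 to 63.

SECTORS = 16           # sectors 0-15
BLOCKS_PER_SECTOR = 4  # blocks 0-3 in each sector
BYTES_PER_BLOCK = 16

def capacity_bytes():
    # 16 sectors x 4 blocks x 16 bytes = 1024 bytes = 1 KB
    return SECTORS * BLOCKS_PER_SECTOR * BYTES_PER_BLOCK

def absolute_block(sector, block):
    # translate a (sector, block) pair into an absolute block number 0-63
    return sector * BLOCKS_PER_SECTOR + block

print(capacity_bytes())      # 1024
print(absolute_block(1, 2))  # 6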
Remember, when handling electrical components, we have to be careful and avoid electrostatic sources. As you have already seen, the module is inserted directly on the Arduino. If the soldered pins do not stack directly on top, you will have to use Dupont male-female connectors so that the other, unused ports remain accessible. Once we have mounted the module, we will install the library used to control this device (see the link above). It is very important that you rename the downloaded library to eliminate blank spaces and the like; otherwise, the import process may throw an error. Once we have this ready and have our Mifare 1K card at hand, let's look at a small example that will help us to better understand all this technology and get the UID (Unique Identifier) of our tag:

// ###########################
// Author : Jorge Reyes Castro
// Arduino Home Security Book
// ###########################
#include <Wire.h>
#include <Adafruit_NFCShield_I2C.h> // This is the Adafruit library

//MACROS
#define IRQ (2)
#define RESET (3)

Adafruit_NFCShield_I2C nfc(IRQ, RESET); //Prepare NFC MODULE

//SETUP
void setup(void){
  Serial.begin(115200); //Open the serial port
  Serial.print("###### Serial Port Ready ######\n");
  nfc.begin();     //Start NFC
  nfc.SAMConfig();
  if(!Serial){     //If the serial port doesn't work, wait
    delay(500);
  }
}

//LOOP
void loop(void) {
  uint8_t success;
  uint8_t uid[] = { 0, 0, 0, 0 }; //Buffer to save the read value
  //uint8_t ok[] = { X, X, X, X }; //Put your serial number here. * 1
  uint8_t uidLength;
  success = nfc.readPassiveTargetID(PN532_MIFARE_ISO14443A, uid, &uidLength); // READ
  // If a TAG is PRESENT
  if (success) {
    Serial.println("Found card\n");
    Serial.print(" UID Value: \t");
    nfc.PrintHex(uid, uidLength); //Print the bytes in hex
    int n = 0;
    Serial.print("Your Serial Number: \t"); // This loop shows the decimal values,
    // running over the 4 positions and extracting the data
    while (n < 4){
      // Copy them and remember to use them in the next example * 1
      Serial.print(uid[n]);
      Serial.print("\t");
      n++;
    }
    Serial.println("");
  }
  delay(1500); //wait 1.5 sec
}

Now, open a serial port from the IDE or just execute the previous Python script to see the output from the Arduino. Enter the following command to run the Python code:

$ python SerialPython.py

The output will look like this:

###### Serial Port Ready ######
Found card
 UID Value:            0xED 0x05 0xED 0x9A
Your Serial Number:    237   5   237   154

So we have our own identification number, but what are those letters that appear above our number? Well, that is hexadecimal, another widely used way to represent numbers and information in computing. It can express, with fewer characters, larger values than the decimal notation we usually use. (Remember that our card has very little space; hexadecimal helps us save memory.) Take the number 15 as an example: we need two characters in decimal, while in hexadecimal just one character, f, is needed (written 0xf; hexadecimal values are preceded by 0x, which is worth remembering as it will be of great help later on). Open a console and start the Python interpreter. Enter the following hexadecimal numbers and you will see their transformation to decimal (you can replace them with the values obtained previously from the code):

For more information about numbering systems, see http://en.wikipedia.org/wiki/Hexadecimal.

$ python
>>> 0xED
237
>>> 0x05
5
>>> 0x9A
154

You can see that they are the same numbers.

RFID/NFC module and tag

Now that we are familiar with the use of this technology, we proceed to increase the difficulty and create our little access control.
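The same conversion can be scripted rather than typed interactively. The following short Python sketch (our own helper, not part of any NFC library) turns the hexadecimal UID bytes shown above into their decimal form and back, which is handy when copying the serial number into the access-control example that follows:

uid_hex = ["0xED", "0x05", "0xED", "0x9A"]  # as printed by PrintHex

# Hex string -> decimal integer
uid_dec = [int(h, 16) for h in uid_hex]
print(uid_dec)  # [237, 5, 237, 154]

# Decimal integer -> hex string, zero-padded to two digits
print(["0x{:02X}".format(n) for n in uid_dec])  # ['0xED', '0x05', '0xED', '0x9A']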
Be sure to record the serial number of your access card (in decimal). We will make this little assembly focus on the use of NFC cards without an authentication key. As mentioned earlier, there is a variant of this exercise, and I suggest that the reader complete it on their own, thus increasing the robustness of our assembly. In addition, we add an LED and a buzzer to help us improve the usability.

Access control

The objective is simple: we just have to let in people who have a specific card with a specific serial number. If you want to use encrypted keys, you can replace the ID check with an encrypted message stored inside the card, protected with a key known only to the creator of the card. When a successful identification occurs, a green flash reports that someone has been granted authorized access to the system. In addition, the user is notified of the access through a pleasant sound that, for example, invites them to push the door open. Otherwise, an alarm sounds repeatedly, alerting us to the offence and triggering an alert at the command center (a repeated, insistent red flash that captures the attention of the watchman). Feel free to add more elements. The diagram is as follows:

Our scheme - Image obtained using Fritzing

The following is a much clearer representation of the preceding wiring diagram, as it omits the antenna part for easy reading:

Our design - Image obtained using Fritzing

Once the design is clear, you can begin to connect all the elements and then create the code for the project. The following is the code for the NFC access control:

//###########################
//Author : Jorge Reyes Castro
//Arduino Home Security Book
//###########################
#include <Wire.h>
#include <Adafruit_NFCShield_I2C.h> // this is the Adafruit library

//MACROS
#define IRQ (2)
#define RESET (3)

Adafruit_NFCShield_I2C nfc(IRQ, RESET); //Prepare NFC MODULE

void setup(void) {
  pinMode(5, OUTPUT);  // PIEZO
  pinMode(9, OUTPUT);  // LED RED
  pinMode(10, OUTPUT); // LED GREEN
  okHAL();
  Serial.begin(115200); //Open the serial port
  Serial.print("###### Serial Port Ready ######\n");
  nfc.begin();     //Start NFC
  nfc.SAMConfig();
  if(!Serial){     // If the serial port doesn't work, wait
    delay(500);
  }
}

void loop(void) {
  uint8_t success;
  uint8_t uid[] = { 0, 0, 0, 0 };      //Buffer to store the UID of the read tag
  uint8_t ok[] = { 237, 5, 237, 154 }; //Put your serial number here.
  uint8_t uidLength;
  success = nfc.readPassiveTargetID(PN532_MIFARE_ISO14443A, uid, &uidLength); //READ
  if (success) {
    okHAL(); // Sound "OK"
    Serial.println("Found card\n");
    Serial.print(" UID Value: \t");
    nfc.PrintHex(uid, uidLength); // The value of the UID in HEX
    int n = 0;
    Serial.print("Your Serial Number: \t"); // The value of the UID in DEC
    while (n < 4){
      Serial.print(uid[n]);
      Serial.print("\t");
      n++;
    }
    Serial.println("");
    Serial.println(" Wait..\n");
    // Verification: compare the elements one by one with the UID we stored previously
    int m = 0;
    boolean match = true;
    while (m < 4){
      if(uid[m] != ok[m]){
        match = false; // a single mismatch is enough to reject the card
        break;
      }
      m++;
    }
    if (match){
      Serial.println("###### Authorized ######\n"); // ALL OK
      authOk();
      okHAL();
    } else {
      Serial.println("###### Unauthorized ######\n"); // NOT EQUAL, ALARM !!!
      authError();
      errorHAL();
    }
  }
  delay(1500);
}

// Create a function that allows us to quickly create sounds
// alarm("time to wait" in ms, "tone/power")
// wait is an unsigned int so values above 255, such as 300, are not truncated
void alarm(unsigned int wait, unsigned char power){
  analogWrite(5, power);
  delay(wait);
  analogWrite(5, 0);
}

// HAL OK
void okHAL(){
  alarm(200, 250);
  alarm(100, 50);
  alarm(300, 250);
}

// HAL ERROR
void errorHAL(){
  int n = 0;
  while(n < 3){
    alarm(200, 50);
    alarm(100, 250);
    alarm(300, 50);
    n++;
  }
}

// These functions light the LEDs when called.
// (Look at the code above where they are used)
// RED - FAST
void authError(){
  int n = 0;
  while(n < 20){
    digitalWrite(9, HIGH);
    delay(500);
    digitalWrite(9, LOW);
    delay(500);
    n++;
  }
}

// GREEN - SLOW
void authOk(){
  int n = 0;
  while(n < 5){
    digitalWrite(10, HIGH);
    delay(2000);
    digitalWrite(10, LOW);
    delay(500);
    n++;
  }
}

// This code could be reduced to increase efficiency and speed of execution,
// but it has been written for didactic purposes, favoring readability

Once you have copied the code, compile it and upload it to your board. You will notice that our creation now has a voice whenever we start communicating or when someone tries to authenticate. In addition, we have added two simple LEDs that give us a lot of information about our design. Finally, we have our whole system; perhaps now it is time to improve our Python script and experiment with the serial port settings, creating a red warning window or an alarm sound on the computer that is receiving the data. It is now the reader's turn to assimilate all the concepts seen here and modify them to delve deeper and deeper into the wonderful world of programming.

Summary

This has been an intense article, full of new elements. We learned to handle technical documentation fluently, covered the different types of signals and their main differences, found the perfect component for our needs, and finally applied it to a real project, without forgetting NFC. NFC has only just been introduced here, laying the foundation for the reader to modify it and study it more deeply in the future.

Resources for Article: Further resources on this subject: Getting Started with Arduino[article] Arduino Development[article] The Arduino Mobile Robot [article]
Big Data

Packt
02 Sep 2015
24 min read
 In this article by Henry Garner, author of the book Clojure for Data Science, we'll be working with a relatively modest dataset of only 100,000 records. This isn't big data (at 100 MB, it will fit comfortably in the memory of one machine), but it's large enough to demonstrate the common techniques of large-scale data processing. Using Hadoop (the popular framework for distributed computation) as its case study, this article will focus on how to scale algorithms to very large volumes of data through parallelism. Before we get to Hadoop and distributed data processing though, we'll see how some of the same principles that enable Hadoop to be effective at a very large scale can also be applied to data processing on a single machine, by taking advantage of the parallel capacity available in all modern computers. (For more resources related to this topic, see here.) The reducers library The count operation we implemented previously is a sequential algorithm. Each line is processed one at a time until the sequence is exhausted. But there is nothing about the operation that demands that it must be done in this way. We could split the number of lines into two sequences (ideally of roughly equal length) and reduce over each sequence independently. When we're done, we would just add together the total number of lines from each sequence to get the total number of lines in the file: If each Reduce ran on its own processing unit, then the two count operations would run in parallel. All the other things being equal, the algorithm would run twice as fast. This is one of the aims of the clojure.core.reducers library—to bring the benefit of parallelism to algorithms implemented on a single machine by taking advantage of multiple cores. Parallel folds with reducers The parallel implementation of reduce implemented by the reducers library is called fold. To make use of a fold, we have to supply a combiner function that will take the results of our reduced sequences (the partial row counts) and return the final result. Since our row counts are numbers, the combiner function is simply +. Reducers are a part of Clojure's standard library, they do not need to be added as an external dependency. The adjusted example, using clojure.core.reducers as r, looks like this: (defn ex-5-5 [] (->> (io/reader "data/soi.csv") (line-seq) (r/fold + (fn [i x] (inc i))))) The combiner function, +, has been included as the first argument to fold and our unchanged reduce function is supplied as the second argument. We no longer need to pass the initial value of zero—fold will get the initial value by calling the combiner function with no arguments. Our preceding example works because +, called with no arguments, already returns zero: (defn ex-5-6 [] (+)) ;; 0 To participate in folding then, it's important that the combiner function have two implementations: one with zero arguments that returns the identity value and another with two arguments that combines the arguments. Different folds will, of course, require different combiner functions and identity values. For example, the identity value for multiplication is 1. We can visualize the process of seeding the computation with an identity value, iteratively reducing over the sequence of xs and combining the reductions into an output value as a tree: There may be more than two reductions to combine, of course. The default implementation of fold will split the input collection into chunks of 512 elements. 
Our 166,000-element sequence will therefore generate 325 reductions to be combined. We're going to run out of page real estate quite quickly with a tree representation diagram, so let's visualize the process more schematically instead—as a two-step reduce and combine process. The first step performs a parallel reduce across all the chunks in the collection. The second step performs a serial reduce over the intermediate results to arrive at the final result: The preceding representation shows reduce over several sequences of xs, represented here as circles, into a series of outputs, represented here as squares. The squares are combined serially to produce the final result, represented by a star. Loading large files with iota Calling fold on a lazy sequence requires Clojure to realize the sequence into memory and then chunk the sequence into groups for parallel execution. For situations where the calculation performed on each row is small, the overhead involved in coordination outweighs the benefit of parallelism. We can improve the situation slightly by using a library called iota (https://github.com/thebusby/iota). The iota library loads files directly into the data structures suitable for folding over with reducers that can handle files larger than available memory by making use of memory-mapped files. With iota in the place of our line-seq function, our line count simply becomes: (defn ex-5-7 [] (->> (iota/seq "data/soi.csv") (r/fold + (fn [i x] (inc i))))) So far, we've just been working with the sequences of unformatted lines, but if we're going to do anything more than counting the rows, we'll want to parse them into a more useful data structure. This is another area in which Clojure's reducers can help make our code more efficient. Creating a reducers processing pipeline We already know that the file is comma-separated, so let's first create a function to turn each row into a vector of fields. All fields except the first two contain numeric data, so let's parse them into doubles while we're at it: (defn parse-double [x] (Double/parseDouble x)) (defn parse-line [line] (let [[text-fields double-fields] (->> (str/split line #",") (split-at 2))] (concat text-fields (map parse-double double-fields)))) We're using the reducers version of map to apply our parse-line function to each of the lines from the file in turn: (defn ex-5-8 [] (->> (iota/seq "data/soi.csv") (r/drop 1) (r/map parse-line) (r/take 1) (into []))) ;; [("01" "AL" 0.0 1.0 889920.0 490850.0 ...)] The final into function call converts the reducers' internal representation (a reducible collection) into a Clojure vector. The previous example should return a sequence of 77 fields, representing the first row of the file after the header. We're just dropping the column names at the moment, but it would be great if we could make use of these to return a map representation of each record, associating the column name with the field value. The keys of the map would be the column headings and the values would be the parsed fields. 
The clojure.core function zipmap will create a map out of two sequences—one for the keys and one for the values: (defn parse-columns [line] (->> (str/split line #",") (map keyword))) (defn ex-5-9 [] (let [data (iota/seq "data/soi.csv") column-names (parse-columns (first data))] (->> (r/drop 1 data) (r/map parse-line) (r/map (fn [fields] (zipmap column-names fields))) (r/take 1) (into [])))) This function returns a map representation of each row, a much more user-friendly data structure: [{:N2 1505430.0, :A19300 181519.0, :MARS4 256900.0 ...}] A great thing about Clojure's reducers is that in the preceding computation, calls to r/map, r/drop and r/take are composed into a reduction that will be performed in a single pass over the data. This becomes particularly valuable as the number of operations increases. Let's assume that we'd like to filter out zero ZIP codes. We could extend the reducers pipeline like this: (defn ex-5-10 [] (let [data (iota/seq "data/soi.csv") column-names (parse-columns (first data))] (->> (r/drop 1 data) (r/map parse-line) (r/map (fn [fields] (zipmap column-names fields))) (r/remove (fn [record] (zero? (:zipcode record)))) (r/take 1) (into [])))) The r/remove step is now also being run together with the r/map, r/drop and r/take calls. As the size of the data increases, it becomes increasingly important to avoid making multiple iterations over the data unnecessarily. Using Clojure's reducers ensures that our calculations are compiled into a single pass. Curried reductions with reducers To make the process clearer, we can create a curried version of each of our previous steps. To parse the lines, create a record from the fields and filter zero ZIP codes. The curried version of the function is a reduction waiting for a collection: (def line-formatter (r/map parse-line)) (defn record-formatter [column-names] (r/map (fn [fields] (zipmap column-names fields)))) (def remove-zero-zip (r/remove (fn [record] (zero? (:zipcode record))))) In each case, we're calling one of reducers' functions, but without providing a collection. The response is a curried version of the function that can be applied to the collection at a later time. The curried functions can be composed together into a single parse-file function using comp: (defn load-data [file] (let [data (iota/seq file) col-names (parse-columns (first data)) parse-file (comp remove-zero-zip (record-formatter col-names) line-formatter)] (parse-file (rest data)))) It's only when the parse-file function is called with a sequence that the pipeline is actually executed. Statistical folds with reducers With the data parsed, it's time to perform some descriptive statistics. Let's assume that we'd like to know the mean number of returns (column N1) submitted to the IRS by ZIP code. One way of doing this—the way we've done several times throughout the book—is by adding up the values and dividing it by the count. Our first attempt might look like this: (defn ex-5-11 [] (let [data (load-data "data/soi.csv") xs (into [] (r/map :N1 data))] (/ (reduce + xs) (count xs)))) ;; 853.37 While this works, it's comparatively slow. We iterate over the data once to create xs, a second time to calculate the sum, and a third time to calculate the count. The bigger our dataset gets, the larger the time penalty we'll pay. Ideally, we would be able to calculate the mean value in a single pass over the data, just like our parse-file function previously. It would be even better if we can perform it in parallel too. 
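One way to picture what such a single-pass, parallel-friendly mean could look like is sketched below in plain Python (this illustrates the idea only, not the reducers API; the chunking is hypothetical): each chunk is reduced into a (sum, count) pair, the partial pairs are combined associatively, and the division happens only once at the end.

from functools import reduce

def step(acc, x):
    # reduce step: fold one value into a (sum, count) accumulator
    s, n = acc
    return (s + x, n + 1)

def combine(a, b):
    # combiner: merge two partial accumulators; associative, identity (0, 0)
    return (a[0] + b[0], a[1] + b[1])

def chunked_mean(chunks):
    partials = [reduce(step, chunk, (0, 0)) for chunk in chunks]  # could run in parallel
    s, n = reduce(combine, partials, (0, 0))
    return s / n

print(chunked_mean([[1, 2, 3], [4, 5], [6]]))  # 3.5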
Associativity

Before we proceed, it's useful to take a moment to reflect on why the following code wouldn't do what we want: (defn mean ([] 0) ([x y] (/ (+ x y) 2))) Our mean function is a function of two arities. Without arguments, it returns zero, the identity for the mean computation. With two arguments, it returns their mean: (defn ex-5-12 [] (->> (load-data "data/soi.csv") (r/map :N1) (r/fold mean))) ;; 930.54 The preceding example folds over the N1 data with our mean function and produces a different result from the one we obtained previously. If we could expand out the computation for the first three xs, we might see something like the following code: (mean (mean (mean 0 a) b) c) This is a bad idea, because the mean function is not associative. For an associative function f, the following holds true: f(f(a, b), c) = f(a, f(b, c)). Addition and multiplication are associative, but subtraction and division are not. So the mean function, which involves a division, is not associative either. Contrast the mean function with the following simple addition: (+ 1 (+ 2 3)) This yields an identical result to: (+ (+ 1 2) 3) It doesn't matter how the arguments to + are partitioned. Associativity is an important property of functions used to reduce over a set of data because, by definition, the results of a previous calculation are treated as inputs to the next. The easiest way of converting the mean function into an associative function is to calculate the sum and the count separately. Since the sum and the count are associative, they can be calculated in parallel over the data. The mean can then be calculated simply by dividing one by the other.

Multiple regression with gradient descent

The normal equation, β = (XᵀX)⁻¹Xᵀy, uses matrix algebra to very quickly and efficiently arrive at the least squares estimates. Where all data fits in memory, this is a very convenient and concise equation. Where the data exceeds the memory available to a single machine, however, the calculation becomes unwieldy. The reason for this is matrix inversion. The calculation of (XᵀX)⁻¹ is not something that can be accomplished in a fold over the data—each cell in the output matrix depends on many others in the input matrix. These complex relationships require that the matrix be processed in a nonsequential way. An alternative approach to solving linear regression problems, and many other related machine learning problems, is a technique called gradient descent. Gradient descent reframes the problem as the solution to an iterative algorithm—one that does not calculate the answer in one very computationally intensive step, but rather converges towards the correct answer over a series of much smaller steps.

The gradient descent update rule

Gradient descent works by the iterative application of a function that moves the parameters in the direction of their optimum values. To apply this function, we need to know the gradient of the cost function at the current parameters. Calculating the formula for the gradient involves calculus that's beyond the scope of this book. Fortunately, the resulting formula isn't terribly difficult to interpret: ∂J(β)/∂βj, the partial derivative, or gradient, of our cost function J(β) for the parameter at index j, works out to (ŷ - y)xj. In other words, the gradient of the cost function with respect to the parameter at index j is equal to the difference between our prediction and the true value of y, multiplied by the value of x at index j. Since we're seeking to descend the gradient, we want to subtract some proportion of the gradient from the current parameter values.
Thus, at each step of gradient descent, we perform the following update:

βj := βj - α ∂J(β)/∂βj

Here, := is the assignment operator and α is a factor called the learning rate. The learning rate controls how large an adjustment we wish to make to the parameters at each iteration, as a fraction of the gradient. If our prediction ŷ nearly matches the actual value of y, then there would be little need to change the parameters. In contrast, a larger error will result in a larger adjustment to the parameters. This rule is called the Widrow-Hoff learning rule or the Delta rule.

The gradient descent learning rate

As we've seen, gradient descent is an iterative algorithm. The learning rate, usually represented by α, dictates the speed at which gradient descent converges to the final answer. If the learning rate is too small, convergence will happen very slowly. If it is too large, gradient descent will not find values close to the optimum and may even diverge from the correct answer: In the preceding chart, a small learning rate leads to a slow convergence over many iterations of the algorithm. While the algorithm does reach the minimum, it does so over many more steps than is ideal and, therefore, may take considerable time. By contrast, in the following diagram, we can see the effect of a learning rate that is too large. The parameter estimates are changed so significantly between iterations that they actually overshoot the optimum values and diverge from the minimum value: The gradient descent algorithm requires us to iterate repeatedly over our dataset. With the correct value of alpha, each iteration should successively yield better approximations of the ideal parameters. We can choose to terminate the algorithm either when the change between iterations is very small or after a predetermined number of iterations.

Feature scaling

As more features are added to the linear model, it is important to scale the features appropriately. Gradient descent will not perform very well if the features have radically different scales, since it won't be possible to pick a learning rate to suit them all. A simple scaling we can perform is to subtract the mean value from each of the values and divide by the standard deviation. This will tend to produce values with zero mean that generally vary between -3 and 3: (defn feature-scales [features] (->> (prepare-data) (t/map #(select-keys % features)) (t/facet) (t/fuse {:mean (m/mean) :sd (m/standard-deviation)}))) The feature-scales function in the preceding code uses t/facet to calculate the mean value and standard deviation of all the input features: (defn ex-5-24 [] (let [data (iota/seq "data/soi.csv") features [:A02300 :A00200 :AGI_STUB :NUMDEP :MARS2]] (->> (feature-scales features) (t/tesser (chunks data))))) ;; {:MARS2 {:sd 533.4496892658647, :mean 317.0412009748016}...} If you run the preceding example, you'll see the different means and standard deviations returned by the feature-scales function. Since our feature scales and input records are represented as maps, we can perform the scaling across all the features at once using Clojure's merge-with function: (defn scale-features [factors] (let [f (fn [x {:keys [mean sd]}] (/ (- x mean) sd))] (fn [x] (merge-with f x factors)))) Likewise, we can perform the all-important reversal with unscale-features: (defn unscale-features [factors] (let [f (fn [x {:keys [mean sd]}] (+ (* x sd) mean))] (fn [x] (merge-with f x factors)))) Let's scale our features and take a look at the very first feature.
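Before we do, here is a minimal plain-Python sketch of the scaling and update arithmetic described above, with made-up numbers standing in for real feature values (it illustrates the formulas only and does not use Tesser):

def scale(x, mean, sd):
    # z-score scaling: subtract the mean, divide by the standard deviation
    return (x - mean) / sd

def delta_rule_step(beta, x, y, alpha):
    # one update for a single example: beta_j := beta_j - alpha * (y_hat - y) * x_j
    y_hat = sum(b * xj for b, xj in zip(beta, x))
    return [b - alpha * (y_hat - y) * xj for b, xj in zip(beta, x)]

# hypothetical example: a bias term plus one feature scaled with made-up factors
beta = [0.0, 0.0]
x = [1.0, scale(42000.0, 37290.0, 14000.0)]
print(delta_rule_step(beta, x, 433.0, 0.1))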
Tesser won't allow us to execute a fold without a reduce, so we'll temporarily revert to using Clojure's reducers: (defn ex-5-25 [] (let [data (iota/seq "data/soi.csv") features [:A02300 :A00200 :AGI_STUB :NUMDEP :MARS2] factors (->> (feature-scales features) (t/tesser (chunks data)))] (->> (load-data "data/soi.csv") (r/map #(select-keys % features )) (r/map (scale-features factors)) (into []) (first)))) ;; {:MARS2 -0.14837567114357617, :NUMDEP 0.30617757526890155, ;; :AGI_STUB -0.714280814223704, :A00200 -0.5894942801950217, ;; :A02300 0.031741856083514465} This simple step will help gradient descent perform optimally on our data. Feature extraction Although we've used maps to represent our input data in this article, it's going to be more convenient when running gradient descent to represent our features as a matrix. Let's write a function to transform our input data into a map of xs and y. The y axis will be a scalar response value and xs will be a matrix of scaled feature values. We're adding a bias term to the returned matrix of features: (defn feature-matrix [record features] (let [xs (map #(% record) features)] (i/matrix (cons 1 xs)))) (defn extract-features [fy features] (fn [record] {:y (fy record) :xs (feature-matrix record features)})) Our feature-matrix function simply accepts an input of a record and the features to convert into a matrix. We call this from within extract-features, which returns a function that we can call on each input record: (defn ex-5-26 [] (let [data (iota/seq "data/soi.csv") features [:A02300 :A00200 :AGI_STUB :NUMDEP :MARS2] factors (->> (feature-scales features) (t/tesser (chunks data)))] (->> (load-data "data/soi.csv") (r/map (scale-features factors)) (r/map (extract-features :A02300 features)) (into []) (first)))) ;; {:y 433.0, :xs A 5x1 matrix ;; ------------- ;; 1.00e+00 ;; -5.89e-01 ;; -7.14e-01 ;; 3.06e-01 ;; -1.48e-01 ;; } The preceding example shows the data converted into a format suitable to perform gradient descent: a map containing the y response variable and a matrix of values, including the bias term. Applying a single step of gradient descent The objective of calculating the cost is to determine the amount by which to adjust each of the coefficients. Once we've calculated the average cost, as we did previously, we need to update the estimate of our coefficients β. Together, these steps represent a single iteration of gradient descent: We can return the updated coefficients in a post-combiner step that makes use of the average cost, the value of alpha, and the previous coefficients. 
Let's create a utility function update-coefficients, which will receive the coefficients and alpha and return a function that will calculate the new coefficients, given a total model cost: (defn update-coefficients [coefs alpha] (fn [cost] (->> (i/mult cost alpha) (i/minus coefs)))) With the preceding function in place, we have everything we need to package up a batch gradient descent update rule: (defn gradient-descent-fold [{:keys [fy features factors coefs alpha]}] (let [zeros-matrix (i/matrix 0 (count features) 1)] (->> (prepare-data) (t/map (scale-features factors)) (t/map (extract-features fy features)) (t/map (calculate-error (i/trans coefs))) (t/fold (matrix-mean (inc (count features)) 1)) (t/post-combine (update-coefficients coefs alpha))))) (defn ex-5-31 [] (let [features [:A00200 :AGI_STUB :NUMDEP :MARS2] fcount (inc (count features)) coefs (vec (replicate fcount 0)) data (chunks (iota/seq "data/soi.csv")) factors (->> (feature-scales features) (t/tesser data)) options {:fy :A02300 :features features :factors factors :coefs coefs :alpha 0.1}] (->> (gradient-descent-fold options) (t/tesser data)))) ;; A 6x1 matrix ;; ------------- ;; -4.20e+02 ;; -1.38e+06 ;; -5.06e+07 ;; -9.53e+02 ;; -1.42e+06 ;; -4.86e+05 The resulting matrix represents the values of the coefficients after the first iteration of gradient descent. Running iterative gradient descent Gradient descent is an iterative algorithm, and we will usually need to run it many times to convergence. With a large dataset, this can be very time-consuming. To save time, we've included a random sample of soi.csv in the data directory called soi-sample.csv. The smaller size allows us to run iterative gradient descent in a reasonable timescale. The following code runs gradient descent for 100 iterations, plotting the values of the parameters between each iteration on an xy-plot: (defn descend [options data] (fn [coefs] (->> (gradient-descent-fold (assoc options :coefs coefs)) (t/tesser data)))) (defn ex-5-32 [] (let [features [:A00200 :AGI_STUB :NUMDEP :MARS2] fcount (inc (count features)) coefs (vec (replicate fcount 0)) data (chunks (iota/seq "data/soi-sample.csv")) factors (->> (feature-scales features) (t/tesser data)) options {:fy :A02300 :features features :factors factors :coefs coefs :alpha 0.1} iterations 100 xs (range iterations) ys (->> (iterate (descend options data) coefs) (take iterations))] (-> (c/xy-plot xs (map first ys) :x-label "Iterations" :y-label "Coefficient") (c/add-lines xs (map second ys)) (c/add-lines xs (map #(nth % 2) ys)) (c/add-lines xs (map #(nth % 3) ys)) (c/add-lines xs (map #(nth % 4) ys)) (i/view)))) If you run the example, you should see a chart similar to the following: In the preceding chart, you can see how the parameters converge to relatively stable the values over the course of 100 iterations. Scaling gradient descent with Hadoop The length of time each iteration of batch gradient descent takes to run is determined by the size of your data and by how many processors your computer has. Although several chunks of data are processed in parallel, the dataset is large and the processors are finite. We've achieved a speed gain by performing calculations in parallel, but if we double the size of the dataset, the runtime will double as well. Hadoop is one of several systems that has emerged in the last decade which aims to parallelize work that exceeds the capabilities of a single machine. 
Rather than running code across multiple processors, Hadoop takes care of running a calculation across many servers. In fact, Hadoop clusters can, and some do, consist of many thousands of servers. Hadoop consists of two primary subsystems— the Hadoop Distributed File System (HDFS)—and the job processing system, MapReduce. HDFS stores files in chunks. A given file may be composed of many chunks and chunks are often replicated across many servers. In this way, Hadoop can store quantities of data much too large for any single server and, through replication, ensure that the data is stored reliably in the event of hardware failure too. As the name implies, the MapReduce programming model is built around the concept of map and reduce steps. Each job is composed of at least one map step and may optionally specify a reduce step. An entire job may consist of several map and reduce steps chained together. In the respect that reduce steps are optional, Hadoop has a slightly more flexible approach to distributed calculation than Tesser. Gradient descent on Hadoop with Tesser and Parkour Tesser's Hadoop capabilities are available in the tesser.hadoop namespace, which we're including as h. The primary public API function in the Hadoop namespace is h/fold. The fold function expects to receive at least four arguments, representing the configuration of the Hadoop job, the input file we want to process, a working directory for Hadoop to store its intermediate files, and the fold we want to run, referenced as a Clojure var. Any additional arguments supplied will be passed as arguments to the fold when it is executed. The reason for using a var to represent our fold is that the function call initiating the fold may happen on a completely different computer than the one that actually executes it. In a distributed setting, the var and arguments must entirely specify the behavior of the function. We can't, in general, rely on other mutable local state (for example, the value of an atom, or the value of variables closing over the function) to provide any additional context. Parkour distributed sources and sinks The data which we want our Hadoop job to process may exist on multiple machines too, stored distributed in chunks on HDFS. Tesser makes use of a library called Parkour (https://github.com/damballa/parkour/) to handle accessing potentially distributed data sources. Although Hadoop is designed to be run and distributed across many servers, it can also run in local mode. Local mode is suitable for testing and enables us to interact with the local filesystem as if it were HDFS. Another namespace we'll be using from Parkour is the parkour.conf namespace. This will allow us to create a default Hadoop configuration and operate it in local mode: (defn ex-5-33 [] (->> (text/dseq "data/soi.csv") (r/take 2) (into []))) In the preceding example, we use Parkour's text/dseq function to create a representation of the IRS input data. The return value implements Clojure's reducers protocol, so we can use r/take on the result. Running a feature scale fold with Hadoop Hadoop needs a location to write its temporary files while working on a task, and will complain if we try to overwrite an existing directory. Since we'll be executing several jobs over the course of the next few examples, let's create a little utility function that returns a new file with a randomly-generated name. 
(defn rand-file [path] (io/file path (str (long (rand 0x100000000))))) (defn ex-5-34 [] (let [conf (conf/ig) input (text/dseq "data/soi.csv") workdir (rand-file "tmp") features [:A00200 :AGI_STUB :NUMDEP :MARS2]] (h/fold conf input workdir #'feature-scales features))) Parkour provides a default Hadoop configuration object with the shorthand (conf/ig). This will return an empty configuration. The default value is enough, we don't need to supply any custom configuration. All of our Hadoop jobs will write their temporary files to a random directory inside the project's tmp directory. Remember to delete this folder later, if you're concerned about preserving disk space. If you run the preceding example now, you should get an output similar to the following: ;; {:MARS2 317.0412009748016, :NUMDEP 581.8504423822615, ;; :AGI_STUB 3.499939975269811, :A00200 37290.58880658831} Although the return value is identical to the values we got previously, we're now making use of Hadoop behind the scenes to process our data. In spite of this, notice that Tesser will return the response from our fold as a single Clojure data structure. Running gradient descent with Hadoop Since tesser.hadoop folds return Clojure data structures just like tesser.core folds, defining a gradient descent function that makes use of our scaled features is very simple: (defn hadoop-gradient-descent [conf input-file workdir] (let [features [:A00200 :AGI_STUB :NUMDEP :MARS2] fcount (inc (count features)) coefs (vec (replicate fcount 0)) input (text/dseq input-file) options {:column-names column-names :features features :coefs coefs :fy :A02300 :alpha 1e-3} factors (h/fold conf input (rand-file workdir) #'feature-scales features) descend (fn [coefs] (h/fold conf input (rand-file workdir) #'gradient-descent-fold (merge options {:coefs coefs :factors factors})))] (take 5 (iterate descend coefs)))) The preceding code defines a hadoop-gradient-descent function that iterates a descend function 5 times. Each iteration of descend calculates the improved coefficients based on the gradient-descent-fold function. The final return value is a vector of coefficients after 5 iterations of a gradient descent. We run the job on the full IRS data in the following example: ( defn ex-5-35 [] (let [workdir "tmp" out-file (rand-file workdir)] (hadoop-gradient-descent (conf/ig) "data/soi.csv" workdir))) After several iterations, you should see an output similar to the following: ;; ([0 0 0 0 0] ;; (20.9839310796048 46.87214911003046 -7.363493937722712 ;; 101.46736841329326 55.67860863427868) ;; (40.918665605227744 56.55169901254631 -13.771345753228694 ;; 162.1908841131747 81.23969785586247) ;; (59.85666340457121 50.559130068258995 -19.463888245285332 ;; 202.32407094149158 92.77424653758085) ;; (77.8477613139478 38.67088624825574 -24.585818946408523 ;; 231.42399118694212 97.75201693843269)) We've seen how we're able to calculate gradient descent using distributed techniques locally. Now, let's see how we can run this on a cluster of our own. Summary In this article, we learned some of the fundamental techniques of distributed data processing and saw how the functions used locally for data processing, map and reduce, are powerful ways of processing even very large quantities of data. We learned how Hadoop can scale unbounded by the capabilities of any single server by running functions on smaller subsets of the data whose outputs are themselves combined to finally produce a result. 
Once you understand the tradeoffs, this "divide and conquer" approach toward processing data is a simple and very general way of analyzing data on a large scale. We saw both the power and limitations of simple folds to process data using both Clojure's reducers and Tesser. We've also begun exploring how Parkour exposes more of Hadoop's underlying capabilities. Resources for Article: Further resources on this subject: Supervised learning[article] Machine Learning[article] Why Big Data in the Financial Sector? [article]
Meeting SAP Lumira

Packt
02 Sep 2015
12 min read
In this article by Dmitry Anoshin, author of the book SAP Lumira Essentials, Dmitry talks about living in a century of information technology. There are a lot of electronic devices around us that generate lots of data. For example, you can surf the Internet, visit a couple of news portals, order new Nike Air Max shoes from a web store, write a couple of messages to your friend, and chat on Facebook. Your every action produces data. We can multiply that activity by the number of people who have access to the Internet or just use a cell phone, and we get really BIG DATA. Of course, you have a question: how big is it? Nowadays, it starts from terabytes or even petabytes. The volume is not the only issue; we also struggle with the variety of data. As a result, it is not enough to analyze only structured data. We should dive deep into unstructured data, such as machine-generated data. (For more resources related to this topic, see here.) Nowadays, we need a new core competence—dealing with Big Data—because these vast data volumes won't just be stored; they need to be analysed and mined for information that management can use in order to make the right business decisions. This helps to make the business more competitive and efficient. Unfortunately, in modern organizations there are still many manual steps needed in order to get data and try to answer your business questions. You need the help of your IT guys, or you need to wait until new data is available in your enterprise data warehouse. In addition, you are often working with an inflexible BI tool, which can only refresh a report or export it into Excel. You definitely need a new approach, one that gives you a competitive advantage, dramatically reduces errors, and accelerates business decisions. So, we can highlight some of the key points for this kind of analytics:

Integrating data from heterogeneous systems
Giving more access to data
Using sophisticated analytics
Reducing manual coding
Simplifying processes
Reducing time to prepare data
Focusing on self-service
Leveraging powerful computing resources

We could continue this list with many other bullet points. If you are a fan of traditional BI tools, you may think that this is almost impossible. Yes, you are right, it is impossible. That's why we need to change the rules of the game. As the business world changes, you must change as well. Maybe you have guessed what this means, but if not, I can help you. I will focus on a new approach to doing data analytics, one that is more flexible and powerful. It is called data discovery. Of course, we need the right way to overcome all the challenges of the modern world. That's why we have chosen SAP Lumira—one of the most powerful data discovery tools on the modern market. But before diving deep into this amazing tool, let's consider some of the challenges of data discovery that are in our path, as well as data discovery's advantages.

Data discovery challenges

Let's imagine that you have several terabytes of data. Unfortunately, it is raw unstructured data. In order to get business insight from this data, you have to spend a lot of time preparing and cleaning the data. In addition, you are restricted by the capabilities of your machine. That's why a good data discovery tool is usually a combination of software and hardware. As a result, this gives you more power for exploratory data analysis. Let's imagine that this entire Big Data store is in Hadoop or any NoSQL data store.
You have to be at least a good programmer in order to do analytics on this data. Here we can see another benefit of a good data discovery tool: it gives a powerful tool to business users, who are not as technical and maybe don't even know SQL.

Apache Hadoop is an open source software project that enables distributed processing of large data sets across clusters of commodity servers. It is designed to scale up from a single server to thousands of machines, with a very high degree of fault tolerance. Rather than relying on high-end hardware, the resilience of these clusters comes from the software's ability to detect and handle failures at the application layer. A NoSQL data store is a next-generation database, mostly addressing some of the following points: non-relational, distributed, open-source, and horizontally scalable.

Data discovery versus business intelligence

You may be confused about data discovery and business intelligence technologies; it seems they are very close to each other, or even that BI tools can do everything data discovery can do. So why do we need a separate data discovery tool, such as SAP Lumira? In order to better understand the difference between the two technologies, you can look at the following comparison:

Key users: Enterprise BI - all users; Data discovery - advanced analysts
Approach: Enterprise BI - vertically oriented (top to bottom), semantic layers, requests to existing repositories; Data discovery - vertically oriented (bottom-up), mashup, putting data in the selected repository
Interface: Enterprise BI - reports and dashboards; Data discovery - visualization
Usage: Enterprise BI - reporting; Data discovery - analysis
Implementation: Enterprise BI - by IT consultants; Data discovery - by business users

Let's consider the pros and cons of data discovery:

Pros:
Rapidly analyze data with a short shelf life
Ideal for small teams
Best for tactical analysis
Great for answering one-off questions quickly

Cons:
Difficult to handle for enterprise organizations
Difficult for junior users
Lack of scalability

As a result, it is clear that BI and data discovery handle their own tasks and complement each other.

The role of data discovery

Most organizations have a data warehouse. It was designed to support daily operations and to help make business decisions. But sometimes organizations need to meet new challenges. For example, a retail company wants to improve its customer experience and decides to work closely with its customer database. Analysts try to segment customers into cohorts and analyse customer behavior. They need to handle all the customer data, which is quite big. In addition, they can use external data in order to learn more about their customers. If they start to use a corporate BI tool, every interaction, such as adding a new field or filter, can take 10-30 minutes. Another issue is adding a new field to an existing report. Usually, it is impossible without the help of IT staff, due to security or the complexities of the enterprise BI solution. This is unacceptable in a modern business. Analysts want to get answers to their business questions immediately, and they prefer to visualize data because, as you know, humans perceive visualizations much more readily than text. In addition, these analysts may be independent from IT. They have their data discovery tool, and they can connect to any data source in the organization and check their crazy hypotheses. There are hundreds of examples where BI and the DWH are weak, and data discovery is strong.

Introducing SAP Lumira

Starting from this point, we will focus on learning SAP Lumira. First of all, we need to understand what SAP Lumira is exactly.
SAP Lumira is a family of data discovery tools that give us the opportunity to create amazing visualizations or even tell fantastic stories based on our big or small data. We can connect to most of the popular data sources, such as Relational Database Management Systems (RDBMSs), flat files, Excel spreadsheets, or SAP applications. We are able to create datasets with measures, dimensions, hierarchies, or variables. In addition, Lumira allows us to prepare, edit, and clean our data before it is processed. SAP Lumira offers us a huge arsenal of graphical charts and tables to visualize our data. In addition, we can create data stories or even infographics based on our data by grouping charts, single cells, or tables together on boards to create presentation-style dashboards. Moreover, we can add images or text in order to add details. The following are the three main products in the Lumira family offered by SAP:

SAP Lumira Desktop
SAP Lumira Server
SAP Lumira Cloud

Lumira Desktop comes as either a personal edition or a standard edition. Both of them give you the opportunity to analyse data on your local machine. You can even share your visualizations or insights via PDF or XLS. Lumira Server also comes in two variations—Edge and Server. As you know, SAP BusinessObjects also has two types of license for the same software, Edge and Enterprise, and they differ only in terms of the number of users and the type of license. The Edge version is smaller; for example, it can cover the needs of a team or even a whole department. Lumira Cloud is Software as a Service (SaaS). It helps to quickly visualize large volumes of data without having to sacrifice performance or security. It is especially designed to speed time to insight. In addition, it saves time and money with flexible licensing options.

Data connectors

We have now met SAP Lumira for the first time, played with the interface, and adjusted the general settings of SAP Lumira. In addition, we can find this interesting menu in the middle of the window: There are several steps that help us to discover our data and gain business insights. In this article, we start with the first step: exploring data in SAP Lumira to create a document and acquire a dataset, which can include part or all of the original values from a data source. This is done through Acquire Data. Let's click on Acquire Data. This new window will come up: There are four areas in this window. They are:

A list of possible data sources (1): here, the user can connect to a data source.
Recently used objects (2): the user can open previous connections or files.
Ordinary buttons (3), such as Previous, Next, Create, and Cancel.
A small chat box (4), which we can find on almost every page. SAP Lumira cares about the quality of the product and gives the user the opportunity to take a screenshot and send feedback to SAP.

Let's go deeper and consider every connection more closely:

Microsoft Excel: Excel data sheets
Flat file: CSV, TXT, LOG, PRN, or TSV
SAP HANA: there are two possible ways, offline (downloading data) and online (connected to SAP HANA)
SAP BusinessObjects universe: UNV or UNX
SQL databases: query data via SQL from relational databases
SAP Business Warehouse: downloaded data from a BEx Query or an InfoProvider

Let's try to connect to some data sources and extract some data from them.

Microsoft spreadsheets

Let's start with the easiest exercise.
For example, our inventory manager asked us to analyse flop products, that is, products that are not popular, and he sent us two Excel spreadsheets, Unicorn_flop_products.xls and Unicorn_flop_price.xls. There are two different worksheets because prices and product attributes live in different systems. Both files have a unique field—SKU. As a result, it is possible to merge them by this field and analyse them as one dataset.

An SKU, or stock keeping unit, is a distinct item for sale, such as a product or service, and the attributes associated with the item distinguish it from other items. For a product, these attributes include, but are not limited to, manufacturer, product description, material, size, color, packaging, and warranty terms. When a business takes inventory, it counts the quantity of each SKU.

Connecting to the SAP BO universe

The universe is a core concept in the SAP BusinessObjects BI platform. It is the semantic layer that isolates business users from the technical complexities of the databases where their corporate information is stored. For the ease of the end user, universes are made up of objects and classes that map to data in the database, using everyday terms that describe the business environment.

Introducing the Unicorn Fashion universe

The Unicorn Fashion company uses the SAP BusinessObjects BI platform (BIP) as its primary BI tool. There is a Unicorn Fashion universe, which was built on top of the unicorn datamart; it has a similar structure and the same joins as the datamart. The following image shows the Unicorn Fashion universe: It unites two business processes, Sales (orange) and Stock (green), and has the following structure in the business layer:

Product: This specifies the attributes of an SKU, such as brand, category, and so on
Price: This denotes the different pricing of the SKU
Sales: This specifies the sales business process
Order: This denotes the order number, the shipping information, and order measures
Sales Date: This specifies the attributes of the order date, such as month, year, and so on
Sales Measures: This denotes various aggregated measures, such as shipped items, revenue waterfall, and so on
Stock: This specifies information about the quantity in stock
Stock Date: This denotes the attributes of the stock date, such as month, year, and so on

Summary

This has been a step-by-step guide to learning SAP Lumira essentials, starting with an overview of the SAP Lumira family of products. We demonstrated various data discovery techniques using real-world scenarios from an online e-commerce retailer. Moreover, there are detailed recipes for the installation, administration, and customization of SAP Lumira. In addition, we showed how to work with data, starting with acquiring it from various data sources, then preparing it and visualizing it through the rich functionality of SAP Lumira. Finally, it teaches how to present data via a data story or infographic and publish it across your organization or the World Wide Web. Learn data discovery techniques, build amazing visualizations, create fantastic stories, and share these visualizations electronically with one of the most powerful tools available: SAP Lumira. Moreover, we focused on extracting data from different sources, such as plain text, Microsoft Excel spreadsheets, the SAP BusinessObjects BI Platform, SAP HANA, and SQL databases, and on publishing the results of your painstaking work on various mediums, such as SAP BI clients, SAP Lumira Cloud, and so on.
Resources for Article: Further resources on this subject: Creating Mobile Dashboards [article] Report Data Filtering [article] Creating Our First Universe [article]