
How-To Tutorials


Creating Your Own Node Module

Soham Kamani
18 Apr 2016
6 min read
Node.js has a great community and one of the best package managers I have ever seen. One of the reasons npm is so great is that it encourages you to make small, composable modules, which usually have just one responsibility. Many of the larger, more complex Node modules are built by composing smaller ones. As of this writing, npm hosts over 219,897 packages. One of the reasons this community is so vibrant is that it is ridiculously easy to make your own node module. This post goes through the steps to create your own node module, as well as some best practices to follow while doing so.

Prerequisites and Installation

node and npm are a given. Additionally, you should also configure your npm author details:

```
npm set init.author.name "My Name"
npm set init.author.email "your@email.com"
npm set init.author.url "http://your-website.com"
npm adduser
```

These are the details that will show up on npmjs.org once you publish.

Hello World

The reason I say creating a node module is ridiculously easy is that you only need two files to create the most basic version of one. First up, create a package.json file inside a new folder by running the npm init command. This will ask you to choose a name. Of course, the name you are thinking of might already exist in the npm registry, so to check for this, run the command npm owner ls module_name, where module_name is replaced by the name you want to check. If it exists, you will get information about the owners:

```
$ npm owner ls forever
indexzero <charlie.robbins@gmail.com>
bradleymeck <bradley.meck@gmail.com>
julianduque <julianduquej@gmail.com>
jeffsu <me@jeffsu.com>
jcrugzz <jcrugzz@gmail.com>
```

If your name is free, you will get an error message similar to this:

```
$ npm owner ls does_not_exist
npm ERR! owner ls Couldn't get owner data does_not_exist
npm ERR! Darwin 14.5.0
npm ERR! argv "node" "/usr/local/bin/npm" "owner" "ls" "does_not_exist"
npm ERR! node v0.12.4
npm ERR! npm v2.10.1
npm ERR! code E404
npm ERR! 404 Registry returned 404 GET on https://registry.npmjs.org/does_not_exist
npm ERR! 404
npm ERR! 404 'does_not_exist' is not in the npm registry.
npm ERR! 404 You should bug the author to publish it (or use the name yourself!)
npm ERR! 404
npm ERR! 404 Note that you can also install from a
npm ERR! 404 tarball, folder, http url, or git url.
npm ERR! Please include the following file with any support request:
npm ERR! /Users/sohamchetan/Documents/jekyll-blog/npm-debug.log
```

After setting up package.json, add a JavaScript file:

```js
module.exports = function () {
  return 'Hello World!';
};
```

And that's it! Now execute npm publish, and your node module will be published to npmjs.org. Anyone can now install your node module by running npm install --save module_name, where module_name is the "name" property contained in package.json, and use it like this:

```js
var someModule = require('module_name');
console.log(someModule()); // This will output "Hello World!"
```

Dependencies

As stated before, you will rarely find large-scale node modules that do not depend on other, smaller modules. This is because npm encourages modularity and composability. To add dependencies to your own module, simply install them. For example, one of the most depended-upon packages is lodash, a utility library. To add it, run the following command:

```
npm install --save lodash
```

Now you can use lodash everywhere in your module by requiring it, and when someone else installs your module, they get lodash pulled in along with it as well.
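To make this concrete, here is a minimal sketch (not from the original post; the function and its behavior are made up for illustration) of an index.js that uses the lodash dependency installed above:

```js
// index.js - a minimal sketch of a module that uses its lodash dependency.
var _ = require('lodash');

// Export a single function, as before: return the given
// list of names with duplicates removed.
module.exports = function (names) {
  return _.uniq(names);
};
```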
Additionally, you will want some modules purely for development and not for distribution. These are dev dependencies, and they can be installed with the npm install --save-dev command. Dev dependencies will not be installed when someone else installs your node module.

Configuring package.json

The package.json file contains all the metadata for your node module. A few fields are filled out automatically (like dependencies or devDependencies during npm installs). There are a few more fields in package.json that you should consider filling out so that your node module is best fitted to its purpose:

"main": The relative path of the entry point of your module. Whatever is assigned to module.exports in this file is exported when someone requires your module. By default, this is the index.js file.
"keywords": An array of keywords describing your module. Quite helpful when others from the community are searching for something that your module happens to solve.
"license": I normally publish all my packages with an "MIT" license because of its openness and popularity in the open source community.
"version": This is pretty crucial because you cannot publish a node module with the same version twice. Normally, semver versioning should be followed.

If you want to know more about the different properties you can set in package.json, there's a great interactive guide you can check out.

Using Yeoman Generators

Although it's really simple to make a basic node module, it can be quite a task to make something substantial using just an index.js and a package.json file. In these cases, there's a lot more to do, such as:

Writing and running tests
Setting up a CI tool like Travis
Measuring code coverage
Installing standard dev dependencies for testing

Fortunately, there are many Yeoman generators to help you bootstrap your project. Check out generator-nm for setting up a basic project structure for a simple node module. If writing in ES6 is more your style, you can take a look at generator-nm-es6. These generators set up your project structure, complete with a testing framework and CI integration, so that you don't have to spend all your time writing boilerplate code.

About the Author

Soham Kamani is a full-stack web developer and electronics hobbyist. He is especially interested in JavaScript, Python, and IoT.


Setting up a Build Chain with Grunt

Packt
18 Apr 2016
24 min read
In this article by Bass Jobsen, author of the book Sass and Compass Designer's Cookbook, you will learn about the following topics:

Installing Grunt
Installing Grunt plugins
Utilizing the Gruntfile.js file
Adding a configuration definition for a plugin
Adding the Sass compiler task

(For more resources related to this topic, see here.)

This article introduces you to the Grunt Task Runner and the features it offers to make your development workflow a delight. Grunt is a JavaScript task runner that is installed and managed via npm, the Node.js package manager. You will learn how to take advantage of its plugins to set up your own flexible and productive workflow, which will enable you to compile your Sass code. Although there are many applications available for compiling Sass, Grunt is a more flexible, versatile, and cross-platform tool that will allow you to automate many development tasks, including Sass compilation. It can not only automate the Sass compilation tasks, but also wrap other mundane jobs, such as linting, minifying, and cleaning your code, into tasks and run them automatically for you. By the end of this article, you will be comfortable using Grunt and its plugins to establish a flexible workflow when working with Sass, which makes Grunt a vital part of your toolchain. You will then be shown how to combine Grunt's plugins to establish a workflow for compiling Sass in real time. Grunt thus becomes a tool you can use to automate integration testing, deployments, builds, and development.

Finally, by understanding the automation process, you will also learn how to use alternative tools, such as Gulp. Gulp is a JavaScript task runner for Node.js and is relatively new in comparison to Grunt, so Grunt has more plugins and wider community support, although the Gulp community is currently growing fast. The biggest difference between Grunt and Gulp is that Gulp does not save intermediary files, but pipes these files' content in memory to the next stream. A stream enables you to pass some data through a function, which will modify the data and then pass the modified data to the next function. In many situations, Gulp requires fewer configuration settings, so some people find Gulp more intuitive and easier to learn. In this article, Grunt has been chosen to demonstrate how to run a task runner; this choice does not mean that you have to prefer Grunt in your own project. Both task runners can run all the tasks described in this article; simply choose the one that suits you best. A separate recipe in the book briefly demonstrates how to compile your Sass code with Gulp.

In this article, you should enter your commands at the command prompt: Linux users should open a terminal, Mac users should run Terminal.app, and Windows users should use the cmd command for command-line usage.

Installing Grunt

Grunt is essentially a Node.js module; therefore, it requires Node.js to be installed. The goal of this recipe is to show you how to install Grunt on your system and set up your project.

Getting ready

Installing Grunt requires both Node.js and npm. Node.js is a platform built on Chrome's JavaScript runtime for easily building fast, scalable network applications, and npm is a package manager for Node.js. You can download the Node.js source code or a prebuilt installer for your platform at https://nodejs.org/en/download/. Notice that npm is bundled with Node. Also, read the instructions at https://github.com/npm/npm#super-easy-install.

How to do it...
After installing Node.js and npm, installing Grunt is as simple as running a single command, regardless of the operating system that you are using. Just open the command line or the Terminal and execute the following command:

```
npm install -g grunt-cli
```

That's it! This command will install Grunt globally and make it accessible anywhere on your system. Run the grunt --version command in the command prompt in order to confirm that Grunt has been successfully installed. If the installation is successful, you should see the version of Grunt in the Terminal's output:

```
grunt --version
grunt-cli v0.1.11
```

After installing Grunt, the next step is to set it up for your project:

Make a folder on your desktop and call it workflow. Then, navigate to it and run the npm init command to initialize the setup process:

```
mkdir workflow && cd $_ && npm init
```

Press Enter for all the questions and accept the defaults. You can change these settings later. This should create a file called package.json that will contain some information about the project and the project's dependencies. In order to add Grunt as a dependency, install the Grunt package as follows:

```
npm install grunt --save-dev
```

Now, if you look at the package.json file, you should see that Grunt is added to the list of dependencies:

```
..."devDependencies": {
  "grunt": "~0.4.5"
}
```

In addition, you should see an extra folder created. Called node_modules, it will contain Grunt and other modules that you will install later in this article.

How it works...

In the preceding section, you installed Grunt (grunt-cli) with the -g option. The -g option installs Grunt globally on your system. Global installation requires superuser or administrator rights on most systems. You need to run only the globally installed packages from the command line. Everything that you will use with the require() function in your programs should be installed locally in the root of your project. Local installation makes it possible to solve your project's specific dependencies. More information about global versus local installation of npm modules can be found at https://www.npmjs.org/doc/faq.html.

There's more...

Node package managers are available for a wide range of operating systems, including Windows, OS X, Linux, SunOS, and FreeBSD. A complete list of package managers can be found at https://github.com/joyent/node/wiki/Installing-Node.js-via-package-manager. Notice that these package managers are not maintained by the Node.js core team. Instead, each package manager has its own maintainer.

See also

The npm Registry is a public collection of packages of open source code for Node.js, frontend web apps, mobile apps, robots, routers, and countless other needs of the JavaScript community. You can find the npm Registry at https://www.npmjs.org/. Also, notice that you do not have to use task runners to create build chains. Keith Cirkel wrote about how to use npm as a build tool at http://blog.keithcirkel.co.uk/how-to-use-npm-as-a-build-tool/.

Installing Grunt plugins

Grunt plugins are the heart of Grunt. Every plugin serves a specific purpose and can also work together with other plugins. In order to use Grunt to set up your Sass workflow, you need to install several plugins. You can find more information about these plugins in this recipe's How it works... section.

Getting ready

Before you install the plugins, you should first create some basic files and folders for the project. You should install Grunt and create a package.json file for your project.
Also, create an index.html file to inspect the results in your browser. Two empty folders should be created too: the scss folder will contain your Sass code and the css folder will contain the compiled CSS code. Navigate to the root of the project, repeat the steps from the Installing Grunt recipe of this article, and create the additional files and directories that you are going to work with throughout the article. In the end, you should end up with a folder and file structure containing the package.json file, the index.html file, and the empty scss and css folders in the project root.

How to do it...

Grunt plugins are essentially Node.js modules that can be installed and added to the package.json file in the list of dependencies using npm. To do this, follow the ensuing steps:

Navigate to the root of the project and run the following command, as described in the Installing Grunt recipe of this article:

```
npm init
```

Install the modules using npm, as follows:

```
npm install \
grunt-contrib-sass \
load-grunt-tasks \
grunt-postcss --save-dev
```

Notice the single space before the backslash in each line. For example, on the second line, grunt-contrib-sass, there is a space before the backslash at the end of the line. The space characters are necessary because they act as separators. The backslash at the end is used to continue the command on the next line.

The npm install command will download all the plugins and place them in the node_modules folder in addition to including them in the package.json file. The next step is to include these plugins in the Gruntfile.js file.

How it works...

Grunt plugins can be installed and added to the package.json file using the npm install command followed by the names of the plugins separated by a space, and the --save-dev flag:

```
npm install nameOfPlugin1 nameOfPlugin2 --save-dev
```

The --save-dev flag adds the plugin names and a tilde version range to the list of dependencies in the package.json file so that the next time you need to install the plugins, all you need to do is run the npm install command. This command looks for the package.json file in the directory from which it was called, and it will automatically download all the specified plugins. This makes porting workflows very easy; all it takes is copying the package.json file and running the npm install command. Finally, the package.json file contains a JSON object with metadata. It is also worth explaining the long command that you have used to install the plugins in this recipe. This command installs the plugins whose names are continued on the next line by the backslash. It is essentially equivalent to the following:

```
npm install grunt-contrib-sass --save-dev
npm install load-grunt-tasks --save-dev
npm install grunt-postcss --save-dev
```

As you can see, this is very repetitive. However, both approaches yield the same results; it is up to you to choose the one that you feel more comfortable with. The node_modules folder contains all the plugins that you install with npm. Every time you run npm install name-of-plugin, the plugin is downloaded and placed in this folder. If you need to port your workflow, you do not need to copy all the contents of the folder. In addition, if you are using a version control system, such as Git, you should add the node_modules folder to the .gitignore file so that the folder and its subdirectories are ignored.

There's more...

Each Grunt plugin also has its own metadata set in a package.json file, so plugins can have different dependencies.
For instance, the grunt-contrib-sass plugin, as described in the Adding the Sass compiler task recipe, declares its dependencies as follows:

```
"dependencies": {
  "async": "^0.9.0",
  "chalk": "^0.5.1",
  "cross-spawn": "^0.2.3",
  "dargs": "^4.0.0",
  "which": "^1.0.5"
}
```

Besides the dependencies described previously, this task also requires you to have Ruby and Sass installed. In the following list, you will find the plugins used in this article, followed by a brief description:

load-grunt-tasks: This loads all the plugins listed in the package.json file
grunt-contrib-sass: This compiles Sass files into CSS code
grunt-postcss: This enables you to apply one or more postprocessors to your compiled CSS code

CSS postprocessors enable you to change your CSS code after compilation. In addition to installing plugins, you can remove them as well. You can remove a plugin using the npm uninstall name-of-plugin command, where name-of-plugin is the name of the plugin that you wish to remove. For example, if a line in the list of dependencies of your package.json file contains "grunt-concurrent": "~0.4.2",, then you can remove it using the following command:

```
npm uninstall grunt-concurrent
```

Then, you just need to make sure to remove the name of the plugin from your package.json file so that it is not loaded by the load-grunt-tasks plugin the next time you run a Grunt task. Running the npm prune command after removing the items from the package.json file will also remove the plugins; the prune command removes extraneous packages that are not listed in the parent package's dependencies list.

See also

More information on the npm version syntax can be found at https://www.npmjs.org/doc/misc/semver.html. Also, see http://caniuse.com/ for more information on the Can I Use database.

Utilizing the Gruntfile.js file

The Gruntfile.js file is the main configuration file for Grunt that handles all the tasks and task configurations. All the tasks and plugins are loaded using this file. In this recipe, you will create this file and learn how to load Grunt plugins using it.

Getting ready

First, you need to install Node and Grunt, as described in the Installing Grunt recipe of this article. You will also have to install some Grunt plugins, as described in the Installing Grunt plugins recipe of this article.

How to do it...

Once you have installed Node and Grunt, follow these steps:

In your Grunt project directory (the folder that contains the package.json file), create a new file, save it as Gruntfile.js, and add the following lines to it:

```js
module.exports = function(grunt) {
  grunt.initConfig({
    pkg: grunt.file.readJSON('package.json'),
    //Add the Tasks configurations here.
  });
  // Define Tasks here
};
```

This is the simplest form of the Gruntfile.js file. The next step is to load the plugins that you installed in the Installing Grunt plugins recipe. Add the following line at the end of your Gruntfile.js file:

```js
grunt.loadNpmTasks('grunt-sass');
```

In the preceding line of code, grunt-sass is the name of the plugin you want to load. That is all it takes to load all the necessary plugins. The next step is to add the configurations for each task to the Gruntfile.js file.

How it works...

Any Grunt plugin can be loaded by adding a line of JavaScript to the Gruntfile.js file, as follows:

```js
grunt.loadNpmTasks('name-of-module');
```

This line should be added every time a new plugin is installed so that Grunt can access the plugin's functions.
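For instance, with the task plugins installed in the Installing Grunt plugins recipe, the relevant part of the Gruntfile.js file might look like the following sketch (load-grunt-tasks is an exception and is loaded differently, as shown in the There's more... section below):

```js
module.exports = function(grunt) {
  grunt.initConfig({
    pkg: grunt.file.readJSON('package.json')
    //Add the Tasks configurations here.
  });

  // One loadNpmTasks call per installed task plugin.
  grunt.loadNpmTasks('grunt-contrib-sass');
  grunt.loadNpmTasks('grunt-postcss');
};
```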
However, it is tedious to load every single plugin that you install. In addition, you will soon notice that, as your project grows, the number of configuration lines will increase as well. The Gruntfile.js file should be written in JavaScript or CoffeeScript. Grunt tasks rely on configuration data defined in a JSON object passed to the grunt.initConfig method. JavaScript Object Notation (JSON) is an alternative to XML used for data exchange. JSON describes name-value pairs written as "name": "value". All the JSON data is separated by commas, with JSON objects written inside curly brackets and JSON arrays inside square brackets. Each object can hold more than one name/value pair, and each array can hold one or more objects. You can also group tasks into one task; you can alias a group of tasks using the following line of code:

```js
grunt.registerTask('alias', ['task1', 'task2']);
```

There's more...

Instead of loading all the required Grunt plugins one by one, you can load them automatically with the load-grunt-tasks plugin. You can install it by using the following command in the root of your project:

```
npm install load-grunt-tasks --save-dev
```

Then, add the following line at the very beginning of your Gruntfile.js file after module.exports:

```js
require('load-grunt-tasks')(grunt);
```

Now, your Gruntfile.js file should look like this:

```js
module.exports = function(grunt) {
  require('load-grunt-tasks')(grunt);
  grunt.initConfig({
    pkg: grunt.file.readJSON('package.json'),
    //Add the Tasks configurations here.
  });
  // Define Tasks here
};
```

The load-grunt-tasks plugin loads all the plugins specified in the package.json file. It simply loads the plugins that begin with the grunt- prefix or any pattern that you specify. This plugin will also read dependencies, devDependencies, and peerDependencies in your package.json file and load the Grunt tasks that match the provided patterns. A pattern to load specifically chosen plugins can be added as a second parameter. You can load, for instance, all the grunt-contrib tasks with the following code in your Gruntfile.js file:

```js
require('load-grunt-tasks')(grunt, {pattern: 'grunt-contrib-*'});
```

See also

Read more about the load-grunt-tasks module at https://github.com/sindresorhus/load-grunt-tasks.

Adding a configuration definition for a plugin

Any Grunt task needs a configuration definition. The configuration definitions are usually added to the Gruntfile.js file itself and are very easy to set up. In addition, it is very convenient to define and work with them because they are all written in the JSON format. This makes it very easy to spot the configurations in a plugin's documentation examples and add them to your Gruntfile.js file. In this recipe, you will learn how to add the configuration for a Grunt task.

Getting ready

For this recipe, you will first need to create a basic Gruntfile.js file and install the plugin you want to configure. If you want to install the grunt-example plugin, you can install it using the following command in the root of your project:

```
npm install grunt-example --save-dev
```

How to do it...

Once you have created the basic Gruntfile.js file (also refer to the Utilizing the Gruntfile.js file recipe of this article), follow this step:

A simple form of the task configuration is shown in the following code. Start by adding it to your Gruntfile.js file, wrapped inside grunt.initConfig({}):

```js
example: {
  subtask: {
    files: {
      "stylesheets/main.css": "sass/main.scss"
    }
  }
}
```

How it works...
If you look closely at the task configuration, you will notice the files field that specifies which files are going to be operated on. The files field is a very standard field that appears in almost all Grunt plugins, simply because many tasks require some or many file manipulations.

There's more...

The Don't Repeat Yourself (DRY) principle can be applied to your Grunt configuration too. First, define the name and the path at the beginning of the Gruntfile.js configuration (as a property of the object passed to grunt.initConfig), as follows:

```js
app: {
  dev: "app/dev"
}
```

Using templates is key to avoiding hardcoded values and inflexible configurations. In addition, you should have noticed that the template is used with the <%= %> delimiter to expand the value of the development directory:

```js
"<%= app.dev %>/css/main.css": "<%= app.dev %>/scss/main.scss"
```

The <%= %> delimiter essentially executes inline JavaScript and replaces values, as you can see in the following code:

```js
"app/dev/css/main.css": "app/dev/scss/main.scss"
```

So, put simply, the value defined in the app object at the top of the Gruntfile.js file is evaluated and replaced. If you decide to change the name of your development directory, for example, all you need to do is change the app variable that is defined at the top of your Gruntfile.js file. Finally, it is also worth mentioning that the value for the template does not necessarily have to be a string and can be a JavaScript literal.

See also

You can read more about templates in the Templates section of Grunt's documentation at http://gruntjs.com/configuring-tasks#templates.

Adding the Sass compiler task

The Sass task is the core task that you will need for your Sass development. It has several features and options, but at the heart of it is the Sass compiler, which can compile your Sass files into CSS. By the end of this recipe, you will have a good understanding of this plugin, how to add it to your Gruntfile.js file, and how to take advantage of it. In this recipe, the grunt-contrib-sass plugin will be used; this plugin compiles your Sass code by using Ruby Sass. If you prefer to compile Sass into CSS with node-sass (LibSass), you should use the grunt-sass plugin instead.

Getting ready

The only requirement for this recipe is to have the grunt-contrib-sass plugin installed and loaded in your Gruntfile.js file. If you have not installed this plugin in the Installing Grunt plugins recipe of this article, you can do so using the following command in the root of your project:

```
npm install grunt-contrib-sass --save-dev
```

You should also install grunt locally by running the following command:

```
npm install grunt --save-dev
```

Finally, your project should have the files and directories described in the Installing Grunt plugins recipe of this article.

How to do it...

An example of the Sass task configuration is shown in the following code. Start by adding it to your Gruntfile.js file, wrapped inside the grunt.initConfig({}) code. Now, your Gruntfile.js file should look as follows:

```js
module.exports = function(grunt) {
  grunt.initConfig({
    //Add the Tasks configurations here.
    sass: {
      dist: {
        options: {
          style: 'expanded'
        },
        files: {
          'stylesheets/main.css': 'sass/main.scss' // 'destination': 'source'
        }
      }
    }
  });
  grunt.loadNpmTasks('grunt-contrib-sass');
  // Define Tasks here
  grunt.registerTask('default', ['sass']);
};
```

Then, run the following command in your console:

```
grunt sass
```

The preceding command will create a new stylesheets/main.css file. Also, notice that the stylesheets/main.css.map file has been created automatically: the Sass compiler task creates CSS source maps for debugging your code by default.

How it works...

In addition to setting up the task configuration, you should run the Grunt command to test the Sass task. When you run the grunt sass command, Grunt will look for a configuration called sass in the Gruntfile.js file. Once it finds it, it will run the task with some default options if they are not explicitly defined. A successful task run ends with the following message:

```
Done, without errors.
```

There's more...

There are several other options that you can include in the Sass task. An option can also be set at the global Sass task level, so that the option is applied in all the subtasks of Sass. In addition to options, Grunt also provides targets for every task to allow you to set different configurations for the same task. In other words, if, for example, you need to have two different versions of the Sass task with different source and destination folders, you can easily use two different targets. Adding and executing targets is very easy. Adding more builds just follows the JSON notation, as shown here:

```js
sass: {                                        // Task
  dev: {                                       // Target
    options: {                                 // Target options
      style: 'expanded'
    },
    files: {                                   // Dictionary of files
      'stylesheets/main.css': 'sass/main.scss' // 'destination': 'source'
    }
  },
  dist: {
    options: {
      style: 'expanded',
      sourcemap: 'none'
    },
    files: {
      'stylesheets/main.min.css': 'sass/main.scss'
    }
  }
}
```

In the preceding example, two builds are defined. The first one is named dev and the second is called dist. Each of these targets belongs to the Sass task, but they use different options and different folders for the source and the compiled Sass code. Moreover, you can run a particular target using grunt sass:nameOfTarget, where nameOfTarget is the name of the target that you are trying to use. So, for example, if you need to run the dist target, you will have to run the grunt sass:dist command in your console. However, if you need to run both targets, you can simply run grunt sass and it will run both targets sequentially. As already mentioned, the grunt-contrib-sass plugin compiles your Sass code by using Ruby Sass, and you should use the grunt-sass plugin to compile Sass to CSS with node-sass (LibSass).
To switch to the grunt-sass plugin, you will have to install it locally first by running the following command in your console:

```
npm install grunt-sass
```

Then, replace grunt.loadNpmTasks('grunt-contrib-sass'); with grunt.loadNpmTasks('grunt-sass'); in the Gruntfile.js file. The basic options for grunt-contrib-sass and grunt-sass are very similar, so you will hardly have to change the options for the Sass task when switching to grunt-sass. Finally, notice that grunt-contrib-sass also has an option to turn Compass on.

See also

Please refer to Grunt's documentation for a full list of options, which is available at https://github.com/gruntjs/grunt-contrib-sass#options. Also, read Grunt's documentation for more details about configuring your tasks and targets at http://gruntjs.com/configuring-tasks#task-configuration-and-targets.

Summary

In this article, you studied installing Grunt, installing Grunt plugins, utilizing the Gruntfile.js file, adding a configuration definition for a plugin, and adding the Sass compiler task.


Configuring Redmine

Packt
18 Apr 2016
15 min read
In this article by Andriy Lesyuk, author of Mastering Redmine, when talking about the web interface (that is, not system files), all of the global configuration of Redmine can be done on the Settings page of the Administration menu. This is actually the page that this article is based around. Some settings on this page, however, depend on special system files or third-party tools that need to be installed, and these are the other things that we will discuss. You might expect to see detailed explanations for all the administration settings here, but instead, we will review in detail only a few of them, as I believe that the others do not need to be explained or can easily be tested. So generally, we will focus on hard-to-understand settings and those settings that need to be configured additionally in some special way or have some obscurities. So, why should you read this article if you are not an administrator? Some features of Redmine are available only if they have been configured, so by reading this article, you will learn what extra features exist and get an idea of how to enable them.

In this article, we will cover the following topics:

The first thing to fix
The general settings
Authentication

(For more resources related to this topic, see here.)

The first thing to fix

A fresh Redmine installation has only one user account, and it has administrator privileges. This account is exactly the same by default on all Redmine installations. That's why it is extremely important to change its credentials immediately after you complete the installation, especially for Redmine instances that can be accessed publicly. The administrator credentials can be changed on the Users page of the Administration menu. To do this, click on the admin link to open the user form. In this form, you should specify a new password in the Password and Confirmation fields. Also, it's recommended that you change the login to something different. Additionally, consider specifying your own e-mail address instead of admin@example.net and, at the least, changing the First name and Last name.

The general settings

Everything that is possible to configure at the global level (as opposed to the project level) can be found under the Administration link in the top-left menu. Of course, this link is available only for administrators. If you click on the Administration link, you will get the list of available administration pages on the sidebar to the right. Most of them are for managing Redmine objects, such as projects and trackers; we will be discussing only the general, system-wide configuration. Most of the settings that we are going to review are compiled on the Settings page. As all of these settings can't fit on a single page, Redmine organizes them into tabs. We will discuss the Authentication, Email notifications, Incoming emails, and Repositories tabs in the next sections.

The General tab

So let's start with the General tab.
Settings in this tab control the general behavior of Redmine. Application title is the name of the website that is shown at the top of non-project pages, and Welcome text is displayed on the start page of Redmine. The Objects per page options setting specifies how many objects users will be able to see on a page, while Search results per page and Days displayed on project activity control the number of objects that are shown on the search results and activity pages correspondingly. The Protocol setting specifies the preferred protocol that will be used in links to the website. Wiki history compression controls whether the history of Wiki changes should be compressed to save space. Finally, Maximum number of items in Atom feeds sets the limit for the number of items that are returned in an Atom feed. Additionally, the General tab contains settings that I want to discuss in detail.

The Cache formatted text setting

Redmine supports text formatting through the lightweight markup languages Textile and Markdown. While conversion of text from such a language to HTML is quite fast, in some circumstances you may want to cache the resulting HTML. If that is the case, the Cache formatted text checkbox is what you need. When this setting is enabled, all Textile or Markdown content that is larger than 2 KB will be cached. The cached HTML will be refreshed only when changes are made to the source text, so you should take this into account if you are using a Wiki extension that generates dynamic content (such as my WikiNG plugin). Unless performance is extremely critical for you, you should leave this checkbox unchecked.

Other settings tips

Here are some other tips for the General tab:

The value of the Host name and path setting will be used to generate URLs in the e-mail messages that are sent to users, so it's important to specify a proper value here.
For Text formatting, select the markup language that is best for you. It's also possible to select none here, but I would not recommend doing so.

The Display tab

As its name suggests, this tab contains settings related to the look and feel of Redmine. Using the Theme setting, users can choose a theme for the Redmine interface. The Default language setting specifies which language will be used for the interface if Redmine fails to determine the language of the user: for users who are not logged in, Redmine attempts to use the preferred language of the user's browser (which can be disabled with the Force default language for anonymous users setting), and for logged-in users, it uses the language chosen by the user in their profile (which can be disabled with the Force default language for logged-in users setting). By default, the user's language also affects the start day of the week and the date and time formats; this can be changed with the Start calendars on, Date format, and Time format settings correspondingly. The display format of the user name is controlled by the Users display format setting. Finally, the Thumbnails size (in pixels) setting specifies the size of thumbnail images in pixels. Now let's check what the rest of the settings mean.

The Use Gravatar user icons setting

Once I used a WordPress form to leave a comment on someone's blog. That form asked me to specify my first name, my last name, my e-mail address, and the text. After submitting it, I was surprised to see my photo near the comment. That's what Gravatar does.
Gravatar stands for Globally Recognized Avatar. It's a web service that allows you to assign an image to each user's e-mail address. Third-party sites can then fetch the corresponding image by supplying a hash of the user's e-mail address. The Use Gravatar user icons setting enables this behavior for Redmine. Having this option checked is a good idea (unless potential users of your Redmine installation may be unable to access the Internet because, for example, Redmine is going to be used on an isolated intranet).

The Default Gravatar image setting

What happens if a Gravatar is not available for the user's e-mail? In such cases, the Gravatar service returns a default image, which depends on the Default Gravatar image setting. The six available themes for the default avatar image are:

None: The default image, which is shown if no other theme is selected
Wavatars: A generated face with differing features and background
Identicons: A geometric pattern
Monster IDs: A generated monster image with different colors, face, and so on
Retro: A generated 8-bit, arcade-style pixelated face
Mystery man: A simple, cartoon-style silhouetted outline of a person

For all of these themes, except Mystery man and None, Gravatar generates an avatar image that is based on the hash of the user's e-mail and is therefore unique to it.

The Redmine Local Avatars plugin

Consider installing the Redmine Local Avatars plugin by Andrew Chaika, Luca Pireddu, and Ricardo Santos if you prefer users to upload their avatars directly onto Redmine: https://github.com/thorin/redmine_local_avatars. This plugin will also let your users take their pictures with web cameras.

The Display attachment thumbnails setting

If the Display attachment thumbnails setting is enabled, all image attachments, no matter what object (for example, a Wiki page or an issue) they are attached to, will also be shown under the attachment list as clickable thumbnails. If the user clicks on such a thumbnail, the full-size image is opened.

The Redmine Lightbox 2 plugin

In pure Redmine, full-size images are opened in the same browser window. To open them in a lightbox, you can use the Lightbox 2 plugin that was created by Genki Zhang and Tobias Fischer: https://github.com/paginagmbh/redmine_lightbox2.

Note that in order for the Display attachment thumbnails setting to work, you must have ImageMagick's convert tool installed.

The API tab

In addition to the web interface that is intended for humans, Redmine comes with a special REST application programming interface (API) that is intended for third-party applications. For example, the Redmine REST API is used by the Redmine Mylyn Connector for Eclipse and by RedmineApp for iPhone. This interface can be enabled and configured under the API tab of the Settings page. Let's check what these settings mean:

If you need to support the integration of third-party tools, you should turn on the Redmine REST API using the Enable REST web service checkbox. It is safe to keep this setting disabled if you are not using any external Redmine tools.
The Redmine API can also be used via JavaScript in the web browser, but not if the API client (that is, a website that runs JavaScript) is on a different domain. In such cases, to bypass the browser's same-origin policy, the API client may use a technique called JSONP. As this technique is considered insecure, it must be explicitly enabled using the Enable JSONP support setting. In most cases, you should leave this option disabled.
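To give an idea of what a third-party client does once the REST web service is enabled, here is a hedged sketch of listing issues from browser-side JavaScript; the host name and the API key are placeholders, while the /issues.json path and the X-Redmine-API-Key header are standard parts of the Redmine REST API:

```js
// A minimal sketch: fetch the first five issues from a Redmine instance
// over the REST API. Replace the host and the API key (shown on the
// user's account page once the REST web service is enabled).
fetch('https://redmine.example.com/issues.json?limit=5', {
  headers: { 'X-Redmine-API-Key': 'your-api-key-here' }
})
  .then(function (response) { return response.json(); })
  .then(function (data) {
    // data.issues is an array of issue objects.
    data.issues.forEach(function (issue) {
      console.log(issue.id, issue.subject);
    });
  });
```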
The Files tab

The Files tab contains settings related to file display and attachment. Here, Allowed extensions and Disallowed extensions can be used to restrict file uploads by file extension: you can use the former setting to allow certain extensions only, or the latter to forbid certain extensions only. The Maximum size of text files displayed inline and Maximum number of diff lines displayed settings control the amount of file content that can be displayed. The remaining settings are used more often:

You may need to change the Maximum attachment size setting to a larger value (which is specified in kB). Project files (releases) are attachments as well, so if you expect your users to upload large files, consider increasing this setting.
The value of the Attachments and repositories encodings option is used to convert commit messages to UTF-8.

Authentication

There are two pages in Redmine intended for configuring authentication. The first one is the Authentication tab on the Settings page, and the second one is the special LDAP authentication page, which can be found in the Administration menu. Let's discuss these pages in detail.

The Authentication tab

The next tab in the administration settings is Authentication. If the Authentication required setting is enabled, users won't be able to see the content of your Redmine installation without logging in first. The Autologin setting can be used to let your users keep themselves logged in for some period of time using their browsers. The Self-registration setting controls how user accounts are activated (the manual account activation option means that accounts must be enabled by administrators). The Allow users to delete their own account setting controls whether users are able to delete their accounts. The Minimum password length setting specifies the minimum size of the password in characters, and the Require password change after setting can be used to force users to change their passwords periodically. The Lost password setting controls whether users are able to restore their passwords when they, for example, have forgotten them. And finally, the Maximum number of additional email addresses setting specifies the number of additional e-mail addresses a user account may have. After a user logs in, Redmine opens a user session. The lifetime of such a session is controlled by the Session maximum lifetime setting (the value disabled means that the session lasts forever). A session can also be automatically terminated if the user has not been active for some time, which is controlled by the Session inactivity timeout setting (the value disabled means that the session never expires). Now, let's discuss the very special setting that we skipped.

The Allow OpenID login and registration setting

If you are running a public website with open registration, you perhaps know (or you will learn, if you want your Redmine installation to be public and open for user registration) that users do not like to register on each new site. This is understandable, as they do not want to create another password to remember or share their existing password with a new and therefore untrusted website. Besides, it's also a matter of sharing the e-mail address and, sometimes, remembering another login. That's when OpenID comes in handy.
OpenID is an open-standard authentication protocol in which authentication (password verification) is performed by an OpenID provider. This popular protocol is currently supported by many companies, such as Yahoo!, PayPal, AOL, LiveJournal, IBM, VeriSign, and WordPress. In other words, servers of such companies can act as OpenID providers, and therefore users can log in to Redmine using the accounts they have on these companies' websites if the Allow OpenID login and registration setting is enabled. Google used to support OpenID too, but it shut this down recently in favor of the OAuth 2.0-based OpenID Connect authentication protocol. Despite the use of "OpenID" in its name, OpenID Connect is very different from OpenID. So, if your Redmine installation is (or is going to be) public, consider enabling this setting. But note that to log in using this protocol, your users will need to specify an OpenID URL (the URL of the OpenID provider) in addition to their Login and Password on the Redmine login form.

LDAP authentication

Just as OpenID is convenient for public sites, to authenticate external users, LDAP is convenient for private sites, to authenticate corporate users. Like OpenID, LDAP is a standard that describes how to authenticate against a special LDAP directory server, and it is widely used by many applications, such as MediaWiki, Apache, JIRA, Samba, SugarCRM, and so on. Also, as LDAP is an open protocol, it is supported by some other directory servers, such as Microsoft Active Directory and Apple Open Directory. For this reason, it is often used by companies as a centralized user directory and an authentication server. To allow users to authenticate against an LDAP server, you should add it to the list of supported authentication modes on the LDAP authentication page, which is available in the Administration menu. To add a mode, click on the New authentication mode link; this will open the authentication mode form. If the On-the-fly user creation option is checked, user accounts will be created automatically when users log in to the system for the first time. If this option is not checked, users will have to be added manually beforehand. Also, if you check this option, you need to specify all the attributes in the Attributes box, as they are going to be used to import user details from the LDAP server. Check with your LDAP server administrator to find out what values should be used in this form. In Redmine, LDAP authentication can be performed against many LDAP servers. Every such server is represented as an authentication source in the authentication mode list that has just been mentioned. The corresponding source can also be seen in a user's profile and can even be changed to the internal Redmine authentication if needed.

Summary

I guess you have become a bit tired of all those general details, installations, configurations, integrations, and so on. In this article, we covered the first thing to fix after installation, the general settings, and authentication, focusing on the hard-to-understand settings and on those that need to be configured in some special way or have some obscurities.


Web Server Development

Packt
15 Apr 2016
24 min read
In this article by Holger Brunn, Alexandre Fayolle, and Daniel Eufémio Gago Reis, the authors of the book Odoo Development Cookbook, we discuss web server development in Odoo. In this article, we'll cover the following topics:

Make a path accessible from the network
Restrict access to web accessible paths
Consume parameters passed to your handlers
Modify an existing handler
Using the RPC API

(For more resources related to this topic, see here.)

Introduction

We'll introduce the basics of the web server part of Odoo in this article. Note that this article covers the fundamental pieces. All of Odoo's web request handling is driven by the Python library werkzeug (http://werkzeug.pocoo.org). While the complexity of werkzeug is mostly hidden by Odoo's convenient wrappers, it is an interesting read to see how things work under the hood.

Make a path accessible from the network

In this recipe, we'll see how to make a URL of the form http://yourserver/path1/path2 accessible to users. This can either be a web page or a path returning arbitrary data to be consumed by other programs. In the latter case, you would usually use the JSON format to consume parameters and to offer your data.

Getting ready

We'll make use of a ready-made library.book model. We want to allow any user to query the full list of books. Furthermore, we want to provide the same information to programs via a JSON request.

How to do it...

We'll need to add controllers, which go into a folder called controllers by convention.

Add a controllers/main.py file with the HTML version of our page:

```python
from openerp import http
from openerp.http import request


class Main(http.Controller):
    @http.route('/my_module/books', type='http', auth='none')
    def books(self):
        records = request.env['library.book'].sudo().search([])
        result = '<html><body><table><tr><td>'
        result += '</td></tr><tr><td>'.join(
            records.mapped('name'))
        result += '</td></tr></table></body></html>'
        return result
```

Add a function to serve the same information in the JSON format:

```python
    @http.route('/my_module/books/json', type='json', auth='none')
    def books_json(self):
        records = request.env['library.book'].sudo().search([])
        return records.read(['name'])
```

Add the file controllers/__init__.py:

```python
from . import main
```

Add the controllers to your addon's __init__.py:

```python
from . import controllers
```

After restarting your server, you can visit /my_module/books in your browser and be presented with a flat list of book names. To test the JSON-RPC part, you'll have to craft a JSON request. A simple way to do that is to use the following command line, which shows the output on the command line:

```
curl -i -X POST -H "Content-Type: application/json" -d "{}" localhost:8069/my_module/books/json
```

If you get 404 errors at this point, you probably have more than one database available on your instance. In this case, it's impossible for Odoo to determine which database is meant to serve the request. Use the --db-filter='^yourdatabasename$' parameter to force Odoo to use the exact database you installed the module in. Now the path should be accessible.

How it works...

The two crucial parts here are that our controller is derived from openerp.http.Controller and that the methods we use to serve content are decorated with openerp.http.route. Inheriting from openerp.http.Controller registers the controller with Odoo's routing system in a similar way as models are registered by inheriting from openerp.models.Model; Controller has a meta class that takes care of this.
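The same JSON endpoint can also be exercised from browser-side JavaScript instead of curl. The following is a sketch under the assumptions above (it mirrors the curl call, sending an empty JSON object as the body, and should be run from a page served by the same Odoo host to avoid same-origin restrictions):

```js
// A sketch mirroring the curl test above: POST an empty JSON body to the
// JSON route and log the result. Odoo wraps the handler's return value in
// a JSON-RPC response, so the book records end up in data.result.
fetch('http://localhost:8069/my_module/books/json', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: '{}'
})
  .then(function (response) { return response.json(); })
  .then(function (data) {
    console.log(data.result);
  });
```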
In general, paths handled by your addon should start with your addon's name to avoid name clashes. Of course, if you extend some addon's functionality, you'll use that addon's name.

openerp.http.route

The route decorator allows us to tell Odoo that a method is to be web accessible in the first place, and the first parameter determines on which path it is accessible. Instead of a string, you can also pass a list of strings, in case you use the same function to serve multiple paths. The type argument defaults to http and determines what type of request is to be served. While, strictly speaking, JSON is HTTP, declaring the second function as type='json' makes life a lot easier, because Odoo then handles type conversions itself. Don't worry about the auth parameter for now; it is addressed in the recipe Restrict access to web accessible paths.

Return values

Odoo's treatment of the functions' return values is determined by the type argument of the route decorator. For type='http', we usually want to deliver some HTML, so the first function simply returns a string containing it. An alternative is to use request.make_response(), which gives you control over the headers to send in the response. So, to indicate when our page was updated the last time, we might change the last line in books() to the following:

```python
        return request.make_response(
            result, [
                ('Last-modified', email.utils.formatdate(
                    (
                        fields.Datetime.from_string(
                            request.env['library.book'].sudo()
                            .search([], order='write_date desc', limit=1)
                            .write_date) -
                        datetime.datetime(1970, 1, 1)
                    ).total_seconds(),
                    usegmt=True)),
            ])
```

This code sends a Last-modified header along with the HTML we generated, telling the browser when the list was modified for the last time. We extract this information from the write_date field of the library.book model. In order for the preceding snippet to work, you'll have to add some imports to the top of the file:

```python
import email
import datetime

from openerp import fields
```

You can also create a werkzeug Response object manually and return that, but there's little gain for the effort.

Generating HTML manually is nice for demonstration purposes, but you should never do this in production code. Always use templates as appropriate and return them by calling request.render(). This will give you localization for free and make your code better by separating business logic from the presentation layer. Also, templates provide you with functions to escape data before outputting HTML. The preceding code is vulnerable to cross-site scripting attacks if a user manages to slip a script tag into a book name, for example.

For a JSON request, simply return the data structure you want to hand over to the client; Odoo takes care of serialization. For this to work, you should restrict yourself to data types that are JSON serializable, which are roughly dictionaries, lists, strings, floats, and integers.

openerp.http.request

The request object is a static object referring to the currently handled request, which contains everything you need to take useful action. Most important is the property request.env, which contains an Environment object that is just the same as self.env in models. This environment is bound to the current user, which is none in the preceding example because we used auth='none'. The lack of a user is also why we have to sudo() all our calls to model methods in the example code. If you're used to web development, you'll expect session handling, which is perfectly correct.
Use request.session for an OpenERPSession object (which is quite a thin wrapper around werkzeug's Session object), and request.session.sid to access the session ID. To store session values, just treat request.session as a dictionary:

```python
request.session['hello'] = 'world'
request.session.get('hello')
```

Note that storing data in the session is no different from using global variables. Use it only if you must; that is usually the case for multi-request actions, such as a checkout in the website_sale module. And also in this case, handle all functionality concerning sessions in your controllers, never in your modules.

There's more...

The route decorator can have some extra parameters to customize its behavior further. By default, all HTTP methods are allowed, and Odoo intermingles the parameters passed. Using the methods parameter, you can pass a list of methods to accept, which usually would be either ['GET'] or ['POST']. To allow cross-origin requests (browsers block AJAX and some other types of requests to domains other than where the script was loaded from, for security and privacy reasons), set the cors parameter to * to allow requests from all origins, or to some URI to restrict requests to ones originating from that URI. If this parameter is unset, which is the default, the Access-Control-Allow-Origin header is not set, leaving you with the browser's standard behavior. In our example, we might want to set it on /my_module/books/json in order to allow scripts pulled from other websites to access the list of books. By default, Odoo protects certain types of requests from an attack known as cross-site request forgery by passing a token along with every request. If you want to turn that off, set the parameter csrf to False, but note that this is a bad idea in general.

See also

If you host multiple Odoo databases on the same instance and each database has different web accessible paths on possibly multiple domain names per database, the standard regular expressions in the --db-filter parameter might not be enough to force the right database for every domain. In that case, use the community module dbfilter_from_header from https://github.com/OCA/server-tools in order to configure the database filters at the proxy level. To see how using templates makes modularity possible, see the recipe Modify an existing handler later in the article.

Restrict access to web accessible paths

We'll explore the three authentication mechanisms Odoo provides for routes in this recipe. We'll define routes with different authentication mechanisms in order to show their differences.

Getting ready

As we extend code from the previous recipe, we'll also depend on the library.book model, so you should get its code correct in order to proceed.
How to do it… Define handlers in controllers/main.py: Add a path that shows all books: @http.route('/my_module/all-books', type='http', auth='none') def all_books(self): records = request.env['library.book'].sudo().search([]) result = '<html><body><table><tr><td>' result += '</td></tr><tr><td>'.join( records.mapped('name')) result += '</td></tr></table></body></html>' return result Add a path that shows all books and indicates which was written by the current user, if any: @http.route('/my_module/all-books/mark-mine', type='http', auth='public') def all_books_mark_mine(self): records = request.env['library.book'].sudo().search([]) result = '<html><body><table>' for record in records: result += '<tr>' if record.author_ids & request.env.user.partner_id: result += '<th>' else: result += '<td>' result += record.name if record.author_ids & request.env.user.partner_id: result += '</th>' else: result += '</td>' result += '</tr>' result += '</table></body></html>' return result Add a path that shows the current user's books: @http.route('/my_module/all-books/mine', type='http', auth='user') def all_books_mine(self): records = request.env['library.book'].search([ ('author_ids', 'in', request.env.user.partner_id.ids), ]) result = '<html><body><table><tr><td>' result += '</td></tr><tr><td>'.join( records.mapped('name')) result += '</td></tr></table></body></html>' return result With this code, the paths /my_module/all_books and /my_module/all_books/mark_mine look the same for unauthenticated users, while a logged in user sees her books in a bold font on the latter path. The path /my_module/all-books/mine is not accessible at all for unauthenticated users. If you try to access it without being authenticated, you'll be redirected to the login screen in order to do so. How it works… The difference between authentication methods is basically what you can expect from the content of request.env.user. For auth='none', the user record is always empty, even if an authenticated user is accessing the path. Use this if you want to serve content that has no dependencies on users, or if you want to provide database agnostic functionality in a server wide module. The value auth='public' sets the user record to a special user with XML ID, base.public_user, for unauthenticated users, and to the user's record for authenticated ones. This is the right choice if you want to offer functionality to both unauthenticated and authenticated users, while the authenticated ones get some extras, as demonstrated in the preceding code. Use auth='user' to be sure that only authenticated users have access to what you've got to offer. With this method, you can be sure request.env.user points to some existing user. There's more… The magic for authentication methods happens in the ir.http model from the base addon. For whatever value you pass to the auth parameter in your route, Odoo searches for a function called _auth_method_<yourvalue> on this model, so you can easily customize this by inheriting this model and declaring a method that takes care of your authentication method of choice. 
As an example, we provide an authentication method base_group_user which enforces a currently logged in user who is a member of the group with XML ID, base.group_user: from openerp import exceptions, http, models from openerp.http import request class IrHttp(models.Model): _inherit = 'ir.http' def _auth_method_base_group_user(self): self._auth_method_user() if not request.env.user.has_group('base.group_user'): raise exceptions.AccessDenied() Now you can say auth='base_group_user' in your decorator and be sure that users running this route's handler are members of this group. With a little trickery you can extend this to auth='groups(xmlid1,…)', the implementation of this is left as an exercise to the reader, but is included in the example code. Consume parameters passed to your handlers It's nice to be able to show content, but it's better to show content as a result of some user input. This recipe will demonstrate the different ways to receive this input and react to it. As the recipes before, we'll make use of the library.book model. How to do it… First, we'll add a route that expects a traditional parameter with a book's ID to show some details about it. Then, we'll do the same, but we'll incorporate our parameter into the path itself: Add a path that expects a book's ID as parameter: @http.route('/my_module/book_details', type='http', auth='none') def book_details(self, book_id): record = request.env['library.book'].sudo().browse( int(book_id)) return u'<html><body><h1>%s</h1>Authors: %s' % ( record.name, u', '.join(record.author_ids.mapped( 'name')) or 'none', ) Add a path where we can pass the book's ID in the path @http.route("/my_module/book_details/<model('library.book') :book>", type='http', auth='none') def book_details_in_path(self, book): return self.book_details(book.id) If you point your browser to /my_module/book_details?book_id=1, you should see a detail page of the book with ID 1. If this doesn't exist, you'll receive an error page. The second handler allows you to go to /my_module/book_details/1 and view the same page. How it works… By default, Odoo (actually werkzeug) intermingles with GET and POST parameters and passes them as keyword argument to your handler. So by simply declaring your function as expecting a parameter called book_id, you introduce this parameter as either GET (the parameter in the URL) or POST (usually passed by forms with your handler as action) parameter. Given that we didn't add a default value for this parameter, the runtime will raise an error if you try to access this path without setting the parameter. The second example makes use of the fact that in a werkzeug environment, most paths are virtual anyway. So we can simply define our path as containing some input. In this case, we say we expect the ID of a library.book as the last component of the path. The name after the colon is the name of a keyword argument. Our function will be called with this parameter passed as keyword argument. Here, Odoo takes care of looking up this ID and delivering a browse record, which of course only works if the user accessing this path has appropriate permissions. Given that book is a browse record, we can simply recycle the first example's function by passing book.id as parameter book_id to give out the same content. There's more… Defining parameters within the path is a functionality delivered by werkzeug, which is called converters. 
The model converter is added by Odoo, which also defines the converter, models, that accepts a comma separated list of IDs and passes a record set containing those IDs to your handler. The beauty of converters is that the runtime coerces the parameters to the expected type, while you're on your own with normal keyword parameters. These are delivered as strings and you have to take care of the necessary type conversions yourself, as seen in the first example. Built-in werkzeug converters include int, float, and string, but also more intricate ones such as path, any, or uuid. You can look up their semantics at http://werkzeug.pocoo.org/docs/0.11/routing/#builtin-converters. See also Odoo's custom converters are defined in ir_http.py in the base module and registered in the _get_converters method of ir.http. As an exercise, you can create your own converter that allows you to visit the /my_module/book_details/Odoo+cookbook page to receive the details of this book (if you added it to your library before). Modify an existing handler When you install the website module, the path /website/info displays some information about your Odoo instance. In this recipe, we override this in order to change this information page's layout, but also to change what is displayed. Getting ready Install the website module and inspect the path /website/info. Now craft a new module that depends on website and uses the following code. How to do it… We'll have to adapt the existing template and override the existing handler: Override the qweb template in a file called views/templates.xml: <?xml version="1.0" encoding="UTF-8"?> <odoo> <template id="show_website_info" inherit_id="website.show_website_info"> <xpath expr="//dl[@t-foreach='apps']" position="replace"> <table class="table"> <tr t-foreach="apps" t-as="app"> <th> <a t-att-href="app.website"> <t t-esc="app.name" /></a> </th> <td><t t-esc="app.summary" /></td> </tr> </table> </xpath> </template> </odoo> Override the handler in a file called controllers/main.py: from openerp import http from openerp.addons.website.controllers.main import Website class Website(Website): @http.route() def website_info(self): result = super(Website, self).website_info() result.qcontext['apps'] = result.qcontext[ 'apps'].filtered( lambda x: x.name != 'website') return result Now when visiting the info page, we'll only see a filtered list of installed applications, and in a table as opposed to the original definition list. How it works In the first step, we override an existing QWeb template. In order to find out which that is, you'll have to consult the code of the original handler. Usually, it will end with the following command line, which tells you that you need to override template.name: return request.render('template.name', values) In our case, the handler uses a template called website.info, but this one is extended immediately by another template called website.show_website_info, so it's more convenient to override this one. Here, we replace the definition list showing installed apps with a table. In order to override the handler method, we must identify the class that defines the handler, which is openerp.addons.website.controllers.main.Website in this case. We import the class to be able to inherit from it. Now we override the method and change the data passed to the response. Note that what the overridden handler returns is a Response object and not a string of HTML as the previous recipes did for the sake of brevity. 
This object contains a reference to the template to be used and the values accessible to the template, but is only evaluated at the very end of the request. In general, there are three ways to change an existing handler: If it uses a QWeb template, the simplest way of changing it is to override the template. This is the right choice for layout changes and small logic changes. QWeb templates get a context passed, which is available in the response as the field qcontext. This usually is a dictionary where you can add or remove values to suit your needs. In the preceding example, we filter the list of apps to only contain apps which have a website set. If the handler receives parameters, you could also preprocess those in order to have the overridden handler behave the way you want. There's more… As seen in the preceding section, inheritance with controllers works slightly differently than model inheritance: You actually need a reference to the base class and use Python inheritance on it. Don't forget to decorate your new handler with the @http.route decorator; Odoo uses it as a marker for which methods are exposed to the network layer. If you omit the decorator, you actually make the handler's path inaccessible. The @http.route decorator itself behaves similarly to field declarations: every value you don't set will be derived from the decorator of the function you're overriding, so we don't have to repeat values we don't want to change. After receiving a response object from the function you override, you can do a lot more than just changing the QWeb context: You can add or remove HTTP headers by manipulating response.headers. If you want to render an entirely different template, you can set response.template. To detect if a response is based on QWeb in the first place, query response.is_qweb. The resulting HTML code is available by calling response.render(). Using the RPC API One of Odoo's strengths is its interoperability, which is helped by the fact that basically any functionality is available via JSON-RPC 2.0 and XMLRPC. In this recipe, we'll explore how to use both of them from client code. This interface also enables you to integrate Odoo with any other application. Making functionality available via any of the two protocols on the server side is explained in the There's more section of this recipe. We'll query a list of installed modules from the Odoo instance, so that we could show a list as the one displayed in the previous recipe in our own application or website. 
How to do it… The following code is not meant to run within Odoo, but as simple scripts: First, we query the list of installed modules via XMLRPC: #!/usr/bin/env python2 import xmlrpclib db = 'odoo9' user = 'admin' password = 'admin' uid = xmlrpclib.ServerProxy( 'http://localhost:8069/xmlrpc/2/common') .authenticate(db, user, password, {}) odoo = xmlrpclib.ServerProxy( 'http://localhost:8069/xmlrpc/2/object') installed_modules = odoo.execute_kw( db, uid, password, 'ir.module.module', 'search_read', [[('state', '=', 'installed')], ['name']], {'context': {'lang': 'fr_FR'}}) for module in installed_modules: print module['name'] Then we do the same with JSONRPC: import json import urllib2 db = 'odoo9' user = 'admin' password = 'admin' request = urllib2.Request( 'http://localhost:8069/web/session/authenticate', json.dumps({ 'jsonrpc': '2.0', 'params': { 'db': db, 'login': user, 'password': password, }, }), {'Content-type': 'application/json'}) result = urllib2.urlopen(request).read() result = json.loads(result) session_id = result['result']['session_id'] request = urllib2.Request( 'http://localhost:8069/web/dataset/call_kw', json.dumps({ 'jsonrpc': '2.0', 'params': { 'model': 'ir.module.module', 'method': 'search_read', 'args': [ [('state', '=', 'installed')], ['name'], ], 'kwargs': {'context': {'lang': 'fr_FR'}}, }, }), { 'X-Openerp-Session-Id': session_id, 'Content-type': 'application/json', }) result = urllib2.urlopen(request).read() result = json.loads(result) for module in result['result']: print module['name'] Both code snippets will print a list of installed modules, and because they pass a context that sets the language to French, the list will be in French if there are no translations available. How it works… Both snippets call the function search_read, which is very convenient because you can specify a search domain on the model you call, pass a list of fields you want to be returned, and receive the result in one request. In older versions of Odoo, you had to call search first to receive a list of IDs and then call read to actually read the data. search_read returns a list of dictionaries, with the keys being the names of the fields requested and the values the record's data. The ID field will always be transmitted, no matter if you requested it or not. Now, we need to look at the specifics of the two protocols. XMLRPC The XMLRPC API expects a user ID and a password for every call, which is why we need to fetch this ID via the method authenticate on the path /xmlrpc/2/common. If you already know the user's ID, you can skip this step. As soon as you know the user's ID, you can call any model's method by calling execute_kw on the path /xmlrpc/2/object. This method expects the database you want to execute the function on, the user's ID and password for authentication, then the model you want to call your function on, and then the function's name. The next two mandatory parameters are a list of positional arguments to your function, and a dictionary of keyword arguments. JSONRPC Don't be distracted by the size of the code example, that's because Python doesn't have built in support for JSONRPC. As soon as you've wrapped the urllib calls in some helper functions, the example will be as concise as the XMLRPC one. As JSONRPC is stateful, the first thing we have to do is to request a session at /web/session/authenticate. This function takes the database, the user's name, and their password. 
The crucial part here is that we record the session ID Odoo created, which we pass in the header X-Openerp-Session-Id to /web/dataset/call_kw. From there the function behaves the same as execute_kw in the XMLRPC example: we need to pass a model name and a function to call on it, then positional and keyword arguments. There's more… Both protocols allow you to call basically any function of your models. In case you don't want a function to be available via either interface, prepend its name with an underscore – Odoo won't expose those functions as RPC calls. Furthermore, you need to take care that your parameters, as well as the return values, are serializable for the protocol. To be sure, restrict yourself to scalar values, dictionaries, and lists. As you can do roughly the same with both protocols, it's up to you which one to use. This decision should be mainly driven by what your platform supports best. In a web context, you're generally better off with JSON, because Odoo allows JSON handlers to pass a CORS header conveniently (see the Make a path accessible from the network recipe for details). This is rather difficult with XMLRPC. Summary In this article, we saw how to get started with the web server architecture. Later on, we covered routes and controllers, their authentication, how handlers consume parameters, and how to use the RPC API, namely JSON-RPC and XML-RPC. Resources for Article: Further resources on this subject: Advanced React [article] Remote Authentication [article] ASP.Net Site Performance: Improving JavaScript Loading [article]

Finding Patterns in the Noise – Clustering and Unsupervised Learning

Packt
15 Apr 2016
17 min read
In this article by, Joseph J, author of Mastering Predictive Analytics with Python, we will cover one of the natural questions to ask about a dataset is if it contains groups. For example, if we examine financial markets as a time series of prices over time, are there groups of stocks that behave similarly over time? Likewise, in a set of customer financial transactions from an e-commerce business, are there user accounts distinguished by patterns of similar purchasing activity? By identifying groups using the methods described in this article, we can understand the data as a set of larger patterns rather than just individual points. These patterns can help in making high-level summaries at the outset of a predictive modeling project, or as an ongoing way to report on the shape of the data we are modeling. Likewise, the groupings produced can serve as insights themselves, or they can provide starting points for the models. For example, the group to which a datapoint is assigned can become a feature of this observation, adding additional information beyond its individual values. Additionally, we can potentially calculate statistics (such as mean and standard deviation) for other features within these groups, which may be more robust as model features than individual entries. (For more resources related to this topic, see here.) In contrast to the methods, grouping or clustering algorithms are known as unsupervised learning, meaning we have no response, such as a sale price or click-through rate, which is used to determine the optimal parameters of the algorithm. Rather, we identify similar datapoints, and as a secondary analysis might ask whether the clusters we identify share a common pattern in their responses (and thus suggest the cluster is useful in finding groups associated with the outcome we are interested in). The task of finding these groups, or clusters, has a few common ingredients that vary between algorithms. One is a notion of distance or similarity between items in the dataset, which will allow us to compare them. A second is the number of groups we wish to identify; this can be specified initially using domain knowledge, or determined by running an algorithm with different choices of initial groups to identify the best number of groups that describes a dataset, as judged by numerical variance within the groups. Finally, we need a way to measure the quality of the groups we've identified; this can be done either visually or through the statistics that we will cover. In this article we will dive into: How to normalize data for use in a clustering algorithm and to compute similarity measurements for both categorical and numerical data How to use k-means to identify an optimal number of clusters by examining the loss function How to use agglomerative clustering to identify clusters at different scales Using affinity propagation to automatically identify the number of clusters in a dataset How to use spectral methods to cluster data with nonlinear boundaries Similarity and distance The first step in clustering any new dataset is to decide how to compare the similarity (or dissimilarity) between items. Sometimes the choice is dictated by what kinds of similarity we are trying to measure, in others it is restricted by the properties of the dataset. 
In the following we illustrate several kinds of distance for numerical, categorical, time series, and set-based data—while this list is not exhaustive, it should cover many of the common use cases you will encounter in business analysis. We will also cover normalizations that may be needed for different data types prior to running clustering algorithms. Numerical distances Let's begin by looking at an example contained in the wine.data file. It contains a set of chemical measurements that describe the properties of different kinds of wines, and the class of quality (I-III) to which the wine is assigned (Forina, M. et al, PARVUS - An Extendible Package for Data Exploration, Classification and Correlation, Institute of Pharmaceutical and Food Analysis and Technologies, Via Brigata Salerno, 16147 Genoa, Italy.). Open the file in an iPython notebook and look at the first few rows: Notice that in this dataset we have no column descriptions. We need to parse these from the dataset description file wine.data. With the following code, we generate a regular expression that will match a header name (we match a pattern where a number followed by a parenthesis has a column name after it, as you can see in the list of column names listed in the file), and add these to an array of column names along with the first column, which is the class label of the wine (whether it belongs to category I-III). We then assign this list to the dataframe column names: Now that we have appended the column names, we can look at a summary of the dataset: How can we calculate a similarity between wines based on this data? One option would be to consider each of the wines as a point in a thirteen-dimensional space specified by its dimensions (for example, each of the properties other than the class). Since the resulting space has thirteen dimensions, we can't directly visualize the datapoints using a scatterplot to see if they are nearby, but we can calculate distances just the same as with a more familiar 2- or 3-dimensional space using the Euclidean distance formula, which is simply the length of the straight line between two points. This formula for this length can be used whether the points are in a 2-dimensional plot or a more complex space such as this example, and is given by: Here aand bare rows of the dataset and nis the number of columns. One feature of the Euclidean distance is that columns whose scale is much different from others can distort it. In our example, the values describing the magnesium content of each wine are ~100 times greater than the magnitude of features describing the alcohol content or ash percentage. If we were to calculate the distance between these datapoints, it would largely be determined by the magnesium concentration (as even small differences on this scale overwhelmingly determine the value of the distance calculation), rather than any of its other properties. While this might sometimes be desirable, in most applications we do not favour one feature over another and want to give equal weight to all columns. To get a fair distance comparison between these points, we need to normalize the columns so that they fall into the same numerical range (have similar maxima and minima values). We can do so using the scale()function in scikit-learn:   This function will subtract the mean value of a column from each element and then divide each point by the standard deviation of the column. 
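The scaling step itself appears only as a screenshot in the source material; a minimal sketch of what that call looks like is given below. The variable name wine_df and the assumption that the class label sits in the first column are ours:

import pandas as pd
from sklearn.preprocessing import scale

# drop the class label, keep the 13 chemical measurements
features = wine_df.iloc[:, 1:]

# scale() returns a plain numpy array, so wrap it back into a DataFrame
# before calling describe()
scaled = pd.DataFrame(scale(features), columns=features.columns)
print(scaled.describe())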
This normalization centers each column at 0 with variance 1, and in the case of normally distributed data this would make a standard normal distribution. Also note that the scale() function returns a numpy dataframe, which is why we must call dataframe on the output to use the pandas function describe(). Now that we've scaled the data, we can calculate Euclidean distances between the points: We've now converted our dataset of 178 rows and 13 columns into a square matrix, giving the distance between each of these rows. In other words, row I, column j in this matrix represents the Euclidean distance between rows I and j in our dataset. This 'distance matrix' is the input we will use for clustering inputs in the following section. If we just want to get a visual sense of how the points compare to each other, we could use multidimensional scaling (MDS)—Modern Multidimensional Scaling - Theory and Applications Borg, I., Groenen P., Springer Series in Statistics (1997), Nonmetric multidimensional scaling: a numerical method, Kruskal, J. Psychometrika, 29 (1964), and Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis, Kruskal, J. Psychometrika, 29, (1964)—to create a visualization. Multidimensional scaling attempts to find the set of lower dimensional coordinates (here, two dimensions) that best represents the distances in the higher dimensions of a dataset (here, the pairwise Euclidean distances we calculated from the 13 dimensions). It does this by minimizing the coordinates (x, y) according to the strain function: Strain(x1…..xn) = (1 – Sum(ijdij*<xi,xj>)2/Sum(ij(dij**2)Sumij<xi,x,j>**2))1/2 Where d are the distances we've calculated between points. In other words, we find coordinates that best capture the variation in the distance through the variation in dot product the coordinates. We can then plot the resulting coordinates, using the wine class to label points in the diagram. Note that the coordinates themselves have no interpretation (in fact, they could change each time we run the algorithm). Rather, it is the relative position of points that we are interested in: Given that there are many ways we could have calculated the distance between datapoints, is the Euclidean distance a good choice here? Visually, based on the multidimensional scaling plot, we can see there is separation between the classes based on the features we've used to calculate distance, so conceptually it appears that this is a reasonable choice in this case. However, the decision also depends on what we are trying to compare; if we are interested in detecting wines with similar attributes in absolute values, then it is a good metric. However, what if we're not interested so much in the absolute composition of the wine, but whether its variables follow similar trends among wines with different alcohol contents? In this case, we wouldn't be interested in the absolute difference in values, but rather the correlationbetween the columns. This sort of comparison is common for time series, which we turn to next. Correlations and time series For time series data, we are often concerned with whether the patterns between series exhibit the same variation over time, rather than their absolute differences in value. For example, if we were to compare stocks, we might want to identify groups of stocks whose prices move up and down in similar patterns over time. The absolute price is of less interest than this pattern of increase and decrease. 
Let's look at an example of the Dow Jones industrial average over time (Brown, M. S., Pelosi, M., and Dirska, H. (2013). Dynamic-radius Species-conserving Genetic Algorithm for the Financial Forecasting of Dow Jones Index Stocks and Machine Learning and Data Mining in Pattern Recognition, 7988, 27-41.): This data contains the daily stock price (for 6 months) for a set of 30 stocks. Because all of the numerical values (the prices) are on the same scale, we won't normalize this data as with the wine dimensions. We notice two things about this data. First, the closing price per week (the variable we will use to calculate correlation) is presented as a string. Second, the date is not in the current format for plotting. We will process both columns to fix this, converting the columns to a float and datetime object, respectively: With this transformation, we can now make a pivot table to place the closing prices for week as columns and individual stocks as rows: As we can see, we only need columns 2 and onwards to calculate correlations between rows. Let's calculate the correlation between these time series of stock prices by selecting the second column to end columns of the data frame, calculating the pairwise correlations distance metric, and visualizing it using MDS, as before: It is important to note that the Pearson coefficient, which we've calculated here, is a measure of linearcorrelation between these time series. In other words, it captures the linear increase (or decrease) of the trend in one price relative to another, but won't necessarily capture nonlinear trends. We can see this by looking at the formula for the Pearson correlation, which is given by: P(a,b) = cov(a,b)/sd(a)/sd(b) = Sum(a-mean(b))*Sum(b-mean(b))/Sqrt(Sum(a-mean(a))2* Sqrt(Sum(b-mean(b)) This value varies from 1 (highly correlated) to -1 (inversely correlated), with 0 representing no correlation (such as a cloud of points). You might recognize the numerator of this equation as the covariance, which is a measure of how much two datasets, a and b, vary with one another. You can understand this by considering that the numerator is maximized when corresponding points in both datasets are above or below their mean value. However, whether this accurately captures the similarity in the data depends upon the scale. In data that is distributed in regular intervals between a maximum and minimum, with roughly the same difference between consecutive values (which is essentially how a trend line appears), it captures this pattern well. However, consider a case in which the data is exponentially distributed, with orders of magnitude differences between the minimum and maximum, and the difference between consecutive datapoints also varyies widely. Here, the Pearson correlation would be numerically dominated by only the largest terms, which might or might not represent the overall similarity in the data. This numerical sensitivity also occurs in the numerator, which represents the product of the standard deviations of both datasets. Thus, the value of the correlation is maximized when the variation in the two datasets is roughly explained by the product of their individual variations; there is no 'left over' variation between the datasets that is not explained by their respective standard deviations. 
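Written out without the formatting loss, the Pearson coefficient is r(a, b) = sum((a_i - mean(a)) * (b_i - mean(b))) / (sqrt(sum((a_i - mean(a))^2)) * sqrt(sum((b_i - mean(b))^2))). The pairwise-correlation-plus-MDS step described above is likewise shown only as a screenshot in the source; a rough sketch of it might look like the following, where the prices pivot table and its column layout are assumptions on our part:

import numpy as np
from sklearn.manifold import MDS

# one row per ticker, one column per weekly closing price
weekly = prices.iloc[:, 1:].astype(float)

# Pearson correlation between every pair of stocks, turned into a distance
corr = np.corrcoef(weekly.values)
corr_dist = 1.0 - corr

# two-dimensional embedding of the precomputed distances, as before
coords = MDS(n_components=2, dissimilarity='precomputed',
             random_state=0).fit_transform(corr_dist)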
Looking at the first two stocks in this dataset, this assumption of linearity appears to be a valid one for comparing datapoints: In addition to verifying that these stocks have a roughly linear correlation, this command introduces some new functions in pandas you may find useful. The first is iloc, which allows you to select indexed rows from a dataframe. The second is transpose, which inverts the rows and columns. Here, we select the first two rows, transpose, and then select all rows (prices) after the first (since the first is the Ticker symbol) Despite the trend we see in this example, we could imagine a nonlinear trend between prices. In these cases, it might be better to measure, not the linear correlation of the prices themselves, but whether the high prices for one stock coincide with another. In other words, the rank of market days by price should be the same, even if the prices are nonlinearly related. We can also calculate this rank correlation, also known as the Spearman's Rho, using scipy, with the following formula: Rho(a,b) = 6 * sum(d^2) / n (n2-1) Where n is the number of datapoints in each of two sets a and b, and d is the difference in ranks between each pair of datapoints ai and bi. Because we only compare the ranks of the data, not their actual values, this measure can capture variations up and down between two datasets, even if they vary over wide numerical ranges. Let's see if plotting the results using the Spearman correlation metric generates any differences in the pairwise distance of the stocks: The Spearman correlation distances, based on the x and y axes, appear closer to each other, suggesting from the perspective of rank correlation that the time series appear more similar. Though they differ in their assumptions about how the two compared datasets are distributed numerically, Pearson and Spearman correlations share the requirement that the two sets are of the same length. This is usually a reasonable assumption, and will be true of most of the examples we consider in this book. However, for cases where we wish to compare time series of unequal lengths, we can use Dynamic Time Warping (DTW). Conceptually, the idea of DTW is to warp one time series to align with a second, by allowing us to open gaps in either dataset so that it becomes the same size as the second. What the algorithm needs to resolve is where the most similar areas of the two series are, so that gaps can be places in the appropriate locations. In the simplest implementation, DTW consists of the following steps: For a dataset a of length n and a dataset n of length m, construct a matrix m by n. Set the top row and the leftmost column of this matrix to both be infinity. For each point i in set a, and point j in set b, compare their similarity using a cost function. To this cost function, add the minimum of the element (i-1, j-1), (i-1, j), and (j-1, i)—moving up and left, left, or up). These conceptually represent the costs of opening a gap in one of the series, versus aligning the same element in both. At the end of step 3, we will have traced the minimum cost path to align the two series, and the DTW distance will be represented by the bottommost corner of the matrix, (n.m). A negative aspect of this algorithm is that step 3 involves computing a value for every element of series a and b. For large time series or large datasets, this can be computationally prohibitive. 
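As an aside, the rank-correlation formula quoted above lost some of its formatting; in its usual statement it reads rho(a, b) = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)). As for DTW, the three steps just listed translate almost directly into a naive implementation, and writing it out makes the cost of step 3 obvious: every cell of an n-by-m matrix must be filled in. The sketch below is illustrative only; it uses a plain absolute-difference cost, whereas the examples in this article rely on the fastdtw package:

import numpy as np

def dtw_distance(a, b):
    n, m = len(a), len(b)
    # steps 1 and 2: an (n+1) x (m+1) matrix with an "infinite" border
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # step 3: local cost of aligning a_i with b_j
            # add the cheapest of: aligning both points, or opening a gap in either series
            cost[i, j] = d + min(cost[i - 1, j - 1],
                                 cost[i - 1, j],
                                 cost[i, j - 1])
    # the bottom-right corner holds the DTW distance between the two series
    return cost[n, m]

This brute-force version is fine for short series, but its run time grows with the product of the two lengths, which is exactly the cost issue the optimized variants mentioned next work around.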
While a full discussion of algorithmic improvements is beyond the scope of our present examples, we refer interested readers to FastDTW (which we will use in our example) and SparseDTW as examples of improvements that can be evaluated using many fewer calculations (Al-Naymat, G., Chawla, S., & Taheri, J. (2012), SparseDTW: A Novel Approach to Speed up Dynamic Time Warping and Stan Salvador and Philip Chan, FastDTW: Toward Accurate Dynamic Time Warping in Linear Time and Space. KDD Workshop on Mining Temporal and Sequential Data, pages 70-80, 20043). We can use the FastDTW algorithm to compare the stocks data as well, and to plot the resulting coordinates. First we will compare pairwise each pair of stocks and record their DTW distance in a matrix: For computational efficiency (because the distance between i and j equals the distance between stocks j and i), we calculate only the upper triangle of this matrix. We then add the transpose (for example, the lower triangle) to this result to get the full distance matrix. Finally, we can use MDS again to plot the results: Compared to the distribution of coordinates along the x and y axis for Pearson correlation and rank correlation, the DTW distances appear to span a wider range, picking up more nuanced differences between the time series of stock prices. Now that we've looked at numerical and time series data, as a last example let's examine calculating similarity in categorical datasets. Summary In this section, we learned how to identify groups of similar items in a dataset, an exploratory analysis that we might frequently use as a first step in deciphering new datasets. We explored different ways of calculating the similarity between datapoints and described what kinds of data these metrics might best apply to. We examined both divisive clustering algorithms, which split the data into smaller components starting from a single group, and agglomerative methods, where every datapoint starts as its own cluster. Using a number of datasets, we showed examples where these algorithms will perform better or worse, and some ways to optimize them. We also saw our first (small) data pipeline, a clustering application in PySpark using streaming data. Resources for Article: Further resources on this subject: Python Data Structures[article] Big Data Analytics[article] Data Analytics[article]

Getting Started with Force.com

Packt
15 Apr 2016
17 min read
In this article by Siddhesh Kabe, the author of the book Salesforce Platform App Builder Certification Handbook, will introduce you to the Force.com platform. We will understand the life cycle of an application build using Force.com. We will define the multi-tenant architecture and understand how it will impact the data of the organization stored on the cloud. And finally, we will build our first application on Force.com. We will cover the following topics in this article: The multi-tenant architecture of Force.com Understanding the Force.com platform Application development on the Force.com platform Discussing the maintenance and releases schedule by Salesforce.com Types of Force.com applications Discussing the scenarios when to use point-and-click customization and when to use code Talking about Salesforce.com identity Developer resources So, let's get started and step into the cloud. (For more resources related to this topic, see here.) The cloud computing model of Force.com Force.com is a cloud computing platform used to build enterprise applications. The end user does not have to worry about networks, hardware, software licenses, or any other things. The data saved is completely secure in the cloud. The following features of Force.com make it a 100 percent cloud-based system: The multi-tenant architecture: The multi-tenant architecture is a way of serving multiple clients on the single software instance. Each client gets their own full version of the software configuration and data. They cannot utilize the other instance resources. The software is virtually partitioned into different instances. The basic structure of the multi-tenant architecture is shown in the following image: Just like how tenants in a single building share the resources of electricity and water, in the multi-tenant system, tenants share common resources and databases. In a multi-tenant system, such as Salesforce.com, different organizations use the same shared database system that is separated by a secure virtual partition. Special programs keep the data separated and make sure that no single organization monopolizes on the resources. Automatic upgrades: In a cloud computing system, all the new updates are automatically released to its subscribers. Any developments or customizations made during the previous version are automatically updated to the latest version without any manual modification to the code. This results in all instances of Salesforce staying up to date and on the same version. Subscription model: Force.com is distributed under the subscription model. The user can purchase a few licenses and build the system. After the system is up and successful, further user licenses can be purchased from Salesforce. This model ensures that there no large start up fees and we pay as we go, which adds fixed, predictable costs in the future. The subscription model can be visualized like the electricity distribution system. We pay for whatever electricity we use and not the complete generator and the infrastructure. Scalability: The multi-tenant kernel is already tested and running for many users simultaneously. If the organization is growing, there is always room for scaling the application with new users without worrying about the load balancing and data limitation. Force.com provides data storage per user, which means that the data storage increases with the number of users added to the organization. Upgrades and maintenance: Force.com releases three updated versions every year. 
The new releases consist of feature updates to Salesforce.com and the Force.com platform with selected top ideas from IdeaExchange. IdeaExchange is the community of Salesforce users where the users submit ideas and the community votes for them. The most popular ideas are considered by Salesforce in their next release. All the instances hosted on the servers are upgraded with no additional cost. The Salesforce maintenance outage during a major release is only 5 minutes. The sandboxes are upgraded early so there can be testing for compatibility with the new release. The new releases are backward compatible with previous releases, thus the old code will work with new versions. The upgrades are taken care of by Force.com and the end user gets the latest version running application. Understanding the new model of Salesforce1 platform In the earlier edition of the book, we discussed the Force.com platform in detail. In the last couple of years, Salesforce has introduced the new Salesforce1 platform. It encompasses all the existing features of the Force.com platform but also includes the new powerful tools for mobile development. The new Salesforce1 platform is build mobile first and all the existing features of cloud development are automatically available for mobiles. From Winter 16, Salesforce has also introduced the lighting experience. The lighting experience is another extension to the existing platform. It provides a brand new set of design and development library that lets developers build applications that work on mobiles as well as web. Let's take a detailed look at the services that form the platform offered by Force.com. The following section provides us with an overview of the Force.com platform. Force.com platform Force.com is the world's first cloud application development platform where end users can build, share, and run an application directly on the cloud. While most cloud computing systems provide the ability to deploy the code from the local machine, Force.com gives us the feature to directly write the code in cloud. The Force.com platform runs in a hosted multi-tenant environment, which gives the end users freedom to build their custom application without hardware purchases, database maintenance, and maintaining a software license. Salesforce.com provides the following main products: Sales force Automation, Sales Cloud Service and Support Center, Service Cloud The Exact Target Marketing Cloud Collaboration Cloud, Chatter The following screenshot shows the Force.com platform: The application built on Force.com is automatically hosted on the cloud platform. It can be used separately (without the standard Sales, Service, and Marketing cloud) or can be used in parallel with the existing Salesforce application. The users can access the application using a browser from any mobile, computer, tablet, and any of the operating system such as Windows, UNIX, Mac, and so on, giving them complete freedom of location. For a complete list of supported browsers, visit https://help.salesforce.com/apex/HTViewHelpDoc?id=getstart_browser_overview.htm. Model-View-Controller architecture The most efficient way to build an enterprise application is to clearly separate out the model, that is, data, the code, that is, controller, and the UI, that is, the View. By separating the three, we can make sure that each area is handled by an expert. The business logic is separated from the backend database and the frontend user interface.  
It is also easy to upgrade a part of the system without disturbing the entire structure. The following diagram illustrates the model-view-controller of Force.com: We will be looking in detail at each layer in the MVC architecture in the subsequent article. Key technology behind the Force.com platform Force.com is a hosted multi-tenant service used to build a custom cloud computing application. It is a 100 percent cloud platform where we pay no extra cost for the hardware and network. Any application built on Force.com is directly hosted on the cloud and can be accessed using a simple browser from a computer or a mobile. The Force.com platform runs on some basic key technologies. The multi-tenant kernel The base of the platform forms a multi-tenant kernel where all users share a common code base and physical infrastructure. The multiple tenants, who are hosted on a shared server, share the resources under governor limits to prevent a single instance monopolizing the resources. The custom code and data are separated by software virtualization and users cannot access each other's code. The multi-tenant kernel ensures that all the instances are updated to the latest version of the software simultaneously. The updates are applied automatically without any patches or software download. The multi-tenant architecture is already live for one million users. This helps developers easily scale the applications from one to a million users with little or no modification at all. The following image illustrates the multi-tenant architecture: Traditional software systems are hosted on a single-tenant system, usually a client-server-based enterprise application. With the multi-tenant architecture, the end user does not have to worry about the hardware layer or software upgrade and patches. The software system deployed over the Internet can be accessed using a browser from any location possible, even wide ranges of mobile devices. The multi-tenant architecture also allows the applications to be low cost, quick to deploy, and open to innovation. Other examples of software using the multi-tenant architecture are webmail systems, such as www.Gmail.com, www.Yahoo.com, and online storage systems, such as www.Dropbox.com, or note-taking applications, such as Evernote, Springpad, and so on. Force.com metadata Force.com is entirely metadata-driven. The metadata is defined in XML and can be extracted and imported. We will look into metadata in detail later in this article. Force.com Webservice API The data and the metadata stored on the Force.com server can be accessed programmatically through the Webservice API. This enables the developers to extend the functionality to virtually any language, operating system, and platform possible. The web services are based on open web standards, such as SOAP XML and JSON REST, and are directly compatible with other technologies, such as .Net, JAVA, SAP, and Oracle. We can easily integrate the Force.com application with the current business application without rewriting the entire code. Apex and Visualforce Apex is the world's first on-demand language introduced by Salesforce. It is an object-oriented language very similar to C# or JAVA. Apex is specially designed to process bulk data for business applications. Apex is used to write the controller in the MVC architecture. Salesforce Object Query Language (SOQL) gives developers an easy and declarative query language that can fetch and process a large amount of data in an easy, human-readable query language. 
For those who have used other relational database systems, such as Oracle, SQL Server, and so on, it is similar to SQL but does not support advance capabilities, such as joins. Apex and SOQL together give the developers powerful tools to manage the data and processes of their application, leaving the rest of the overhead on the Force.com platform. The following screenshot shows the page editor for Visualforce. It is easy to use and splits a page into two parts: the one at the bottom is for development and the above half shows the output: Visualforce is an easy to use, yet a powerful framework used to create rich user interfaces, thus extending the standard tabs and forms to any kind of interfaces imaginable. Visualforce ultimately renders into HTML, and hence, we can use any HTML code alongside the Visualforce markup to create a powerful and rich UI to manage business applications. Apart from the UI, Visualforce provides a very easy and direct access to the server-side data and metadata from Apex. The powerful combination of a rich UI with access to the Salesforce metadata makes Visualforce the ultimate solution to build powerful business applications on Salesforce. As the Salesforce.com Certified Force.com Developer Certification does not include Apex and Visualforce, we won't be going into detail about Apex and Visualforce in this book. The developer Console The developer console is an Integrated Development Environment (IDE) for tools to help write code, run tests, and debug the system. The developer console provides an editor for writing code. It also provides a UI to monitor and debug Unit test classes, as shown in the following screenshot: AppExchange AppExchange is the directory of applications build on the Force.com platform. Developers can choose to submit their developed applications on AppExchange. The applications extend the functionality of Force.com beyond CRM with many ready-made business applications available to download and use. AppExchange is available at http://appexchange.salesforce.com. Force.com sites Using Force.com sites or site.com, we can build public facing websites that use the existing Salesforce data and browser technologies, such as HTML, JavaScript, CSS, Angular JS, Bootstrap, and so on. The sites can have an external login for sensitive data or a no login public portal that can be linked to the corporate website as well. Site.com helps in creating websites using drag-and-drop controls. The user with less or no HTML knowledge can build websites using the site.com editor. Force.com development Like any other traditional software development process, the Force.com platform offers tools used to define data, business process, logic, and rich UI for the business application. Many of these tools are built-in, point-and-click tools simplified for users without any development skills. Any user with no programming knowledge can build applications suitable to their business on Force.com. The point-and-click tools are easy to use, but they have limitations and control. To extend the platform beyond these limitations, we use Apex and Visualforce. 
Let's now compare the tools used for traditional software development and Force.com:   JAVA Dot Net Force.com Building the database Oracle, MS-Access, SQL, or any third-party database setup Oracle, MS-Access, SQL, or any third-party database setup Salesforce metadata (now database.com) Connection to the database JDBC   Ado.net Salesforce metadata API Developing the IDE NetBeans, Eclipse, and so on Visual Studio Online Page Editor and App Setup, Force.com IDE, Mavens Mate, and Aside.io Controlled environment for development and testing Local servers, remote test servers Local servers, remote test servers Force.com real time sandboxes Force.com metadata Everything on Force.com such as data models, objects, forms, tabs, and workflows are defined by metadata. The definitions or metadata are made in XML and can be extracted and imported. The metadata-driven development also helps users with no prior development experience to build business applications without any need to code. We can define the objects, tabs, and forms in the UI using point-and-click. All the changes made to the metadata in App-Setup are tracked. Alternatively, the developers can customize every part of Salesforce using XML flies that control the organization's metadata. The files are downloaded using the Eclipse IDE or Force.com IDE. To customize metadata on Salesforce UI, go to Setup | Build: As Force.com Developer Certification is about using point-and-click, we will be going into the setup details in the coming article. Metadata API The metadata API provides easy access to the organization data, business logic, and the user interface. We can modify the metadata in a controlled test organization called the sandbox. Finally, the tested changes can be deployed to a live production environment edition. The production environment is the live environment that is used by the users and contains live data. The production instance does not allow developers to code in them directly; this ensures that only debugged and tested code reaches the live organization. Online page editor and the Eclipse Force.com IDE Force.com provides a built-in online editor that helps edit the Visualforce pages in real time. The online editor can be enabled by checking the Development Mode checkbox on the user profile, as shown in the following screenshot: The online page editor splits the screen into two parts with live code in the bottom half and the final page output in the top half. Force.com also provides an inline editor for editing the Apex code in the browser itself. Force.com IDE is an IDE built over eclipse. It provides an easy environment to write code and also offline saving. It also comes with a schema browser and a query generator, which helps in generating simple queries (select statements) by selecting fields and objects. The code is auto synced with the organization. Sandboxes Force.com provides a real-time environment to develop, test, and train people in the organization. It is a safe and isolated environment where any changes made will not affect the production data or application. These sandboxes are used to experiment on new features without disturbing the live production organization. Separation of test and dev instances also ensures that only the tested and verified code reaches the production organization. There are three types of sandboxes: Developer sandbox: This environment is specially used to code and test the environment by a single developer. 
Just like the configuration-only sandbox, this also copies the entire customization of the production organization, excluding the data. The added feature of a developer sandbox is that it allows Apex and Visualforce coding also. Developer pro sandbox: Developer pro sandboxes are similar to developer sandboxes but with larger storage. This sandbox is mostly used to handle more developer and quality assurance tasks. With a larger sandbox, we can store more data and run more efficient tasks. Partial copy sandbox: This is used as a testing environment. This environment copies the full metadata of the production environment and a subset of production data that can be set using a template. Full copy sandbox: This copies the entire production organization and all its data records, documents, and attachments. This is usually used to develop and test a new application until it is ready to be shared with the users. Full copy sandbox has the same IDs of the records as that of production only when it has been freshly created. Force.com application types There are some common types of applications that are required to automate an enterprise process. They are as follows: Content centric applications: These applications enable organizations to share and version content across different levels. They consist of file sharing systems, versioning systems, and content management systems. Transaction centric applications: These applications focus on the transaction. They are applications, such as banking systems, online payment systems, and so on. Process centric applications: These applications focus on automating the business process in the organization such as a bug tracking system, procurement process, approval process, and so on. Force.com is suited to build these kinds of applications. Data centric applications: These applications are built around a powerful database. Many of the organizations use spreadsheets for these applications. Some examples include CRM, HRM, and so on. Force.com is suited to build these kinds of applications. Developing on the Force.com platform There are two ways of development on Force.com: one way is to use point-and-click without a single line of coding, called the declarative development. The other way is to develop an application using code, called programmatic development. Let's take a look at the two types of development in detail. Declarative development The declarative type of development is done by point-and-click using a browser. We use ready-to-use components and modify their configuration to build applications. We can add new objects, define their standard views, and create input forms with simple point-and-link with no coding knowledge. The declarative framework allows rapid development and deployment of applications. The declarative development also follows the MVC architecture in development. The MVC components in the declarative development using Force.com are mentioned in the following table: Model View Controller Objects Fields Relationships   Applications Tabs Page layouts Record types Workflow rules Validation rules Assignment rules   Summary In this article, we became familiar with the Force.com platform. We have seen the life cycle of an application build using Force.com. We saw the multi-tenant architecture and how it is different from the web hosting server. We have a fresh new developer account, and now in further article, we will be using it to build an application on Force.com. 
Resources for Article: Further resources on this subject: Custom Coding with Apex [article] Auto updating child records in Process Builder [article] Configuration in Salesforce CRM [article]

Building Our First Poky Image for the Raspberry Pi

Packt
14 Apr 2016
12 min read
In this article by Pierre-Jean Texier, the author of the book Yocto for Raspberry Pi, we cover the basic concepts of the Poky workflow. Using the Linux command line, we will go through the different steps to download, configure, and prepare the Poky Raspberry Pi environment, and generate an image that can be used by the target.

(For more resources related to this topic, see here.)

Installing the required packages for the host system

The steps necessary for the configuration of the host system depend on the Linux distribution used. It is advisable to use one of the Linux distributions maintained and supported by Poky, to avoid wasting time and energy in setting up the host system. Currently, the Yocto project is supported on the following distributions:

Ubuntu 12.04 (LTS)
Ubuntu 13.10
Ubuntu 14.04 (LTS)
Fedora release 19 (Schrödinger's Cat)
Fedora release 21
CentOS release 6.4
CentOS release 7.0
Debian GNU/Linux 7.0 (Wheezy)
Debian GNU/Linux 7.1 (Wheezy)
Debian GNU/Linux 7.2 (Wheezy)
Debian GNU/Linux 7.3 (Wheezy)
Debian GNU/Linux 7.4 (Wheezy)
Debian GNU/Linux 7.5 (Wheezy)
Debian GNU/Linux 7.6 (Wheezy)
openSUSE 12.2
openSUSE 12.3
openSUSE 13.1

Even if your distribution is not listed here, it does not mean that Poky will not work, but the outcome cannot be guaranteed. If you want more information about Linux distributions, you can visit this link: http://www.yoctoproject.org/docs/current/ref-manual/ref-manual.html

Poky on Ubuntu

The following list shows the required packages by function, given a supported Ubuntu or Debian Linux distribution. The dependencies for a compatible environment include:

Download tools: wget and git-core
Decompression tools: unzip and tar
Compilation tools: gcc-multilib, build-essential, and chrpath
String-manipulation tools: sed and gawk
Document-management tools: texinfo, xsltproc, docbook-utils, fop, dblatex, and xmlto
Patch-management tools: patch and diffstat

Here is the command to type on a headless system:

$ sudo apt-get install gawk wget git-core diffstat unzip texinfo gcc-multilib build-essential chrpath

Poky on Fedora

If you want to use Fedora, you just have to type this command:

$ sudo yum install gawk make wget tar bzip2 gzip python unzip perl patch diffutils diffstat git cpp gcc gcc-c++ glibc-devel texinfo chrpath ccache perl-Data-Dumper perl-Text-ParseWords perl-Thread-Queue socat

Downloading the Poky metadata

After installing all the necessary packages, it is time to download the Poky sources. This is done through the git tool, as follows:

$ git clone git://git.yoctoproject.org/poky (branch master)

Another method is to download a tar.bz2 file directly from this repository: https://www.yoctoproject.org/downloads

To avoid hazardous and problematic manipulations, it is strongly recommended to create and switch to a specific local branch. Use these commands:

$ cd poky
$ git checkout daisy -b work_branch

Downloading the Raspberry Pi BSP metadata

At this stage, we only have the base of the reference system (Poky), and we have no support for the Broadcom BCM SoC.
Basically, the BSP proposed by Poky only offers the following targets:

$ ls meta/conf/machine/*.conf
beaglebone.conf edgerouter.conf genericx86-64.conf genericx86.conf mpc8315e-rdb.conf

This is in addition to those provided by OE-Core:

$ ls meta/conf/machine/*.conf
qemuarm64.conf qemuarm.conf qemumips64.conf qemumips.conf qemuppc.conf qemux86-64.conf qemux86.conf

In order to generate a compatible system for our target, download the specific layer (the BSP layer) for the Raspberry Pi:

$ git clone git://git.yoctoproject.org/meta-raspberrypi

If you want to learn more about git scm, you can visit the official website: http://git-scm.com/

Now we can verify whether we have the configuration metadata for our platform (the raspberrypi.conf file):

$ ls meta-raspberrypi/conf/machine/*.conf
raspberrypi.conf

This screenshot shows the meta-raspberrypi folder:

The examples and code presented in this article use Yocto Project version 1.7 and Poky version 12.0. For reference, the codename is Dizzy.

Now that we have our environment freshly downloaded, we can proceed with its initialization and the configuration of our image through various configuration files.

The oe-init-build-env script

As can be seen in the screenshot, the Poky directory contains a script named oe-init-build-env. This is a script for the configuration/initialization of the build environment. It is not intended to be executed but must be "sourced". Its job, among other things, is to initialize a certain number of environment variables and place you in the build directory given as its argument. The script must be run as shown here:

$ source oe-init-build-env [build-directory]

Here, build-directory is an optional parameter for the name of the directory where the environment is set up (for example, we can use several build directories in a single Poky source tree); if it is not given, it defaults to build. The build-directory folder is the place where we perform the builds. In order to standardize the steps, we will use the following command throughout to initialize our environment:

$ source oe-init-build-env rpi-build
### Shell environment set up for builds. ###
You can now run 'bitbake <target>'
Common targets are:
    core-image-minimal
    core-image-sato
    meta-toolchain
    adt-installer
    meta-ide-support
You can also run generated qemu images with a command like 'runqemu qemux86'

When we initialize a build environment, it creates a directory (the conf directory) inside rpi-build. This folder contains two important files:

local.conf: It contains parameters to configure BitBake behavior.
bblayers.conf: It lists the different layers that BitBake takes into account. This list is assigned to the BBLAYERS variable.

Editing the local.conf file

The local.conf file under rpi-build/conf/ can configure every aspect of the build process. It is through this file that we can choose the target machine (the MACHINE variable), the distribution (the DISTRO variable), the type of package (the PACKAGE_CLASSES variable), and the host configuration (PARALLEL_MAKE, for example). The minimal set of variables we have to change from the default is the following:

BB_NUMBER_THREADS ?= "${@oe.utils.cpu_count()}"
PARALLEL_MAKE ?= "-j ${@oe.utils.cpu_count()}"
MACHINE ?= "raspberrypi"

The BB_NUMBER_THREADS variable determines the number of tasks that BitBake will perform in parallel (tasks under Yocto; we're not necessarily talking about compilation).
By default, in build/conf/local.conf, this variable is initialized with ${@oe.utils.cpu_count()}, corresponding to the number of cores detected on the host system (/proc/cpuinfo).

The PARALLEL_MAKE variable corresponds to the -j option of make, which specifies the number of processes that GNU Make can run in parallel on a compilation task. Again, it is the number of cores present that defines the default value used.

The MACHINE variable is where we set the target machine we wish to build for (defined in a .conf file; in our case, it is raspberrypi.conf).

Editing the bblayers.conf file

Now, we still have to add the specific layer for our target. This will have the effect of making the recipes from this layer available to our build. Therefore, we should edit the build/conf/bblayers.conf file:

# LAYER_CONF_VERSION is increased each time build/conf/bblayers.conf
# changes incompatibly
LCONF_VERSION = "6"
BBPATH = "${TOPDIR}"
BBFILES ?= ""
BBLAYERS ?= " \
  /home/packt/RASPBERRYPI/poky/meta \
  /home/packt/RASPBERRYPI/poky/meta-yocto \
  /home/packt/RASPBERRYPI/poky/meta-yocto-bsp \
  "
BBLAYERS_NON_REMOVABLE ?= " \
  /home/packt/RASPBERRYPI/poky/meta \
  /home/packt/RASPBERRYPI/poky/meta-yocto \
  "

Add the meta-raspberrypi layer, so that the file becomes:

# LAYER_CONF_VERSION is increased each time build/conf/bblayers.conf
# changes incompatibly
LCONF_VERSION = "6"
BBPATH = "${TOPDIR}"
BBFILES ?= ""
BBLAYERS ?= " \
  /home/packt/RASPBERRYPI/poky/meta \
  /home/packt/RASPBERRYPI/poky/meta-yocto \
  /home/packt/RASPBERRYPI/poky/meta-yocto-bsp \
  /home/packt/RASPBERRYPI/poky/meta-raspberrypi \
  "
BBLAYERS_NON_REMOVABLE ?= " \
  /home/packt/RASPBERRYPI/poky/meta \
  /home/packt/RASPBERRYPI/poky/meta-yocto \
  "

Naturally, you have to adapt the absolute path (/home/packt/RASPBERRYPI here) depending on your own installation.

Building the Poky image

At this stage, we have to look at the available images and check whether they are compatible with our platform (.bb files).

Choosing the image

Poky provides several predesigned image recipes that we can use to build our own binary image. We can check the list of available images by running the following command from the poky directory:

$ ls meta*/recipes*/images/*.bb

All the recipes provide images which are, in essence, sets of unpacked and configured packages, generating a filesystem that we can use on actual hardware (for further information about the different images, you can visit http://www.yoctoproject.org/docs/latest/mega-manual/mega-manual.html#ref-images). Here is a small representation of the available images:

To these we can add the images proposed by meta-raspberrypi:

$ ls meta-raspberrypi/recipes-core/images/*.bb
rpi-basic-image.bb rpi-hwup-image.bb rpi-test-image.bb

Here is an explanation of the images:

rpi-hwup-image.bb: This is an image based on core-image-minimal.
rpi-basic-image.bb: This is an image based on rpi-hwup-image.bb, with some added features (a splash screen).
rpi-test-image.bb: This is an image based on rpi-basic-image.bb, which includes some packages present in meta-raspberrypi.

We will take one of these three recipes for the rest of this article. Note that these files (.bb) describe recipes, like all the others. These are organized logically, and here, we have the ones for creating an image for the Raspberry Pi.
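These image recipes can also serve as a starting point for your own. As a minimal sketch, assuming a custom layer that is listed in bblayers.conf (the recipe name and the extra package names below are only illustrative, not part of meta-raspberrypi):

# my-rpi-image.bb: example image recipe built on top of rpi-basic-image
# Reuse the whole definition of the basic image
require recipes-core/images/rpi-basic-image.bb

# Add a few extra packages on top of it (package names are examples)
IMAGE_INSTALL += "nano openssh"

Once the layer containing it is registered, bitbake my-rpi-image would build it in exactly the same way as the stock images.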
Running BitBake

At this point, what remains for us is to start the build engine BitBake, which will parse all the recipes that make up the image passed as a parameter (as an initial example, we can take rpi-basic-image):

$ bitbake rpi-basic-image
Loading cache: 100% |############################################################| ETA:  00:00:00
Loaded 1352 entries from dependency cache.
NOTE: Resolving any missing task queue dependencies

Build Configuration:
BB_VERSION        = "1.25.0"
BUILD_SYS         = "x86_64-linux"
NATIVELSBSTRING   = "Ubuntu-14.04"
TARGET_SYS        = "arm-poky-linux-gnueabi"
MACHINE           = "raspberrypi"
DISTRO            = "poky"
DISTRO_VERSION    = "1.7"
TUNE_FEATURES     = "arm armv6 vfp"
TARGET_FPU        = "vfp"
meta
meta-yocto
meta-yocto-bsp    = "master:08d3f44d784e06f461b7d83ae9262566f1cf09e4"
meta-raspberrypi  = "master:6c6f44136f7e1c97bc45be118a48bd9b1fef1072"

NOTE: Preparing RunQueue
NOTE: Executing SetScene Tasks
NOTE: Executing RunQueue Tasks

Once launched, BitBake begins by browsing all the (.bb and .bbclass) files that the environment provides access to and stores the information in a cache. Because the parser of BitBake is parallelized, the first execution will always be a little longer, because it has to build the cache (only about a few seconds longer). However, subsequent executions will be almost instantaneous, because BitBake will load the cache. As we can see from the previous command, before executing the task list, BitBake displays a trace that details the versions used (target, version, OS, and so on). Finally, BitBake starts the execution of the tasks and shows us the progress. Depending on your setup, you can go drink some coffee or even eat some pizza. Usually after this, if all goes well, you will be pleased to see that the tmp/ subdirectory of the build directory (rpi-build) has been populated. The build directory (rpi-build) contains about 20 GB after the creation of the image.

After a few hours of baking, we can rejoice with the result and the creation of the system image for our target:

$ ls rpi-build/tmp/deploy/images/raspberrypi/*sdimg
rpi-basic-image-raspberrypi.rpi-sdimg

This is the file that we will use to create our bootable SD card.

Creating a bootable SD card

Now that our environment is complete, you can create a bootable SD card with the following command (remember to change /dev/sdX to the proper device name and be careful not to kill your hard disk by selecting the wrong device name):

$ sudo dd if=rpi-basic-image-raspberrypi.rpi-sdimg of=/dev/sdX bs=1M

Once the copying is complete, you can check whether the operation was successful using the following command (look at mmcblk0):

$ lsblk
NAME        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
mmcblk0     179:0    0   3,7G  0 disk
├─mmcblk0p1 179:1    0    20M  0 part /media/packt/raspberrypi
└─mmcblk0p2 179:2    0   108M  0 part /media/packt/f075d6df-d8b8-4e85-a2e4-36f3d4035c3c

You can also look at the left-hand side of your interface:
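Since dd silently overwrites whatever device it is given, it can be worth wrapping the command above with a couple of safety steps; a small sketch, where /dev/sdX is again only a placeholder:

# Double-check which device really is the SD card before writing
lsblk

# Make sure none of the card's partitions are mounted
sudo umount /dev/sdX1 /dev/sdX2

# Write the image, then flush the buffers before removing the card
sudo dd if=rpi-basic-image-raspberrypi.rpi-sdimg of=/dev/sdX bs=1M
sync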
Booting the image on the Raspberry Pi

This is surely the most anticipated moment of this article: the moment where we boot our Raspberry Pi with a fresh Poky image.

You just have to insert your SD card in the slot, connect the HDMI cable to your monitor, and connect the power supply (it is also recommended to use a mouse and a keyboard to shut down the device, unless you plan on just pulling the plug and possibly corrupting the boot partition). After connecting the power supply, you should see the Raspberry Pi splash screen:

The login for the Yocto/Poky distribution is root.

Summary

In this article, we learned the steps needed to set up Poky and get our first image built. We ran that image on the Raspberry Pi, which gave us a good overview of the available capabilities.

Resources for Article:

Further resources on this subject: Programming on Raspbian [article] Working with a Webcam and Pi Camera [article] Creating a Supercomputer [article]

Step Detector and Step Counter Sensors

Packt
14 Apr 2016
13 min read
In this article by Varun Nagpal, author of the book Android Sensor Programming By Example, we will focus on learning about the use of the step detector and step counter sensors. These sensors are very similar to each other and are used to count steps. Both sensors are based on a common hardware sensor, which internally uses the accelerometer, but Android still treats them as logically separate sensors. Both of these sensors are highly battery optimized and consume very little power. Now, let's look at each individual sensor in detail.

(For more resources related to this topic, see here.)

The step counter sensor

The step counter sensor is used to get the total number of steps taken by the user since the last reboot (power on) of the phone. When the phone is restarted, the value of the step counter sensor is reset to zero. In the onSensorChanged() method, the number of steps is given by event.values[0]; although it's a float value, the fractional part is always zero. The event timestamp represents the time at which the last step was taken. This sensor is especially useful for those applications that don't want to run in the background and maintain the history of steps themselves. This sensor works in batches and in continuous mode. If we specify 0 or no latency in the SensorManager.registerListener() method, then it works in continuous mode; otherwise, if we specify any latency, then it groups the events in batches and reports them at the specified latency. For prolonged usage of this sensor, it's recommended to use the batch mode, as it saves power. The step counter uses the on-change reporting mode, which means it reports the event as soon as there is a change in the value.

The step detector sensor

The step detector sensor triggers an event each time a step is taken by the user. The value reported in the onSensorChanged() method is always one, the fractional part being always zero, and the event timestamp is the time when the user's foot hit the ground. The step detector sensor has very low latency in reporting the steps, which is generally within 1 to 2 seconds. The step detector sensor has lower accuracy and produces more false positives compared to the step counter sensor. The step counter sensor is more accurate, but has more latency in reporting the steps, as it uses this extra time after each step to remove any false positive values. The step detector sensor is recommended for those applications that want to track steps in real time and want to maintain their own history of each and every step with its timestamp.

Time for action – using the step counter sensor in activity

Now, you will learn how to use the step counter sensor with a simple example. The good thing about the step counter is that, unlike other sensors, your app doesn't need to tell the sensor when to start counting the steps and when to stop counting them. It automatically starts counting as soon as the phone is powered on.
For using it, we just have to register the listener with the sensor manager and then unregister it after using it. In the following example, we will show the total number of steps taken by the user since the last reboot (power on) of the phone in the Android activity. We created a PedometerActivity and implemented it with the SensorEventListener interface, so that it can receive the sensor events. We initiated the SensorManager and Sensor object of the step counter and also checked the sensor availability in the OnCreate() method of the activity. We registered the listener in the onResume() method and unregistered it in the onPause() method as a standard practice. We used a TextView to display the total number of steps taken and update its latest value in the onSensorChanged() method. public class PedometerActivity extends Activity implements SensorEventListener{ private SensorManager mSensorManager; private Sensor mSensor; private boolean isSensorPresent = false; private TextView mStepsSinceReboot; @Override protected void onCreate(Bundle savedInstanceState) { super.onCreate(savedInstanceState); setContentView(R.layout.activity_pedometer); mStepsSinceReboot = (TextView)findViewById(R.id.stepssincereboot); mSensorManager = (SensorManager) this.getSystemService(Context.SENSOR_SERVICE); if(mSensorManager.getDefaultSensor(Sensor.TYPE_STEP_COUNTER) != null) { mSensor = mSensorManager.getDefaultSensor(Sensor.TYPE_STEP_COUNTER); isSensorPresent = true; } else { isSensorPresent = false; } } @Override protected void onResume() { super.onResume(); if(isSensorPresent) { mSensorManager.registerListener(this, mSensor, SensorManager.SENSOR_DELAY_NORMAL); } } @Override protected void onPause() { super.onPause(); if(isSensorPresent) { mSensorManager.unregisterListener(this); } } @Override public void onSensorChanged(SensorEvent event) { mStepsSinceReboot.setText(String.valueOf(event.values[0])); } Time for action – maintaining step history with step detector sensor The Step counter sensor works well when we have to deal with the total number of steps taken by the user since the last reboot (power on) of the phone. It doesn't solve the purpose when we have to maintain history of each and every step taken by the user. The Step counter sensor may combine some steps and process them together, and it will only update with an aggregated count instead of reporting individual step detail. For such cases, the step detector sensor is the right choice. In our next example, we will use the step detector sensor to store the details of each step taken by the user, and we will show the total number of steps for each day, since the application was installed. Our next example will consist of three major components of Android, namely service, SQLite database, and activity. Android service will be used to listen to all the individual step details using the step counter sensor when the app is in the background. All the individual step details will be stored in the SQLite database and finally the activity will be used to display the list of total number of steps along with dates. Let's look at the each component in detail. The first component of our example is PedometerListActivity. We created a ListView in the activity to display the step count along with dates. Inside the onCreate() method of PedometerListActivity, we initiated the ListView and ListAdaptor required to populate the list. 
Another important task that we do in the onCreate() method is starting the service (StepsService.class), which will listen to all the individual steps' events. We also make a call to the getDataForList() method, which is responsible for fetching the data for ListView. public class PedometerListActivity extends Activity{ private ListView mSensorListView; private ListAdapter mListAdapter; @Override protected void onCreate(Bundle savedInstanceState) { super.onCreate(savedInstanceState); setContentView(R.layout.activity_main); mSensorListView = (ListView)findViewById(R.id.steps_list); getDataForList(); mListAdapter = new ListAdapter(); mSensorListView.setAdapter(mListAdapter); Intent mStepsIntent = new Intent(getApplicationContext(), StepsService.class); startService(mStepsIntent); } In our example, the DateStepsModel class is used as a POJO (Plain Old Java Object) class, which is a handy way of grouping logical data together, to store the total number of steps and date. We also use the StepsDBHelper class to read and write the steps data in the database (discussed further in the next section). Inside the getDataForList() method, we initiated the object of the StepsDBHelper class and call the readStepsEntries() method of the StepsDBHelper class, which returns ArrayList of the DateStepsModel objects containing the total number of steps along with dates after reading from database. The ListAdapter class is used for populating the values for ListView, which internally uses ArrayList of DateStepsModel as the data source. The individual list item is the string, which is the concatenation of date and the total number of steps. class DateStepsModel { public String mDate; public int mStepCount; } private StepsDBHelper mStepsDBHelper; private ArrayList<DateStepsModel> mStepCountList; public void getDataForList() { mStepsDBHelper = new StepsDBHelper(this); mStepCountList = mStepsDBHelper.readStepsEntries(); } private class ListAdapter extends BaseAdapter{ private TextView mDateStepCountText; @Override public int getCount() { return mStepCountList.size(); } @Override public Object getItem(int position) { return mStepCountList.get(position); } @Override public long getItemId(int position) { return position; } @Override public View getView(int position, View convertView, ViewGroup parent) { if(convertView==null){ convertView = getLayoutInflater().inflate(R.layout.list_rows, parent, false); } mDateStepCountText = (TextView)convertView.findViewById(R.id.sensor_name); mDateStepCountText.setText(mStepCountList.get(position).mDate + " - Total Steps: " + String.valueOf(mStepCountList.get(position).mStepCount)); return convertView; } } The second component of our example is StepsService, which runs in the background and listens to the step detector sensor until the app is uninstalled. We implemented this service with the SensorEventListener interface so that it can receive the sensor events. We also initiated theobjects of StepsDBHelper, SensorManager, and the step detector sensor inside the OnCreate() method of the service. We only register the listener when the step detector sensor is available on the device. A point to note here is that we never unregistered the listener because we expect our app to log the step information indefinitely until the app is uninstalled. Both step detector and step counter sensors are very low on battery consumptions and are highly optimized at the hardware level, so if the app really requires, it can use them for longer durations without affecting the battery consumption much. 
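If an app wants to keep the power cost even lower over long sessions, the listener can also be registered in batch mode by passing a maximum report latency, as mentioned earlier for the step counter. A small sketch, assuming API level 19 or higher and using the field names of the service shown next; the 10-second latency is only an illustrative value:

// Batch registration: the hardware may buffer events for up to 10 seconds
mSensorManager.registerListener(this, mStepDetectorSensor,
        SensorManager.SENSOR_DELAY_NORMAL,   // sampling period hint
        10 * 1000 * 1000);                   // maxReportLatencyUs, in microseconds

With a non-zero latency, step events can be delivered together in a burst, which wakes the application processor less often.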
We get a step detector sensor callback in the onSensorChanged() method whenever the operating system detects a step, and when it does, we call the createStepsEntry() method of the StepsDBHelper class to store the step information in the database.

public class StepsService extends Service implements SensorEventListener {

  private SensorManager mSensorManager;
  private Sensor mStepDetectorSensor;
  private StepsDBHelper mStepsDBHelper;

  @Override
  public void onCreate() {
    super.onCreate();
    mSensorManager = (SensorManager) this.getSystemService(Context.SENSOR_SERVICE);
    if (mSensorManager.getDefaultSensor(Sensor.TYPE_STEP_DETECTOR) != null) {
      mStepDetectorSensor = mSensorManager.getDefaultSensor(Sensor.TYPE_STEP_DETECTOR);
      mSensorManager.registerListener(this, mStepDetectorSensor, SensorManager.SENSOR_DELAY_NORMAL);
      mStepsDBHelper = new StepsDBHelper(this);
    }
  }

  @Override
  public int onStartCommand(Intent intent, int flags, int startId) {
    return Service.START_STICKY;
  }

  @Override
  public void onSensorChanged(SensorEvent event) {
    mStepsDBHelper.createStepsEntry();
  }

  @Override
  public void onAccuracyChanged(Sensor sensor, int accuracy) {
    // Not used, but required by the SensorEventListener interface
  }

  @Override
  public IBinder onBind(Intent intent) {
    // This service is started, not bound
    return null;
  }
}

The last component of our example is the SQLite database. We created a StepsDBHelper class and extended it from the SQLiteOpenHelper abstract utility class provided by the Android framework to easily manage database operations. In the class, we created a database called StepsDatabase, which is automatically created on the first object creation of the StepsDBHelper class by the onCreate() method. This database has one table, StepsSummary, which consists of only three columns (id, stepscount, and creationdate). The first column, id, is the unique integer identifier for each row of the table and is incremented automatically on creation of every new row. The second column, stepscount, is used to store the total number of steps taken for each date. The third column, creationdate, is used to store the date in the mm/dd/yyyy string format. Inside the createStepsEntry() method, we first check whether there is an existing step count for the current date, and if we find one, then we read the existing step count of the current date and update the step count by incrementing it by 1. If there is no step count for the current date, then we assume that it is the first step of the current date and we create a new entry in the table with the current date and a step count value of 1. The createStepsEntry() method is called from onSensorChanged() of the StepsService class whenever a new step is detected by the step detector sensor.
public class StepsDBHelper extends SQLiteOpenHelper { private static final int DATABASE_VERSION = 1; private static final String DATABASE_NAME = "StepsDatabase"; private static final String TABLE_STEPS_SUMMARY = "StepsSummary"; private static final String ID = "id"; private static final String STEPS_COUNT = "stepscount"; private static final String CREATION_DATE = "creationdate";//Date format is mm/dd/yyyy private static final String CREATE_TABLE_STEPS_SUMMARY = "CREATE TABLE " + TABLE_STEPS_SUMMARY + "(" + ID + " INTEGER PRIMARY KEY AUTOINCREMENT," + CREATION_DATE + " TEXT,"+ STEPS_COUNT + " INTEGER"+")"; StepsDBHelper(Context context) { super(context, DATABASE_NAME, null, DATABASE_VERSION); } @Override public void onCreate(SQLiteDatabase db) { db.execSQL(CREATE_TABLE_STEPS_SUMMARY); } public boolean createStepsEntry() { boolean isDateAlreadyPresent = false; boolean createSuccessful = false; int currentDateStepCounts = 0; Calendar mCalendar = Calendar.getInstance(); String todayDate = String.valueOf(mCalendar.get(Calendar.MONTH))+"/" + String.valueOf(mCalendar.get(Calendar.DAY_OF_MONTH))+"/"+String.valueOf(mCalendar.get(Calendar.YEAR)); String selectQuery = "SELECT " + STEPS_COUNT + " FROM " + TABLE_STEPS_SUMMARY + " WHERE " + CREATION_DATE +" = '"+ todayDate+"'"; try { SQLiteDatabase db = this.getReadableDatabase(); Cursor c = db.rawQuery(selectQuery, null); if (c.moveToFirst()) { do { isDateAlreadyPresent = true; currentDateStepCounts = c.getInt((c.getColumnIndex(STEPS_COUNT))); } while (c.moveToNext()); } db.close(); } catch (Exception e) { e.printStackTrace(); } try { SQLiteDatabase db = this.getWritableDatabase(); ContentValues values = new ContentValues(); values.put(CREATION_DATE, todayDate); if(isDateAlreadyPresent) { values.put(STEPS_COUNT, ++currentDateStepCounts); int row = db.update(TABLE_STEPS_SUMMARY, values, CREATION_DATE +" = '"+ todayDate+"'", null); if(row == 1) { createSuccessful = true; } db.close(); } else { values.put(STEPS_COUNT, 1); long row = db.insert(TABLE_STEPS_SUMMARY, null, values); if(row!=-1) { createSuccessful = true; } db.close(); } } catch (Exception e) { e.printStackTrace(); } return createSuccessful; } The readStepsEntries() method is called from PedometerListActivity to display the total number of steps along with the date in the ListView. The readStepsEntries() method reads all the step counts along with their dates from the table and fills the ArrayList of DateStepsModelwhich is used as a data source for populating the ListView in PedometerListActivity. public ArrayList<DateStepsModel> readStepsEntries() { ArrayList<DateStepsModel> mStepCountList = new ArrayList<DateStepsModel>(); String selectQuery = "SELECT * FROM " + TABLE_STEPS_SUMMARY; try { SQLiteDatabase db = this.getReadableDatabase(); Cursor c = db.rawQuery(selectQuery, null); if (c.moveToFirst()) { do { DateStepsModel mDateStepsModel = new DateStepsModel(); mDateStepsModel.mDate = c.getString((c.getColumnIndex(CREATION_DATE))); mDateStepsModel.mStepCount = c.getInt((c.getColumnIndex(STEPS_COUNT))); mStepCountList.add(mDateStepsModel); } while (c.moveToNext()); } db.close(); } catch (Exception e) { e.printStackTrace(); } return mStepCountList; } What just happened? We created a small pedometer utility app that maintains the step history along with dates using the steps detector sensor. We used PedometerListActivityto display the list of the total number of steps along with their dates. StepsServiceis used to listen to all the steps detected by the step detector sensor in the background. 
And finally, the StepsDBHelperclass is used to create and update the total step count for each date and to read the total step counts along with dates from the database. Resources for Article: Further resources on this subject: Introducing the Android UI [article] Building your first Android Wear Application [article] Mobile Phone Forensics – A First Step into Android Forensics [article]

Remote Authentication

Packt
14 Apr 2016
9 min read
When setting up a Linux system, security is supposed to be an important part of all the stages. A good knowledge of the fundamentals of Linux is essential to implement a good security policy on the machine. In this article by Tajinder Pal Singh Kalsi, author of the book, Practical Linux Security Cookbook, we will discuss the following topics: Remote server / Host access using SSH SSH root login disable or enable Key based Login into SSH for restricting remote access (For more resources related to this topic, see here.) Remote server / host access using SSH SSH or Secure Shell is a protocol which is used to log onto remote systems securely and is the most used method for accessing remote Linux systems. Getting ready To see how to use SSH, we need two Ubuntu systems. One will be used as server and the other as client. How to do it… To use SSH we can use freely available software called—OpenSSH. Once the software is installed it can be used by the command ssh, on the Linux system. We will see how to use this tool in detail. If the software to use SSH is not already installed we have to install it on both the server and the client system. The command to install the tool on the server system is: sudo apt-get install openssh-server The output obtained will be as follows: Next we need to install the client version of the software: sudo apt-get install openssh-client The output obtained will be as follows: For latest versions ssh service starts running as soon as the software is installed. If it is not running by default, we can start the service by using the command: sudo service ssh start The output obtained will be as follows: Now if we want to login from the client system to the server system, the command will be as follows: ssh remote_ip_address Here remote_ip_address refers to the IP address of the server system. Also this command assumes that the username on the client machine is the same as that on the server machine: ssh remote_ip_address If we want to login for different user, the command will be as follows: ssh username@remote_ip_address The output obtained will be as follows: Next we need to configure SSH to use it as per our requirements. The main configuration file for sshd in Ubuntu is located at /etc/ssh/sshd_config. Before making any changes to the original version of this file, create a backup using the command: sudo cp /etc/ssh/sshd_config{,.bak} The configuration file defines the default settings for SSH on the server system. When we open the file in any editor, we can see that the default port declaration on which the sshd server listens for the incoming connections is 22. We can change this to any non-standard port to secure the server from random port scans, hence making it more secure. Suppose we change the port to 888, then next time the client wants to connect to the SSH server, the command will be as follows: ssh -p port_numberremote_ip_address The output obtained will be as follows: As we can see when we run the command without specifying the port number, the connection is refused. Next when we mention the correct port number, the connection is established. How it works… SSH is used to connect a client program to a SSH server. On one system we install the openssh-server package to make it the SSH server and on the other system we install the openssh-client package to use it as client. Now keeping the SSH service running on the server system, we try to connect to it through the client. 
We use the configuration file of SSH to change settings such as the default port for connecting.

SSH root login disable or enable

Linux systems have a root account, which is enabled by default. Letting unauthorized users gain SSH root access is a bad idea, because it gives an attacker access to the complete system. We can disable or enable root login for SSH as per our requirements, to reduce the chances of an attacker getting access to the system.

Getting ready

We need two Linux systems, to be used as server and client. On the server system, install the openssh-server package, as shown in the preceding recipe.

How to do it…

First we will see how to disable SSH root login, and then we will see how to enable it again.

First, open the main configuration file of SSH, /etc/ssh/sshd_config, in any editor:

sudo nano /etc/ssh/sshd_config

Now look for the line that reads as follows:

PermitRootLogin yes

Change the value yes to no. Then save and close the file:

PermitRootLogin no

The output obtained will be as follows:

Once done, restart the SSH daemon service using the command as shown here:

Now let's try to log in as root. We should get a Permission denied error, as root login has been disabled:

Now, whenever we want to log in as root, first we have to log in as a normal user. After that, we can use the su command and switch to the root user. So, the user accounts that are not listed in the /etc/sudoers file will not be able to switch to the root user, and the system will be more secure:

Now, if we want to enable SSH root login again, we just need to edit the /etc/ssh/sshd_config file again and change the option from no back to yes:

PermitRootLogin yes

The output obtained will be as follows:

Then restart the service again by using the command:

Now if we try to log in as root again, it will work:

How it works…

When we try to connect to a remote system using SSH, the remote system checks its configuration file at /etc/ssh/sshd_config, and according to the details mentioned in this file, it decides whether the connection should be allowed or refused. When we change the value of PermitRootLogin, the behavior changes accordingly.

There's more…

Suppose we have many user accounts on the system; then we may need to edit the /etc/ssh/sshd_config file in such a way that remote access is allowed only for a few mentioned users:

sudo nano /etc/ssh/sshd_config

Add the line:

AllowUsers tajinder user1

Now restart the ssh service:

sudo service ssh restart

Now, when we try to log in with user1, the login is successful. However, when we try to log in with user2, which is not added in the /etc/ssh/sshd_config file, the login fails and we get the error Permission denied, as shown here:
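Before moving on to key-based authentication, it can help to see the directives from the previous recipes gathered in one place. A hedged excerpt of /etc/ssh/sshd_config, where the port number and usernames are only examples:

# Listen on a non-standard port instead of 22, as discussed earlier
Port 888

# Refuse direct root logins over SSH
PermitRootLogin no

# Allow remote logins only for these accounts
AllowUsers tajinder user1

As before, the changes take effect only after the SSH service is restarted with sudo service ssh restart.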
Key-based login into SSH for restricting remote access

Even though SSH login is protected by the password of the user account, we can make it more secure by using key-based authentication for SSH.

Getting ready

To see how key-based authentication works, we need two Linux systems (in our example, both are Ubuntu systems). One should have the OpenSSH server package installed on it.

How to do it...

To use key-based authentication, we need to create a pair of keys: a private key and a public key.

On the client or local system, execute the following command to generate the SSH key pair:

ssh-keygen -t rsa

The output obtained will be as follows:

While creating the key, we can accept the default values or change them as we wish. It will also ask for a passphrase, for which you can choose anything, or else leave it blank.

The key pair will be created in the ~/.ssh/ directory. Change to this directory and then use the command ls -l to see the details of the key files:

We can see that the id_rsa file can be read and written only by the owner. This permission ensures that the file is kept secure.

Now we need to copy the public key file to the remote SSH server. To do so, we run the command:

ssh-copy-id 192.168.1.101

The output obtained will be as follows:

An SSH session will be started, and it will prompt you for the user account's password. Once the correct password has been entered, the key will be copied to the remote server.

Once the public key has been successfully copied to the remote server, try to log in to the server again using the ssh 192.168.1.101 command:

We can see that now we are not prompted for the user account's password. Since we had configured a passphrase for the SSH key, it has been asked for. Otherwise, we would have been logged in to the system without being asked for a password.

How it works...

When we create the SSH key pair and move the public key to the remote system, it works as an authentication method for connecting to the remote system. If the public key present on the remote system matches the public key generated by the local system, and the local system has the private key to complete the key pair, the login happens. Otherwise, if any key file is missing, login is not allowed.

Summary

Linux security is a massive subject and everything cannot be covered in just one article. Still, Practical Linux Security Cookbook will give you a lot of recipes for securing your machine. It can be referred to as a practical guide for administrators and can help them configure a more secure machine.

Resources for Article:

Further resources on this subject: Wireless Attacks in Kali Linux [article] Creating a VM using VirtualBox - Ubuntu Linux [article] Building tiny Web-applications in Ruby using Sinatra [article]

Using the Registry and xlsxwriter modules

Packt
14 Apr 2016
12 min read
In this article by Chapin Bryce and Preston Miller, the authors of Learning Python for Forensics, we will learn about the features offered by the Registry and xlswriter modules. (For more resources related to this topic, see here.) Working with the Registry module The Registry module, developed by Willi Ballenthin, can be used to obtain keys and values from registry hives. Python provides a built-in registry module called _winreg; however, this module only works on Windows machines. The _winreg module interacts with the registry on the system running the module. It does not support opening external registry hives. The Registry module allows us to interact with the supplied registry hives and can be run on non-Windows machines. The Registry module can be downloaded from https://github.com/williballenthin/python-registry. Click on the releases section to see a list of all the stable versions and download the latest version. For this article, we use version 1.1.0. Once the archived file is downloaded and extracted, we can run the included setup.py file to install the module. In a command prompt, execute the following code in the module's top-level directory as shown: python setup.py install This should install the Registry module successfully on your machine. We can confirm this by opening the Python interactive prompt and typing import Registry. We will receive an error if the module is not installed successfully. With the Registry module installed, let's begin to learn how we can leverage this module for our needs. First, we need to import the Registry class from the Registry module. Then, we use the Registry function to open the registry object that we want to query. Next, we use the open() method to navigate to our key of interest. In this case, we are interested in the RecentDocs registry key. This key contains recent active files separated by extension as shown: >>> from Registry import Registry >>> reg = Registry.Registry('NTUSER.DAT') >>> recent_docs = reg.open('SOFTWARE\Microsoft\Windows\CurrentVersion\Explorer\RecentDocs') If we print therecent_docs variable, we can see that it contains 11 values with five subkeys, which may contain additional values and subkeys. Additionally, we can use thetimestamp() method to see the last written time of the registry key. >>> print recent_docs Registry Key CMI-CreateHive{B01E557D-7818-4BA7-9885-E6592398B44E}SoftwareMicrosoftWindowsCurrentVersionExplorerRecentDocs with 11 values and 5 subkeys >>> print recent_docs.timestamp() # Last Written Time 2012-04-23 09:34:12.099998 We can iterate over the values in the recent_docs key using the values() function in a for loop. For each value, we can access the name(), value(), raw_data(), value_type(), and value_type_str() methods. The value() and raw_data() represent the data in different ways. We will use the raw_data() function when we want to work with the underlying binary data and use the value() function to gather an interpreted result. The value_type() and value_type_str() functions display a number or string that identify the type of data, such as REG_BINARY, REG_DWORD, REG_SZ, and so on. >>> for i, value in enumerate(recent_docs.values()): ... print '{}) {}: {}'.format(i, value.name(), value.value()) ... 0) MRUListEx: ???? 1) 0: myDocument.docx 2) 4: oldArchive.zip 3) 2: Salaries.xlsx ... Another useful feature of the Registry module is the means provided for querying for a certain subkey or value. This is provided by the subkey(), value(), or find_key() functions. 
A RegistryKeyNotFoundException is generated when a subkey is not present while using the subkey() function: >>> if recent_docs.subkey('.docx'): ... print 'Found docx subkey.' ... Found docx subkey. >>> if recent_docs.subkey('.1234abcd'): ... print 'Found 1234abcd subkey.' ... Registry.Registry.RegistryKeyNotFoundException: ... The find_key() function takes a path and can find a subkey through multiple levels. The subkey() and value() functions only search child elements. We can use these functions to confirm that a key or value exists before trying to navigate to them. If a particular key or value cannot be found, a custom exception from the Registry module is raised. Be sure to add error handling to catch this error and also alert the user that the key was not discovered. With the Registry module, finding keys and their values becomes straightforward. However, when the values are not strings and are instead binary data we have to rely on another module to make sense of the mess. For all binary needs, the struct module is an excellent candidate. Read also: Tools for Working with Excel and Python Creating Spreadsheets with the xlsxwriter Module Xlsxwriter is a useful third-party module that writes Excel output. There are a plethora of Excel-supported modules for Python, but we chose this module because it was highly robust and well-documented. As the name suggests, this module can only be used to write Excel spreadsheets. The xlsxwriter module supports cell and conditional formatting, charts, tables, filters, and macros among others. Adding data to a spreadsheet Let's quickly create a script called simplexlsx.v1.py for this example. On lines 1 and 2 we import the xlsxwriter and datetime modules. The data we are going to be plotting, including the header column is stored as nested lists in the school_data variable. Each list is a row of information that we want to store in the output excel sheet, with the first element containing the column names. 001 import xlsxwriter 002 from datetime import datetime 003 004 school_data = [['Department', 'Students', 'Cumulative GPA', 'Final Date'], 005 ['Computer Science', 235, 3.44, datetime(2015, 07, 23, 18, 00, 00)], 006 ['Chemistry', 201, 3.26, datetime(2015, 07, 25, 9, 30, 00)], 007 ['Forensics', 99, 3.8, datetime(2015, 07, 23, 9, 30, 00)], 008 ['Astronomy', 115, 3.21, datetime(2015, 07, 19, 15, 30, 00)]] The writeXLSX() function, defined on line 11, is responsible for writing our data in to a spreadsheet. First, we must create our Excel spreadsheet using the Workbook() function supplying the desired name of the file. On line 13, we create a worksheet using the add_worksheet() function. This function can take the desired title of the worksheet or use the default name 'Sheet N', where N is the specific sheet number. 011 def writeXLSX(data): 012 workbook = xlsxwriter.Workbook('MyWorkbook.xlsx') 013 main_sheet = workbook.add_worksheet('MySheet') The date_format variable stores a custom number format that we will use to display our datetime objects in the desired format. On line 17, we begin to enumerate through our data to write. The conditional on line 18 is used to handle the header column which is the first list encountered. We use the write() function and supply a numerical row and column. Alternatively, we can also use the Excel notation, i.e. A1. 
015 date_format = workbook.add_format({'num_format': 'mm/dd/yy hh:mm:ss AM/PM'}) 016 017 for i, entry in enumerate(data): 018 if i == 0: 019 main_sheet.write(i, 0, entry[0]) 020 main_sheet.write(i, 1, entry[1]) 021 main_sheet.write(i, 2, entry[2]) 022 main_sheet.write(i, 3, entry[3]) The write() method will try to write the appropriate type for an object when it can detect the type. However, we can use different write methods to specify the correct format. These specialized writers preserve the data type in Excel so that we can use the appropriate data type specific Excel functions for the object. Since we know the data types within the entry list, we can manually specify when to use the general write() function or the specific write_number() function. 023 else: 024 main_sheet.write(i, 0, entry[0]) 025 main_sheet.write_number(i, 1, entry[1]) 026 main_sheet.write_number(i, 2, entry[2]) For the fourth entry in the list, thedatetime object, we supply the write_datetime() function with our date_format defined on line 15. After our data is written to the workbook, we use the close() function to close and save our data. On line 32, we call the writeXLSX() function passing it to the school_data list we built earlier. 027 main_sheet.write_datetime(i, 3, entry[3], date_format) 028 029 workbook.close() 030 031 032 writeXLSX(school_data) A table of write functions and the objects they preserve is presented below. Function Supported Objects write_string str write_number int, float, long write_datetime datetime objects write_boolean bool write_url str When the script is invoked at the Command Line, a spreadsheet called MyWorkbook.xlsx is created. When we convert this to a table, we can sort it according to any of our values. Had we failed to preserve the data types values such as our dates might be identified as non-number types and prevent us from sorting them appropriately. Building a table Being able to write data to an Excel file and preserve the object type is a step-up over CSV, but we can do better. Often, the first thing an examiner will do with an Excel spreadsheet is convert the data into a table and begin the frenzy of sorting and filtering. We can convert our data range to a table. In fact, writing a table with xlsxwriter is arguably easier than writing each row individually. The following code will be saved into the file simplexlsx.v2.py. For this iteration, we have removed the initial list in the school_data variable that contained the header information. Our new writeXLSX() function writes the header separately. 004 school_data = [['Computer Science', 235, 3.44, datetime(2015, 07, 23, 18, 00, 00)], 005 ['Chemistry', 201, 3.26, datetime(2015, 07, 25, 9, 30, 00)], 006 ['Forensics', 99, 3.8, datetime(2015, 07, 23, 9, 30, 00)], 007 ['Astronomy', 115, 3.21, datetime(2015, 07, 19, 15, 30, 00)]] Lines 10 through 14 are identical to the previous iteration of the function. Representing our table on the spreadsheet is accomplished on line 16. 010 def writeXLSX(data): 011 workbook = xlsxwriter.Workbook('MyWorkbook.xlsx') 012 main_sheet = workbook.add_worksheet('MySheet') 013 014 date_format = workbook.add_format({'num_format': 'mm/dd/yy hh:mm:ss AM/PM'}) The add_table() function takes multiple arguments. First, we pass a string representing the top-left and bottom-right cells of the table in Excel notation. We use the length variable, defined on line 15, to calculate the necessary length of our table. 
The second argument is a little more confusing; this is a dictionary with two keys, named data and columns. The data key has a value of our data variable, which is perhaps poorly named in this case. The columns key defines each row header and, optionally, its format, as seen on line 19: 015 length = str(len(data) + 1) 016 main_sheet.add_table(('A1:D' + length), {'data': data, 017 'columns': [{'header': 'Department'}, {'header': 'Students'}, 018 {'header': 'Cumulative GPA'}, 019 {'header': 'Final Date', 'format': date_format}]}) 020 workbook.close() In lesser lines than the previous example, we've managed to create a more useful output built as a table. Now our spreadsheet has our specified data already converted into a table and ready to be sorted. There are more possible keys and values that can be supplied during the construction of a table. Please consult the documentation at (http://xlsxwriter.readthedocs.org) for more details on advanced usage. This process is simple when we are working with nested lists representing each row of a worksheet. Data structures not in the specified format require a combination of both methods demonstrated in our previous iterations to achieve the same effect. For example, we can define a table to span across a certain number of rows and columns and then use the write() function for those cells. However, to prevent unnecessary headaches we recommend keeping data in nested lists. Creating charts with Python Lastly, let's create a chart with xlsxwriter. The module supports a variety of different chart types including: line, scatter, bar, column, pie, and area. We use charts to summarize the data in meaningful ways. This is particularly useful when working with large data sets, allowing examiners to gain a high level of understanding of the data before getting into the weeds. Let's modify the previous iteration yet again to display a chart. We will save this modified file as simplexlsx.v3.py. On line 21, we are going to create a variable called department_grades. This variable will be our chart object created by the add_chart()method. For this method, we pass in a dictionary specifying keys and values[SS4] . In this case, we specify the type of the chart to be a column chart. 021 department_grades = workbook.add_chart({'type':'column'}) On line 22, we use theset_title() function and again pass it in a dictionary of parameters. We set the name key equal to our desired title. At this point, we need to tell the chart what data to plot. We do this with the add_series() function. Each category key maps to the Excel notation specifying the horizontal axis data. The vertical axis is represented by the values key. With the data to plot specified, we use theinsert_chart() function to plot the data in the spreadsheet. We give this function a string of the cell to plot the top-left of the chart and then the chart object itself. 022 department_grades.set_title({'name':'Department and Grade distribution'}) 023 department_grades.add_series({'categories':'=MySheet!$A$2:$A$5', 'values':'=MySheet!$C$2:$C$5'}) 024 main_sheet.insert_chart('A8', department_grades) 025 workbook.close() Running this version of the script will convert our data into a table and generate a column chart comparing departments by their grades. We can clearly see that, unsurprisingly, the Forensic Science department has the highest GPA earners in the school's program. This information is easy enough to eyeball for such a small data set. 
However, when working with data orders of larger magnitude, creating summarizing graphics can be particularly useful to understand the big picture. Be aware that there is a great deal of additional functionality in the xlsxwriter module that we will not use in our script. This is an extremely powerful module and we recommend it for any operation that requires writing Excel spreadsheets. Summary In this article, we began with introducing the Registry module and how it is used to obtain keys and values from registry hives. Next, we dealt with various aspects of spreadsheets, such as cells, tables, and charts using the xlswriter module. Resources for Article: Further resources on this subject: Test all the things with Python [article] An Introduction to Python Lists and Dictionaries [article] Python Data Science Up and Running [article]

Probabilistic Graphical Models in R

Packt
14 Apr 2016
18 min read
In this article, David Bellot, author of the book Learning Probabilistic Graphical Models in R, explains that among all the predictions that were made about the 21st century, we may not have expected that we would collect such a formidable amount of data about everything, every day, and everywhere in the world. The past years have seen an incredible explosion of data collection about our world and lives, and technology is the main driver of what we can certainly call a revolution. We live in the age of information. However, collecting data is nothing if we don't exploit it and try to extract knowledge out of it.

At the beginning of the 20th century, with the birth of statistics, the world was all about collecting data and making statistics. Back then, the only reliable tools were pencil and paper and, of course, the eyes and ears of the observers. Scientific observation was still in its infancy, despite the prodigious developments of the 19th century.

(For more resources related to this topic, see here.)

More than a hundred years later, we have computers, electronic sensors, and massive data storage, and we are able to store huge amounts of data continuously, not only about our physical world but also about our lives, mainly through the use of social networks, the Internet, and mobile phones. Moreover, the density of our storage technology has increased so much that we can, nowadays, store months if not years of data in a very small volume that can fit in the palm of our hand.

Among all the tools and theories that have been developed to analyze, understand, and manipulate data, probability and statistics became among the most used. In this field, we are interested in a special, versatile, and powerful class of models called probabilistic graphical models (PGMs, for short). A probabilistic graphical model is a tool to represent beliefs and uncertain knowledge about facts and events using probabilities. It is also one of the most advanced machine learning techniques nowadays and has many industrial success stories. PGMs can deal with our imperfect knowledge about the world, because our knowledge is always limited. We can't observe everything, and we can't represent the entire universe in a computer. We are intrinsically limited as human beings, and so are our computers. With probabilistic graphical models, we can build simple learning algorithms or complex expert systems. With new data, we can improve these models and refine them as much as we can, and we can also infer new information or make predictions about unseen situations and events.

Probabilistic graphical models, seen from the point of view of mathematics, are a way to represent a probability distribution over several variables, which is called a joint probability distribution. In a PGM, such knowledge between variables can be represented with a graph, that is, nodes connected by edges with a specific meaning associated with them. Let's consider an example from the medical world: how to diagnose a cold. This is only an example and by no means medical advice; it is oversimplified for the sake of clarity. We define several random variables, such as the following:

Se: This means the season of the year
N: This means that the nose is blocked
H: This means the patient has a headache
S: This means that the patient regularly sneezes
C: This means that the patient coughs
Cold: This means the patient has a cold

Because each of the symptoms can exist at different degrees, it is natural to represent them as random variables. For example, if the patient's nose is a bit blocked, we will assign a probability of, say, 60% to this variable, that is, P(N=blocked)=0.6 and P(N=not blocked)=0.4.

In this example, the probability distribution P(Se,N,H,S,C,Cold) will require 4 × 2^5 = 128 values in total (4 values for the season and 2 values for each of the other random variables). It's quite a lot, and honestly, it's quite difficult to determine things such as the probability that the nose is not blocked, the patient has a headache, the patient sneezes, and so on. However, we can say that a headache is not directly related to cough or a blocked nose, except when the patient has a cold. Indeed, the patient can have a headache for many other reasons. Moreover, we can say that the season has quite a direct effect on sneezing, a blocked nose, or cough, but less or no direct effect on headache. In a probabilistic graphical model, we will represent these dependency relationships with a graph, as follows, where each random variable is a node in the graph, and each relationship is an arrow between two nodes:
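The figure itself is not reproduced here, but, as a hedged illustration, one factorization consistent with the dependencies just described (which arrows the original figure actually draws is an assumption on our part) is:

P(Se,N,H,S,C,Cold) = P(Se) P(Cold) P(N|Se,Cold) P(S|Se,Cold) P(C|Se,Cold) P(H|Cold)

Counting the entries of each conditional table, such a factorization needs far fewer parameters than the 128 values of the full joint distribution, which is exactly the saving the graph is meant to express.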
For example, if the patient's nose is a bit blocked, we will assign a probability of, say, 60% to this variable, that is, P(N=blocked)=0.6 and P(N=not blocked)=0.4.

In this example, the probability distribution P(Se,N,H,S,C,Cold) will require 4 * 2^5 = 128 values in total (4 values for the season and 2 values for each of the other five random variables). It's quite a lot, and honestly, it's quite difficult to determine things such as the probability that the nose is not blocked, the patient has a headache, the patient sneezes, and so on. However, we can say that a headache is not directly related to a cough or a blocked nose, except when the patient has a cold. Indeed, the patient can have a headache for many other reasons. Moreover, we can say that the season has quite a direct effect on sneezing, a blocked nose, or a cough, but little or no direct effect on headaches. In a probabilistic graphical model, we will represent these dependency relationships with a graph, as follows, where each random variable is a node in the graph and each relationship is an arrow between two nodes:

In the graph that follows, there is a direct correspondence between each node and each variable of the probabilistic graphical model, and also a direct correspondence between the arrows and the way we can simplify the joint probability distribution in order to make it tractable.

Using a graph as a model to simplify a complex (and sometimes complicated) distribution presents numerous benefits:

As we observed in the previous example, and in general when we model a problem, the random variables interact directly with only a small subset of the other random variables. Therefore, this promotes more compact and tractable models.
The knowledge and dependencies represented in a graph are easy to understand and communicate.
The graph induces a compact representation of the joint probability distribution, and it is easy to make computations with.
Algorithms to draw inferences and learn can use graph theory and the associated algorithms to improve and facilitate all the inference and learning algorithms. Compared to the raw joint probability distribution, using a PGM will speed up computations by several orders of magnitude.

The junction tree algorithm

The junction tree algorithm is one of the main algorithms for doing inference on PGMs. Its name arises from the fact that, before doing the numerical computations, we transform the graph of the PGM into a tree with a set of properties that allow for efficient computation of posterior probabilities. One of the main aspects is that this algorithm will compute not only the posterior distribution of the variables in the query, but also the posterior distribution of all the other variables that are not observed. Therefore, for the same computational price, one can have any posterior distribution. Implementing a junction tree algorithm is a complex task but, fortunately, several R packages contain a full implementation, for example, gRain.

Let's say we have several variables A, B, C, D, E, and F. We will consider, for the sake of simplicity, that each variable is binary so that we won't have too many values to deal with.
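Before writing any code, a quick back-of-the-envelope check in R shows why such factorizations matter (this snippet is only an illustration and is not part of the book's code):

# the cold example: one 4-valued variable (season) and five binary ones
4 * 2^5                 # 128 entries in the full joint table

# the full joint over our six binary variables A to F
2^6                     # 64 entries

# the factorized tables defined below are far smaller:
# P(F), P(C|F), P(E|F), P(A|C), P(D|E), P(B|A,D)
2 + 4 + 4 + 4 + 4 + 8   # 26 entries in total

# and the gap explodes with the number of variables
2^c(10, 20, 30)         # 1024, 1048576, 1073741824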
We will assume the following factorization:

P(A,B,C,D,E,F) = P(F) P(C|F) P(E|F) P(A|C) P(D|E) P(B|A,D)

This is represented by the following graph:

We first start by loading the gRain package into R:

library(gRain)

Then, we create our set of random variables from A to F:

val = c("true", "false")
F = cptable(~F, values=c(10,90), levels=val)
C = cptable(~C|F, values=c(10,90,20,80), levels=val)
E = cptable(~E|F, values=c(50,50,30,70), levels=val)
A = cptable(~A|C, values=c(50,50,70,30), levels=val)
D = cptable(~D|E, values=c(60,40,70,30), levels=val)
B = cptable(~B|A:D, values=c(60,40,70,30,20,80,10,90), levels=val)

The cptable function creates a conditional probability table, which is a factor for discrete variables. The probabilities associated with each variable are purely subjective and only serve the purpose of the example.

The next step is to compute the junction tree. In most packages, computing the junction tree is done by calling one function, because the algorithm just does everything at once:

plist = compileCPT(list(F,E,C,A,D,B))
plist

Also, we check whether the list of variables is correctly compiled into a probabilistic graphical model, and we obtain the following from the previous code:

CPTspec with probabilities:
 P( F )
 P( E | F )
 P( C | F )
 P( A | C )
 P( D | E )
 P( B | A D )

This is indeed the factorization of our distribution, as stated earlier. If we want to check further, we can look at the conditional probability tables of a few variables:

print(plist$F)
print(plist$B)

F
 true false
  0.1   0.9

, , D = true
       A
B       true false
  true   0.6   0.7
  false  0.4   0.3

, , D = false
       A
B       true false
  true   0.2   0.1
  false  0.8   0.9

The second output is a bit more complex but, if you look carefully, you will see that you have two distributions, P(B|A,D=true) and P(B|A,D=false), which is a more readable presentation of P(B|A,D).

We finally create the graph and run the junction tree algorithm by calling this:

jtree = grain(plist)

Again, when we check the result, we obtain:

jtree
Independence network: Compiled: FALSE Propagated: FALSE
  Nodes: chr [1:6] "F" "E" "C" "A" "D" "B"

We only need to compute the junction tree once. Then, all queries can be computed with the same junction tree. Of course, if you change the graph, then you need to recompute the junction tree. Let's perform a few queries:

querygrain(jtree, nodes=c("F"), type="marginal")
$F
F
 true false
  0.1   0.9

Of course, if you ask for the marginal distribution of F, you will obtain the initial conditional probability table, because F has no parents.

querygrain(jtree, nodes=c("C"), type="marginal")
$C
C
 true false
 0.19  0.81

This is more interesting because it computes the marginal of C, while we only stated the conditional distribution of C given F. We didn't need an algorithm as complex as the junction tree to compute such a small marginal. We saw the variable elimination algorithm earlier, and that would be enough too. But if you ask for the marginal of B, then variable elimination will not work because of the loop in the graph.
However, the junction tree will give the following:

querygrain(jtree, nodes=c("B"), type="marginal")
$B
B
    true    false
0.478564 0.521436

And we can ask for more complex distributions, such as the joint distribution of B and A:

querygrain(jtree, nodes=c("A","B"), type="joint")
       B
A           true    false
  true  0.309272 0.352728
  false 0.169292 0.168708

In fact, any combination can be queried, such as A, B, and C:

querygrain(jtree, nodes=c("A","B","C"), type="joint")
, , B = true
        A
C           true    false
  true  0.044420 0.047630
  false 0.264852 0.121662

, , B = false
        A
C           true    false
  true  0.050580 0.047370
  false 0.302148 0.121338

Now, we want to observe a variable and compute the posterior distribution. Let's say F=true, and we want to propagate this information down to the rest of the network:

jtree2 = setEvidence(jtree, evidence=list(F="true"))

We can ask for any joint or marginal distribution now:

querygrain(jtree, nodes=c("A"), type="marginal")
$A
A
 true false
0.662 0.338

querygrain(jtree2, nodes=c("A"), type="marginal")
$A
A
 true false
 0.68  0.32

Here, we see that knowing that F=true changed the marginal distribution on A from its previous marginal (the second query uses jtree2, the tree with evidence). And we can query any other variable:

querygrain(jtree, nodes=c("B"), type="marginal")
$B
B
    true    false
0.478564 0.521436

querygrain(jtree2, nodes=c("B"), type="marginal")
$B
B
  true  false
0.4696 0.5304

Learning

Building a probabilistic graphical model generally requires three steps: defining the random variables, which are also the nodes of the graph; defining the structure of the graph; and, finally, defining the numerical parameters of each local distribution. So far, the last step has been done manually, and we gave numerical values to each local probability distribution by hand. In many cases, we have access to a wealth of data, and we can find the numerical values of those parameters with a method called parameter learning. In other fields, it is also called parameter fitting or model calibration.

Learning parameters can be done with several approaches, and there is no ultimate solution to the problem because it depends on the goal the model's user wants to reach. Nevertheless, it is common to use the notion of the maximum likelihood of a model and also the maximum a posteriori. As you are now used to the notions of the prior and posterior of a distribution, you can already guess what a maximum a posteriori can do. Many algorithms are used, among which we can cite the Expectation Maximization (EM) algorithm, which computes the maximum likelihood of a model even when data is missing or variables are not observed at all. It is a very important algorithm, especially for mixture models.

A graphical model of a linear model

PGMs can be used to represent standard statistical models and then extend them. One famous example is the linear regression model. We can visualize the structure of a linear model and better understand the relationships between the variables. The linear model captures the relationships between observable variables x and a target variable y. This relation is modeled by a set of parameters, θ. But remember the distribution of y for each data point indexed by i:

yi ~ N(Xiβ, σ²)

Here, Xi is a row vector whose first element is always one, to capture the intercept of the linear model.
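To make this generative view concrete, here is a minimal R sketch of the process the graph encodes; the parameter values are made up for illustration and are not from the book, but lm() recovers them approximately:

set.seed(42)
N <- 200
x <- rnorm(N)                         # observable variable
beta0 <- 1.5; beta1 <- 2.0            # assumed intercept and slope
sigma <- 0.5                          # assumed noise standard deviation
y <- beta0 + beta1 * x + rnorm(N, sd = sigma)   # yi ~ N(Xi beta, sigma^2)
coef(lm(y ~ x))                       # estimates close to 1.5 and 2.0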
The parameter θ in the following graph is itself composed of the intercept, the coefficient β for each component of X, and the variance σ² in the distribution of yi. The PGM for an observation of a linear model can be represented as follows:

So, this decomposition leads us to a second version of the graphical model, in which we explicitly separate the components of θ:

In a PGM, when a rectangle is drawn around a set of nodes with a number of variables in a corner (N, for example), it means that the same graph is repeated that many times. The likelihood function of a linear model is p(y | X, θ) = ∏i=1..N N(yi | Xiβ, σ²), and it can be represented as a PGM. And the vector β can also be decomposed into its univariate components:

In this last iteration of the graphical model, we see that the parameter β could have a prior probability on it instead of being fixed. In fact, the parameter β can also be considered as a random variable. For the time being, we will keep it fixed.

Latent Dirichlet Allocation

The last model we want to show in this article is called Latent Dirichlet Allocation (LDA). It is a generative model that can be represented as a graphical model. It is based on the same idea as the mixture model, with one notable exception: in this model, we assume that the data points might be generated by a combination of clusters and not just one cluster at a time, as was the case before.

The LDA model is primarily used in text analysis and classification. Let's consider that a text document is composed of words making sentences and paragraphs. To simplify the problem, we can say that each sentence or paragraph is about one specific topic, such as science, animals, sports, and so on. Topics can also be more specific, such as a cat topic or a European soccer topic. Therefore, there are words that are more likely to come from specific topics. For example, the word cat is likely to come from the cat topic. The word stadium is likely to come from the European soccer topic. However, while the word ball should come with a higher probability from the European soccer topic, it is not unlikely to come from the cat topic, because cats like to play with balls too. So, it seems the word ball might belong to two topics at the same time, with different degrees of certainty. Other words, such as table, will certainly belong equally to both topics, and presumably to others. They are very generic; except, of course, if we introduce another topic such as furniture.

A document is a collection of words, so a document can have complex relationships with a set of topics. But in the end, it is more likely to see words coming from the same topic or the same topics within a paragraph and, to some extent, within the document. In general, we model a document with a bag-of-words model, that is, we consider a document to be a randomly generated set of words, using a specific distribution over the words. If this distribution is uniform over all the words, then the document will be purely random, without a specific meaning. However, if this distribution has a specific form, with more probability mass on related words, then the collection of words generated by this model will have a meaning. Of course, generating documents is not really the application we have in mind for such a model. What we are interested in is the analysis of documents, their classification, and automatic understanding. Let's say we have a categorical variable (in other words, a histogram) representing the probability of appearance of all the words from a dictionary.
Usually, in this kind of model, we restrict ourselves to long words only and remove the small words, like and, to, but, the, a, and so on. These words are usually called stop words. Let wj be the jth word in a document. The following three graphs show the progression from representing a document (the left-most graph) to representing a collection of documents (the third graph):

Let θ be a distribution over topics; then, in the second graph from the left, we extend this model by choosing the kind of topic that will be selected at any time and then generating a word out of it. Therefore, the variable zi now becomes the variable zij, that is, the topic i is selected for the word j. We can go even further and decide that we want to model a collection of documents, which seems natural if we consider that we have a big data set. Assuming that documents are i.i.d., the next step (the third graph) is a PGM that represents the generative model for M documents.

And, because the distribution on topics is categorical, we want to be Bayesian about it, mainly because it will help the model not to overfit and because we consider the selection of topics for a document to be a random process. Moreover, we want to apply the same treatment to the word variable by having a Dirichlet prior. This prior is used to avoid assigning zero probability to non-observed words. It smooths the distribution of words per topic. A uniform Dirichlet prior will induce a uniform prior distribution on all the words. Therefore, the final graph on the right represents the complete model. This is quite a complex graphical model, but techniques have been developed to fit its parameters and use this model.

If we follow this graphical model carefully, we have a process that generates documents based on a certain set of topics (see the list below):

From α, we choose the set of topics for a document
From θ, we generate a topic zij
From this topic, we generate a word wj

In this model, only the words are observable. All the other variables will have to be determined without observation, exactly like in the other mixture models. So, documents are represented as random mixtures over latent topics, in which each topic is represented as a distribution over words. The distribution of a topic mixture based on this graphical model can be written as follows (here, α is the Dirichlet parameter on topic mixtures and β denotes the per-topic distributions over words):

p(θ, z, w | α, β) = p(θ | α) ∏n=1..N p(zn | θ) p(wn | zn, β)

You can see in this formula that, for each word, we select a topic, hence the product from 1 to N. Integrating over θ and summing over z, the marginal distribution of a document is as follows:

p(w | α, β) = ∫ p(θ | α) ( ∏n=1..N Σzn p(zn | θ) p(wn | zn, β) ) dθ

The final distribution can be obtained by taking the product of the marginal distributions of single documents, so as to get the distribution over a collection of documents (assuming that documents are independently and identically distributed). Here, D is the collection of documents:

p(D | α, β) = ∏d=1..M p(wd | α, β)

The main problem to be solved now is how to compute the posterior distribution over θ and z, given a document. By applying the Bayes formula, we know the following:

p(θ, z | w, α, β) = p(θ, z, w | α, β) / p(w | α, β)

Unfortunately, this is intractable because of the normalization factor in the denominator. The original paper on LDA therefore refers to a technique called variational inference, which aims at transforming a complex Bayesian inference problem into a simpler approximation that can be solved as a (convex) optimization problem. This technique is the third approach to Bayesian inference and has been used on many other problems.

Summary

The probabilistic graphical model framework offers a powerful and versatile way to develop and extend many probabilistic models using an elegant graph-based formalism.
It has many applications, for example, in biology, genomics, medicine, finance, robotics, computer vision, automation, engineering, law, and games. Many packages in R exist to deal with all sorts of models and data, among which gRain and rstan are very popular.

Resources for Article:

Further resources on this subject:
Extending ElasticSearch with Scripting [article]
Exception Handling in MySQL for Python [article]
Breaking the Bank [article]
Detecting fraud on e-commerce orders with Benford's law

Packt
14 Apr 2016
7 min read
In this article, Andrea Cirillo, author of the book RStudio for R Statistical Computing Cookbook, explains how to detect fraud on e-commerce orders. Benford's law is a popular empirical law that states that the first digits of a population of data will follow a specific logarithmic distribution. This law was observed by Frank Benford around 1938 and has since gained increasing popularity as a way to detect anomalous alterations of populations of data. Basically, testing a population against Benford's law means verifying that the given population respects this law. If deviations are discovered, further analysis is performed on the items related to those deviations. In this recipe, we will test a population of e-commerce orders against the law, focusing on items deviating from the expected distribution. (For more resources related to this topic, see here.)

Getting ready

This recipe will use functions from the well-documented benford.analysis package by Carlos Cinelli. We therefore need to install and load this package:

install.packages("benford.analysis")
library(benford.analysis)

In our example, we will use a data frame that stores e-commerce orders, provided within the book as an .Rdata file. In order to make it available within your environment, we need to load this file by running the following command (assuming the file is within your current working directory):

load("ecommerce_orders_list.Rdata")

How to do it...

Perform the Benford test on the order amounts:

benford_test <- benford(ecommerce_orders_list$order_amount, 1)

Plot the test analysis:

plot(benford_test)

This will result in the following plot:

Highlight the suspected digits:

suspectsTable(benford_test)

This will produce a table showing, for each digit, the absolute differences between expected and observed frequencies. The digits at the top of the table are therefore the most anomalous ones:

> suspectsTable(benford_test)
   digits absolute.diff
1:      5     4860.8974
2:      9     3764.0664
3:      1     2876.4653
4:      2     2870.4985
5:      3     2856.0362
6:      4     2706.3959
7:      7     1567.3235
8:      6     1300.7127
9:      8      200.4623

Define a function to extrapolate the first digit from each amount:

left = function(string, char){
  substr(string, 1, char)
}

Extrapolate the first digit from each amount:

ecommerce_orders_list$first_digit <- left(ecommerce_orders_list$order_amount, 1)

Filter amounts starting with the suspected digit:

suspects_orders <- subset(ecommerce_orders_list, first_digit == 5)

How it works

Step 1 performs the Benford test on the order amounts. In this step, we applied the benford() function to the amounts. Applying this function means evaluating the distribution of the first digits of the amounts against the expected Benford distribution.
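For reference, the expected Benford distribution that the observed digits are compared against is P(d) = log10(1 + 1/d) for d = 1, ..., 9; a one-line R check (not part of the recipe itself) reproduces the familiar frequencies:

round(log10(1 + 1/(1:9)), 3)
# [1] 0.301 0.176 0.125 0.097 0.079 0.067 0.058 0.051 0.046

So roughly 30% of first digits are expected to be 1, falling monotonically to about 4.6% for 9; benford() measures how far the order amounts deviate from exactly this curve.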
The function will result in the production of the following objects:

Info: This object covers the following general information:
- data.name: This shows the name of the data used
- n: This shows the number of observations used
- n.second.order: This shows the number of observations used for second-order analysis
- number.of.digits: This shows the number of first digits analyzed

Data: This is a data frame with the following subobjects:
- lines.used: This shows the original lines of the dataset
- data.used: This shows the data used
- data.mantissa: This shows the log data's mantissa
- data.digits: This shows the first digits of the data

s.o.data: This is a data frame with the following subobjects:
- data.second.order: This shows the differences of the ordered data
- data.second.order.digits: This shows the first digits of the second-order analysis

Bfd: This is a data frame with the following subobjects:
- digits: This highlights the groups of digits analyzed
- data.dist: This highlights the distribution of the first digits of the data
- data.second.order.dist: This highlights the distribution of the first digits of the second-order analysis
- benford.dist: This shows the theoretical Benford distribution
- data.second.order.dist.freq: This shows the frequency distribution of the first digits of the second-order analysis
- data.dist.freq: This shows the frequency distribution of the first digits of the data
- benford.dist.freq: This shows the theoretical Benford frequency distribution
- benford.so.dist.freq: This shows the theoretical Benford frequency distribution of the second-order analysis
- data.summation: This shows the summation of the data values grouped by first digits
- abs.excess.summation: This shows the absolute excess summation of the data values grouped by first digits
- difference: This highlights the difference between the data and Benford frequencies
- squared.diff: This shows the chi-squared difference between the data and Benford frequencies
- absolute.diff: This highlights the absolute difference between the data and Benford frequencies

Mantissa: This is a data frame with the following subobjects:
- mean.mantissa: This shows the mean of the mantissa
- var.mantissa: This shows the variance of the mantissa
- ek.mantissa: This shows the excess kurtosis of the mantissa
- sk.mantissa: This highlights the skewness of the mantissa

MAD: This object depicts the mean absolute deviation.

distortion.factor: This object describes the distortion factor.

Stats: This object lists htest-class statistics, as follows:
- chisq: This lists Pearson's chi-squared test
- mantissa.arc.test: This lists the Mantissa Arc test

Step 2 plots the test results. Running plot on the object resulting from the benford() function will produce a plot showing the following (from the upper-left corner to the bottom-right corner):

- First digit distribution
- Results of the second-order test
- Summation distribution for each digit
- Results of the chi-squared test
- Summation differences

If you look carefully at these plots, you will understand which digits show a distribution significantly different from the one expected under Benford's law. Nevertheless, in order to have a sounder basis for our considerations, we need to look at the suspects table, showing absolute differences between expected and observed frequencies. This is what we will do in the next step.

Step 3 highlights the suspected digits. Using suspectsTable(), we can easily discover which digits present the greatest deviation from the expected distribution.
Looking at the so-called suspects table, we can see that the number 5 shows up first within our table. In the next step, we will focus our attention on the orders with amounts having this digit as the first digit.

Step 4 defines a function to extrapolate the first digit from each amount. This function leverages R's substr() function and extracts the first digit from the number passed to it as an argument.

Step 5 adds a new column to the investigated dataset in which the first digit is extrapolated.

Step 6 filters amounts starting with the suspected digit. After applying the left function to our sequence of amounts, we can now filter the dataset, retaining only the rows whose amounts have 5 as the first digit. We will now be able to perform analytical testing procedures on those items.

Summary

In this article, you learned how to apply the R language to an e-commerce fraud detection system.

Resources for Article:

Further resources on this subject:
Recommending Movies at Scale (Python) [article]
Visualization of Big Data [article]
Big Data Analysis (R and Hadoop) [article]
Understanding Proxmox VE and Advanced Installation

Packt
13 Apr 2016
12 min read
In this article by Wasim Ahmed, the author of the book Mastering Proxmox - Second Edition, we will see that virtualization, as we know it today, is a decades-old technology that was first implemented in the mainframes of the 1960s. Virtualization was a way to logically divide the mainframe's resources for different application processing. With the rise in energy costs, running under-utilized server hardware is no longer a luxury. Virtualization enables us to do more with less, thus saving energy and money while creating a virtual green data center without geographical boundaries. (For more resources related to this topic, see here.)

A hypervisor is a piece of software, hardware, or firmware that creates and manages virtual machines. It is the underlying platform or foundation that allows a virtual world to be built upon. In a way, it is the very building block of all virtualization. A bare metal hypervisor acts as a bridge between the physical hardware and the virtual machines by creating an abstraction layer. Because of this unique feature, an entire virtual machine can be moved over a vast distance over the Internet and still function exactly the same. A virtual machine does not see the hardware directly; instead, it sees the layer of the hypervisor, which is the same no matter what hardware the hypervisor has been installed on.

The Proxmox Virtual Environment (VE) is a cluster-based hypervisor and one of the best-kept secrets in the virtualization world. The reason is simple: it allows you to build an enterprise business-class virtual infrastructure at a small business-class price tag without sacrificing stability, performance, or ease of use. Whether it is a massive data center serving millions of people, a small educational institution, or a home serving important family members, Proxmox can be configured to suit any situation. If you have picked up this article, no doubt you are familiar with virtualization and perhaps well versed in other hypervisors, such as VMware, Xen, Hyper-V, and so on. In this article and the upcoming articles, we will see the mighty power of Proxmox from the inside out. We will examine scenarios and create a complex virtual environment. We will tackle some heavy day-to-day issues and show resolutions, which might just save the day in a production environment. So, strap in and let's dive into the virtual world with the mighty hypervisor, Proxmox VE.

Understanding Proxmox features

Before we dive in, it is necessary to understand why one should choose Proxmox over the other mainstream hypervisors. Proxmox is not perfect but stands out among other contenders with some hard-to-beat features. The following are some of the features that make Proxmox a real game changer.

It is free!

Yes, Proxmox is free! To be more accurate, Proxmox has several subscription levels, among which the community edition is completely free. One can simply download the Proxmox ISO at no cost and raise a fully functional cluster without missing a single feature and without paying anything. The main difference between the paid and community subscription levels is that the paid subscriptions receive updates that go through additional testing and refinement. If you are running a production cluster with a real workload, it is highly recommended that you purchase support and licensing from Proxmox or Proxmox resellers.

Built-in firewall

Proxmox VE comes with a robust firewall ready to be configured out of the box.
This firewall can be configured to protect anything from the entire Proxmox cluster down to a single virtual machine. The per-VM firewall option gives you the ability to configure each VM individually by creating individualized firewall rules, a prominent feature in a multi-tenant virtual environment.

Open vSwitch

Licensed under the Apache 2.0 license, Open vSwitch is a virtual switch designed to work in a multi-server virtual environment. All hypervisors need a bridge between VMs and the outside network. Open vSwitch enhances the features of the standard Linux bridge in an ever-changing virtual environment. Proxmox fully supports Open vSwitch, which allows you to create an intricate virtual environment while reducing virtual network management overhead. For details on Open vSwitch, refer to http://openvswitch.org/.

The graphical user interface

Proxmox comes with a fully functional graphical user interface, or GUI, out of the box. The GUI allows an administrator to manage and configure almost all aspects of a Proxmox cluster. The GUI has been designed keeping simplicity in mind, with functions and features separated into menus for easier navigation. The following screenshot shows an example of the Proxmox GUI dashboard:

KVM virtual machines

KVM, or Kernel-based Virtual Machine, is a kernel module that is added to Linux for full virtualization, creating isolated, fully independent virtual machines. KVM VMs are not dependent on the host operating system in any way, but they do require the virtualization feature in the BIOS to be enabled. KVM allows a wide variety of operating systems for virtual machines, such as Linux and Windows. Proxmox provides a very stable environment for KVM-based VMs.

Linux containers or LXC

Introduced recently in Proxmox VE 4.0, Linux containers allow multiple Linux instances on the same Linux host. All the containers are dependent on the host Linux operating system, and only Linux flavors can be virtualized as containers. There are no containers for the Windows operating system. LXC replaces the prior OpenVZ containers, which were the primary container virtualization method in previous Proxmox versions. If you are not familiar with LXC, refer to https://linuxcontainers.org/ for details.

Storage plugins

Out of the box, Proxmox VE supports a variety of storage systems to store virtual disk images, ISO templates, backups, and so on. All the plugins are quite stable and work great with Proxmox. Being able to choose different storage systems gives an administrator the flexibility to leverage the existing storage in the network. As of Proxmox VE 4.0, the following storage plugins are supported:

Local directory mount points
iSCSI
LVM Group
NFS Share
GlusterFS
Ceph RBD
ZFS

Vibrant culture

Proxmox has a growing community of users who are always helping others to learn Proxmox and troubleshoot various issues. With so many active users around the world, and through the active participation of Proxmox developers, the community has now become a culture of its own. Feature requests are continuously being worked on, and the existing features are being strengthened on a regular basis. With so many users supporting Proxmox, it is surely here to stay.

The basic installation of Proxmox

The installation of a Proxmox node is very straightforward. Simply accept the default options, select localization, and enter the network information to install Proxmox VE.
We can summarize the installation process in the following steps:

Download the ISO from the official Proxmox site and prepare a disc with the image (http://proxmox.com/en/downloads).
Boot the node with the disc and hit Enter to start the installation from the installation GUI. We can also install Proxmox from a USB drive.
Progress through the prompts to select options or type in information.
After the installation is complete, access the Proxmox GUI dashboard using the IP address, as follows:
https://<proxmox_node_ip>:8006

In some cases, it may be necessary to open the firewall port to allow access to the GUI over port 8006.

The advanced installation option

Although the basic installation works in all scenarios, there may be times when the advanced installation option is necessary. Only the advanced installation option provides you with the ability to customize the main OS drive. A common practice for the operating system drive is to use a mirrored RAID array using a controller interface. This provides drive redundancy if one of the drives fails. The same level of redundancy can also be achieved using a software-based RAID array, such as ZFS. Proxmox now offers options to select ZFS-based arrays for the operating system drive right at the beginning of the installation. If you are not familiar with ZFS, refer to https://en.wikipedia.org/wiki/ZFS for details.

It is a common question to ask why one should choose ZFS software RAID over tried-and-tested hardware-based RAID. The simple answer is flexibility. A hardware RAID is locked to, or fully dependent on, the hardware RAID controller interface that created the array, whereas a ZFS software-based array is not dependent on any hardware, and the array can easily be ported to different hardware nodes. Should a RAID controller failure occur, the entire array created by that controller is lost, unless an identical controller interface is available for replacement. A ZFS array is only lost when all the drives, or more than the maximum tolerable number of drives, are lost in the array.

Besides ZFS, we can also select other filesystem types, such as ext3, ext4, or xfs, from the same advanced option. We can also set custom disk or partition sizes through the advanced option. The following screenshot shows the installation interface with the Target Hard disk selection page:

Click on Options, as shown in the preceding screenshot, to open the advanced options for the hard disk. The following screenshot shows the option window after clicking on the Options button:

In the preceding screenshot, we selected ZFS RAID1 for mirroring and the two drives, Harddisk 0 and Harddisk 1, respectively, to install Proxmox. If we pick one of the filesystems, such as ext3, ext4, or xfs, instead of ZFS, the Hard disk Option dialog box will look like the following screenshot, with a different set of options:

Selecting a filesystem gives us the following advanced options:

hdsize: This is the total drive size to be used by the Proxmox installation.
swapsize: This defines the swap partition size.
maxroot: This defines the maximum size to be used by the root partition.
minfree: This defines the minimum free space that should remain after the Proxmox installation.
maxvz: This defines the maximum size for the data partition. This is usually /var/lib/vz.

Debugging the Proxmox installation

Debugging features are part of any good operating system. Proxmox has debugging features that will help you during a failed installation.
Some common reasons for failure are unsupported hardware, conflicts between devices, ISO image errors, and so on. Debugging mode logs and displays installation activities in real time. When the standard installation fails, we can start the Proxmox installation in debug mode from the main installation interface, as shown in the following screenshot:

The debug installation mode will drop us to the following prompt. To start the installation, we need to press Ctrl + D. When there is an error during the installation, we can simply press Ctrl + C to get back to this console to continue with our investigation:

From the console, we can check the installation log using the following command:

# cat /tmp/install.log

From the main installation menu, we can also press e to enter edit mode to change the loader information, as shown in the following screenshot:

At times, it may be necessary to edit the loader information when normal booting does not function. This is a common case when Proxmox is unable to show the video output due to UEFI or an unsupported resolution. In such cases, the booting process may hang. One way to continue with booting is to add the nomodeset argument by editing the loader. The loader will look as follows after editing:

linux /boot/linux26 ro ramdisk_size=16777216 rw quiet nomodeset

Customizing the Proxmox splash screen

When building a custom Proxmox solution, it may be necessary to change the default blue splash screen to something more appealing in order to identify the company or department the server belongs to. In this section, we will see how easily we can integrate any image as the splash screen background.

The splash screen image must be in the .tga format and must have one of the fixed standard sizes, such as 640 x 480, 800 x 600, or 1024 x 768. If you do not have any image software that supports the .tga format, you can easily convert a jpg, gif, or png image to the .tga format using a free online image converter (http://image.online-convert.com/convert-to-tga).

Once the desired image is ready in the .tga format, the following steps will integrate the image as the Proxmox splash screen:

Copy the .tga image to the /boot/grub directory on the Proxmox node.
Edit the grub file in /etc/default/grub to add the following line, and save it:
GRUB_BACKGROUND=/boot/grub/<image_name>.tga
Run the following command to update the grub configuration:
# update-grub
Reboot.

The following screenshot shows an example of how the splash screen may look after we add a custom image to it:

Picture courtesy of www.techcitynews.com

We can also change the font color to make it properly visible, depending on the custom image used. To change the font color, edit the debian theme file in /etc/grub.d/05_debian_theme, and find the following line of code:

set_background_image "${GRUB_BACKGROUND}" || set_default_theme

Edit the line to add the font colors, as shown in the following format. In our example, we have changed the font color to black and the highlight color to light blue:

set_background_image "${GRUB_BACKGROUND}" "black/black" "light-blue/black" || set_default_theme

After making the necessary changes, update grub and reboot to see the changes.

Summary

In this article, we looked at why Proxmox is a better option as a hypervisor, what advanced installation options are available during an installation, and why we might choose software RAID for the operating system drive. We also looked at the cost of Proxmox, the storage options, and network flexibility using Open vSwitch.
We also learned about the debugging features and the customization options for the Proxmox splash screen. In the next article, we will take a closer look at the Proxmox GUI and see how easy it is to centrally manage a Proxmox cluster from a web browser.

Resources for Article:

Further resources on this subject:
Proxmox VE Fundamentals [article]
Basic Concepts of Proxmox Virtual Environment [article]
Cluster Computing Using Scala

Packt
13 Apr 2016
18 min read
In this article, Vytautas Jančauskas, the author of the book Scientific Computing with Scala, explains how to write software to be run on distributed computing clusters. We will learn the MPJ Express library here. (For more resources related to this topic, see here.)

Very often, when dealing with intense data processing tasks and simulations of physical phenomena, there comes a time when, no matter how many CPU cores and how much memory your workstation has, it is not enough. At times like these, you will want to turn to supercomputing clusters for help. These distributed computing environments consist of many nodes (each node being a separate computer) connected into a computer network using specialized high-bandwidth and low-latency connections (or, if you are on a budget, standard Ethernet hardware is often enough). These computers usually utilize a network filesystem, allowing each node to see the same files. They communicate using messaging libraries, such as MPI. Your program will run on separate computers and utilize the message passing framework to exchange data via the computer network.

Using MPJ Express for distributed computing

MPJ Express is a message passing library for distributed computing. It works in programming languages running on the Java Virtual Machine (JVM), so we can use it from Scala. It is similar in functionality and programming interface to MPI. If you know MPI, you will be able to use MPJ Express in pretty much the same way. The differences specific to Scala are explained in this section. We will start with how to install it. For further reference, visit the MPJ Express website given here:

http://mpj-express.org/

Setting up and running MPJ Express

The steps to set up and run MPJ Express are as follows:

First, download MPJ Express from the following link. The version at the time of this writing is 0.44.
http://mpj-express.org/download.php

Unpack the archive and refer to the included README file for installation instructions. Currently, you have to set MPJ_HOME to the folder you unpacked the archive to and add the bin folder in that archive to your path. For example, if you are a Linux user using bash as your shell, you can add the following two lines to your .bashrc file (the file is in your home directory at /home/yourusername/.bashrc):

export MPJ_HOME=/home/yourusername/mpj
export PATH=$MPJ_HOME/bin:$PATH

Here, mpj is the folder you extracted the archive you downloaded from the MPJ Express website to. If you are using a different system, you will have to do the equivalent of the above for your system to use MPJ Express.

We will want to use MPJ Express with the Scala Build Tool (SBT), which we used previously to build and run all of our programs. Create the following directory structure:

scalacluster/
  lib/
  project/
    plugins.sbt
  build.sbt

I have chosen to name the project folder scalacluster here, but you can call it whatever you want. The .jar files in the lib folder will now be accessible to your program. Copy the contents of the lib folder from the mpj directory to this folder. Finally, create empty build.sbt and plugins.sbt files.

Let's now write and run a simple "Hello, World!" program to test our setup:

import mpi._

object MPJTest {
  def main(args: Array[String]) {
    MPI.Init(args)
    val me: Int = MPI.COMM_WORLD.Rank
    val size: Int = MPI.COMM_WORLD.Size
    println("Hello, World, I'm <" + me + ">")
    MPI.Finalize()
  }
}

This should be familiar to everyone who has ever used MPI. First, we import everything from the mpi package.
Then, we initialize MPJ Express by calling MPI.Init; the arguments to MPJ Express will be passed from the command-line arguments you enter when running the program. The MPI.COMM_WORLD.Rank() function returns the MPJ process's rank. A rank is a unique identifier used to distinguish processes from one another. Ranks are used when you want different processes to do different things. A common pattern is to use the process with rank 0 as the master process and the processes with other ranks as workers. Then, you can use the process's rank to decide what action to take in the program. We also determine how many MPJ processes were launched by checking MPI.COMM_WORLD.Size. Our program will simply print each process's rank for now.

We will want to run it. If you don't have a distributed computing cluster readily available, don't worry: you can test your programs locally on your desktop or laptop. The same program will work without changes on clusters as well. To run programs written using MPJ Express, you have to use the mpjrun.sh script. This script will be available to you if you have added the bin folder of the MPJ Express archive to your PATH, as described in the section on installing MPJ Express. The mpjrun.sh script will set up the environment for your MPJ Express processes and start said processes.

The mpjrun.sh script takes a .jar file, so we need to create one. Unfortunately for us, this cannot easily be done using the sbt package command in the directory containing our program. That worked previously because we used the Scala runtime to execute our programs; MPJ Express uses Java. The problem is that the .jar package created with sbt package does not include Scala's standard library. We need what is called a fat .jar, one that contains all the dependencies within itself. One way of generating it is to use a plugin for SBT called sbt-assembly. The website for this plugin is given here:

https://github.com/sbt/sbt-assembly

There is a simple way of adding the plugin for use in our project. Remember that project/plugins.sbt file we created? All you need to do is add the following line to it (the line may be different for different versions of the plugin; consult the website):

addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.1")

Now, add the following to the build.sbt file you created:

lazy val root = (project in file(".")).
  settings(
    name := "mpjtest",
    version := "1.0",
    scalaVersion := "2.11.7"
  )

Then, execute the sbt assembly command from the shell to build the .jar file. The file will be put under the following directory if you are using the preceding build.sbt file. That is, if the folder you put the program and build.sbt in is /home/you/cluster:

/home/you/cluster/target/scala-2.11/mpjtest-assembly-1.0.jar

Now, you can run the mpjtest-assembly-1.0.jar file as follows:

$ mpjrun.sh -np 4 -jar target/scala-2.11/mpjtest-assembly-1.0.jar
MPJ Express (0.44) is started in the multicore configuration
Hello, World, I'm <0>
Hello, World, I'm <2>
Hello, World, I'm <3>
Hello, World, I'm <1>

The -np argument specifies how many processes to run. Since we specified -np 4, four processes will be started by the script. The order of the "Hello, World" messages can differ on your system, since the precise order of execution of different processes is undetermined. If you got output similar to the one shown here, then congratulations, you have done the majority of the work needed to write and deploy applications using MPJ Express.
Using Send and Recv

MPJ Express processes can communicate using Send and Recv. These methods constitute arguably the simplest and easiest-to-understand mode of operation, and also probably the most error-prone one. We will look at these two first. The following are the signatures of the Send and Recv methods:

public void Send(java.lang.Object buf, int offset, int count, Datatype datatype, int dest, int tag) throws MPIException

public Status Recv(java.lang.Object buf, int offset, int count, Datatype datatype, int source, int tag) throws MPIException

Both of these calls are blocking. This means that after calling Send, your process will block (will not execute the instructions following it) until a corresponding Recv is called by another process. Recv will also block the process until a corresponding Send happens. By corresponding, we mean that the dest and source arguments of the calls have values corresponding to the receiver's and sender's ranks, respectively. These two calls are enough to implement many complicated communication patterns. However, they are prone to problems such as deadlocks, and they are quite difficult to debug, since you have to make sure that each Send has the correct corresponding Recv and vice versa. The parameters of Send and Recv are basically the same; their meanings are summarized here:

buf (java.lang.Object): It has to be a one-dimensional Java array. When using it from Scala, use a Scala array, which maps one-to-one to a Java array.
offset (int): The start of the data you want to pass, measured from the start of the array.
count (int): The number of items of the array you want to pass.
datatype (Datatype): The type of the data in the array. It can be one of the following: MPI.BYTE, MPI.CHAR, MPI.SHORT, MPI.BOOLEAN, MPI.INT, MPI.LONG, MPI.FLOAT, MPI.DOUBLE, MPI.OBJECT, MPI.LB, MPI.UB, and MPI.PACKED.
dest/source (int): Either the destination to send the message to or the source to get the message from. You use the rank of a process to identify sources and destinations.
tag (int): Used to tag the message. Tags can be used to introduce different message types and can be ignored for most common applications.

Let's look at a simple program using these calls for communication. We will implement a simple master/worker communication pattern:

import mpi._
import scala.util.Random

object MPJTest {
  def main(args: Array[String]) {
    MPI.Init(args)
    val me: Int = MPI.COMM_WORLD.Rank()
    val size: Int = MPI.COMM_WORLD.Size()
    if (me == 0) {

Here, we use an if statement to identify who we are, based on our rank. Since each process gets a unique rank, this allows us to determine what action should be taken. In our case, we assigned the role of the master to the process with rank 0 and the role of a worker to the processes with other ranks:

      for (i <- 1 until size) {
        val buf = Array(Random.nextInt(100))
        MPI.COMM_WORLD.Send(buf, 0, 1, MPI.INT, i, 0)
        println("MASTER: Dear <" + i + "> please do work on " + buf(0))
      }

We iterate over the workers, who have ranks from 1 up to whatever number of processes you passed to the mpjrun.sh script. Let's say that number is four. This gives us one master process and three worker processes. So, each process with a rank from 1 to 3 will get a randomly generated number. We have to put that number in an array even though it is a single number. This is because both the Send and Recv methods expect an array as their first argument. We then use the Send method to send the data.
We specified the array as the buf argument, an offset of 0, a size of 1, the type MPI.INT, the destination as the for loop index, and the tag as 0. This means that each of our three worker processes will receive a (most probably) different number:

      for (i <- 1 until size) {
        val buf = Array(0)
        MPI.COMM_WORLD.Recv(buf, 0, 1, MPI.INT, i, 0)
        println("MASTER: Dear <" + i + "> thanks for the reply, which was " + buf(0))
      }

Finally, we collect the results from the workers. For this, we iterate over the worker ranks and use the Recv method on each one of them. We print the result we got from each worker, and this concludes the master's part. We now move on to the workers:

    } else {
      val buf = Array(0)
      MPI.COMM_WORLD.Recv(buf, 0, 1, MPI.INT, 0, 0)
      println("<" + me + ">: " + "Understood, doing work on " + buf(0))
      buf(0) = buf(0) * buf(0)
      MPI.COMM_WORLD.Send(buf, 0, 1, MPI.INT, 0, 0)
      println("<" + me + ">: " + "Reporting back")
    }

The workers' code is identical for all of them. They receive a message from the master, calculate the square of it, and send it back:

    MPI.Finalize()
  }
}

After you run the program, the results should be akin to the following, which I got when running this program on my system:

MASTER: Dear <1> please do work on 71
MASTER: Dear <2> please do work on 12
MASTER: Dear <3> please do work on 55
<1>: Understood, doing work on 71
<1>: Reporting back
MASTER: Dear <1> thanks for the reply, which was 5041
<3>: Understood, doing work on 55
<2>: Understood, doing work on 12
<2>: Reporting back
MASTER: Dear <2> thanks for the reply, which was 144
MASTER: Dear <3> thanks for the reply, which was 3025
<3>: Reporting back

Sending Scala objects in MPJ Express messages

Sometimes, the types provided by MPJ Express for use in the Send and Recv methods are not enough. You may want to send your MPJ Express processes a Scala object. A very realistic example of this would be to send an instance of a Scala case class. Case classes can be used to construct more complicated data types consisting of several different basic types. A simple example is a two-dimensional vector consisting of x and y coordinates. This can be sent as a simple array, but more complicated classes can't. For example, you may want to use a case class such as the one shown here. It has two attributes of type String and one attribute of type Int. So what do we do with a data type like this? The simplest answer to that problem is to serialize it. Serializing converts an object to a stream of characters or a string that can be sent over the network (or stored to a file, among other things) and later deserialized to get the original object back:

scala> case class Person(name: String, surname: String, age: Int)
defined class Person

scala> val a = Person("Name", "Surname", 25)
a: Person = Person(Name,Surname,25)

A simple way of serializing is to use a format such as XML or JSON. This can be done automatically using a pickling library. Pickling is a term that comes from the Python programming language. It is the automatic conversion of an arbitrary object into a string representation that can later be de-converted to get the original object back. The reconstructed object will behave the same way as it did before conversion. This allows one, for example, to store arbitrary objects to files. There is a pickling library available for Scala as well. You can, of course, do serialization in several different ways (for example, using the powerful support for XML available in Scala), as sketched below.
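As a quick illustration of that XML route, here is a minimal, hypothetical sketch; the Point class and the helper names are made up for this example, and it assumes the scala-xml module is available on the classpath:

import scala.xml.Node

case class Point(x: Double, y: Double)

// hand-written mapping from the object to an XML node
def toXml(p: Point): Node = <point><x>{p.x}</x><y>{p.y}</y></point>

// and back from the XML node to the object
def fromXml(node: Node): Point =
  Point((node \ "x").text.toDouble, (node \ "y").text.toDouble)

// round trip: fromXml(toXml(Point(1.0, 2.0))) == Point(1.0, 2.0)

The pickling approach used next achieves the same round trip without any of this hand-written mapping code.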
We will use the pickling library that is available from the following website for this example:

https://github.com/scala/pickling

You can install it by adding the following line to your build.sbt file:

libraryDependencies += "org.scala-lang.modules" %% "scala-pickling" % "0.10.1"

After doing that, use the following import statements to enable easy pickling in your projects:

scala> import scala.pickling.Defaults._
import scala.pickling.Defaults._

scala> import scala.pickling.json._
import scala.pickling.json._

Here, you can see how easily you can then use this library to pickle and unpickle arbitrary objects without the use of annoying boilerplate code:

scala> val pklA = a.pickle
pklA: pickling.json.pickleFormat.PickleType = JSONPickle({
  "$type": "Person",
  "name": "Name",
  "surname": "Surname",
  "age": 25
})

scala> val unpklA = pklA.unpickle[Person]
unpklA: Person = Person(Name,Surname,25)

Let's see how this would work in an application using MPJ Express for message passing. A program using pickling to send a case class instance in a message is given here:

import mpi._
import scala.pickling.Defaults._
import scala.pickling.json._

case class ArbitraryObject(a: Array[Double], b: Array[Int], c: String)

Here, we have chosen to define a fairly complex case class, consisting of two arrays of different types and a string:

object MPJTest {
  def main(args: Array[String]) {
    MPI.Init(args)
    val me: Int = MPI.COMM_WORLD.Rank()
    val size: Int = MPI.COMM_WORLD.Size()
    if (me == 0) {
      val obj = ArbitraryObject(Array(1.0, 2.0, 3.0), Array(1, 2, 3), "Hello")
      val pkl = obj.pickle.value.toCharArray
      MPI.COMM_WORLD.Send(pkl, 0, pkl.size, MPI.CHAR, 1, 0)

In the preceding bit of code, we create an instance of our case class. We then pickle it to JSON and get the string representation of said JSON with the value method. However, to send it in an MPJ message, we need to convert it to a one-dimensional array of one of the supported types. Since it is a string, we convert it to a char array. This is done using the toCharArray method:

    } else if (me == 1) {
      val buf = new Array[Char](1000)
      MPI.COMM_WORLD.Recv(buf, 0, 1000, MPI.CHAR, 0, 0)
      val msg = buf.mkString
      val obj = msg.unpickle[ArbitraryObject]

On the receiving end, we get the raw char array, convert it back to a string using the mkString method, and then unpickle it using unpickle[T]. This will return an instance of the case class that we can use like any other instance of a case class. It is, in its functionality, the same object that was sent to us:

      println(msg)
      println(obj.c)
    }
    MPI.Finalize()
  }
}

The following is the result of running the preceding program. It prints out the JSON representation of our object and also shows that we can access the attributes of said object by printing the c attribute:

MPJ Express (0.44) is started in the multicore configuration
{
  "$type": "ArbitraryObject",
  "a": [
    1.0,
    2.0,
    3.0
  ],
  "b": [
    1,
    2,
    3
  ],
  "c": "Hello"
}
Hello

You can use this method to send arbitrary objects in an MPJ Express message. However, this is just one of many ways of doing this. As mentioned previously, an example of another way is to use the XML representation. XML support is strong in Scala, and you can use it to serialize objects as well. This will usually require you to add some boilerplate code to your program to serialize to XML. The method discussed earlier has the advantage of requiring no boilerplate code.

Non-blocking communication

So far, we examined only blocking (or synchronous) communication between two processes.
This means that the process is blocked (its execution is halted) until the Send or Recv method has completed successfully. This is simple to understand and enough for most cases. The problem with synchronous communication is that you have to be very careful; otherwise, deadlocks may occur. Deadlocks are situations in which processes wait for each other to release a resource first; a Mexican standoff is the classic informal picture of this, and the dining philosophers problem is one of the famous examples of deadlock in operating systems. The point is that if you are unlucky, you may end up with a program that is seemingly stuck, and you won't know why. Using non-blocking communication allows you to avoid these problems most of the time. If you think you may be at risk of deadlocks, you will probably want to use it. The signatures of the primary methods used in asynchronous communication are given here:

Request Isend(java.lang.Object buf, int offset, int count, Datatype datatype, int dest, int tag)

Isend works similarly to its Send counterpart. The main differences are that it does not block (the program continues execution after the call rather than waiting for a corresponding receive), and it returns a Request object. This object is used to check the status of your send request, block until it is complete if required, and so on:

Request Irecv(java.lang.Object buf, int offset, int count, Datatype datatype, int src, int tag)

Irecv is again the same as Recv, only non-blocking, and it returns a Request object used to handle your receive request. The operation of these methods can be seen in action in the following example:

import mpi._

object MPJTest {
  def main(args: Array[String]) {
    MPI.Init(args)
    val me: Int = MPI.COMM_WORLD.Rank()
    val size: Int = MPI.COMM_WORLD.Size()
    if (me == 0) {
      val requests = for (i <- 0 until 10) yield {
        val buf = Array(i * i)
        MPI.COMM_WORLD.Isend(buf, 0, 1, MPI.INT, 1, 0)
      }
    } else if (me == 1) {
      for (i <- 0 until 10) {
        Thread.sleep(1000)
        val buf = Array[Int](0)
        val request = MPI.COMM_WORLD.Irecv(buf, 0, 1, MPI.INT, 0, 0)
        request.Wait()
        println("RECEIVED: " + buf(0))
      }
    }
    MPI.Finalize()
  }
}

This is a very simplistic example, used simply to demonstrate the basics of the asynchronous message passing methods. First, the process with rank 0 will send 10 messages to the process with rank 1 using Isend. Since Isend does not block, the loop will finish quickly, and the messages it sent will be buffered until they are retrieved using Irecv. The second process (the one with rank 1) will wait for one second before retrieving each message. This is to demonstrate the asynchronous nature of these methods: the messages sit in the buffer, waiting to be retrieved, so Irecv can be used at your leisure, when convenient. The Wait() method of the Request object it returns has to be used to retrieve the results. The Wait() method blocks until the message is successfully received from the buffer.

Summary

Extremely computationally intensive programs are usually parallelized and run on supercomputing clusters. These clusters consist of multiple networked computers. Communication between these computers is usually done using messaging libraries, such as MPI, which allow you to pass data between processes running on different machines in an efficient manner. In this article, you have learned how to use MPJ Express, an MPI-like library for the JVM. We saw how to carry out process-to-process communication as well as collective communication. The most important MPJ Express primitives were covered, and example programs using them were given.
Resources for Article:

Further resources on this subject:
Differences in style between Java and Scala code [article]
Getting Started with JavaFX [article]
Integrating Scala, Groovy, and Flex Development with Apache Maven [article]
Nginx "expires" directive – Emitting Caching Headers

Packt
13 Apr 2016
7 min read
In this article, Alex Kapranoff, the author of the book Nginx Troubleshooting, explains how all browsers (and even many non-browser HTTP clients) support client-side caching. It is a part of the HTTP standard, albeit one of its most complex parts to understand. Web servers obviously do not control client-side caching to the full extent, but they may issue recommendations about what to cache and how, in the form of special HTTP response headers. This topic is thoroughly discussed in many great articles and guides, so we will cover it only briefly, with a lean towards the problems you may face and how to troubleshoot them.

Although browsers have supported caching on their side for at least 20 years, configuring cache headers has always been a little confusing, mostly because there are two sets of headers designed for the same purpose but with different scopes and totally different formats. There is the Expires: header, which was designed as a quick and dirty solution, and there is the relatively new, almost omnipotent Cache-Control: header, which tries to support all the different ways an HTTP cache could work.

This is an example of a modern HTTP request-response pair containing the caching headers. First are the request headers sent by the browser (here, Firefox 41, but it does not matter):

User-Agent: "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:41.0) Gecko/20100101 Firefox/41.0"
Accept: "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
Accept-Encoding: "gzip, deflate"
Connection: "keep-alive"
Cache-Control: "max-age=0"

Then, these are the response headers:

Cache-Control: "max-age=1800"
Content-Encoding: "gzip"
Content-Type: "text/html; charset=UTF-8"
Date: "Sun, 10 Oct 2015 13:42:34 GMT"
Expires: "Sun, 10 Oct 2015 14:12:34 GMT"

The parts relevant to caching are the Cache-Control: and Expires: headers. Note that some directives may be sent by both sides of the conversation. First, the browser sent the Cache-Control: max-age=0 header because the user pressed the F5 key. This is an indication that the user wants to receive a response that is fresh. Normally, the request will not contain this header, which allows any intermediate cache to respond with a stale but still non-expired response. In this case, the server we talked to responded with a gzipped HTML page encoded in UTF-8 and indicated that the response is okay to use for half an hour. It used both mechanisms available: the modern Cache-Control: max-age=1800 header and the very old Expires: Sun, 10 Oct 2015 14:12:34 GMT header.

The X-Cache: "EXPIRED" header is not a standard HTTP header but was also probably (there is no way to know for sure from the outside) emitted by Nginx. It may be an indication that there are, indeed, intermediate caching proxies between the client and the server, and one of them added this header for debugging purposes. The header may also show that the backend software uses some internal caching. Another possible source of this header is a debugging technique used to find problems in the Nginx cache configuration. The idea is to use the cache hit-or-miss status, which is available in one of the handy internal Nginx variables, as the value of an extra header, and then to monitor that status from the client side. This is the code that will add such a header:

add_header X-Cache $upstream_cache_status;
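To sketch how this debugging header might fit into a larger proxying setup (the backend address and the cache zone name below are hypothetical, for illustration only):

# Assumes this fragment lives inside the http { } context
proxy_cache_path /var/cache/nginx keys_zone=appcache:10m;

server {
    listen 80;

    location / {
        proxy_pass http://127.0.0.1:8080;   # hypothetical backend
        proxy_cache appcache;

        # $upstream_cache_status expands to MISS, HIT, EXPIRED, and so on
        add_header X-Cache $upstream_cache_status;
    }
}

Requesting the same URL twice from the client side should then show X-Cache: MISS followed by X-Cache: HIT, which quickly confirms whether the cache is doing anything at all.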
Nginx has a special directive that transparently sets up both of the standard cache control headers, and it is named expires. This is a piece of the nginx.conf file using the expires directive:

location ~* \.(?:css|js)$ {
    expires 1y;
    add_header Cache-Control "public";
}

First, the pattern uses the so-called noncapturing parentheses, a feature that first appeared in Perl regular expressions. The effect of this regexp is the same as that of the simpler \.(css|js)$ pattern, but the regular expression engine is specifically instructed not to create a variable containing the actual string matched inside the parentheses. This is a simple optimization. Then, the expires directive declares that the content of the css and js files will expire a year after it is stored. The actual headers as received by the client will look like this:

Server: nginx/1.9.8 (Ubuntu)
Date: Fri, 11 Mar 2016 22:01:04 GMT
Content-Type: text/css
Last-Modified: Thu, 10 Mar 2016 05:45:39 GMT
Expires: Sat, 11 Mar 2017 22:01:04 GMT
Cache-Control: max-age=31536000

The last two lines contain the same information in wildly different forms. The Expires: header is exactly one year after the date in the Date: header, whereas Cache-Control: specifies the age in seconds so that the client does the date arithmetic itself.

The last directive in the provided configuration extract explicitly adds another Cache-Control: header with a value of public. This means that the content of the HTTP resource is not access-controlled and therefore may be cached not only for one particular user but anywhere else as well. A simple and effective strategy once used in offices to minimize consumed bandwidth was to run an office-wide caching proxy server. When one user requested a resource from a website on the Internet and that resource had the Cache-Control: public designation, the company cache server would store it and serve it to the other users on the office network. This may not be as popular today because bandwidth is cheap, but since history has a tendency to repeat itself, you need to know how and why Cache-Control: public works.

The Nginx expires directive is surprisingly expressive. It may take a number of different values, summarized in this table:

off - Turns off the Nginx cache headers logic. Nothing will be added and, more importantly, existing headers received from upstreams will not be modified.
epoch - An artificial value used to purge a stored resource from all caches by setting the Expires header to "1 January, 1970 00:00:01 GMT".
max - The opposite of "epoch". The Expires header will be equal to "31 December 2037 23:59:59 GMT", and the Cache-Control max-age will be set to 10 years. This basically means that the HTTP responses are guaranteed to never change, so clients are free to never request the same thing twice and may use their own stored values.
Specific time - An actual time value means an expiry deadline relative to the time of the respective request, for example, expires 10w;. A negative value emits the special header Cache-Control: no-cache.
"modified" specific time - If you add the keyword modified before the time value, the expiration moment will be computed relative to the modification time of the file being served.
"@" specific time - A time with an @ prefix specifies an absolute time-of-day expiry, which should be less than 24 hours, for example, expires @17h;.
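To make the table concrete, here is a sketch showing several of these forms side by side; the location paths are hypothetical:

location /assets/ {
    expires max;            # "never changes": Expires in 2037, max-age of ten years
}

location /reports/ {
    expires modified +1h;   # one hour after the served file's modification time
}

location /schedule/ {
    expires @17h;           # absolute time-of-day deadline: today at 17:00
}

location /volatile/ {
    expires epoch;          # tells all caches to consider their copies expired
}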
Many web applications choose to emit the caching headers themselves, and this is a good thing: they have more information about which resources change often and which never change. Tampering with the headers that you receive from the upstream may or may not be something you want to do. Sometimes, adding headers to a response while proxying it may produce a conflicting set of headers and therefore create unpredictable behavior. The static files that you serve with Nginx yourself should have the expires directive in place. However, the general advice about upstreams is to always examine the caching headers you get and to refrain from overoptimizing by setting up a more aggressive caching policy.

Resources for Article:

Further resources on this subject:
Nginx service [article]
Fine-tune the NGINX Configuration [article]
Nginx Web Services: Configuration and Implementation [article]