
How-To Tutorials

7018 Articles
Alex Browne
16 Mar 2015
6 min read

Benchmarking and Optimizing Go Code, Part 1

In this two-part post, you'll learn the basics of how to benchmark and optimize Go code. You'll do this by trying out a few different implementations for calculating a specific element in Pascal's Triangle (a.k.a. the binomial coefficient). I'll assume that you are already familiar with Go, but if you aren't, I recommend the interactive tutorial. All of the code for these two posts is available on GitHub.

Installation & Set Up

To follow along, you will need to install Go version 1.2 or later. Also, make sure that you follow these instructions for setting up your Go work environment. In particular, you will need to have the GOPATH environment variable pointing to a directory where all your Go code will reside.

The Goal

Our goal is to write a function to calculate a specific element of Pascal's Triangle. In case you aren't familiar with Pascal's Triangle, it looks something like this:

            1
          1   1
        1   2   1
      1   3   3   1
    1   4   6   4   1

Pascal's Triangle follows these simple rules:

- The first row contains a single element (1).
- Every subsequent element is the sum of the two elements directly above it (one above and to the left, and the other above and to the right). If either the element above and to the left or the one above and to the right is absent, consider it to be equal to zero.

For convenience, we'll index all of the rows and columns in Pascal's Triangle, starting with 0. So the element at the very top of the triangle is at index (0, 0). The element at index (4, 1) would be at row 4 and column 1, which is 1 + 3 = 4. The function we'll be writing will return the element at row n, column m of Pascal's Triangle and has the following signature:

    func Pascal(n, m int) int

So Pascal(0, 0) should return 1 and Pascal(5, 2) would return 10. In these posts, we'll write several different implementations of the Pascal function, test each one for correctness, and benchmark them to see which is the fastest.

Project Structure

Now's a good time to set up the directory where your code for this project will reside.
Somewhere in $GOPATH/src, create a new directory and call it whatever you want. I recommend $GOPATH/src/github.com/your-github-username/go-benchmark-example. Our basic project structure is going to look like this:

    go-benchmark-example
        common
            common.go
        implementations
            builtin.go
            naive.go
            recursive.go
        test
            benchmark_test.go
            pascal_test.go

The implementations package will hold a few different implementations of the Pascal function, each in its own file. The test package is where we will test the implementations for correctness (in pascal_test.go) and then benchmark their performance (in benchmark_test.go). Without further ado, let's start writing code!

The Pascaler Interface

To make comparing different implementations easier, we'll create an interface that all of the implementations should implement. (You'll see why this is handy when we write the tests and benchmarks.) Add the following to common/common.go:

    package common

    type Pascaler interface {
        Pascal(int, int) int
    }

That's it! All we've done is declare an interface that consists of one method, Pascal, which takes two ints as arguments (the row and column) and returns the value of the specified element in Pascal's Triangle. The seemingly odd name "Pascaler" follows the Go convention of naming a one-method interface after its method plus an "-er" suffix.

Naive Implementation

The first implementation we'll write is a naive iterative one. The basic idea is to generate the triangle from top to bottom until we reach row n and column m. We call this implementation "naive" because it does not attempt to do anything clever and will probably not perform the best.
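Before the full implementation, the interface and the top-to-bottom rule described above can be sketched in isolation. The following is a minimal, self-contained illustration and not one of the post's implementations: the rowByRow type is a hypothetical name, the interface is redeclared locally so the file runs on its own, and only the current row is kept in memory rather than the whole triangle.

```go
package main

import "fmt"

// Pascaler mirrors the single-method interface from common/common.go,
// redeclared here so that this sketch is self-contained.
type Pascaler interface {
	Pascal(n, m int) int
}

// rowByRow is a hypothetical type used only for this illustration.
type rowByRow struct{}

// Compile-time check that rowByRow satisfies Pascaler.
var _ Pascaler = rowByRow{}

// Pascal builds each row from the previous one and returns element m of row n.
func (rowByRow) Pascal(n, m int) int {
	row := []int{1} // row 0
	for i := 1; i <= n; i++ {
		next := make([]int, i+1)
		next[0], next[i] = 1, 1
		for j := 1; j < i; j++ {
			// Element (i, j) is the sum of the two elements directly above it
			next[j] = row[j-1] + row[j]
		}
		row = next
	}
	return row[m]
}

func main() {
	var p Pascaler = rowByRow{}
	fmt.Println(p.Pascal(4, 1), p.Pascal(5, 2)) // prints "4 10"
}
```

Because the value is held in a variable of the interface type, any other implementation could be swapped in without changing the calling code, which is what the tests and benchmarks rely on.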
Add the following to implementations/naive.go:

    package implementations

    type naiveType struct{}

    var Naive = naiveType{}

    func (p naiveType) Pascal(n, m int) int {
        // Instantiate a slice to hold n+1 rows
        rows := make([][]int, n+1)
        // Start by hard-coding the first two rows
        if n < 2 {
            rows = make([][]int, 2)
        }
        rows[0] = []int{1}
        rows[1] = []int{1, 1}
        // Iterate from top to bottom until we reach row n
        for i := 2; i <= n; i++ {
            numColumns := i + 1
            rows[i] = make([]int, numColumns)
            rows[i][0] = 1
            rows[i][numColumns-1] = 1
            for j := 1; j < numColumns-1; j++ {
                // Element (i, j) is equal to the sum of the two elements
                // directly above it
                rows[i][j] = rows[i-1][j-1] + rows[i-1][j]
            }
        }
        return rows[n][m]
    }

    // Implement the Stringer interface so we can print the Naive
    // object directly with fmt.Print and friends
    func (p naiveType) String() string {
        return "Naive Implementation"
    }

Testing for Correctness

The next thing we'll do is test our implementation to make sure it is correct. Since we want to test each implementation the same way, we'll write a small utility function called testPascaler, which takes a Pascaler as an argument. The function will iterate through a slice of test cases and run each case against the given implementation. Add the following to test/pascal_test.go:

    package test

    import (
        "github.com/albrow/go-benchmark-example/common"
        "github.com/albrow/go-benchmark-example/implementations"
        "reflect"
        "testing"
    )

    func TestNaive(t *testing.T) {
        testPascaler(t, implementations.Naive)
    }

    // testPascaler can be used to test any implementation. It uses a
    // series of test cases and reports any errors using t.Error.
    func testPascaler(t *testing.T, p common.Pascaler) {
        // cases is a slice of test cases, each consisting of two inputs
        // n and m and the expected output.
        cases := []struct {
            n        int
            m        int
            expected int
        }{
            {0, 0, 1},
            {1, 0, 1},
            {1, 1, 1},
            {4, 1, 4},
            {6, 4, 15},
            {7, 3, 35},
            {7, 0, 1},
            {7, 7, 1},
            {32, 16, 601080390},
            {64, 32, 1832624140942590534},
        }
        // Iterate through each test case and check the result
        for _, c := range cases {
            // implementations.Recursive is written in Part 2.
            if reflect.TypeOf(p) == reflect.TypeOf(implementations.Recursive) && c.n > 30 {
                // Skip cases where n is too large for the recursive implementation.
                // It takes too long and might even time out.
                continue
            }
            got := p.Pascal(c.n, c.m)
            if got != c.expected {
                t.Errorf("Incorrect result for %s with inputs (%d, %d).\nExpected %d but got %d.", p, c.n, c.m, c.expected, got)
            }
        }
    }

To run the test, just run the following command from your project directory:

    go test ./...

If everything works as expected and the test passes, you should see the following output:

    ?    github.com/albrow/go-benchmark-example/common [no test files]
    ?    github.com/albrow/go-benchmark-example/implementations [no test files]
    ok   github.com/albrow/go-benchmark-example/test 0.005s

Conclusion

Follow along in Part 2, where we will cover benchmarking for performance, a recursive implementation, and a builtin implementation.

About the Author

Alex Browne is a programmer and entrepreneur with about 4 years of product development experience and 3 years of experience working on small startups. He has worked with a wide variety of technologies and has single-handedly built many production-grade applications. He is currently working with two co-founders on an early stage startup exploring ways of applying machine learning and computer vision to the manufacturing industry. His favorite programming language is Go.

Doron Katz
15 Mar 2015
5 min read

Add a Twitter Sign In To Your iOS App with TwitterKit

What is TwitterKit & Digits?

In this post we take a look at Twitter’s new sign-in API, TwitterKit and Digits, bundled as part of its new Fabric suite of tools, announced by Twitter earlier this year. We also provide two quick snippets on how to integrate Twitter’s sign-in mechanism into your iOS app.

Facebook, and to a lesser extent Google, have for a while dominated the single sign-on paradigm, through their SDKs or Accounts.framework on iOS, to encourage developers to provide a consolidated form of login for their users. Twitter has decided to finally get on the bandwagon and work toward improving its brand by increasing sign-on participation and providing a concise way for users to log in to their favorite apps without needing to remember individual passwords.

By providing a Login via Twitter button, developers gain the user’s Twitter identity, and subsequently their associated tweets and associations. That is, once the Twitter account is identified, the app can engage followers of that account (friends), or access the user’s history of tweets to data-mine for a specific keyword or hashtag.

In addition to offering a single sign-on, Twitter is also offering Digits, the ability for users to sign on anonymously using their phone number, analogous to Facebook’s new Anonymous Login API.

The benefits of Digits

The rationale behind Digits is to give users a choice. Those who trust the app or website can provide their Twitter identity in order to log in. More hesitant users who want to protect their social graph history can instead provide only a unique number, which happens to be a mobile number, as a means of identification and authentication.
Another benefit for users is that logging in is dead simple: rather than putting them through a deterring form of identification questions, you just ask them for their number, to which they receive an authentication confirmation SMS, allowing them to log in.

With that brief introduction to TwitterKit and Digits, let’s show you how simple it is to implement each one.

Logging in with TwitterKit

Twitter wanted to make implementing its authentication mechanism a simpler and more attractive process for developers, which it did. By using the SDK as part of Twitter’s Fabric suite, you will already have your Twitter app set up and registered for use with the company’s SDK.

TwitterKit aims to leverage the existing Twitter account on the iOS device, using the Accounts.framework, which is the preferred and most rudimentary approach, with a fallback to the OAuth mechanism.

The easiest way to implement Twitter authentication is through the generated button, TWTRLogInButton, as we will demonstrate using Swift:

    let authenticationButton = TWTRLogInButton(logInCompletion: { (session, error) in
        if (session != nil) {
            // We signed in; the session is stored in the session object.
        } else {
            // We got an error, accessible from the error object.
        }
    })

It’s quite simple, leaving you with a TWTRLogInButton, a button subclass that you can add to your view hierarchy for users to interact with.

Logging in with Digits

Having created a login button using TwitterKit, we will now create the same feature using Digits. The simplicity of implementation is maintained with Digits, with the simplest process once again being to create a pre-configured button, DGTAuthenticateButton:

    let authenticationButton = DGTAuthenticateButton(authenticationCompletion: { (session, error) in
        if (session != nil) {
            // We signed in; the session is stored in the session object.
        } else {
            // We got an error, accessible from the error object.
        }
    })

Summary

Implementing TwitterKit and Digits is quite straightforward in iOS, although the two have different intentions.
Whereas TwitterKit gives you full access to the authenticated user’s social history, Digits allows for a more abbreviated, privacy-protected approach to authenticating. If at some stage the user decides to trust the app and feels more comfortable providing full access to her or his social history, you can defer catering to that until later in the app usage.

The complete iOS reference for TwitterKit and Digits can be found by clicking here.

The popularity and uptake of TwitterKit remain to be seen, but as an extra option for developers, alongside Facebook and Google+ login, it lets users pick their most trusted social media tool as their choice of authentication. Providing an anonymous mode of login also falls in line with a more privacy-conscious world, and Digits certainly provides a seamless way for developers to implement it, and an impressively straightforward way for users to authenticate using their phone number.

We have briefly demonstrated how to interact with Twitter’s SDK using iOS and Swift, but there is also an Android SDK version, with a Web version in the pipeline very soon, according to Twitter. This is certainly worth exploring, along with the rest of the tools offered in the Fabric suite, including analytics and beta-distribution tools, and more.

About the author

Doron Katz is an established Mobile Project Manager, Architect and Developer, and a pupil of the methodology of Agile Project Management, such as applying Kanban principles. Doron also believes in Behaviour-Driven Development (BDD), that is, anticipating user interaction prior to design. Doron is also heavily involved in various technical user groups, such as CocoaHeads Sydney and Adobe User Group.

Packt
12 Mar 2015
14 min read

A Sample LEMP Stack

This article is written by Michael Peacock, the author of Creating Development Environments with Vagrant (Second Edition). Now that we have a good knowledge of using Vagrant to manage software development projects and how to use the Puppet provisioning tool, let's take a look at how to use these tools to build a Linux, Nginx, MySQL, and PHP (LEMP) development environment with Vagrant.

In this article, you will learn the following topics:

- How to update the package manager
- How to create a LEMP-based development environment in Vagrant, including the following:
  - How to install the Nginx web server
  - How to customize the Nginx configuration file
  - How to install PHP
  - How to install and configure MySQL
  - How to install e-mail sending services

With the exception of MySQL, we will create simple Puppet modules to install and manage the software required. For MySQL, we will use the official Puppet module from Puppet Labs; this module makes it very easy for us to install and configure all aspects of MySQL.

Creating the Vagrant project

First, we want to create a new project, so let's create a new folder called lemp-stack and initialize a new ubuntu/trusty64 Vagrant project within it by executing the following commands:

    mkdir lemp-stack
    cd lemp-stack
    vagrant init ubuntu/trusty64

The easiest way for us to pull in the MySQL Puppet module is to simply add it as a git submodule to our project. In order to add a git submodule, our project needs to be a git repository, so let's initialize it as a git repository now to save time later:

    git init

To make the virtual machine reflective of a real-world production server, instead of forwarding the web server port on the virtual machine to another port on our host machine, we will network the virtual machine. This means that we will be able to access the web server via port 80 (which is typical on a production web server) by connecting directly to the virtual machine.
In order to ensure a fixed IP address to which we can allocate a hostname on our network, we need to uncomment the following line in our Vagrantfile by removing the # from the start of the line:

    # config.vm.network "private_network", ip: "192.168.33.10"

The IP address can be changed depending on the needs of our project.

As this is a sample LEMP stack designed for web-based projects, let's map our project's directory to a relevant web folder on the virtual machine:

    config.vm.synced_folder ".", "/var/www/project", type: "nfs"

We will still need to configure our web server to point to this folder; however, it is more appropriate than the default mapping location of /vagrant.

Before we run our Puppet provisioner to install our LEMP stack, we should instruct Vagrant to run the apt-get update command on the virtual machine. Without this, it isn't always possible to install new packages. So, let's add the following line to our Vagrantfile within the |config| block:

    config.vm.provision "shell", inline: "apt-get update"

As we will put our Puppet modules and manifests in a provision folder, we need to configure Vagrant to use the correct folders for our Puppet manifests and modules as well as the default manifest file. Adding the following code to our Vagrantfile will do this for us:

    config.vm.provision :puppet do |puppet|
      puppet.manifests_path = "provision/manifests"
      puppet.module_path = "provision/modules"
      puppet.manifest_file = "vagrant.pp"
    end

Creating the Puppet manifests

Let's start by creating some folders for our Puppet modules and manifests by executing the following commands:

    mkdir provision
    cd provision
    mkdir modules
    mkdir manifests

For each of the modules we want to create, we need to create a folder for the module within the provision/modules folder. Within this folder, we need to create a manifests folder, and within this, our Puppet manifest file, init.pp.
Structurally, this looks something like the following:

    |-- provision
    |   |-- manifests
    |   |   `-- vagrant.pp
    |   `-- modules
    |       |-- our module
    |           |-- manifests
    |               `-- init.pp
    `-- Vagrantfile

Installing Nginx

Let's take a look at what is involved in installing Nginx through a module and its manifest file, provision/modules/nginx/manifests/init.pp. First, we define our class, passing in a variable so that we can change the configuration file we use for Nginx (useful for using the same module for different projects or different environments such as staging and production environments), then we need to ensure that the nginx package is installed:

    class nginx ($file = 'default') {

      package { "nginx":
        ensure => present
      }

Note that we have not closed the curly bracket for the nginx class. That is because this is just the first snippet of the file; we will close it at the end.

Because we want to change our default Nginx configuration file, we should update the contents of the Nginx configuration file with one of our own (this will need to be placed in the provision/modules/nginx/files folder; unless the file parameter is passed to the class, the file default will be used):

      file { '/etc/nginx/sites-available/default':
        source  => "puppet:///modules/nginx/${file}",
        owner   => 'root',
        group   => 'root',
        notify  => Service['nginx'],
        require => Package['nginx']
      }

Finally, we need to ensure that the nginx service is actually running once it has been installed:

      service { "nginx":
        ensure  => running,
        require => Package["nginx"]
      }
    }

This completes the manifest. We do still, however, need to create a default configuration file for Nginx, which is saved as provision/modules/nginx/files/default. This will be used unless we pass a file parameter to the nginx class when using the module. The sample file here is a basic configuration file, pointing to the public folder within our synced folder.
The server name of lemp-stack.local means that Nginx will listen for requests on that hostname and will serve content from our project's folder:

    server {
        listen 80;

        root /var/www/project/public;
        index index.php index.html index.htm;

        server_name lemp-stack.local;

        location / {
            try_files $uri $uri/ /index.php?$query_string;
        }

        location ~ \.php$ {
            try_files $uri =404;
            fastcgi_split_path_info ^(.+\.php)(/.+)$;
            #fastcgi_pass 127.0.0.1:9000;
            fastcgi_param SERVER_NAME $host;
            fastcgi_pass unix:/var/run/php5-fpm.sock;
            fastcgi_index index.php;
            fastcgi_intercept_errors on;
            include fastcgi_params;
        }

        location ~ /\.ht {
            deny all;
        }

        location ~* \.(jpg|jpeg|gif|css|png|js|ico|html)$ {
            access_log off;
            expires max;
        }

        location ~* \.svgz {
            add_header Content-Encoding "gzip";
        }
    }

Because this configuration file listens for requests on lemp-stack.local, we need to add a record to the hosts file on our host machine, which will point lemp-stack.local at the IP address of our virtual machine (192.168.33.10 in our Vagrantfile).

Installing PHP

To install PHP, we need to install a range of related packages, including the Nginx PHP module. This would be in the file provision/modules/php/manifests/init.pp.

On more recent (within the past few years) Linux and PHP installations, PHP uses a handler called php-fpm as a bridge between PHP and the web server being used. This means that when new PHP modules are installed or PHP configurations are changed, we need to restart the php-fpm service for these changes to take effect, whereas in the past, it was often the web servers that needed to be restarted or reloaded.

To make our simple PHP Puppet module flexible, we need to install the php5-fpm package and restart it when other modules are installed, but only when we use Nginx on our server. To achieve this, we can use a class parameter, which defaults to true.
This lets us use the same module on servers that don't have a web server, and where we don't want to have the overhead of the FPM service, such as a server that runs background jobs or processing:

    class php ($nginx = true) {

If the nginx parameter is true, then we need to install php5-fpm. Since this package is only installed when the flag is set to true, we cannot have PHP and its modules requiring or notifying the php5-fpm package, as it may not be installed; so instead we need to have the php5-fpm package subscribe to these packages:

      if ($nginx) {
        package { "php5-fpm":
          ensure    => present,
          subscribe => [Package['php5-dev'], Package['php5-curl'], Package['php5-gd'], Package['php5-imagick'], Package['php5-mcrypt'], Package['php5-mhash'], Package['php5-pspell'], Package['php5-json'], Package['php5-xmlrpc'], Package['php5-xsl'], Package['php5-mysql']]
        }
      }

The rest of the manifest can then simply be the installation of the various PHP modules that are required for a typical LEMP setup:

      package { "php5-dev":
        ensure => present
      }

      package { "php5-curl":
        ensure => present
      }

      package { "php5-gd":
        ensure => present
      }

      package { "php5-imagick":
        ensure => present
      }

      package { "php5-mcrypt":
        ensure => present
      }

      package { "php5-mhash":
        ensure => present
      }

      package { "php5-pspell":
        ensure => present
      }

      package { "php5-xmlrpc":
        ensure => present
      }

      package { "php5-xsl":
        ensure => present
      }

      package { "php5-cli":
        ensure => present
      }

      package { "php5-json":
        ensure => present
      }
    }

Installing the MySQL module

Because we are going to use the Puppet module for MySQL provided by Puppet Labs, installing the module is very straightforward; we simply add it as a git submodule to our project with the following command:

    git submodule add https://github.com/puppetlabs/puppetlabs-mysql.git provision/modules/mysql

You might want to use a specific release for this module, as the code changes on a semi-regular basis. A stable release is available at https://github.com/puppetlabs/puppetlabs-mysql/releases/tag/3.1.0.

Default manifest

Finally, we need to pull these modules together, and install them when our machine is provisioned. To do this, we simply add the following modules to our vagrant.pp manifest file in the provision/manifests folder.

Installing Nginx and PHP

We need to include our nginx class and optionally provide a filename for the configuration file; if we don't provide one, the default will be used:

    class { 'nginx':
      file => 'default'
    }

Similarly for PHP, we need to include the class and, in this case, pass an nginx parameter to ensure that it installs PHP5-FPM too:

    class { 'php':
      nginx => true
    }

Hostname configuration

We should tell our Vagrant virtual machine what its hostname is by adding a host resource to our manifest:

    host { 'lemp-stack.local':
      ip           => '127.0.0.1',
      host_aliases => 'localhost',
    }

E-mail sending services

Because some of our projects might involve sending e-mails, we should install e-mail sending services on our virtual machine. As these are simply two packages, it makes more sense to include them in our Vagrant manifest, as opposed to their own modules:

    package { "postfix":
      ensure => present
    }

    package { "mailutils":
      ensure => present
    }

MySQL configuration

Because the MySQL module is very flexible and manages all aspects of MySQL, there is quite a bit for us to configure. We need to perform the following steps:

1. Create a database.
2. Create a user.
3. Give the user permission to use the database (grants).
4. Configure the MySQL root password.
5. Install the MySQL client.
6. Install the MySQL client bindings for PHP.

The MySQL server class has a range of parameters that can be passed to configure it, including databases, users, and grants.
So, first, we need to define the databases, users, and grants that we want to be configured:

    $databases = {
      'lemp' => {
        ensure  => 'present',
        charset => 'utf8'
      },
    }

    $users = {
      'lemp@localhost' => {
        ensure                   => 'present',
        max_connections_per_hour => '0',
        max_queries_per_hour     => '0',
        max_updates_per_hour     => '0',
        max_user_connections     => '0',
        password_hash            => 'MySQL-Password-Hash',
      },
    }

The password_hash parameter here is for a hash generated by MySQL. You can generate a password hash by connecting to an existing MySQL instance and running a query such as SELECT PASSWORD('password').

The grant maps our user and database and specifies what permissions the user has on that database when connecting from a particular host (in this case, localhost, so from the virtual machine itself):

    $grants = {
      'lemp@localhost/lemp.*' => {
        ensure     => 'present',
        options    => ['GRANT'],
        privileges => ['ALL'],
        table      => 'lemp.*',
        user       => 'lemp@localhost',
      },
    }

We then pass these values to the MySQL server class. We also provide a root password for MySQL (unlike earlier, this is provided in plain text), and we can override the options from the MySQL configuration file.
This is unlike our own Nginx module, which provides a full file. In this instance, the MySQL module provides a template configuration file, and our overrides are substituted into that template to create the final configuration file:

    class { '::mysql::server':
      root_password    => 'lemp-root-password',
      override_options => { 'mysqld' => { 'max_connections' => '1024' } },
      databases        => $databases,
      users            => $users,
      grants           => $grants,
      restart          => true
    }

As we will have a web server running on this machine, which needs to connect to this database server, we also need the client library and the client bindings for PHP, so let's include them too:

    include '::mysql::client'

    class { '::mysql::bindings':
      php_enable => true
    }

Launching the virtual machine

In order to launch our new virtual machine, we simply need to run the following command:

    vagrant up

We should now see our VM boot and the various Puppet phases execute. If all goes well, we should see no errors in this process.

Summary

In this article, we learned about the steps involved in creating a brand new Vagrant project, configuring it to integrate with our host machine, and setting up a standard LEMP stack using the Puppet provisioning tool. Now you should have a basic understanding of Vagrant and how to use it to ensure that your software projects are managed more effectively!

Packt
10 Mar 2015
3 min read

Sharing Your Story

In this article by Ashley Chiasson, author of the book Articulate Storyline Essentials, we will see how to preview your story.

Previewing your story

Previewing a story might sound like a straightforward concept, and it is, but Storyline gives you a ton of different previewing options, and you can pick and choose what works best for you!

There are two main ways for you to preview an entire story. The most straightforward way is to select the Preview button from the Home tab. The other way is to select the Preview icon on the bottom pane of the Storyline interface. You can also use the shortcut key F12 to preview an entire story.

Once you choose to preview the full story, the Preview menu will appear. Here you can go through the story as your audience would and make any necessary adjustments prior to publishing the story. Within the Preview menu, you can close the preview; select individual slides; replay a particular slide, scene, or the entire project; and edit an individual slide.

Maybe you only want to preview a particular slide or scene. In this instance, you'll want to select the drop-down icon on the Preview button on the Home tab, and then select whether you want to preview This Slide or This Scene. To preview the selected slide, you can use the shortcut key Ctrl + F12. To preview the selected scene, you can use the shortcut key Shift + F12.

These options are fantastic and will save you a lot of preview-generating time, particularly when you have a slide- or scene-heavy story and don't want to go through the motions of previewing the entire story each and every time you wish to see a certain piece of the story.

It is important to note that not all content within Storyline is available during preview. The unavailable items include hyperlinks, imported interactions (for example, from Articulate Engage), web objects, videos from external websites, and course completion/tracking status.

Once you have selected Preview, you will be provided with the Preview menu. This menu allows you to do several things:

- Close the preview
- Select a different slide (if previewing the entire story or a scene)
- Replay the slide, scene, or entire course
- Edit the selected slide within Slide View

Once you have previewed your story and have determined that everything is as you want it to be, you're ready to customize your Storyline player and publish!

Summary

This article explained how to preview your story. Storyline makes it easy to customize your learners' experience and share your story. Previewing your story allows you to streamline your development; without a preview feature, you would have to publish every single time you wanted to see a slide, and no one has time for that! You should now feel comfortable working with the player customization options, so let your imagination flow and create a custom player for your story!

If you're looking to dig a bit deeper into Articulate Storyline's capabilities, please check out Learning Articulate Storyline by Stephanie Harnett, and stay tuned for Mastering Articulate Storyline by Ashley Chiasson (slated for release in mid 2015), where you'll learn all about pushing Storyline's features and functionality to the absolute limits!

Packt
10 Mar 2015
19 min read

Pricing the Double-no-touch option

In this article by Balázs Márkus, coauthor of the book Mastering R for Quantitative Finance, you will learn about the pricing and life of the Double-no-touch (DNT) option.

A Double-no-touch (DNT) option is a binary option that pays a fixed amount of cash at expiry. Unfortunately, the fExoticOptions package does not contain a formula for this option at present. We will show two different ways to price DNTs that incorporate two different pricing approaches. For the first approach, we will call the function dnt1, and for the second approach, we will use dnt2 as the name for the function.

Hui (1996) showed how a one-touch double barrier binary option can be priced. In his terminology, "one-touch" means that a single trade is enough to trigger the knock-out event, and "double barrier" binary means that there are two barriers and this is a binary option. We call this a DNT, as that is the name commonly used on the FX markets. This is a good example of the fact that many popular exotic options go by more than one name.

In Haug (2007a), the Hui formula is already translated into the generalized framework. S, r, b, σ, and T have the same meaning as there (spot price, risk-free rate, cost of carry, volatility, and time to maturity). K means the payout (dollar amount), while L and U are the lower and upper barriers:

$$
c = \sum_{i=1}^{\infty} \frac{2\pi i K}{Z^2}
\left[ \frac{(S/L)^{\alpha} - (-1)^i (S/U)^{\alpha}}{\alpha^2 + \left(\frac{i\pi}{Z}\right)^2} \right]
\sin\!\left(\frac{i\pi}{Z}\ln\frac{S}{L}\right)
\exp\!\left(-\frac{1}{2}\left[\left(\frac{i\pi}{Z}\right)^2 - \beta\right]\sigma^2 T\right)
$$

where

$$
Z = \ln\frac{U}{L}, \qquad
\alpha = -\frac{1}{2}\left(\frac{2b}{\sigma^2} - 1\right), \qquad
\beta = -\frac{1}{4}\left(\frac{2b}{\sigma^2} - 1\right)^2 - \frac{2r}{\sigma^2}
$$

Implementing the Hui (1996) function in R starts with a big question mark: what should we do with an infinite sum? How high a number should we substitute for infinity? Interestingly, for practical purposes, a small number like 5 or 10 can often play the role of infinity rather well. Hui (1996) states that convergence is fast most of the time. We are a bit skeptical about this, since α will be used as an exponent. If b is negative and sigma is small enough, the (S/L)^α part of the formula could turn out to be a problem.
First, we will try with normal parameters and see how quick the convergence is:

    dnt1 <- function(S, K, U, L, sigma, T, r, b, N = 20, ploterror = FALSE){
        # A spot outside the barriers means the option is already knocked out
        if ( L > S | S > U) return(0)
        Z <- log(U/L)
        alpha <- -1/2*(2*b/sigma^2 - 1)
        beta <- -1/4*(2*b/sigma^2 - 1)^2 - 2*r/sigma^2
        # v[i] holds the i-th term of the series, truncated after N terms
        v <- rep(0, N)
        for (i in 1:N)
            v[i] <- 2*pi*i*K/(Z^2) * (((S/L)^alpha - (-1)^i*(S/U)^alpha ) /
                (alpha^2+(i*pi/Z)^2)) * sin(i*pi/Z*log(S/L)) *
                  exp(-1/2 * ((i*pi/Z)^2-beta) * sigma^2*T)
        if (ploterror) barplot(v, main = "Formula Error")
        sum(v)
    }
    print(dnt1(100, 10, 120, 80, 0.1, 0.25, 0.05, 0.03, 20, TRUE))

The following screenshot shows the result of the preceding code: The Formula Error chart shows that after the seventh step, additional steps were not influencing the result. This means that for practical purposes, the infinite sum can be quickly estimated by calculating only the first seven steps. This looks like a very quick convergence indeed. However, this could be pure luck or coincidence.

What about decreasing the volatility down to 3 percent? We have to set N to 50 to see the convergence:

    print(dnt1(100, 10, 120, 80, 0.03, 0.25, 0.05, 0.03, 50, TRUE))

The preceding command gives the following output: Not so impressive? 50 steps are still not that bad. What about decreasing the volatility even lower? At 1 percent, the formula with these parameters simply blows up. At first, this looks catastrophic; however, the price of a DNT was already 98.75 percent of the payout when we used 3 percent volatility. Logic says that the DNT price should be a monotonically decreasing function of volatility, so we already know that the DNT should be worth at least 98.75 percent of the payout if volatility is below 3 percent.

Another issue is that if we choose an extremely high U or an extremely low L, calculation errors emerge. However, as with the volatility problem, common sense helps here too; the price of a DNT should increase if we make U higher or L lower.
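To look at the individual terms of the series outside R, the following is a minimal Python sketch of the same truncated sum. It is a port of the first dnt1 function above, not part of the original article; the name dnt_terms is an assumption made for illustration, and the parameters are the ones used in the first R call.

```python
import math

def dnt_terms(S, K, U, L, sigma, T, r, b, N=20):
    """Return the first N terms of the Hui (1996) series for a DNT price.

    Hypothetical helper: a direct port of the article's R function dnt1,
    returning the terms themselves instead of their sum.
    """
    # A spot outside the barriers means the option is already knocked out.
    if S < L or S > U:
        return [0.0] * N
    Z = math.log(U / L)
    alpha = -0.5 * (2 * b / sigma**2 - 1)
    beta = -0.25 * (2 * b / sigma**2 - 1) ** 2 - 2 * r / sigma**2
    terms = []
    for i in range(1, N + 1):
        w = i * math.pi / Z
        bracket = ((S / L) ** alpha - (-1) ** i * (S / U) ** alpha) / (alpha**2 + w**2)
        term = (
            (2 * math.pi * i * K / Z**2)
            * bracket
            * math.sin(w * math.log(S / L))
            * math.exp(-0.5 * (w**2 - beta) * sigma**2 * T)
        )
        terms.append(term)
    return terms

# Same parameters as the first R call: sigma = 10 percent, N = 20.
terms = dnt_terms(100, 10, 120, 80, 0.1, 0.25, 0.05, 0.03, N=20)
price = sum(terms)
for i, t in enumerate(terms, start=1):
    print(f"{i:2d}  {t: .6e}")
print("price:", price)
```

Printing the terms rather than plotting them makes it easy to see where they stop mattering for a given volatility; rerunning with a smaller sigma shows how many more terms are needed.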
There is still another trick. Since all the problems come from the α parameter, we can try setting b to 0, which makes α equal to 0.5. If we also set r to 0, the price of a DNT converges to 100 percent of the payout as the volatility drops.

Anyway, whenever we replace an infinite sum with a finite sum, it is always good to know when it will work and when it will not. We wrote a new version of the function that takes into consideration that convergence is not always quick. The trick is that the function keeps calculating the next term as long as the last term made a significant change. This is still not good for all parameters, as there is no cure for very low volatility, except to accept that if implied volatilities are below 1 percent, then this is an extreme market situation in which DNT options should not be priced by this formula:

    dnt1 <- function(S, K, U, L, sigma, Time, r, b) {
        if ( L > S | S > U) return(0)
        Z <- log(U/L)
        alpha <- -1/2*(2*b/sigma^2 - 1)
        beta <- -1/4*(2*b/sigma^2 - 1)^2 - 2*r/sigma^2
        p <- 0
        i <- a <- 1
        # Keep adding terms until the last one changes the sum by 0.0001 or less
        while (abs(a) > 0.0001){
            a <- 2*pi*i*K/(Z^2) * (((S/L)^alpha - (-1)^i*(S/U)^alpha ) /
              (alpha^2 + (i *pi / Z)^2) ) * sin(i * pi / Z * log(S/L)) *
                exp(-1/2*((i*pi/Z)^2-beta) * sigma^2 * Time)
            p <- p + a
            i <- i + 1
        }
        p
    }

Now that we have a nice formula, it is possible to draw some DNT-related charts to get more familiar with this option. Later, we will use a particular AUDUSD DNT option with the following parameters: L equal to 0.9200, U equal to 0.9600, K (payout) equal to USD 1 million, T equal to 0.25 years, volatility equal to 6 percent, r_AUD equal to 2.75 percent, r_USD equal to 0.25 percent, and b equal to -2.5 percent. We will calculate and plot the value of this DNT for spot prices from 0.9200 to 0.9600 on a grid of 2,000 points, so each step is about 0.00002, a fifth of a pip (one pip is 0.0001).
The following code plots the DNT price against the price of the underlying:

    x <- seq(0.92, 0.96, length = 2000)
    y <- z <- rep(0, 2000)
    for (i in 1:2000){
        y[i] <- dnt1(x[i], 1e6, 0.96, 0.92, 0.06, 0.25, 0.0025, -0.0250)
        z[i] <- dnt1(x[i], 1e6, 0.96, 0.92, 0.065, 0.25, 0.0025, -0.0250)
    }
    matplot(x, cbind(y,z), type = "l", lwd = 2, lty = 1,
        main = "Price of a DNT with volatility 6% and 6.5% ", cex.main = 0.8,
        xlab = "Price of underlying" )

The following output is the result of the preceding code: It can be clearly seen that even a small change in volatility can have a huge impact on the price of a DNT. Looking at this chart is an intuitive way to find that vega must be negative. Interestingly enough, even a quick look at this chart can convince us that the absolute value of vega decreases as we get closer to the barriers.

Most end users think that the biggest risk is when the spot is getting close to the trigger. This is because end users really think about binary options in a binary way. As long as the DNT is alive, they focus on the positive outcome. However, for a dynamic hedger, the risk of a DNT is not that interesting when the value of the DNT is already small.

It is also very interesting that since the T-Bill price is independent of the volatility and since the DNT + DOT = T-Bill equation holds, an increase in volatility decreases the price of the DNT by exactly the same amount as it increases the price of the DOT. It is not surprising that the vega of the DOT should be the exact mirror image of the vega of the DNT.

We can use the GetGreeks function to estimate delta, vega, and theta with central finite differences. For gamma, we use a separate Gamma function that takes a second difference:

    # Central difference of FUN with respect to its arg-th positional argument
    GetGreeks <- function(FUN, arg, epsilon,...) {
        all_args1 <- all_args2 <- list(...)
        all_args1[[arg]] <- as.numeric(all_args1[[arg]] + epsilon)
        all_args2[[arg]] <- as.numeric(all_args2[[arg]] - epsilon)
        (do.call(FUN, all_args1) -
            do.call(FUN, all_args2)) / (2 * epsilon)
    }
    # Second difference of FUN with respect to S, with step 2 * epsilon
    Gamma <- function(FUN, epsilon, S, ...) {
        arg1 <- list(S, ...)
        arg2 <- list(S + 2 * epsilon, ...)
        arg3 <- list(S - 2 * epsilon, ...)
        y1 <- (do.call(FUN, arg2) - do.call(FUN, arg1)) / (2 * epsilon)
        y2 <- (do.call(FUN, arg1) - do.call(FUN, arg3)) / (2 * epsilon)
        (y1 - y2) / (2 * epsilon)
    }
    x = seq(0.9202, 0.9598, length = 200)
    delta <- vega <- theta <- gamma <- rep(0, 200)

    for(i in 1:200){
        delta[i] <- GetGreeks(FUN = dnt1, arg = 1, epsilon = 0.0001,
            x[i], 1000000, 0.96, 0.92, 0.06, 0.5, 0.02, -0.02)
        vega[i] <- GetGreeks(FUN = dnt1, arg = 5, epsilon = 0.0005,
            x[i], 1000000, 0.96, 0.92, 0.06, 0.5, 0.0025, -0.025)
        theta[i] <- - GetGreeks(FUN = dnt1, arg = 6, epsilon = 1/365,
            x[i], 1000000, 0.96, 0.92, 0.06, 0.5, 0.0025, -0.025)
        gamma[i] <- Gamma(FUN = dnt1, epsilon = 0.0001, S = x[i], K =
            1e6, U = 0.96, L = 0.92, sigma = 0.06, Time = 0.5, r = 0.02, b = -0.02)
    }

    windows()
    plot(x, vega, type = "l", xlab = "S",ylab = "", main = "Vega")

The following chart is the result of the preceding code: After having a look at the value chart, the delta of a DNT also matches intuition; as we get close to the higher barrier, our delta becomes negative, and as we get close to the lower barrier, the delta becomes positive, as follows:

    windows()
    plot(x, delta, type = "l", xlab = "S",ylab = "", main = "Delta")

This is really a non-convex situation; if we would like to do a dynamic delta hedge, we will lose money for sure. If the spot price goes up, the delta of the DNT decreases, so we should buy some AUDUSD as a hedge. However, if the spot price goes down, we should sell some AUDUSD. Imagine a scenario where AUDUSD goes up 20 pips in the morning and then goes down 20 pips in the afternoon.
For a dynamic hedger, this means buying some AUDUSD after the price moved up and selling this very same amount after the price comes down. The changing of the delta can be described by the gamma as follows:

    windows()
    plot(x, gamma, type = "l", xlab = "S",ylab = "", main = "Gamma")

Negative gamma means that if the spot goes up, our delta is decreasing, but if the spot goes down, our delta is increasing. This doesn't sound great. For this inconvenient non-convex situation, there is some compensation, that is, the value of theta is positive. If nothing happens but one day passes, the DNT will automatically be worth more. Here, we use theta as minus 1 times the partial derivative, since if (T-t) is the time left, we check how the value changes as t increases by one day:

    windows()
    plot(x, theta, type = "l", xlab = "S",ylab = "", main = "Theta")

The more negative the gamma, the more positive our theta. This is how time compensates for the potential losses generated by the negative gamma. Risk-neutral pricing also implies that negative gamma should be compensated by a positive theta. This is the main message of the Black-Scholes framework for vanilla options, but it is also true for exotics. See Taleb (1997) and Wilmott (2006).

We already introduced the Black-Scholes surface before; now, we can go into more detail. This surface is also a nice interpretation of how theta and delta work. It shows the price of an option for different spot prices and times to maturity, so the slope of this surface is the theta in one direction and the delta in the other. The code for this is as follows:

    library(plot3D)   # provides persp3D
    BS_surf <- function(S, Time, FUN, ...) {
        n <- length(S)
        k <- length(Time)
        m <- matrix(0, n, k)
        # Evaluate the pricing function on every (spot, time) grid point
        for (i in 1:n) {
            for (j in 1:k) {
                l <- list(S = S[i], Time = Time[j], ...)
                m[i,j] <- do.call(FUN, l)
            }
        }
        persp3D(z = m, xlab = "underlying", ylab = "Time",
            zlab = "option price", phi = 30, theta = 30, bty = "b2")
    }
    BS_surf(seq(0.92,0.96,length = 200), seq(1/365, 1/48, length = 200),
        dnt1, K = 1000000, U = 0.96, L = 0.92, r = 0.0025, b = -0.0250,
        sigma = 0.2)

The preceding code gives the following output: We can see what was already suspected; a DNT likes it when time is passing and the spot is moving to the middle of the (L,U) interval.

Another way to price the Double-no-touch option

Static replication is always the most elegant way of pricing. The no-arbitrage argument lets us say that if, at some time in the future, two portfolios have the same value for sure, then their prices should be equal at any time before this. We will show how double-knock-out (DKO) options can be used to build a DNT. We will need to use a trick; the strike price can be the same as one of the barriers. For a DKO call, the strike price should be lower than the upper barrier because if the strike price is not lower than the upper barrier, the DKO call would be knocked out before it could become in-the-money, so in this case, the option would be worthless as nobody could ever exercise it in-the-money. However, we can choose the strike price to be equal to the lower barrier. For a put, the strike price should be higher than the lower barrier, so why not make it equal to the upper barrier? This way, the DKO call and the DKO put will have a very convenient feature; if they are still alive, they will both expire in-the-money.

Now, we are almost done. We just have to add the DKO prices, and we will get a DNT that has a payout of (U-L) dollars.
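The payoff identity behind this replication can be checked with a few lines of arithmetic. The following Python sketch is not from the original article; the function names are hypothetical and it only looks at payoffs at expiry (not prices), assuming we already know whether a barrier was touched during the life of the options.

```python
def dko_call_payoff(s_T, strike, alive):
    """Expiry payoff of a double-knock-out call; zero if a barrier was hit."""
    return max(s_T - strike, 0.0) if alive else 0.0

def dko_put_payoff(s_T, strike, alive):
    """Expiry payoff of a double-knock-out put; zero if a barrier was hit."""
    return max(strike - s_T, 0.0) if alive else 0.0

def replicated_dnt_payoff(s_T, K, U, L, alive):
    """DKO call struck at L plus DKO put struck at U, rescaled to payout K."""
    # If both options are alive, L < s_T < U, so the sum is
    # (s_T - L) + (U - s_T) = U - L, whatever s_T is.
    total = dko_call_payoff(s_T, L, alive) + dko_put_payoff(s_T, U, alive)
    return total * K / (U - L)

K, U, L = 1_000_000, 0.96, 0.92
for s_T in (0.9201, 0.9266, 0.94, 0.9599):
    print(s_T, replicated_dnt_payoff(s_T, K, U, L, alive=True))
print(replicated_dnt_payoff(0.94, K, U, L, alive=False))
```

Whatever the final spot is, the surviving portfolio pays U - L, so rescaling by K / (U - L) gives the DNT payout of K; a knocked-out portfolio pays nothing, just like the DNT.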
Since DNT prices are linear in the payout, we only have to multiply the result by K/(U-L):

    library(fExoticOptions)   # provides DoubleBarrierOption
    dnt2 <- function(S, K, U, L, sigma, T, r, b) {
        # DKO call with the strike at the lower barrier
        a <- DoubleBarrierOption("co", S, L, L, U, T, r, b, sigma, 0,
            0,title = NULL, description = NULL)
        z <- a@price
        # DKO put with the strike at the upper barrier
        b <- DoubleBarrierOption("po", S, U, L, U, T, r, b, sigma, 0,
            0,title = NULL, description = NULL)
        y <- b@price
        (z + y) / (U - L) * K
    }

Now, we have two functions for which we can compare the results:

    dnt1(0.9266, 1000000, 0.9600, 0.9200, 0.06, 0.25, 0.0025, -0.025)
    [1] 48564.59

    dnt2(0.9266, 1000000, 0.9600, 0.9200, 0.06, 0.25, 0.0025, -0.025)
    [1] 48564.45

For a DNT with a USD 1 million contingent payout and an initial market value of over 48,000 dollars, it is very nice to see that the difference in the prices is only 14 cents. Technically, however, having a second pricing function is not a big help since low volatility is also an issue for dnt2. We will use dnt1 for the rest of the article.

The life of a Double-no-touch option: a simulation

How did the DNT price evolve during the second quarter of 2014?
We have an open-high-low-close time series with five-minute frequency for AUDUSD, so we know all the extreme prices:

    d <- read.table("audusd.csv", colClasses = c("character",
        rep("numeric",5)), sep = ";", header = TRUE)
    underlying <- as.vector(t(d[, 2:5]))
    t <- rep( d[,6], each = 4)
    n <- length(t)
    option_price <- rep(0, n)

    for (i in 1:n) {
        option_price[i] <- dnt1(S = underlying[i], K = 1000000,
            U = 0.9600, L = 0.9200, sigma = 0.06, Time = t[i]/(60*24*365),
              r = 0.0025, b = -0.0250)
    }
    a <- min(option_price)
    b <- max(option_price)
    # Rescale the option price so that it fits on the same chart as the spot
    option_price_transformed = (option_price - a) * 0.03 / (b - a) + 0.92

    par(mar = c(6, 3, 3, 5))
    matplot(cbind(underlying,option_price_transformed), type = "l",
        lty = 1, col = c("grey", "red"),
        main = "Price of underlying and DNT",
        xaxt = "n", yaxt = "n", ylim = c(0.91,0.97),
        ylab = "", xlab = "Remaining time")
    abline(h = c(0.92, 0.96), col = "green")
    axis(side = 2, at = pretty(option_price_transformed),
        col.axis = "grey", col = "grey")
    axis(side = 4, at = pretty(option_price_transformed),
        labels = round(seq(a/1000,1000,length = 7)), las = 2,
        col = "red", col.axis = "red")
    axis(side = 1, at = seq(1,n, length=6),
        labels = round(t[round(seq(1,n, length=6))]/60/24))

The following is the output for the preceding code: The price of a DNT is shown in red on the right axis (divided by 1000), and the actual AUDUSD price is shown in grey on the left axis. The green lines are the barriers of 0.9200 and 0.9600. The chart shows that in 2014 Q2, the AUDUSD currency pair was traded inside the (0.9200; 0.9600) interval; thus, the payout of the DNT would have been USD 1 million.

This DNT looks like a very good investment; however, reality is just one trajectory out of an a priori almost infinite set. It could have happened differently. For example, on May 02, 2014, there were still 59 days left until expiry, and AUDUSD was traded at 0.9203, just three pips away from the lower barrier.
At this point, the price of this DNT was only USD 5,302, as the following code shows:

    dnt1(0.9203, 1000000, 0.9600, 0.9200, 0.06, 59/365, 0.0025, -0.025)
    [1] 5302.213

Compare this USD 5,302 to the initial USD 48,564 option price!

In the following simulation, we will show some different trajectories. All of them start from the same 0.9266 AUDUSD spot price as it was at the dawn of April 01, and we will see how many of them stay inside the (0.9200; 0.9600) interval. To keep it simple, we will simulate geometric Brownian motions using the same 6 percent volatility that we used to price the DNT:

    library(matrixStats)
    DNT_sim <- function(S0 = 0.9266, mu = 0, sigma = 0.06, U = 0.96,
        L = 0.92, N = 5) {
        # Five-minute time steps over a quarter of a year
        dt <- 5 / (365 * 24 * 60)
        t <- seq(0, 0.25, by = dt)
        Time <- length(t)

        # Simulate N Brownian paths and turn them into spot price paths
        W <- matrix(rnorm((Time - 1) * N), Time - 1, N)
        W <- apply(W, 2, cumsum)
        W <- sqrt(dt) * rbind(rep(0, N), W)
        S <- S0 * exp((mu - sigma^2 / 2) * t + sigma * W )
        option_price <- matrix(0, Time, N)

        # The price is set to zero once the path has touched a barrier
        for (i in 1:N)
            for (j in 1:Time)
                option_price[j,i] <- dnt1(S[j,i], K = 1000000, U, L, sigma,
                    0.25-t[j], r = 0.0025,
                      b = -0.0250)*(min(S[1:j,i]) > L & max(S[1:j,i]) < U)

        survivals <- sum(option_price[Time,] > 0)
        dev.new(width = 19, height = 10)

        par(mfrow = c(1,2))
        matplot(t,S, type = "l", main = "Underlying price",
            xlab = paste("Survived", survivals, "from", N), ylab = "")
        abline( h = c(U,L), col = "blue")
        matplot(t, option_price, type = "l", main = "DNT price",
            xlab = "", ylab = "")
    }
    set.seed(214)
    system.time(DNT_sim())

The following is the output for the preceding code: Here, the only surviving trajectory is the red one; in all other cases, the underlying hits either the higher or the lower barrier. The line set.seed(214) guarantees that this simulation will look the same every time we run it.
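For readers who only want the survivorship count and not the charts, the barrier-monitoring idea can be sketched in a few lines of Python. This is an illustrative sketch, not the article's DNT_sim: the function name simulate_survival is an assumption, it uses daily instead of five-minute steps (so it will miss some barrier touches between steps), and Python's random number stream does not reproduce R's set.seed(214) results.

```python
import math
import random

def simulate_survival(n_paths, s0=0.9266, mu=0.0, sigma=0.06,
                      upper=0.96, lower=0.92, horizon=0.25,
                      n_steps=90, seed=214):
    """Count geometric Brownian motion paths that never touch a barrier.

    Hypothetical helper with coarse (roughly daily) monitoring.
    """
    rng = random.Random(seed)
    dt = horizon / n_steps
    drift = (mu - 0.5 * sigma**2) * dt
    vol = sigma * math.sqrt(dt)
    survivors = 0
    for _ in range(n_paths):
        s = s0
        alive = True
        for _ in range(n_steps):
            # Exact one-step update of a geometric Brownian motion
            s *= math.exp(drift + vol * rng.gauss(0.0, 1.0))
            if s <= lower or s >= upper:
                alive = False
                break
        survivors += alive
    return survivors

n = 2000
print(simulate_survival(n), "of", n, "paths survived")
```

Because it skips the per-step option pricing, this version runs in well under a second for a few thousand paths, which makes it cheap to see how much the survivorship ratio moves with N and with the seed.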
One out of five is still not that bad; it would suggest that for an end user or gambler who does no dynamic hedging, this option has an approximate value of 20 percent of the payout (especially since interest rates are low, so the time value of money is not important). However, five trajectories are still too few to jump to such conclusions. We should check the DNT survivorship ratio for a much higher number of trajectories. The ratio of the surviving trajectories could be a good estimator of the a priori real-world survivorship probability of this DNT, and thus of its value to the end user.

Before increasing N rapidly, we should keep in mind how much time this simulation took. On my computer, it took 50.75 seconds for N = 5, and 153.11 seconds for N = 15. The following is the output for N = 15: Now, 3 out of 15 survived, so the estimated survivorship ratio is still 3/15, which is equal to 20 percent. This looks like a very nice product; the price is around 5 percent of the payout, while 20 percent is the estimated survivorship ratio.

Just out of curiosity, run the simulation for N equal to 200. This should take about 30 minutes. The following is the output for N = 200: The results are shocking; now, only 12 out of 200 survive, and the ratio is only 6 percent! So to get a better picture, we should run the simulation for a larger N. The movie Whatever Works by Woody Allen (starring Larry David) is 92 minutes long; in simulation time, that is N = 541. For this N = 541, there are only 38 surviving trajectories, resulting in a survivorship ratio of 7 percent.

What is the real expected survivorship ratio? Is it 20 percent, 6 percent, or 7 percent? We simply don't know at this point. Mathematicians warn us that the law of large numbers requires large numbers, where large is much more than 541, so it would be advisable to run this simulation for as large an N as time allows. Of course, a faster computer also allows more trajectories in the same amount of time.
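How uncertain each of these ratios is can be made concrete with the usual binomial standard error, sqrt(p(1-p)/N). The following Python sketch is an addition to the article; the function name is hypothetical, and the normal-approximation interval it prints is only a rough guide, especially for N = 5 and N = 15.

```python
import math

def survival_estimate(survivors, n_paths, z=1.96):
    """Return the survivorship ratio, its standard error, and a rough 95% interval.

    Uses the normal approximation to the binomial distribution, which is
    crude for small n_paths; the interval is clipped to [0, 1].
    """
    p = survivors / n_paths
    se = math.sqrt(p * (1 - p) / n_paths)
    return p, se, (max(0.0, p - z * se), min(1.0, p + z * se))

# Survivor counts reported in the text for N = 5, 15, 200 and 541
for survivors, n in [(1, 5), (3, 15), (12, 200), (38, 541)]:
    p, se, (lo, hi) = survival_estimate(survivors, n)
    print(f"N={n:4d}  p={p:.3f}  se={se:.3f}  95% CI=({lo:.3f}, {hi:.3f})")
```

Seen this way, the 20 percent estimates from 5 and 15 trajectories come with intervals wide enough to contain 6 or 7 percent, so the later results are less of a contradiction than they first appear.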
Anyway, from this point of view, Hui's (1996) relatively fast converging DNT pricing formula gets some respect. Summary We started this article by introducing exotic options. In a brief theoretical summary, we explained how exotics are linked together. There are many types of exotics. We showed one possible way of classification that is consistent with the fExoticOptions package. We showed how the Black-Scholes surface (a 3D chart that contains the price of a derivative dependent on time and the underlying price) can be constructed for any pricing function. Resources for Article: Further resources on this subject: What is Quantitative Finance? [article] Learning Option Pricing [article] Derivatives Pricing [article]

Packt
09 Mar 2015
10 min read

Evidence Acquisition and Analysis from iCloud

This article by Mattia Epifani and Pasquale Stirparo, the authors of the book, Learning iOS Forensics, introduces the cloud system provided by Apple to all its users through which they can save their backups and other files on remote servers. In the first part of this article, we will show you the main characteristics of such a service and then the techniques to create and recover a backup from iCloud. (For more resources related to this topic, see here.) iCloud iCloud is a free cloud storage and cloud computing service designed by Apple to replace MobileMe. The service allows users to store data (music, pictures, videos, and applications) on remote servers and share them on devices with iOS 5 or later operating systems, on Apple computers running OS X Lion or later, or on a PC with Windows Vista or later. Similar to its predecessor, MobileMe, iCloud allows users to synchronize data between devices (e-mail, contacts, calendars, bookmarks, notes, reminders, iWork documents, and so on), or to make a backup of an iOS device (iPhone, iPad, or iPod touch) on remote servers rather than using iTunes and your local computer. The iCloud service was announced on June 6, 2011 during the Apple Worldwide Developers Conference but became operational to the public from October 12, 2011. The MobileMe service was disabled as a result on June 30, 2012 and all users were transferred to the new environment. In July 2013, iCloud had more than 320 million users. Each iCloud account has 5 GB of free storage for the owners of iDevice with iOS 5 or later and Mac users with Lion or later. Purchases made through iTunes (music, apps, videos, movies, and so on) are not calculated in the count of the occupied space and can be stored in iCloud and downloaded on all devices associated with the Apple ID of the user. Moreover, the user has the option to purchase additional storage in denominations of 20, 200, 500, or 1,000 GB. 
Access to the iCloud service can be made through integrated applications on devices such as iDevice and Mac computers. Also, to synchronize data on a PC, you need to install the iCloud Control Panel application, which can be downloaded for free from the Apple website. To synchronize contacts, e-mails, and appointments in the calendar on the PC, the user must have Microsoft Outlook 2007 or 2010, while for the synchronization of bookmarks they need Internet Explorer 9 or Safari. iDevice backup on iCloud iCloud allows users to make online backups of iDevices so that they will be able to restore their data even on a different iDevice (for example, in case of replacement of devices). The choice of which backup mode to use can be done directly in the settings of the device or through iTunes when the device is connected to the PC or Mac, as follows: Once the user has activated the service, the device automatically backs up every time the following scenarios occur: It is connected to the power cable It is connected to a Wi-Fi network Its screen is locked iCloud online backups are incremental through subsequent snapshots and each snapshot is the current status of the device at the time of its creation. The structure of the backup stored on iCloud is entirely analogous to that of the backup made with iTunes. iDevice backup acquisition Backups that are made online are, to all intents and purposes, not encrypted. Technically, they are encrypted, but the encryption key is stored with the encrypted files. This choice was made by Apple in order for users to be able to restore the backup on a different device than the one that created it. Currently, the acquisition of the iCloud backup is supported by two types of commercial software (Elcomsoft Phone Password Breaker (EPPB) and Wondershare Dr.Fone) and one open source tool (iLoot, which is available at https://github.com/hackappcom/iloot). 
The interesting aspect is that the same technique was used in the iCloud hack performed in 2014, when personal photos and videos were taken from users' iCloud accounts and released over the Internet (more information is available at http://en.wikipedia.org/wiki/2014_celebrity_photo_hack). Although there is no strong evidence yet of how the hack was carried out, it is believed that Apple's Find my iPhone service was responsible: Apple had not implemented any security measure to lock an account after a certain number of wrong login attempts, which opened the door to exploitation (brute force, in this case). The tool used to brute force the iCloud password, named iBrute, is still available at https://github.com/hackappcom/ibrute, but has not been working since January 2015.

Case study – iDevice backup acquisition and EPPB with usernames and passwords

As reported on the software manufacturer's website, EPPB allows the acquisition of data stored in an online backup. Moreover, online backups can be acquired without having the original iOS device in hand. All that's needed to access online backups stored in the cloud service are the original user's credentials, that is, their Apple ID and the corresponding password. The login credentials for iCloud can be retrieved as follows:

- Using social engineering techniques
- From a PC (or a Mac) on which they are stored, using tools such as iTunes Password Decryptor (http://securityxploded.com/) or WebBrowserPassView (http://www.nirsoft.net/)
- Directly from the device (iPhone/iPad/iPod touch), by extracting the credentials stored in the keychain

Once the credentials have been extracted, downloading the backup is very simple.
Follow the step-by-step instructions provided in the program by entering username and password in Download backup from iCloud dialog by going to Tools | Apple | Download backup from iCloud | Password and clicking on Sign in, as shown in the following screenshot: At this point, the software displays a screen that shows all the backups present in the user account and allows you to download data. It is important to notice the possibility of using the following two options: Restore original file names: If enabled, this option interprets the contents of the Manifest.mbdb file, rebuilding the backup with the same tree structure into domains and sub-domains. If the investigator intends to carry out the analysis with traditional software for data extraction from backups, it is recommended that you disable this option because, if enabled, that software will no longer be able to parse the backup. Download only specific data: This option is very useful when the investigator needs to download only some specific information. Currently, the software supports Call history, Messages, Attachments, Contacts, Safari data, Google data, Calendar, Notes, Info & Settings, Camera Roll, Social Communications, and so on. In this case, the Restore original file names option is automatically activated and it cannot be disabled. Once you have chosen the destination folder for the download, the backup starts. The time required to download depends on the size of the storage space available to the user and the number of snapshots stored within that space. Case study – iDevice backup acquisition and EPPB with authentication token The Forensic edition of Phone Password Breaker from Elcomsoft is a tool that gives a digital forensics examiner the power to obtain iCloud data without having the original Apple ID and password. This kind of access is made possible via the use of an authentication token extracted from the user's computer. 
These tokens can be obtained from any suspect's computer where iCloud Control Panel is installed. In order to obtain the token, the user must have been logged in to iCloud Control Panel on that PC at the time of acquisition, which means that the acquisition can be performed only in a live environment or in a virtualized image of the suspect's computer connected to the Internet. More information about this tool is available at http://www.elcomsoft.com/eppb.html.

To extract the authentication token from the iCloud Control Panel, the analyst needs to run a small executable file called atex.exe on the machine. The executable file can be launched from an external pen drive during a live forensics activity. Open Command Prompt and launch the atex -l command to list all the local iCloud users as follows: Then, launch atex.exe again with the getToken parameter (-t) and enter the username of the specific local Windows user and the password for this user's Windows account. A file called icloud_token_<timestamp>.txt will be created in the directory from which atex.exe was launched. The file contains the Apple ID of the current iCloud Control Panel user and its authentication token.

Now that the analyst has the authentication token, they can start the EPPB software, navigate to Tools | Apple | Download backup from iCloud | Token, copy and paste the token into the software (be careful to copy the entire second row from the .txt file created by the atex.exe tool), and click on Sign in, as shown in the following screenshot. At this point, the software shows the screen for downloading the iCloud backups stored within the iCloud space of the user, in the same way as when you provide a username and password.

The procedure for the Mac OS X version is exactly the same. Just launch the Mac version of atex from a shell and follow the steps shown previously for the Windows environment:

- sudo atex -l: This command is used to get the list of all iCloud users.
- sudo atex -t -u <username>: This command is used to get the authentication token for a specific user. You will need to enter the user's system password when prompted.

Case study – iDevice backup acquisition with iLoot

The same activity can be performed using the open source tool called iLoot (available at https://github.com/hackappcom/iloot). It requires Python and some dependencies. We suggest checking out the website for the latest version and requirements. By accessing the help (iloot.py -h), we can see the various available options. We can choose the output folder, whether to download one specific snapshot, whether the backup should be downloaded in the original iTunes format or with domain-style directories, and whether to download only specific information (for example, call history, SMS, photos, and so on) or only a specific domain, as follows: To download the backup, you only need to enter the account credentials, as shown in the following screenshot: At the end of the process, you will find the backup in the output folder (the default folder's name is /output).

Summary

In this article, we introduced the iCloud service provided by Apple, which lets users store files on remote servers and back up their iDevices. In particular, we showed the techniques to download the backups stored on iCloud when you know the user credentials (Apple ID and password) and when you have access to a computer on which the iCloud Control Panel software is installed and in use.

Resources for Article: Further resources on this subject: Introduction to Mobile Forensics [article] Processing the Case [article] BackTrack Forensics [article]
Packt
05 Mar 2015
5 min read

Creating and Managing VMFS Datastores

In this article by Abhilash G B, author of VMware vSphere 5.5 Cookbook, we will learn how to expand or grow a VMFS datastore with the help of two methods: using the Increase Datastore Capacity wizard and using the ESXi CLI tool vmkfstools. (For more resources related to this topic, see here.) Expanding/growing a VMFS datastore It is likely that you would run out of free space on a VMFS volume over time as you end up deploying more and more VMs on it, especially in a growing environment. Fortunately, accommodating additional free space on a VMFS volume is possible. However, this requires that the LUN either has free space left on it or it has been expanded/resized in the storage array. The procedure to resize/expand the LUN in the storage array differs from vendor to vendor, we assume that the LUN either has free space on it or has already been expanded. The following flowchart provides a high-level overview of the procedure: How to do it... We can expand a VMFS datastore using two methods: Using the Increase Datastore Capacity wizard Using the ESXi CLI tool vmkfstools Before attempting to grow the VMFS datastore, issue a rescan on the HBAs to ensure that the ESXi sees the increased size of the LUN. Also, make note of the NAA ID, LUN number, and the size of the LUN backing the VMFS datastore that you are trying to expand/grow. Using the Increase Datastore Capacity wizard We will go through the following process to expand an existing VMFS datastore using the vSphere Web Client's GUI. Use the vSphere Web Client to connect to vCenter Server. Navigate to Home | Storage. 
  • With the data center object selected, navigate to Related Objects | Datastores.
  • Right-click on the datastore you intend to expand and click on Increase Datastore Capacity...
  • Select the LUN backing the datastore and click on Next.
  • Use the Partition Configuration drop-down menu to select the free space left in DS01 to expand the datastore.
  • On the Ready to Complete screen, review the information and click on Finish to expand the datastore.

Using the ESXi CLI tool vmkfstools

A VMFS volume can also be expanded using the vmkfstools tool. As with any command-line tool, it can be difficult to remember the process if you do not perform it often. Hence, I have devised the following flowchart to provide an overview of the command-line steps that need to be taken to expand a VMFS volume:

[Flowchart: command-line steps to expand a VMFS volume]

Now that we know the order of the steps from the flowchart, let's delve right into the procedure:
  • Identify the datastore you want to expand using the following command, and make a note of the corresponding NAA ID:
        esxcli storage vmfs extent list
    Here, the NAA ID corresponding to the DS01 datastore is naa.6000eb30adde4c1b0000000000000083.
  • Verify that ESXi sees the new size of the LUN backing the datastore by issuing the following command:
        esxcli storage core device list -d naa.6000eb30adde4c1b0000000000000083
  • Get the current partition table information using the following command:
        Syntax: partedUtil getptbl "Devfs Path of the device"
        Command: partedUtil getptbl /vmfs/devices/disks/naa.6000eb30adde4c1b0000000000000083
  • Calculate the new last sector value.
    Moving the last sector value closer to the total sector value is necessary in order to use the additional space. The formula to calculate the last sector value is as follows:
        (Total number of sectors) - (Start sector value) = Last sector value
    So, the last sector value to be used is as follows:
        31457280 - 2048 = 31455232
  • Resize the VMFS partition by issuing the following command:
        Syntax: partedUtil resize "Devfs Path" PartitionNumber NewStartingSector NewEndingSector
        Command: partedUtil resize /vmfs/devices/disks/naa.6000eb30adde4c1b0000000000000083 1 2048 31455232
  • Issue the following command to grow the VMFS filesystem:
        Syntax: vmkfstools --growfs <Devfs Path: Partition Number> <Same Devfs Path: Partition Number>
        Command: vmkfstools --growfs /vmfs/devices/disks/naa.6000eb30adde4c1b0000000000000083:1 /vmfs/devices/disks/naa.6000eb30adde4c1b0000000000000083:1
    Once the command is executed successfully, it will take you back to the root prompt. There is no on-screen output for this command.

How it works...

Expanding a VMFS datastore refers to the act of increasing its size within its own extent. This is possible only if there is free space available immediately after the extent. The maximum size of a LUN is 64 TB, so the maximum size of a VMFS volume is also 64 TB. The virtual machines hosted on this VMFS datastore can remain powered on while this task is being accomplished.

Summary

This article walked you through expanding a VMFS datastore, using either the Increase Datastore Capacity wizard or the vmkfstools command-line tool.
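The last-sector arithmetic from the vmkfstools procedure can be sketched in a few lines of Python. This is only an illustration of the formula as this article states it; the helper names `last_sector` and `resize_command` are hypothetical, and the script merely builds the command string rather than running anything on an ESXi host.

```python
def last_sector(total_sectors: int, start_sector: int) -> int:
    """Last-sector value as given in this article: total sectors minus start sector."""
    if start_sector < 0 or total_sectors <= start_sector:
        raise ValueError("start sector must be non-negative and below the total")
    return total_sectors - start_sector


def resize_command(device_path: str, partition: int, start: int, end: int) -> str:
    """Build (but do not execute) the partedUtil resize command line."""
    return f"partedUtil resize {device_path} {partition} {start} {end}"


# Values taken from the worked example in this article.
device = "/vmfs/devices/disks/naa.6000eb30adde4c1b0000000000000083"
end = last_sector(31457280, 2048)
print(resize_command(device, 1, 2048, end))
```

Building the string first makes it easy to review the sector values before pasting the command into an ESXi shell, where a wrong ending sector could damage the partition.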
Packt
05 Mar 2015
11 min read

Learning Random Forest Using Mahout

In this article by Ashish Gupta, author of the book Learning Apache Mahout Classification, we will learn about Random forest, which is one of the most popular techniques in classification. It builds on a machine learning technique called the decision tree. In this article, we will explore the following topics:
  • Decision tree
  • Random forest
  • Using Mahout for Random forest

Decision tree

A decision tree is used for classification and regression problems. In simple terms, it is a predictive model that uses binary rules to calculate the target variable. In a decision tree, we use an iterative process of splitting the data into partitions, and then split each partition further along the branches. As in other classification model creation processes, we start with the training dataset in which target variables or class labels are defined. The algorithm tries to break all the records in the training dataset into two parts based on one of the explanatory variables. The partitioning is then applied to each new partition, and this process is continued until no more partitioning can be done. The core of the algorithm is to find the rule that determines the initial split. There are several algorithms to create decision trees, such as Iterative Dichotomiser 3 (ID3), Classification and Regression Tree (CART), Chi-squared Automatic Interaction Detector (CHAID), and so on. A good explanation of ID3 can be found at http://www.cse.unsw.edu.au/~billw/cs9414/notes/ml/06prop/id3/id3.html.

To choose the best splitter in a node, the algorithm considers each explanatory variable in turn. Every possible split is considered and tried, and the best split is the one that produces the largest decrease in diversity of the classification label within each partition. This is repeated for all variables, and the winner is chosen as the best splitter for that node.
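The split search just described can be sketched for a single numeric variable. The article does not name the diversity measure, so this sketch assumes Gini impurity (the measure used by CART); the function names and the toy petal-length data are made up for illustration and are not taken from Mahout.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure partition)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Try every threshold 'value <= t' and return (t, impurity decrease) of the best one."""
    parent = gini(labels)
    n = len(labels)
    best = (None, 0.0)
    # The largest value is skipped because it would leave the right side empty.
    for threshold in sorted(set(values))[:-1]:
        left = [lab for val, lab in zip(values, labels) if val <= threshold]
        right = [lab for val, lab in zip(values, labels) if val > threshold]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
        decrease = parent - weighted
        if decrease > best[1]:
            best = (threshold, decrease)
    return best


# Toy data, invented for this example: petal lengths (cm) and species labels.
petal_length = [1.4, 1.3, 1.5, 4.7, 4.5, 5.1]
species = ["setosa", "setosa", "setosa", "versicolor", "versicolor", "virginica"]
threshold, decrease = best_split(petal_length, species)
print(threshold, round(decrease, 4))
```

A full tree builder would run this search over every explanatory variable, keep the overall winner, and then recurse into the two resulting partitions.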
The process is continued in the next node until we reach a node where we can make the decision. Because a decision tree is created from a training dataset, it can suffer from the overfitting problem, which hurts its performance on real datasets. To improve this situation, a process called pruning is used. In this process, we remove branches and leaves of the tree to improve the performance. Algorithms used to build the tree work best at the starting or root node since all the information is available there. Later on, with each split, less data is available, and towards the end of the tree, a particular node can show patterns that are specific to the subset of data used for the split. These patterns create problems when we use them to predict on real datasets. Pruning methods let the tree grow and then remove the smaller branches that fail to generalize.

Now let's take an example to understand the decision tree. Consider the iris flower dataset. This dataset is hugely popular in the machine learning field. It was introduced by Sir Ronald Fisher. It contains 50 samples from each of three species of iris flower (Iris setosa, Iris virginica, and Iris versicolor). The four explanatory variables are the length and width of the sepals and petals in centimeters, and the target variable is the class to which the flower belongs.

As you can see in the preceding diagram, all the groups were initially considered as the setosa species, and then the explanatory variable petal length was used to divide the groups further. At each step, the number of misclassified items was also calculated, which shows how many items were wrongly classified. Moreover, the petal width variable was taken into account. Usually, items at leaf nodes are correctly classified.

Random forest

The Random forest algorithm was developed by Leo Breiman and Adele Cutler. Random forests grow many classification trees.
They are an ensemble learning method for classification and regression that constructs a number of decision trees at training time and outputs the class that is the mode of the classes output by the individual trees. Single decision trees are subject to the bias–variance tradeoff, so they usually have either high variance or high bias. The two sources of error are as follows:
  • Bias: This is an error caused by an erroneous assumption in the learning algorithm
  • Variance: This is an error that arises from sensitivity to small fluctuations in the training set
Random forests attempt to mitigate this problem by averaging to find a natural balance between the two extremes. A Random forest works on the idea of bagging, which is to average noisy and unbiased models to create a model with low variance. A Random forest algorithm works as a large collection of decorrelated decision trees.

To understand the idea of a Random forest algorithm, let's work with an example. Consider a training dataset that has lots of features (explanatory variables) and target variables or classes. We create a sample set from the given dataset, where a different set of random features is taken into account to create each random sub-dataset. Now, from these sub-datasets, different decision trees will be created. So we have actually created a forest of different decision trees. Using these different trees, we will create a ranking system for all the classifiers. To predict the class of a new unknown item, we will use all the decision trees and separately find out which class each tree predicts. See the following diagram for a better understanding of this concept:

[Figure: Different decision trees to predict the class of an unknown item]

In this particular case, we have four different decision trees. We predict the class of an unknown dataset with each of the trees.
As per the preceding figure, the first decision tree provides class 2 as the predicted class, the second decision tree predicts class 5, the third decision tree predicts class 5, and the fourth decision tree predicts class 3. Now, the Random forest counts the votes for each class. So we have one vote each for class 2 and class 3 and two votes for class 5. Therefore, it decides that for the new unknown dataset, the predicted class is class 5. So the class that gets the most votes is chosen for the new dataset. A Random forest has a lot of benefits in classification, a few of which are mentioned in the following list:
  • Combination of learning models increases the accuracy of the classification
  • Runs effectively on large datasets as well
  • The generated forest can be saved and used for other datasets as well
  • Can handle a large number of explanatory variables
Now that we have understood the Random forest theoretically, let's move on to Mahout and use the Random forest algorithm, which is available in Apache Mahout.

Using Mahout for Random forest

Mahout has an implementation of the Random forest algorithm. It is very easy to understand and use. So let's get started.

Dataset

We will use the NSL-KDD dataset. Since 1999, KDD'99 has been the most widely used dataset for the evaluation of anomaly detection methods. This dataset was prepared by S. J. Stolfo and is built based on the data captured in the DARPA'98 IDS evaluation program (R. P. Lippmann, D. J. Fried, I. Graf, J. W. Haines, K. R. Kendall, D. McClung, D. Weber, S. E. Webster, D. Wyschogrod, R. K. Cunningham, and M. A. Zissman, "Evaluating intrusion detection systems: The 1998 darpa off-line intrusion detection evaluation," discex, vol. 02, p. 1012, 2000). DARPA'98 is about 4 GB of compressed raw (binary) tcpdump data of 7 weeks of network traffic, which can be processed into about 5 million connection records, each with about 100 bytes. The two weeks of test data have around 2 million connection records.
The KDD training dataset consists of approximately 4,900,000 single connection vectors, each of which contains 41 features and is labeled as either normal or an attack, with exactly one specific attack type. NSL-KDD is a dataset suggested to solve some of the inherent problems of the KDD'99 dataset. You can download this dataset from http://nsl.cs.unb.ca/NSL-KDD/. We will download the KDDTrain+_20Percent.ARFF and KDDTest+.ARFF datasets. In KDDTrain+_20Percent.ARFF and KDDTest+.ARFF, remove the first 44 lines (that is, all lines starting with @attribute). If this is not done, we will not be able to generate a descriptor file.

Steps to use the Random forest algorithm in Mahout

The steps to implement the Random forest algorithm in Apache Mahout are as follows:
  • Transfer the test and training datasets to HDFS using the following commands:
        hadoop fs -mkdir /user/hue/KDDTrain
        hadoop fs -mkdir /user/hue/KDDTest
        hadoop fs -put /tmp/KDDTrain+_20Percent.arff /user/hue/KDDTrain
        hadoop fs -put /tmp/KDDTest+.arff /user/hue/KDDTest
  • Generate the descriptor file. Before you build a Random forest model based on the training data in KDDTrain+.arff, a descriptor file is required. This is because every attribute in the training dataset needs to be labeled, so that the algorithm can tell which attributes are numerical and which are categorical. Use the following command to generate the descriptor file:
        hadoop jar $MAHOUT_HOME/core/target/mahout-core-xyz-job.jar org.apache.mahout.classifier.df.tools.Describe -p /user/hue/KDDTrain/KDDTrain+_20Percent.arff -f /user/hue/KDDTrain/KDDTrain+.info -d N 3 C 2 N C 4 N C 8 N 2 C 19 N L
    Jar: Mahout core jar (xyz stands for version). If you have directly installed Mahout, it can be found under the /usr/lib/mahout folder.
    The main class Describe is used here and it takes three parameters:
      • -p is the path for the data to be described.
      • -f is the location for the generated descriptor file.
      • -d is the information for the attributes of the data.
    N 3 C 2 N C 4 N C 8 N 2 C 19 N L defines that the dataset starts with a numeric attribute (N), followed by three categorical attributes (3 C), and so on. At the end, L defines the label.
    The output of the previous command is shown in the following screenshot:
    [Screenshot: output of the Describe command]
  • Build the Random forest using the following command:
        hadoop jar $MAHOUT_HOME/examples/target/mahout-examples-xyz-job.jar org.apache.mahout.classifier.df.mapreduce.BuildForest -Dmapred.max.split.size=1874231 -d /user/hue/KDDTrain/KDDTrain+_20Percent.arff -ds /user/hue/KDDTrain/KDDTrain+.info -sl 5 -p -t 100 -o /user/hue/nsl-forest
    Jar: Mahout example jar (xyz stands for version). If you have directly installed Mahout, it can be found under the /usr/lib/mahout folder.
    The main class BuildForest is used to build the forest with the following arguments:
      • -Dmapred.max.split.size indicates to Hadoop the maximum size of each partition.
      • -d stands for the data path.
      • -ds stands for the location of the descriptor file.
      • -sl is the number of variables to select randomly at each tree node. Here, each tree is built using five randomly selected attributes per node.
      • -p uses the partial data implementation.
      • -t stands for the number of trees to grow. Here, the command builds 100 trees using the partial implementation.
      • -o stands for the output path that will contain the decision forest.
    In the end, the process will show the following result:
    [Screenshot: output of the BuildForest command]
  • Use this model to classify the new dataset:
        hadoop jar $MAHOUT_HOME/examples/target/mahout-examples-xyz-job.jar org.apache.mahout.classifier.df.mapreduce.TestForest -i /user/hue/KDDTest/KDDTest+.arff -ds /user/hue/KDDTrain/KDDTrain+.info -m /user/hue/nsl-forest -a -mr -o /user/hue/predictions
    Jar: Mahout example jar (xyz stands for version). If you have directly installed Mahout, it can be found under the /usr/lib/mahout folder.
    The class to test the forest has the following parameters:
      • -i indicates the path for the test data
      • -ds stands for the location of the descriptor file
      • -m stands for the location of the generated forest from the previous command
      • -a informs to run the analyzer to compute the confusion matrix
      • -mr informs Hadoop to distribute the classification
      • -o stands for the location to store the predictions in
    The job provides the following confusion matrix:
    [Screenshot: confusion matrix produced by the TestForest job]
    So, from the confusion matrix, it is clear that 9,396 instances were correctly classified and 315 normal instances were incorrectly classified as anomalies. The accuracy is 77.7635 percent (instances correctly classified by the model / classified instances). The output file in the predictions folder contains a list of 0s and 1s, where 0 denotes a normal record and 1 denotes an anomaly.

Summary

In this article, we discussed the Random forest algorithm. We started our discussion by understanding the decision tree and continued with an understanding of the Random forest. We took up the NSL-KDD dataset, which is used to build predictive systems for cyber security. We used Mahout to build the Random forest, used it with the test dataset, and generated the confusion matrix and other statistics for the output.
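To make the descriptor notation passed to Describe via -d easier to read, the following sketch expands it into one entry per column. It relies only on the reading given in this article (a number repeats the attribute type that follows it); `expand_descriptor` is a hypothetical helper written for illustration, not Mahout's own parser.

```python
def expand_descriptor(descriptor: str) -> list:
    """Expand a compact descriptor such as 'N 3 C L' into ['N', 'C', 'C', 'C', 'L']."""
    expanded = []
    repeat = 1
    for token in descriptor.split():
        if token.isdigit():
            # A number applies to the next attribute type only.
            repeat = int(token)
        else:
            expanded.extend([token] * repeat)
            repeat = 1
    return expanded


attrs = expand_descriptor("N 3 C 2 N C 4 N C 8 N 2 C 19 N L")
print(len(attrs), attrs.count("N"), attrs.count("C"), attrs.count("L"))
```

For the descriptor used in this article, the expansion yields 42 entries: 34 numeric and 7 categorical attributes plus the label, which agrees with the 41 features per connection record mentioned earlier.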
Packt
05 Mar 2015
43 min read

Hadoop and MapReduce

In this article by Thilina Gunarathne, author of the book Hadoop MapReduce v2 Cookbook - Second Edition, we will learn about Hadoop and MapReduce. We are living in the era of big data, where the exponential growth of phenomena such as the web, social networking, smartphones, and so on is producing petabytes of data on a daily basis. Gaining insights from analyzing these very large amounts of data has become a must-have competitive advantage for many industries. However, the size and the possibly unstructured nature of these data sources make it impossible to use traditional solutions such as relational databases to store and analyze these datasets.

Storing, processing, and analyzing petabytes of data in a meaningful and timely manner requires many compute nodes with thousands of disks and thousands of processors, together with the ability to efficiently communicate massive amounts of data among them. Such a scale makes failures such as disk failures, compute node failures, network failures, and so on a common occurrence, making fault tolerance a very important aspect of such systems. Other common challenges that arise include the significant cost of resources, handling communication latencies, handling heterogeneous compute resources, synchronization across nodes, and load balancing. As you can infer, developing and maintaining distributed parallel applications to process massive amounts of data while handling all these issues is not an easy task. This is where Apache Hadoop comes to our rescue.

Google was one of the first organizations to face the problem of processing massive amounts of data. Google built a framework for large-scale data processing, borrowing the map and reduce paradigms from the functional programming world, and named it MapReduce.
At the foundation of Google MapReduce was the Google File System, which is a high throughput parallel filesystem that enables the reliable storage of massive amounts of data using commodity computers. Seminal research publications that introduced the Google MapReduce and Google File System concepts can be found at http://research.google.com/archive/mapreduce.html and http://research.google.com/archive/gfs.html.

Apache Hadoop MapReduce is the most widely known and widely used open source implementation of the Google MapReduce paradigm. Apache Hadoop Distributed File System (HDFS) provides an open source implementation of the Google File System concept. Apache Hadoop MapReduce, HDFS, and YARN provide a scalable, fault-tolerant, distributed platform for storage and processing of very large datasets across clusters of commodity computers. Unlike in traditional High Performance Computing (HPC) clusters, Hadoop uses the same set of compute nodes for data storage as well as to perform the computations, allowing Hadoop to improve the performance of large scale computations by collocating computations with the storage. Also, the hardware cost of a Hadoop cluster is orders of magnitude cheaper than HPC clusters and database appliances due to the usage of commodity hardware and commodity interconnects. Together, Hadoop-based frameworks have become the de facto standard for storing and processing big data.

Hadoop Distributed File System – HDFS

HDFS is a block structured distributed filesystem that is designed to store petabytes of data reliably on compute clusters made out of commodity hardware. HDFS overlays on top of the existing filesystem of the compute nodes and stores files by breaking them into coarser grained blocks (for example, 128 MB). HDFS performs better with large files. HDFS distributes the data blocks of large files across all the nodes of the cluster to facilitate very high parallel aggregate read bandwidth when processing the data.
HDFS also stores redundant copies of these data blocks in multiple nodes to ensure reliability and fault tolerance. Data processing frameworks such as MapReduce exploit these distributed sets of data blocks and the redundancy to maximize the data local processing of large datasets, where most of the data blocks would get processed locally in the same physical node as they are stored. HDFS consists of NameNode and DataNode services providing the basis for the distributed filesystem. NameNode stores, manages, and serves the metadata of the filesystem. NameNode does not store any real data blocks. DataNode is a per node service that manages the actual data block storage in the DataNodes. When retrieving data, client applications first contact the NameNode to get the list of locations the requested data resides in and then contact the DataNodes directly to retrieve the actual data. The following diagram depicts a high-level overview of the structure of HDFS: Hadoop v2 brings in several performance, scalability, and reliability improvements to HDFS. One of the most important among those is the High Availability (HA) support for the HDFS NameNode, which provides manual and automatic failover capabilities for the HDFS NameNode service. This solves the widely known NameNode single point of failure weakness of HDFS. Automatic NameNode high availability of Hadoop v2 uses Apache ZooKeeper for failure detection and for active NameNode election. Another important new feature is the support for HDFS federation. HDFS federation enables the usage of multiple independent HDFS namespaces in a single HDFS cluster. These namespaces would be managed by independent NameNodes, but share the DataNodes of the cluster to store the data. The HDFS federation feature improves the horizontal scalability of HDFS by allowing us to distribute the workload of NameNodes. 
Other important improvements of HDFS in Hadoop v2 include the support for HDFS snapshots, heterogeneous storage hierarchy support (Hadoop 2.3 or higher), in-memory data caching support (Hadoop 2.3 or higher), and many performance improvements. Almost all the Hadoop ecosystem data processing technologies utilize HDFS as the primary data storage. HDFS can be considered as the most important component of the Hadoop ecosystem due to its central nature in the Hadoop architecture. Hadoop YARN YARN (Yet Another Resource Negotiator) is the major new improvement introduced in Hadoop v2. YARN is a resource management system that allows multiple distributed processing frameworks to effectively share the compute resources of a Hadoop cluster and to utilize the data stored in HDFS. YARN is a central component in the Hadoop v2 ecosystem and provides a common platform for many different types of distributed applications. The batch processing based MapReduce framework was the only natively supported data processing framework in Hadoop v1. While MapReduce works well for analyzing large amounts of data, MapReduce by itself is not sufficient enough to support the growing number of other distributed processing use cases such as real-time data computations, graph computations, iterative computations, and real-time data queries. The goal of YARN is to allow users to utilize multiple distributed application frameworks that provide such capabilities side by side sharing a single cluster and the HDFS filesystem. Some examples of the current YARN applications include the MapReduce framework, Tez high performance processing framework, Spark processing engine, and the Storm real-time stream processing framework. The following diagram depicts the high-level architecture of the YARN ecosystem: The YARN ResourceManager process is the central resource scheduler that manages and allocates resources to the different applications (also known as jobs) submitted to the cluster. 
YARN NodeManager is a per node process that manages the resources of a single compute node. Scheduler component of the ResourceManager allocates resources in response to the resource requests made by the applications, taking into consideration the cluster capacity and the other scheduling policies that can be specified through the YARN policy plugin framework. YARN has a concept called containers, which is the unit of resource allocation. Each allocated container has the rights to a certain amount of CPU and memory in a particular compute node. Applications can request resources from YARN by specifying the required number of containers and the CPU and memory required by each container. ApplicationMaster is a per-application process that coordinates the computations for a single application. The first step of executing a YARN application is to deploy the ApplicationMaster. After an application is submitted by a YARN client, the ResourceManager allocates a container and deploys the ApplicationMaster for that application. Once deployed, the ApplicationMaster is responsible for requesting and negotiating the necessary resource containers from the ResourceManager. Once the resources are allocated by the ResourceManager, ApplicationMaster coordinates with the NodeManagers to launch and monitor the application containers in the allocated resources. The shifting of application coordination responsibilities to the ApplicationMaster reduces the burden on the ResourceManager and allows it to focus solely on managing the cluster resources. Also having separate ApplicationMasters for each submitted application improves the scalability of the cluster as opposed to having a single process bottleneck to coordinate all the application instances. 
The following diagram depicts the interactions between various YARN components, when a MapReduce application is submitted to the cluster: While YARN supports many different distributed application execution frameworks, our focus in this article is mostly on traditional MapReduce and related technologies. Hadoop MapReduce Hadoop MapReduce is a data processing framework that can be utilized to process massive amounts of data stored in HDFS. As we mentioned earlier, distributed processing of a massive amount of data in a reliable and efficient manner is not an easy task. Hadoop MapReduce aims to make it easy for users by providing a clean abstraction for programmers by providing automatic parallelization of the programs and by providing framework managed fault tolerance support. MapReduce programming model consists of Map and Reduce functions. The Map function receives each record of the input data (lines of a file, rows of a database, and so on) as key-value pairs and outputs key-value pairs as the result. By design, each Map function invocation is independent of each other allowing the framework to use divide and conquer to execute the computation in parallel. This also allows duplicate executions or re-executions of the Map tasks in case of failures or load imbalances without affecting the results of the computation. Typically, Hadoop creates a single Map task instance for each HDFS data block of the input data. The number of Map function invocations inside a Map task instance is equal to the number of data records in the input data block of the particular Map task instance. Hadoop MapReduce groups the output key-value records of all the Map tasks of a computation by the key and distributes them to the Reduce tasks. This distribution and transmission of data to the Reduce tasks is called the Shuffle phase of the MapReduce computation. Input data to each Reduce task would also be sorted and grouped by the key. 
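As a rough illustration of the Map and Shuffle phases described here, and of the per-key Reduce step they feed, the following single-process sketch imitates the data flow in plain Python. It does not use the Hadoop API; the word-count job, the function names, and the sample records are assumptions made up for this example.

```python
from collections import defaultdict


def map_fn(key, value):
    """Map: one input record (offset, line) in, zero or more (word, 1) pairs out."""
    for word in value.split():
        yield word.lower(), 1


def reduce_fn(key, values):
    """Reduce: one key and all of its grouped values in, one (word, count) pair out."""
    yield key, sum(values)


def run_mapreduce(records, map_fn, reduce_fn):
    # Map phase: every record is processed independently of the others.
    intermediate = []
    for key, value in records:
        intermediate.extend(map_fn(key, value))

    # Shuffle phase: group the intermediate values by key.
    groups = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)

    # Reduce phase: keys are visited in sorted order.
    output = []
    for key in sorted(groups):
        output.extend(reduce_fn(key, groups[key]))
    return output


records = [(0, "to be or not"), (13, "to be")]
result = run_mapreduce(records, map_fn, reduce_fn)
print(result)
```

In a real cluster the Map calls run in parallel tasks on different nodes and the grouped data is moved across the network, but the contract of the two user-supplied functions is the same as in this sketch.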
The Reduce function gets invoked for each key and the group of values of that key (reduce <key, list_of_values>) in the sorted order of the keys. In a typical MapReduce program, users only have to implement the Map and Reduce functions and Hadoop takes care of scheduling and executing them in parallel. Hadoop will rerun any failed tasks and also provide measures to mitigate any unbalanced computations. Have a look at the following diagram for a better understanding of the MapReduce data and computational flows: In Hadoop 1.x, the MapReduce (MR1) components consisted of the JobTracker process, which ran on a master node managing the cluster and coordinating the jobs, and TaskTrackers, which ran on each compute node launching and coordinating the tasks executing in that node. Neither of these processes exist in Hadoop 2.x MapReduce (MR2). In MR2, the job coordinating responsibility of JobTracker is handled by an ApplicationMaster that will get deployed on-demand through YARN. The cluster management and job scheduling responsibilities of JobTracker are handled in MR2 by the YARN ResourceManager. JobHistoryServer has taken over the responsibility of providing information about the completed MR2 jobs. YARN NodeManagers provide the functionality that is somewhat similar to MR1 TaskTrackers by managing resources and launching containers (which in the case of MapReduce 2 houses Map or Reduce tasks) in the compute nodes. Hadoop installation modes Hadoop v2 provides three installation choices: Local mode: The local mode allows us to run MapReduce computation using just the unzipped Hadoop distribution. This nondistributed mode executes all parts of Hadoop MapReduce within a single Java process and uses the local filesystem as the storage. The local mode is very useful for testing/debugging the MapReduce applications locally. Pseudo distributed mode: Using this mode, we can run Hadoop on a single machine emulating a distributed cluster. 
This mode runs the different services of Hadoop as different Java processes, but within a single machine. This mode is good to let you play and experiment with Hadoop. Distributed mode: This is the real distributed mode that supports clusters that span from a few nodes to thousands of nodes. For production clusters, we recommend using one of the many packaged Hadoop distributions as opposed to installing Hadoop from scratch using the Hadoop release binaries, unless you have a specific use case that requires a vanilla Hadoop installation. Refer to the Setting up Hadoop ecosystem in a distributed cluster environment using a Hadoop distribution recipe for more information on Hadoop distributions. Setting up Hadoop ecosystem in a distributed cluster environment using a Hadoop distribution The Hadoop YARN ecosystem now contains many useful components providing a wide range of data processing, storing, and querying functionalities for the data stored in HDFS. However, manually installing and configuring all of these components to work together correctly using individual release artifacts is quite a challenging task. Other challenges of such an approach include the monitoring and maintenance of the cluster and the multiple Hadoop components. Luckily, there exist several commercial software vendors that provide well integrated packaged Hadoop distributions to make it much easier to provision and maintain a Hadoop YARN ecosystem in our clusters. These distributions often come with easy GUI-based installers that guide you through the whole installation process and allow you to select and install the components that you require in your Hadoop cluster. They also provide tools to easily monitor the cluster and to perform maintenance operations. For regular production clusters, we recommend using a packaged Hadoop distribution from one of the well-known vendors to make your Hadoop journey much easier. 
Some of these commercial Hadoop distributions (or editions of the distribution) have licenses that allow us to use them free of charge with optional paid support agreements. Hortonworks Data Platform (HDP) is one such well-known Hadoop YARN distribution that is available free of charge. All the components of HDP are available as free and open source software. You can download HDP from http://hortonworks.com/hdp/downloads/. Refer to the installation guides available in the download page for instructions on the installation. Cloudera CDH is another well-known Hadoop YARN distribution. The Express edition of CDH is available free of charge. Some components of the Cloudera distribution are proprietary and available only for paying clients. You can download Cloudera Express from http://www.cloudera.com/content/cloudera/en/products-and-services/cloudera-express.html. Refer to the installation guides available on the download page for instructions on the installation. Hortonworks HDP, Cloudera CDH, and some of the other vendors provide fully configured quick start virtual machine images that you can download and run on your local machine using a virtualization software product. These virtual machines are an excellent resource to learn and try the different Hadoop components as well as for evaluation purposes before deciding on a Hadoop distribution for your cluster. Apache Bigtop is an open source project that aims to provide packaging and integration/interoperability testing for the various Hadoop ecosystem components. Bigtop also provides a vendor neutral packaged Hadoop distribution. While it is not as sophisticated as the commercial distributions, Bigtop is easier to install and maintain than using release binaries of each of the Hadoop components. In this recipe, we provide steps to use Apache Bigtop to install Hadoop ecosystem in your local machine. 
Benchmarking Hadoop MapReduce using TeraSort

Hadoop TeraSort is a well-known benchmark that aims to sort 1 TB of data as fast as possible using Hadoop MapReduce. The TeraSort benchmark stresses almost every part of the Hadoop MapReduce framework as well as the HDFS filesystem, making it an ideal choice to fine-tune the configuration of a Hadoop cluster. The original TeraSort benchmark sorts 10 billion 100-byte records, making the total data size 1 TB. However, we can specify the number of records, making it possible to configure the total size of data.

Getting ready

You must set up and deploy HDFS and Hadoop v2 YARN MapReduce prior to running these benchmarks, and locate the hadoop-mapreduce-examples-*.jar file in your Hadoop installation.

How to do it...

The following steps will show you how to run the TeraSort benchmark on the Hadoop cluster:

1. The first step of the TeraSort benchmark is the data generation. You can use the teragen command to generate the input data for the TeraSort benchmark. The first parameter of teragen is the number of records and the second parameter is the HDFS directory in which to generate the data. The following command generates 1 GB of data consisting of 10 million records in the tera-in directory in HDFS. Change the location of the hadoop-mapreduce-examples-*.jar file in the following commands according to your Hadoop installation:

    $ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar teragen 10000000 tera-in

   It's a good idea to specify the number of Map tasks to the teragen computation to speed up the data generation. This can be done by specifying the -Dmapred.map.tasks parameter. Also, you can increase the HDFS block size for the generated data so that the Map tasks of the TeraSort computation are coarser grained (the number of Map tasks for a Hadoop computation typically equals the number of input data blocks). This can be done by specifying the -Ddfs.block.size parameter.
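As a quick sanity check on the record counts and sizes mentioned in this step, the following Python sketch works through the arithmetic. It is only an illustration: the helper names are hypothetical and not part of Hadoop, and the only facts taken from the text are the 100-byte record size and the example parameter values.

```python
# Sizing arithmetic for teragen input (hypothetical helpers, not a Hadoop API).
RECORD_BYTES = 100  # each TeraSort record is 100 bytes, as stated in the text


def teragen_bytes(num_records):
    """Total bytes teragen produces for a given record count."""
    return num_records * RECORD_BYTES


def records_for(target_bytes):
    """Record count to pass to teragen to reach a target data size."""
    return target_bytes // RECORD_BYTES


def min_blocks(total_bytes, block_bytes):
    """Lower bound on the number of HDFS blocks (ceiling division).

    The real count can be higher, because every output file occupies
    at least one block of its own.
    """
    return -(-total_bytes // block_bytes)


if __name__ == "__main__":
    size = teragen_bytes(10_000_000)
    print(size)                         # 1000000000 bytes, i.e. 1 GB
    print(records_for(10**12))          # 10000000000 records for 1 TB
    print(min_blocks(size, 536870912))  # 2 blocks at a 512 MB block size
```

The block count is only a lower bound, so treat it as a rough guide to how coarse the TeraSort Map tasks can become rather than as an exact prediction.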
    $ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar teragen -Ddfs.block.size=536870912 -Dmapred.map.tasks=256 10000000 tera-in

2. The second step of the TeraSort benchmark is the execution of the TeraSort MapReduce computation on the data generated in step 1. The first parameter of the terasort command is the HDFS input data directory, and the second parameter is the HDFS output data directory:

    $ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar terasort tera-in tera-out

   It's a good idea to specify the number of Reduce tasks to the TeraSort computation to speed up the Reducer part of the computation. This can be done by specifying the -Dmapred.reduce.tasks parameter as follows:

    $ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar terasort -Dmapred.reduce.tasks=32 tera-in tera-out

3. The last step of the TeraSort benchmark is the validation of the results. This can be done using the teravalidate application as follows. The first parameter is the directory with the sorted data and the second parameter is the directory to store the report containing the results:

    $ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar teravalidate tera-out tera-validate

How it works...

TeraSort uses the sorting capability of the MapReduce framework together with a custom range Partitioner to divide the Map output among the Reduce tasks, ensuring the global sorted order.

Optimizing Hadoop YARN and MapReduce configurations for cluster deployments

In this recipe, we explore some of the important configuration options of Hadoop YARN and Hadoop MapReduce. Commercial Hadoop distributions typically provide a GUI-based approach to specify Hadoop configurations. YARN allocates resource containers to the applications based on the resource requests made by the applications and the available resource capacity of the cluster.
A resource request by an application consists of the number of containers required and the resource requirement of each container. Currently, most container resource requirements are specified using the amount of memory. Hence, our focus in this recipe will be mainly on configuring the memory allocation of a YARN cluster.

Getting ready

Set up a Hadoop cluster by following the earlier recipes.

How to do it...

The following instructions will show you how to configure the memory allocation in a YARN cluster. The number of tasks per node is derived using this configuration:

1. The following property specifies the amount of memory (RAM) that can be used by YARN containers in a worker node. It's advisable to set this slightly less than the amount of physical RAM present in the node, leaving some memory for the OS and other non-Hadoop processes. Add or modify the following lines in the yarn-site.xml file:

    <property>
      <name>yarn.nodemanager.resource.memory-mb</name>
      <value>100240</value>
    </property>

2. The following property specifies the minimum amount of memory (RAM) that can be allocated to a YARN container in a worker node. Add or modify the following lines in the yarn-site.xml file to configure this property. If we assume that all the YARN resource requests ask for containers with only the minimum amount of memory, the maximum number of concurrent resource containers that can be executed in a node equals (YARN memory per node specified in step 1) / (YARN minimum allocation configured below). Based on this relationship, we can use the value of the following property to achieve the desired number of resource containers per node. The number of resource containers per node is recommended to be less than or equal to the minimum of (2 * number of CPU cores) and (2 * number of disks).
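The relationship between node memory, minimum allocation, and container count can be worked through with a few lines of Python. This is a minimal sketch under stated assumptions: the helper names are hypothetical (not Hadoop APIs), and the hardware figures (16 cores, 12 disks) are invented for illustration. Only the two memory values come from this recipe.

```python
# Back-of-the-envelope YARN container sizing.
# All function names are hypothetical helpers, not Hadoop APIs.


def max_containers_by_memory(node_memory_mb, min_allocation_mb):
    """Containers per node if every request asks for only the minimum allocation."""
    return node_memory_mb // min_allocation_mb


def recommended_container_cap(cpu_cores, disks):
    """Rule of thumb quoted in the text: min(2 * cores, 2 * disks)."""
    return min(2 * cpu_cores, 2 * disks)


def smallest_min_allocation(node_memory_mb, target_containers):
    """Smallest minimum allocation (MB) that yields at most target_containers."""
    return node_memory_mb // (target_containers + 1) + 1


if __name__ == "__main__":
    node_mb = 100240      # yarn.nodemanager.resource.memory-mb used in this recipe
    min_alloc_mb = 3072   # yarn.scheduler.minimum-allocation-mb used in this recipe
    print(max_containers_by_memory(node_mb, min_alloc_mb))   # 32
    cap = recommended_container_cap(cpu_cores=16, disks=12)  # assumed hardware
    print(cap)                                               # 24
    print(smallest_min_allocation(node_mb, cap))             # 4010
```

With the recipe's values the memory bound gives 32 containers per node. On the assumed hardware the rule of thumb would suggest at most 24, which corresponds to a minimum allocation of roughly 4 GB.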
    <property>
      <name>yarn.scheduler.minimum-allocation-mb</name>
      <value>3072</value>
    </property>

3. Restart the YARN ResourceManager and NodeManager services by running sbin/stop-yarn.sh and sbin/start-yarn.sh from the HADOOP_HOME directory.

The following instructions will show you how to configure the memory requirements of the MapReduce applications.

4. The following properties define the maximum amount of memory (RAM) that will be available to each Map and Reduce task. These memory values will be used when MapReduce applications request resources from YARN for Map and Reduce task containers. Add the following lines to the mapred-site.xml file:

    <property>
      <name>mapreduce.map.memory.mb</name>
      <value>3072</value>
    </property>
    <property>
      <name>mapreduce.reduce.memory.mb</name>
      <value>6144</value>
    </property>

5. The following properties define the JVM heap size of the Map and Reduce tasks respectively. Set these values to be slightly less than the corresponding values in step 4, so that they won't exceed the resource limits of the YARN containers. Add the following lines to the mapred-site.xml file:

    <property>
      <name>mapreduce.map.java.opts</name>
      <value>-Xmx2560m</value>
    </property>
    <property>
      <name>mapreduce.reduce.java.opts</name>
      <value>-Xmx5120m</value>
    </property>

How it works...

We can control Hadoop configurations through the following four configuration files. Hadoop reloads the configurations from these configuration files after a cluster restart:

core-site.xml: Contains the configurations common to the whole Hadoop distribution
hdfs-site.xml: Contains configurations for HDFS
mapred-site.xml: Contains configurations for MapReduce
yarn-site.xml: Contains configurations for the YARN ResourceManager and NodeManager processes

Each configuration file has name-value pairs expressed in XML format, defining the configurations of different aspects of Hadoop. The following is an example of a property in a configuration file.
The <configuration> tag is the top-level parent XML container and <property> tags, which define individual properties, are specified as child tags inside the <configuration> tag:

    <configuration>
      <property>
        <name>mapreduce.reduce.shuffle.parallelcopies</name>
        <value>20</value>
      </property>
      ...
    </configuration>

Some configurations can be configured on a per-job basis using the job.getConfiguration().set(name, value) method from the Hadoop MapReduce job driver code.

There's more...

There are many similar important configuration properties defined in Hadoop. The following are some of them:

conf/core-site.xml

    Name                  Default value  Description
    fs.inmemory.size.mb   200            Amount of memory, in MB, allocated to the in-memory filesystem that is used to merge map outputs at reducers
    io.file.buffer.size   131072         Size of the read/write buffer used by sequence files

conf/mapred-site.xml

    Name                                      Default value  Description
    mapreduce.reduce.shuffle.parallelcopies   20             Maximum number of parallel copies the reduce step will execute to fetch output from many parallel jobs
    mapreduce.task.io.sort.factor             50             Maximum number of streams merged while sorting files
    mapreduce.task.io.sort.mb                 200            Memory limit, in MB, while sorting data

conf/hdfs-site.xml

    Name                         Default value  Description
    dfs.blocksize                134217728      HDFS block size
    dfs.namenode.handler.count   200            Number of server threads to handle RPC calls in NameNodes

You can find a list of deprecated properties in the latest version of Hadoop and the new replacement properties for them at http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/DeprecatedProperties.html.

The following documents provide the list of properties, their default values, and the descriptions of each of the configuration files mentioned earlier:

Common configuration: http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/core-default.xml
HDFS configuration: https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
YARN configuration: http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-common/yarn-default.xml
MapReduce configuration: http://hadoop.apache.org/docs/stable/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml

Unit testing Hadoop MapReduce applications using MRUnit

MRUnit is a JUnit-based Java library that allows us to unit test Hadoop MapReduce programs. This makes it easy to develop as well as to maintain Hadoop MapReduce code bases. MRUnit supports testing Mappers and Reducers separately as well as testing MapReduce computations as a whole. In this recipe, we'll be exploring all three testing scenarios.

Getting ready

We use Gradle as the build tool for our sample code base.

How to do it...

The following steps show you how to perform unit testing of a Mapper using MRUnit:

1. In the setUp method of the test class, initialize an MRUnit MapDriver instance with the Mapper class you want to test. In this example, we are going to test the Mapper of the WordCount MapReduce application we discussed in earlier recipes:

    public class WordCountWithToolsTest {

      MapDriver<Object, Text, Text, IntWritable> mapDriver;

      @Before
      public void setUp() {
        WordCountWithTools.TokenizerMapper mapper =
            new WordCountWithTools.TokenizerMapper();
        mapDriver = MapDriver.newMapDriver(mapper);
      }
      ……
    }

2. Write a test function to test the Mapper logic. Provide the test input to the Mapper using the MapDriver.withInput method. Then, provide the expected result of the Mapper execution using the MapDriver.withOutput method. Now, invoke the test using the MapDriver.runTest method. The MapDriver.withAll and MapDriver.withAllOutput methods allow us to provide a list of test inputs and a list of expected outputs, rather than adding them individually.

    @Test
    public void testWordCountMapper() throws IOException {
      IntWritable inKey = new IntWritable(0);
      mapDriver.withInput(inKey, new Text("Test Quick"));
      ….
      mapDriver.withOutput(new Text("Test"), new IntWritable(1));
      mapDriver.withOutput(new Text("Quick"), new IntWritable(1));
      …
      mapDriver.runTest();
    }

The following step shows you how to perform unit testing of a Reducer using MRUnit.

3. Similar to steps 1 and 2, initialize a ReduceDriver by providing the Reducer class under test and then configure the ReduceDriver with the test input and the expected output. The input to the reduce function should conform to a key with a list of values. Also, in this test, we use the ReduceDriver.withAllOutput method to provide a list of expected outputs.

    public class WordCountWithToolsTest {

      ReduceDriver<Text, IntWritable, Text, IntWritable> reduceDriver;

      @Before
      public void setUp() {
        WordCountWithTools.IntSumReducer reducer =
            new WordCountWithTools.IntSumReducer();
        reduceDriver = ReduceDriver.newReduceDriver(reducer);
      }

      @Test
      public void testWordCountReduce() throws IOException {
        ArrayList<IntWritable> reduceInList =
            new ArrayList<IntWritable>();
        reduceInList.add(new IntWritable(1));
        reduceInList.add(new IntWritable(2));

        reduceDriver.withInput(new Text("Quick"), reduceInList);
        ...
        ArrayList<Pair<Text, IntWritable>> reduceOutList =
            new ArrayList<Pair<Text, IntWritable>>();
        reduceOutList.add(new Pair<Text, IntWritable>(
            new Text("Quick"), new IntWritable(3)));
        ...
        reduceDriver.withAllOutput(reduceOutList);
        reduceDriver.runTest();
      }
    }

The following steps show you how to perform unit testing on a whole MapReduce computation using MRUnit.

4. In this step, initialize a MapReduceDriver by providing the Mapper class and Reducer class of the MapReduce program that you want to test. Then, configure the MapReduceDriver with the test input data and the expected output data. When executed, this test will execute the MapReduce execution flow starting from the Map input stage to the Reduce output stage. It's possible to provide a combiner implementation to this test as well.
    public class WordCountWithToolsTest {
      ……
      MapReduceDriver<Object, Text, Text, IntWritable, Text, IntWritable> mapReduceDriver;

      @Before
      public void setUp() {
        ....
        mapReduceDriver = MapReduceDriver.newMapReduceDriver(mapper, reducer);
      }

      @Test
      public void testWordCountMapReduce() throws IOException {
        IntWritable inKey = new IntWritable(0);
        mapReduceDriver.withInput(inKey, new Text("Test Quick"));
        ……
        ArrayList<Pair<Text, IntWritable>> reduceOutList =
            new ArrayList<Pair<Text, IntWritable>>();
        reduceOutList.add(new Pair<Text, IntWritable>(
            new Text("Quick"), new IntWritable(2)));
        ……
        mapReduceDriver.withAllOutput(reduceOutList);
        mapReduceDriver.runTest();
      }
    }

5. The Gradle build script (or any other Java build mechanism) can be configured to execute these unit tests with every build. We can add the MRUnit dependency to the Gradle build file as follows:

    dependencies {
      testCompile group: 'org.apache.mrunit', name: 'mrunit',
          version: '1.1.+', classifier: 'hadoop2'
      ……
    }

6. Use the following Gradle command to execute only the WordCountWithToolsTest unit test. This command executes any test class that matches the pattern **/WordCountWith*.class:

    $ gradle -Dtest.single=WordCountWith test
    :chapter3:compileJava UP-TO-DATE
    :chapter3:processResources UP-TO-DATE
    :chapter3:classes UP-TO-DATE
    :chapter3:compileTestJava UP-TO-DATE
    :chapter3:processTestResources UP-TO-DATE
    :chapter3:testClasses UP-TO-DATE
    :chapter3:test
    BUILD SUCCESSFUL
    Total time: 27.193 secs

You can also execute MRUnit-based unit tests in your IDE. You can use the gradle eclipse or gradle idea commands to generate the project files for the Eclipse and IDEA IDEs respectively.

Generating an inverted index using Hadoop MapReduce

Simple text searching systems rely on an inverted index to look up the set of documents that contain a given word or a term.
In this recipe, we implement a simple inverted index building application that computes a list of terms in the documents, the set of documents that contains each term, and the term frequency in each of the documents. Retrieval of results from an inverted index can be as simple as returning the set of documents that contains the given terms or can involve much more complex operations such as returning the set of documents ordered based on a particular ranking.

Getting ready

You must have Apache Hadoop v2 configured and installed to follow this recipe. Gradle is needed for the compiling and building of the source code.

How to do it...

In the following steps, we use a MapReduce program to build an inverted index for a text dataset:

1. Create a directory in HDFS and upload a text dataset. This dataset should consist of one or more text files.

    $ hdfs dfs -mkdir input_dir
    $ hdfs dfs -put *.txt input_dir

   You can download the text versions of the Project Gutenberg books by following the instructions given at http://www.gutenberg.org/wiki/Gutenberg:Information_About_Robot_Access_to_our_Pages. Make sure to provide the filetypes query parameter of the download request as txt. Unzip the downloaded files. You can use the unzipped text files as the text dataset for this recipe.

2. Compile the source by running the gradle build command from the chapter 8 folder of the source repository.

3. Run the inverted indexing MapReduce job using the following command. Provide the HDFS directory where you uploaded the input data in step 1 as the first argument and provide an HDFS path to store the output as the second argument:

    $ hadoop jar hcb-c8-samples.jar chapter8.invertindex.TextOutInvertedIndexMapReduce input_dir output_dir

4. Check the output directory for the results by running the following command.
   The output of this program will consist of the term followed by a comma-separated list of filename and frequency:

    $ hdfs dfs -cat output_dir/*
    ARE three.txt:1,one.txt:1,four.txt:1,two.txt:1,
    AS three.txt:2,one.txt:2,four.txt:2,two.txt:2,
    AUGUSTA three.txt:1,
    About three.txt:1,two.txt:1,
    Abroad three.txt:2,

5. We used the text outputting inverted indexing MapReduce program in step 3 to make the algorithm easier to understand. Run the program by substituting the command in step 3 with the following command:

    $ hadoop jar hcb-c8-samples.jar chapter8.invertindex.InvertedIndexMapReduce input_dir seq_output_dir

How it works...

The Map function receives a chunk of an input document as the input and outputs the term and a <docid, 1> pair for each word. In the Map function, we first replace all the non-alphanumeric characters in the input text value with spaces before tokenizing it as follows:

    public void map(Object key, Text value, ……… {
      String valString = value.toString().replaceAll("[^a-zA-Z0-9]+", " ");
      StringTokenizer itr = new StringTokenizer(valString);
      FileSplit fileSplit = (FileSplit) context.getInputSplit();
      String fileName = fileSplit.getPath().getName();
      while (itr.hasMoreTokens()) {
        term.set(itr.nextToken());
        docFrequency.set(fileName, 1);
        context.write(term, docFrequency);
      }
    }

We use the getInputSplit() method of MapContext to obtain a reference to the InputSplit assigned to the current Map task. The InputSplits for this computation are instances of FileSplit due to the usage of a FileInputFormat-based InputFormat. Then we use the getPath() method of FileSplit to obtain the path of the file containing the current split and extract the filename from it. We use this extracted filename as the document ID when constructing the inverted index.

The Reduce function receives IDs and frequencies of all the documents that contain the term (Key) as the input.
The Reduce function then outputs the term and a list of document IDs and the number of occurrences of the term in each document as the output:

    public void reduce(Text key, Iterable<TermFrequencyWritable> values, Context context) …………{
      HashMap<Text, IntWritable> map = new HashMap<Text, IntWritable>();
      for (TermFrequencyWritable val : values) {
        Text docID = new Text(val.getDocumentID());
        int freq = val.getFreq().get();
        if (map.get(docID) != null) {
          map.put(docID, new IntWritable(map.get(docID).get() + freq));
        } else {
          map.put(docID, new IntWritable(freq));
        }
      }
      MapWritable outputMap = new MapWritable();
      outputMap.putAll(map);
      context.write(key, outputMap);
    }

In the preceding model, we output a record for each word, generating a large amount of intermediate data between Map tasks and Reduce tasks. We use the following combiner to aggregate the terms emitted by the Map tasks, reducing the amount of intermediate data that needs to be transferred between Map and Reduce tasks:

    public void reduce(Text key, Iterable<TermFrequencyWritable> values …… {
      int count = 0;
      String id = "";
      for (TermFrequencyWritable val : values) {
        count++;
        if (count == 1) {
          id = val.getDocumentID().toString();
        }
      }
      TermFrequencyWritable writable = new TermFrequencyWritable();
      writable.set(id, count);
      context.write(key, writable);
    }

In the driver program, we set the Mapper, Reducer, and the Combiner classes. Also, we specify both the Output Value and the MapOutput Value properties as we use different value types for the Map tasks and the Reduce tasks.

    …
    job.setMapperClass(IndexingMapper.class);
    job.setReducerClass(IndexingReducer.class);
    job.setCombinerClass(IndexingCombiner.class);
    …
    job.setMapOutputValueClass(TermFrequencyWritable.class);
    job.setOutputValueClass(MapWritable.class);
    job.setOutputFormatClass(SequenceFileOutputFormat.class);

There's more...
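To get a feel for the data flow of the Map, combiner, and Reduce functions described in How it works, it can help to simulate them in memory. The following Python sketch is only an illustration of the idea, not the recipe's Java implementation: the function names and the sample documents are made up, and the combiner here sums frequencies (rather than counting records), which is one way to keep the result correct even if a combiner is applied more than once.

```python
import re
from collections import defaultdict


def map_phase(file_name, text):
    """Emit (term, (doc_id, 1)) for every token, mirroring the Map function."""
    cleaned = re.sub(r"[^a-zA-Z0-9]+", " ", text)
    return [(term, (file_name, 1)) for term in cleaned.split()]


def combine(pairs):
    """Sum frequencies per (term, doc_id) locally, as a combiner would."""
    totals = defaultdict(int)
    for term, (doc_id, freq) in pairs:
        totals[(term, doc_id)] += freq
    return [(term, (doc_id, freq)) for (term, doc_id), freq in totals.items()]


def reduce_phase(pairs):
    """Group by term and build {doc_id: frequency} for each term."""
    index = defaultdict(dict)
    for term, (doc_id, freq) in pairs:
        index[term][doc_id] = index[term].get(doc_id, 0) + freq
    return dict(index)


if __name__ == "__main__":
    # Made-up sample documents, one "Map task" per file.
    docs = {"one.txt": "AS you like it, AS ever", "two.txt": "as ARE we"}
    intermediate = []
    for name, text in docs.items():
        intermediate.extend(combine(map_phase(name, text)))
    index = reduce_phase(intermediate)
    print(index["AS"])  # {'one.txt': 2}
```

Note that, like the recipe's Map function, this sketch is case sensitive, so "AS" and "as" end up as separate terms.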
We can improve this indexing program by performing optimizations such as filtering stop words, substituting words with word stems, storing more information about the context of the word, and so on, making indexing a much more complex problem. Luckily, there exist several open source indexing frameworks that we can use for indexing purposes. The later recipes of this article will explore indexing using Apache Solr and Elasticsearch, which are based on the Apache Lucene indexing engine.

The upcoming section introduces the usage of MapFileOutputFormat to store InvertedIndex in an indexed random accessible manner.

Outputting a random accessible indexed InvertedIndex

Apache Hadoop supports a file format called MapFile that can be used to store an index into the data stored in SequenceFiles. MapFile is very useful when we need random access to records stored in a large SequenceFile. You can use the MapFileOutputFormat format to output MapFiles, which would consist of a SequenceFile containing the actual data and another file containing the index into the SequenceFile.

The chapter8/invertindex/MapFileOutInvertedIndexMR.java MapReduce program in the source folder of chapter8 utilizes MapFiles to store a secondary index into our inverted index. You can execute that program by using the following command. The third parameter (sample_lookup_term) should be a word that is present in your input dataset:

    $ hadoop jar hcb-c8-samples.jar chapter8.invertindex.MapFileOutInvertedIndexMR input_dir indexed_output_dir sample_lookup_term

If you check indexed_output_dir, you will be able to see folders named part-r-xxxxx, each containing a data file and an index file. We can load these indexes to MapFileOutputFormat and perform random lookups for the data.
An example of a simple lookup using this method is given in the MapFileOutInvertedIndexMR.java program as follows:

    MapFile.Reader[] indexReaders =
        MapFileOutputFormat.getReaders(new Path(args[1]), getConf());
    MapWritable value = new MapWritable();
    Text lookupKey = new Text(args[2]);
    // Performing the lookup for the values of the lookupKey
    Writable map = MapFileOutputFormat.getEntry(indexReaders,
        new HashPartitioner<Text, MapWritable>(), lookupKey, value);

In order to use this feature, you need to prevent Hadoop from writing a _SUCCESS file in the output folder by setting the following property. The presence of the _SUCCESS file might cause an error when using MapFileOutputFormat to look up the values in the index:

    job.getConfiguration().setBoolean(
        "mapreduce.fileoutputcommitter.marksuccessfuljobs", false);

Data preprocessing using Hadoop streaming and Python

Data preprocessing is an important and often required component in data analytics. Data preprocessing becomes even more important when consuming unstructured text data generated from multiple different sources. Data preprocessing steps include operations such as cleaning the data, extracting important features from data, removing duplicate items from the datasets, converting data formats, and many more.

Hadoop MapReduce provides an ideal environment to perform these tasks in parallel when processing massive datasets. Apart from using Java MapReduce programs or Pig scripts or Hive scripts to preprocess the data, Hadoop also contains several other tools and features that are useful in performing these data preprocessing operations. One such feature is InputFormats, which lets us support custom data formats by implementing custom InputFormats.
Another feature is the Hadoop Streaming support, which allows us to use our favorite scripting languages to perform the actual data cleansing and extraction, while Hadoop will parallelize the computation to hundreds of compute and storage resources.

In this recipe, we are going to use Hadoop Streaming with a Python script-based Mapper to perform data extraction and format conversion.

Getting ready

Check whether Python is already installed on the Hadoop worker nodes. If not, install Python on all the Hadoop worker nodes.

How to do it...

The following steps show how to clean and extract data from the 20news dataset and store the data as a tab-separated file:

1. Download and extract the 20news dataset from http://qwone.com/~jason/20Newsgroups/20news-19997.tar.gz:

    $ wget http://qwone.com/~jason/20Newsgroups/20news-19997.tar.gz
    $ tar -xzf 20news-19997.tar.gz

2. Upload the extracted data to the HDFS. In order to save the compute time and resources, you can use only a subset of the dataset:

    $ hdfs dfs -mkdir 20news-all
    $ hdfs dfs -put <extracted_folder> 20news-all

3. Extract the resource package and locate the MailPreProcessor.py Python script.

4. Locate the hadoop-streaming.jar JAR file of the Hadoop installation in your machine. Run the following Hadoop Streaming command using that JAR. /usr/lib/hadoop-mapreduce/ is the hadoop-streaming JAR file's location for the BigTop-based Hadoop installations:

    $ hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar -input 20news-all/*/* -output 20news-cleaned -mapper MailPreProcessor.py -file MailPreProcessor.py

5. Inspect the results using the following command:

    $ hdfs dfs -cat 20news-cleaned/part-* | more

How it works...

Hadoop uses the default TextInputFormat as the input specification for the previous computation. Usage of the TextInputFormat generates a Map task for each file in the input dataset and generates a Map input record for each line.
Hadoop streaming provides the input to the Map application through the standard input:

    line = sys.stdin.readline()
    while line:
        ….
        if (doneHeaders):
            list.append(line)
        elif line.find("Message-ID:") != -1:
            messageID = line[len("Message-ID:"):]
        ….
        elif line == "":
            doneHeaders = True

        line = sys.stdin.readline()

The preceding Python code reads the input lines from the standard input until it reaches the end of the file. We parse the headers of the newsgroup file till we encounter the empty line that demarcates the headers from the message contents. The message content is read into a list line by line:

    value = ' '.join(list)
    value = fromAddress + "\t" ……"\t" + value
    print('%s\t%s' % (messageID, value))

The preceding code segment merges the message content into a single string and constructs the output value of the streaming application as a tab-delimited set of selected headers, followed by the message content. The output key is the Message-ID header extracted from the input file. The output is written to the standard output, using a tab to delimit the key and the value.

There's more...

We can generate the output of the preceding computation in the Hadoop SequenceFile format by specifying SequenceFileOutputFormat as the OutputFormat of the streaming computations:

    $ hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar -input 20news-all/*/* -output 20news-cleaned -mapper MailPreProcessor.py -file MailPreProcessor.py -outputformat org.apache.hadoop.mapred.SequenceFileOutputFormat

It is a good practice to store the data as SequenceFiles (or other Hadoop binary file formats such as Avro) after the first pass of the input data because SequenceFiles take up less space and support compression.
You can use hdfs dfs -text <path_to_sequencefile> to output the contents of a SequenceFile to text:

    $ hdfs dfs -text 20news-seq/part-* | more

However, for the preceding command to work, any Writable classes that are used in the SequenceFile should be available in the Hadoop classpath.

Loading large datasets to an Apache HBase data store – importtsv and bulkload

The Apache HBase data store is very useful when storing large-scale data in a semi-structured manner, so that it can be used for further processing using Hadoop MapReduce programs or to provide a random access data storage for client applications. In this recipe, we are going to import a large text dataset to HBase using the importtsv and bulkload tools.

Getting ready

Install and deploy Apache HBase in your Hadoop cluster. Make sure Python is installed in your Hadoop compute nodes.

How to do it…

The following steps show you how to load the TSV (tab-separated value) converted 20news dataset into an HBase table:

1. Follow the Data preprocessing using Hadoop streaming and Python recipe to perform the preprocessing of data for this recipe. We assume that the output of step 4 of that recipe (shown below) is stored in an HDFS folder named "20news-cleaned":

    $ hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar -input 20news-all/*/* -output 20news-cleaned -mapper MailPreProcessor.py -file MailPreProcessor.py

2. Start the HBase shell:

    $ hbase shell

3. Create a table named 20news-data by executing the following command in the HBase shell. Older versions of the importtsv command (used in the next step) can handle only a single column family.
   Hence, we are using only a single column family when creating the HBase table:

    hbase(main):001:0> create '20news-data','h'

4. Execute the following command to import the preprocessed data to the HBase table created earlier:

    $ hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns=HBASE_ROW_KEY,h:from,h:group,h:subj,h:msg 20news-data 20news-cleaned

5. Start the HBase shell and use the count and scan commands of the HBase shell to verify the contents of the table:

    hbase(main):010:0> count '20news-data'
    12xxx row(s) in 0.0250 seconds
    hbase(main):010:0> scan '20news-data', {LIMIT => 10}
    ROW                                       COLUMN+CELL
    <1993Apr29.103624.1383@cronkite.ocis.te   column=h:c1, timestamp=1354028803355, value= katop@astro.ocis.temple.edu (Chris Katopis)>
    <1993Apr29.103624.1383@cronkite.ocis.te   column=h:c2, timestamp=1354028803355, value= sci.electronics
    ......

The following are the steps to load the 20news dataset to an HBase table using the bulkload feature:

1. Follow steps 1 to 3, but create the table with a different name:

    hbase(main):001:0> create '20news-bulk','h'

2. Use the following command to generate an HBase bulkload datafile:

    $ hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns=HBASE_ROW_KEY,h:from,h:group,h:subj,h:msg -Dimporttsv.bulk.output=20news-bulk-source 20news-bulk 20news-cleaned

3. List the files to verify that the bulkload datafiles are generated:

    $ hadoop fs -ls 20news-bulk-source
    ......
    drwxr-xr-x   - thilina supergroup         0 2014-04-27 10:06 /user/thilina/20news-bulk-source/h

    $ hadoop fs -ls 20news-bulk-source/h
    -rw-r--r--   1 thilina supergroup     19110 2014-04-27 10:06 /user/thilina/20news-bulk-source/h/4796511868534757870

4. The following command loads the data to the HBase table by moving the output files to the correct location:

    $ hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles 20news-bulk-source 20news-bulk
    ......
    14/04/27 10:10:00 INFO mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://127.0.0.1:9000/user/thilina/20news-bulk-source/h/4796511868534757870 first=<1993Apr29.103624.1383@cronkite.ocis.temple.edu> last=<stephens.736002130@ngis>
    ......

5. Start the HBase shell and use the count and scan commands of the HBase shell to verify the contents of the table:

    hbase(main):010:0> count '20news-bulk'
    hbase(main):010:0> scan '20news-bulk', {LIMIT => 10}

How it works...

The MailPreProcessor.py Python script extracts a selected set of data fields from the newsgroup message and outputs them as a tab-separated dataset:

    value = fromAddress + "\t" + newsgroup + "\t" + subject + "\t" + value
    print('%s\t%s' % (messageID, value))

We import the tab-separated dataset generated by the Streaming MapReduce computations to HBase using the importtsv tool. The importtsv tool requires the data to have no other tab characters except for the tab characters that separate the data fields. Hence, we remove any tab characters that may be present in the input data by using the following snippet of the Python script:

    line = line.strip()
    line = re.sub('\t', ' ', line)

The importtsv tool supports loading data into HBase directly using Put operations as well as by generating the HBase internal HFiles. The following command loads the data to HBase directly using the Put operations. Our generated dataset contains a Key and four fields in the values.
We specify the mapping from data fields to table column names using the -Dimporttsv.columns parameter. This mapping lists the respective table column names in the order of the tab-separated data fields in the input dataset:

$ hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns=<data field to table column mappings> <HBase tablename> <HDFS input directory>

We can use the following command to generate HBase HFiles for the dataset. These HFiles can be loaded directly into HBase without going through the HBase APIs, thereby reducing the amount of CPU and network resources needed:

$ hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns=<field to column mappings> -Dimporttsv.bulk.output=<path for hfile output> <HBase tablename> <HDFS input directory>

These generated HFiles can be loaded into HBase tables by simply moving the files to the right location. This move can be performed by using the completebulkload command:

$ hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles <HDFS path for hfiles> <table name>

There's more...

You can also use the importtsv tool with datasets that use other field separator characters by specifying the '-Dimporttsv.separator' parameter. The following is an example of using a comma as the separator character to import a comma-separated dataset into an HBase table:

$ hbase org.apache.hadoop.hbase.mapreduce.ImportTsv '-Dimporttsv.separator=,' -Dimporttsv.columns=<data field to table column mappings> <HBase tablename> <HDFS input directory>

Look out for Bad Lines in the MapReduce job console output or in the Hadoop monitoring console. One common cause of Bad Lines is unwanted delimiter characters in the data.
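As a minimal sketch of the kind of clean-up that avoids such stray delimiters, the following Python 3 snippet builds one tab-separated record in the layout the importtsv input expects: a row key followed by four value fields, with any tabs or line breaks inside a field replaced by spaces. The function names, the sample field values, and the has_expected_fields check are hypothetical and are not taken from MailPreProcessor.py; only the column mapping comes from the ImportTsv command used earlier.

```python
import re

# Column mapping from the ImportTsv command used earlier; the row key comes first.
COLUMNS = ["HBASE_ROW_KEY", "h:from", "h:group", "h:subj", "h:msg"]


def clean_field(text):
    """Replace tabs and line breaks with a space so they cannot act as separators."""
    return re.sub(r"[\t\r\n]+", " ", text).strip()


def to_tsv_record(message_id, from_address, newsgroup, subject, body):
    """Build one tab-separated line: the row key followed by four value fields."""
    fields = [message_id, from_address, newsgroup, subject, body]
    return "\t".join(clean_field(field) for field in fields)


def has_expected_fields(line, columns=COLUMNS):
    """Hypothetical pre-flight check: exactly one field per mapped column."""
    return len(line.split("\t")) == len(columns)


# Made-up sample values, including a stray tab and a line break.
record = to_tsv_record(
    "<msg-1@example.org>",
    "someone@example.org (Some One)",
    "sci.electronics",
    "Re: tabs\tin subject",
    "first line\nsecond\tline",
)
print(has_expected_fields(record))
```

Running a check like this over a sample of the cleaned output before starting the import is one way to catch delimiter problems early, rather than discovering them in the Bad Lines counter afterwards.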
The Python script we used in the data-cleaning step removes any extra tabs in the message. The Bad Lines counter appears in the job output as follows:

14/03/27 00:38:10 INFO mapred.JobClient:   ImportTsv
14/03/27 00:38:10 INFO mapred.JobClient:     Bad Lines=2

Data de-duplication using HBase

HBase supports storing multiple versions of column values for each record. When querying, HBase returns the latest version of the values, unless we explicitly specify a time period. This feature of HBase can be used to perform automatic de-duplication by making sure we use the same RowKey for duplicate values. In our 20news example, we use MessageID as the RowKey for the records, ensuring that duplicate messages appear as different versions of the same data record.

HBase allows us to configure the maximum or minimum number of versions per column family. Setting the maximum number of versions to a low value will reduce the data usage by discarding the old versions. Refer to http://hbase.apache.org/book/schema.versions.html for more information on setting the maximum or minimum number of versions.

Summary

In this article, we have learned about getting started with Hadoop, benchmarking Hadoop MapReduce, optimizing Hadoop YARN, unit testing, generating an inverted index, data processing, and loading large datasets to an Apache HBase data store.

Resources for Article: Further resources on this subject: Hive in Hadoop [article] Evolution of Hadoop [article] Learning Data Analytics with R and Hadoop [article]
Packt
05 Mar 2015
9 min read

Introduction to Mobile Web ArcGIS Development

In this article, Matthew Sheehan, author of the book Developing Mobile Web ArcGIS Applications, introduces mobile web development with the ArcGIS JavaScript API.

GIS is a location-focused technology. Esri's ArcGIS platform provides a complete set of GIS capabilities to store, access, analyze, query, and visualize spatial data. The advent of cloud and mobile computing has increased the focus on location technology dramatically. Because most mobile devices have GPS capabilities, mobile users are very interested in finding out who and what is near them. Mobile maps and apps that have a location component are becoming increasingly popular. ArcGIS provides a complete set of developer tools to satisfy these new location-centric demands.

Web ArcGIS development

The following screenshot illustrates different screen sizes:

Web ArcGIS development focuses on providing users access to ArcGIS Server, ArcGIS Online, or Portal for ArcGIS services. This includes map visualization and a slew of geospatial services, which give users the ability to search, identify, buffer, measure, and much more. Portal for ArcGIS provides the same experience as ArcGIS Online, but within an organization's infrastructure (on-premises or in the cloud). This is a particularly good solution where there are concerns around security.

The ArcGIS JavaScript API is the most popular among the web development tools provided by Esri. The API can be used both for mobile and desktop web application development. Esri describes the ArcGIS API for JavaScript as "a lightweight way to embed maps and tasks in web applications. You can get these maps from ArcGIS Online, your own ArcGIS Server or others' servers." The API documents can be found at https://developers.arcgis.com/Javascript/jshelp/.

Mobile Web Development is different

Developers new to mobile web development need to take into consideration the many differences in mobile development as compared to traditional desktop web development.
These differences include screen size, user interaction, design, functionality, and responsiveness. Let's discuss these differences in more depth.

Screen size and pixel density

There are large variations in the screen sizes of mobile devices. These can range from 3.5 inches for smartphones to more than 10.1 inches for tablets. Similarly, pixel density varies enormously between devices. These differences affect both the look and feel of your mobile ArcGIS app and user interaction. When building a mobile web application, the target device and its pixel density need careful consideration.

Interaction

Mobile ArcGIS app interaction is based on the finger rather than the mouse. That means tap, swipe, and pinch. The following screenshot illustrates the smartphone ArcGIS finger interactions:

Often, it is convenient to include some of the more traditional map interaction tools, such as zoom sliders, but most users will pan and zoom using finger gestures. Any onscreen elements such as buttons need to be large to allow for the lack of precision of a finger tap.

Mobile ArcGIS apps also provide new data input mechanisms. Data input relies on a screen-based, touch-driven keyboard. Usually, multiple keyboards are available for character, numeric, and date input. Voice input might be another input mechanism.

Providing mobile users feedback is very important. If a button is tapped, it is good practice to give a visual cue of state change, such as changing the button color. For example, as shown in the following screenshots, a green button labeled 'Online' changes to red, and its label changes to 'Offline', after a user tap:

Design

A simple and intuitive design is key to the success of any mobile ArcGIS application. If your mobile application is hard to use or understand, you will soon lose users' attention and interest. Workflows should be easy. One screen should flow logically to the next.
It should be obvious to the user how to return to the previous or home screen. The following screenshot shows small pop-ups opening to an attributes screen:

ArcGIS mobile applications are usually map-focused. Tools often sit over the map or hide the map completely. Developers need to carefully consider the design of the application so that it is easy for users to return to the map.

GIS maps are made up of a basemap with so-called feature overlays. These overlays are map layers and represent geographic features through points, lines, or polygons. Points may represent water valves, lines may be buried pipelines, while polygons may represent parks.

Mobile devices can change orientation from portrait to landscape. Onscreen elements will appear different in each of these modes, so orientation also needs consideration during mobile application development. The Bootstrap JavaScript framework helps automatically adjust onscreen elements based on orientation and a device's screen size.

Often, mobile ArcGIS applications have a multi-column layout. This commonly includes a layer list on the left side, a central map, and a set of tools on the right side. This works well on larger devices, but less well on smaller smartphones. Applications often need to be designed so that they can collapse into a single column when they are accessed from these smaller devices. The following screenshot illustrates the multiple versus single column layouts:

Mobile applications often need to be styled for a specific platform. Apple and Android have styling and design guidelines. Much can be done with stylesheets to ensure that an application has the look and feel of a specific platform.

Functionality

Mobile ArcGIS applications often have different uses from their desktop-based cousins. Desktop web ArcGIS applications are often built with many tools and are commonly used for analysis. In contrast, mobile ArcGIS applications provide simpler, more focused functionality.
The mobile user base is also more varied and includes both GIS and non-GIS trained users. Maintenance staff, surveyors, attorneys, engineers, and consumers are increasingly interested in using mobile ArcGIS applications. Developers need to carefully consider both the target users and the functionality provided when they build any mobile ArcGIS web application. Mobile ArcGIS users do not want applications that provide a plethora of complex tools. Being simple and focused is the key.

Responsiveness

Users expect mobile applications to be fast and responsive. Think about how people use their mobile devices today. Gaming and social media are extremely popular. High performance is always expected. Extra effort and attention are needed during mobile web development to optimize performance. Sure, network issues are out of your hands, but much can be done to ensure that your mobile ArcGIS application is fast.

Mobile Browsers

An increasing number of mobile browsers are now available. These include Safari, Chrome, Opera, Dolphin, and IE. Many are built using the WebKit rendering engine, which provides the most current browser capabilities. The following icons show different mobile web browsers that are now available:

As with desktop web development, cross-browser testing is crucial as you cannot predict which browsers your users will use. Mobile functionality that works well in one browser may not work quite so well in a different browser. There are an increasing number of resources to help you with the challenges of testing, such as modernizr.com, yepnopejs.com, and caniuse.com. See the excellent article in Mashable on mobile testing tools at http://mashable.com/2014/02/26/browser-testing-tools/.

Web, native and hybrid mobile applications

The following screenshot illustrates three different types of mobile applications:

There are three main types of mobile applications: web, native, and hybrid.
At one point in time, native was the preferred approach for mobile development. However, web technology is advancing at a rapid rate, and in many situations it can now be argued to be a better choice than native. Flexibility is a key benefit of mobile web, as one code base runs across all devices and platforms. Another key advantage of a web approach is that a browser-based application, built with the ArcGIS JavaScript API, can be converted into an installable native-like application using technologies such as PhoneGap. These are called hybrid mobile apps.

Mobile frameworks, toolkits and libraries

JavaScript is one of the most popular languages in the world. It is an implementation of the ECMAScript open language standard. There is a plethora of tools available, built by the JavaScript community.

The ArcGIS JavaScript API is built on the Dojo framework. This is an open source JavaScript framework used for constructing dynamic web user interfaces. Dojo provides modules, widgets, and plugins, making it a very flexible development framework.

Another extremely useful framework is jQuery Mobile. This is another excellent option for ArcGIS JavaScript developers.

Bootstrap is a popular framework for developing responsive mobile applications, as the following screenshot illustrates:

The framework provides automatic layout adaptation. This includes adapting to changes in device orientation and screen size. Using this framework means your ArcGIS web application will look good and be usable on all mobile devices: smartphones, phablets, and tablets.

PhoneGap/Cordova allows developers to convert web mobile applications to installable hybrid apps. This is another excellent open source framework.
The following screenshot illustrates how to convert a web app to hybrid using Phonegap: Hybrid apps built using PhoneGap can access mobile resources, such as GPS, camera, SD card, compass, and accelerometer, and be distributed through the various mobile app stores just like a native app. Cordova is the open source framework that is managed by the Apache Foundation. PhoneGap is based on Cordova and PhoneGap is owned and managed by Adobe. For more information, go to http://cordova.apache.org/. The names are often used interchangeably, and in fact, the two are very similar. However, there is a legal and technical difference between them. Summary In this article, we introduced mobile web development by leveraging the ArcGIS JavaScript API. We discussed some of the key areas in which mobile web development is different than traditional desktop development. These areas include screen size, user interaction, design, functionality, and responsiveness. Some of the many advantages of mobile web ArcGIS development were also discussed, including flexibility wherein one application runs on all platforms and devices. Finally, we introduced hybrid apps and how a browser-based web application can be converted into an installable app using PhoneGap. As location becomes ever more important to mobile users, so will the demand for web-based ArcGIS mobile applications. Therefore, those who understand the ArcGIS JavaScript API should have a bright future ahead. Resources for Article: Further resources on this subject: Adding Graphics to the Map [article] ArcGIS Spatial Analyst [article] Python functions – Avoid repeating code [article]
Travis and
05 Mar 2015
6 min read

Unity 2D: Creating a Megaman Clone | Part 1

In this post, we're going to be making a simple Mega Man clone. That sounds like a huge undertaking, and making an entire Mega Man clone really would be, but what we are going to focus on is building simple shoot functionality and enemies that can be destroyed by a bullet. I'm going to skip a lot of the basics, such as creating squares, to keep the pace up, so if you need information on those topics, we recommend looking at some of the more beginner-oriented articles on this site. My screenshots for this lesson use the Unity 5.0 beta, but the directions should be applicable to Unity 4.6+. So let's begin!

Getting Started

First create your new game. Make it a 2D project, and then create two 3D cube objects. Name one cube “Ground” and the other cube “Player”. First select the Ground object and switch its transform attributes to the following.

While still on the Ground object, add a material to it called “Grey” and give it the same color. Then remove the BoxCollider component in the inspector and add a Box Collider 2D component.

Next, select the Player cube and resize him so that he's thinner, as Mega Man has never been known to be an obese robot. We're also going to add a Rigidbody 2D, a Box Collider 2D, and a blue material called “Blue” for use later. Also, for ease, change the tag on the player to “Player”. Great, here is how your inspector should look for the “Player” object.

Lastly, for simplicity's sake, just create a directional light object in the scene, and then drag it out of view. This will allow us to at least see the scene. With our basic objects in place, try clicking play on the scene and watch what happens! Your player should just drop onto the ground object we created. Perfect, we have gravity and some world bounds, so let's create movement controls!
Now, Unity comes with a bunch of pre-made controls that we can use, but there is no better way to learn scripting than scripting ourselves! So, first, let's get some lateral movement going. Create a new C# script called PlayerMovement and attach it to the Player object. First we're going to make some simple movement controls using the Input class, checking which axis is being pressed to see what direction our character is moving in. This allows us to use only one segment of code for all of our character's movement, making later edits much easier. Here is what your code will look like:

We move MovePlayer into its own method purely to keep our code cleaner and to make it easy to see later what each section of code does. Filling up our Update method is only going to cause headaches later.

Next, we're going to implement some simple jump functionality. But before that, looking at the code we already have, we think we should start a habit that is tough to build later if it is not learned early on. What we're going to learn is references. These references hold a link to different components, whether they are attached to the same Game Object or to a completely different one. You see, every time you access a component, Unity has to spend time searching for that component before running the code that uses it. Each lookup causes a small loss of performance. While this may not seem like a big deal, this small cost tends to add up over a ton of classes and objects later on, so it's best to use references right from the start. So let's build some references, as well as add some variables and methods for our jump function.

As you can see in the linked code, our rigidbody and our transform are now stored in two private variables that we can use later on, so that we won't have to travel through our game object to find them.
Now you may think, "hey, I get all these from the gameobject.transform and the gameobject.rigidbody calls, do I need these references?" Well, the truth is, every time you write those, they actually make the two GetComponent calls we do in the Start method. So by saving a reference, we make sure that Unity doesn't have to make these calls over and over.

So let's quickly finish up our jump functionality and test our game! Add the following code to get our little character jumping and running around the screen.

Alright, you should be able to test this out for yourself, but let's quickly go over what was used. We created an isOnGround boolean that only lets us jump if we are on the ground, and a jumpPower float that lets us change our player's jump power for testing and for things like power-ups. Next, in the Jump method, we watch for the player pressing space while on the ground, and add force to the object's rigidbody, moving it up in y by our jumpPower. Lastly, we use simple collision detection to see if our player has collided with anything. If so, we assume the player is on the ground and allow another jump.

This approach has an obvious flaw, which we accept here for simplicity: if the player touches anything at all, including a platform above him, he will be allowed to jump again. There are a number of solutions for this problem that we may cover in the next post, but for now, this will allow us to at least move and jump our character correctly.

So that wraps up this post. In part 2, we will get our character a weapon and some enemies and start being able to destroy some evil robots. See you then!

For more Unity tutorials make sure you check out our Unity page. Click here and start exploring.

About the Authors

Denny is a Mobile Application Developer at Canadian Tire Development Operations.
While working, Denny regularly uses Unity to create in-store experiences, but also works on other technologies like Famous, Phaser.IO, LibGDX, and CreateJS when creating game-like apps. He also enjoys making non-game mobile apps, but who cares about that, am I right? Travis is a Software Engineer, living in the bitter region of Winnipeg, Canada. His work and hobbies include Game Development with Unity or Phaser.IO, as well as Mobile App Development. He can enjoy a good video game or two, but only if he knows he'll win!
Packt
05 Mar 2015
8 min read

Advanced Cypher tricks

Cypher is a highly efficient language that not only makes querying simpler but also strives to optimize the result-generation process to the maximum. Further performance gains can be achieved by using knowledge of the application's data domain to restructure queries. This article by Sonal Raj, the author of Neo4j High Performance, covers a few tricks that you can implement with Cypher for optimization.

Query optimizations

There are certain techniques you can adopt in order to get the maximum performance out of your Cypher queries. Some of them are:

Avoid global data scans: Manual optimization of query performance depends on the developer's effort to reduce the traversal domain and to make sure that only the essential data is returned in results. A global scan searches the entire graph, which is fine for smaller graphs but not for large datasets. For example:

START n=node(*) MATCH (n)-[:KNOWS]-(m) WHERE n.identity = "Batman" RETURN m

Since Cypher is a greedy pattern-matching language, it avoids discrimination unless explicitly told to. Filtering data with a start point should be undertaken at the initial stages of execution to speed up the result-generation process. In Neo4j versions greater than 2.0, the START statement in the preceding query is not required, and unless otherwise specified, the entire graph is searched. The use of labels in the graphs and in queries can help to optimize the search process for the pattern. For example:

START n=node(*) MATCH (n:superheroes)-[:KNOWS]-(m) WHERE n.identity = "Batman" RETURN m

Using the superheroes label in the preceding query helps to shrink the domain, thereby making the operation faster. This is referred to as a label-based scan.

Indexing and constraints for faster search: Searches in the graph space can be optimized and made faster if the data is indexed, or we apply some sort of constraint on it.
In this way, the traversal avoids redundant matches and goes straight to the desired index location. To apply an index on a label, you can use the following:

CREATE INDEX ON :superheroes(identity)

Otherwise, to create a constraint on a particular property, such as making the value of the property unique so that it can be directly referenced, we can use the following:

CREATE CONSTRAINT ON (n:superheroes) ASSERT n.identity IS UNIQUE

We will learn more about indexing, its types, and its utilities in making Neo4j more efficient for large dataset-based operations in the next sections.

Avoid Cartesian Products Generation: When creating queries, we should include entities that are connected in some way. The use of unspecific or unrelated entities can end up generating a lot of unused or unintended results. For example:

MATCH (m:Game), (p:Player)

This will end up mapping all possible games with all possible players, and that can lead to undesired results. Let's use an example to see how to avoid Cartesian products in queries:

MATCH (a:Actor), (m:Movie), (s:Series) RETURN COUNT(DISTINCT a), COUNT(DISTINCT m), COUNT(DISTINCT s)

This statement will find all possible triplets of the Actor, Movie, and Series labels and then filter the results. An optimized form of querying will include successive counting to get a final result as follows:

MATCH (a:Actor) WITH COUNT(a) as actors MATCH (m:Movie) WITH COUNT(m) as movies, actors MATCH (s:Series) RETURN COUNT(s) as series, movies, actors

This gives a 10x improvement in the execution time of this query on the same dataset.

Use more patterns in MATCH rather than WHERE: It is advisable to keep most of the patterns in the MATCH clause. The WHERE clause is not exactly meant for pattern matching; rather, it is used to filter the results when used with START and WITH. However, when used with MATCH, it implements constraints to the patterns described.
Thus, the pattern matching is faster when you use the pattern with the MATCH section. After finding starting points—either by using scans, indexes, or already-bound points—the execution engine will use pattern matching to find matching subgraphs. As Cypher is declarative, it can change the order of these operations. Predicates in WHERE clauses can be evaluated before, during, or after pattern matching. Split MATCH patterns further: Rather than having multiple match patterns in the same MATCH statement in a comma-separated fashion, you can split the patterns in several distinct MATCH statements. This process considerably decreases the query time since it can now search on smaller or reduced datasets at each successive match stage. When splitting the MATCH statements, you must keep in mind that the best practices include keeping the pattern with labels of the smallest cardinality at the head of the statement. You must also try to keep those patterns generating smaller intermediate result sets at the beginning of the match statements block. Profiling of queries: You can monitor your queries' processing details in the profile of the response that you can achieve with the PROFILE keyword, or setting profile parameter to True while making the request. Some useful information can be in the form of _db_hits that show you how many times an entity (node, relationship, or property) has been encountered. Returning data in a Cypher response has substantial overhead. So, you should strive to restrict returning complete nodes or relationships wherever possible and instead, simply return the desired properties or values computed from the properties. Parameters in queries: The execution engine of Cypher tries to optimize and transform queries into relevant execution plans. In order to optimize the amount of resources dedicated to this task, the use of parameters as compared to literals is preferred. 
With this technique, Cypher can reuse existing queries rather than parsing or compiling literal-based queries to build fresh execution plans:

MATCH (p:Player)-[:PLAYED]-(game) WHERE p.id = {pid} RETURN game

When Cypher is building execution plans, it looks at the schema to see whether it can find useful indexes. These index decisions are only valid until the schema changes, so adding or removing indexes leads to the execution plan cache being flushed.

Add the direction arrowhead in cases where the graph is to be queried in a directed manner. This will eliminate a lot of redundant operations.

Graph model optimizations

Sometimes, query optimizations can be a great way to improve the performance of an application using Neo4j, but you can also incorporate some fundamental practices while you define your database to make things easier and faster:

Explicit definition: If the graph model we are working with contains implicit relationships between components, higher query efficiency can be achieved by defining these relations explicitly. This leads to faster comparisons, but it comes with the drawback that the graph now requires more storage space for an additional entity for all occurrences of the data. Let's see this in action with the help of an example. In the following diagram, we see that when two players have played in the same game, they are most likely to know each other. So, instead of going through the game entity for every pair of connected players, we can define the KNOWS relationship explicitly between the players.

Property refactoring: This refers to the situation where complex, time-consuming operations in the WHERE or MATCH clause can be replaced by data stored directly as properties in the nodes of the graph. This not only saves computation time, resulting in much faster queries, but also leads to more organized data storage practices in the graph database.
For example:

MATCH (m:Movie) WHERE m.releaseDate > 1343779201 AND m.releaseDate < 1369094401 RETURN m

This query checks whether a movie was released in a particular period; it can be optimized if the release year of the movie is inherently stored in the graph as the year range 2012-2013. So, for the new format of the data, the query will now change to this:

MATCH (m:Movie)-[:CONTAINS]->(d) WHERE d.name = "2012-2013" RETURN m

This gives a marked improvement in the performance of the query in terms of its execution time.

Summary

These are the various tricks that can be implemented in Cypher for optimization.

Resources for Article: Further resources on this subject: Recommender systems dissected [Article] Working with a Neo4j Embedded Database [Article] Adding Graphics to the Map [Article]
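To make the earlier advice on avoiding Cartesian products more concrete, here is a small plain-Python analogy. It is not Neo4j code and does not claim to show how the Cypher engine works internally; it assumes made-up lists standing in for nodes with the Actor, Movie, and Series labels, and compares how many rows a single comma-separated MATCH would have to build with how many items are touched when each label is counted on its own.

```python
from itertools import product

# Made-up stand-ins for nodes carrying the Actor, Movie, and Series labels.
actors = ["actor-%d" % i for i in range(3)]
movies = ["movie-%d" % i for i in range(4)]
series = ["series-%d" % i for i in range(5)]


def count_with_cartesian_product(actors, movies, series):
    """Mimic MATCH (a:Actor), (m:Movie), (s:Series): build every triplet first."""
    rows = list(product(actors, movies, series))
    distinct_counts = (
        len({a for a, _, _ in rows}),
        len({m for _, m, _ in rows}),
        len({s for _, _, s in rows}),
    )
    return distinct_counts, len(rows)


def count_successively(actors, movies, series):
    """Mimic the chained MATCH ... WITH COUNT(...) form: one pass per label."""
    counts = (len(actors), len(movies), len(series))
    return counts, sum(counts)


print(count_with_cartesian_product(actors, movies, series))
print(count_successively(actors, movies, series))
```

Both functions report the same counts (3, 4, and 5), but the first builds 3 x 4 x 5 = 60 intermediate rows while the second touches only 3 + 4 + 5 = 12 items. The gap widens quickly as the number of nodes per label grows, which is the intuition behind splitting such a query into successive counts.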
Packt
04 Mar 2015
33 min read

Our App and Tool Stack

In this article by Zachariah Moreno, author of the book AngularJS Deployment Essentials, you will learn how to do the following: Minimize efforts and maximize results using a tool stack optimized for AngularJS development Access the krakn app via GitHub Scaffold an Angular app with Yeoman, Grunt, and Bower Set up a local Node.js development server Read through krakn's source code Before NASA or Space X launches a vessel into the cosmos, there is a tremendous amount of planning and preparation involved. The guiding principle when planning for any successful mission is similar to minimizing efforts and resources while retaining maximum return on the mission. Our principles for development and deployment are no exception to this axiom, and you will gain a firmer working knowledge of how to do so in this article. (For more resources related to this topic, see here.) The right tools for the job Web applications can be compared to buildings; without tools, neither would be a pleasure to build. This makes tools an indispensable factor in both development and construction. When tools are combined, they form a workflow that can be repeated across any project built with the same stack, facilitating the practices of design, development, and deployment. The argument can be made that it is just as paramount to document workflow as an application's source code or API. Along with grouping tools into categories based on the phases of building applications, it is also useful to group tools based on the opinions of a respective project—in our case, Angular, Ionic, and Firebase. I call tools grouped into opinionated workflows tool stacks. For example, the remainder of this article discusses the tool stack used to build the application that we will deploy across environments in this book. In contrast, if you were to build a Ruby on Rails application, the tool stack would be completely different because the project's opinions are different. 
Our app is called krakn, and it functions as a real-time chat application built on top of the opinions of Angular, the Ionic Framework, and Firebase. You can find all of krakn's source code at https://github.com/zachmoreno/krakn.

Version control with Git and GitHub

Git is a distributed version control system, driven from a command-line interface (CLI), that Linus Torvalds developed for use on the famed Linux kernel. Git is popular mostly because of its distributed architecture: every clone carries the full history, so any remote repository has all of the same information as your local repository, and losing a single copy is rarely fatal. It is useful to think of Git as a free insurance policy for your code. You will need to install Git using the instructions provided at www.git-scm.com/ for your development workstation's operating system.

GitHub.com has played a notable role in Git's popularization, turning its functionality into a social network focused on open source code contributions. With a pricing model that keeps open source repositories free and charges for private ones, GitHub elevated the use of Git to heights never seen before. If you don't already have an account on GitHub, now is the perfect time to visit github.com to provision a free account.

I mentioned earlier that krakn's code is available for forking at github.com/ZachMoreno/krakn. This means that any person with a GitHub account can view my version of krakn and clone a copy of their own for further modifications or contributions. In GitHub's web application, forking manifests itself as a button located to the right of the repository's title, which in this case is ZachMoreno/krakn. When you click on the button, you will see an animation that simulates the hardcore forking action. This results in a cloned repository under your account with a title along the lines of YourName/krakn.
Node.js

Node.js, commonly known as Node, is a community-driven server environment built on Google Chrome's V8 JavaScript engine that is entirely event driven and facilitates a nonblocking I/O model. According to www.nodejs.org, it is best suited for: "Data-intensive real-time applications that run across distributed devices."

So what does all this boil down to? Node empowers web developers to write JavaScript both on the client and server with bidirectional real-time I/O. The advent of Node has empowered developers to take their skills from the client to the server, evolving from frontend to full stack (like a caterpillar evolving into a butterfly). Not only do these skills facilitate a pay increase, they also advance the Web towards the same functionality as the traditional desktop or native application.

For our purposes, we use Node as a tool; a tool to build real-time applications with as few keystrokes, videos watched, and words read as possible. Node is, in fact, a modular tool through its extensible package interface, called Node Package Manager (NPM). You will use NPM as a means to install the remainder of our tool stack.

NPM

NPM is a means to install Node packages on your local workstation or a remote server. NPM is how we will install the majority of the tools and software used in this book. This is achieved by running the $ npm install -g [PackageName] command in your command line or terminal. To search the full list of Node packages, visit www.npmjs.org or run $ npm search [Search Term] in your command line or terminal.

Yeoman's workflow

Yeoman is a CLI that acts as the glue holding your tools together in an opinionated workflow. Although the term opinionated might sound off-putting, you must first consider the wisdom and experience of the developers and community who maintain Yeoman.
In this context, opinionated means little more than a collection of best practices that are all aimed at improving your experience as a developer when building static websites, single page applications, and everything in between. Opinionated does not mean that you are locked into what someone else feels is best for you, nor does it mean that you must strictly adhere to the opinions or best practices included. Yeoman is general enough to help you build nearly anything for the Web as well as improving your workflow while developing it. The tools that make up Yeoman's workflow are Yo, Grunt.js, Bower, and a few others that are more-or-less optional, but are probably worth your time.

Yo

Apart from having one of the hippest namespaces, Yo is a powerful code generator that is intelligent enough to scaffold most sites and applications. By default, running a yo command assumes that you mean to scaffold something at a project level, but yo can also be scoped more granularly by means of sub-generators. For example, the command for scaffolding a new vanilla Angular project is as follows:

    $ yo angular radicalApp

Yo will not finish your request until you provide some further information about your desired Angular project. It does this by asking you a series of relevant questions, and based on your answers, yo will scaffold a familiar application folder/file structure, along with all the boilerplate code. Note that if you have worked with the angular-seed project, then the Angular application that yo generates will look very familiar to you. Once you have an Angular app scaffolded, you can begin using sub-generator commands.
The following command scaffolds a new route, radicalRoute, within radicalApp:

    $ yo angular:route radicalRoute

The :route sub-generator is a very powerful command, as it automates all of the following key tasks:

It creates a new file, radicalApp/scripts/controllers/radicalRoute.js, that contains the controller logic for the radicalRoute view
It creates another new file, radicalApp/views/radicalRoute.html, that contains the associated view markup and directives
Lastly, it adds an additional route within radicalApp/scripts/app.js that connects the view to the controller

Additionally, the sub-generators for yo angular include the following: :controller, :directive, :filter, :service, :provider, :factory, :value, :constant, :decorator, and :view. These sub-generators let you scaffold smaller, individual components, whereas :route executes a combination of sub-generators.

Installing Yo

Within your workstation's terminal or command-line application, type the following command, followed by a return:

    $ npm install -g yo

If you are a Linux or Mac user, you might want to prefix the command with sudo, as follows:

    $ sudo npm install -g yo

Grunt

Grunt.js is a task runner that enhances your existing and/or Yeoman's workflow by automating repetitive tasks. Each time you generate a new project with yo, it creates a /Gruntfile.js file that wires up all of the curated tasks. You might have noticed that installing Yo also installs all of Yo's dependencies. Reading through /Gruntfile.js should incite a fair amount of awe, as it gives you a snapshot of what is going on under the hood of Yeoman's curated Grunt tasks and its dependencies.
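To make the shape of a Gruntfile more concrete, here is a minimal sketch. It is not the file that Yeoman generates; the paths under yeoman, the lint alias, and the choice of the grunt-contrib-watch and grunt-contrib-jshint plugins are assumptions picked for illustration. It only shows the three calls every Gruntfile tends to revolve around: grunt.initConfig, grunt.loadNpmTasks, and grunt.registerTask.

```javascript
// Minimal Gruntfile sketch. Paths, targets, and aliases are illustrative
// assumptions, not the contents of Yeoman's generated /Gruntfile.js.
function gruntfile(grunt) {
  grunt.initConfig({
    // Paths reused by the tasks below (hypothetical values).
    yeoman: { app: 'app', dist: 'dist' },

    // Re-run linting whenever a script is saved.
    watch: {
      js: {
        files: ['<%= yeoman.app %>/scripts/{,*/}*.js'],
        tasks: ['jshint']
      }
    },

    // Files checked by JSHint.
    jshint: {
      all: ['Gruntfile.js', '<%= yeoman.app %>/scripts/{,*/}*.js']
    }
  });

  // Plugins that provide the tasks configured above.
  grunt.loadNpmTasks('grunt-contrib-watch');
  grunt.loadNpmTasks('grunt-contrib-jshint');

  // Aliases that group tasks under a single command.
  grunt.registerTask('lint', ['jshint']);
  grunt.registerTask('default', ['lint']);
}

// Grunt calls the exported function when it loads the file.
if (typeof module !== 'undefined') {
  module.exports = gruntfile;
}
```

With a file like this in a project root (and the two plugins installed through NPM), running $ grunt would run the default alias, and $ grunt watch would lint on every save. The generated Gruntfile follows the same pattern, only with many more tasks.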
Generating a vanilla Angular app produces a /Gruntfile.js file that is responsible for performing the following tasks:

It defines where Yo places Bower packages, which is covered in the next section
It defines the path where the grunt build command places the production-ready code
It initializes the watch task to run:
    JSHint when JavaScript files are saved
    Karma's test runner when JavaScript files are saved
    Compass when SCSS or SASS files are saved
    The saved /Gruntfile.js file
It initializes LiveReload when any HTML or CSS files are saved
It configures the grunt server command to run a Node.js server on localhost:9000, or to show test results on localhost:9001
It autoprefixes CSS rules on LiveReload and grunt build
It renames files for optimizing browser caching
It configures the grunt build command to minify images, SVG, HTML, and CSS files or to safely minify Angular files

Let us pause for a moment to reflect on the amount of time it would take to find, learn, and implement each dependency into our existing workflow for each project we undertake. Ok, we should now have a greater appreciation for Yeoman and its community. For the vast majority of the time, you will likely only use a few Grunt commands, which include the following:

    $ grunt server
    $ grunt test
    $ grunt build

Bower

If Yo scaffolds our application's structure and files, and Grunt automates repetitive tasks for us, then what does Bower bring to the party? Bower is web development's missing package manager. Its functionality parallels that of Ruby Gems for the Ruby on Rails MVC framework, but is not limited to any single framework or technology stack. The explicit use of Bower is not required by the Yeoman workflow, but as I mentioned previously, the use of Bower is configured automatically for you in your project's /Gruntfile.js file.

How does managing packages improve our development workflow?
With all of the time we've been spending in our command lines and terminals, it is handy to be able to automate the management of third-party dependencies within our application. This ability manifests itself in a few simple commands, the most ubiquitous being the following:

    $ bower install [PackageName] --save

With this command, Bower will automate the following steps:

First, search its packages for the specified package name
Download the latest stable version of the package if found
Move the package to the location defined in your project's /Gruntfile.js file, typically a folder named /bower_components
Insert references to the package's files within your project's /index.html file, in the form of <link> elements for CSS files in the document's <head> element, and <script> elements for JavaScript files right above the document's closing </body> tag

This process is one that web developers are more than familiar with because adding a JavaScript library or new dependency happens multiple times within every project. Bower speeds up our existing manual process through automation and improves it by providing the latest stable version of a package and then notifying us of an update if one is available. This last part, "notifying us of an update if … available", is important because as a web developer advances from one project to the next, it is easy to overlook keeping dependencies as up to date as possible. Updating is achieved by running the following command:

    $ bower update

This command installs the available updates, if any, and goes through the same process of inserting new references where applicable. Bower.io includes all of the documentation on how to use Bower to its fullest potential along with the ability to search through all of the available Bower packages.
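The last step in the list above, inserting references into /index.html, can be sketched in a few lines. This is only an illustration of the idea and not how Bower or Yeoman's tooling implements it; tagsFor is a hypothetical helper and the file paths are made up for the example.

```javascript
// Hypothetical helper: turn a list of package files into the tags that
// would be placed in <head> (stylesheets) and before </body> (scripts).
function tagsFor(files) {
  var tags = { head: [], body: [] };
  files.forEach(function (file) {
    if (/\.css$/.test(file)) {
      tags.head.push('<link rel="stylesheet" href="' + file + '">');
    } else if (/\.js$/.test(file)) {
      tags.body.push('<script src="' + file + '"></script>');
    }
    // Any other file type is ignored in this sketch.
  });
  return tags;
}

// Example paths; assumed for illustration only.
var tags = tagsFor([
  'bower_components/ionic/css/ionic.css',
  'bower_components/angular/angular.js'
]);
console.log(tags.head.join('\n'));
console.log(tags.body.join('\n'));
```

Splitting the output into head and body mirrors the placement described above: stylesheets load early, and scripts load last so that they do not block rendering.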
Searching for available Bower packages can also be achieved by running the following command:

    $ bower search [SearchTerm]

If you cannot find the specific dependency you are searching for, and the project is on GitHub, consider contributing a bower.json file to the project's root and inviting the owner to register it by running the following command:

    $ bower register [ThePackageName] [GitEndpoint]

Registration allows you to install your dependency by running the next command:

    $ bower install [ThePackageName]

The Ionic framework

The Ionic framework is a truly remarkable advancement in bridging the gap between web applications and native mobile applications. In some ways, Ionic parallels Yeoman in that it assembles tools that were already available to developers into a neat package, and structures a workflow around them, inherently improving our experience as developers. If Ionic is analogous to Yeoman, then what are the tools that make up Ionic's workflow? The tools that, when combined, make Ionic noteworthy are Apache Cordova, Angular, Ionic's suite of Angular directives, and Ionic's mobile UI framework.

Batarang

An invaluable piece of our Angular tool stack is the Google Chrome Developer Tools extension, Batarang, by Brian Ford. Batarang adds a third-party panel (on the right-hand side of Console) to DevTools that facilitates Angular-specific inspection when debugging. We can view data in the scopes of each model, analyze each expression's performance, and view a beautiful visualization of service dependencies all from within Batarang. Because Angular augments the DOM with ng- attributes, Batarang also provides a Properties pane within the Elements panel, to inspect the models attached to a given element's scope. The extension is easy to install from either the Chrome Web Store or the project's GitHub repository, and inspection can be enabled by performing the following steps:

Firstly, open the Chrome Developer Tools.
You should then navigate to the AngularJS panel.
Finally, select the Enable checkbox on the far right tab.

Your active Chrome tab will then be reloaded automatically, and the AngularJS panel will begin populating the inspection data. In addition, you can leverage the Angular pane with the Elements panel to view Angular-specific properties at an elemental level, and observe the $scope variable from within the Console panel.

Sublime Text and Editor Integration

While developing any Angular app, it is helpful to augment our workflow further with Angular-specific syntax completion, snippets, go to definition, and quick panel search in the form of a Sublime Text package. Perform the following steps:

If you haven't installed Package Control in Sublime Text already, you need to install it first. Otherwise, continue with the next step.
Once installed, press command + Shift + P in Sublime.
Then, you need to select the Package Control: Install Package option.
Finally, type angularjs and press Enter on your keyboard.

In addition to support within Sublime, Angular enhancements exist for lots of popular editors, including WebStorm, Coda, and TextMate.

Krakn

As a quick refresher, krakn was constructed using all of the tools that are covered in this article. These include Git, GitHub, Node.js, NPM, Yeoman's workflow, Yo, Grunt, Bower, Batarang, and Sublime Text. The application builds on Angular, Firebase, the Ionic Framework, and a few other minor dependencies. The workflow I used to develop krakn went something like the following. Follow these steps to achieve the same thing. Note that you can skip the remainder of this section if you'd like to get straight to the deployment action, and feel free to rename things where necessary.

Setting up Git and GitHub

The workflow I followed while developing krakn begins with initializing our local Git repository and connecting it to our remote master repository on GitHub.
In order to install and set up both, perform the following steps:

1. Firstly, install all the tool stack dependencies, and create a folder called krakn.
2. Following this, run $ git init inside the folder and create a README.md file.
3. You should then run $ git add README.md and commit README.md to the local master branch.
4. You then need to create a new remote repository on GitHub called krakn (mine is ZachMoreno/krakn).
5. Following this, run the following command:
    $ git remote add origin git@github.com:[YourGitHubUserName]/krakn.git
6. Conclude the setup by running $ git push -u origin master.

Scaffolding the app with Yo

Scaffolding our app couldn't be easier with the yo ionic generator. To do this, perform the following steps:

1. Firstly, install Yo by running $ npm install -g yo.
2. After this, install generator-ionicjs by running $ npm install -g generator-ionicjs.
3. To conclude the scaffolding of your application, run the yo ionic command.

Development

After scaffolding the folder structure and boilerplate code, our workflow advances to the development phase, which is encompassed in the following steps:

1. To begin, run grunt server.
2. You are now in a position to make changes, such as deletions or additions.
3. Once these are saved, LiveReload will automatically reload your browser.
4. You can then review the changes in the browser.
5. Repeat steps 2-4 until you are ready to advance to the predeployment phase.

Views, controllers, and routes

Being a simple chat application, krakn has only a handful of views/routes. They are login, chat, account, menu, and about. The menu view is present in all the other views in the form of an off-canvas menu.

The login view

The default view/route/controller is named login. The login view utilizes Firebase's Simple Login feature to authenticate users before proceeding to the rest of the application. Apart from logging into krakn, users can register a new account by entering their desired credentials.
An interesting part of the login view is the use of the ng-show directive to toggle the second password field if the user selects the register button. However, the ng-model directive is the first step here, as it is used to pass the input text from the view to the controller and ultimately, the Firebase Simple Login. Other than the Angular magic, this view uses the ion-view directive, grid, and buttons that are all core to Ionic. Each view within an Ionic app is wrapped within an ion-view directive that contains a title attribute as follows:

    <ion-view title="Login">

The login view uses the standard input elements that contain an ng-model attribute to bind the input's value back to the controller's $scope as follows:

    <input type="text" placeholder="you@email.com" ng-model="data.email" />
    <input type="password" placeholder="embody strength" ng-model="data.pass" />
    <input type="password" placeholder="embody strength" ng-model="data.confirm" />

The Log In and Register buttons call their respective functions using the ng-click attribute, with the value set to the function's name as follows:

    <button class="button button-block button-positive" ng-click="login()" ng-hide="createMode">Log In</button>

The Register and Cancel buttons set the value of $scope.createMode to true or false to show or hide the correct buttons for either action:

    <button class="button button-block button-calm" ng-click="createMode = true" ng-hide="createMode">Register</button>
    <button class="button button-block button-calm" ng-show="createMode" ng-click="createAccount()">Create Account</button>
    <button class="button button-block button-assertive" ng-show="createMode" ng-click="createMode = false">Cancel</button>

$scope.err is displayed only when there is feedback to show to the user:

    <p ng-show="err" class="assertive text-center">{{err}}</p>
    </ion-view>

The login controller is dependent on Firebase's loginService module and Angular's core $location module:

    controller('LoginCtrl', ['$scope', 'loginService', '$location',
      function($scope, loginService, $location) {

Ionic's directives tend to create isolated scopes, so it was useful here to wrap our controller's variables within a $scope.data object to avoid issues within the isolated scope as follows:

        $scope.data = {
          "email"      : null,
          "pass"       : null,
          "confirm"    : null,
          "createMode" : false
        }

The login() function checks the credentials before authentication and sends feedback to the user if needed:

        $scope.login = function(cb) {
          $scope.err = null;
          if( !$scope.data.email ) {
            $scope.err = 'Please enter an email address';
          }
          else if( !$scope.data.pass ) {
            $scope.err = 'Please enter a password';
          }

If the credentials are sound, we send them to Firebase for authentication, and when we receive a success callback, we route the user to the chat view using $location.path() as follows:

          else {
            loginService.login($scope.data.email, $scope.data.pass, function(err, user) {
              $scope.err = err? err + '' : null;
              if( !err ) {
                cb && cb(user);
                $location.path('krakn/chat');
              }
            });
          }
        };

The createAccount() function works in much the same way as login(), except that it ensures that the users don't already exist before adding them to your Firebase and logging them in:

        $scope.createAccount = function() {
          $scope.err = null;
          if( assertValidLoginAttempt() ) {
            loginService.createAccount($scope.data.email, $scope.data.pass,
              function(err, user) {
                if( err ) {
                  $scope.err = err? err + '' : null;
                }
                else {
                  // must be logged in before I can write to my profile
                  $scope.login(function() {
                    loginService.createProfile(user.uid, user.email);
                    $location.path('krakn/account');
                  });
                }
              });
          }
        };

The assertValidLoginAttempt() function validates the submitted fields, sets $scope.err when one of them is missing or the passwords differ, and returns true only when no error was set, so that account creation proceeds with sound input:

        function assertValidLoginAttempt() {
          if( !$scope.data.email ) {
            $scope.err = 'Please enter an email address';
          }
          else if( !$scope.data.pass ) {
            $scope.err = 'Please enter a password';
          }
          else if( $scope.data.pass !== $scope.data.confirm ) {
            $scope.err = 'Passwords do not match';
          }
          return !$scope.err;
        }
      }])

The chat view

Keeping vegan practices aside, the meat and potatoes of krakn's functionality lives within the chat view/controller/route. The design is similar to most SMS clients, with the input in the footer of the view and messages listed chronologically in the main content area. The ng-repeat directive is used to display a message every time a message is added to the messages collection in Firebase. If you submit a message successfully, unsuccessfully, or without any text, feedback is provided via the placeholder attribute of the message input.

There are two filters being utilized within the chat view: orderByPriority and timeago. The orderByPriority filter is defined within the firebase module and relies on the Firebase object IDs to ensure that objects are always chronological. The timeago filter is an open source Angular module that I found. You can access it at JS Fiddle.

The ion-view directive is used once again to contain our chat view:

    <ion-view title="Chat">

Our list of messages is composed using the ion-list and ion-item directives, in addition to a couple of key attributes.
The ion-list directive gives us some nice interactive controls using the option-buttons and can-swipe attributes. This results in each list item being swipeable to the left, revealing our option-buttons as follows:

    <ion-list option-buttons="itemButtons" can-swipe="true" ng-show="messages">

Our workhorse in the chat view is the trusty ng-repeat directive, responsible for persisting our data from Firebase to our service to our controller and into our view and back again:

    <ion-item ng-repeat="message in messages | orderByPriority" item="item" can-swipe="true">

Then, we bind our data into vanilla HTML elements that have some custom styles applied to them:

    <h2 class="user">{{ message.user }}</h2>

The third-party timeago filter converts the time into something such as "5 min ago", similar to Instagram or Facebook:

    <small class="time">{{ message.receivedTime | timeago }}</small>
    <p class="message">{{ message.text }}</p>
    </ion-item>
    </ion-list>

A vanilla input element is used to accept chat messages from our users.
The input data is bound to $scope.data.newMessage for sending data to Firebase and $scope.feedback is used to keep our users informed:

    <input type="text" class="{{ feeling }}" placeholder="{{ feedback }}" ng-model="data.newMessage" />

When you click on the send/submit button, the addMessage() function sends the message to your Firebase, and adds it to the list of chat messages, in real time:

    <button type="submit" id="chat-send" class="button button-small button-clear" ng-click="addMessage()"><span class="ion-android-send"></span></button>
    </ion-view>

The ChatCtrl controller is dependent on a few more modules than our LoginCtrl, including syncData, $ionicScrollDelegate, $ionicLoading, and $rootScope:

    controller('ChatCtrl', ['$scope', 'syncData', '$ionicScrollDelegate', '$ionicLoading', '$rootScope',
      function($scope, syncData, $ionicScrollDelegate, $ionicLoading, $rootScope) {

The userName variable is derived from the authenticated user's e-mail address (saved within the application's $rootScope) by splitting the e-mail and using everything before the @ symbol:

        var userEmail = $rootScope.auth.user.email,
            userName  = userEmail.split('@');

We avoid the isolated scope issue in the same fashion as we did in LoginCtrl:

        $scope.data = {
          newMessage : null,
          user       : userName[0]
        }

Our view will only contain the latest 20 messages that have been synced from Firebase:

        $scope.messages = syncData('messages', 20);

When a new message is saved/synced, it is added to the bottom of the ng-repeated list, so we use the $ionicScrollDelegate service to automatically scroll the new message into view on the display as follows:

        $ionicScrollDelegate.scrollBottom(true);

Our default chat input placeholder text is something on your mind?:

        $scope.feedback = 'something on your mind?';
        // displays as class on chat input placeholder
        $scope.feeling = 'stable';

If we have a new message and a valid username (shortened), then we can call the $add() function, which syncs the new message to Firebase and our view, as follows:

        $scope.addMessage = function() {
          if(  $scope.data.newMessage
            && $scope.data.user ) {
            // new data elements cannot be synced without adding them to FB Security Rules
            $scope.messages.$add({
              text         : $scope.data.newMessage,
              user         : $scope.data.user,
              receivedTime : Number(new Date())
            });
            // clean up
            $scope.data.newMessage = null;

On a successful sync, the feedback updates to say Done! What's next?, as shown in the following code snippet:

            $scope.feedback = 'Done! What\'s next?';
            $scope.feeling = 'stable';
          }
          else {
            $scope.feedback = 'Please write a message before sending';
            $scope.feeling = 'assertive';
          }
        };

        $ionicScrollDelegate.scrollBottom(true);
      }])

The account view

The account view allows the logged in users to view their current name and e-mail address along with providing them with the ability to update their password and e-mail address.
The input fields interact with Firebase in the same way as the chat view does using the syncData method defined in the firebase module:

    <ion-view title="'Account'" left-buttons="leftButtons">

The $scope.user object contains our logged in user's account credentials, and we bind them into our view as follows:

    <p>{{ user.name }}</p>
    …
    <p>{{ user.email }}</p>

The basic account management functionality is provided within this view, so users can update their e-mail address and/or password if they choose to, using the following code snippet:

    <input type="password" ng-keypress="reset()" ng-model="oldpass"/>
    …
    <input type="password" ng-keypress="reset()" ng-model="newpass"/>
    …
    <input type="password" ng-keypress="reset()" ng-model="confirm"/>

Both the updatePassword() and updateEmail() functions work in much the same fashion as our createAccount() function within the LoginCtrl controller. They check that the new e-mail or password is not the same as the old one, and if all is well, they sync it to Firebase and back again:

    <button class="button button-block button-calm" ng-click="updatePassword()">update password</button>
    …
    <p class="error" ng-show="err">{{err}}</p>
    <p class="good" ng-show="msg">{{msg}}</p>
    …
    <input type="text" ng-keypress="reset()" ng-model="newemail"/>
    …
    <input type="password" ng-keypress="reset()" ng-model="pass"/>
    …
    <button class="button button-block button-calm" ng-click="updateEmail()">update email</button>
    …
    <p class="error" ng-show="emailerr">{{emailerr}}</p>
    <p class="good" ng-show="emailmsg">{{emailmsg}}</p>
    …
    </ion-view>

The menu view

Within krakn/app/scripts/app.js, the menu route is defined as the only abstract state. Because of its abstract state, it can be presented in the app along with the other views by the ion-side-menus directive provided by Ionic.
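For readers who have not met abstract states before, the following sketch shows roughly what such a definition looks like with the $stateProvider.state(name, config) call from ui-router, which Ionic builds on. It is not copied from krakn's app.js: the state names, URLs, and template paths are assumptions chosen to match the #/app/chat style links and the menuContent view name used by the menu markup.

```javascript
// Sketch of ui-router state definitions with one abstract parent state.
// State names, URLs, and template paths are assumptions for illustration.
function configureStates($stateProvider) {
  $stateProvider
    .state('app', {
      url: '/app',
      abstract: true,                       // cannot be navigated to directly
      templateUrl: 'templates/menu.html',   // hypothetical path
      controller: 'MenuCtrl'
    })
    .state('app.chat', {
      url: '/chat',                         // resolves to #/app/chat
      views: {
        // Rendered inside the parent's <ion-nav-view name="menuContent">.
        menuContent: {
          templateUrl: 'templates/chat.html', // hypothetical path
          controller: 'ChatCtrl'
        }
      }
    });
}

// In an Angular app this would be registered along the lines of:
//   angular.module('krakn').config(['$stateProvider', configureStates]);
// where the module name 'krakn' is also an assumption.
```

Because the parent state is abstract, it is only ever active together with one of its children, which is what allows the side menu to wrap every other view.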
You might have noticed that only two menu options are available before signing into the application and that the rest appear only after authenticating. This is achieved using the ng-show-auth directive on the chat, account, and log out menu items. The majority of the options for Ionic's directives are available through attributes, making them simple to use. For example, take a look at the animation="slide-left-right" attribute. You will find Ionic's use of custom attributes within the directives to be one of the ways that the Ionic Framework is setting itself apart from other options within this space.

The ion-side-menus directive wraps our menu list in much the same way as the ion-view directive we covered previously wraps a view:

    <ion-side-menus>
     <ion-pane ion-side-menu-content>
      <ion-nav-bar class="bar-positive">

Our back button is displayed by including the ion-nav-back-button directive within the ion-nav-bar directive:

       <ion-nav-back-button class="button-clear"><i class="icon ion-chevron-left"></i> Back</ion-nav-back-button>
      </ion-nav-bar>

Animations within Ionic are exposed and used through the animation attribute, which is built atop the ngAnimate module. In this case, we are doing a simple animation that replicates the experience of a native mobile app:

      <ion-nav-view name="menuContent" animation="slide-left-right"></ion-nav-view>
     </ion-pane>

     <ion-side-menu side="left">
      <header class="bar bar-header bar-positive">
       <h1 class="title">Menu</h1>
      </header>
      <ion-content class="has-header">

A simple ion-list directive/element is used to display our navigation items in a vertical list. The ng-show-auth attribute handles the display of menu items before and after a user has authenticated. Before a user logs in, they can access the navigation, but only the About and Log In views are available until after successful authentication.
       <ion-list>
        <ion-item nav-clear menu-close href="#/app/chat" ng-show-auth="'login'">
         Chat
        </ion-item>
        <ion-item nav-clear menu-close href="#/app/about">
         About
        </ion-item>
        <ion-item nav-clear menu-close href="#/app/login" ng-show-auth="['logout','error']">
         Log In
        </ion-item>

The Log Out navigation item is only displayed once logged in, and upon a click, it calls the logout() function in addition to navigating to the login view:

        <ion-item nav-clear menu-close href="#/app/login" ng-click="logout()" ng-show-auth="'login'">
         Log Out
        </ion-item>
       </ion-list>
      </ion-content>
     </ion-side-menu>
    </ion-side-menus>

The MenuCtrl controller is the simplest controller in this application, as all it contains is the toggleMenu() and logout() functions:

    controller("MenuCtrl", ['$scope', 'loginService', '$location', '$ionicScrollDelegate',
      function($scope, loginService, $location, $ionicScrollDelegate) {
        $scope.toggleMenu = function() {
          $scope.sideMenuController.toggleLeft();
        };

        $scope.logout = function() {
          loginService.logout();
          $scope.toggleMenu();
        };
      }])

The about view

The about view is 100 percent static, and its only real purpose is to present the credits for all the open source projects used in the application.

Global controller constants

All of krakn's controllers share only two dependencies: ionic and ngAnimate. Because Firebase's modules are defined within /app/scripts/app.js, they are available for consumption by all the controllers without the need to define them as dependencies. Therefore, the firebase service's syncData and loginService are available to ChatCtrl and LoginCtrl for use. The syncData service is how krakn utilizes the three-way data binding provided by Firebase. For example, within the ChatCtrl controller, we use syncData( 'messages', 20 ) to bind the latest twenty messages within the messages collection to $scope for consumption by the chat view.
Conversely, when a user clicks the submit button (bound with ng-click), we write the data to the messages collection by calling $add() on the synced messages object inside the $scope.addMessage() function:

$scope.addMessage = function() {
  if(...) {
    $scope.messages.$add({ ... });
  }
};

Models and services

The model for krakn is www.krakn.firebaseio.com. The services that consume krakn's Firebase API are as follows:

The firebase service in krakn/app/scripts/service.firebase.js
The login service in krakn/app/scripts/service.login.js
The changeEmail service in krakn/app/scripts/changeEmail.firebase.js

The firebase service defines the syncData service, which is responsible for routing data bidirectionally between krakn/app/bower_components/angularfire.js and our controllers. I have not mentioned angularfire.js until this point because it is essentially an abstract data translation layer between firebaseio.com and Angular applications that intend to consume data as a service.

Predeployment

Once the majority of an application's development phase has been completed, at least for the initial launch, it is important to run all of the code through a build process that optimizes the file size through compression of images and minification of text files. This piece of the workflow was not overlooked by Yeoman and is available through the grunt build command. As mentioned in the section on Grunt, the /Gruntfile.js file defines where built code is placed once it is optimized for deployment. Yeoman's default location for built code is the /dist folder, which might or might not exist depending on whether you have run the grunt build command before.

Summary

In this article, we discussed the tool stack and workflow used to build the app. Together, Git and Yeoman formed a solid foundation for building krakn. Git and GitHub provided us with distributed version control and a platform for sharing the application's source code with you and the world.
Yeoman facilitated the remainder of the workflow: scaffolding with Yo, automation with Grunt, and package management with Bower. With our app fully scaffolded, we were able to build our interface with the directives provided by the Ionic Framework, and wire up the real-time data synchronization forged by our Firebase instance. With a few key tools, we were able to minimize our development time while maximizing our return.
Packt
04 Mar 2015
19 min read

Native MS Security Tools and Configuration

This article, written by Santhosh Sivarajan, the author of Getting Started with Windows Server Security, introduces another powerful Microsoft tool called Microsoft Security Compliance Manager (SCM). As its name suggests, it is a platform for managing and maintaining your security and compliance policies. At this point, we have established baseline security based on your business requirements, using Microsoft SCW. These policies can be a pure reflection of your business requirements. However, in an enterprise world, you have to consider compliance, regulations, other industry standards, and best practices to maximize the effectiveness of the security policy. That's where Microsoft SCM can provide more business value. We will talk more about the included SCM baselines later in the article. The goal of the article is to walk you through the configuration and administration process of Microsoft SCM and explain how it can be used in an enterprise environment to support your security needs. Then we will talk about a method to maintain the desired state of the server using a Microsoft tool called Attack Surface Analyzer (ASA). At the end of the article, you will see an option to add more security restrictions using another Microsoft tool called AppLocker.

Microsoft SCM

Microsoft SCM is a centralized security and compliance policy manager product from Microsoft. It is a standalone application. Microsoft develops these baselines and best practice recommendations based on customer feedback and other agencies' recommendations. These policies are regularly reviewed and updated, so it is important that you use the latest policy baseline. If there is a new policy, you will be able to download and update the baseline from the Microsoft SCM console itself.
Since Microsoft SCM supports multiple input and output formats such as XML, Group Policy Objects (GPO), Desired Configuration Management (DCM), Security Content Automation Protocol (SCAP), and so on, it can be a centralized platform for your network infrastructure and other security and compliance products. It is also possible to integrate SCM with Microsoft System Center 2012 Process Pack for IT GRC. More details can be found at http://technet.microsoft.com/en-us/library/dd206732.aspx.

Installing Microsoft SCM

We will start with the installation process. As mentioned earlier, it is a standalone product. It uses Microsoft SQL Server 2008 or higher as the database. If you don't have a SQL database already installed on your system, the SCM installation process will automatically install Microsoft SQL Server 2008 Express Edition.

You can perform the following steps to install Microsoft SCM: Download Microsoft Security Compliance Manager from http://www.microsoft.com/en-us/download/details.aspx?id=16776. Double-click on Security_Compliance_Manager_Setup.exe to start the installation process. Click on Next on the welcome window. Make sure to select the Always check for SCM and baseline updates option. Accept the License Agreement option and click on Next. Select the installation folder from the Installation Folder window by clicking on the Browse button. Click on Next. On the Microsoft SQL Server 2008 Express window, click on Next to install Microsoft SQL Server 2008 Express Edition. If you have Microsoft SQL Server already installed on your system, you can select the correct server details from this window. Accept the License Agreement option for SQL Server 2008 Express and click on Next. Click on Install on the Ready to Install window to begin the installation. You will see the progress in the Installing the Microsoft Security Compliance Manager window. If it asks you to restart the computer, click on OK. Click on Finish to complete the installation.
This section provides a high-level overview of the product before starting the administration and management process. The left pane of the SCM console provides the list of all available baselines. This is the baseline library inside SCM. The center pane displays more information based on your policy selection from the baseline library. The right pane, also called the Actions pane, provides commands and options to manage your policies. As you can see in the following screenshot, it provides a few options to export these policies into different formats. So, if you have a different compliance manager tool, you can use these files with your existing tool.

Figure: SCM - Export options

Like other products, Microsoft SCM supports different severity levels: critical, optional, important, and none. As you can see in the following screenshot, on a custom policy, the severity levels can be changed to None, Important, Optional, or Critical based on your requirements.

For each of these settings, you will see additional details and reference articles (CCE, OVAL, and so on) in the Setting Details section.

Administering Microsoft SCM

This section provides you with an overview of Microsoft SCM and some administration procedures to create and manage policies. These tasks can be achieved by performing the following steps: Open Security Compliance Manager. If you see a Download Updates popup window, click on the Download button to start the download and complete the database update process. Security Compliance Manager consists mainly of two sections: Custom Baselines and Microsoft Baselines. We will go through the details later in this article.

Figure: SCM - Baselines

Expand Microsoft Baselines. Since we are focusing more on Windows Server 2012, I will start with this section. Select the Windows Server 2012 node. This node contains predefined security policies based on Microsoft and industry best practices. I will use the predefined WS2012 Web Server Security template for this exercise.
You will not be able to make changes to the settings in the default template. If you need to make changes, you can make a copy of the template and make changes there. Select the WS2012 Web Server Security template. From the right pane, select the Duplicate option. In the Duplicate window, enter the name for this new security policy. Click on Save. The new template will be saved under the Custom Baselines node. You can review the policy and make necessary changes in the newly created policy.

Creating and implementing security policies

At this point, you have installed SCM and are familiar with the basic administration tasks. From this section onwards, you will be working on a real-world scenario where you will be exporting a policy from Active Directory, importing it into SCM, merging it with an SCM baseline, and importing it back into Active Directory. In this section, our goal is to export an existing web server policy, merge it with an SCM baseline, and import it back into Active Directory.

Exporting GPO from Active Directory

We will start by exporting the existing web server policy from Active Directory. The following steps can be performed to export (back up) an Active Directory GPO-based policy: Open the Group Policy Management Console. Expand Forest | Domain | Domain Name | Group Policy Objects. Right-click on the appropriate GPO and select Back Up.

Figure: GPO - Back up

In the Back Up Group Policy Object window, enter the Location and Description details for the backup file. Click on the Back Up button to start the backup operation. You will see the progress in the Backup window. Click on OK when it completes the backup operation.

A GPO can also be backed up using the Backup-GPO PowerShell cmdlet. The following is an example:

    Backup-GPO -Name "WebServerbaselineV2.0" -Path D:\Backup -Comment "Baseline Backup"

The backup folder is named with a GUID.

Importing GPO into SCM

An exported GPO-based policy can be imported directly into SCM.
An administrator can perform the following steps to complete this task: Open Microsoft Security Compliance Manager. From the Import section on the right pane, select the GPO Backup (Folder) option.

Figure: SCM - Import

In the Browse For Folder window, select the GPO backup folder. Click on OK. In the GPO Name window, confirm or change the baseline name. Click on OK. In the SCM Log window, you will see the status. Click on OK to close the window. You will see the imported policy under Custom Baselines | GPO Import | Policy Name.

Currently, SCM supports importing from GPO backup and SCM CAB files. If you have some other policy or baseline (for example, DISA STIGs) that you would like to import into SCM, you need to import these policies into Active Directory first, and then export them as a GPO backup before you can import them into SCM.

Merging imported GPO with the SCM baseline policy

The third step in this process is to merge the imported policy with the SCM baseline policy. Keep in mind that some configurations and settings will be lost when you merge an existing GPO with the SCM baseline policy. For example, service-related or ACL configurations may not be preserved when you associate and merge with an SCM baseline policy. If you have these types of configuration in your GPO and want to retain them, you may need to split the GPO and use two separate GPOs. During the import, SCM tries to map each configuration to a setting in the SCM library in order to preserve it. Settings that cannot be matched or mapped are dropped from the new baseline policy. For this exercise, my assumption is that you don't have any custom configuration or settings in the imported policy.

The following steps can be used to associate and merge a GPO-based policy into an SCM-based policy: Select the imported policy in Microsoft Security Compliance Manager.
From the right pane, select the Associate option from the Baseline section.

Figure: Selecting the Associate option

From the Associate Product with GPO window, select the appropriate baseline policy. Since we are working with a Windows Server 2012 policy, I will be selecting Windows Server 2012 as the product. If you have a different operating system, select the correct policy from the product list. Click on Associate. Your custom policy must have unique settings in the baseline policy in order to associate a custom policy with the SCM baseline policy; otherwise, the Associate button will be grayed out. Enter a name for this policy in the Baseline Policy window. You will see this policy in the Custom Baselines | Windows Server 2012 section. Select this policy. From the right pane, select the Compare/Merge option from the Baseline section.

Figure: Selecting the Compare / Merge option

Now you have associated your policy with an SCM baseline policy. The next step is to compare and merge your policy with a baseline SCM policy. From the Compare Baseline window, select the appropriate baseline policy. Since we are working with a web server baseline, we will be selecting WS2012 Web Server Security 1.0 as the policy. Click on OK. You will see the result in the Compare Baselines window. You can review the settings that differ and the settings that match here. Since we are planning to merge these two policies, we will be selecting the Merge Baselines option. You will see the summary report in the Merge Baselines window. Click on OK. In the Specify a name for the merged baseline window, enter a new name for this policy. Click on OK. This merged policy will be stored in the Custom Baselines | Windows Server 2012 section.

Exporting the SCM baseline policy

At this point, you have created a new policy that contains your custom policy and best practices provided by SCM. The next step is to export this policy to a supported format.
Since we are dealing with Active Directory and GPO, we will be exporting it into a GPO-based policy. You can perform the following steps to export an SCM policy to a GPO-based backup policy: Select the policy from Microsoft Security Compliance Manager. From the Export section, select the GPO Backup (Folder) option.

Figure: GPO Backup (Folder)

From the Browse for Folder window, select the folder to store this policy in. Click on OK.

Importing a policy into Active Directory

The final step in this process is to import these settings back to Active Directory. This can be achieved by using Group Policy Management Console (GPMC). The following steps can be used to import an SCM-based policy into Active Directory: Open Group Policy Management Console. Expand Forest | Domain | Domain Name | Group Policy Objects. Right-click on the appropriate policy. Select the Import Settings option.

Figure: The Import Settings option

Click on Next in the Welcome window. It is always a best practice to back up the existing settings. Click on Backup to continue with the backup operation. Once you have completed the backup, click on Next in the Backup GPO window. In the Backup Location window, select the backup location folder. Click on Next. Confirm the GPO name in the Source GPO window. Click on Next. You will see the scanning settings in the Scanning Backup window. Click on Next to continue. Click on Finish in the Completing the Import Settings Wizard window to complete the import operation. Click on OK in the Import window.

Maintaining and monitoring the integrity of a baseline policy

Once you have baseline security in place, whether it is a true business policy or a combination of business and industry practices, you will need to maintain this state to ensure security and integrity. The whole idea is to compare your baseline image with the current image in order to validate the settings. There are many ways to achieve this.
Microsoft has a free tool called Attack Surface Analyzer (ASA) that can be used to compare two states of the system. The details and capabilities of this tool can be found at http://www.microsoft.com/en-us/download/details.aspx?id=24487.

Microsoft ASA

An administrator can perform the following steps to install, configure, and generate an Attack Surface Report using Microsoft ASA: Download Attack Surface Analyzer from http://www.microsoft.com/en-us/download/details.aspx?id=24487. Complete the installation. It is a standalone, simple MSI installation process. Open the Attack Surface Analyzer tool.

The first step is to create the baseline state. Select the Run New Scan option and enter a name for the CAB file. Click on Run Scan to start the scanning process. You will see the status and progress in the Collecting Data window. When it completes, it will create a CAB file with the result.

The second step in this process is to analyze the baseline state against the existing server so as to identify the differences. You will need to create another report (Product CAB) to compare with the baseline CAB. Select the Run New Scan option again and enter a name for the product CAB file. Click on Run Scan to start the scanning process. Complete the CAB creation process.

The third step in the process is to compare the baseline CAB with the product CAB to get the delta. Select the Generate Standard Attack Surface Report option. In the Select Options section, select the baseline CAB name, select the product CAB name, and enter a name for the attack report. Click on Generate to start the process. You will see the status in the Running Analysis window. The report will be opened automatically in the web browser. This report has three sections: Report Summary, Security Issues, and Attack Surface.

Figure: An example of a Security Issues report

Application control and management

At this point, you have a baseline policy for your server platform.
Now we can add more restrictions based on your requirements to provide a more secure environment. In the following section, my plan is to introduce an option to "blacklist" and "whitelist" some of the applications using a built-in native option called AppLocker. The details of the AppLocker application can be found at http://technet.microsoft.com/en-us/library/hh831409.aspx.

AppLocker

AppLocker policies are part of Application Control Policies in GPOs. There are four types of built-in rules: Executable, Windows Installer, Script, and Packaged app rules. Before you create or enforce a policy, you need to perform an inventory check to identify the current usage of these applications in your environment. AppLocker has an inventory process called Auditing that helps you to achieve this. In this scenario, our goal is to block unauthorized access to the NLTEST application on all servers.

Creating a policy

As the first step, you need to identify the current usage of the application in your environment. The following steps can be performed to create a new AppLocker policy in an Active Directory environment: Open Group Policy Management Console. Expand Forest | Domain | Domain Name. Right-click on the Group Policy Object node and select New. Enter a name for the GPO in the New GPO window. Leave Source Starter GPO as (none). Click on OK. This will create a new blank GPO in the Group Policy Object node. We will be using this GPO to configure the AppLocker settings. Right-click on the newly created GPO and select Edit. This will open the Group Policy Management Editor window. Expand Policies | Windows Settings | Security Settings | AppLocker. Right-click on Executable Rules and select Create Default Rules. These default rules allow users and built-in administrators to run default programs and administrators to run files and applications. Based on your requirements, you can modify and delete these rules.
The default AppLocker rule allows everyone to run files located only in the Windows folder, and the administrator can run all files.

Figure: The default AppLocker rule

Expand Policies | Windows Settings | Security Settings | AppLocker. Right-click on Executable Rules and select Create New Rule. Click on Next in the Create Executable Rules window. In the Permission window, select Deny. In the User or Group section, click on Select and select the Server Admins group. Here, I have created a security group with all server administrators in that group. In the Conditions window, select the File Hash option. Click on Next. In the File Hash window, select the correct file name using the Browse File option. In this scenario, I will be selecting the NLTEST.exe file. Click on Next. In the Name and Description window, select or enter an appropriate name for this rule. Click on Create.

Auditing a policy

The next step in this process is to audit the previously created policies to ensure that there will not be any adverse effects on your environment. An administrator can perform the following steps to audit an existing policy in an Active Directory environment: Right-click on AppLocker (Policies | Windows Settings | Security Settings) and go to Properties. On the Enforcement tab, select the appropriate rule types as Configured. From the drop-down list, select Audit only. Click on OK.

Figure: GPO - AppLocker policy

You can see the application usage and history in the Event log. Open Event Viewer. Navigate to Applications and Services Logs | Microsoft | Windows | AppLocker. Based on your policy configuration, you will see the appropriate event information in the AppLocker section. In an enterprise world, manually checking the items in an event log is not going to be a viable option. You have a few options available to automate this process.
You can forward the event log to a central server (Event Forwarding) and verify from that single console, or you can use the Get-WinEvent PowerShell cmdlet to collect these events remotely. The following section provides an option to evaluate these logs using the Get-WinEvent PowerShell cmdlet. By default, AppLocker events are located in the Applications and Services Logs | Microsoft | Windows | AppLocker section of the Event Viewer. The following command collects all AppLocker-related events from Server01 and writes them to the output file Server01.txt:

    Get-WinEvent -ComputerName "SERVER01.MYINFRALAB.COM" -LogName *AppLocker* | fl | Out-File Server01.txt

Here are some of the events that you will see in the event log:

If you have multiple computers to evaluate, you can create a simple PowerShell script to automatically input the computer names. The following is a sample PowerShell script. The Servers.txt file will be your input file that contains all of the server names:

    $OutPut = "C:\Input\Output.txt"
    Get-Content "C:\Input\Servers.txt" | ForEach-Object {
        # Write the server name as a header, then append its AppLocker events
        $_ | Out-File $OutPut -Append -Encoding ascii
        Get-WinEvent -ComputerName $_ -LogName *AppLocker* | fl | Out-File $OutPut -Append -Encoding ascii
    }

Implementing the policy

Once you have verified the audit result, you can enforce the policy using the AppLocker GPO. The following steps can be used to implement the AppLocker GPO in an Active Directory environment: Open Group Policy Management Console. Expand the Forest | Domain | Domain Name | Group Policy Object node. Right-click on the Server Application Restriction GPO and select Edit. This will open a Group Policy Management Editor MMC window.

Figure: Opening the Group Policy Management Editor MMC window

From Group Policy Management Editor, expand Policies | Windows Settings | Security Settings. Right-click on AppLocker and select Properties. In the AppLocker Properties window, change Executable rules to Enforce rules.
Click on OK. Close the Group Policy Management Editor MMC window. The new policy will apply to the server based on your Active Directory replication interval and GPO refresh cycle. You can use the gpupdate /force command to apply the GPO immediately on a local server. Two different results are shown in the following screenshots. As you can see in the following screenshot, the user Johndoe was denied the execution of the NLTEST.exe application:

Since the following user was part of the Server Admins group, the user was allowed to execute the NLTEST.exe application:

Some additional security recommendations to consider when installing and configuring AppLocker are included at http://technet.microsoft.com/en-us/library/ee844118(WS.10).aspx.

AppLocker and PowerShell

AppLocker supports PowerShell, and it has a PowerShell module called AppLocker. An administrator can create, test, and troubleshoot the AppLocker policies using these cmdlets. You need to import the AppLocker module before these cmdlets can be used. The following are the supported cmdlets in the module: Get-AppLockerFileInformation, Get-AppLockerPolicy, New-AppLockerPolicy, Set-AppLockerPolicy, and Test-AppLockerPolicy.

Summary

We started this article with baseline security for your server platform, which was originally created using Microsoft SCW. In this article, you learned how to incorporate this policy with the baseline and best practice recommendations using Microsoft SCM. Then you used AppLocker to enforce more application-based security. We also learned how to monitor the state of the server and compare it with the baseline to identify security vulnerabilities and issues using Microsoft ASA.
Packt
04 Mar 2015
21 min read

Working with VMware Infrastructure

In this article by Daniel Langenhan, the author of VMware vRealize Orchestrator Cookbook, we will take a closer look at how Orchestrator interacts with vCenter Server and vRealize Automation (vRA, formerly known as vCloud Automation Center, vCAC). vRA uses Orchestrator to access and automate infrastructure using Orchestrator plugins. We will take a look at how to make Orchestrator workflows available to vRA. We will investigate the following recipes:

Unmounting all the CD-ROMs of all VMs in a cluster
Provisioning a VM from a template
An approval process for VM provisioning

There are quite a lot of plugins for Orchestrator to interact with VMware infrastructure and programs:

vCenter Server
vCloud Director (vCD)
vRealize Automation (vRA, formerly known as vCloud Automation Center, vCAC)
Site Recovery Manager (SRM)
VMware Auto Deploy
Horizon (View and Virtual Desktops)
vRealize Configuration Manager (earlier known as vCenter Configuration Manager)
vCenter Update Manager
vCenter Operation Manager, vCOPS (only example packages)

VMware, as of the writing of this article, is still renaming its products. An overview of all plugins and their names and download links can be found at http://www.vcoteam.info/links/plug-ins.html. We will not be able to cover all of these plugins, so we will focus on the one that is most used, vCenter. Sadly, vCloud Director is earmarked by VMware to disappear for everyone but service providers, so there is no real need to show any workflow for it. We will also work with vRA and see how it interacts with Orchestrator.

vSphere automation

The interaction between Orchestrator and vCenter is done using the vCenter API. Here is the explanation of the interaction, which you can refer to in the following figure.
A user starts an Orchestrator workflow (1) either interactively, via the vSphere Web Client, the Orchestrator Web Operator, or the Orchestrator Client, or via the API. The workflow in Orchestrator will then send a job (2) to vCenter and receive a task ID back (type VC:Task). vCenter will then start enacting the job (3). Using the vim3WaitTaskEnd action (4), Orchestrator pauses until the task has been completed. If we do not use the wait task, we can't be certain whether the task has ended or failed. It is extremely important to use the vim3WaitTaskEnd action whenever we send a job to vCenter. When the wait task reports that the job has finished, the workflow will be marked as finished.

The vCenter MoRef

The MoRef (Managed Object Reference) is a unique ID for every object inside vCenter. MoRefs are basically strings; some examples are shown here:

Object | Example MoRef
VM | vm-301
Network | network-312, dvportgroup-242
Datastore | datastore-101
ESXi host | host-44
Data center | datacenter-21
Cluster | domain-c41

The MoRefs are typically stored in the attribute .id or .key of the Orchestrator API object. For example, the MoRef of a vSwitch Network is VC:Network.id. To browse for MoRefs, you can use the Managed Object Browser (MOB), documented at https://pubs.vmware.com/vsphere-55/index.jsp#com.vmware.wssdk.pg.doc/PG_Appx_Using_MOB.20.1.html.

The vim3WaitTaskEnd action

As already said, vim3WaitTaskEnd is one of the most central actions while interacting with vCenter. The action has the following variables:

Category | Name | Type | Usage
IN | vcTask | VC:Task | Carries the reconfiguration task from the script to the wait task
IN | progress | Boolean | Writes the progress of a task, in percent, to the logs
IN | pollRate | Number | How often the action should check vCenter for task completion
OUT | ActionResult | Any | Returns the task's result

The wait task checks, at regular intervals (pollRate), the status of a task that has been submitted to vCenter.
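The polling behaviour described here can be sketched in plain JavaScript. This is only an illustration, not the source of the vim3WaitTaskEnd action: waitTaskEnd, makeFakeTask, and the injected sleep and log callbacks are hypothetical names, the fake task merely stands in for a VC:Task, and the state strings (queued, running, success, error) are assumed to follow the vSphere task states.

```javascript
// Hypothetical stand-in for a VC:Task: every call to refresh() moves the
// task one step further through the given list of states.
function makeFakeTask(states, result) {
  let i = 0;
  return {
    info: { state: states[0], progress: 0, result: null, error: null },
    refresh() {
      if (i < states.length - 1) i += 1;
      this.info.state = states[i];
      this.info.progress = Math.round((i / (states.length - 1)) * 100);
      if (this.info.state === "success") this.info.result = result;
      if (this.info.state === "error") this.info.error = "task failed";
    },
  };
}

// Sketch of a wait loop: poll the task every pollRate seconds until it
// reaches a terminal state. sleep and log are injected (assumption) so that
// the sketch can run outside Orchestrator.
function waitTaskEnd(task, progress, pollRate, sleep, log) {
  while (task.info.state === "queued" || task.info.state === "running") {
    if (progress && task.info.state === "running") {
      log("Task progress: " + task.info.progress + " %");
    }
    sleep(pollRate);
    task.refresh();
  }
  if (task.info.state === "error") {
    // A failed task surfaces as an exception in the calling workflow
    throw new Error("Task ended with error: " + task.info.error);
  }
  return task.info.result; // plays the role of the ActionResult output
}
```

In a real workflow, the task would come from a vCenter plugin call and the action itself would be used rather than a hand-written loop; the sketch only makes the roles of vcTask, progress, and pollRate concrete.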
The task can have the following states:

State | Meaning
Queued | The task is queued and will be executed as soon as possible.
Running | The task is currently running. If progress is set to true, the progress in percent will be displayed in the logs.
Success | The task has finished successfully.
Error | The task has failed and an error will be thrown.

Other vCenter wait actions

There are actually five waiting tasks that come with the vCenter Server plugin. Here's an overview of the other four:

Task | Description
vim3WaitToolsStarted | This task waits until the VMware tools are started on a VM or until a timeout is reached.
vim3WaitForPrincipalIP | This task waits until the VMware tools report the primary IP of a VM or until a timeout is reached. This typically indicates that the operating system is ready to receive network traffic. The action will return the primary IP.
vim3WaitDnsNameInTools | This task waits until the VMware tools report a given DNS name of a VM or until a timeout is reached. The in-parameter addNumberToName is not used and can be set to null.
WaitTaskEndOrVMQuestion | This task waits until a task is finished or until a VM raises a question. A vCenter question is related to user interaction.

vRealize Automation (vRA)

Automation has changed since the beginning of Orchestrator. Before tools such as vCloud Director or vCloud Automation Center (vCAC)/vRealize Automation (vRA) existed, Orchestrator was the main tool for automating vCenter resources. With version 6.2 of vCloud Automation Center (vCAC), the product has been renamed vRealize Automation. vRA is now set to become the central cornerstone of the VMware automation effort. vRealize Orchestrator (vRO) is used by vRA to interact with and automate VMware and non-VMware products and infrastructure elements. Throughout the various vCAC/vRA iterations, the role of Orchestrator has changed substantially. Orchestrator started off as an extension to vCAC and became a central part of vRA.
In vCAC 5.x, Orchestrator was only an extension of the IaaS life cycle. Orchestrator was tied in using the stubs.
vCAC 6.0 integrated Orchestrator as an XaaS service (Everything as a Service) using the Advanced Service Designer (ASD).
In vCAC 6.1, Orchestrator is used to perform all VMware NSX operations (VMware's new network virtualization and automation), meaning that it became even more of a central part of the IaaS services.
With vCAC 6.2, the Advanced Service Designer (ASD) was enhanced to allow more complex forms, allowing better leverage of Orchestrator workflows.

As you can see in the following figure, vRA connects to the vCenter Server using an infrastructure endpoint that allows vRA to conduct basic infrastructure actions, such as power operations, cloning, and so on. It doesn't allow any complex interactions with the vSphere infrastructure, such as HA configurations. Using the Advanced Service Endpoints, vRA integrates the Orchestrator (vRO) plugins as additional services. This allows vRA to offer the entire plugin infrastructure as services. The vCenter Server, AD, and PowerShell plugins are typical integrations that are used with vRA.

Using Advanced Service Designer (ASD), you can create integrations that use Orchestrator workflows. ASD allows you to offer Orchestrator workflows as vRA catalog items, making it possible for tenants to access any IT service that can be configured with Orchestrator via its plugins. The following diagram shows an example using the Active Directory plugin. The Orchestrator plugin provides access to the AD services. By creating a custom resource using the exposed AD infrastructure, we can create a service blueprint and resource actions, both of which are based on Orchestrator workflows that use the AD plugin.

The other method of integrating Orchestrator into the IaaS life cycle, which was predominantly used in vCAC 5.x, was to use the stubs.
The build process of a VM has several steps; each step can be assigned a customizable workflow (called a stub). You can configure vRA to run an Orchestrator workflow at these stubs in order to facilitate a few customized actions. Such actions could be taken to change the VM's HA or DRS configuration, or to use the guest integration to install or configure a program on a VM.

Installation

How to install and configure vRA is out of the scope of this article, but take a look at http://www.kendrickcoleman.com/index.php/Tech-Blog/how-to-install-vcloud-automation-center-vcac-60-part-1-identity-appliance.html for more information. If you don't have the hardware or the time to install vRA yourself, you can use the VMware Hands-on Labs, which can be accessed after clicking on Try for Free at http://hol.vmware.com.

The vRA Orchestrator plugin

Due to the renaming, the vRA plugin is called vRealize Orchestrator vRA Plug-in 6.2.0; however, the file you download and use is named o11nplugin-vcac-6.2.0-2287231.vmoapp. The plugin currently creates a workflow folder called vCloud Automation Center.

vRA-integrated Orchestrator

The vRA appliance comes with an installed and configured vRO instance; however, the best practice for a production environment is to use a dedicated Orchestrator installation, or even better, an Orchestrator cluster.

Dynamic Types or XaaS

XaaS means Everything (X) as a Service. The introduction of Dynamic Types in Orchestrator Version 5.5.1 does exactly that; it allows you to build your own plugins and interact with infrastructure that has not yet received its own plugin. Take a look at this article by Christophe Decanini; it integrates Twitter with Orchestrator using Dynamic Types at http://www.vcoteam.info/articles/learn-vco/282-dynamic-types-tutorial-implement-your-own-twitter-plug-in-without-any-scripting.html.

Read more…

To read more about Orchestrator integration with vRA, please take a look at the official VMware documentation.
Please note that the official documentation you need to look at is about vRealize Automation, and not about vCloud Automation Center. As of writing this article, the documentation can be found at https://www.vmware.com/support/pubs/vrealize-automation-pubs.html.
The document called Advanced Service Design deals with vRO and the Advanced Service Designer.
The document called Machine Extensibility discusses customization using stubs.

Unmounting all the CD-ROMs of all VMs in a cluster

This is an easy recipe to start with, but one that you can really put to work in your existing infrastructure. The workflow will unmount all CD-ROMs from the running VMs in a cluster. A mounted CD-ROM may block a VM from being vMotioned.

Getting ready

We need a VM that can mount a CD-ROM either as an ISO from a host or from the client. Before you start the workflow, make sure that the VM is powered on and has an ISO connected to it.

How to do it...

Create a new workflow with the following variables:

Name | Type | Section | Use
cluster | VC:ClusterComputeResource | IN | Used to input the cluster
clusterVMs | Array of VC:VirtualMachine | Attribute | Used to capture all VMs in a cluster

Add the getAllVMsOfCluster action to the schema and assign the cluster in-parameter and the clusterVMs attribute to it as actionResult.
Now, add a Foreach element to the schema and assign the workflow Disconnect all detachable devices from a running virtual machine.
Assign the attribute clusterVMs to the Foreach element as a parameter.
Save and run the workflow.

How it works...

This recipe shows how quickly and easily you can design solutions that help you with everyday vCenter problems. The problem is that VMs that have CD-ROMs or floppies mounted may experience problems using vMotion, making it impossible for them to be used with DRS. The reality is that a lot of admins mount CD-ROMs and then forget to disconnect them.
Scheduling this workflow every evening just before the nighttime backups will make sure that a production cluster is able to make full use of DRS and is therefore better load-balanced. You can improve this workflow by integrating an exclusion list.

See also

Refer to the example workflow, 7.01 UnMount CD-ROM from Cluster.

Provisioning a VM from a template

In this recipe, we will build a deployment workflow for Windows and Linux VMs. We will learn how to create workflows and reduce the number of input variables.

Getting ready

We need a Linux or Windows template that we can clone and provision.

How to do it…

We have split this recipe into two sections. In the first section, we will create a configuration element, and in the second, we will create the workflow.

Creating a configuration

We will use a configuration for all reusable variables. Build a configuration element that contains the following items:

Name | Type | Use
productId | String | This is the Windows product ID (the licensing code)
joinDomain | String | This is the Windows domain FQDN to join
domainAdmin | Credential | These are the credentials to join the domain
licenseMode | VC:CustomizationLicenseDataMode | For example, perServer
licenseUsers | Number | This denotes the number of licensed concurrent users
inTimezone | Enums:MSTimeZone | Time zone
fullName | String | Full name of the user
orgName | String | Organization name
newAdminPassword | String | New admin password
dnsServerList | Array of String | List of DNS servers
dnsDomain | String | DNS domain
gateway | Array of String | List of gateways

Creating the base workflow

Now we will create the base workflow:

Create the workflow as shown in the following figure by adding the given elements:
     Clone, Windows with single NIC and credential
     Clone, Linux with single NIC
     Custom decision
Use the Clone, Windows… workflow to create all variables. Link up the ones that you have defined in the configuration as attributes.
The rest are defined as follows:

Name | Type | Section | Use
vmName | String | IN | This is the new virtual machine's name
vm | VC:VirtualMachine | IN | Virtual machine to clone
folder | VC:VmFolder | IN | This is the virtual machine folder
datastore | VC:Datastore | IN | This is the datastore in which you store the virtual machine
pool | VC:ResourcePool | IN | This is the resource pool in which you create the virtual machine
network | VC:Network | IN | This is the network to which you attach the virtual network interface
ipAddress | String | IN | This is the fixed valid IP address
subnetMask | String | IN | This is the subnet mask
template | Boolean | Attribute | Set to No; controls whether the new VM is marked as a template
powerOn | Boolean | Attribute | Set to Yes; powers on the VM after creation
doSysprep | Boolean | Attribute | Set to Yes; runs Windows Sysprep
dhcp | Boolean | Attribute | Set to No; controls whether DHCP is used
newVM | VC:VirtualMachine | OUT | This is the newly-created VM

The following sub-workflow in-parameters will be set to special values:

Workflow | In-parameter | Value
Clone, Windows with single NIC and credential | host | Null
 | joinWorkgroup | Null
 | macAddress | Null
 | netBIOS | Null
 | primaryWINS | Null
 | secondaryWINS | Null
 | name | vmName
 | clientName | vmName
Clone, Linux with single NIC | host | Null
 | macAddress | Null
 | name | vmName
 | clientName | vmName

Define the in-parameter vm as input for the Custom decision and add the following script. The script will check whether the name of the OS contains the word Microsoft:

// read the guest OS name from the VM configuration and log it
var guestOS = vm.config.guestFullName;
System.log(guestOS);
// true routes to the Windows clone workflow, false to the Linux one
if (guestOS.indexOf("Microsoft") >= 0) {
    return true;
} else {
    return false;
}

Save and run the workflow.

This workflow will now create a new VM from an existing VM and customize it with a fixed IP.

How it works…

As you can see, creating workflows to automate vCenter deployments is pretty straightforward. Dealing with the various in-parameters of workflows can be quite overwhelming. The best way to deal with this problem is to hide away variables by defining them centrally using a configuration, or to define them locally as attributes.
Using configurations has the advantage that you can create them once and reuse them as needed. You can even push the concept a bit further by defining multiple configurations for multiple purposes, such as different environments.

While creating a new workflow for automation, a typical approach is as follows:
Look for a workflow that you need.
Run the workflow normally to check out what it actually does.
Either create a new workflow that uses the original or duplicate and edit the one you tried, modifying it until it does what you want.

A fast way to deal with a lot of variables is to drag every element you need into the schema and then use the binding to create the variables as needed.

You may have noticed that this workflow only lets you select vSwitch networks, not distributed vSwitch networks. You can improve this workflow with the following features:
Read the existing Sysprep information stored in your vCenter Server
Generate different predefined configurations (for example, DEV or Prod)

There's more...

We can improve the workflow by implementing the ability to change the vCPU and the memory of the VM. Follow these steps to implement it:

Move the out-parameter newVM to be an attribute.
Add the following variables:

Name | Type | Section | Use
vCPU | Number | IN | This variable denotes the number of vCPUs
Memory | Number | IN | This variable denotes the amount of VM memory
vcTask | VC:Task | Attribute | This variable will carry the reconfiguration task from the script to the wait task
progress | Boolean | Attribute | Set to No; used by vim3WaitTaskEnd
pollRate | Number | Attribute | Set to 5; used by vim3WaitTaskEnd
ActionResult | Any | Attribute | Used by vim3WaitTaskEnd

Add the following actions and workflows according to the next figure:
     shutdownVMAndForce
     changeVMvCPU
     vim3WaitTaskEnd
     changeVMRAM
     Start virtual machine
Bind newVM to all the appropriate input parameters of the added actions and workflows.
Bind the actionResult (VC:Task) of each change action to the vim3WaitTaskEnd action that follows it.
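The Custom decision used in this recipe boils down to a single string test, and it can help to try that test outside Orchestrator before wiring it into the schema. The following is a minimal sketch in plain JavaScript, not the workflow's actual script: the function name isWindowsGuest is hypothetical, the sample OS names are made up for illustration, and the guard against a missing name is an added assumption that the original script does not contain.

```javascript
// Hypothetical helper: isWindowsGuest is not part of the Orchestrator API.
// It mirrors the Custom decision, which is fed vm.config.guestFullName.
function isWindowsGuest(guestFullName) {
    // Assumption: treat a missing or non-string name as "not Windows"
    // instead of letting indexOf throw.
    if (typeof guestFullName !== "string") {
        return false;
    }
    // Same test as in the recipe: does the OS name contain "Microsoft"?
    return guestFullName.indexOf("Microsoft") >= 0;
}

// Illustrative sample names only; real values come from vCenter.
console.log(isWindowsGuest("Microsoft Windows Server 2012 (64-bit)")); // true
console.log(isWindowsGuest("Red Hat Enterprise Linux 6 (64-bit)"));    // false
console.log(isWindowsGuest(null));                                     // false
```

Inside a Custom decision element, the body of such a function would be used directly with return statements, as in the recipe; wrapping it in a function here only serves to make it testable on its own.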
See also

Refer to the example workflows, 7.02.1 Provision VM (Base), 7.02.2 Provision VM (HW custom), as well as the configuration element, 7 VM provisioning.

An approval process for VM provisioning

In this recipe, we will see how to create a workflow that waits for an approver to approve the VM creation before provisioning it. We will learn how to combine mail and external events in a workflow to make it interact with different users.

Getting ready

For this recipe, we first need the provisioning workflow that we have created in the Provisioning a VM from a template recipe. You can use the example workflow, 7.02.1 Provision VM (Base). Additionally, we need a functional e-mail system as well as a workflow to send e-mails. You can use the example workflow, 4.02.1 SendMail as well as its configuration item, 4.2.1 Working with e-mail.

How to do it…

We will split this recipe into three parts. First, we will create a configuration element; then, we will create the workflow; and lastly, we will use a presentation to make the workflow usable.

Creating a configuration element

We will use a configuration for all reusable variables. Build a configuration element that contains the following items:

Name | Type | Use
templates | Array/VC:VirtualMachine | This contains all the VMs that serve as templates
folders | Array/VC:VmFolder | This contains all the VM folders that are targets for VM provisioning
networks | Array/VC:Network | This contains all VM networks that are targets for VM provisioning
resourcePools | Array/VC:ResourcePool | This contains all resource pools that are targets for VM provisioning
datastores | Array/VC:Datastore | This contains all datastores that are targets for VM provisioning
daysToApproval | Number | This is the number of days for which the approval should be available
approver | String | This is the e-mail of the approver

Please note that you also have to define or use the configuration elements for SendMail, as well as the Provision VM workflows.
You can use the examples contained in the example package.

Creating a workflow

Create a new workflow and add the following variables:

Name | Type | Section | Use
mailRequester | String | IN | This is the e-mail address of the requester
vmName | String | IN | This is the name of the new virtual machine
vm | VC:VirtualMachine | IN | This is the virtual machine to be cloned
folder | VC:VmFolder | IN | This is the virtual machine folder
datastore | VC:Datastore | IN | This is the datastore in which you store the virtual machine
pool | VC:ResourcePool | IN | This is the resource pool in which you create the virtual machine
network | VC:Network | IN | This is the network to which you attach the virtual network interface
ipAddress | String | IN | This is the fixed valid IP address
subnetMask | String | IN | This is the subnet mask
isExternalEvent | Boolean | Attribute | A value of true defines this event as external
mailApproverSubject | String | Attribute | This is the subject line of the mail sent to the approver
mailApproverContent | String | Attribute | This is the content of the mail that is sent to the approver
mailRequesterSubject | String | Attribute | This is the subject line of the mail sent to the requester when the VM is provisioned
mailRequesterContent | String | Attribute | This is the content of the mail that is sent to the requester when the VM is provisioned
mailRequesterDeclinedSubject | String | Attribute | This is the subject line of the mail sent to the requester when the VM is declined
mailRequesterDeclinedContent | String | Attribute | This is the content of the mail that is sent to the requester when the VM is declined
eventName | String | Attribute | This is the name of the external event
endDate | Date | Attribute | This is the end date for the wait of external event
approvalSuccess | Boolean | Attribute | This checks whether the VM has been approved

Now add all the attributes we defined in the configuration element and link them to the configuration.
Create the workflow as shown in the following figure by adding the given elements:
     Scriptable task
     4.02.1 SendMail (example workflow)
     Wait for custom event
     Decision
     Provision VM (example workflow)
Edit the scriptable task and bind the following variables to it:

In | Out
vmName | mailApproverSubject
ipAddress | mailApproverContent
mailRequester | mailRequesterSubject
template | mailRequesterContent
approver | mailRequesterDeclinedSubject
daysToApproval | mailRequesterDeclinedContent
 | eventName
 | endDate

Add the following script to the scriptable task:

//construct event name
eventName = "provision-" + vmName;
//add days to today for approval
var today = new Date();
var endDate = new Date(today);
endDate.setDate(today.getDate() + daysToApproval);
//construct external URL for approval
var myURL = System.customEventUrl(eventName, false);
var externalURL = myURL.url;
//mail to approver
mailApproverSubject = "Approval needed: " + vmName;
mailApproverContent = "Dear Approver,\n the user " + mailRequester + " would like to provision a VM from template " + template.name + ".\n To approve please click here: " + externalURL;
//VM provisioned
mailRequesterSubject = "VM ready :" + vmName;
mailRequesterContent = "Dear Requester,\n the VM " + vmName + " has been provisioned and is now available under IP :" + ipAddress;
//declined
mailRequesterDeclinedSubject = "Declined :" + vmName;
mailRequesterDeclinedContent = "Dear Requester,\n the VM " + vmName + " has been declined by " + approver;

Bind the out-parameter of Wait for custom event to approvalSuccess.
Configure the Decision element with approvalSuccess as true.
Bind all the other variables to the workflow elements.

Improving with the presentation

We will now edit the workflow's presentation in order to make it workable for the requester.
To do so, follow the given steps:

Click on Presentation and follow the steps to alter the presentation, as seen in the following screenshot:
Add the following properties to the in-parameters:

In-parameter | Property | Value
template | Predefined list of elements | #templates
folder | Predefined list of elements | #folders
datastore | Predefined list of elements | #datastores
pool | Predefined list of elements | #resourcePools
network | Predefined list of elements | #networks

You can now use the General tab of each in-parameter to change the displayed text.
Save and close the workflow.

How it works…

This is a very simplified example of an approval workflow to create VMs. The aim of this recipe is to introduce you to the method and ideas of how to build such a workflow.

This workflow will only give a requester the choices that are configured in the configuration element, making the workflow quite safe for users who have only limited know-how of the IT environment. When the requester submits the workflow, an e-mail is sent to the approver. The e-mail contains a link, which when clicked, triggers the external event and approves the VM. If the VM is approved, it will get provisioned, and when the provisioning has finished, an e-mail is sent to the requester stating that the VM is now available. If the VM is not approved within a certain timeframe, the requester will receive an e-mail stating that the VM was not approved.

To make this workflow fully functional, you can add permissions for a requester group to the workflow and Orchestrator so that the user can use the vCenter to request a VM.

Things you can do to improve the workflow are as follows:
Schedule the provisioning to a future date.
Use the resources for the e-mail and replace the content.
Add an error workflow in case the provisioning fails.
Use AD to read out the current user's e-mail and full name to improve the workflow.
Create a workflow that lets an approver configure the configuration elements that a requester can choose from.
Reduce the selections by creating, for instance, a development and production configuration that contains the correct folders, datastores, networks, and so on.
Create a decommissioning workflow that is automatically scheduled so that the VM is destroyed automatically after a given period of time.

See also

Refer to the example workflow, 7.03 Approval and the configuration element, 7 approval.

Summary

In this article, we discussed one of the important aspects of the interaction of Orchestrator with vCenter Server and vRealize Automation, that is, VM provisioning.

Resources for Article:

Further resources on this subject:
Importance of Windows RDS in Horizon View [article]
Metrics in vRealize Operations [article]
Designing and Building a Horizon View 6.0 Infrastructure [article]