Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Save more on your purchases! discount-offer-chevron-icon
Savings automatically calculated. No voucher code required.
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Events
Videos
Audiobooks
Packt Hub
Free Learning
Arrow right icon
timer SALE ENDS IN
0 Days
:
00 Hours
:
00 Minutes
:
00 Seconds

How-To Tutorials

7018 Articles
article-image-network-and-data-management-containers
Packt
10 Jun 2015
14 min read
Save for later

Network and Data Management for Containers

Packt
10 Jun 2015
14 min read
In this article by Neependra Khare author of the book Docker Cookbook, when the Docker daemon starts, it creates a virtual Ethernet bridge with the name docker0. For example, we will see the following with the ip addr command on the system that runs the Docker daemon: (For more resources related to this topic, see here.) As we can see, docker0 has the IP address 172.17.42.1/16. Docker randomly chooses an address and subnet from a private range defined in RFC 1918 (https://tools.ietf.org/html/rfc1918). Using this bridged interface, containers can communicate with each other and with the host system. By default, every time Docker starts a container, it creates a pair of virtual interfaces, one end of which is attached to the host system and other end to the created container. Let's start a container and see what happens: The end that is attached to the eth0 interface of the container gets the 172.17.0.1/16 IP address. We also see the following entry for the other end of the interface on the host system: Now, let's create a few more containers and look at the docker0 bridge with the brctl command, which manages Ethernet bridges: Every veth* binds to the docker0 bridge, which creates a virtual subnet shared between the host and every Docker container. Apart from setting up the docker0 bridge, Docker creates IPtables NAT rules, such that all containers can talk to the external world by default but not the other way around. Let's look at the NAT rules on the Docker host: If we try to connect to the external world from a container, we will have to go through the Docker bridge that was created by default: When starting a container, we have a few modes to select its networking: --net=bridge: This is the default mode that we just saw. So, the preceding command that we used to start the container can be written as follows: $ docker run -i -t --net=bridge centos /bin/bash --net=host: With this option, Docker does not create a network namespace for the container; instead, the container will network stack with the host. So, we can start the container with this option as follows: $ docker run -i -t --net=host centos bash We can then run the ip addr command within the container as seen here: We can see all the network devices attached to the host. An example of using such a configuration is to run the nginx reverse proxy within a container to serve the web applications running on the host. --net=container:NAME_or_ID: With this option, Docker does not create a new network namespace while starting the container but shares it from another container. Let's start the first container and look for its IP address: $ docker run -i -t --name=centos centos bash Now start another as follows: $ docker run -i -t --net=container:centos ubuntu bash As we can see, both containers contain the same IP address. Containers in a Kubernetes (http://kubernetes.io/) Pod use this trick to connect with each other. --net=none: With this option, Docker creates the network namespace inside the container but does not configure networking. For more information about the different networking, visit https://docs.docker.com/articles/networking/#how-docker-networks-a-container. From Docker 1.2 onwards, it is also possible to change /etc/host, /etc/hostname, and /etc/resolv.conf on a running container. However, note that these are just used to run a container. If it restarts, we will have to make the changes again. So far, we have looked at networking on a single host, but in the real world, we would like to connect multiple hosts and have a container from one host to talk to a container from another host. Flannel (https://github.com/coreos/flannel), Weave (https://github.com/weaveworks/weave), Calio (http://www.projectcalico.org/getting-started/docker/), and Socketplane (http://socketplane.io/) are some solutions that offer this functionality Socketplane joined Docker Inc in March '15. Community and Docker are building a Container Network Model (CNM) with libnetwork (https://github.com/docker/libnetwork), which provides a native Go implementation to connect containers. More information on this development can be found at http://blog.docker.com/2015/04/docker-networking-takes-a-step-in-the-right-direction-2/. Accessing containers from outside Once the container is up, we would like to access it from outside. If you have started the container with the --net=host option, then it can be accessed through the Docker host IP. With --net=none, you can attach the network interface from the public end or through other complex settings. Let's see what happens in by default—where packets are forwarded from the host network interface to the container. Getting ready Make sure the Docker daemon is running on the host and you can connect through the Docker client. How to do it… Let's start a container with the -P option: $ docker run --expose 80 -i -d -P --name f20 fedora /bin/bash This automatically maps any network port of the container to a random high port of the Docker host between 49000 to 49900. In the PORTS section, we see 0.0.0.0:49159->80/tcp, which is of the following form: <Host Interface>:<Host Port> -> <Container Interface>/<protocol> So, in case any request comes on port 49159 from any interface on the Docker host, the request will be forwarded to port 80 of the centos1 container. We can also map a specific port of the container to the specific port of the host using the -p option: $ docker run -i -d -p 5000:22 --name centos2 centos /bin/bash In this case, all requests coming on port 5000 from any interface on the Docker host will be forwarded to port 22 of the centos2 container. How it works… With the default configuration, Docker sets up the firewall rule to forward the connection from the host to the container and enables IP forwarding on the Docker host: As we can see from the preceding example, a DNAT rule has been set up to forward all traffic on port 5000 of the host to port 22 of the container. There's more… By default, with the -p option, Docker will forward all the requests coming to any interface to the host. To bind to a specific interface, we can specify something like the following: $ docker run -i -d -p 192.168.1.10:5000:22 --name f20 fedora /bin/bash In this case, only requests coming to port 5000 on the interface that has the IP 192.168.1.10 on the Docker host will be forwarded to port 22 of the f20 container. To map port 22 of the container to the dynamic port of the host, we can run following command: $ docker run -i -d -p 192.168.1.10::22 --name f20 fedora /bin/bash We can bind multiple ports on containers to ports on hosts as follows: $ docker run -d -i -p 5000:22 -p 8080:80 --name f20 fedora /bin/bash We can look up the public-facing port that is mapped to the container's port as follows: $ docker port f20 80 0.0.0.0:8080 To look at all the network settings of a container, we can run the following command: $ docker inspect   -f "{{ .NetworkSettings }}" f20 See also Networking documentation on the Docker website at https://docs.docker.com/articles/networking/. Managing data in containers Any uncommitted data or changes in containers get lost as soon as containers are deleted. For example, if you have configured the Docker registry in a container and pushed some images, as soon as the registry container is deleted, all of those images will get lost if you have not committed them. Even if you commit, it is not the best practice. We should try to keep containers as light as possible. The following are two primary ways to manage data with Docker: Data volumes: From the Docker documentation (https://docs.docker.com/userguide/dockervolumes/), a data volume is a specially-designated directory within one or more containers that bypasses the Union filesystem to provide several useful features for persistent or shared data: Volumes are initialized when a container is created. If the container's base image contains data at the specified mount point, that data is copied into the new volume. Data volumes can be shared and reused between containers. Changes to a data volume are made directly. Changes to a data volume will not be included when you update an image. Volumes persist until no containers use them. Data volume containers: As a volume persists until no container uses it, we can use the volume to share persistent data between containers. So, we can create a named volume container and mount the data to another container. Getting ready Make sure that the Docker daemon is running on the host and you can connect through the Docker client. How to do it... Add a data volume. With the -v option with the docker run command, we add a data volume to the container: $ docker run -t -d -P -v /data --name f20 fedora /bin/bash We can have multiple data volumes within a container, which can be created by adding -v multiple times: $ docker run -t -d -P -v /data -v /logs --name f20 fedora /bin/bash The VOLUME instruction can be used in a Dockerfile to add data volume as well by adding something similar to VOLUME ["/data"]. We can use the inspect command to look at the data volume details of a container: $ docker inspect -f "{{ .Config.Volumes }}" f20 $ docker inspect -f "{{ .Volumes }}" f20 If the target directory is not there within the container, it will be created. Next, we mount a host directory as a data volume. We can also map a host directory to a data volume with the -v option: $ docker run -i -t -v /source_on_host:/destination_on_container fedora /bin/bash Consider the following example: $ docker run -i -t -v /srv:/mnt/code fedora /bin/bash This can be very useful in cases such as testing code in different environments, collecting logs in central locations, and so on. We can also map the host directory in read-only mode as follows: $ docker run -i -t -v /srv:/mnt/code:ro fedora /bin/bash We can also mount the entire root filesystem of the host within the container with the following command: $ docker run -i -t -v /:/host:ro fedora /bin/bash If the directory on the host (/srv) does not exist, then it will be created, given that you have permission to create one. Also, on the Docker host where SELinux is enabled and if the Docker daemon is configured to use SELinux (docker -d --selinux-enabled), you will see the permission denied error if you try to access files on mounted volumes until you relabel them. To relabel them, use either of the following commands: $ docker run -i -t -v /srv:/mnt/code:z fedora /bin/bash $ docker run -i -t -v /srv:/mnt/code:Z fedora /bin/bash Now, create a data volume container. While sharing the host directory to a container through volume, we are binding the container to a given host, which is not good. Also, the storage in this case is not controlled by Docker. So, in cases when we want data to be persisted even if we update the containers, we can get help from data volume containers. Data volume containers are used to create a volume and nothing else; they do not even run. As the created volume is attached to a container (not running), it cannot be deleted. For example, here's a named data container: $ docker run -d -v /data --name data fedora echo "data volume container" This will just create a volume that will be mapped to a directory managed by Docker. Now, other containers can mount the volume from the data container using the --volumes-from option as follows: $ docker run -d -i -t --volumes-from data --name client1 fedora /bin/bash We can mount a volume from the data volume container to multiple containers: $ docker run -d -i -t --volumes-from data --name client2 fedora /bin/bash We can also use --volumes-from multiple times to get the data volumes from multiple containers. We can also create a chain by mounting volumes from the container that mounts from some other container. How it works… In case of data volume, when the host directory is not shared, Docker creates a directory within /var/lib/docker/ and then shares it with other containers. There's more… Volumes are deleted with -v flag to docker rm, only if no other container is using it. If some other container is using the volume, then the container will be removed (with docker rm) but the volume will not be removed. The Docker registry, which by default starts with the dev flavor. In this registry, uploaded images were saved in the /tmp/registry folder within the container we started. We can mount a directory from the host at /tmp/registry within the registry container, so whenever we upload an image, it will be saved on the host that is running the Docker registry. So, to start the container, we run the following command: $ docker run -v /srv:/tmp/registry -p 5000:5000 registry To push an image, we run the following command: $ docker push registry-host:5000/nkhare/f20 After the image is successfully pushed, we can look at the content of the directory that we mounted within the Docker registry. In our case, we should see a directory structure as follows: /srv/ ├── images │ ├── 3f2fed40e4b0941403cd928b6b94e0fd236dfc54656c00e456747093d10157ac │ │ ├── ancestry │ │ ├── _checksum │ │ ├── json │ │ └── layer │ ├── 511136ea3c5a64f264b78b5433614aec563103b4d4702f3ba7d4d2698e22c158 │ │ ├── ancestry │ │ ├── _checksum │ │ ├── json │ │ └── layer │ ├── 53263a18c28e1e54a8d7666cb835e9fa6a4b7b17385d46a7afe55bc5a7c1994c │ │ ├── ancestry │ │ ├── _checksum │ │ ├── json │ │ └── layer │ └── fd241224e9cf32f33a7332346a4f2ea39c4d5087b76392c1ac5490bf2ec55b68 │ ├── ancestry │ ├── _checksum │ ├── json │ └── layer ├── repositories │ └── nkhare │ └── f20 │ ├── _index_images │ ├── json │ ├── tag_latest │ └── taglatest_json See also The documentation on the Docker website at https://docs.docker.com/userguide/dockervolumes/ http://container42.com/2013/12/16/persistent-volumes-with-docker-container-as-volume-pattern/ http://container42.com/2014/11/03/docker-indepth-volumes/ Linking two or more containers With containerization, we would like to create our stack by running services on different containers and then linking them together. However, we can also put them in different containers and link them together. Container linking creates a parent-child relationship between them, in which the parent can see selected information of its children. Linking relies on the naming of containers. Getting ready Make sure the Docker daemon is running on the host and you can connect through the Docker client. How to do it… Create a named container called centos_server: $ docker run -d -i -t --name centos_server centos /bin/bash Now, let's start another container with the name client and link it with the centos_server container using the --link option, which takes the name:alias argument. Then look at the /etc/hosts file: $ docker run -i -t --link centos_server:server --name client fedora /bin/bash How it works… In the preceding example, we linked the centos_server container to the client container with an alias server. By linking the two containers, an entry of the first container, which is centos_server in this case, is added to the /etc/hosts file in the client container. Also, an environment variable called SERVER_NAME is set within the client to refer to the server. There's more… Now, let's create a mysql container: $ docker run --name mysql -e MYSQL_ROOT_PASSWORD=mysecretpassword -d mysql Then, let's link it from a client and check the environment variables: $ docker run -i -t --link mysql:mysql-server --name client fedora /bin/bash Also, let's look at the docker ps output: If you look closely, we did not specify the -P or -p options to map ports between two containers while starting the client container. Depending on the ports exposed by a container, Docker creates an internal secure tunnel in the containers that links to it. And, to do that, Docker sets environment variables within the linker container. In the preceding case, mysql is the linked container and client is the linker container. As the mysql container exposes port 3306, we see corresponding environment variables (MYSQL_SERVER_*) within the client container. As linking depends on the name of the container, if you want to reuse a name, you must delete the old container. See also Documentation on the Docker website at https://docs.docker.com/userguide/dockerlinks/ Summary In this article, we learned how to connect a container with another container, in the external world. We also learned how we can share external storage from other containers and the host system. Resources for Article: Further resources on this subject: Giving Containers Data and Parameters [article] Creating your infrastructure using Chef Provisioning [article] Unboxing Docker [article]
Read more
  • 0
  • 0
  • 2049

article-image-integrating-quick-and-benefits-behavior-driven-development-part-2
Benjamin Reed
08 Jun 2015
9 min read
Save for later

Integrating Quick and the Benefits of Behavior Driven Development (Part 2)

Benjamin Reed
08 Jun 2015
9 min read
To continue the discussion about sufficient testing on the iOS platform, I think it would be best to break apart a simple application and test from the ground up. Due to copyright laws, I put together a simple calculator for time. It’s called TimeMath, and it is by no means finished. I’ve included all the visual assets and source code for the project. The goal is that readers can follow along with this tutorial. The Disclaimer Before we begin, I must note that this application was simply made for the person of demonstrating proper testing. While it has the majority of its main functionality implemented, it jokingly asks you for “all your money” to enable the features which cease to exist. There aren’t any NSLayoutContraints, so there are no guarantees as to how it looks on any simulator device besides the iPhone 6 Plus. By doing this, there are no warnings at and before compile time. Also, if a reader would like to make some changes in the simulator, they won’t have to worry about resetting constraints. That is something that would definitely need to be covered in a separate post. There are many types of tests in the software world. Unfortunately, there is not enough time to sufficiently cover all of the material. Unit testing is the fundamental building block that was covered in the first part of this series. Automated UI tests are very powerful, because these allow a developer to test direct interactions with the user interface. If every potential interaction is recorded and performed whenever tests are run, UI issues are likely to be caught early and often. However, there are some unfortunate coincidences. The most popular frameworks in which tests are composed in Objective-C and Swift use undocumented Apple APIs. If these tests are not removed from the bundle before the app is submitted, Apple will reject it. When it comes to Apple’s solution, it doesn’t utilize their language (it’s JavaScript), and it revolves around the Instruments application. For these reasons, I have chosen to solely focus on unit tests. In the previous post, a comparison was given between testing for web development and iOS development. In many cases, web developers utilize automated UI tests. For instance, Capybara is a popular automated UI testing option in the Ruby world. This is definitely an area where the iOS community could improve. However, the provided information should be reusable and adaptable when it comes to any modern iOS project. The Map of the App This app is for those moments when a user cannot remember time arithmetic. It is designed to look and behave similar to the factory-installed calculator app. It allows for simple calculations between hours and minutes. As you can imagine, it is remarkably simple. There are two integer arrays, heap1 and heap2. They deal with four integers each. This should make up the combinations of minutes and hours. When an operator is selected and the equivalence button is tapped, these integers are converted into an integer representation of the minutes. The operation is performed, and the hour portion of the solution is found by dividing the result by 60. The remainder of this division serves as the minute portion. In order to keep it simple, seconds, milliseconds, and beyond are not supported! There was a challenge when it came to entering the time. Whenever an operator or equivalence button was tapped, all of the remaining (unassigned) elements in the array need to be set to zero. In the code this is done twice, during changes to the labels and during the final computation. This could be a problematic area. If the zeros aren’t added appropriately, the entire solution is wrong. This will be extensively tested below. The Class and Ignore “Rules” One of the best practices is to have a test class (also referred to as a ‘spec’) for each class in your code. It is generally good to have at least one test for each method; however, we’ll discuss when this can become redundant. There shouldn’t be any exceptions to the “rule.” Even a thorough implementation of a stack, queue, list, and tree could be tested. After all, these data structures must follow the strict definitions in order for ideas to accurately flow from the library’s architect to the developer. When it comes to iOS, there can be classes for models, views, and controllers. Generally, all of these should be tested as well. In TimeMath (excluding the TimeMathTests group), there are three major classes: AppDelegate, ViewController, PrettyButton. To begin, we are not going to test the AppDelegate. I can honestly say that I have never tested it in my life. There are some apps designed to run in the background, and they need to persist data between states. However, the background behaviors and data persistence tasks often belong in their own classes. Next, we need to test the ViewController class. There is definitely a lot covered in this class, so ViewControllerSpec will become our primary focus. Finally, we will avoid testing the PrettyButton class. The class’ only potential for unit tests lies in making sure the appropriate backgroundColor is set based on the style property. However, this would just be an equivalence expectation for the color. When it comes to testing, I believe, the “ignore rule” is an equally important practice. Everything has the potential to be tested. However, good software engineers know how to find adequate ways to cover their classes without testing each possible, redundant combination. In this example, say I wanted to test that every time which could be entered is displayed appropriately. Based on the 10 digits, which are the possibilities, and 4 allocated spaces, I would need to write 10,000 tests! Now, all engineers can reach a consensus that this is not a good practice. Similar to the concept of proof in mathematics, one does not attempt to show every possible combination to prove a conjecture. The same should apply to unit testing. Likewise, one does not “re-invent the wheel” by re-proving every theorem that led to their conjecture. In software engineering terms, you should only test your code. Don’t bother testing Apple’s API or frameworks that you have absolutely no control over. That simply adds to work with an unnoticeable benefit. Testing the ViewController While it may be common sense in this scenario, an engineer would have to use this same logic to deduce which tests would be included in the ViewControllerSpec. For instance, each numeric button tapped does not need a separate test (despite being an individual method). These are simply event handlers, and each one calls the exact same method: addNumericToHeaps(...). Since this is the case, it makes sense to only test that method. The addNumericToHeaps(...) method is responsible for adding the number to the either heap1 or heap2, and then it relies on the setLabels(...) method to set the display. Our tests may look something like this: it("should add a number to heap1") { // 01:00 vc.tapEvent_1() vc.tapEvent_0() vc.tapEvent_0() expect(vc.lab_focused.text).to(equal("01:00")) } it("should add and display a number for heap2 when operator tapped") { // 00:01 vc.tapEvent_1() vc.tapEvent_ADD() // 02:00 vc.tapEvent_2() vc.tapEvent_0() vc.tapEvent_0() expect(vc.lab_focused.text).to(equal("02:00")) } it("should display heap1's number in tiny label when heap2 active") { // 00:01 vc.tapEvent_1() vc.tapEvent_ADD() // 02:00 vc.tapEvent_2() vc.tapEvent_0() vc.tapEvent_0() expect(vc.lab_unfocused.text).to(equal("00:01")) } Now, we must test the composition(...) method! This method assumes unclaimed places in the array are zeros, and it converts the time to an integer representation (in minutes). We’ll write tests for each, like so: it("should properly find composition of heaps by adding a single zero") { // numbers entered as 1-2-4 vc.heap1 = [4,2,1] vc.composition(&vc.heap1) expect(vc.heap1).to(contain(4)) expect(vc.heap1).to(contain(2)) expect(vc.heap1).to(contain(1)) expect(vc.heap1).to(contain(0)) } it("should properly find composition of heaps by adding multiple zeros") { // numbers entered as 1 vc.heap1 = [1] vc.composition(&vc.heap1) expect(vc.heap1[0]).to(equal(1)) expect(vc.heap1[1]).to(equal(0)) expect(vc.heap1[2]).to(equal(0)) expect(vc.heap1[3]).to(equal(0)) } it("should properly find composition of heaps by converting to minutes") { // numbers entered as 1-0-0 vc.heap1 = [0,0,1] let minutes = vc.composition(&vc.heap1) expect(minutes).to(equal(60)) } Conclusion All in all, I sincerely hope that the iOS community hears the pleas from our web development friends and accepts the vitality of testing. Furthermore, I truly want all readers to see unit testing in a new light. This two-part series is intended to open the doors to the new world of BDD. This world thrives outside of XCTest, and it is one that stresses readability and maintainability. I have become intrigued by the Quick project, and, personally, I have found myself more inline with testing. When it comes to these posts, I’ve added my own spin (and opinions) in hopes that it will lead you to draft your own. Give Quick a try and see if you feel more comfortable writing your tests. As for the app, it is absolutely free for any hacking, and it would bring me tremendous pleasure to see it finished and released on the App Store. Thanks for reading! About the author Benjamin Reed began Computer Science classes at a nearby university in Nashville during his sophomore year in high school. Since then, he has become an advocate for open source. He is now pursing degrees in Computer Science and Mathematics fulltime. The Ruby community has intrigued him, and he openly expresses support for the Rails framework. When asked, he believes that studying Rails has led him to some of the best practices and, ultimately, has made him a better programmer. iOS development is one of his hobbies, and he enjoys scouting out new projects on GitHub. On GitHub, he’s appropriately named @codeblooded. On Twitter, he’s @benreedDev.
Read more
  • 0
  • 0
  • 961

article-image-creating-your-infrastructure-using-chef-provisioning
Packt
05 Jun 2015
5 min read
Save for later

Creating your infrastructure using Chef Provisioning

Packt
05 Jun 2015
5 min read
In this article by Matthias Marschall, author of the book Chef Infrastructure Automation Cookbook - Second Edition, we will "know how to use Chef to manage the software on individual machines and you know how to use knife to bootstrap individual nodes. Chef Provisioning helps you to use the power of Chef to create your whole infrastructure for you. No matter whether you want to create a cluster of Vagrant boxes, Docker instances, or Cloud servers, Chef Provisioning lets you define your infrastructure in a simple recipe and run it idempotently. Let's see how to create a Vagrant machine using a Chef recipe. (For more resources related to this topic, see here.) Getting ready Make sure that you have your Berksfile, my_cookbook and web_server roles ready to create an nginx site. How to do it... Let's see how "to create a Vagrant machine and install nginx "on it: Describe your Vagrant machine in a recipe called mycluster.rb: mma@laptop:~/chef-repo $ subl mycluster.rb require 'chef/provisioning'   with_driver 'vagrant' with_machine_options :vagrant_options => { 'vm.box' => 'opscode-ubuntu-14.04' }   machine 'web01' do role 'web_server' end Install all required cookbooks in your local chef-repo: mma@laptop:~/chef-repo $ berks installmma@laptop:~/chef-repo $ berks vendor cookbooks Resolving cookbook dependencies... Using apt (2.6.1) ...TRUNCATED OUTPUT... Vendoring yum-epel (0.6.0) to cookbooks/yum-epel Run the Chef client in local mode to bring up the Vagrant machine and execute a Chef run on it: mma@laptop:~/chef-repo $ chef-client -z mycluster.rb [2015-03-08T21:09:39+01:00] INFO: Starting chef-zero on host localhost, port 8889 with repository at repository at /Users/mma/work/chef-repo ...TRUNCATED OUTPUT... Recipe: @recipe_files::/Users/mma/work/chef-repo/mycluster.rb * machine[webserver] action converge[2015-03-08T21:09:43+01:00] INFO: Processing machine[web01] action converge (@recipe_files::/Users/mma/work/chef-repo/mycluster.rb line 6) ...TRUNCATED OUTPUT... [2015-03-08T21:09:47+01:00] INFO: Executing sudo chef-client -l info on vagrant@127.0.0.1      [web01] [2015-03-08T20:09:21+00:00] INFO: Forking chef instance to converge...                Starting Chef Client, version 12.1.0                ...TRUNCATED OUTPUT...                Chef Client finished, 18/25 resources updated in 73.839065458 seconds ...TRUNCATED OUTPUT... [2015-03-08T21:11:05+01:00] INFO: Completed chef-client -l info on vagrant@127.0.0.1: exit status 0    - run 'chef-client -l info' on web01 [2015-03-08T21:11:05+01:00] INFO: Chef Run complete in 82.948293 seconds ...TRUNCATED OUTPUT... Chef Client finished, 1/1 resources updated in 85.914979 seconds Change" into the directory where Chef put the Vagrant configuration: mma@laptop:~/chef-repo $ cd ~/.chef/vms Validate that there is a Vagrant machine named web01 running: mma@laptop:~/.chef/vms $ vagrant status Current machine states: web01                 running (virtualbox) Validate that nginx is installed and running on the Vagrant machine: mma@laptop:~/.chef/vms $ vagrant ssh vagrant@web01:~$ wget localhost:80 ...TRUNCATED OUTPUT... 2015-03-08 22:14:45 (2.80 MB/s) - 'index.html' saved [21/21] How it works... Chef Provisioning comes with a selection of drivers for all kinds of infrastructures, including Fog (supporting Amazon EC2, OpenStack, and others), VMware VSphere, Vagrant (supporting Virtualbox and VMware Fusion), various Containers, such as LXC Docker "and Secure Shell (SSH). In this recipe, we make sure that we can use the directives provided by Chef Provisioning by requiring chef/provisioning library. Then, we configure the driver that we want to use. We use Vagrant and tell Chef to use the opscode-ubuntu-14.04 Vagrant box to spin up our machine. Using the machine resource, we ask Chef to spin up a Vagrant machine and configure it using Chef by applying the role web_server. The web_server role uses the cookbook my_cookbook to configure the newly created Vagrant machine. To make sure that all the required cookbooks are available to Chef, we use berks install and berks vendor cookbooks. The berks vendor cookbooks installs all the required cookbooks in the local cookbooks directory. The Chef client can access the cookbooks here, without the need for a Chef server. Finally, we use the Chef client to execute our Chef Provisioning recipe. It will spin up the defined Vagrant machine and execute a Chef client run on it. Chef Provisioning will put the Vagrant Virtual Machine (VM) definition into the directory ~/.chef/vms. To manage the Vagrant VM, you need to change to this directory. There's more... Instead of using the with_driver directive, you can use the CHEF_DRIVER environment variable: mma@laptop:~/chef-repo $ CHEF_DRIVER=vagrant chef-client -z mycluster.rb You can create multiple instances of a machine by using the machine_image directive in your recipe: machine_image 'web_server' do role 'web_server' end 1.upto(2) do |i| machine "web0#{i}" do    from_image 'web_server'   end end See also Find the source code of the Chef Provisioning library at GitHub: https://github.com/chef/chef-provisioning Find" the Chef Provisioning documentation at https://docs.chef.io/provisioning.html Learn how to" set up a Chef server using Chef Provisioning: https://www.chef.io/blog/2014/12/15/sysadvent-day-14-using-chef-provisioning-to-build-chef-server/ Summary This article deals with networking and applications spanning multiple servers. You learned how to create your whole infrastructure using Chef provisioning. Resources for Article: Further resources on this subject: Chef Infrastructure [article] Going Beyond the Basics [article] Getting started with using Chef [article]
Read more
  • 0
  • 0
  • 3405

article-image-what-bi-and-what-are-bi-tools-microsoft-dynamics-gp
Packt
05 Jun 2015
13 min read
Save for later

What is BI and What are BI Tools for Microsoft Dynamics GP?

Packt
05 Jun 2015
13 min read
In this article by Belinda Allen and Mark Polino, authors of the book Real-world Business Intelligence with Microsoft Dynamics GP, we will define BI and discuss the BI tools for Microsoft Dynamics GP. (For more resources related to this topic, see here.) What is BI and how do I get it? So let's define BI with no assumptions. To us, BI is the ability to make decisions based on accurate and timely information. It's neither a report nor dashboard, nor is it just data. It is the insight obtained from the content and its presentation that gives us the information essential to make sound decisions for our business. It is your insight and experience combined with your data. Imagine going to a dinner party and seeing a bowl of green beans with almonds on the table. You love green beans; they are your favorite vegetable. However, you have a nut allergy, and you visually see almonds with the green beans, so you know not to eat the beans. If we asked you, "Why aren't you eating the green beans, aren't they your favorite?" You'll respond, "I see almonds and I'm allergic to almonds." It's your knowledge combined with the visual of the dish that provides you with personal intelligence to stay away from the beans. When you are trying to determine what BI your business or organization needs, ask yourself what information would make it easier for your firm to obtain its goals. Ask what problems you have and what information would help solve or prevent them from happening again. Focusing on a report or dashboard first will limit your options unnecessarily. As fast as the economy and technology change, one bad or misinformed decision can ruin your company and/or your career. Out-of-the-box BI tools for Microsoft Dynamics GP The following are all the tools that work with GP and are considered native or out-of-the-box as they come with GP or are a part of the Microsoft stack of technology. Some of these tools are included in the price of GP and others must be purchased separately. We won't use all of these tools, no one has that much time! We do want to make sure that you are aware of their existence and understand what each tool does. The tools are in no particular order; this isn't a beauty pageant or a top ten list. Business Analyzer This is a metric or Key Performance Indicator (KPI) tool that comes with Microsoft Dynamics GP. This tool is role based and includes over 150 reports out-of-the-box. These reports or metrics can be run from within GP, outside of GP, on a Microsoft Surface via an app from the Microsoft App Store, and even on an iPad with the Business Analyzer app. Business Analyzer uses reports that are built-in and can be edited with Microsoft SQL Server Reporting Services. Business Analyzer with SQL Security is secure and easy to use. Reports can be displayed as a dashboard, chart, or tabular with drill back right into GP data: Management Reporter reports and Excel reports can even be added to the Windows App and iPad App versions. This tool is best used for dashboards where the data can be represented in small charts or graphs along with the Management Reporter reports representing what you want to see. SQL Server Reporting Services SQL Server Reporting Services (SSRS) is a report-writing tool based directly on the data coming from Microsoft SQL Server. Reports can be created using tabular, graphical, or free form format. Reports can be launched in Business Analyzer, on the GP home page within many GP cards and transaction windows, or in Microsoft SharePoint. The following screenshot shows six SSRS (out-of-the-box with GP) reports being used to make the home page (for this user only) dashboard. This makes the home page in GP a custom experience for each and every user, providing the user with the information that is important to them: Like Business Analyzer, SSRS is a great tool for repetitive analysis. It's not as useful for ad hoc analysis. Microsoft Excel Although Microsoft Excel is not included with Microsoft Dynamics GP, it is likely to be a tool you already own and like using. Microsoft Dynamics GP includes Excel-based reports that are connected to be completely refreshable with new data with just a click. This means no more exporting to Excel and then formatting, only repeating the task the next time you need the report. Now, you can pull the data into Excel and then format and save it. The next time you need the report, open the Excel file, select Data and Refresh (or even have it auto refresh) with formatting intact and with no extra effort. This allows Excel to be your report writer with data integrated automatically, so there is no need to balance Excel with GP. Quit thinking of Excel as a big calculator, and focus on its analytical power. Excel is incredibly powerful for both repetitive and ad hoc analyses. Excel is really less of a tool and more like a hardware store. We are by no means suggesting that a large number of Excel reports become your BI. Instead, we are suggesting that you use Excel to extract data from the source, using it as a formatting tool and data delivery tool. The following screenshot is an example of using Excel to format refreshable data into a dashboard, using Excel as a report delivery tool. The following report is actually the first report we will build: Microsoft Excel PowerPivot PowerPivot is a tool in Excel 2013—Office Professional Plus that enables you to perform data mashups (combining data from two or more sources, such as GP and Microsoft CRM) and data exploration, using billions of rows of data at a super fast speed. We refer to this as pivot tables on steroids! This is accomplished through the use of the data model. The data model is an in-memory data storage device with row based compression. That data is stored as a part of the file but is not visible in the Excel spreadsheet, unless you choose to display it (or a part of it). This is how a single Excel file can handle billions of rows, bypassing the normal row and column limitations of the Excel spreadsheet. The data model can also receive data from multiple sources, allowing you to make custom links, and even custom fields, by using Data Analysis Expressions (DAX). It is through PowerPivot's data model that Excel can create a single pivot table/chart on the data from multiple sources. This is a great tool when you want to share data offline with others: Microsoft Excel Power Query Power Query is a great new tool that allows you to conform, combine, split, merge, and mash up your data from GP and other sources, including public websites (such as Wikipedia and some government sites) and even some private websites. These queries can then be shared with other users via Microsoft Power BI for Office 365. Think of it as SmartList objects outside of Dynamics GP. Power Query uses an Excel spreadsheet and/or the data model from PowerPivot to hold the data it captures and cleanses. What makes this an exciting tool is its ability to gather all kinds of data from all kinds of sources, combine it, and use it in Excel. PowerPivot can import data and contain it, while Power Query can import or link to data and use PowerPivot to contain it. Why is this small difference important? Power Query is more flexible in the types of connections it can make. Also, Power Query is the data editing tool of the new Power BI dashboard-ing tool: Microsoft Excel Power Map Power Map is a great way to visually see and even fly across your data as a 3D geographical representation. Why is this considered a BI tool? Imagine seeing your sales represented on a map, showing total sales or gross margin. Does one product or product line sell better in the North than the South? Does it sell better in the fall in the East and in summer in the West? Where should you put your new warehouse in order for it to be close to your customer base? Power Maps are not always the best fit for your BI, but when they do fit, you can sure learn a lot about your data. The following screenshot shows sales leads and their estimated value by the salesperson from Microsoft CRM data: Microsoft Power BI Microsoft Power BI is a stand-alone website/dashboard tool that allows you to create your own dashboard, with refreshable links from a large variety of data sources. Included with this tool is a free App that displays the data from the website. One of the most amazing features of Microsoft Power BI is the Q&A feature. If you upload an Excel table into the dashboard, you can ask questions about the data, in natural language, just like you do in Microsoft Bing. The results of your questions will be a visual representation of the answer. It could be a graph, chart, table, map, and so on. If this is something you ask a lot, you can simply pin it to the dashboard as a new chart. This tool is amazing for managers, executives, owners, and board members alike. It gives a quick insight into timely data, right at their fingertips: Microsoft Excel Power View Power View is a tool in Excel 2013—Office Professional Plus that enables you to represent your data in a more graphic representation than those of a traditional pivot table or chart. For example, you can graph your sales for each state on an actual map of the U.S., highlighting visually where your biggest sales come from without reading any numbers. This is a simple dashboard tool that allows for easy filtering. This tool works very well for those individuals who want to see data in a dashboard format, with the ability to filter either a single part of the dashboard or the entire dashboard. Power View can use data from an Excel spreadsheet, or data in a PowerPivot data model. Again, this allows for multiple data sources and large amounts of data to be used on a single dashboard: GP Analysis Cubes library This module in GP allows you to organize your data into analysis cubes that allows users to evaluate or create reports from different angles or formats using pivot tables. The same chunk or cube of data can be used to evaluate inventory sold, sales revenue, sales commission, returns of items, profitability of sales, and so on. These cubes are designed specifically to analyze the GP database, using the SQL Server Analysis Services (SSAS) or Online Analytical Processing (OLAP) database. Analysis Cubes create a warehouse of data from GP for the purpose of reporting. Reporting from the cubes rather than from the production data, frees the server's resources for GP activity. Modifying cubes or connecting them to additional data sources will often require expert help: SmartList and SmartList Designer SmartList is an ad hoc query tool that comes with Microsoft Dynamics GP. It is in a tabular format and can be exported to Excel or Word. Custom SmartList objects can be created using the GP tool SmartList Designer. Although SmartList is an invaluable tool for GP use, for BI purposes, we prefer to go directly to Excel. SmartList exports of large datasets are painfully slow; a root canal level of pain. Excel reports are fast and easily reusable. If you create a SmartList and export it to Excel for each use, you will need to reformat the Excel document each and every time. There are ways to avoid reformatting, but even those take a lot of effort. SmartList Designerallows users to create and build their own SmartList objects. Although there are many great SmartList objects already built-in, they do not always fit your needs exactly. A good example of this would be Payables Transactions. All documents display as a positive amount since it is a list of documents. Many users want to see the document and its effect on the AP account itself (for example, returns are negatives and invoices are positive). If this is how you want your list to be displayed, you can do this through SmartList Designer: Management Reporter We often become so focused on using Management Reporter (or FRx) for balance sheets, profit and loss statements, and cash flow statements that we forget the value already built in our financial statement tool. Imagine taking your profit and loss statement (or statement of activities for not-for-profits) and removing the budget column, or splitting MTD into weeks and comparing each week of the month, or even week 1 of this month to week 1 of last month. All this would take is a new column format and "poof"—access to a new and amazing trend reporting! The following illustration is a Weekly Material Usage Report from Management Reporter. From this report, managers can see a giant spike in the last week of January that would not be visible in a report that only displayed month-to-date information: Microsoft SharePoint Microsoft SharePoint is server software (and does not come with GP) or an online tool in Office 365 that creates a central point for work to be shared and collaboration to occur. This product is what it is named, SharePoint, a point for sharing. Anyway… This is a good spot to have BI content exist for version control and sharing. The Microsoft social networking tool, Yammer, extends SharePoint into an even better collaboration tool. There is a large variety of additional BI tools available through the SharePoint arena which are awesome. However, we wanted to stick with tools that you'll likely already own, or can obtain easily and take off running on your own. So, we'll leave SharePoint off the table. Microsoft Dynamics GP Workspace for Office 365 In Microsoft SharePoint for Office 365, you can create a custom workspace using Dynamics GP 2013 R2 or higher. Here, you can store your reports, creating a truly collaborative environment. We'll not be getting into this much, but we did want to give it a shout out. It's a great storage place for your reports and an excellent starting spot. Summary We reviewed what BI is and why it's important. We've also identified many of the tools that you probably already own and may even have installed. Resources for Article: Further resources on this subject: Financial Management with Microsoft Dynamics AX 2012 R3 [article] Diagnostic leveraging of the Accelerated POC with the CRM Online service [article] Interacting with Data for Dashboards [article]
Read more
  • 0
  • 0
  • 2333

article-image-deploy-game-heroku
Daan van
05 Jun 2015
13 min read
Save for later

Deploy a Game to Heroku

Daan van
05 Jun 2015
13 min read
In this blog post we will deploy a game to Heroku so that everybody can enjoy it. We will deploy the game *Tag* that we created in the blog post [Real-time Communication with SocketIO]. Heroku is a platform as a service (PaaS) provider. A PaaS is a:category of cloud computing services that provides a platform allowing customers to develop, run and manage Web applications without the complexity of building and maintaining the infrastructure typically associated with developing and launching an app. Pricing Nothing comes for free. Luckily Heroku has a pay as you grow pricing philosophy. This means that if you start out using Heroku server in moderation, you are free to use it. Only when your app starts to use more resources you need to pay, or let your application be unavailable for a while. Follow Along If you want to follow along deploying the Tag server to Heroku, download follow-along. Unzip it in a suitable location and enter it. Heroku depends on Git for the deployment process. Make sure you download and install Git for your platform, if you have not already done so. With Git installed enter the Tag-follow-along-deployment directory, initialize a Git repository, add all the files and make a commit with the following commands cd Tag-follow-along-deployment git init git add . git commit -m "following along" If you want to know what the end result looks like, take a peek. Signing Up You need to register with Heroku in order to start using their services. You can sign up with a form where you provide Heroku with your full name, your email-address and optionally a company name. If you have not already signed up, do so now. Make sure to read Heroku's terms of service and their privacy statement. Heroku Toolbelt Once you have signed up, you can start downloading the Heroku toolbelt. The toolbelt is Heroku's workhorse. It is a set of command line tools that are responsible for running your application locally, deploying the application to Heroku, starting, stopping and scaling the application and monitoring the application state. Make sure to download the appropriate toolbelt for your operating system. Login In Having installed the Heroku toolbelt it is now time to login with the same credentials we signed up with. Issue the command: heroku login And provide it with the correct email and password. The command should responds with Authentication successful. Create an App With Heroku successfully authenticating us we can start creating an app. This is done with the heroku create command. When issued, the Heroku toolbelt will start working to create an app on the Heroku servers, give it an unique, albeit random, name and add a remote to your Git repository. heroku create It responded in my case with Creating peaceful-caverns-9339... done, stack is cedar-14 https://peaceful-caverns-9339.herokuapp.com/ | https://git.heroku.com/peaceful-caverns-9339.git Git remote heroku added If you run the command the names and URLs could be different, but the overall response should be similar. Remote A remote is a tracked repository, i.e. a repository that is related to the repository you're working on. You can inspect the tracked repositories with the git remote command. It will tell you that it tracks the repository known by the name heroku. If you want to learn more about Git remotes, see the documentation. Add a Procfile A Procfile is used by Heroku to configure what processes should run. We are going to create one now. Open you favorite editor and create a file Procfile in the root of the Tag-follow-along-deployment. Write the following content into it: web: node server.js This tells Heroku to start a web process and let it run node server.js. Save it and then add it to the repository with the following commands: git add Procfile git commit -m "Configured a Procfile" Deploy your code The next step is to deploy your code to Heroku. The following command will do this for you. git push heroku master Notice that this is a Git command. What happens is that the code is pushed to Heroku. This triggers Heroku to start taking the necessary steps to start your server. Heroku informs you what it is doing. The run should look similar to the output below: counting objects: 29, done. Delta compression using up to 8 threads. Compressing objects: 100% (26/26), done. Writing objects: 100% (29/29), 285.15 KiB | 0 bytes/s, done. Total 29 (delta 1), reused 0 (delta 0) remote: Compressing source files... done. remote: Building source: remote: remote: -----> Node.js app detected remote: remote: -----> Reading application state remote: package.json... remote: build directory... remote: cache directory... remote: environment variables... remote: remote: Node engine: unspecified remote: Npm engine: unspecified remote: Start mechanism: Procfile remote: node_modules source: package.json remote: node_modules cached: false remote: remote: NPM_CONFIG_PRODUCTION=true remote: NODE_MODULES_CACHE=true remote: remote: -----> Installing binaries remote: Resolving node version (latest stable) via semver.io... remote: Downloading and installing node 0.12.2... remote: Using default npm version: 2.7.4 remote: remote: -----> Building dependencies remote: Installing node modules remote: remote: > ws@0.5.0 install /tmp/build_bce51a5d2c066ee14a706cebbc28bd3e/node_modules/socket.io/node_modules/engine.io/node_modules/ws remote: > (node-gyp rebuild 2> builderror.log) || (exit 0) remote: remote: make: Entering directory `/tmp/build_bce51a5d2c066ee14a706cebbc28bd3e/node_modules/socket.io/node_modules/engine.io/node_modules/ws/build' remote: CXX(target) Release/obj.target/bufferutil/src/bufferutil.o remote: SOLINK_MODULE(target) Release/obj.target/bufferutil.node remote: SOLINK_MODULE(target) Release/obj.target/bufferutil.node: Finished remote: COPY Release/bufferutil.node remote: CXX(target) Release/obj.target/validation/src/validation.o remote: SOLINK_MODULE(target) Release/obj.target/validation.node remote: SOLINK_MODULE(target) Release/obj.target/validation.node: Finished remote: COPY Release/validation.node remote: make: Leaving directory `/tmp/build_bce51a5d2c066ee14a706cebbc28bd3e/node_modules/socket.io/node_modules/engine.io/node_modules/ws/build' remote: remote: > ws@0.4.31 install /tmp/build_bce51a5d2c066ee14a706cebbc28bd3e/node_modules/socket.io/node_modules/socket.io-client/node_modules/engine.io-client/node_modules/ws remote: > (node-gyp rebuild 2> builderror.log) || (exit 0) remote: remote: make: Entering directory `/tmp/build_bce51a5d2c066ee14a706cebbc28bd3e/node_modules/socket.io/node_modules/socket.io-client/node_modules/engine.io-client/node_modules/ws/build' remote: CXX(target) Release/obj.target/bufferutil/src/bufferutil.o remote: make: Leaving directory `/tmp/build_bce51a5d2c066ee14a706cebbc28bd3e/node_modules/socket.io/node_modules/socket.io-client/node_modules/engine.io-client/node_modules/ws/build' remote: express@4.12.3 node_modules/express remote: ├── merge-descriptors@1.0.0 remote: ├── utils-merge@1.0.0 remote: ├── cookie-signature@1.0.6 remote: ├── methods@1.1.1 remote: ├── cookie@0.1.2 remote: ├── fresh@0.2.4 remote: ├── escape-html@1.0.1 remote: ├── range-parser@1.0.2 remote: ├── content-type@1.0.1 remote: ├── finalhandler@0.3.4 remote: ├── vary@1.0.0 remote: ├── parseurl@1.3.0 remote: ├── serve-static@1.9.2 remote: ├── content-disposition@0.5.0 remote: ├── path-to-regexp@0.1.3 remote: ├── depd@1.0.1 remote: ├── on-finished@2.2.1 (ee-first@1.1.0) remote: ├── qs@2.4.1 remote: ├── debug@2.1.3 (ms@0.7.0) remote: ├── etag@1.5.1 (crc@3.2.1) remote: ├── send@0.12.2 (destroy@1.0.3, ms@0.7.0, mime@1.3.4) remote: ├── proxy-addr@1.0.8 (forwarded@0.1.0, ipaddr.js@1.0.1) remote: ├── accepts@1.2.7 (negotiator@0.5.3, mime-types@2.0.11) remote: └── type-is@1.6.2 (media-typer@0.3.0, mime-types@2.0.11) remote: remote: nodemon@1.3.7 node_modules/nodemon remote: ├── minimatch@0.3.0 (sigmund@1.0.0, lru-cache@2.6.2) remote: ├── touch@0.0.3 (nopt@1.0.10) remote: ├── ps-tree@0.0.3 (event-stream@0.5.3) remote: └── update-notifier@0.3.2 (is-npm@1.0.0, string-length@1.0.0, chalk@1.0.0, semver-diff@2.0.0, latest-version@1.0.0, configstore@0.3.2) remote: remote: socket.io@1.3.5 node_modules/socket.io remote: ├── debug@2.1.0 (ms@0.6.2) remote: ├── has-binary-data@0.1.3 (isarray@0.0.1) remote: ├── socket.io-adapter@0.3.1 (object-keys@1.0.1, debug@1.0.2, socket.io-parser@2.2.2) remote: ├── socket.io-parser@2.2.4 (isarray@0.0.1, debug@0.7.4, component-emitter@1.1.2, benchmark@1.0.0, json3@3.2.6) remote: ├── engine.io@1.5.1 (base64id@0.1.0, debug@1.0.3, engine.io-parser@1.2.1, ws@0.5.0) remote: └── socket.io-client@1.3.5 (to-array@0.1.3, indexof@0.0.1, debug@0.7.4, component-bind@1.0.0, backo2@1.0.2, object-component@0.0.3, component-emitter@1.1.2, has-binary@0.1.6, parseuri@0.0.2, engine.io-client@1.5.1) remote: remote: -----> Checking startup method remote: Found Procfile remote: remote: -----> Finalizing build remote: Creating runtime environment remote: Exporting binary paths remote: Cleaning npm artifacts remote: Cleaning previous cache remote: Caching results for future builds remote: remote: -----> Build succeeded! remote: remote: Tag@1.0.0 /tmp/build_bce51a5d2c066ee14a706cebbc28bd3e remote: ├── express@4.12.3 remote: ├── nodemon@1.3.7 remote: └── socket.io@1.3.5 remote: remote: -----> Discovering process types remote: Procfile declares types -> web remote: remote: -----> Compressing... done, 12.3MB remote: -----> Launching... done, v3 remote: https://peaceful-caverns-9339.herokuapp.com/ deployed to Heroku remote: remote: Verifying deploy... done. To https://git.heroku.com/peaceful-caverns-9339.git * [new branch] master -> master Scale the App The application is deployed, but now we need to make sure that Heroku assign resources to it. heroku ps:scale web=1 The above command instructs Heroku to scale your app so that one instance of it is running. You should now be able to open a browser and go to the URL Heroku mentioned at the end of the deployment step. In my case that would be https://peaceful-caverns-9339.herokuapp.com/. There is a convenience method that helps you in that regard. The heroku open command will open the registered URL in your default browser. Inspect the Logs If you followed along and open the application you would know that at this point you would have been greeted by an application error: So what did go wrong? Let's find out by inspecting the logs. Issue the following command: heroku logs To see the available logs. Below you find an excerpt: 2015-05-11T14:29:37.193792+00:00 heroku[api]: Enable Logplex by daan.v.berkel.1980+trash@gmail.com 2015-05-11T14:29:37.193792+00:00 heroku[api]: Release v2 created by daan.v.berkel.1980+trash@gmail.com 2015-05-12T08:47:13.899422+00:00 heroku[api]: Deploy ee12c7d by daan.v.berkel.1980+trash@gmail.com 2015-05-12T08:47:13.848408+00:00 heroku[api]: Scale to web=1 by daan.v.berkel.1980+trash@gmail.com 2015-05-12T08:47:13.899422+00:00 heroku[api]: Release v3 created by daan.v.berkel.1980+trash@gmail.com 2015-05-12T08:47:16.548876+00:00 heroku[web.1]: Starting process with command `node server.js` 2015-05-12T08:47:18.142479+00:00 app[web.1]: Recommending WEB_CONCURRENCY=1 2015-05-12T08:47:18.142456+00:00 app[web.1]: Detected 512 MB available memory, 512 MB limit per process (WEB_MEMORY) 2015-05-12T08:47:18.676440+00:00 app[web.1]: Listening on http://:::3000 2015-05-12T08:48:17.132841+00:00 heroku[web.1]: Error R10 (Boot timeout) -> Web process failed to bind to $PORT within 60 seconds of launch 2015-05-12T08:48:17.132841+00:00 heroku[web.1]: Stopping process with SIGKILL 2015-05-12T08:48:18.006812+00:00 heroku[web.1]: Process exited with status 137 2015-05-12T08:48:18.014854+00:00 heroku[web.1]: State changed from starting to crashed 2015-05-12T08:48:18.015764+00:00 heroku[web.1]: State changed from crashed to starting 2015-05-12T08:48:19.731467+00:00 heroku[web.1]: Starting process with command `node server.js` 2015-05-12T08:48:21.328988+00:00 app[web.1]: Detected 512 MB available memory, 512 MB limit per process (WEB_MEMORY) 2015-05-12T08:48:21.329000+00:00 app[web.1]: Recommending WEB_CONCURRENCY=1 2015-05-12T08:48:21.790446+00:00 app[web.1]: Listening on http://:::3000 2015-05-12T08:49:20.337591+00:00 heroku[web.1]: Error R10 (Boot timeout) -> Web process failed to bind to $PORT within 60 seconds of launch 2015-05-12T08:49:20.337739+00:00 heroku[web.1]: Stopping process with SIGKILL 2015-05-12T08:49:21.301823+00:00 heroku[web.1]: State changed from starting to crashed 2015-05-12T08:49:21.290974+00:00 heroku[web.1]: Process exited with status 137 2015-05-12T08:57:58.529222+00:00 heroku[router]: at=error code=H10 desc="App crashed" method=GET path="/" host=peaceful-caverns-9339.herokuapp.com request_id=50cfbc6c-0561-4862-9254-d085043cb610 fwd="87.213.160.18" dyno= connect= service= status=503 bytes= 2015-05-12T08:57:59.066974+00:00 heroku[router]: at=error code=H10 desc="App crashed" method=GET path="/favicon.ico" host=peaceful-caverns-9339.herokuapp.com request_id=608a9f0f-c2a7-45f7-8f94-2ce2f5cd1ff7 fwd="87.213.160.18" dyno= connect= service= status=503 bytes= 2015-05-12T11:10:09.538209+00:00 heroku[web.1]: State changed from crashed to starting 2015-05-12T11:10:11.968702+00:00 heroku[web.1]: Starting process with command `node server.js` 2015-05-12T11:10:13.905318+00:00 app[web.1]: Detected 512 MB available memory, 512 MB limit per process (WEB_MEMORY) 2015-05-12T11:10:13.905338+00:00 app[web.1]: Recommending WEB_CONCURRENCY=1 2015-05-12T11:10:14.509612+00:00 app[web.1]: Listening on http://:::3000 2015-05-12T11:11:12.622517+00:00 heroku[web.1]: Error R10 (Boot timeout) -> Web process failed to bind to $PORT within 60 seconds of launch 2015-05-12T11:11:12.622876+00:00 heroku[web.1]: Stopping process with SIGKILL 2015-05-12T11:11:13.668749+00:00 heroku[web.1]: Process exited with status 137 2015-05-12T11:11:13.677915+00:00 heroku[web.1]: State changed from starting to crashed Analyzing the Problem While looking at the log we see that the application got deployed and scaled properly. 2015-05-12T08:47:13.899422+00:00 heroku[api]: Deploy ee12c7d by daan.v.berkel.1980+trash@gmail.com 2015-05-12T08:47:13.848408+00:00 heroku[api]: Scale to web=1 by daan.v.berkel.1980+trash@gmail It then tries to run node server.js: 2015-05-12T08:48:19.731467+00:00 heroku[web.1]: Starting process with command `node server.js` This succeeds because we see the expected Listening on message: 2015-05-12T08:48:21.790446+00:00 app[web.1]: Listening on http://:::3000 Unfortunately, it all breaks down after that. 2015-05-12T08:49:20.337591+00:00 heroku[web.1]: Error R10 (Boot timeout) -> Web process failed to bind to $PORT within 60 seconds of launch It retries starting the application, but eventually it gives up. The problem is that we hard-coded our application server to listen on port `3000`, but Heroku expects an other port. Heroku communicates the port to use with the `PORT` environment variable. Using Environment Variables In order to start our application correctly we need to use the environment variable PORT that Heroku provides. We can do that by opening server.js and going to line 15: server.listen(3000, function(){ var host = server.address().address; var port = server.address().port; console.log('Listening on http://%s:%s', host, port); }); This snippet will start the server and it will listening on port 3000. We need to change that value so that it will use the environment variable PORT. This is done with the following code: server.listen(process.env.PORT || 3000, function(){ var host = server.address().address; var port = server.address().port; console.log('Listening on http://%s:%s', host, port); }); process.env.PORT || 3000 will use the PORT environment variable if it is set and will default to port 3000, e.g. for testing purposes. Re-deploy Application We need to deploy our code changes to Heroku. This is done with the following set of commands. git add server.js git commit -m "use PORT environment variable" git push heroku master The first two commands at the changes in server.js to the repository. The third updates the tracked repository with these changes. This triggers Heroku to try and restart the application anew. If you now inspect the log with heroku logs you will see that the application is successfully started. 2015-05-12T12:22:15.829584+00:00 heroku[api]: Deploy 9a2cac8 by daan.v.berkel.1980+trash@gmail.com 2015-05-12T12:22:15.829584+00:00 heroku[api]: Release v4 created by daan.v.berkel.1980+trash@gmail.com 2015-05-12T12:22:17.325749+00:00 heroku[web.1]: State changed from crashed to starting 2015-05-12T12:22:19.613648+00:00 heroku[web.1]: Starting process with command `node server.js` 2015-05-12T12:22:21.503756+00:00 app[web.1]: Recommending WEB_CONCURRENCY=1 2015-05-12T12:22:21.503733+00:00 app[web.1]: Detected 512 MB available memory, 512 MB limit per process (WEB_MEMORY) 2015-05-12T12:22:22.118797+00:00 app[web.1]: Listening on http://:::10926 2015-05-12T12:22:23.355206+00:00 heroku[web.1]: State changed from starting to up Tag Time If you now open the application in your default browser with heroku open, you should be greeted by the game of Tag. If you move your mouse around in the Tag square you will see your circle trying to chase it. You can now invite other people to play on the same address and soon you will have a real game of Tag on your hands. Conclusion We have seen that Heroku provides an easy to use Platform as a Service, that can be used to deploy your game server on with the help of the Heroku toolbelt. About the author Daan van Berkel is an enthusiastic software craftsman with a knack for presenting technical details in a clear and concise manner. Driven by the desire for understanding complex matters, Daan is always on the lookout for innovative uses of software.
Read more
  • 0
  • 0
  • 30633

article-image-edx-e-learning-course-marketing
Packt
05 Jun 2015
9 min read
Save for later

edX E-Learning Course Marketing

Packt
05 Jun 2015
9 min read
In this article by Matthew A. Gilbert, the author of edX E-Learning Course Development, we are going to learn various ways of marketing. (For more resources related to this topic, see here.) edX's marketing options If you don't market your course, you might not get any new students to teach. Fortunately, edX provides you with an array of tools for this purpose, as follows: Creative Submission Tool: Submit the assets required for creating a page in your edX course using the Creative Submission Tool. You can also use those very materials in promoting the course. Access the Creative Submission Tool at https://edx.projectrequest.net/index.php/request. Logo and the Media Kit: Although these are intended for members of the media, you can also use the edX Media Kit for your promotional purposes: you can download high-resolution photos, edX logo visual guidelines (in Adobe Illustrator and EPS versions), key facts about edX, and answers to frequently asked questions. You can also contact the press office for additional information. You can find the edX Media Kit online at https://www.edx.org/media-kit. edX Learner Stories: Using stories of students who have succeeded with other edX courses is a compelling way to market the potential of your course. Using Tumblr, edX Learner Stories offers more than a dozen student profiles. You might want to use their stories directly or use them as a template for marketing materials of your own. Read edX Learner Stories at http://edxstories.tumblr.com. Social media marketing Traditional marketing tools and the options available in the edX Marketing Portal are a fitting first step in promoting your course. However, social media gives you a tremendously enhanced toolkit you can use to attract, convert, and transform spectators into students. When marketing your course with social media, you will also simultaneously create a digital footprint for yourself. This in turn helps establish your subject matter expertise far beyond one edX course. What's more, you won't be alone; there exists a large community of edX instructors and students, including those from other MOOC platforms already online. Take, for example, the following screenshot from edX's Twitter account (@edxonline). edX has embraced social media as a means of marketing and to create a practicing virtual community for those creating and taking their courses. Likewise, edX also actively maintains a page on Facebook, as follows: You can also see how active edX's YouTube channel is in the following screenshot. Note that there are both educational and promotional videos. To get you started in social media—if you're not already there—take a look at the list of 12 social media tools, as follows. Not all of these tools might be relevant to your needs, but consider the suggestions to decide how you might best use them, and give them a try: Facebook (https://www.facebook.com): Create a fan page for your edX course; you can re-use content from your course's About page such as your course intro video, course description, course image, and any other relevant materials. Be sure to include a link from the Facebook page for your course to its About page. Look for ways to share other content from your course (or related to your course) in a way that engages members of your fan page. Use your Facebook page to generate interest and answer questions from potential students. You might also consider creating a Facebook group. This can be more useful for current students to share knowledge during the class and to network once it's complete. Visit edX on Facebook at https://www.facebook.com/edX. Google+ (https://plus.google.com): Take the same approach as you did with your Facebook fan page. While this is not as engaging as Facebook, you might find that posting content on Google+ increases traffic to your course's About page due to the increased referrals you are likely to experience via Google search results. Add edX to your circles on Google+ at https://plus.google.com/+edXOnline/posts. Instagram (https://instagram.com): Share behind-the-scenes pictures of you and your staff for your course. Show your students what a day in your life is like, making sure to use a unique hashtag for your course. Picture the possibilities with edX on Instagram at https://instagram.com/edxonline/. LinkedIn (https://www.linkedin.com): Share information about your course in relevant LinkedIn groups, and post public updates about it in your personal account. Again, make sure you include a unique hashtag for your course and a link to the About page. Connect with edX on LinkedIn at https://www.linkedin.com/company/edx. Pinterest (https://www.pinterest.com): Share photos as with Instagram, but also consider sharing infographics about your course's subject matter or share infographics or imagers you use in your actual course as well. You might consider creating pin boards for each course, or one per pin board per module in a course. Pin edX onto your Pinterest pin board at https://www.pinterest.com/edxonline/. Slideshare (http://www.slideshare.net): If you want to share your subject matter expertise and thought leadership with a wider audience, Slideshare is a great platform to use. You can easily post your PowerPoint presentations, class documents or scholarly papers, infographics, and videos from your course or another topic. All of these can then be shared across other social media platforms. Review presentations from or about edX courses on Slideshare at http://www.slideshare.net/search/slideshow?searchfrom=header&q=edx. SoundCloud (https://soundcloud.com): With SoundCloud, you can share MP3 files of your course lectures or create podcasts related to your areas of expertise. Your work can be shared on Twitter, Tumblr, Facebook, and Foursquare, expanding your influence and audience exponentially. Listen to some audio content from Harvard University at https://soundcloud.com/harvard. Tumblr (https://www.tumblr.com): Resembling what the child of WordPress and Twitter might be like, Tumblr provides a platform to share behind-the-scenes text, photos, quotes, links, chat, audios, and videos of your edX course and the people who make it possible. Share a "day in the life" or document in real time, an interactive history of each edX course you teach. Read edX's learner stories at http://edxstories.tumblr.com. Twitter (https://twitter.com): Although messages on Twitter are limited to 140 characters, one tweet can have a big impact. For a faculty wanting to promote its edX course, it is an efficient and cost-effective option. Tweet course videos, samples of content, links to other curriculum, or promotional material. Engage with other educators who teach courses and retweet posts from academic institutions. Follow edX on Twitter at https://twitter.com/edxonline. You might also consider subscribing to edX's Twitter list of edX instructors at https://twitter.com/edXOnline/lists/edx-professors-teachers, and explore the Twitter accounts of edX courses by subscribing to that list at https://twitter.com/edXOnline/lists/edx-course-handles. Vine (https://vine.co): A short-format video service owned by Twitter, Vine provides you with 6 seconds to share your creativity, either in a continuous stream or smaller segments linked together like stop motion. You might create a vine showing the inner working of the course faculty and staff, or maybe even ask short questions related to the course content and invite people to reply with answers. Watch vines about MOOCs at https://vine.co. WordPress: WordPress gives you two options to manage and share content with students. With WordPress.com (https://wordpress.com), you're given a selection of standardized templates to use on a hosted platform. You have limited control but reasonable flexibility and limited, if any, expenses. With Wordpress.org (https://wordpress.org), you have more control but you need to host it on your own web server, which requires some technical know-how. The choice is yours. Read posts on edX on the MIT Open Matters blog on Wordpress.com at https://mitopencourseware.wordpress.com/category/edx/. YouTube (https://www.youtube.com): YouTube is the heart of your edX course. It's the core of your curriculum and the anchor of engagement for your students. When promoting your course, use existing videos from your curriculum in your social media campaigns, but identify opportunities to record short videos specifically for promoting your course. Watch course videos and promotional content on the edX YouTube channel at https://www.youtube.com/user/EdXOnline. Personal branding basics Additionally, whether the impact of your effort is immediately evident or not, your social media presence powers your personal brand as a professor. Why is that important? Read on to know. With the possible exception of marketing professors, most educators likely tend to think more about creating and teaching their course than promoting it—or themselves. Traditionally, that made sense, but it isn't practical in today's digitally connected world. Social media opens an area of influence where all educators—especially those teaching an edX course—should be participating. Unfortunately, many professors don't know where or how to start with social media. If you're teaching a course on edX, or even edX Edge, you will likely have some kind of marketing support from your university or edX. But if you are just in an organization using edX Code, or simply want to promote yourself and your edX course, you might be on your own. One option to get you started with social media is the Babb Group, a provider of resources and consulting for online professors, business owners, and real-estate investors. Its founder and CEO, Dani Babb (PhD), says this: "Social media helps you show that you are an expert in a given field. It is an important tool today to help you get hired, earn promotions, and increase your visibility." The Babb Group offers five packages focused on different social media platforms: Twitter, LinkedIn, Facebook, Twitter and Facebook, or Twitter with Facebook and LinkedIn. You can view the Babb Group's social media marketing packages at http://www.thebabbgroup.com/social-media-profiles-for-professors.html. Connect with Dani Babb on LinkedIn at https://www.linkedin.com/in/drdanibabb or on Twitter at https://twitter.com/danibabb Summary In this article, we tackled traditional marketing tools, identified options available from edX, discussed social media marketing, and explored personal branding basics. Resources for Article: Further resources on this subject: Constructing Common UI Widgets [article] Getting Started with Odoo Development [article] MODx Web Development: Creating Lists [article]
Read more
  • 0
  • 0
  • 5369
Unlock access to the largest independent learning library in Tech for FREE!
Get unlimited access to 7500+ expert-authored eBooks and video courses covering every tech area you can think of.
Renews at $19.99/month. Cancel anytime
article-image-splunk-web-framework
Packt
04 Jun 2015
10 min read
Save for later

The Splunk Web Framework

Packt
04 Jun 2015
10 min read
In this article by the author, Kyle Smith, of the book, Splunk Developer's Guide, we learn about search-related and view-related modules. We will be covering the following topics: Search-related modules View-related modules (For more resources related to this topic, see here.) Search-related modules Let's talk JavaScript modules. For each module, we will review their primary purpose, their module path, the default variable used in an HTML dashboard, and the JavaScript instantiation of the module. We will also cover which attributes are required and which are optional. SearchManager The SearchManager is a primary driver of any dashboard. This module contains an entire search job, including the query, properties, and the actual dispatch of the job. Let's instantiate an object, and dissect the options from this sample code: Module Path: splunkjs/mvc/searchmanager Default Variable: SearchManager JavaScript Object instantiation    Var mySearchManager = new SearchManager({        id: "search1",        earliest_time: "-24h@h",        latest_time: "now",        preview: true,        cache: false,        search: "index=_internal | stats count by sourcetype"    }, {tokens: true, tokenNamespace: "submitted"}); The only required property is the id property. This is a reference ID that will be used to access this object from other instantiated objects later in the development of the page. It is best to name it something concise, yet descriptive with no spaces. The search property is optional, and contains the SPL query that will be dispatched from the module. Make sure to escape any quotes properly, if not, you may cause a JavaScript exception. earliest_time and latest_time are time modifiers that restrict the range of the events. At the end of the options object, notice the second object with token references. This is what automatically executes the search. Without these options, you would have to trigger the search manually. There are a few other properties shown, but you can refer to the actual documentation at the main documentation page http://docs.splunk.com/DocumentationStatic/WebFramework/1.1/compref_searchmanager.html. SearchManagers are set to autostart on page load. To prevent this, set autostart to false in the options. SavedSearchManager The SavedSearchManager is very similar in operation to the SearchManager, but works with a saved report, instead of an ad hoc query. The advantage to using a SavedSearchManager is in performance. If the report is scheduled, you can configure the SavedSearchManager to use the previously run jobs to load the data. If any other user runs the report within Splunk, the SavedSearchManager can reuse that user's results in the manager to boost performance. Let's take a look at a few sections of our code: Module Path: splunkjs/mvc/savedsearchmanager Default Variable: SavedSearchManager JavaScript Object instantiation        Var mySavedSearchManager = new SavedSearchManager({            id: "savedsearch1",        searchname: "Saved Report 1"            "dispatch.earliest_time": "-24h@h",            "dispatch.latest_time": "now",            preview: true,            cache: true        }); The only two required properties are id and searchname. Both of those must be present in order for this manager to run correctly. The other options are very similar to the SearchManager, except for the dispatch options. The SearchManager has the option "earliest_time", whereas the SavedSearchManager uses the option "dispatch.earliest_time". They both have the same restriction but are named differently. The additional options are listed in the main documentation page available at http://docs.splunk.com/DocumentationStatic/WebFramework/1.1/compref_savedsearchmanager.html. PostProcessManager The PostProcessManager does just that, post processes the results of a main search. This works in the same way as the post processing done in SimpleXML; a main search to load the event set, and a secondary search to perform an additional analysis and transformation. Using this manager has its own performance considerations as well. By loading a single job first, and then performing additional commands on those results, you avoid having concurrent searches for the same information. Your usage of CPU and RAM will be less, as you only store one copy of the results, instead of multiple. Module Path: splunkjs/mvc/postprocessmanager Default Variable: PostProcessManager JavaScript Object instantiation        Var mysecondarySearch = new PostProcessManager({            id: "after_search1",        search: "stats count by sourcetype",    managerid: "search1"        }); The property id is the only required property. The module won't do anything when instantiated with only an id property, but you can set it up to populate later. The other options are similar to the SearchManager, the major difference being that the search property in this case is appended to the search property of the manager listed in the managerid property. For example, if the manager search is search index=_internal source=*splunkd.log, and the post process manager search is stats count by host, then the entire search for the post process manager is search index=_internal source=*splunkd.log | stats count by host. The additional options are listed at the main documentation page http://docs.splunk.com/DocumentationStatic/WebFramework/1.1/compref_postprocessmanager.html. View-related modules These modules are related to the views and data visualizations that are native to Splunk. They range in use from charts that display data, to control groups, such as radio groups or dropdowns. These are also included with Splunk and are included by default in the RequireJS declaration. ChartView The ChartView displays a series of data in the formats in the list as follows. Item number one shows an example of how each different chart is described and presented. Each ChartView is instantiated in the same way, the only difference is in what searches are used with which chart. Module Path: splunkjs/mvc/chartview Default Variable: ChartView JavaScript Object instantiation        Var myBarChart = new ChartView({            id: "myBarChart",             managerid: "searchManagerId",            type: "bar",            el: $("#mybarchart")        }); The only required property is the id property. This assigns the object an id that can be later referenced as needed. The el option refers to the HTML element in the page that this view will be assigned and created within. The managerid relates to an existing search, saved search, or post process manager object. The results are passed from the manager into the chart view and displayed as indicated. Each chart view can be customized extensively using the charting.* properties. For example, charting.chart.overlayFields, when set to a comma separated list of field names, will overlay those fields over the chart of other data, making it possible to display SLA times over the top of Customer Service Metrics. The full list of configurable options can be found at the following link: http://docs.splunk.com/Documentation/Splunk/latest/Viz/ChartConfigurationReference. The different types of ChartView Now that we've introduced the ChartView module, let's look at the different types of charts that are built-in. This section has been presented in the following format: Name of Chart Short description of the chart type Type property for use in the JavaScript configuration Example chart command that can be displayed with this chart type Example image of the chart The different ChartView types we will cover in this section include the following: Area The area chart is similar to the line chart, and compares quantitative data. The graph is filled with color to show volume. This is commonly used to show statistics of data over time. An example of an area chart is as follows: timechart span=1h max(results.collection1{}.meh_clicks) as MehClicks max(results.collection1{}.visitors) as Visits Bar The bar chart is similar to the column chart, except that the x and y axes have been switched, and the bars run horizontally and not vertically. The bar chart is used to compare different categories. An example of a bar chart is as follows: stats max(results.collection1{}.visitors) as Visits max(results.collection1{}.meh_clicks) as MehClicks by results.collection1{}.title.text Column The column chart is similar to the bar chart, but the bars are displayed vertically. An example of a column chart is as follows: timechart span=1h avg(DPS) as "Difference in Products Sold" Filler gauge The filler gauge is a Splunk-provided visualization. It is intended for single values, normally as a percentage, but can be adjusted to use discrete values as well. The gauge uses different colors for different ranges of values, by default using green, yellow, and red, in that order. These colors can also be changed using the charting.* properties. One of the differences between this gauge and the other single value gauges is that it shows both the color and value close together, whereas the others do not. An example of a filler gauge chart is as follows: eval diff = results.collection1{}.meh_clicks / results.collection1{}.visitors * 100 | stats latest(diff) as D Line The line chart is similar to the area chart but does not fill the area under the line. This chart can be used to display discrete measurements over time. An example of a line chart is as follows: timechart span=1h max(results.collection1{}.meh_clicks) as MehClicks max(results.collection1{}.visitors) as Visits Marker gauge The marker gauge is a Splunk native visualization intended for use with a single value. Normally this will be a percentage of a value, but can be adjusted as needed. The gauge uses different colors for different ranges of values, by default using green, yellow, and red, in that order. These colors can also be changed using the charting.* properties. An example of a marker gauge chart is as follows: eval diff = results.collection1{}.meh_clicks / results.collection1{}.visitors * 100 | stats latest(diff) as D Pie Chart A pie chart is useful for displaying percentages. It gives you the ability to quickly see which part of the "pie" is disproportionate to the others. Actual measurements may not be relevant. An example of a pie chart is as follows: top op_action Radial gauge The radial gauge is another single value chart provided by Splunk. It is normally used to show percentages, but can be adjusted to show discrete values. The gauge uses different colors for different ranges of values, by default using green, yellow, and red, in that order. These colors can also be changed using the charting.* properties. An example of a radial gauge is as follows: eval diff = MC / V * 100 | stats latest(diff) as D Scatter The scatter plot can plot two sets of data on an x and y axis chart (Cartesian coordinates). This chart is primarily time independent, and is useful for finding correlations (but not necessarily causation) in data. An example of a scatter plot is as follows: table MehClicks Visitors Summary We covered some deeper elements of Splunk applications and visualizations. We reviewed each of the SplunkJS modules, how to instantiate them, and gave an example of each search-related modules and view-related modules. Resources for Article: Further resources on this subject: Introducing Splunk [article] Lookups [article] Loading data, creating an app, and adding dashboards and reports in Splunk [article]
Read more
  • 0
  • 0
  • 3523

article-image-upgrading-vmware-virtual-infrastructure-setups
Packt
04 Jun 2015
13 min read
Save for later

Upgrading VMware Virtual Infrastructure Setups

Packt
04 Jun 2015
13 min read
In this article by Kunal Kumar and Christian Stankowic, authors of the book VMware vSphere Essentials, you will learn how to correctly upgrade VMware virtual infrastructure setups. (For more resources related to this topic, see here.) This article will cover the following topics: Prerequisites and preparations Upgrading vCenter Server Upgrading ESXi hosts Additional steps after upgrading An example scenario Let's start with a realistic scenario that is often found in data centers these days. I assume that your virtual infrastructure consists of components such as: Multiple VMware ESXi hosts Shared storage (NFS or Fibre-channel) VMware vCenter Server and vSphere Update Manager In this example, a cluster consisting of two ESXi hosts (esxi1 and esxi2) is running VMware ESXi 5.5. On a virtual machine (vc1), a Microsoft Windows Server system is running vCenter Server and vSphere Update Manager (vUM) 5.5. This article is written as a step-by-step guide to upgrade these particular vSphere components to the most recent version, which is 6.0. Example scenario consisting of two ESXi hosts with shared storage and vCenter Server Prerequisites and preparations Before we start the upgrade, we need to fulfill the following prerequisites: Ensure ESXi version support by the hardware vendor Gurarantee ESXi version support on used hardware by VMware Create a backup of the ESXi images and vCenter Server First of all, we need to refer to our hardware vendor's support matrix to ensure that our physical hosts running VMware ESXi are supported in the new release. Hardware vendors evaluate their systems before approving upgrades to customers. As an example, Dell offers a comprehensive list for their PowerEdge servers at http://topics-cdn.dell.com/pdf/vmware-esxi-6.x_Reference%20Guide2_en-us.pdf. Here are some additional links for alternative hardware vendors: Hewlett-Packard: http://h17007.www1.hp.com/us/en/enterprise/servers/supportmatrix/vmware.aspx IBM: http://www-03.ibm.com/systems/info/x86servers/serverproven/compat/us/nos/vmware.html Cisco UCS: http://www.cisco.com/web/techdoc/ucs/interoperability/matrix/matrix.html When using Fibre-channel-based storage systems, you might also need to ensure fulfilling that vendor's support matrix. Please check out your vendor's website or contact support for this information. VMware also offers a comprehensive list of tested hardware setups at http://www.vmware.com/resources/compatibility/pdf/vi_systems_guide.pdf. In their Compatibility Guide portal, VMware enabled customers to browse for particular server systems—this information might be more recent than the aforementioned PDF file. Creating a backup of ESXi Before upgrading our ESXi hosts, we also need to make sure that we have a valid backup. In case things go wrong, we might need this backup to restore the previous ESXi version. For creating a backup of the hard disk ESXi is installed on, there are a plenty of tools in the market that implement image-based backups. One possible solution, which is free, is Clonezilla. Clonezilla is a Linux-based live medium that can easily create backup images of hard disks. To create a backup using Clonezilla, proceed with the following steps: Download the Clonezilla ISO image from their website. Make sure you select the AMD64 architecture and the ISO file format. Enable maintenance mode for the particular ESXi host. Make sure you migrate virtual machines to alternative nodes or power them off. Connect the ISO file to the ESXi host and boot from CD. Also, connect a USB drive to the host. This drive will be used to store the backup. Boot from CD and select Clonezilla live. Wait until the boot process completes. When prompted, select your keyboard layout (for example, en_US.utf8) and select Don't touch keymap. In the Start Clonezilla menu, select Start_Clonezilla and device-image. This mode creates an image of the medium ESXi is running on and stores it in the USB storage. Select local_dev and choose the USB storage connected to the host from the list in the next step. Select a folder for storing the backup (optional). Select Beginner and savedisk to store the entire disk ESXi resides on as an image. Enter a name for the backup. Select the hard disk containing the ESXi installation and proceed. You can also specify whether Clonezilla should check the image after creating it (highly recommended). Afterwards, confirm the backup process. The backup job will start immediately. Once the backup completes, select reboot from the menu to reboot the host. A running backup job in Clonezilla To restore a backup using Clonezilla, perform the following steps after booting the Clonezilla media: Complete steps 1 to 8 from the previous guide. Select Beginner and restoredisk to restore the entire disk. Select the image from the USB storage and the hard drive the image should be restored on. Acknowledge the restore process. Once the restoration completes, select reboot from the menu to reboot the host. For the system running vCenter Server, we can easily create a VM snapshot, or also use Clonezilla if a physical machine is used instead. The upgrade path It is very important to execute the particular upgrade tasks in the following order: Upgrade VMware vCenter Server Upgrade the particular ESXi hosts Reformat or upgrade the VMFS data stores (if applicable) Upgrading additional components, such as distributed virtual switches, or additional appliances The first step is to upgrade vCenter Server. This is necessary to ensure that we are able to manage our ESXi hosts after upgrading them. Newer vCenter Server versions are downward compatible with numerous ESXi versions. To double-check this, we can look up the particular version support by browsing VMware's Product Interoperability Matrix on their website. Click on Solution Interoperability, choose VMware vCenter Server from the drop-down menu, and select the version you want to upgrade to. In our example, we will choose the most recent release, 6.0, and select VMware ESX/ESXi from the Add Platform/Solution drop-down menu. VMware Product Interoperability Matrix for vCenter Server and ESXi vCenter Server 6.0 supports management of VMware ESXi 5.0 and higher. We need to ensure the same support agreement for any other used products, such as these: VMware vSphere Update Manager VMware vCenter Operations (if applicable) VMware vSphere Data Protection In other words, we need to upgrade all additional vSphere and vCenter Server components to ensure full functionality. Upgrading vCenter Server Upgrading vCenter Server is the most crucial step, as this is our central management platform. The upgrade process varies according to the chosen architecture. Upgrading Windows-based vCenter Server installations is quite easy, as the installation supports in-place upgrades. When using the vCenter Server Appliance (vCSA), there is no in-place upgrade; it is necessary to deploy a new vCSA and import the settings from the old installation. This process varies between the particular vCSA versions. For upgrading from vCSA 5.0 or 5.1 to 5.5, VMware offers a comprehensive article at http://kb.vmware.com/kb/2058441. To upgrade vCenter Server 5.x on Windows to 6.0 using the Easy Install method, proceed with the following steps: Mount the vCenter Server 6.x installation media (VMware-VIMSetup-all-6.0.0-xxx.iso) on the server running vCenter Server. Wait until the installation wizard starts; if it doesn't start, double-click on the CD/DVD icon in Windows Explorer. Select vCenter Server for Windows and click on Install to start the installation utility. Accept the End-User License Agreement (EULA). Enter the current vCenter Single-Sign-On password and proceed with the next step. The installation utility begins to execute pre-upgrade checks; this might take some time. If you're running vCenter Server along with Microsoft SQL Server Express Edition, the database will be migrated to VMware vPostgres. Review and change (if necessary) the network ports of your vCenter Server installation. If needed, change the directories for vCenter Server and the Embedded Platform Controller (ESC). Carefully review the upgrade information displayed in the wizard. Also verify that you have created a backup of your system and the database. Then click on Upgrade to start the upgrade. After the upgrade, vSphere Web Client can be used to connect to the upgraded vCenter Server system. Also note that the Microsoft SQL Server Express Edition database is not used anymore. Upgrading ESXi hosts Upgrading ESXi hosts can be done using two methods: Using the installation media from the VMware website vSphere Update Manager If you need to upgrade a large number of ESXi hosts, I recommend that you use vSphere Update Manager to save time, as it can automate the particular steps. For smaller landscapes, using the installation media is easier. For using vUM to upgrade ESXi hosts, VMware offers a guide on their knowledge base at http://kb.vmware.com/kb/1019545. In order to upgrade an ESXi host using the installation media, perform the following steps: First of all, enable maintenance mode for the particular ESXi host. Make sure you migrate the virtual machines to alternative nodes or power them off. Connect the installation media to the ESXi host and boot from CD. Once the setup utility becomes available, press Enter to start the installation wizard. Accept the End-User License Agreement (EULA) by pressing F11. Select the disk containing the current ESXi installation. In the ESXi found dialog, select Upgrade. Review the installation information and press F11 to start the upgrade. After the installation completes, press Enter to reboot the system. After the system has rebooted, it will automatically reconnect to vCenter Server. Select the particular ESXi host to see whether the version has changed. In this example, the ESXi host has been successfully upgraded to version 6.0: Version information of an updated ESXi host running release 6.0 Repeat all of these steps for all the remaining ESXi hosts. Note that running an ESXi cluster with mixed versions should only be a temporary solution. It is not recommended to mix various ESXi releases in production usage, as the various features of ESXi might not perform as expected in mixed clusters. Additional steps After upgrading vCenter Server and our ESXi hosts, there are additional steps that can be done: Reformating or upgrading VMFS data stores Upgrading distributed virtual switches Upgrading virtual machine's hardware versions Upgrading VMFS data stores VMware's VMFS (Virtual Machine Filesystem) is the most used filesystem for shared storage. It can be used along with local storage, iSCSI, or Fibre-channel storage. Particularly, ESX(i) releases support various versions of VMFS. Let's take a look at the major differences:   VMFS 2   VMFS 3   VMFS 5   Supported by ESX 2.x, ESXi 3.x/4.x (read-only) ESX(i) 3.x and higher ESXi 5.x and higher Block size(s) 1, 8, 64, or 256 MB 1, 2, 4, or 8 MB 1 MB (fixed) Maximum file size 1 MB block size: 456 MB 8 MB block size: 2.5 TB 64 MB block size: 28.5 TB 256 MB block size: 64 TB 1 MB block size: 256 MB 2 MB block size: 512 GB 4 MB block size: 1 TB 8 MB block size: 2 TB 62 TB Files per volume Ca. 256 (no directories supported) Ca. 37,720 Ca. 130,690 When migrating from an ESXi version such as 4.x or older, it is possible to upgrade VMFS data stores to version 5. VMFS 2 cannot be upgraded to VMFS 5; it first needs to be upgraded to VMFS 3. To enable the upgrade, a VMFS 2 volume must not have a block size more than 8 MB, as VMFS 3 only supports block sizes up to 8 MB. In comparison with older VMFS versions, VMFS 5 supports larger file sizes and more files per volume. I highly recommend that you reformat VMFS data stores instead of upgrading them, as the upgrade does not change the filesystem's block size. Because of this limitation, you won't benefit from all the new VMFS 5 features after an upgrade. To upgrade a VMFS 3 volume to VMFS 5, perform these steps: Log in to vSphere Web Client. Go to the Storage pane. Click on the data store to upgrade and go to Settings under the Manage tab. Click on Upgrade to VMFS5. Then click on OK to start the upgrade. VMware vNetwork Distributed Switch When using vNetwork Distributed Switches (also often called dvSwitches) it is recommended to perform an upgrade to the latest version. In comparison with vNetwork Standard Switches (also called vSwitches), dvSwitches are created at the vCenter Server level and replicated to all subscribed ESXi hosts. When creating a dvSwitch, the administrator can choose between various dvSwitch versions. After upgrading vCenter Server and the ESXi hosts, additional features can be unlocked by upgrading the dvSwitch. Let's take a look at some commonly used dvSwitch versions:   vDS 5.0   vDS 5.1   vDS 5.5   vDS 6.0   Compatible with ESXi 5.0 and higher ESXi 5.1 and higher ESXi 5.5 and higher ESXi 6.0 Common features Network I/O Control, load-based teaming, traffic shaping, VM port blocking, PVLANs (private VLANs), network vMotion, and port policies Additional features Network resource pools, NetFlow, and port mirroring VDS 5.0 +, management network rollback, network health checks, enhanced port mirroring, and LACP (Link Aggregation Control Protocol) VDS 5.1 +, traffic filtering, and enhanced LACP functionality VDS 5.5 +, multicast snooping, and Network I/O Control version 3 (bandwidth guarantee) It is also possible to use the old version furthermore, as vCenter Server is downward compatible with numerous dvSwitch versions. Upgrading a dvSwitch is a task that cannot be undone. During the upgrade, it is possible that virtual machines will lose their network connectivity for some seconds. After the upgrade, older ESXi hosts will not be able to participate in the distributed switch setup. To upgrade a dvSwitch, perform the following steps: Log in to vSphere Web Client. Go to the Networking pane and select the dvSwitch to upgrade. Lorem..... After upgrading the dvSwitch, you will notice that the version has changed: Version information of a dvSwitch running VDS 6.0 Virtual machine hardware version Every virtual machine is created with a virtual machine hardware version specified (also called VMHW or vHW). A vHW version defines a set of particular limitations and features, such as controller types or network cards. To benefit from the new virtual machine features, it is sufficient to upgrade vHW versions. ESXi hosts support a range of vHW versions, but it is always advisable to use the most recent vHW version. Once a vHW version is upgraded, particular virtual machines cannot be started on older ESXi versions that don't support the vHW version. Let's take a deeper look at some popular vHW versions:   vSphere 4.1   vSphere 5.1   vSphere 5.5   vSphere 6.0   Maximum vHW 7 9 10 11 Virtual CPUs 8 64 128 Virtual RAM 255 GB 1 TB 4 TB vDisk size 2 TB 62 TB SCSI adapters / targets 4/60 SATA adapters / targets Not supported 4/30 Parallel / Serial Ports 3/4 3/32 USB controllers / devices per VM 1/20 (USB 1.x + 2.x) 1/20 (USB 1.x, 2.x + 3.x) The upgrade cannot be undone. Also, it might be necessary to update VMware Tools and the drivers of the operating system running in the virtual machine. Summary In this article we learnt how to correctly upgrade VMware virtual infrastructure setups. If you want to know more about VMware vSphere and virtual infrastructure setups, go ahead and get your copy of Packt Publishing's book VMware vSphere Essentials. Resources for Article: Further resources on this subject: Networking [article] The Design Documentation [article] VMware View 5 Desktop Virtualization [article]
Read more
  • 0
  • 0
  • 7692

article-image-predicting-hospital-readmission-expense-using-cascading
Packt
04 Jun 2015
10 min read
Save for later

Predicting Hospital Readmission Expense Using Cascading

Packt
04 Jun 2015
10 min read
In this article by Michael Covert, author of the book Learning Cascading, we will look at a system that allows for health care providers to create complex predictive models that can assess who is most at risk for such readmission using Cascading. (For more resources related to this topic, see here.) Overview Hospital readmission is an event that health care providers are attempting to reduce, and it is the primary target of new regulations of the Affordable Care Act, passed by the US government. A readmission is defined as any reentry to a hospital 30 days or less from a prior discharge. The financial impact of this is that US Medicare and Medicaid will either not pay or will reduce the payment made to hospitals for expenses incurred. By the end of 2014, over 2600 hospitals will incur these losses from a Medicare and Medicaid tab that is thought to exceed $24 billion annually. Hospitals are seeking to find ways to predict when a patient is susceptible to readmission so that actions can be taken to fully treat the patient before discharge. Many of them are using big data and machine learning-based predictive analytics. One such predictive engine is MedPredict from Analytics Inside, a company based in Westerville, Ohio. MedPredict is the predictive modeling component of the MedMiner suite of health care products. These products use Concurrent Cascading products to perform nightly rescoring of inpatients using a highly customizable calculation known as LACE, which stands for the following: Length of stay: This refers to the number of days a patient been in hospital Acute admissions through emergency department: This refers to whether a patient has arrived through the ER Comorbidities: A comorbidity refers to the presence of a two or more individual conditions in a patient. Each condition is designated by a diagnosis code. Diagnosis codes can also indicate complications and severity of a condition. In LACE, certain conditions are associated with the probability of readmission through statistical analysis. For instance, a diagnosis of AIDS, COPD, diabetes, and so on will each increase the probability of readmission. So, each diagnosis code is assigned points, with other points indicating "seriousness" of the condition. Diagnosis codes: These refer to the International Classification of Disease codes. Version 9 (ICD-9) and now version 10 (ICD-10) standards are available as well. Emergency visits: This refers to the number of emergency room visits the patient has made in a particular window of time. The LACE engine looks at a patient's history and computes a score that is a predictor of readmissions. In order to compute the comorbidity score, the Charlson Comorbidity Index (CCI) calculation is used. It is a statistical calculation that factors in the age and complexity of the patient's condition. Using Cascading to control predictive modeling The full data workflow to compute the probability of readmissions is as follows: Read all hospital records and reformat them into patient records, diagnosis records, and discharge records. Read all data related to patient diagnosis and diagnosis records, that is, ICD-9/10, date of diagnosis, complications, and so on. Read all tracked diagnosis records and join them with patient data to produce a diagnosis (comorbidity) score by summing up comorbidity "points". Read all data related to patient admissions, that is, records associated with admission and discharge, length of stay, hospital, admittance location, stay type, and so on. Read patient profile record, that is, age, race, gender, ethnicity, eye color, body mass indicator, and so on. Compute all intermediate scores for age, emergency visits, and comorbidities. Calculate the LACE score (refer to Figure 2). Assign a date and time to it. Take all the patient information, as mentioned in the preceding points, and run it through MedPredict to produce these variety of metrics: Expected length of stay Expected expense Expected outcome Probability of readmission Figure 1 – The data workflow The Cascading LACE engine The calculational aspects of computing LACE scores makes it ideal for Cascading as a series of reusable subassemblies. Firstly, the extraction, transformation, and loading (ETL) of patient data is complex and costly. Secondly, the calculations are data-intensive. The CCI alone has to examine a patient's medical history and must find all matching diagnosis codes (such as ICD-9 or ICD-10) to assign a score. This score must be augmented by the patient's age, and lastly, a patient's inpatient discharge records must be examined for admittance to the ER as well as emergency room visits. Also, many hospitals desire to customize these calculations. The LACE engine supports and facilitates this since scores are adjustable at the diagnosis code level, and MedPredict automatically produces metrics about how significant an individual feature is to the resulting score. Medical data is quite complex too. For instance, the particular diagnosis codes that represent cancer are many, and their meanings are quite nuanced. In some cases, metastasis (spreading of cancer to other locations in the body) may have occurred, and this is treated as a more severe situation. In other situations, measured values may be "bucketed", so this implies that we track the number of emergency room visits over 1 year, 6 months, 90 days, and 30 days. The Cascading LACE engine performs these calculations easily. It is customized through a set of hospital supplied parameters, and it has the capability to perform full calculations nightly due to its usage of Hadoop. Using this capability, a patient's record can track the full history of the LACE index over time. Additionally, different sets of LACE indices can be computed simultaneously, maybe one used for diabetes, the other for Chronic Obstructive Pulmonary Disorder (COPD), and so on. Figure 2 – The LACE subassembly MedPredict tracking The Lace engine metrics feed into MedPredict along with many other variables cited previously. These records are rescored nightly and the patient history is updated. This patient history is then used to analyze trends and generate alerts when the patient is showing an increased likelihood of variance to the desired metric values. What Cascading does for us We chose Cascading to help reduce the complexity of our development efforts. MapReduce provided us with the scalability that we desired, but we found that we were developing massive amounts of code to do so. Reusability was difficult, and the Java code library was becoming large. By shifting to Cascading, we found that we could encapsulate our code better and achieve significantly greater reusability. Additionally, we reduced complexity as well. The Cascading API provides simplification and understandability, which accelerates our development velocity metrics and also reduces bugs and maintenance cycles. We allow Cascading to control the end-to-end workflow of these nightly calculations. It handles preprocessing and formatting of data. Then, it handles running these calculations in parallel, allowing high speed hash joins to be performed, and also for each leg of the calculation to be split into a parallel pipe. Next, all these calculations are merged and the final score is produced. The last step is to analyze the patient trends and generate alerts where potential problems are likely to occur. Cascading has allowed us to produce a reusable assembly that is highly parameterized, thereby allowing hospitals to customize their usage. Not only can thresholds, scores, and bucket sizes be varied, but if it's desired, additional information could be included for things, such as medical procedures performed on the patient. The local mode of Cascading allows for easy testing, and it also provides a scaled down version that can be run against a small number of patients. However, by using Cascading in the Hadoop mode, massive scalability can be achieved against very large patient populations and ICD-9/10 code sets. Concurrent also provides an excellent framework for predictive modeling using machine learning through its Pattern component. MedPredict uses this to integrate its predictive engine, which is written using Cascading, MapReduce, and Mahout. Pattern provides an interface for the integration of other external analysis products through the exchange of Predictive Model Markup Language (PMML), an XML dialect that allows many of the MedPredict proprietary machine learning algorithms to be directly incorporated into the full Cascading LACE workflow. MedPredict then produces a variety of predictive metrics in a single pass of the data. The LACE scores (current and historical trends) are used as features for these predictions. Additionally, Concurrent provides a product called Driven that greatly reduces the development cycle time for such large, complex applications. Their lingual product provides seamless integration with relational databases, which is also key to enterprise integration. Results Numerous studies have now been performed using LACE risk estimates. Many hospitals have shown the ability to reduce readmission rates by 5-10 percent due to early intervention and specific guidance given to a patient as a result of an elevated LACE score. Other studies are examining the efficacy of additional metrics, and of segmentation of the patients into better identifying groups, such as heart failure, cancer, diabetes, and so on. Additional effort is being put in to study the ability of modifying the values of the comorbidity scores, taking into account combinations and complications. In some cases, even more dramatic improvements have taken place using these techniques. For up-to-date information, search for LACE readmissions, which will provide current information about implementations and results. Analytics Inside LLC Analytics Inside is based in Westerville, Ohio. It was founded in 2005 and specializes in advanced analytical solutions and services. Analytics Inside produces the RelMiner family of relationship mining systems. These systems are based on machine learning, big data, graph theories, data visualizations, and Natural Language Processing (NLP). For further information, visit our website at http://www.AnalyticsInside.us, or e-mail us at info@AnalyticsInside.us. MedMiner Advanced Analytics for Health Care is an integrated software system designed to help an organization or patient care team in the following ways: Predicting the outcomes of patient cases and tracking these predictions over time Generating alerts based on patient case trends that will help direct remediation Complying better with ARRA value-based purchasing and meaningful use guidelines Providing management dashboards that can be used to set guidelines and track performance Tracking performance of drug usage, interactions, potentials for drug diversion, and pharmaceutical fraud Extracting medical information contained within text documents Designating data security is a key design point PHI can be hidden through external linkages, so data exchange is not required If PHI is required, it is kept safe through heavy encryption, virus scanning, and data isolation Using both cloud-based and on premise capabilities to meet client needs Concurrent Inc. Concurrent Inc. is the leader in big data application infrastructure, delivering products that help enterprises create, deploy, run, and manage data applications at scale. The company's flagship enterprise solution, Driven, was designed to accelerate the development and management of enterprise data applications. Concurrent is the team behind Cascading, the most widely deployed technology for data applications with more than 175,000 user downloads a month. Used by thousands of businesses, including eBay, Etsy, The Climate Corporation, and Twitter, Cascading is the defacto standard in open source application infrastructure technology. Concurrent is headquartered in San Francisco and can be found online at http://concurrentinc.com. Summary Hospital readmission is an event that health care providers are attempting to reduce, and it is a primary target of new regulation from the Affordable Care Act, passed by the US government. This article describes a system that allows for health care providers to create complex predictive models that can assess who is most at risk for such readmission using Cascading. Resources for Article: Further resources on this subject: Hadoop Monitoring and its aspects [article] Introduction to Hadoop [article] YARN and Hadoop [article]
Read more
  • 0
  • 0
  • 1933

article-image-data-analysis-using-r
Packt
04 Jun 2015
17 min read
Save for later

Data Analysis Using R

Packt
04 Jun 2015
17 min read
In this article by Viswa Viswanathan and Shanthi Viswanathan, the authors of the book R Data Analysis Cookbook, we discover how R can be used in various ways such as comparison, classification, applying different functions, and so on. We will cover the following recipes: Creating charts that facilitate comparisons Building, plotting, and evaluating – classification trees Using time series objects Applying functions to subsets of a vector (For more resources related to this topic, see here.) Creating charts that facilitate comparisons In large datasets, we often gain good insights by examining how different segments behave. The similarities and differences can reveal interesting patterns. This recipe shows how to create graphs that enable such comparisons. Getting ready If you have not already done so, download the code files and save the daily-bike-rentals.csv file in your R working directory. Read the data into R using the following command: > bike <- read.csv("daily-bike-rentals.csv") > bike$season <- factor(bike$season, levels = c(1,2,3,4),   labels = c("Spring", "Summer", "Fall", "Winter")) > attach(bike) How to do it... We base this recipe on the task of generating histograms to facilitate the comparison of bike rentals by season. Using base plotting system We first look at how to generate histograms of the count of daily bike rentals by season using R's base plotting system: Set up a 2 X 2 grid for plotting histograms for the four seasons: > par(mfrow = c(2,2)) Extract data for the seasons: > spring <- subset(bike, season == "Spring")$cnt > summer <- subset(bike, season == "Summer")$cnt > fall <- subset(bike, season == "Fall")$cnt > winter <- subset(bike, season == "Winter")$cnt Plot the histogram and density for each season: > hist(spring, prob=TRUE,   xlab = "Spring daily rentals", main = "") > lines(density(spring)) >  > hist(summer, prob=TRUE,   xlab = "Summer daily rentals", main = "") > lines(density(summer)) >  > hist(fall, prob=TRUE,   xlab = "Fall daily rentals", main = "") > lines(density(fall)) >  > hist(winter, prob=TRUE,   xlab = "Winter daily rentals", main = "") > lines(density(winter)) You get the following output that facilitates comparisons across the seasons: Using ggplot2 We can achieve much of the preceding results in a single command: > qplot(cnt, data = bike) + facet_wrap(~ season, nrow=2) +   geom_histogram(fill = "blue") You can also combine all four into a single histogram and show the seasonal differences through coloring: > qplot(cnt, data = bike, fill = season) How it works... When you plot a single variable with qplot, you get a histogram by default. Adding facet enables you to generate one histogram per level of the chosen facet. By default, the four histograms will be arranged in a single row. Use facet_wrap to change this. There's more... You can use ggplot2 to generate comparative boxplots as well. Creating boxplots with ggplot2 Instead of the default histogram, you can get a boxplot with either of the following two approaches: > qplot(season, cnt, data = bike, geom = c("boxplot"), fill = season) >  > ggplot(bike, aes(x = season, y = cnt)) + geom_boxplot() The preceding code produces the following output: The second line of the preceding code produces the following plot: Building, plotting, and evaluating – classification trees You can use a couple of R packages to build classification trees. Under the hood, they all do the same thing. Getting ready If you do not already have the rpart, rpart.plot, and caret packages, install them now. Download the data files and place the banknote-authentication.csv file in your R working directory. How to do it... This recipe shows you how you can use the rpart package to build classification trees and the rpart.plot package to generate nice-looking tree diagrams: Load the rpart, rpart.plot, and caret packages: > library(rpart) > library(rpart.plot) > library(caret) Read the data: > bn <- read.csv("banknote-authentication.csv") Create data partitions. We need two partitions—training and validation. Rather than copying the data into the partitions, we will just keep the indices of the cases that represent the training cases and subset as and when needed: > set.seed(1000) > train.idx <- createDataPartition(bn$class, p = 0.7, list = FALSE) Build the tree: > mod <- rpart(class ~ ., data = bn[train.idx, ], method = "class", control = rpart.control(minsplit = 20, cp = 0.01)) View the text output (your result could differ if you did not set the random seed as in step 3): > mod n= 961   node), split, n, loss, yval, (yprob)      * denotes terminal node   1) root 961 423 0 (0.55983351 0.44016649)    2) variance>=0.321235 511 52 0 (0.89823875 0.10176125)      4) curtosis>=-4.3856 482 29 0 (0.93983402 0.06016598)        8) variance>=0.92009 413 10 0 (0.97578692 0.02421308) *        9) variance< 0.92009 69 19 0 (0.72463768 0.27536232)        18) entropy< -0.167685 52   6 0 (0.88461538 0.11538462) *        19) entropy>=-0.167685 17   4 1 (0.23529412 0.76470588) *      5) curtosis< -4.3856 29   6 1 (0.20689655 0.79310345)      10) variance>=2.3098 7   1 0 (0.85714286 0.14285714) *      11) variance< 2.3098 22   0 1 (0.00000000 1.00000000) *    3) variance< 0.321235 450 79 1 (0.17555556 0.82444444)      6) skew>=6.83375 76 18 0 (0.76315789 0.23684211)      12) variance>=-3.4449 57   0 0 (1.00000000 0.00000000) *      13) variance< -3.4449 19   1 1 (0.05263158 0.94736842) *      7) skew< 6.83375 374 21 1 (0.05614973 0.94385027)      14) curtosis>=6.21865 106 16 1 (0.15094340 0.84905660)        28) skew>=-3.16705 16   0 0 (1.00000000 0.00000000) *       29) skew< -3.16705 90   0 1 (0.00000000 1.00000000) *      15) curtosis< 6.21865 268   5 1 (0.01865672 0.98134328) * Generate a diagram of the tree (your tree might differ if you did not set the random seed as in step 3): > prp(mod, type = 2, extra = 104, nn = TRUE, fallen.leaves = TRUE, faclen = 4, varlen = 8, shadow.col = "gray") The following output is obtained as a result of the preceding command: Prune the tree: > # First see the cptable > # !!Note!!: Your table can be different because of the > # random aspect in cross-validation > mod$cptable            CP nsplit rel error   xerror       xstd 1 0.69030733     0 1.00000000 1.0000000 0.03637971 2 0.09456265     1 0.30969267 0.3262411 0.02570025 3 0.04018913     2 0.21513002 0.2387707 0.02247542 4 0.01891253     4 0.13475177 0.1607565 0.01879222 5 0.01182033     6 0.09692671 0.1347518 0.01731090 6 0.01063830     7 0.08510638 0.1323877 0.01716786 7 0.01000000     9 0.06382979 0.1276596 0.01687712   > # Choose CP value as the highest value whose > # xerror is not greater than minimum xerror + xstd > # With the above data that happens to be > # the fifth one, 0.01182033 > # Your values could be different because of random > # sampling > mod.pruned = prune(mod, mod$cptable[5, "CP"]) View the pruned tree (your tree will look different): > prp(mod.pruned, type = 2, extra = 104, nn = TRUE, fallen.leaves = TRUE, faclen = 4, varlen = 8, shadow.col = "gray") Use the pruned model to predict for a validation partition (note the minus sign before train.idx to consider the cases in the validation partition): > pred.pruned <- predict(mod, bn[-train.idx,], type = "class") Generate the error/classification-confusion matrix: > table(bn[-train.idx,]$class, pred.pruned, dnn = c("Actual", "Predicted"))      Predicted Actual   0   1      0 213 11      1 11 176 How it works... Steps 1 to 3 load the packages, read the data, and identify the cases in the training partition, respectively. In step 3, we set the random seed so that your results should match those that we display. Step 4 builds the classification tree model: > mod <- rpart(class ~ ., data = bn[train.idx, ], method = "class", control = rpart.control(minsplit = 20, cp = 0.01)) The rpart() function builds the tree model based on the following:   Formula specifying the dependent and independent variables   Dataset to use   A specification through method="class" that we want to build a classification tree (as opposed to a regression tree)   Control parameters specified through the control = rpart.control() setting; here we have indicated that the tree should only consider nodes with at least 20 cases for splitting and use the complexity parameter value of 0.01—these two values represent the defaults and we have included these just for illustration Step 5 produces a textual display of the results. Step 6 uses the prp() function of the rpart.plot package to produce a nice-looking plot of the tree: > prp(mod, type = 2, extra = 104, nn = TRUE, fallen.leaves = TRUE, faclen = 4, varlen = 8, shadow.col = "gray")   use type=2 to get a plot with every node labeled and with the split label below the node   use extra=4 to display the probability of each class in the node (conditioned on the node and hence summing to 1); add 100 (hence extra=104) to display the number of cases in the node as a percentage of the total number of cases   use nn = TRUE to display the node numbers; the root node is node number 1 and node n has child nodes numbered 2n and 2n+1   use fallen.leaves=TRUE to display all leaf nodes at the bottom of the graph   use faclen to abbreviate class names in the nodes to a specific maximum length   use varlen to abbreviate variable names   use shadow.col to specify the color of the shadow that each node casts Step 7 prunes the tree to reduce the chance that the model too closely models the training data—that is, to reduce overfitting. Within this step, we first look at the complexity table generated through cross-validation. We then use the table to determine the cutoff complexity level as the largest xerror (cross-validation error) value that is not greater than one standard deviation above the minimum cross-validation error. Steps 8 through 10 display the pruned tree; use the pruned tree to predict the class for the validation partition and then generate the error matrix for the validation partition. There's more... We discuss in the following an important variation on predictions using classification trees. Computing raw probabilities We can generate probabilities in place of classifications by specifying type="prob": > pred.pruned <- predict(mod, bn[-train.idx,], type = "prob") Create the ROC Chart Using the preceding raw probabilities and the class labels, we can generate a ROC chart: > pred <- prediction(pred.pruned[,2], bn[-train.idx,"class"]) > perf <- performance(pred, "tpr", "fpr") > plot(perf) Using time series objects In this recipe, we look at various features to create and plot time-series objects. We will consider data with both a single and multiple time series. Getting ready If you have not already downloaded the data files, do it now and ensure that the files are in your R working directory. How to do it... Read the data. The file has 100 rows and a single column named sales: > s <- read.csv("ts-example.csv") Convert the data to a simplistic time series object without any explicit notion of time: > s.ts <- ts(s) > class(s.ts) [1] "ts" Plot the time series: > plot(s.ts) Create a proper time series object with proper time points: > s.ts.a <- ts(s, start = 2002) > s.ts.a Time Series: Start = 2002 End = 2101 Frequency = 1        sales [1,]   51 [2,]   56 [3,]   37 [4,]   101 [5,]   66 (output truncated) > plot(s.ts.a) > # results show that R treated this as an annual > # time series with 2002 as the starting year The result of the preceding commands is seen in the following graph: To create a monthly time series run the following command: > # Create a monthly time series > s.ts.m <- ts(s, start = c(2002,1), frequency = 12) > s.ts.m        Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec 2002 51 56 37 101 66 63 45 68 70 107 86 102 2003 90 102 79 95 95 101 128 109 139 119 124 116 2004 106 100 114 133 119 114 125 167 149 165 135 152 2005 155 167 169 192 170 180 175 207 164 204 180 203 2006 215 222 205 202 203 209 200 199 218 221 225 212 2007 250 219 242 241 267 249 253 242 251 279 298 260 2008 269 257 279 273 275 314 288 286 290 288 304 291 2009 314 290 312 319 334 307 315 321 339 348 323 342 2010 340 348 354 291 > plot(s.ts.m) # note x axis on plot The following plot can be seen as a result of the preceding commands: > # Specify frequency = 4 for quarterly data > s.ts.q <- ts(s, start = 2002, frequency = 4) > s.ts.q        Qtr1 Qtr2 Qtr3 Qtr4 2002   51   56   37 101 2003   66   63   45   68 2004   70 107   86 102 2005   90 102   79   95 2006   95 101 128 109 (output truncated) > plot(s.ts.q) Query time series objects (we use the s.ts.m object we created in the previous step): > # When does the series start? > start(s.ts.m) [1] 2002   1 > # When does it end? > end(s.ts.m) [1] 2010   4 > # What is the frequency? > frequency(s.ts.m) [1] 12 Create a time series object with multiple time series. This data file contains US monthly consumer prices for white flour and unleaded gas for the years 1980 through 2014 (downloaded from the website of the US Bureau of Labor Statistics): > prices <- read.csv("prices.csv") > prices.ts <- ts(prices, start=c(1980,1), frequency = 12) Plot a time series object with multiple time series: > plot(prices.ts) The plot in two separate panels appears as follows: > # Plot both series in one panel with suitable legend > plot(prices.ts, plot.type = "single", col = 1:2) > legend("topleft", colnames(prices.ts), col = 1:2, lty = 1) Two series plotted in one panel appear as follow: How it works... Step 1 reads the data. Step 2 uses the ts function to generate a time series object based on the raw data. Step 3 uses the plot function to generate a line plot of the time series. We see that the time axis does not provide much information. Time series objects can represent time in more friendly terms. Step 4 shows how to create time series objects with a better notion of time. It shows how we can treat a data series as an annual, monthly, or quarterly time series. The start and frequency parameters help us to control these data series. Although the time series we provide is just a list of sequential values, in reality our data can have an implicit notion of time attached to it. For example, the data can be annual numbers, monthly numbers, or quarterly ones (or something else, such as 10-second observations of something). Given just the raw numbers (as in our data file, ts-example.csv), the ts function cannot figure out the time aspect and by default assumes no secondary time interval at all. We can use the frequency parameter to tell ts how to interpret the time aspect of the data. The frequency parameter controls how many secondary time intervals there are in one major time interval. If we do not explicitly specify it, by default frequency takes on a value of 1. Thus, the following code treats the data as an annual sequence, starting in 2002: > s.ts.a <- ts(s, start = 2002) The following code, on the other hand, treats the data as a monthly time series, starting in January 2002. If we specify the start parameter as a number, then R treats it as starting at the first subperiod, if any, of the specified start period. When we specify frequency as different from 1, then the start parameter can be a vector such as c(2002,1) to specify the series, the major period, and the subperiod where the series starts. c(2002,1) represent January 2002: > s.ts.m <- ts(s, start = c(2002,1), frequency = 12) Similarly, the following code treats the data as a quarterly sequence, starting in the first quarter of 2002: > s.ts.q <- ts(s, start = 2002, frequency = 4) The frequency values of 12 and 4 have a special meaning—they represent monthly and quarterly time sequences. We can supply start and end, just one of them, or none. If we do not specify either, then R treats the start as 1 and figures out end based on the number of data points. If we supply one, then R figures out the other based on the number of data points. While start and end do not play a role in computations, frequency plays a big role in determining seasonality, which captures periodic fluctuations. If we have some other specialized time series, we can specify the frequency parameter appropriately. Here are two examples:   With measurements taken every 10 minutes and seasonality pegged to the hour, we should specify frequency as 6   With measurements taken every 10 minutes and seasonality pegged to the day, use frequency = 24*6 (6 measurements per hour times 24 hours per day) Step 5 shows the use of the functions start, end, and frequency to query time series objects. Steps 6 and 7 show that R can handle data files that contain multiple time series. Applying functions to subsets of a vector The tapply function applies a function to each partition of the dataset. Hence, when we need to evaluate a function over subsets of a vector defined by a factor, tapply comes in handy. Getting ready Download the files and store the auto-mpg.csv file in your R working directory. Read the data and create factors for the cylinders variable: > auto <- read.csv("auto-mpg.csv", stringsAsFactors=FALSE) > auto$cylinders <- factor(auto$cylinders, levels = c(3,4,5,6,8),   labels = c("3cyl", "4cyl", "5cyl", "6cyl", "8cyl")) How to do it... To apply functions to subsets of a vector, follow these steps: Calculate mean mpg for each cylinder type: > tapply(auto$mpg,auto$cylinders,mean)      3cyl     4cyl     5cyl     6cyl     8cyl 20.55000 29.28676 27.36667 19.98571 14.96311 We can even specify multiple factors as a list. The following example shows only one factor since the out file has only one, but it serves as a template that you can adapt: > tapply(auto$mpg,list(cyl=auto$cylinders),mean)   cyl    3cyl     4cyl     5cyl     6cyl     8cyl 20.55000 29.28676 27.36667 19.98571 14.96311 How it works... In step 1 the mean function is applied to the auto$mpg vector grouped according to the auto$cylinders vector. The grouping factor should be of the same length as the input vector so that each element of the first vector can be associated with a group. The tapply function creates groups of the first argument based on each element's group affiliation as defined by the second argument and passes each group to the user-specified function. Step 2 shows that we can actually group by several factors specified as a list. In this case, tapply applies the function to each unique combination of the specified factors. There's more... The by function is similar to tapply and applies the function to a group of rows in a dataset, but by passing in the entire data frame. The following examples clarify this. Applying a function on groups from a data frame In the following example, we find the correlation between mpg and weight for each cylinder type: > by(auto, auto$cylinders, function(x) cor(x$mpg, x$weight)) auto$cylinders: 3cyl [1] 0.6191685 --------------------------------------------------- auto$cylinders: 4cyl [1] -0.5430774 --------------------------------------------------- auto$cylinders: 5cyl [1] -0.04750808 --------------------------------------------------- auto$cylinders: 6cyl [1] -0.4634435 --------------------------------------------------- auto$cylinders: 8cyl [1] -0.5569099 Summary Being an extensible system, R's functionality is divided across numerous packages with each one exposing large numbers of functions. Even experienced users cannot expect to remember all the details off the top of their head. In this article, we went through a few techniques using which R helps analyze data and visualize the results. Resources for Article: Further resources on this subject: Combining Vector and Raster Datasets [article] Factor variables in R [article] Big Data Analysis (R and Hadoop) [article]
Read more
  • 0
  • 0
  • 3583
article-image-installing-openstack-swift
Packt
04 Jun 2015
10 min read
Save for later

Installing OpenStack Swift

Packt
04 Jun 2015
10 min read
In this article by Amar Kapadia, Sreedhar Varma, and Kris Rajana, authors of the book OpenStack Object Storage (Swift) Essentials, we will see how IT administrators can install OpenStack Swift. The version discussed here is the Juno release of OpenStack. Installation of Swift has several steps and requires careful planning before beginning the process. A simple installation consists of installing all Swift components on a single node, and a complex installation consists of installing Swift on several proxy server nodes and storage server nodes. The number of storage nodes can be in the order of thousands across multiple zones and regions. Depending on your installation, you need to decide on the number of proxy server nodes and storage server nodes that you will configure. This article demonstrates a manual installation process; advanced users may want to use utilities such as Puppet or Chef to simplify the process. This article walks you through an OpenStack Swift cluster installation that contains one proxy server and five storage servers. (For more resources related to this topic, see here.) Hardware planning This section describes the various hardware components involved in the setup. Since Swift deals with object storage, disks are going to be a major part of hardware planning. The size and number of disks required should be calculated based on your requirements. Networking is also an important component, where factors such as a public or private network and a separate network for communication between storage servers need to be planned. Network throughput of at least 1 GB per second is suggested, while 10 GB per second is recommended. The servers we set up as proxy and storage servers are dual quad-core servers with 12 GB of RAM. In our setup, we have a total of 15 x 2 TB disks for Swift storage; this gives us a total size of 30 TB. However, with in-built replication (with a default replica count of 3), Swift maintains three copies of the same data. Therefore, the effective capacity for storing files and objects is approximately 10 TB, taking filesystem overhead into consideration. This is further reduced due to less than 100 percent utilization. The following figure depicts the nodes of our Swift cluster configuration: The storage servers have container, object, and account services running in them. Server setup and network configuration All the servers are installed with the Ubuntu server operating system (64-bit LTS version 14.04). You'll need to configure three networks, which are as follows: Public network: The proxy server connects to this network. This network provides public access to the API endpoints within the proxy server. Storage network: This is a private network and it is not accessible to the outside world. All the storage servers and the proxy server will connect to this network. Communication between the proxy server and the storage servers and communication between the storage servers take place within this network. In our configuration, the IP addresses assigned in this network are 172.168.10.0 and 172.168.10.99. Replication network: This is also a private network that is not accessible to the outside world. It is dedicated to replication traffic, and only storage servers connect to it. All replication-related communication between storage servers takes place within this network. In our configuration, the IP addresses assigned in this network are 172.168.9.0 and 172.168.9.99. This network is optional, and if it is set up, the traffic on it needs to be monitored closely. Pre-installation steps In order for various servers to communicate easily, edit the /etc/hosts file and add the host names of each server in it. This has to be done on all the nodes. The following screenshot shows an example of the contents of the /etc/hosts file of the proxy server node: Install the Network Time Protocol (NTP) service on the proxy server node and storage server nodes. This helps all the nodes to synchronize their services effectively without any clock delays. The pre-installation steps to be performed are as follows: Run the following command to install the NTP service: # apt-get install ntp Configure the proxy server node to be the reference server for the storage server nodes to set their time from the proxy server node. Make sure that the following line is present in /etc/ntp.conf for NTP configuration in the proxy server node: server ntp.ubuntu.com For NTP configuration in the storage server nodes, add the following line to /etc/ntp.conf. Comment out the remaining lines with server addresses such as 0.ubuntu.pool.ntp.org, 1.ubuntu.pool.ntp.org, 2.ubuntu.pool.ntp.org, and 3.ubuntu.pool.ntp.org: # server 0.ubuntu.pool.ntp.org# server 1.ubuntu.pool.ntp.org# server 2.ubuntu.pool.ntp.org# server 3.ubuntu.pool.ntp.orgserver s-swift-proxy Restart the NTP service on each server with the following command: # service ntp restart Downloading and installing Swift The Ubuntu Cloud Archive is a special repository that provides users with the ability to install new releases of OpenStack. The steps required to download and install Swift are as follows: Enable the capability to install new releases of OpenStack, and install the latest version of Swift on each node using the following commands. The second command shown here creates a file named cloudarchive-juno.list in /etc/apt/sources.list.d, whose content is "deb http://ubuntu-cloud.archieve.canonical.com/ubuntu": Now, update the OS using the following command: # apt-get update && apt-get dist-upgrade On all the Swift nodes, we will install the prerequisite software and services using this command: # apt-get install swift rsync memcached python-netifaces python-xattr python-memcache Next, we create a Swift folder under /etc and give users the permission to access this folder, using the following commands: # mkdir –p /etc/swift/# chown –R swift:swift /etc/swift Download the /etc/swift/swift.conf file from GitHub using this command: # curl –o /etc/swift/swift.conf https://raw.githubusercontent.com/openstack/swift/stable/juno/etc/swift.conf-sample Modify the /etc/swift/swift.conf file and add a variable called swift_hash_path_suffix in the swift-hash section. We then create a unique hash string using # python –c "from uuid import uuid4; print uuid4()" or # openssl rand –hex 10, and assign it to this variable, as shown in the following configuration option: We then add another variable called swift_hash_path_prefix to the swift-hash section, and assign to it another hash string created using the method described in the preceding step. These strings will be used in the hashing process to determine the mappings in the ring. The swift.conf file should be identical on all the nodes in the cluster. Setting up storage server nodes This section explains additional steps to set up the storage server nodes, which will contain the object, container, and account services. Installing services The first step required to set up the storage server node is installing services. Let's look at the steps involved: On each storage server node, install the packages for swift-account services, swift-container services, swift-object services, and xfsprogs (XFS Filesystem) using this command: # apt-get install swift-account swift-container swift-object xfsprogs Download the account-server.conf, container-server.conf, and object-server.conf samples from GitHub, using the following commands: # curl –o /etc/swift/account-server.conf https://raw.githubusercontent.com/openstack/swift/stable/juno/etc/account-server.conf-sample# curl –o /etc/swift/container-server.conf https://raw.githubusercontent.com/openstack/swift/stable/juno/etc/container-server.conf-sample# curl –o /etc/swift/object-server.conf https://raw.githubusercontent.com/openstack/swift/stable/juno/etc/object-server.conf-sample Edit the /etc/swift/account-server.conf file with the following section: Edit the /etc/swift/container-server.conf file with this section: Edit the /etc/swift/object-server.conf file with the following section: Formatting and mounting hard disks On each storage server node, we need to identify the hard disks that will be used to store the data. We will then format the hard disks and mount them on a directory, which Swift will then use to store data. We will not create any RAID levels or subpartitions on these hard disks because they are not necessary for Swift. They will be used as entire disks. The operating system will be installed on separate disks, which will be RAID configured. First, identify the hard disks that are going to be used for storage and format them. In our storage server, we have identified sdb, sdc, and sdd to be used for storage. We will perform the following operations on sdb. These four steps should be repeated for sdc and sdd as well: Carry out the partitioning for sdb and create the filesystem using this command: # fdisk /dev/sdb# mkfs.xfs /dev/sdb1 Then let's create a directory in /srv/node/sdb1 that will be used to mount the filesystem. Give the permission to the swift user to access this directory. These operations can be performed using the following commands: # mkdir –p /srv/node/sdb1# chown –R swift:swift /srv/node/sdb1 We set up an entry in fstab for the sdb1 partition in the sdb hard disk, as follows. This will automatically mount sdb1 on /srv/node/sdb1 upon every boot. Add the following command line to the /etc/fstab file: /dev/sdb1 /srv/node/sdb1 xfsnoatime,nodiratime,nobarrier,logbufs=8 0 2 Mount sdb1 on /srv/node/sdb1 using the following command: # mount /srv/node/sdb1 RSYNC and RSYNCD In order for Swift to perform the replication of data, we need to configure rsync by configuring rsyncd.conf. This is done by performing the following steps: Create the rsyncd.conf file in the /etc folder with the following content: # vi /etc/rsyncd.conf We are setting up synchronization within the network by including the following lines in the configuration file: 172.168.9.52 is the IP address that is on the replication network for this storage server. Use the appropriate replication network IP addresses for the corresponding storage servers. We then have to edit the /etc/default/rsync file and set RSYNC_ENABLE to true using the following configuration option: RSYNC_ENABLE=true Next, we restart the rsync service using this command: # service rsync restart Then we create the swift, recon, and cache directories using the following commands, and then set its permissions: # mkdir -p /var/cache/swift# mkdir -p /var/swift/recon Setting permissions is done using these commands: # chown -R swift:swift /var/cache/swift# chown -R swift:swift /var/swift/recon Repeat these steps on every storage server. Setting up the proxy server node This section explains the steps required to set up the proxy server node, which are as follows: Install the following services only on the proxy server node: # apt-get install python-swiftclient python-keystoneclientpython-keystonemiddleware swift-proxy Swift doesn't support HTTPS. OpenSSL has already been installed as part of the operating system installation to support HTTPS. We are going to use the OpenStack Keystone service for authentication. In order to set up the proxy-server.conf file for this, we download the configuration file from the following link and edit it: https://raw.githubusercontent.com/openstack/swift/stable/juno/etc/proxy-server.conf-sample# vi /etc/swift/proxy-server.conf The proxy-server.conf file should be edited to get the correct auth_host, admin_token, admin_tenant_name, admin_user, and admin_password values: admin_token = 01d8b673-9ebb-41d2-968a-d2a85daa1324admin_tenant_name = adminadmin_user = adminadmin_password = changeme Next, we create a keystone-signing directory and give permissions to the swift user using the following commands: # mkdir -p /home/swift/keystone-signing# mkdir -R swift:swift /home/swift/keystone-signing Summary In this article, you learned how to install and set up the OpenStack Swift service to provide object storage, and install and set up the Keystone service to provide authentication for users to access the Swift object storage. Resources for Article: Further resources on this subject: Troubleshooting in OpenStack Cloud Computing [Article] Using OpenStack Swift [Article] Playing with Swift [Article]
Read more
  • 0
  • 0
  • 15975

article-image-plotting-haskell
Packt
04 Jun 2015
10 min read
Save for later

Plotting in Haskell

Packt
04 Jun 2015
10 min read
In this article by James Church, author of the book Learning Haskell Data Analysis, we will see the different methods of data analysis by plotting data using Haskell. The other topics that this article covers is using GHCi, scaling data, and comparing stock prices. (For more resources related to this topic, see here.) Can you perform data analysis in Haskell? Yes, and you might even find that you enjoy it. We are going to take a few snippets of Haskell and put some plots of the stock market data together. To get started with, the following software needs to be installed: The Haskell platform (http://www.haskell.org/platform) Gnuplot (http://www.gnuplot.info/) The cabal command-line tool is the tool used to install packages in Haskell. There are three packages that we may need in order to analyze the stock market data. To use cabal, you will use the cabal install [package names] command. Run the following command to install the CSV parsing package, the EasyPlot package, and the Either package: $ cabal install csv easyplot either Once you have the necessary software and packages installed, we are all set for some introductory analysis in Haskell. We need data It is difficult to perform an analysis of data without data. The Internet is rich with sources of data. Since this tutorial looks at the stock market data, we need a source. Visit the Yahoo! Finance website to find the history of every publicly traded stock on the New York Stock Exchange that has been adjusted to reflect splits over time. The good folks at Yahoo! provide this resource in the csv file format. We begin with downloading the entire history of the Apple company from Yahoo! Finance (http://finance.yahoo.com). You can find the content for Apple by performing a quote look up from the Yahoo! Finance home page for the AAPL symbol (that is, 2 As, not 2 Ps). On this page, you can find the link for Historical Prices. On the Historical Prices page, identify the link that says Download to Spreadsheet. The complete link to Apple's historical prices can be found at the following link: http://real-chart.finance.yahoo.com/table.csv?s=AAPL. We should take a moment to explore our dataset. Here are the column headers in the csv file: Date: This is a string that represents the date of a particular date in Apple's history Open: This is the opening value of one share High: This is the high trade value over the course of this day Low: This is the low trade value of the course of this day Close: This is the final price of the share at the end of this trading day Volume: This is the total number of shares traded on this day Adj Close: This is a variation on the closing price that adjusts the dividend payouts and company splits Another feature of this dataset is that each of the rows are written in a table in a chronological reverse order. The most recent date in the table is the first. The oldest is the last. Yahoo! Finance provides this table (Apple's historical prices) under the unhelpful name table.csv. I renamed my csv file aapl.csv, which is provided by Yahoo! Finance. Start GHCi The interactive prompt for Haskell is GHCi. On the command line, type GHCi. We begin with importing our newly installed libraries from the prompt: > import Data.List< > import Text.CSV< > import Data.Either.Combinators< > import Graphics.EasyPlot Parse the csv file that you just downloaded using the parseCSVFromFile command. This command will return an Either type, which represents one of the two things that happened: your file was parsed (Right) or something went wrong (Left). We can inspect the type of our result with the :t command: > eitherErrorOrCells <- parseCSVFromFile "aapl.csv"< > :t eitherErrorOrCells < eitherErrorOrCells :: Either Text.Parsec.Error.ParseError CSV Did we get an error for our result? For this, we are going to use the fromRight and fromLeft commands. Remember, Right is right and Left is wrong. When we run the fromLeft command, we should see this message saying that our content is in the Right: > fromLeft' eitherErrorOrCells < *** Exception: Data.Either.Combinators.fromLeft: Argument takes from 'Right _' Pull the cells of our csv file into cells. We can see the first four rows of our content using take 5 (which will pull our header line and the first four cells): > let cells = fromRight' eitherErrorOrCells< > take 5 cells< [["Date","Open","High","Low","Close","Volume","Adj Close"],["2014-11-10","552.40","560.63","551.62","558.23","1298900","558.23"],["2014-11-07","555.60","555.60","549.35","551.82","1589100","551.82"],["2014-11-06","555.50","556.80","550.58","551.69","1649900","551.69"],["2014-11-05","566.79","566.90","554.15","555.95","1645200","555.95"]] The last column in our csv file is the Adj Close, which is the column we would like to plot. Count the columns (starting with 0), and you will find that Adj Close is number 6. Everything else can be dropped. (Here, we are also using the init function to drop the last row of the data, which is an empty list. Grabbing the 6th element of an empty list will not work in Haskell.): > map (x -> x !! 6) (take 5 (init cells))< ["Adj Close","558.23","551.82","551.69","555.95"] We know that this column represents the adjusted close prices. We should drop our header row. Since we use tail to drop the header row, take 5 returns the first five adjusted close prices: > map (x -> x !! 6) (take 5 (tail (init cells)))< ["558.23","551.82","551.69","555.95","564.19"] We should store all of our adjusted close prices in a value called adjCloseOriginal: > let adjCloseAAPLOriginal = map (x -> x !! 6) (tail (init cells)) These are still raw strings. We need to convert these to a Double type with the read function: > let adjCloseAAPL = map read adjCloseAaplOriginal :: [Double] We are almost done messaging our data. We need to make sure that every value in adjClose is paired with an index position for the purpose of plotting. Remember that our adjusted closes are in a chronological reverse order. This will create a tuple, which can be passed to the plot function: > let aapl = zip (reverse [1.0..length adjCloseAAPL]) adjCloseAAPL< > take 5 aapl < [(2577,558.23),(2576,551.82),(2575,551.69),(2574,555.95),(2573,564.19)] Plotting > plot (PNG "aapl.png") $ Data3D [Title "AAPL"] [] aapl< True The following chart is the result of the preceding command: Open aapl.png, which should be newly created in your current working directory. This is a typical default chart created by EasyPlot. We can see the entire history of the Apple stock price. For most of this history, the adjusted share price was less than $10 per share. At about the 6,000 trading day, we see the quick ascension of the share price to over $100 per share. Most of the time, when we take a look at a share price, we are only interested in the tail portion (say, the last year of changes). Our data is already reversed, so the newest close prices are at the front. There are 252 trading days in a year, so we can take the first 252 elements in our value and plot them. While we are at it, we are going to change the style of the plot to a line plot: > let aapl252 = take 252 aapl< > plot (PNG "aapl_oneyear.png") $ Data2D [Title "AAPL", Style Lines] [] aapl252< True The following chart is the result of the preceding command: Scaling data Looking at the share price of a single company over the course of a year will tell you whether the price is trending upward or downward. While this is good, we can get better information about the growth by scaling the data. To scale a dataset to reflect the percent change, we subtract each value by the first element in the list, divide that by the first element, and then multiply by 100. Here, we create a simple function called percentChange. We then scale the values 100 to 105, using this new function. (Using the :t command is not necessary, but I like to use it to make sure that I have at least the desired type signature correct.): > let percentChange first value = 100.0 * (value - first) / first< > :t percentChange< percentChange :: Fractional a => a -> a -> a< > map (percentChange 100) [100..105]< [0.0,1.0,2.0,3.0,4.0,5.0] We will use this new function to scale our Apple dataset. Our tuple of values can be split using the fst (for the first value containing the index) and snd (for the second value containing the adjusted close) functions: > let firstValue = snd (last aapl252)< > let aapl252scaled = map (pair -> (fst pair, percentChange firstValue (snd pair))) aapl252< > plot (PNG "aapl_oneyear_pc.png") $ Data2D [Title "AAPL PC", Style Lines] [] aapl252scaled< True The following chart is the result of the preceding command: Let's take a look at the preceding chart. Notice that it looks identical to the one we just made, except that the y axis is now changed. The values on the left-hand side of the chart are now the fluctuating percent changes of the stock from a year ago. To the investor, this information is more meaningful. Comparing stock prices Every publicly traded company has a different stock price. When you hear that Company A has a share price of $10 and Company B has a price of $100, there is almost no meaningful content to this statement. We can arrive at a meaningful analysis by plotting the scaled history of the two companies on the same plot. Our Apple dataset uses an index position of the trading day for the x axis. This is fine for a single plot, but in order to combine plots, we need to make sure that all plots start at the same index. In order to prepare our existing data of Apple stock prices, we will adjust our index variable to begin at 0: > let firstIndex = fst (last aapl252scaled)< > let aapl252scaled = map (pair -> (fst pair - firstIndex, percentChange firstValue (snd pair))) aapl252 We will compare Apple to Google. Google uses the symbol GOOGL (spelled Google without the e). I downloaded the history of Google from Yahoo! Finance and performed the same steps that I previously wrote with our Apple dataset: > -- Prep Google for analysis< > eitherErrorOrCells <- parseCSVFromFile "googl.csv"< > let cells = fromRight' eitherErrorOrCells< > let adjCloseGOOGLOriginal = map (x -> x !! 6) (tail (init cells))< > let adjCloseGOOGL = map read adjCloseGOOGLOriginal :: [Double]< > let googl = zip (reverse [1.0..genericLength adjCloseGOOGL]) adjCloseGOOGL< > let googl252 = take 252 googl< > let firstValue = snd (last googl252)< > let firstIndex = fst (last googl252)< > let googl252scaled = map (pair -> (fst pair - firstIndex, percentChange firstValue (snd pair))) googl252 Now, we can plot the share prices of Apple and Google on the same chart, Apple plotted in red and Google plotted in blue: > plot (PNG "aapl_googl.png") [Data2D [Title "AAPL PC", Style Lines, Color Red] [] aapl252scaled, Data2D [Title "GOOGL PC", Style Lines, Color Blue] [] googl252scaled]< True The following chart is the result of the preceding command: You can compare for yourself the growth rate of the stock price for these two competing companies because I believe that the contrast is enough to let the image speak for itself. This type of analysis is useful in the investment strategy known as growth investing. I am not recommending this as a strategy, nor am I recommending either of these two companies for the purpose of an investment. I am recommending Haskell as your language of choice for performing data analysis. Summary In this article, we used data from a csv file and plotted data. The other topics covered in this article were using GHCi and EasyPlot for plotting, scaling data, and comparing stock prices. Resources for Article: Further resources on this subject: The Hunt for Data [article] Getting started with Haskell [article] Driving Visual Analyses with Automobile Data (Python) [article]
Read more
  • 0
  • 0
  • 7936

article-image-preparing-optimizations
Packt
04 Jun 2015
11 min read
Save for later

Preparing Optimizations

Packt
04 Jun 2015
11 min read
In this article by Mayur Pandey and Suyog Sarda, authors of LLVM Cookbook, we will look into the following recipes: Various levels of optimization Writing your own LLVM pass Running your own pass with the opt tool Using another pass in a new pass (For more resources related to this topic, see here.) Once the source code transformation completes, the output is in the LLVM IR form. This IR serves as a common platform for converting into assembly code, depending on the backend. However, before converting into an assembly code, the IR can be optimized to produce more effective code. The IR is in the SSA form, where every new assignment to a variable is a new variable itself—a classic case of an SSA representation. In the LLVM infrastructure, a pass serves the purpose of optimizing LLVM IR. A pass runs over the LLVM IR, processes the IR, analyzes it, identifies the optimization opportunities, and modifies the IR to produce optimized code. The command-line interface opt is used to run optimization passes on LLVM IR. Various levels of optimization There are various levels of optimization, starting at 0 and going up to 3 (there is also s for space optimization). The code gets more and more optimized as the optimization level increases. Let's try to explore the various optimization levels. Getting ready... Various optimization levels can be understood by running the opt command-line interface on LLVM IR. For this, an example C program can first be converted to IR using the Clang frontend. Open an example.c file and write the following code in it: $ vi example.c int main(int argc, char **argv) { int i, j, k, t = 0; for(i = 0; i < 10; i++) {    for(j = 0; j < 10; j++) {      for(k = 0; k < 10; k++) {        t++;      }    }    for(j = 0; j < 10; j++) {      t++;    } } for(i = 0; i < 20; i++) {    for(j = 0; j < 20; j++) {      t++;    }    for(j = 0; j < 20; j++) {      t++;    } } return t; } Now convert this into LLVM IR using the clang command, as shown here: $ clang –S –O0 –emit-llvm example.c A new file, example.ll, will be generated, containing LLVM IR. This file will be used to demonstrate the various optimization levels available. How to do it… Do the following steps: The opt command-line tool can be run on the IR-generated example.ll file: $ opt –O0 –S example.ll The –O0 syntax specifies the least optimization level. Similarly, you can run other optimization levels: $ opt –O1 –S example.ll $ opt –O2 –S example.ll $ opt –O3 –S example.ll How it works… The opt command-line interface takes the example.ll file as the input and runs the series of passes specified in each optimization level. It can repeat some passes in the same optimization level. To see which passes are being used in each optimization level, you have to add the --debug-pass=Structure command-line option with the previous opt commands. See Also To know more on various other options that can be used with the opt tool, refer to http://llvm.org/docs/CommandGuide/opt.html Writing your own LLVM pass All LLVM passes are subclasses of the pass class, and they implement functionality by overriding the virtual methods inherited from pass. LLVM applies a chain of analyses and transformations on the target program. A pass is an instance of the Pass LLVM class. Getting ready Let's see how to write a pass. Let's name the pass function block counter; once done, it will simply display the name of the function and count the basic blocks in that function when run. First, a Makefile needs to be written for the pass. Follow the given steps to write a Makefile: Open a Makefile in the llvm lib/Transform folder: $ vi Makefile Specify the path to the LLVM root folder and the library name, and make this pass a loadable module by specifying it in Makefile, as follows: LEVEL = ../../.. LIBRARYNAME = FuncBlockCount LOADABLE_MODULE = 1 include $(LEVEL)/Makefile.common This Makefile specifies that all the .cpp files in the current directory are to be compiled and linked together in a shared object. How to do it… Do the following steps: Create a new .cpp file called FuncBlockCount.cpp: $ vi FuncBlockCount.cpp In this file, include some header files from LLVM: #include "llvm/Pass.h" #include "llvm/IR/Function.h" #include "llvm/Support/raw_ostream.h" Include the llvm namespace to enable access to LLVM functions: using namespace llvm; Then start with an anonymous namespace: namespace { Next declare the pass: struct FuncBlockCount : public FunctionPass { Then declare the pass identifier, which will be used by LLVM to identify the pass: static char ID; FuncBlockCount() : FunctionPass(ID) {} This step is one of the most important steps in writing a pass—writing a run function. Since this pass inherits FunctionPass and runs on a function, a runOnFunction is defined to be run on a function: bool runOnFunction(Function &F) override {      errs() << "Function " << F.getName() << 'n';      return false;    } }; } This function prints the name of the function that is being processed. The next step is to initialize the pass ID: char FuncBlockCount::ID = 0; Finally, the pass needs to be registered, with a command-line argument and a name: static RegisterPass<FuncBlockCount> X("funcblockcount", "Function Block Count", false, false); Putting everything together, the entire code looks like this: #include "llvm/Pass.h" #include "llvm/IR/Function.h" #include "llvm/Support/raw_ostream.h" using namespace llvm; namespace { struct FuncBlockCount : public FunctionPass { static char ID; FuncBlockCount() : FunctionPass(ID) {} bool runOnFunction(Function &F) override {    errs() << "Function " << F.getName() << 'n';    return false; }            };        }        char FuncBlockCount::ID = 0;        static RegisterPass<FuncBlockCount> X("funcblockcount", "Function Block Count", false, false); How it works A simple gmake command compiles the file, so a new file FuncBlockCount.so is generated at the LLVM root directory. This shared object file can be dynamically loaded to the opt tool to run it on a piece of LLVM IR code. How to load and run it will be demonstrated in the next section. See also To know more on how a pass can be built from scratch, visit http://llvm.org/docs/WritingAnLLVMPass.html Running your own pass with the opt tool The pass written in the previous recipe, Writing your own LLVM pass, is ready to be run on the LLVM IR. This pass needs to be loaded dynamically for the opt tool to recognize and execute it. How to do it… Do the following steps: Write the C test code in the sample.c file, which we will convert into an .ll file in the next step: $ vi sample.c   int foo(int n, int m) { int sum = 0; int c0; for (c0 = n; c0 > 0; c0--) {    int c1 = m;  for (; c1 > 0; c1--) {      sum += c0 > c1 ? 1 : 0;    } } return sum; } Convert the C test code into LLVM IR using the following command: $ clang –O0 –S –emit-llvm sample.c –o sample.ll This will generate a sample.ll file. Run the new pass with the opt tool, as follows: $ opt -load (path_to_.so_file)/FuncBlockCount.so -funcblockcount sample.ll The output will look something like this: Function foo How it works… As seen in the preceding code, the shared object loads dynamically into the opt command-line tool and runs the pass. It goes over the function and displays its name. It does not modify the IR. Further enhancement in the new pass is demonstrated in the next recipe. See also To know more about the various types of the Pass class, visit http://llvm.org/docs/WritingAnLLVMPass.html#pass-classes-and-requirements Using another pass in a new pass A pass may require another pass to get some analysis data, heuristics, or any such information to decide on a further course of action. The pass may just require some analysis such as memory dependencies, or it may require the altered IR as well. The new pass that you just saw simply prints the name of the function. Let's see how to enhance it to count the basic blocks in a loop, which also demonstrates how to use other pass results. Getting ready The code used in the previous recipe remains the same. Some modifications are required, however, to enhance it—as demonstrated in next section—so that it counts the number of basic blocks in the IR. How to do it… The getAnalysis function is used to specify which other pass will be used: Since the new pass will be counting the number of basic blocks, it requires loop information. This is specified using the getAnalysis loop function: LoopInfo *LI = &getAnalysis<LoopInfoWrapperPass>().getLoopInfo(); This will call the LoopInfo pass to get information on the loop. Iterating through this object gives the basic block information: unsigned num_Blocks = 0; Loop::block_iterator bb; for(bb = L->block_begin(); bb != L->block_end();++bb)    num_Blocks++; errs() << "Loop level " << nest << " has " << num_Blocks << " blocksn"; This will go over the loop to count the basic blocks inside it. However, it counts only the basic blocks in the outermost loop. To get information on the innermost loop, recursive calling of the getSubLoops function will help. Putting the logic in a separate function and calling it recursively makes more sense: void countBlocksInLoop(Loop *L, unsigned nest) { unsigned num_Blocks = 0; Loop::block_iterator bb; for(bb = L->block_begin(); bb != L->block_end();++bb)    num_Blocks++; errs() << "Loop level " << nest << " has " << num_Blocks << " blocksn"; std::vector<Loop*> subLoops = L->getSubLoops(); Loop::iterator j, f; for (j = subLoops.begin(), f = subLoops.end(); j != f; ++j)    countBlocksInLoop(*j, nest + 1); } virtual bool runOnFunction(Function &F) { LoopInfo *LI = &getAnalysis<LoopInfoWrapperPass>().getLoopInfo(); errs() << "Function " << F.getName() + "n"; for (Loop *L : *LI)    countBlocksInLoop(L, 0); return false; } How it works… The newly modified pass now needs to run on a sample program. Follow the given steps to modify and run the sample program: Open the sample.c file and replace its content with the following program: int main(int argc, char **argv) { int i, j, k, t = 0; for(i = 0; i < 10; i++) {    for(j = 0; j < 10; j++) {      for(k = 0; k < 10; k++) {        t++;      }    }    for(j = 0; j < 10; j++) {      t++;    } } for(i = 0; i < 20; i++) {    for(j = 0; j < 20; j++) {      t++;    }    for(j = 0; j < 20; j++) {      t++;    } } return t; } Convert it into a .ll file using Clang: $ clang –O0 –S –emit-llvm sample.c –o sample.ll Run the new pass on the previous sample program: $ opt -load (path_to_.so_file)/FuncBlockCount.so - funcblockcount sample.ll The output will look something like this: Function main Loop level 0 has 11 blocks Loop level 1 has 3 blocks Loop level 1 has 3 blocks Loop level 0 has 15 blocks Loop level 1 has 7 blocks Loop level 2 has 3 blocks Loop level 1 has 3 blocks There's more… The LLVM's pass manager provides a debug pass option that gives us the chance to see which passes interact with our analyses and optimizations, as follows: $ opt -load (path_to_.so_file)/FuncBlockCount.so - funcblockcount sample.ll –disable-output –debug-pass=Structure Summary In this article you have explored various optimization levels, and the optimization techniques kicking at each level. We also saw the step-by-step approach to writing our own LLVM pass. Resources for Article: Further resources on this subject: Integrating a D3.js visualization into a simple AngularJS application [article] Getting Up and Running with Cassandra [article] Cassandra Architecture [article]
Read more
  • 0
  • 0
  • 4371
article-image-regex-practice
Packt
04 Jun 2015
24 min read
Save for later

Regex in Practice

Packt
04 Jun 2015
24 min read
Knowing Regex's syntax allows you to model text patterns, but sometimes coming up with a good reliable pattern can be more difficult, so taking a look at some actual use cases can really help you learn some common design patterns. So, in this article by Loiane Groner and Gabriel Manricks, coauthors of the book JavaScript Regular Expressions, we will develop a form, and we will explore the following topics: Validating a name Validating e-mails Validating a Twitter username Validating passwords Validating URLs Manipulating text (For more resources related to this topic, see here.) Regular expressions and form validation By far, one of the most common uses for regular expressions on the frontend is for use with user submitted forms, so this is what we will be building. The form we will be building will have all the common fields, such as name, e-mail, website, and so on, but we will also experiment with some text processing besides all the validations. In real-world applications, you usually are not going to implement the parsing and validation code manually. You can create a regular expression and rely on some JavaScript libraries, such as: jQuery validation: Refer to http://jqueryvalidation.org/ Parsely.js: Refer to http://parsleyjs.org/ Even the most popular frameworks support the usage of regular expressions with its native validation engine, such as AngularJS (refer to http://www.ng-newsletter.com/posts/validations.html). Setting up the form This demo will be for a site that allows users to create an online bio, and as such, consists of different types of fields. However, before we get into this (since we won't be building a backend to handle the form), we are going to setup some HTML and JavaScript code to catch the form submission and extract/validate the data entered in it. To keep the code neat, we will create an array with all the validation functions, and a data object where all the final data will be kept. Here is a basic outline of the HTML code for which we begin by adding fields: <!DOCTYPE HTML> <html>    <head>        <title>Personal Bio Demo</title>    </head>    <body>        <form id="main_form">            <input type="submit" value="Process" />        </form>          <script>            // js goes here        </script>    </body> </html> Next, we need to write some JavaScript to catch the form and run through the list of functions that we will be writing. If a function returns false, it means that the verification did not pass and we will stop processing the form. In the event where we get through the entire list of functions and no problems arise, we will log out of the console and data object, which contain all the fields we extracted: <script>    var fns = [];    var data = {};      var form = document.getElementById("main_form");      form.onsubmit = function(e) {      e.preventDefault();          data = {};          for (var i = 0; i < fns.length; i++) {            if (fns[i]() == false) {                return;            }        }          console.log("Verified Data: ", data);    } </script> The JavaScript starts by creating the two variables I mentioned previously, we then pull the form's object from the DOM and set the submit handler. The submit handler begins by preventing a page from actually submitting, (as we don't have any backend code in this example) and then we go through the list of functions running them one by one. Validating fields In this section, we will explore how to validate different types of fields manually, such as name, e-mail, website URL, and so on. Matching a complete name To get our feet wet, let's begin with a simple name field. It's something we have gone through briefly in the past, so it should give you an idea of how our system will work. The following code goes inside the script tags, but only after everything we have written so far: function process_name() {    var field = document.getElementById("name_field");    var name = field.value;      var name_pattern = /^(S+) (S*) ?b(S+)$/;      if (name_pattern.test(name) === false) {        alert("Name field is invalid");         return false;    }      var res = name_pattern.exec(name);    data.first_name = res[1];    data.last_name = res[3];      if (res[2].length > 0) {        data.middle_name = res[2];    }      return true; }   fns.push(process_name); We get the name field in a similar way to how we got the form, then, we extract the value and test it against a pattern to match a full name. If the name doesn't match the pattern, we simply alert the user and return false to let the form handler know that the validations have failed. If the name field is in the correct format, we set the corresponding fields on the data object (remember, the middle name is optional here). The last line just adds this function to the array of functions, so it will be called when the form is submitted. The last thing required to get this working is to add HTML for this form field, so inside the form tags (right before the submit button), you can add this text input: Name: <input type="text" id="name_field" /><br /> Opening this page in your browser, you should be able to test it out by entering different values into the Name box. If you enter a valid name, you should get the data object printed out with the correct parameters, otherwise you should be able to see this alert message: Understanding the complete name Regex Let's go back to the regular expression used to match the name entered by a user: /^(S+) (S*) ?b(S+)$/ The following is a brief explanation of the Regex: The ^ character asserts its position at the beginning of a string The first capturing group (S+) S+ matches a non-white space character [^rntf] The + quantifier between one and unlimited times The second capturing group (S*) S* matches any non-whitespace character [^rntf] The * quantifier between zero and unlimited times " ?" matches the whitespace character The ? quantifier between zero and one time b asserts its position at a (^w|w$|Ww|wW) word boundary The third capturing group (S+) S+ matches a non-whitespace character [^rntf] The + quantifier between one and unlimited times $ asserts its position at the end of a string Matching an e-mail with Regex The next type of field we may want to add is an e-mail field. E-mails may look pretty simple at first glance, but there are a large variety of e-mails out there. You may just think of creating a word@word.word pattern, but the first section can contain many additional characters besides just letters, the domain can be a subdomain, or the suffix could have multiple parts (such as .co.uk for the UK). Our pattern will simply look for a group of characters that are not spaces or instances where the @ symbol has been used in the first section. We will then want an @ symbol, followed by another set of characters that have at least one period, followed by the suffix, which in itself could contain another suffix. So, this can be accomplished in the following manner: /[^s@]+@[^s@.]+.[^s@]+/ The pattern of our example is very simple and will not match every valid e-mail address. There is an official standard for an e-mail address's regular expressions called RFC 5322. For more information, please read http://www.regular-expressions.info/email.html. So, let's add the field to our page: Email: <input type="text" id="email_field" /><br /> We can then add this function to verify it: function process_email() {    var field = document.getElementById("email_field");    var email = field.value;      var email_pattern = /^[^s@]+@[^s@.]+.[^s@]+$/;      if (email_pattern.test(email) === false) {        alert("Email is invalid");        return false;    }      data.email = email;    return true; }   fns.push(process_email); There is an HTML5 field type specifically designed for e-mails, but here we are verifying manually, as this is a Regex book. For more information, please refer to http://www.w3.org/TR/html-markup/input.email.html. Understanding the e-mail Regex Let's go back to the regular expression used to match the name entered by the user: /^[^s@]+@[^s@.]+.[^s@]+$/ Following is a brief explanation of the Regex: ^ asserts a position at the beginning of the string [^s@]+ matches a single character that is not present in the following list: The + quantifier between one and unlimited times s matches any white space character [rntf ] @ matches the @ literal character [^s@.]+ matches a single character that is not present in the following list: The + quantifier between one and unlimited times s matches a [rntf] whitespace character @. is a single character in the @. list, literally . matches the . character literally [^s@]+ match a single character that is not present in the following list: The + quantifier between one and unlimited times s matches [rntf] a whitespace character @ is the @ literal character $ asserts its position at end of a string Matching a Twitter name The next field we are going to add is a field for a Twitter username. For the unfamiliar, a Twitter username is in the @username format, but when people enter this in, they sometimes include the preceding @ symbol and on other occasions, they only write the username by itself. Obviously, internally we would like everything to be stored uniformly, so we will need to extract the username, regardless of the @ symbol, and then manually prepend it with one, so regardless of whether it was there or not, the end result will look the same. So again, let's add a field for this: Twitter: <input type="text" id="twitter_field" /><br /> Now, let's write the function to handle it: function process_twitter() {    var field = document.getElementById("twitter_field");    var username = field.value;      var twitter_pattern = /^@?(w+)$/;      if (twitter_pattern.test(username) === false) {        alert("Twitter username is invalid");        return false;    }      var res = twitter_pattern.exec(username);    data.twitter = "@" + res[1];    return true; }   fns.push(process_twitter); If a user inputs the @ symbol, it will be ignored, as we will add it manually after checking the username. Understanding the twitter username Regex Let's go back to the regular expression used to match the name entered by the user: /^@?(w+)$/ This is a brief explanation of the Regex: ^ asserts its position at start of the string @? matches the @ character, literally The ? quantifier between zero and one time First capturing group (w+) w+ matches a [a-zA-Z0-9_] word character The + quantifier between one and unlimited times $ asserts its position at end of a string Matching passwords Another popular field, which can have some unique constraints, is a password field. Now, not every password field is interesting; you may just allow just about anything as a password, as long as the field isn't left blank. However, there are sites where you need to have at least one letter from each case, a number, and at least one other character. Considering all the ways these can be combined, creating a pattern that can validate this could be quite complex. A much better solution for this, and one that allows us to be a bit more verbose with our error messages, is to create four separate patterns and make sure the password matches each of them. For the input, it's almost identical: Password: <input type="password" id="password_field" /><br /> The process_password function is not very different from the previous example as we can see its code as follows: function process_password() {    var field = document.getElementById("password_field");    var password = field.value;      var contains_lowercase = /[a-z]/;    var contains_uppercase = /[A-Z]/;    var contains_number = /[0-9]/;    var contains_other = /[^a-zA-Z0-9]/;      if (contains_lowercase.test(password) === false) {        alert("Password must include a lowercase letter");        return false;    }      if (contains_uppercase.test(password) === false) {        alert("Password must include an uppercase letter");        return false;    }      if (contains_number.test(password) === false) {        alert("Password must include a number");        return false;    }      if (contains_other.test(password) === false) {        alert("Password must include a non-alphanumeric character");        return false;    }      data.password = password;    return true; }   fns.push(process_password); All in all, you may say that this is a pretty basic validation and something we have already covered, but I think it's a great example of working smart as opposed to working hard. Sure, we probably could have created one long pattern that would check everything together, but it would be less clear and less flexible. So, by breaking it into smaller and more manageable validations, we were able to make clear patterns, and at the same time, improve their usability with more helpful alert messages. Matching URLs Next, let's create a field for the user's website; the HTML for this field is: Website: <input type="text" id="website_field" /><br /> A URL can have many different protocols, but for this example, let's restrict it to only http or https links. Next, we have the domain name with an optional subdomain, and we need to end it with a suffix. The suffix itself can be a single word, such as .com or it can have multiple segments, such as.co.uk. All in all, our pattern looks similar to this: /^(?:https?://)?w+(?:.w+)?(?:.[A-Z]{2,3})+$/i Here, we are using multiple noncapture groups, both for when sections are optional and for when we want to repeat a segment. You may have also noticed that we are using the case insensitive flag (/i) at the end of the regular expression, as links can be written in lowercase or uppercase. Now, we'll implement the actual function: function process_website() {    var field = document.getElementById("website_field");    var website = field.value;      var pattern = /^(?:https?://)?w+(?:.w+)?(?:.[A-Z]{2,3})+$/i      if (pattern.test(website) === false) {       alert("Website is invalid");        return false;    }      data.website = website;    return true; }   fns.push(process_website); At this point, you should be pretty familiar with the process of adding fields to our form and adding a function to validate them. So, for our remaining examples let's shift our focus a bit from validating inputs to manipulating data. Understanding the URL Regex Let's go back to the regular expression used to match the name entered by the user: /^(?:https?://)?w+(?:.w+)?(?:.[A-Z]{2,3})+$/i This is a brief explanation of the Regex: ^ asserts its position at start of a string (?:https?://)? is anon-capturing group The ? quantifier between zero and one time http matches the http characters literally (case-insensitive) s? matches the s character literally (case-insensitive) The ? quantifier between zero and one time : matches the : character literally / matches the / character literally / matches the / character literally w+ matches a [a-zA-Z0-9_] word character The + quantifier between one and unlimited times (?:.w+)? is a non-capturing group The ? quantifier between zero and one time . matches the . character literally w+ matches a [a-zA-Z0-9_] word character The + quantifier between one and unlimited times (?:.[A-Z]{2,3})+ is a non-capturing group The + quantifier between one and unlimited times . matches the . character literally [A-Z]{2,3} matches a single character present in this list The {2,3} quantifier between2 and 3 times A-Z is a single character in the range between A and Z (case insensitive) $ asserts its position at end of a string i modifier: insensitive. Case insensitive letters, meaning it will match a-z and A-Z. Manipulating data We are going to add one more input to our form, which will be for the user's description. In the description, we will parse for things, such as e-mails, and then create both a plain text and HTML version of the user's description. The HTML for this form is pretty straightforward; we will be using a standard textbox and give it an appropriate field: Description: <br /> <textarea id="description_field"></textarea><br /> Next, let's start with the bare scaffold needed to begin processing the form data: function process_description() {    var field = document.getElementById("description_field");    var description = field.value;      data.text_description = description;      // More Processing Here      data.html_description = "<p>" + description + "</p>";      return true; }   fns.push(process_description); This code gets the text from the textbox on the page and then saves both a plain text version and an HTML version of it. At this stage, the HTML version is simply the plain text version wrapped between a pair of paragraph tags, but this is what we will be working on now. The first thing I want to do is split between paragraphs, in a text area the user may have different split-ups—lines and paragraphs. For our example, let's say the user just entered a single new line character, then we will add a <br /> tag and if there is more than one character, we will create a new paragraph using the <p> tag. Using the String.replace method We are going to use JavaScript's replace method on the string object This function can accept a Regex pattern as its first parameter, and a function as its second; each time it finds the pattern it will call the function and anything returned by the function will be inserted in place of the matched text. So, for our example, we will be looking for new line characters, and in the function, we will decide if we want to replace the new line with a break line tag or an actual new paragraph, based on how many new line characters it was able to pick up: var line_pattern = /n+/g; description = description.replace(line_pattern, function(match) {    if (match == "n") {        return "<br />";    } else {        return "</p><p>";    } }); The first thing you may notice is that we need to use the g flag in the pattern, so that it will look for all possible matches as opposed to only the first. Besides this, the rest is pretty straightforward. Consider this form: If you take a look at the output from the console of the preceding code, you should get something similar to this: Matching a description field The next thing we need to do is try and extract e-mails from the text and automatically wrap them in a link tag. We have already covered a Regexp pattern to capture e-mails, but we will need to modify it slightly, as our previous pattern expects that an e-mail is the only thing present in the text. In this situation, we are interested in all the e-mails included in a large body of text. If you were simply looking for a word, you would be able to use the b matcher, which matches any boundary (that can be the end of a word/the end of a sentence), so instead of the dollar sign, which we used before to denote the end of a string, we would place the boundary character to denote the end of a word. However, in our case it isn't quite good enough, as there are boundary characters that are valid e-mail characters, for example, the period character is valid. To get around this, we can use the boundary character in conjunction with a lookahead group and say we want it to end with a word boundary, but only if it is followed by a space or end of a sentence/string. This will ensure we aren't cutting off a subdomain or a part of a domain, if there is some invalid information mid-way through the address. Now, we aren't creating something that will try and parse e-mails no matter how they are entered; the point of creating validators and patterns is to force the user to enter something logical. That said, we assume that if the user wrote an e-mail address and then a period, that he/she didn't enter an invalid address, rather, he/she entered an address and then ended a sentence (the period is not part of the address). In our code, we assume that to the end an address, the user is either going to have a space after, such as some kind of punctuation, or that he/she is ending the string/line. We no longer have to deal with lines because we converted them to HTML, but we do have to worry that our pattern doesn't pick up an HTML tag in the process. At the end of this, our pattern will look similar to this: /b[^s<>@]+@[^s<>@.]+.[^s<>@]+b(?=.?(?:s|<|$))/g We start off with a word boundary, then, we look for the pattern we had before. I added both the (>) greater-than and the (<) less-than characters to the group of disallowed characters, so that it will not pick up any HTML tags. At the end of the pattern, you can see that we want to end on a word boundary, but only if it is followed by a space, an HTML tag, or the end of a string. The complete function, which does all the matching, is as follows: function process_description() {    var field = document.getElementById("description_field");    var description = field.value;      data.text_description = description;      var line_pattern = /n+/g;    description = description.replace(line_pattern, function(match) {        if (match == "n") {            return "<br />";        } else {            return "</p><p>";        }    });      var email_pattern = /b[^s<>@]+@[^s<>@.]+.[^s<>@]+b(?=.?(?:s|<|$))/g;    description = description.replace(email_pattern, function(match){        return "<a href='mailto:" + match + "'>" + match + "</a>";    });      data.html_description = "<p>" + description + "</p>";      return true; } We can continue to add fields, but I think the point has been understood. You have a pattern that matches what you want, and with the extracted data, you are able to extract and manipulate the data into any format you may need. Understanding the description Regex Let's go back to the regular expression used to match the name entered by the user: /b[^s<>@]+@[^s<>@.]+.[^s<>@]+b(?=.?(?:s|<|$))/g This is a brief explanation of the Regex: b asserts its position at a (^w|w$|Ww|wW) word boundary [^s<>@]+ matches a single character not present in the this list: The + quantifier between one and unlimited times s matches a [rntf ] whitespace character <>@ is a single character in the <>@ list (case-sensitive) @ matches the @ character literally [^s<>@.]+ matches a single character not present in this list: The + quantifier between one and unlimited times s matches any [rntf] whitespace character <>@. is a single character in the <>@. list literally (case sensitive) . matches the . character literally [^s<>@]+ matches a single character not present in this the list: The + quantifier between one and unlimited times s matches a [rntf ] whitespace character <>@ isa single character in the <>@ list literally (case sensitive) b asserts its position at a (^w|w$|Ww|wW) word boundary (?=.?(?:s|<|$)) Positive Lookahead - Assert that the Regex below can be matched .? matches any character (except new line) The ? quantifier between zero and one time (?:s|<|$) is a non-capturing group: First alternative: s matches any white space character [rntf] Second alternative: < matches the character < literally Third alternative: $ assert position at end of the string The g modifier: global match. Returns all matches of the regular expression, not only the first one Explaining a Markdown example More examples of regular expressions can be seen with the popular Markdown syntax (refer to http://en.wikipedia.org/wiki/Markdown). This is a situation where a user is forced to write things in a custom format, although it's still a format, which saves typing and is easier to understand. For example, to create a link in Markdown, you would type something similar to this: [Click Me](http://gabrielmanricks.com) This would then be converted to: <a href="http://gabrielmanricks.com">Click Me</a> Disregarding any validation on the URL itself, this can easily be achieved using this pattern: /[([^]]*)](([^(]*))/g It looks a little complex, because both the square brackets and parenthesis are both special characters that need to be escaped. Basically, what we are saying is that we want an open square bracket, anything up to the closing square bracket, then we want an open parenthesis, and again, anything until the closing parenthesis. A good website to write markdown documents is http://dillinger.io/. Since we wrapped each section into its own capture group, we can write this function: text.replace(/[([^]]*)](([^(]*))/g, function(match, text, link){    return "<a href='" + link + "'>" + text + "</a>"; }); We haven't been using capture groups in our manipulation examples, but if you use them, then the first parameter to the callback is the entire match (similar to the ones we have been working with) and then all the individual groups are passed as subsequent parameters, in the order that they appear in the pattern. Summary In this article, we covered a couple of examples that showed us how to both validate user inputs as well as manipulate them. We also took a look at some common design patterns and saw how it's sometimes better to simplify the problem instead of using brute force in one pattern for the purpose of creating validations. Resources for Article: Further resources on this subject: Getting Started with JSON [article] Function passing [article] YUI Test [article]
Read more
  • 0
  • 0
  • 7004

article-image-getting-started-hyper-v-architecture-and-components
Packt
04 Jun 2015
19 min read
Save for later

Getting Started with Hyper-V Architecture and Components

Packt
04 Jun 2015
19 min read
In this article by Vinícius R. Apolinário, author of the book Learning Hyper-V, we will cover the following topics: Hypervisor architecture Type 1 and 2 Hypervisors Microkernel and Monolithic Type 1 Hypervisors Hyper-V requirements and processor features Memory configuration Non-Uniform Memory Access (NUMA) architecture (For more resources related to this topic, see here.) Hypervisor architecture If you've used Microsoft Virtual Server or Virtual PC, and then moved to Hyper-V, I'm almost sure that your first impression was: "Wow, this is much faster than Virtual Server". You are right. And there is a reason why Hyper-V performance is much better than Virtual Server or Virtual PC. It's all about the architecture. There are two types of Hypervisor architectures. Hypervisor Type 1, like Hyper-V and ESXi from VMware, and Hypervisor Type 2, like Virtual Server, Virtual PC, VMware Workstation, and others. The objective of the Hypervisor is to execute, manage and control the operation of the VM on a given hardware. For that reason, the Hypervisor is also called Virtual Machine Monitor (VMM). The main difference between these Hypervisor types is the way they operate on the host machine and its operating systems. As Hyper-V is a Type 1 Hypervisor, we will cover Type 2 first, so we can detail Type 1 and its benefits later. Type 1 and Type 2 Hypervisors Hypervisor Type 2, also known as hosted, is an implementation of the Hypervisor over and above the OS installed on the host machine. With that, the OS will impose some limitations to the Hypervisor to operate, and these limitations are going to reflect on the performance of the VM. To understand that, let me explain how a process is placed on the processor: the processor has what we call Rings on which the processes are placed, based on prioritization. The main Rings are 0 and 3. Kernel processes are placed on Ring 0 as they are vital to the OS. Application processes are placed on Ring 3, and, as a result, they will have less priority when compared to Ring 0. The issue on Hypervisors Type 2 is that it will be considered an application, and will run on Ring 3. Let's have a look at it: As you can see from the preceding diagram, the hypervisor has an additional layer to access the hardware. Now, let's compare it with Hypervisor Type 1: The impact is immediate. As you can see, Hypervisor Type 1 has total control of the underlying hardware. In fact, when you enable Virtualization Assistance (hardware-assisted virtualization) at the server BIOS, you are enabling what we call Ring -1, or Ring decompression, on the processor and the Hypervisor will run on this Ring. The question you might have is "And what about the host OS?" If you install the Hyper-V role on a Windows Server for the first time, you may note that after installation, the server will restart. But, if you're really paying attention, you will note that the server will actually reboot twice. This behavior is expected, and the reason it will happen is because the OS is not only installing and enabling Hyper-V bits, but also changing its architecture to the Type 1 Hypervisor. In this mode, the host OS will operate in the same way a VM does, on top of the Hypervisor, but on what we call parent partition. The parent partition will play a key role as the boot partition and in supporting the child partitions, or guest OS, where the VMs are running. The main reason for this partition model is the key attribute of a Hypervisor: isolation. For Microsoft Hyper-V Server you don't have to install the Hyper-V role, as it will be installed when you install the OS, so you won't be able to see the server booting twice. With isolation, you can ensure that a given VM will never have access to another VM. That means that if you have a compromised VM, with isolation, the VM will never infect another VM or the host OS. The only way a VM can access another VM is through the network, like all other devices in your network. Actually, the same is true for the host OS. This is one of the reasons why you need an antivirus for the host and the VMs, but this will be discussed later. The major difference between Type 1 and Type 2 now is that kernel processes from both host OS and VM OS will run on Ring 0. Application processes from both host OS and VM OS will run on Ring 3. However, there is one piece left. The question now is "What about device drivers?" Microkernel and Monolithic Type 1 Hypervisors Have you tried to install Hyper-V on a laptop? What about an all-in-one device? A PC? A server? An x64 based tablet? They all worked, right? And they're supposed to work. As Hyper-V is a Microkernel Type 1 Hypervisor, all the device drivers are hosted on the parent partition. A Monolithic Type 1 Hypervisor hosts its drivers on the Hypervisor itself. VMware ESXi works this way. That's why you should never use a standard ESXi media to install an ESXi host. The hardware manufacturer will provide you with an appropriate media with the correct drivers for the specific hardware. The main advantage of the Monolithic Type 1 Hypervisor is that, as it always has the correct driver installed, you will never have a performance issue due to an incorrect driver. On the other hand, you won't be able to install this on any device. The Microkernel Type 1 Hypervisor, on the other hand, hosts its drivers on the parent partition. That means that if you installed the host OS on a device, and the drivers are working, the Hypervisor, and in this case Hyper-V, will work just fine. There are other hardware requirements. These will be discussed later in this article. The other side of this is that if you use a generic driver, or a wrong version of it, you may have performance issues, or even driver malfunction. What you have to keep in mind here is that Microsoft does not certify drivers for Hyper-V. Device drivers are always certified for Windows Server. If the driver is certified for Windows Server, it is also certified for Hyper-V. But you always have to ensure the use of correct driver for a given hardware. Let's take a better look at how Hyper-V works as a Microkernel Type 1 Hypervisor: As you can see from the preceding diagram, there are multiple components to ensure that the VM will run perfectly. However, the major component is the Integration Components (IC), also called Integration Services. The IC is a set of tools that you should install or upgrade on the VM, so that the VM OS will be able to detect the virtualization stack and run as a regular OS on a given hardware. To understand this more clearly, let's see how an application accesses the hardware and understand all the processes behind it. When the application tries to send a request to the hardware, the kernel is responsible for interpreting this call. As this OS is running on an Enlightened Child Partition (Means that IC is installed), the Kernel will send this call to the Virtual Service Client (VSC) that operates as a synthetic device driver. The VSC is responsible for communicating with the Virtual Service Provider (VSP) on the parent partition, through VMBus, so the VSC can use the hardware resource. The VMBus will then be able to communicate with the hardware for the VM. The VMBus, a channel-based communication, is actually responsible for communicating with the parent partition and hardware. For the VMBus to access the hardware, it will communicate directly with a component on the Hypervisor called hypercalls. These hypercalls are then redirected to the hardware. However, only the parent partition can actually access the physical processor and memory. The child partitions access a virtual view of these components that are translated on the guest and the host partitions. New processors have a feature called Second Level Address Translation (SLAT) or Nested Paging. This feature is extremely important on high performance VMs and hosts, as it helps reduce the overhead of the virtual to physical memory and processor translation. On Windows 8, SLAT is a requirement for Hyper-V. It is important to note that Enlightened Child Partitions, or partitions with IC, can be Windows or Linux OS. If the child partitions have a Linux OS, the name of the component is Linux Integration Services (LIS), but the operation is actually the same. Another important fact regarding ICs is that they are already present on Windows Server 2008 or later. But, if you are running a newer version of Hyper-V, you have to upgrade the IC version on the VM OS. For example, if you are running Hyper-V 2012 R2 on the host OS and the guest OS is running Windows Server 2012 R2, you probably don't have to worry about it. But if you are running Hyper-V 2012 R2 on the host OS and the guest OS is running Windows Server 2012, then you have to upgrade the IC on the VM to match the parent partition version. Running guest OS Windows Server 2012 R2 on a VM on top of Hyper-V 2012 is not recommended. For Linux guest OS, the process is the same. Linux kernel version 3 or later already have LIS installed. If you are running an old version of Linux, you should verify the correct LIS version of your OS. To confirm the Linux and LIS versions, you can refer to an article at http://technet.microsoft.com/library/dn531030.aspx. Another situation is when the guest OS does not support IC or LIS, or an Unenlightened Child Partition. In this case, the guest OS and its kernel will not be able to run as an Enlightened Child Partition. As the VMBus is not present in this case, the utilization of hardware will be made by emulation and performance will be degraded. This only happens with old versions of Windows and Linux, like Windows 2000 Server, Windows NT, and CentOS 5.8 or earlier, or in case that the guest OS does not have or support IC. Now that you understand how the Hyper-V architecture works, you may be thinking "Okay, so for all of this to work, what are the requirements?" Hyper-V requirements and processor features At this point, you can see that there is a lot of effort for putting all of this to work. In fact, this architecture is only possible because hardware and software companies worked together in the past. The main goal of both type of companies was to enable virtualization of operating systems without changing them. Intel and AMD created, each one with its own implementation, a processor feature called virtualization assistance so that the Hypervisor could run on Ring 0, as explained before. But this is just the first requirement. There are other requirement as well, which are as follows: Virtualization assistance (also known as Hardware-assisted virtualization): This feature was created to remove the necessity of changing the OS for virtualizing it. On Intel processors, it is known as Intel VT-x. All recent processor families support this feature, including Core i3, Core i5, and Core i7. The complete list of processors and features can be found at http://ark.intel.com/Products/VirtualizationTechnology. You can also use this tool to check if your processor meets this requirement which can be downloaded at: https://downloadcenter.intel.com/Detail_Desc.aspx?ProductID=1881&DwnldID=7838. On AMD Processors, this technology is known as AMD-V. Like Intel, all recent processor families support this feature. AMD provides a tool to check processor compatibility that can be downloaded at http://www.amd.com/en-us/innovations/software-technologies/server-solution/virtualization. Data Execution Prevention (DEP): This is a security feature that marks memory pages as either executable or nonexecutable. For Hyper-V to run, this option must be enabled on the System BIOS. For an Intel-based processor, this feature is called Execute Disable bit (Intel XD bit) and No Execute Bit (AMD NX bit). This configuration will vary from one System BIOS to another. Check with your hardware vendor how to enable it on System BIOS. x64 (64-bit) based processor: This processor feature uses a 64-bit memory address. Although you may find that all new processors are x64, you might want to check if this is true before starting your implementation. The compatibility checkers above, from Intel and AMD, will show you if your processor is x64. Second Level Address Translation (SLAT): As discussed before, SLAT is not a requirement for Hyper-V to work. This feature provides much more performance on the VMs as it removes the need for translating physical and virtual pages of memory. It is highly recommended to have the SLAT feature on the processor ait provides more performance on high performance systems. As also discussed before, SLAT is a requirement if you want to use Hyper-V on Windows 8 or 8.1. To check if your processor has the SLAT feature, use the Sysinternals tool—Coreinfo— that can be downloaded at http://technet.microsoft.com/en-us/sysinternals/cc835722.aspx. There are some specific processor features that are not used exclusively for virtualization. But when the VM is initiated, it will use these specific features from the processor. If the VM is initiated and these features are allocated on the guest OS, you can't simply remove them. This is a problem if you are going to Live Migrate this VM from a host to another host; if these specific features are not available, you won't be able to perform the operation. At this moment, you have to understand that Live Migration moves a powered-on VM from one host to another. If you try to Live Migrate a VM between hosts with different processor types, you may be presented with an error. Live Migration is only permitted between the same processor vendor: Intel-Intel or AMD-AMD. Intel-AMD Live Migration is not allowed under any circumstance. If the processor is the same on both hosts, Live Migration and Share Nothing Live Migration will work without problems. But even within the same vendor, there can be different processor families. In this case, you can remove these specific features from the Virtual Processor presented to the VM. To do that, open Hyper-V Manager | Settings... | Processor | Processor Compatibility. Mark the Migrate to a physical computer with a different processor version option. This option is only available if the VM is powered off. Keep in mind that enabling this option will remove processor-specific features for the VM. If you are going to run an application that requires these features, they will not be available and the application may not run. Now that you have checked all the requirements, you can start planning your server for virtualization with Hyper-V. This is true from the perspective that you understand how Hyper-V works and what are the requirements for it to work. But there is another important subject that you should pay attention to when planning your server: memory. Memory configuration I believe you have heard this one before "The application server is running under performance". In the virtualization world, there is an obvious answer to it: give more virtual hardware to the VM. Although it seems to be the logical solution, the real effect can be totally opposite. During the early days, when servers had just a few sockets, processors, and cores, a single channel made the communication between logical processors and memory. But server hardware has evolved, and today, we have servers with 256 logical processors and 4 TB of RAM. To provide better communication between these components, a new concept emerged. Modern servers with multiple logical processors and high amount of memory use a new design called Non-Uniform Memory Access (NUMA) architecture. Non-Uniform Memory Access (NUMA) architecture NUMA is a memory design that consists of allocating memory to a given node, or a cluster of memory and logical processors. Accessing memory from a processor inside the node is notably faster than accessing memory from another node. If a processor has to access memory from another node, the performance of the process performing the operation will be affected. Basically, to solve this equation you have to ensure that the process inside the guest VM is aware of the NUMA node and is able to use the best available option: When you create a virtual machine, you decide how many virtual processors and how much virtual RAM this VM will have. Usually, you assign the amount of RAM that the application will need to run and meet the expected performance. For example, you may ask a software vendor on the application requirements and this software vendor will say that the application would be using at least 8 GB of RAM. Suppose you have a server with 16 GB of RAM. What you don't know is that this server has four NUMA nodes. To be able to know how much memory each NUMA node has, you must divide the total amount of RAM installed on the server by the number of NUMA nodes on the system. The result will be the amount of RAM of each NUMA node. In this case, each NUMA node has a total of 4 GB of RAM. Following the instructions of the software vendor, you create a VM with 8 GB of RAM. The Hyper-V standard configuration is to allow NUMA spanning, so you will be able to create the VM and start it. Hyper-V will accommodate 4 GB of RAM on two NUMA nodes. This NUMA spanning configuration means that a processor can access the memory on another NUMA node. As mentioned earlier, this will have an impact on the performance if the application is not aware of it. On Hyper-V, prior to the 2012 version, the guest OS was not informed about the NUMA configuration. Basically, in this case, the guest OS would see one NUMA node with 8 GB of RAM, and the allocation of memory would be made without NUMA restrictions, impacting the final performance of the application. Hyper-V 2012 and 2012 R2 have the same feature—the guest OS will see the virtual NUMA (vNUMA) presented to the child partition. With this feature, the guest OS and/or the application can make a better choice on where to allocate memory for each process running on this VM. NUMA is not a virtualization technology. In fact, it has been used for a long time, and even applications like SQL Server 2005 already used NUMA to better allocate the memory that its processes are using. Prior to Hyper-V 2012, if you wanted to avoid this behavior, you had two choices: Create the VM and allocate the maximum vRAM of a single NUMA node for it, as Hyper-V will always try to allocate the memory inside of a single NUMA node. In the above case, the VM should not have more than 4 GB of vRAM. But for this configuration to really work, you should also follow the next choice. Disable NUMA Spanning on Hyper-V. With this configuration disabled, you will not be able to run a VM if the memory configuration exceeds a single NUMA node. To do this, you should clear the Allow virtual machines to span physical NUMA nodes checkbox on Hyper-V Manager | Hyper-V Settings... | NUMA Spanning. Keep in mind that disabling this option will prevent you from running a VM if no nodes are available. You should also remember that even with Hyper-V 2012, if you create a VM with 8 GB of RAM using two NUMA nodes, the application on top of the guest OS (and the guest OS) must understand the NUMA topology. If the application and/or guest OS are not NUMA aware, vNUMA will not have effect and the application can still have performance issues. At this point you are probably asking yourself "How do I know how many NUMA nodes I have on my server?" This was harder to find in the previous versions of Windows Server and Hyper-V Server. In versions prior to 2012, you should open the Performance Monitor and check the available counters in Hyper-V VM Vid NUMA Node. The number of instances represents the number of NUMA Nodes. In Hyper-V 2012, you can check the settings for any VM. Under the Processor tab, there is a new feature available for NUMA. Let's have a look at this screen to understand what it represents: In Configuration, you can easily confirm how many NUMA nodes the host running this VM has. In the case above, the server has only 1 NUMA node. This means that all memory will be allocated close to the processor. Multiple NUMA nodes are usually present on servers with high amount of logical processors and memory. In the NUMA topology section, you can ensure that this VM will always run with the informed configuration. This is presented to you because of a new Hyper-V 2012 feature called Share Nothing Live Migration, which will be explained in detail later. This feature allows you to move a VM from one host to another without turning the VM off, with no cluster and no shared storage. As you can move the VM turned on, you might want to force the processor and memory configuration, based on the hardware of your worst server, ensuring that your VM will always meet your performance expectations. The Use Hardware Topology button will apply the hardware topology in case you moved the VM to another host or in case you changed the configuration and you want to apply the default configuration again. To summarize, if you want to make sure that your VM will not have performance problems, you should check how many NUMA nodes your server has and divide the total amount of memory by it; the result is the total memory on each node. Creating a VM with more memory than a single node will make Hyper-V present a vNUMA to the guest OS. Ensuring that the guest OS and applications are NUMA aware is also important, so that the guest OS and application can use this information to allocate memory for a process on the correct node. NUMA is important to ensure that you will not have problems because of host configuration and misconfiguration on the VM. But, in some cases, even when planning the VM size, you will come to a moment when the VM memory is stressed. In these cases, Hyper-V can help with another feature called Dynamic Memory. Summary In this we learned about the Hypervisor architecture and different Hypervisor types. We explored in brief about Microkernel and Monolithic Type 1 Hypervisors. In addition to this, this article also explains the Hyper-V requirements and processor features, Memory configuration and the NUMA architecture. Resources for Article: Further resources on this subject: Planning a Compliance Program in Microsoft System Center 2012 [Article] So, what is Microsoft © Hyper-V server 2008 R2? [Article] Deploying Applications and Software Updates on Microsoft System Center 2012 Configuration Manager [Article]
Read more
  • 0
  • 0
  • 6502
Modal Close icon
Modal Close icon