
How-To Tutorials - Cloud Computing


Deploying First Server

Packt
04 Jan 2017
16 min read
In this article by Kirill Shirinkin, the author of the book Getting Started with Terraform, we will learn how Terraform works and how to use it. We will cover a bit of Terraform history, install the tool on our workstation, prepare a working environment, and run it for the first time. With everything ready, we will look at what a Terraform provider is and then take a quick tour of AWS and EC2. With this knowledge in place, we will first create an EC2 instance by hand (just to understand the pain that Terraform eliminates) and then do exactly the same with the help of a Terraform template. That will allow us to study the nature of the Terraform state file.

History of Terraform

Terraform was first released in July 2014 by a company called HashiCorp, the same company that brought us tools such as Vagrant, Packer, and Vault, among others. The fifth tool in the HashiCorp stack, it focuses on describing the complete infrastructure as code.

"From physical servers to containers to SaaS products, Terraform is able to create and compose all the components necessary to run any service or application. With Terraform, you describe your complete infrastructure as code, even as it spans multiple service providers. Your servers may come from AWS, your DNS may come from CloudFlare, and your database may come from Heroku. Terraform will build all these resources across all these providers in parallel. Terraform codifies knowledge about your infrastructure unlike any other tool before, and provides the workflow and tooling for safely changing and updating infrastructure."
- https://www.hashicorp.com/blog/terraform.html

Terraform is an open source tool released under the Mozilla Public License, version 2.0. The code is stored (like all other HashiCorp tools) on GitHub and anyone can contribute to its development. As part of its Atlas product, HashiCorp also offers a hosted Terraform Enterprise service, which solves some of the problems that the open source version doesn't. This includes a central facility to run Terraform from, access control policies, remote state file storage, notifications, built-in GitHub integration, and more.

Despite supporting over 40 providers, the main focus of the HashiCorp developers is on Amazon Web Services, Google Cloud, and Microsoft Azure. All other providers are developed and supported by the community, meaning that if you are not using the main three you might have to contribute to the codebase yourself.

Terraform is written in the Go programming language and is released as a single binary for all major operating systems. Windows, Mac OS X, FreeBSD, OpenBSD, Solaris, and any Linux are supported in both 32-bit and 64-bit versions. Terraform is still a relatively new piece of technology, being just a bit over two years old. It changes a lot over time and gets new features with every release. With these facts in mind, let's proceed to installing Terraform and setting up our workplace.

Preparing the work environment

In this article we will focus on using Terraform in a Linux environment. The general usage of the tool is the same on all platforms, though some advanced topics and practices may differ slightly. As mentioned in the previous section, Terraform is distributed as a single binary, packaged inside a Zip archive. Unfortunately, HashiCorp does not provide native packages for operating systems, which means the first step is to install unzip.
Depending on your package manager, this can be done by running sudo yum install unzip or sudo apt-get install unzip, or it might already be installed. In any case, after making sure that you can unarchive Zip files, download Terraform from the official website, https://www.terraform.io/downloads.html, and unzip it to any convenient folder. Make sure that this folder is available in your PATH environment variable. The full installation command sequence could look like this:

$> curl -O https://releases.hashicorp.com/terraform/0.7.2/terraform_0.7.2_linux_amd64.zip
$> sudo unzip terraform_0.7.2_linux_amd64.zip -d /usr/local/bin/

That will extract the Terraform binary to /usr/local/bin, which is already available in PATH on Linux systems. Finally, let's verify our installation:

$> terraform -v
Terraform v0.7.2

We now have a working Terraform installation and are ready to write our first template. First, create an empty directory named packt-terraform and enter it:

$> mkdir packt-terraform && cd packt-terraform

When you run terraform commands, Terraform looks for files with the .tf extension in the directory you run it from. Be careful: Terraform will load all files with the .tf extension if you run it without arguments. Let's create our very first, not very useful yet, template:

$> touch template.tf

To apply a template, you run the terraform apply command. What does applying mean? When you run apply, Terraform reads your templates and tries to create an infrastructure exactly as it's defined in them. For now, let's just apply our empty template:

$> terraform apply
Apply complete! Resources: 0 added, 0 changed, 0 destroyed.

After each run finishes, you get the number of resources that were added, changed, and destroyed. In this case, it did nothing, as we just have an empty file instead of a real template. To make Terraform do something useful, we first need to configure our provider, and even before that we need to find out what a provider is.

Terraform providers

A provider is something you use to configure access to the service you create resources for. For example, if you want to create AWS resources, you need to configure the AWS provider, which specifies credentials to access the APIs of the many AWS services. At the moment of writing, Terraform has 43 providers. This impressive list includes not only major cloud providers such as AWS and Google Cloud, but also smaller services, such as Fastly, a Content Delivery Network (CDN) provider.

Not every provider requires explicit configuration. Some of them do not even deal with external services; instead, they provide resources for local entities. For example, you could use the TLS provider to generate keys and certificates.
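As a quick, hedged illustration of such a credential-free provider, the following sketch uses the TLS provider to generate an RSA key. The resource name example is arbitrary, and the exact arguments available may vary between provider versions:

resource "tls_private_key" "example" {
  algorithm = "RSA"
  rsa_bits  = 2048
}

Running terraform apply against a template like this produces a key pair locally, without talking to any external API.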
Still, most providers deal with one or another external API and require configuration. In this article we will be using the AWS provider. Before we configure it, let's have a short introduction to AWS. If you are already familiar with this platform, feel free to skip the next section and proceed directly to Configuring the AWS provider.

Short introduction to AWS

Amazon Web Services is the cloud offering from Amazon, the online retail giant. Back in the early 2000s, Amazon invested in an automated platform that would provide services for things such as networking, storage, and computation to developers. Developers then don't need to manage the underlying infrastructure; instead, they use the provided services via APIs to provision virtual machines, storage buckets, and so on.

The platform, initially built to power Amazon itself, was opened for public usage in 2006. The first two services, Simple Storage Service (S3) and Elastic Compute Cloud (EC2), were released, and anyone could pay to use them. Fast forward 10 years: AWS now has over 70 different services, covering practically everything a modern infrastructure would need. It has services for virtual networking, queue processing, transactional emails, storage, DNS, relational databases, and many others. Businesses such as Netflix have completely moved away from in-house hardware and are instead building a new type of infrastructure on top of cloud resources, getting significant benefits in terms of flexibility and cost savings, and focusing on working on a product rather than scaling and maturing their own data center.

With such an impressive list of services, it becomes increasingly hard to juggle all the involved components via the AWS Management Console, the in-browser interface for working with AWS. Of course, AWS provides APIs for every service it has, but once again, the number of them and the intersections between them can be very high, and both only grow as you keep relying on the cloud. This leads exactly to the problem where you end up either with intense ClickOps practices or you script everything you can. These problems make AWS a perfect candidate for exploring Terraform, as we can fully understand the pain caused by direct usage of its services.

Of course, AWS is not free to use, but luckily it has provided a Free Tier for a long time now. The Free Tier allows you to use lots of (but not all) services for free, with certain limitations. For example, you can use a single EC2 instance for 750 hours a month, for 12 months, for free, as long as it is of the t2.micro type. EC2 instances are, simply put, virtual servers. You pay for them per hour of usage, and you can choose from a predefined list of types. Types are just different combinations of characteristics: some are optimized for high memory usage, others were created for processor-heavy tasks.

Let's create a brand new AWS account for our Terraform learning goals, as follows: open https://aws.amazon.com/free/, click Create a free account, and follow the on-screen instructions to complete registration. Please note that in order to use the Free Tier you have to provide your credit card details, but you won't be charged unless you exceed your free usage limit.

Using Elastic Compute Cloud

Creating an instance through the management console

Just to get a feel for the AWS Management Console and to fully understand how much Terraform simplifies working with AWS, let's create a single EC2 instance manually. Log in to the console and choose EC2 from the list of services. Click on Launch Instance, choose AWS Marketplace from the left sidebar, type Centos in the search box, and click the Select button for the first search result. On each of the next pages, just click Next until you reach the end of the process.

As you see, it's not really a fast process to create a single virtual server on EC2. You have to choose an AMI and an instance type, configure network details and permissions, select or generate an SSH key, properly tag it, pick the right security groups, and add storage. Imagine if your day consisted only of manual tasks like this. What a boring job that would be!

An AMI is the source image that an instance is created from. You can create your own AMIs, use the ones provided by AWS, or select one from the community at AWS Marketplace.

A Security Group (SG) is like a firewall. You can attach multiple SGs to an instance and define inbound and outbound rules.
It allows you to configure access not only for IP ranges, but also for other security groups. And, of course, we looked at only a single service, EC2. As you already know, there are over 70 of them, each with its own interface to click through. Let's now take a look at how to achieve the same with the AWS CLI.

Creating an instance with the AWS CLI

AWS provides a CLI to interact with its APIs. It's written in Python, and you can follow the installation instructions from the official guide to get started: https://aws.amazon.com/cli/.

Perhaps the most important part of setting up the AWS CLI is configuring access keys. We will also need these keys for Terraform. To get them, click on your username in the top-right part of the AWS Management Console, click on Security Credentials, and then download your keys from the Access Keys (Access Key ID and Secret Access Key) menu.

Warning: using root account access keys is considered a bad practice when working with AWS. You should use IAM users and per-user keys. For the purposes of this article, root keys are okay, but as soon as you move production systems to AWS, consider using IAM and reduce root account usage to a minimum.

Once the AWS CLI is installed, run the aws configure command. It will prompt you for your access keys and region. Once you are finished, you can use it to talk to the AWS API. Creating an EC2 instance looks like this:

$> aws ec2 run-instances --image-id ami-xxxxxxxx --count 1 --instance-type t2.micro --key-name MyKeyPair --security-groups my-sg

While already much better than doing it from the Management Console, it's still a long command to execute, and it covers only creation. For tracking whether the instance is still there, updating it, and destroying it, you need to construct similarly long command-line calls. Let's finally do it properly: with Terraform.

Configuring the AWS provider

Before using Terraform to create an instance, we need to configure the AWS provider. This is the first piece of code we will write in our template. Templates are written in a special language called HashiCorp Configuration Language (HCL), https://github.com/hashicorp/hcl. You can also write your templates in JSON, but that is recommended only if the template itself is generated by a machine. There are several ways to configure credentials:

Static credentials. With this method, you hard-code your access keys right inside your template. It looks like this:

provider "aws" {
  access_key = "xxxxxxxxxxxxx"
  secret_key = "xxxxxxxxxxxxx"
  region = "us-east-1"
}

Though the simplest, it is also the least flexible and least secure option. You don't want to hand your credentials out like this to everyone on the team. Rather, each team member should use his or her own keys.

Environment variables. If they are not specified in the template, Terraform will try to read the configuration from the environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY. You can also set your region with the AWS_DEFAULT_REGION variable. In this case, the complete configuration goes down to:

provider "aws" {}

Credentials file. If Terraform doesn't find keys in the template or environment variables, it will try to fetch them from a credentials file, which is typically stored in ~/.aws/credentials. If you previously installed and configured the AWS CLI, then you already have a credentials file generated for you. If you did not, you can add it yourself, with content like this:

[default]
aws_access_key_id = xxxxxxxxxxxxx
aws_secret_access_key = xxxxxxxxxxxxx

You should always avoid setting credentials directly in the template.
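As a hedged sketch of the environment-variable approach described above (the key values shown are placeholders, not real credentials), a shell session might look like this:

$> export AWS_ACCESS_KEY_ID="xxxxxxxxxxxxx"
$> export AWS_SECRET_ACCESS_KEY="xxxxxxxxxxxxx"
$> export AWS_DEFAULT_REGION="eu-central-1"
$> terraform apply

With the variables exported, the empty provider "aws" {} block is enough for Terraform to authenticate.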
It's up to you whether you use environment variables or the credentials file. Whichever method you pick, let's add the following configuration to template.tf:

provider "aws" {
  region = "eu-central-1"
}

Running terraform apply still won't do anything, because we did not specify any resources we want our infrastructure to have. Let's do that now.

Creating an EC2 instance with Terraform

Resources are the components of your infrastructure. A resource can be something as complex as a complete virtual server, or something as simple as a DNS record. Each resource belongs to a provider, and the type of the resource is prefixed with the provider name. The configuration of a resource then takes this form:

resource "provider-name_resource-type" "resource-name" {
  parameter_name = parameter_value
}

There are three types of things you can configure inside a resource block: resource-specific parameters, meta-parameters, and provisioners. For now, let's focus on resource-specific parameters. They are unique to each resource type.

We will create an EC2 instance, which is defined by the aws_instance resource. To create an instance we need to set at least two parameters: ami and instance_type. Some parameters are required while others are optional, ami and instance_type being the required ones. You can always check the complete list of available parameters in the docs, on the page dedicated to the particular resource. For example, to get the list and description of all aws_instance resource parameters, check out https://www.terraform.io/docs/providers/aws/r/instance.html.

We'll be using the official CentOS 7 AMI. As we configured the AWS region to eu-central-1, we need an AMI with the ID ami-9bf712f4. We will use the t2.micro instance type, as it's the cheapest one and is available as part of the Free Tier offering. Update the template to look like this:

# Provider configuration
provider "aws" {
  region = "eu-central-1"
}

# Resource configuration
resource "aws_instance" "hello-instance" {
  ami = "ami-378f925b"
  instance_type = "t2.micro"
  tags {
    Name = "hello-instance"
  }
}

You might also need to specify the subnet_id parameter if you don't have a default VPC. For this, you will need to create a VPC and a subnet, which you can do yourself now.

As you noticed, HCL allows you to comment your code using a hash sign in front of the text you want commented. There is another thing to look at: the tags parameter. Terraform is not limited to simple string values. You can also have numbers, boolean values (true, false), lists (["elem1", "elem2", "elem3"]), and maps. The tags parameter is a map of tags for the instance.

Let's apply this template!

$> terraform apply
aws_instance.hello-instance: Creating...
  ami: "" => "ami-378f925b"
  < ……………………. >
  instance_type: "" => "t2.micro"
  key_name: "" => "<computed>"
  < ……………………. >
  tags.%: "" => "1"
  tags.Name: "" => "hello-instance"
  tenancy: "" => "<computed>"
  vpc_security_group_ids.#: "" => "<computed>"
aws_instance.hello-instance: Still creating... (10s elapsed)
aws_instance.hello-instance: Still creating... (20s elapsed)
aws_instance.hello-instance: Still creating... (30s elapsed)
aws_instance.hello-instance: Creation complete

Apply complete! Resources: 1 added, 0 changed, 0 destroyed.

The state of your infrastructure has been saved to the path below. This state is required to modify and destroy your infrastructure, so keep it safe. To inspect the complete state use the terraform show command.

State path: terraform.tfstate

Wow, that's a lot of output for a simple command creating a single instance.
Some parts of the output above were replaced with dots wrapped in angle brackets, so don't be surprised when you see even more parameter values when you actually run the command. Before digging into the output, let's first verify in the AWS Management Console that the instance was really created.

With just 12 lines of code and a single Terraform command invocation, we got our EC2 instance running. So far, though, the result is not that different from using the AWS CLI: we only created a resource. What is of more interest is how we update and destroy this instance using the same template, and to understand how Terraform does that we need to learn what the state file is.

Summary

In this article, we installed Terraform, configured the AWS provider, and created our first EC2 instance from a template. The next steps are to update the server using the same template and, finally, destroy it. With that, you will have a solid knowledge of Terraform basics and be ready to template your existing infrastructure.

Resources for Article:

Further resources on this subject:

Provision IaaS with Terraform [article]
Start Treating your Infrastructure as Code [article]
OpenStack Networking in a Nutshell [article]

RDO Installation

Packt
09 Aug 2016
26 min read
In this article by Dan Radez, author of OpenStack Essentials - Second Edition, we will see that OpenStack has a very modular design, and because of this design, there are lots of moving parts. It is overwhelming to start walking through installing and using OpenStack without understanding the internal architecture of the components that make it up. In this article, we'll look at these components. Each component in OpenStack manages a different resource that can be virtualized for the end user. Separating the management of each type of virtualizable resource into its own component makes the OpenStack architecture very modular. If a particular service or resource provided by a component is not required, then the component is optional to an OpenStack deployment. Once the components that make up OpenStack have been covered, we will discuss the configuration of a community-supported distribution of OpenStack called RDO.

OpenStack architecture

Let's start by outlining some simple categories to group these services into. Logically, the components of OpenStack are divided into three groups: control, network, and compute. The control tier runs the Application Programming Interface (API) services, web interface, database, and message bus. The network tier runs the network service agents for networking, and the compute tier is the virtualization hypervisor; it has the services and agents to handle virtual machines. All of the components use a database and/or a message bus. The database can be MySQL, MariaDB, or PostgreSQL. The most popular message buses are RabbitMQ, Qpid, and ActiveMQ. For smaller deployments, the database and messaging services usually run on the control node, but they could have their own nodes if required.

In a simple multi-node deployment, the control and networking services are installed on one server and the compute services are installed onto another server. OpenStack could be installed on one node or on more than two nodes, but a good baseline for being able to scale out later is to put control and network together and compute by itself. Now that a base logical architecture of OpenStack has been defined, let's look at what components make up this basic architecture. To do that, we'll first touch on the web interface and then work toward collecting the resources necessary to launch an instance. Finally, we will look at what components are available to add resources to a launched instance.

Dashboard

The OpenStack dashboard is the web interface component provided with OpenStack. You'll sometimes hear the terms dashboard and Horizon used interchangeably. Technically, they are not the same thing; this article will refer to the web interface as the dashboard. The team that develops the web interface maintains both the dashboard interface and the Horizon framework that the dashboard uses. More important than getting these terms right is understanding the commitment that the team maintaining this code base has made to the OpenStack project: they have pledged to include support for all the officially accepted components of OpenStack. Visit the OpenStack website (http://www.openstack.org/) to get an official list of OpenStack components. The dashboard cannot do anything that the API cannot do. All the actions taken through the dashboard result in calls to the API to complete the task requested by the end user.
Throughout this article, we will examine how to use the web interface and the API clients to execute tasks in an OpenStack cluster. Next, we will discuss both the dashboard and the underlying components that the dashboard makes calls to when creating OpenStack resources.

Keystone

Keystone is the identity management component. The first thing that needs to happen when connecting to an OpenStack deployment is authentication. In its most basic installation, Keystone manages tenants, users, and roles, and is a catalog of services and endpoints for all the components in the running cluster.

Everything in OpenStack must exist in a tenant. A tenant is simply a grouping of objects. Users, instances, and networks are examples of objects; they cannot exist outside of a tenant. Another name for a tenant is a project. On the command line, the term tenant is used; in the web interface, the term project is used. Users must be granted a role in a tenant. It's important to understand this relationship between a user and a tenant via a role. For now, understand that a user cannot log in to the cluster unless they are a member of a tenant. Even the administrator has a tenant. Even the users that the OpenStack components use to communicate with each other have to be members of a tenant to be able to authenticate.

Keystone also keeps a catalog of the services and endpoints of each of the OpenStack components in the cluster. This is advantageous because all of the components have different API endpoints. By registering them all with Keystone, an end user only needs to know the address of the Keystone server to interact with the cluster. When a call is made to connect to a component other than Keystone, the call will first have to be authenticated, so Keystone will be contacted regardless. Within the communication to Keystone, the client also asks Keystone for the address of the component the user intended to connect to. This makes managing the endpoints easier: if all the endpoints were distributed to the end users, it would be a complex process to distribute a change in one of the endpoints to all of them. By keeping the catalog of services and endpoints in Keystone, a change is easily distributed to end users as new requests are made to connect to the components.

By default, Keystone uses username/password authentication to request a token, and the acquired token is used for subsequent requests. All the components in the cluster can use the token to verify the user and the user's access. Keystone can also be integrated into other common authentication systems instead of relying on the username and password authentication that it provides itself.
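To make the tenant/user/role relationship concrete, here is a hedged sketch using the unified openstack command-line client. The names demo, demouser, and secret are made up for illustration, and the exact role names available vary between releases:

openstack project create demo
openstack user create --project demo --password secret demouser
openstack role add --project demo --user demouser _member_

Only after the role assignment can demouser authenticate against the demo project and receive a token.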
Glance

Glance is the image management component. Once we're authenticated, there are a few resources that need to be available for an instance to launch. The first resource we'll look at is the disk image to launch from. Before a server is useful, it needs to have an operating system installed on it. This is a boilerplate task that cloud computing has streamlined by creating a registry of pre-installed disk images to boot from. Glance serves as this registry within an OpenStack deployment. In preparation for an instance to launch, a copy of the selected Glance image is first cached to the compute node where the instance is being launched. Then, a copy is made to the ephemeral disk location of the new instance. Subsequent instances launched on the same compute node using the same disk image will use the cached copy of the Glance image.

The images stored in Glance are sometimes called sealed-disk images. These are disk images that have had the operating system installed but have had things such as the Secure Shell (SSH) host key and network device MAC addresses removed. This makes the disk images generic, so they can be reused and launched repeatedly without the running copies conflicting with each other. To do this, the host-specific information is provided or generated at boot. The provided information is passed in through a post-boot configuration facility called cloud-init.

Usually, these images are downloaded from a distribution's download pages. If you search the Internet for your favorite distribution's name and "cloud image", you will probably get a link to a generic pre-built copy of a Glance image, also known as a cloud image. The images can also be customized for special purposes beyond a base operating system installation. If there is a specific purpose for which an instance will be launched many times, some of the repetitive configuration tasks can be performed ahead of time and built into the disk image. For example, if a disk image is intended to be used to build a cluster of web servers, it makes sense to install a web server package on the disk image before it is used to launch instances. It saves time and bandwidth to do it once before the image is registered with Glance, instead of doing the package installation and configuration over and over each time a web server instance is booted.

There are quite a few ways to build these disk images. The simplest way is to do a virtual machine installation manually, make sure that the host-specific information is removed, and include cloud-init in the built image. Cloud-init is packaged in most major distributions; you should be able to simply add it to a package list. There are also tools to make this happen in a more autonomous fashion. Some of the more popular tools are virt-install, Oz, and appliance-creator. The most important thing about building a cloud image for OpenStack is to make sure that cloud-init is installed. Cloud-init is a script that runs post-boot to connect back to the metadata service.
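As a hedged illustration of registering such a cloud image with Glance (the file name and image name here are just examples, and exact flags can vary between client releases):

openstack image create --disk-format qcow2 --container-format bare --file CentOS-7-x86_64-GenericCloud.qcow2 --public centos7

Once registered, the image shows up in openstack image list and can be selected when launching instances.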
Neutron

Neutron is the network management component. With Keystone, we're authenticated, and from Glance, a disk image will be provided. The next resource required for launch is a virtual network. Neutron is an API frontend (and a set of agents) that manages the Software Defined Networking (SDN) infrastructure for you. When an OpenStack deployment is using Neutron, each of your tenants can create virtual isolated networks. Each of these isolated networks can be connected to virtual routers to create routes between the virtual networks. A virtual router can have an external gateway connected to it, and external access can be given to each instance by associating a floating IP on an external network with the instance. Neutron then puts all the configuration in place to route the traffic sent to the floating IP address through these virtual network resources into the launched instance. This is also called Networking as a Service (NaaS): the capability to provide networks and network resources on demand via software.

By default, the OpenStack distribution we will install uses Open vSwitch to orchestrate the underlying virtualized networking infrastructure. Open vSwitch is a virtual managed switch. As long as the nodes in your cluster have simple connectivity to each other, Open vSwitch can be the infrastructure configured to isolate the virtual networks for the tenants in OpenStack. There are also many vendor plugins that allow you to replace Open vSwitch with a physical managed switch to handle the virtual networks. Neutron even has the capability to use multiple plugins to manage multiple network appliances. As an example, Open vSwitch and a vendor's appliance could be used in parallel to manage virtual networks in an OpenStack deployment. This is a great example of how OpenStack is built to provide flexibility and choice to its users.

Networking is the most complex component of OpenStack to configure and maintain. This is because Neutron is built around core networking concepts, and to successfully deploy Neutron, you need to understand these concepts and how they interact with one another.

Nova

Nova is the instance management component. An authenticated user who has access to a Glance image and has created a network for an instance to live on is almost ready to tie all of this together and launch an instance. The last resources that are required are a key pair and a security group. A key pair is simply an SSH key pair. OpenStack will allow you to import your own key pair or generate one to use. When the instance is launched, the public key is placed in the authorized_keys file so that a password-less SSH connection can be made to the running instance.

Before that SSH connection can be made, the security groups have to be opened to allow the connection. A security group is a firewall at the cloud infrastructure layer. The OpenStack distribution we'll use has a default security group with rules to allow instances to communicate with each other within the same security group, but rules will have to be added for Internet Control Message Protocol (ICMP), SSH, and other connections to be made from outside the security group.

Once there is an image, a network, a key pair, and a security group available, an instance can be launched. The resource identifiers are provided to Nova, and Nova looks at what resources are being used on which hypervisors and schedules the instance to spawn on a compute node. The compute node gets the Glance image, creates the virtual network devices, and boots the instance. During the boot, cloud-init should run and connect to the metadata service. The metadata service provides the SSH public key needed for SSH login to the instance and, if provided, any post-boot configuration that needs to happen. This could be anything from a simple shell script to an invocation of a configuration management engine.

Cinder

Cinder is the block storage management component. Volumes can be created and attached to instances; they are then used on the instances as any other block device would be. On the instance, the block device can be partitioned and a filesystem can be created and mounted. Cinder also handles snapshots. Snapshots can be taken of block volumes or of instances, and instances can also use these snapshots as a boot source. There is an extensive collection of storage backends that can be configured as the backing store for Cinder volumes and snapshots. By default, Logical Volume Manager (LVM) is configured. GlusterFS and Ceph are two popular software-based storage solutions, and there are also many plugins for hardware appliances.
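Pulling the Keystone, Glance, Neutron, and Nova pieces together, a hedged launch sequence with the openstack client might look like the following. The key, network, image, and flavor names are illustrative, and security-group subcommands and flags differ slightly between client versions:

openstack keypair create --public-key ~/.ssh/id_rsa.pub mykey
openstack security group rule create --protocol tcp --dst-port 22 default
openstack server create --image centos7 --flavor m1.small --network private --key-name mykey myserver

After the boot completes and a floating IP is associated, the instance can be reached over SSH with the private half of mykey.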
Swift

Swift is the object storage management component. Object storage is a simple content-only storage system. Files are stored without the metadata that a block filesystem has; there are simply containers and files, and the files are simply content. Swift has two layers as part of its deployment: the proxy and the storage engine. The proxy is the API layer; it's the service that the end user communicates with. The proxy is configured to talk to the storage engine on the user's behalf. By default, the storage engine is the Swift storage engine, which is able to do software-based storage distribution and replication. GlusterFS and Ceph are also popular storage backends for Swift; they have distribution and replication capabilities similar to those of Swift storage.

Ceilometer

Ceilometer is the telemetry component. It collects resource measurements and is able to monitor the cluster. Ceilometer was originally designed as a metering system for billing users. As it was being built, there was a realization that it would be useful for more than just billing, and it turned into a general-purpose telemetry system. Ceilometer meters measure the resources being used in an OpenStack deployment. When Ceilometer reads a meter, it's called a sample. These samples get recorded on a regular basis. A collection of samples is called a statistic. Telemetry statistics give insights into how the resources of an OpenStack deployment are being used. The samples can also be used for alarms. Alarms are simply monitors that watch for a certain criterion to be met.

Heat

Heat is the orchestration component. Orchestration is the process of launching multiple instances that are intended to work together. In orchestration, there is a file, known as a template, that defines what will be launched. In this template, there can also be ordering or dependencies set up between the instances. Data that needs to be passed between the instances for configuration can also be defined in these templates. Heat is also compatible with AWS CloudFormation templates and implements additional features beyond the AWS CloudFormation template language. To use Heat, one of these templates is written to define a set of instances that need to be launched. When a template launches, it creates a collection of virtual resources (instances, networks, storage devices, and so on); this collection of resources is called a stack. When a stack is spawned, the ordering and dependencies, shared configuration data, and post-boot configuration are coordinated via Heat.

Heat is not configuration management; it is orchestration. It is intended to coordinate launching instances, passing configuration data, and executing simple post-boot configuration. A very common post-boot configuration task is invoking an actual configuration management engine to execute more complex post-boot configuration.

OpenStack installation

The list of components covered here is not the full list. This is just a small subset to get you started with using and understanding OpenStack. Further components that are defaults in an OpenStack installation provide many advanced capabilities that we will not be able to cover. Now that we have introduced the OpenStack components, we will illustrate how they work together as a running OpenStack installation. To do that, we first need to install one. Let's use the RDO Project's OpenStack distribution to do so. RDO has two installation methods; we will discuss both of them and focus on one of them throughout this article.
Manual installation and configuration of OpenStack involves installing, configuring, and registering each of the components we covered in the previous part, as well as multiple databases and a messaging system. It's a very involved, repetitive, error-prone, and sometimes confusing process. Fortunately, there are a few distributions that include tools to automate this installation and configuration process. One such distribution is the RDO Project distribution.

RDO, as a name, doesn't officially mean anything. It is just the name of a community-supported distribution of OpenStack. The RDO Project takes the upstream OpenStack code, packages it in RPMs, and provides documentation, forums, IRC channels, and other resources for the RDO community to use and support each other in running OpenStack on RPM-based systems. There are no modifications to the upstream OpenStack code in the RDO distribution; the RDO Project packages the code that is in each of the upstream releases of OpenStack. This means that we'll use an open source, community-supported distribution of vanilla OpenStack for our example installation. RDO should be able to run on any RPM-based system.

We will now look at the two installation tools that are part of the RDO Project, Packstack and RDO Triple-O, and will focus on using RDO Triple-O in this article. The RDO Project recommends RDO Triple-O for installations that intend to deploy a more feature-rich environment; one example is high availability. RDO Triple-O is able to do HA deployments and Packstack is not. There is still great value in doing an installation with Packstack, though. Packstack is intended to give you a very lightweight, quick way to stand up a basic OpenStack installation. Let's start by taking a quick look at Packstack so you are familiar with how quick and lightweight it is.

Installing RDO using Packstack

Packstack is an installation tool for OpenStack intended for demonstration and proof-of-concept deployments. Packstack uses SSH to connect to each of the nodes and invokes a puppet run (specifically, a puppet apply) on each of them to install and configure OpenStack.

RDO website: http://openstack.redhat.com
Packstack installation: http://openstack.redhat.com/install/quickstart

The RDO Project quick start gives instructions to install RDO using Packstack in three simple steps:

Update the system and install the RDO release rpm:
sudo yum update -y
sudo yum install -y http://rdo.fedorapeople.org/rdo-release.rpm

Install Packstack:
sudo yum install -y openstack-packstack

Run Packstack:
sudo packstack --allinone

The all-in-one installation method works well when run on a virtual machine acting as your all-in-one OpenStack node. In reality, however, a cluster will usually use more than one node beyond a simple learning environment. Packstack is capable of doing multi-node installations, though you will have to read the RDO Project documentation for Packstack on the RDO Project wiki. We will not go any deeper with Packstack than the all-in-one installation we have just walked through. Don't avoid doing an all-in-one installation; it really is as simple as the steps make it out to be, and there is value in getting an OpenStack installation up and running quickly.
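As a hedged check that a Packstack all-in-one run succeeded: Packstack typically writes an admin credentials file such as keystonerc_admin to the home directory of the user that ran it, though the exact file name and client commands can vary by release. Sourcing it and listing the service catalog is a quick sanity test:

source ~/keystonerc_admin
openstack service list

If the service catalog comes back with the components discussed above, the all-in-one deployment is working.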
Installing RDO using Triple-O

The Triple-O project is an OpenStack installation tool developed by the OpenStack community. A Triple-O deployment consists of two OpenStack deployments: one of them is an all-in-one OpenStack installation that is used as a provisioning tool to deploy a multi-node target OpenStack deployment. This target deployment is the deployment intended for end users. Triple-O stands for OpenStack on OpenStack. OpenStack on OpenStack would be OOO, which lovingly became referred to as Triple-O. It may sound like madness to use OpenStack to deploy OpenStack, but consider that OpenStack is really good at provisioning virtual instances. Triple-O applies this strength to bare-metal deployments to deploy a target OpenStack environment.

In Triple-O, the two OpenStacks are called the undercloud and the overcloud. The undercloud is a baremetal-management-enabled, all-in-one OpenStack installation that is built for you in a very prescriptive way. Baremetal-management-enabled means it is intended to manage physical machines instead of virtual machines. The overcloud is the target deployment of OpenStack that is intended to be exposed to end users. The undercloud takes a cluster of nodes provided to it and deploys the overcloud to them, a fully featured OpenStack deployment. In real deployments, this is done with a collection of baremetal nodes. Fortunately, for learning purposes, we can mock having a bunch of baremetal nodes by using virtual machines. Mind blown yet?

Let's get started with this Triple-O based OpenStack installation to start unraveling what all this means. There is a Triple-O quickstart project that we will use to get going. The RDO Triple-O wiki page will be the most up-to-date place to get started with RDO Triple-O. If you have trouble with the directions in this article, please refer to the wiki; open source changes rapidly and RDO Triple-O is no exception. In particular, note that the directions refer to the Mitaka release of OpenStack. The name of the release will most likely be the first thing that changes on the wiki page that will impact your future deployments with RDO Triple-O.

Start by downloading the pre-built undercloud image from the RDO Project's repositories. This is something you could build yourself, but it would take much more time and effort than downloading the pre-built one. As mentioned earlier, the undercloud is a fairly prescriptive all-in-one deployment, which lends itself well to starting with a pre-built image. These instructions come from the readme of the tripleo-quickstart GitHub repository (https://github.com/redhat-openstack/tripleo-quickstart/):

myhost# mkdir -p /usr/share/quickstart_images/
myhost# cd /usr/share/quickstart_images/
myhost# wget https://ci.centos.org/artifacts/rdo/images/mitaka/delorean/stable/undercloud.qcow2.md5 https://ci.centos.org/artifacts/rdo/images/mitaka/delorean/stable/undercloud.qcow2

Make sure that your SSH key exists:

myhost# ls ~/.ssh

If you don't see the id_rsa and id_rsa.pub files in that directory listing, run the command ssh-keygen.
Then make sure that your public key is in the authorized keys file:

myhost# cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

Once you have the undercloud image and your SSH keys, pull a copy of the quickstart.sh file, install the dependencies, and execute the quickstart script:

myhost# cd ~
myhost# wget https://raw.githubusercontent.com/redhat-openstack/tripleo-quickstart/master/quickstart.sh
myhost# sh quickstart.sh -u file:///usr/share/quickstart_images/undercloud.qcow2 localhost

quickstart.sh uses Ansible to set up the undercloud virtual machine and defines a few extra virtual machines that will be used to mock a collection of baremetal nodes for an overcloud deployment. To see the list of virtual machines that quickstart.sh created, use virsh to list them:

myhost# virsh list --all
 Id    Name          State
----------------------------------------------------
 17    undercloud    running
 -     ceph_0        shut off
 -     compute_0     shut off
 -     control_0     shut off
 -     control_1     shut off
 -     control_2     shut off

Along with the undercloud virtual machine, there are ceph, compute, and control virtual machine definitions. These are the nodes that will be used to deploy the OpenStack overcloud. Using virtual machines like this to deploy OpenStack is not suitable for anything but your own personal OpenStack enrichment; these virtual machines represent the physical machines that would be used in a real deployment exposed to end users.

To continue the undercloud installation, connect to the undercloud virtual machine and run the undercloud configuration:

myhost# ssh -F /root/.quickstart/ssh.config.ansible undercloud
undercloud# openstack undercloud install

The undercloud install command will set up the undercloud machine as an all-in-one OpenStack installation, ready to be told how to deploy the overcloud. Once the undercloud installation is completed, the final steps are to seed the undercloud with configuration about the overcloud deployment and execute the overcloud deployment:

undercloud# source stackrc
undercloud# openstack overcloud image upload
undercloud# openstack baremetal import --json instackenv.json
undercloud# openstack baremetal configure boot
undercloud# neutron subnet-list
undercloud# neutron subnet-update <subnet-uuid> --dns-nameserver 8.8.8.8

There are also some scripts and other automated ways to make these steps happen: look at the output of the quickstart script or the Triple-O quickstart docs in the GitHub repository for more information about how to automate some of them. The source command puts information into the shell environment to tell the subsequent commands how to communicate with the undercloud. The image upload command uploads into Glance the disk images that will be used to provision the overcloud nodes. The first baremetal command imports information about the overcloud environment that will be deployed; this information was written to the instackenv.json file when the undercloud virtual machine was created by quickstart.sh. The second configures the images that were just uploaded in preparation for provisioning the overcloud nodes. The two neutron commands configure a DNS server for the network that the overcloud will use, in this case Google's. Finally, execute the overcloud deploy:

undercloud# openstack overcloud deploy --control-scale 1 --compute-scale 1 --templates --libvirt-type qemu --ceph-storage-scale 1 -e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml

Let's talk about what this command is doing.
In OpenStack, there are two basic node types: control and compute. A control node runs the OpenStack API services, OpenStack scheduling services, database services, and messaging services. Pretty much everything except the hypervisors is part of the control tier and is segregated onto control nodes in a basic deployment. In an HA deployment, there are at least three control nodes; this is why you see three control nodes in the list of virtual machines quickstart.sh created. RDO Triple-O can do HA deployments, though we will focus on non-HA deployments in this article. Note that in the command you have just executed, control scale and compute scale are both set to one. This means that you are deploying one control node and one compute node; the other virtual machines will not be used.

Take note of the libvirt-type parameter. It is only required when the compute node itself is virtualized, which is what we are doing with RDO Triple-O; it sets the configuration properly for the instances to be nested. Nested virtualization is when virtual machines run inside of a virtual machine. In this case, the instances will be virtual machines running inside of the compute node, which is itself a virtual machine. Finally, the ceph storage scale and storage environment file will deploy Ceph as the storage backend for Glance and Cinder. If you leave off the Ceph and storage environment file parameters, one less virtual machine will be used for deployment.

The indication that the overcloud deploy has succeeded is a Keystone endpoint and a success message:

Overcloud Endpoint: http://192.0.2.6:5000/v2.0
Overcloud Deployed

Connecting to your Overcloud

Finally, before we dig into the OpenStack components that have been installed and configured, let's identify three ways that you can connect to the freshly installed overcloud deployment:

From the undercloud: This is the quickest way to access the overcloud. When the overcloud deployment completed, a file named overcloudrc was created on the undercloud. Source this file to load credentials for the overcloud into your shell environment.

Install the client libraries: Both RDO Triple-O and Packstack were installed from the RDO release repository. By installing this release repository on another computer, in the same way as was demonstrated earlier for Packstack, the OpenStack client libraries can be installed on that computer. If these libraries are installed on a computer that can route to the network the overcloud was installed on, then the overcloud can be accessed from that computer just as it can from the undercloud. This is helpful if you do not want to be tied to jumping through the undercloud node to access the overcloud:

laptop# sudo yum install -y http://rdo.fedorapeople.org/rdo-release.rpm
laptop# sudo yum install python-openstackclient

In addition to the client package, you will also need the overcloudrc file from the undercloud. As an example, you can install the packages on the host machine you have just run quickstart.sh on and make the overcloud routable by adding an IP address to the OVS bridge the virtual machines were attached to:

myhost# sudo ip addr add 192.0.2.222/24 dev bridget
myhost# sudo ip link set up dev bridget

Once this is done, the commands in the subsequent parts could be run from the host machine instead of the undercloud virtual machine.

The OpenStack dashboard: OpenStack's included web interface is called the dashboard.
In the installation you have just completed, you can access the overcloud's dashboard by first running the two ip commands used in the second option above, then connecting to the IP address indicated as the overcloud endpoint, but on port 80 instead of 5000: http://192.0.2.6/.

Summary

After looking at the components that make up an OpenStack installation, we used RDO Triple-O as a provisioning tool. We now have OpenStack installed and running. With that in place, we can walk through each of the components discussed to learn how to use them.

Resources for Article:

Further resources on this subject:

Keystone – OpenStack Identity Service [article]
Concepts for OpenStack [article]
Setting up VPNaaS in OpenStack [article]

Planning for Failure (and Success)

Packt
27 Dec 2016
24 min read
In this article by Michael Solberg and Ben Silverman, the authors of the book OpenStack for Architects, we will walk through how to architect your cloud to avoid hardware and software failures. The OpenStack control plane is comprised of web services, application services, database services, and a message bus. Each of these tiers requires a different approach to make it highly available, and some organizations will already have defined architectures for each of the services. We've seen that customers either reuse those existing patterns or adopt new ones which are specific to the OpenStack platform. Both of these approaches make sense, depending on the scale of the deployment, and many successful deployments actually implement a blend of the two. For example, if your organization already has a supported pattern for highly available MySQL databases, you might choose that pattern instead of the one outlined in this article. If your organization doesn't have a pattern for highly available MongoDB, you might have to architect a new one.

Building a highly available control plane

Back in the Folsom and Grizzly days, coming up with a high availability (H/A) design for the OpenStack control plane was something of a black art. Many of the technologies recommended in the first iterations of the OpenStack High Availability Guide were specific to the Ubuntu distribution of Linux and were unavailable on the Red Hat Enterprise Linux-derived distributions. The now-standard cluster resource manager (Pacemaker) was unsupported by Red Hat at that time. As such, architects using Ubuntu might use one set of software, those using CentOS or RHEL might use another, and those using a Rackspace or Mirantis distribution might use yet another. These days, however, the technology stack has converged and the H/A pattern is largely consistent regardless of the distribution used.

About failure and success

When we design a highly available OpenStack control plane, we're looking to mitigate two different scenarios. The first is failure: when a physical piece of hardware dies, we want to make sure that we recover without human interaction and continue to provide service to our users. The second, and perhaps more important, scenario is success. Software systems always work as designed and tested until humans start using them. While our automated test suites will try to launch a reasonable number of virtual objects, humans are guaranteed to attempt to launch an unreasonable number. Also, many of the OpenStack projects we've worked on have grown far past their expected size and need to be expanded on the fly.

There are a few different types of success scenarios that we need to plan for when architecting an OpenStack cloud. First, we need to plan for growth in the number of instances. This is relatively straightforward: each additional instance grows the size of the database, it grows the amount of metering data in Ceilometer, and, most importantly, it grows the number of compute nodes. Adding compute nodes and reporting puts strain on the message bus, which is typically the limiting factor in the size of OpenStack regions or cells. We'll talk more about this when we talk about dividing up OpenStack clouds into regions, cells, and Availability Zones. The second type of growth we need to plan for is an increase in the number of API calls.
Deployments which support Continuous Integration (CI) development environments might have (relatively) small compute requirements, but CI typically brings up and tears down environments rapidly. This generates a large amount of API traffic, which in turn generates a large amount of database and message traffic. In hosting environments, end users might also manually generate a lot of API traffic as they bring instances up and down, or manually check the status of deployments they've already launched. While a service catalog might check the status of instances it has launched on a regular basis, humans tend to hit refresh on their browsers in an erratic fashion. Automated testing of the platform has a tendency to grossly underestimate this kind of behavior. With that in mind, any pattern that we adopt will need to provide for the following requirements:

API services must continue to be available during a hardware failure in the control plane.
The systems which provide API services must be horizontally scalable (and ideally elastic) to respond to unanticipated demands.
The database services must be vertically or horizontally scalable to respond to unanticipated growth of the platform.
The message bus can be either vertically or horizontally scaled, depending on the technology chosen.

Finally, every system has its limits. These limits should be defined in the architecture documentation so that capacity planning can account for them. At some point, the control plane has scaled as far as it can and a second control plane should be deployed to provide additional capacity. Although OpenStack is designed to be massively scalable, it isn't designed to be infinitely scalable.

High availability patterns for the control plane

There are three approaches commonly used in OpenStack deployments these days for achieving high availability of the control plane. The first is the simplest: take the single-node cloud controller, virtualize it, and then make the virtual machine highly available using either VMware clustering or Linux clustering. While this option is simple and it provides for failure scenarios, it scales vertically (not horizontally) and doesn't provide for success scenarios. As such, it should only be used in regions with a limited number of compute nodes and a limited number of API calls. In practice, this method isn't used frequently and we won't spend any more time on it here.

The second pattern provides for H/A, but not horizontal scalability. This is the "Active/Passive" scenario described in the OpenStack High Availability Guide. At Red Hat, we used this a lot with our Folsom and Grizzly deployments, but moved away from it starting with Havana. It's similar to the virtualization solution described earlier, but instead of relying on VMware clustering or Linux clustering to restart a failed virtual machine, it relies on Linux clustering to restart failed services on a second cloud controller node running the same subset of services. This pattern doesn't provide for success scenarios in the web tier, but it can still be used in the database and messaging tiers. Some networking services may still need to be provided as Active/Passive as well.

The third H/A pattern available to OpenStack architects is the Active/Active pattern. In this pattern, services are horizontally scaled out behind a load balancing service or appliance, which is itself Active/Passive.
As a general rule, most OpenStack services should be enabled as Active/Active where possible, to allow for success scenarios while mitigating failure scenarios. Ideally, Active/Active services can be scaled out elastically without service disruption by simply adding additional control plane nodes. Both the Active/Passive and Active/Active designs require clustering software to determine the health of services and of the hosts on which they run. In this article, we'll be using Pacemaker as the cluster manager. Some architects may choose to use Keepalived instead of Pacemaker.

Active/Passive service configuration

In the Active/Passive service configuration, the service is configured and deployed to two or more physical systems. The service is associated with a Virtual IP (VIP) address. A cluster resource manager (normally Pacemaker) is used to ensure that the service and its VIP are enabled on only one of the two systems at any point in time. The resource manager may be configured to favor one of the machines over the other. When the machine that the service is running on fails, the resource manager first ensures that the failed machine is no longer running and then starts the service on the second machine. Ensuring that the failed machine is no longer running is accomplished through a process known as fencing. Fencing usually entails powering off the machine through its out-of-band management interface. The fence agent may also talk to a power supply connected to the failed server to ensure that the system is down.

Some services (such as the Glance image registry) require shared storage to operate. If the storage is network-based, such as NFS, it may be mounted on both the active and the passive nodes simultaneously. If the storage is block-based, such as iSCSI, it will only be mounted on the active node, and the resource manager will ensure that the storage migrates along with the service and the VIP.

Active/Active service configuration

Most of the OpenStack API services are designed to be run on more than one system simultaneously. This configuration, the Active/Active configuration, requires a load balancer to spread traffic across each of the active services. The load balancer manages the VIP for the service and ensures that the backend systems are listening before forwarding traffic to them. The cluster manager ensures that the VIP is only active on one node at a time. The backend services may or may not be managed by the cluster manager in the Active/Active configuration. Service or system failure is detected by the load balancer, and failed services are taken out of rotation.

There are a few advantages to the Active/Active service configuration:

The first is that it allows for horizontal scalability. If additional capacity is needed for a given service, a new system running the service can be brought up and added into rotation behind the load balancer without any downtime. The control plane may also be scaled down without downtime in the event that it was over-provisioned.
The second is that Active/Active services have a much shorter mean time to recovery. Fencing operations often take up to 2 minutes, and fencing is required before the cluster resource manager will move a service from a failed system to a healthy one. Load balancers can immediately detect system failure and stop sending requests to unresponsive nodes while the cluster manager fences them in the background.
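To make the Active/Passive building blocks more concrete, the following pcs commands are a minimal sketch of placing a VIP and a single service under Pacemaker control. The IP address, netmask, and service name are assumptions made for this example only, and a real cluster would additionally need fencing devices defined for every node:

# Create a virtual IP address that the cluster moves between nodes
pcs resource create control-vip ocf:heartbeat:IPaddr2 ip=192.168.1.10 cidr_netmask=24 op monitor interval=30s

# Manage an example service as a cluster resource (the service name is illustrative)
pcs resource create glance-registry systemd:openstack-glance-registry op monitor interval=60s

# Keep the service on the same node as the VIP and start the VIP first
pcs constraint colocation add glance-registry with control-vip INFINITY
pcs constraint order control-vip then glance-registry

When the node holding control-vip fails, Pacemaker fences it and restarts both resources on a surviving node, which is exactly the failover behavior described for the Active/Passive pattern above.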
Whenever possible, architects should employ the Active/Active pattern for the control plane services.

OpenStack service specifics

In this section, we'll walk through each of the OpenStack services and outline the H/A strategy for them. While most of the services can be configured as Active/Active behind a load balancer, some of them must be configured as Active/Passive and others may be configured either way. Some of the configuration also depends on the particular version of OpenStack, especially for Ceilometer, Heat, and Neutron. The following details are current as of the Liberty release of OpenStack.

The OpenStack web services

As a general rule, all of the web services and the Horizon dashboard may be run Active/Active. These include the API services for Keystone, Glance, Nova, Cinder, Neutron, Heat, and Ceilometer. The scheduling services for Nova, Cinder, Neutron, Heat, and Ceilometer may also be deployed Active/Active. The scheduling services do not require a load balancer, as they respond to requests on the message bus. The only web service which must be run Active/Passive is the Ceilometer Central agent. This service can, however, be configured to split its workload among multiple instances in order to scale horizontally.

The database services

All state for the OpenStack web services is stored in a central database, usually a MySQL database. MySQL is usually deployed in an Active/Passive configuration, but can be made Active/Active with the Galera replication extension. Galera is clustering software for MySQL (MariaDB in OpenStack) which uses synchronous replication to achieve H/A. However, even with Galera, we still recommend directing writes to only one of the replicas, because some queries used by the OpenStack services may deadlock when writing to more than one master. With Galera, a load balancer is typically deployed in front of the cluster and configured to deliver traffic to only one replica at a time. This configuration reduces the mean time to recovery of the service while ensuring that the data is consistent. In practice, many organizations will defer to the database architects for their preference regarding highly available MySQL deployments. After all, it is typically the database administration team who is responsible for responding to failures of that component.

Deployments which use the Ceilometer service also require a MongoDB database to store telemetry data. MongoDB is horizontally scalable by design and is typically deployed Active/Active with at least three replicas.

The message bus

All OpenStack services communicate through the message bus. Most OpenStack deployments these days use the RabbitMQ service as the message bus. RabbitMQ can be configured to be Active/Active through a facility known as "mirrored queues". The RabbitMQ service is not load balanced; instead, each service is given a list of potential nodes and the client is responsible for determining which nodes are active and which ones have failed. Other messaging services used with OpenStack, such as ZeroMQ, ActiveMQ, or Qpid, may have different strategies and configurations for achieving H/A and horizontal scalability. For these services, refer to their documentation to determine the optimal architecture.

Compute, storage, and network agents

The compute, storage, and network components in OpenStack have a set of services which perform the work that is scheduled by the API services. These services register themselves with the schedulers over the message bus on startup.
The schedulers are responsible for determining the health of the services and scheduling work to active services. The compute and storage services are all designed to be run Active/Active but the network services need some extra consideration. Each hypervisor in an OpenStack deployment runs the nova-compute service. When this service starts up, it registers itself with the nova-scheduler service. A list of currently available nova services is available via the nova service-list command. If a compute node is unavailable, its state is listed as down and the scheduler skips it when performing instance actions. When the node becomes available, the scheduler includes it in the list of available hosts. For KVM or Xen-based deployments, the nova-compute service runs once per hypervisor and is not made highly available. For VMware-based deployments though, a single nova-compute service is run for every vSphere cluster. As such, this service should be made highly available in an Active/Passive configuration. This is typically done by virtualizing the service within a vSphere cluster and configuring the virtual machine to be highly available. Cinder includes a service known as the volume service or cinder-volume. The volume service registers itself with the Cinder scheduler on startup and is responsible for creating, modifying, or deleting LUNs on block storage devices. For backends which support multiple writers, multiple copies of this service may be run in Active/Active configuration. The LVM backend (which is the reference backend) is not highly available, though, and may only have one cinder-volume service for each block device. This is because the LVM backend is responsible for providing iSCSI access to a locally attached storage device. For this reason, highly available deployments of OpenStack should avoid the LVM Cinder backend and instead use a backend that supports multiple cinder-volume services. Finally, the Neutron component of OpenStack has a number of agents, which all require some special consideration for highly available deployments. The DHCP agent can be configured as highly available, and the number of agents which will respond to DHCP requests for each subnet is governed by a parameter in the neutron.conf file, dhcp_agents_per_network. This is typically set to 2, regardless of the number of DHCP agents which are configured to run in a control plane. For most of the history of OpenStack, the L3 routing agent in Neutron has been a single point of failure. It could be made highly available in Active/Passive configuration, but its failover meant the interruption of network connections in the tenant space. Many of the third-party Neutron plugins have addressed this in different ways and the reference Open vSwitch plugin has a highly available L3 agent as of the Juno release. For details on implementing a solution to the single routing point of failure using OpenStack's Distributed Virtual Routers (DVR), refer to the OpenStack Foundation's Neutron documentation at http://docs.openstack.org/liberty/networking-guide/scenario-dvr-ovs.html. Regions, cells, and availability Zones As we mentioned before, OpenStack is designed to be scalable, but not infinitely scalable. There are three different techniques architects can use to segregate an OpenStack cloud—regions, cells, and Availability Zones. In this section, we'll walk through how each of these concepts maps to hypervisor topologies. 
Regions

From an end user's perspective, OpenStack regions are equivalent to regions in Amazon Web Services. Regions live in separate data centers and are often named after their geographical location. If your organization has a data center in Phoenix and one in Raleigh (like ours does), you'll have at least a PHX and an RDU region. Users who want to geographically disperse their workloads will place some of them in PHX and some of them in RDU. Regions have separate API endpoints and, although the Horizon UI has some support for multiple regions, they are essentially entirely separate deployments.

From an architectural standpoint, there are two main design choices for implementing regions:

The first is around authorization. Users will want to have the same credentials for accessing each of the OpenStack regions. There are a few ways to accomplish this. The simplest is to use a common backing store (usually LDAP) for the Keystone service in each region. In this scenario, the user has to authenticate separately to each region to get a token, but the credentials are the same. In Juno and later, Keystone also supports federation across regions, where a Keystone token granted by one region can be presented to another region to authenticate a user. While this currently isn't widely used, it is a major focus area for the OpenStack Foundation and will probably see broader adoption in the future.
The second major consideration for regional architectures is whether or not to present a single set of Glance images to each region. While work is currently being done to replicate Glance images across federated clouds, most organizations manually ensure that the shared images are consistent. This typically involves building a workflow around image publishing and deprecation which is mindful of the regional layout. Another option for ensuring consistent images across regions is to implement a central image repository using Swift, which also requires shared Keystone and Glance services that span multiple data centers. Details on how to design multiple regions with shared services are in the OpenStack Architecture Design Guide.

Cells

The Nova compute service has a concept of cells, which can be used to segregate large pools of hypervisors within a single region. This technique is primarily used to mitigate the scalability limits of the OpenStack message bus. The deployment at CERN makes wide use of cells to achieve massive scalability within single regions. Support for cells varies from service to service, and as such cells are infrequently used outside a few very large cloud deployments. The CERN deployment is well documented and should be used as a reference for these types of deployments.

In our experience, it's much simpler to deploy multiple regions within a single data center than to implement cells to achieve large scale. The added inconvenience of presenting your users with multiple API endpoints within a geographic location is typically outweighed by the benefits of having a more robust platform. If multiple control planes are available in a geographic region, the failure of a single control plane becomes less dramatic. The cells architecture has its own set of challenges with regard to networking and the scheduling of instance placement. Some very large companies that support the OpenStack effort have been working for years to overcome these hurdles. However, many OpenStack distributions are currently working on a new control plane design.
These new designs would begin to split the OpenStack control plane into containers running the OpenStack services in a microservice type architecture. This way the services themselves can be placed anywhere and be scaled horizontally based on the load. One architecture that has garnered a lot of attention lately is the Kolla project that promotes Docker containers and Ansible playbooks to provide production-ready containers and deployment tools for operating OpenStack clouds. To see more, go to https://wiki.openstack.org/wiki/Kolla. Availability Zones Availability Zones are used to group hypervisors within a single OpenStack region. Availability Zones are exposed to the end user and should be used to provide the user with an indication of the underlying topology of the cloud. The most common use case for Availability Zones is to expose failure zones to the user. To ensure the H/A of a service deployed on OpenStack, a user will typically want to deploy the various components of their service onto hypervisors within different racks. This way, the failure of a top of rack switch or a PDU will only bring down a portion of the instances which provide the service. Racks form a natural boundary for Availability Zones for this reason. There are a few other interesting uses of Availability Zones apart from exposing failure zones to the end user. One financial services customer we work with had a requirement for the instances of each line of business to run on dedicated hardware. A combination of Availability Zones and the AggregateMultiTenancyIsolation Nova Scheduler filter were used to ensure that each tenant had access to dedicated compute nodes. Availability Zones can also be used to expose hardware classes to end users. For example, hosts with faster processors might be placed in one Availability Zone and hosts with slower processors might be placed in different Availability Zones. This allows end users to decide where to place their workloads based upon compute requirements. Updating the design document In this article, we walked through the different approaches and considerations for achieving H/A and scalability in OpenStack deployments. As Cloud Architects, we need to decide on the correct approach for our deployment and then document it thoroughly so that it can be evaluated by the larger team in our organization. Each of the major OpenStack vendors has a reference architecture for highly available deployments and those should be used as a starting point for the design. The design should then be integrated with existing Enterprise Architecture and modified to ensure that best practices established by the various stakeholders within an organization are followed. The system administrators within an organization may be more comfortable supporting Pacemaker than Keepalived. The design document presents the choices made for each of these key technologies and gives the stakeholders an opportunity to comment on them before the deployment. Planning the physical architecture The simplest way to achieve H/A is to add additional cloud controllers to the deployment and cluster them. Other deployments may choose to segregate services into different host classes, which can then be clustered. This may include separating the database services into database nodes, separating the messaging services into messaging nodes, and separating the memcached service into memcache nodes. Load balancing services might live on their own nodes as well. 
The primary considerations for mapping scalable services to physical (or virtual) hosts are the following: Does the service scale horizontally or vertically? Will vertically scaling the service impede the performance of other co-located services? Does the service have particular hardware or network requirements that other services don't have? For example, some OpenStack deployments which use the HAProxy load balancing service chose to separate out the load balancing nodes on a separate hardware. The VIPs which the load balancing nodes host must live on a public, routed network, while the internal IPs of services that they route to don't have that requirement. Putting the HAProxy service on separate hosts allows the rest of the control plane to only have private addressing. Grouping all of the API services on dedicated hosts may ease horizontal scalability. These services don't need to be managed by a cluster resource manager and can be scaled by adding additional nodes to the load balancers without having to update cluster definitions. Database services have high I/O requirements. Segregating these services onto machines which have access to high performance fiber channel may make sense. Finally, you should consider whether or not to virtualize the control plane. If the control plane will be virtualized, creating additional host groups to host dedicated services becomes very attractive. Having eight or nine virtual machines dedicated to the control plane is a very different proposition than having eight or nine physical machines dedicated to the control plane. Most highly available control planes require at least three nodes to ensure that quorum is easily determined by the cluster resource manager. While dedicating three physical nodes to the control function of a hundred node OpenStack deployment makes a lot of sense, dedicating nine physical nodes may not. Many of the organizations that we've worked with will already have a VMware-based cluster available for hosting management appliances and the control plane can be deployed within that existing footprint. Organizations which are deploying a KVM-only cloud may not want to incur the additional operational complexity of managing the additional virtual machines outside OpenStack. Updating the physical architecture design Once the mapping of services to physical (or virtual) machines has been determined, the design document should be updated to include definition of the host groups and their associated functions. 
A simple example is provided as follows: Load balancer: These systems provide the load balancing services in an Active/Passive configuration Cloud controller: These systems provide the API services, the scheduling services, and the Horizon dashboard services in an Active/Active configuration Database node: These systems provide the MySQL database services in an Active/Passive configuration Messaging node: These systems provide the RabbitMQ messaging services in an Active/Active configuration Compute node: These systems act as KVM hypervisors and run the nova-compute and openvswitch-agent services Deployments which will be using only the cloud controller host group might use the following definitions: Cloud controller: These systems provide the load balancing services in an Active/Passive configuration and the API services, MySQL database services, and RabbitMQ messaging services in an Active/Active configuration Compute node: These systems act as KVM hypervisors and run the nova-compute and openvswitch-agent services After defining the host groups, the physical architecture diagram should be updated to reflect the mapping of host groups to physical machines in the deployment. This should also include considerations for network connectivity. The following is an example architecture diagram for inclusion in the design document: Summary A complete guide to implementing H/A of the OpenStack services is probably worth a book to itself. In this article we started out by covering the main strategies for making OpenStack services highly available and which strategies apply well to each service. Then we covered how OpenStack deployments are typically segmented across physical regions. Finally, we updated our documentation and implemented a few of the technologies we discussed in the lab. While walking through the main considerations for highly available deployments in this article, we've tried to emphasize a few key points: Scalability is at least as important as H/A in cluster design. Ensure that your design is flexible in case of unexpected growth. OpenStack doesn't scale forever. Plan for multiple regions. Also, it's important to make sure that the strategy and architecture that you adopt for H/A is supportable by your organization. Consider reusing existing architectures for H/A in the message bus and database layers. Resources for Article:  Further resources on this subject: Neutron API Basics [article] The OpenFlow Controllers [article] OpenStack Networking in a Nutshell [article]

What is Oracle Public Cloud?

Packt
14 Oct 2013
8 min read
(For more resources related to this topic, see here.)

Brief review of cloud computing concepts

The National Institute of Standards and Technology (NIST), USA, has defined cloud computing, and this definition is the best starting point for understanding the fundamentals of cloud computing. The definition is as follows:

"Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (for example, networks, servers, storage, applications, and services) that can rapidly be provisioned and released with minimal management effort or service provider interaction."

The cloud model proposed by NIST is composed of five essential characteristics, three service models, and four deployment models.

Essential characteristics

The five essential characteristics defined by NIST are as follows:

On-demand self-service: This enables users to provision and manage computing resources, such as server time, amount of storage, and network bandwidth, without the need for human administrators.
Broad network access: Cloud services and capabilities must be available and accessible from heterogeneous platforms, such as personal computers and mobile devices.
Resource pooling: Provider resources are shared among multiple consumers, and these physical and virtual resources are automatically managed according to consumer demand.
Rapid elasticity: Resources can be elastically provisioned and released to rapidly scale out and scale in as per user need.
Measured service: Cloud provider resources should be tracked for usage by their consumers for the purpose of billing, generally on a pay-per-use basis.

Service models

Cloud services are available in the following models:

Software as a Service (SaaS): When application software is deployed over a cloud, it is called Software as a Service; for example, Oracle Planning and Budgeting and Salesforce CRM. Consumers can access this application software from heterogeneous devices without having to worry about the management of hardware, network, and operating system. The user only needs to manage some application-specific settings, if required.
Platform as a Service (PaaS): Platform as a Service provides an application development platform in the form of a cloud service; for example, Oracle Java Cloud Services and the Google App Engine. Consumers need not bother about the management of hardware, network, and operating system, and are only expected to manage application deployment and configuration settings.
Infrastructure as a Service (IaaS): Similarly, hardware infrastructure is made available to consumers using Infrastructure as a Service clouds; for example, Oracle IaaS or Amazon's Elastic Compute Cloud. Consumers need not manage the hardware infrastructure but have full control over the operating system, development platform, and application.

Deployment models

Cloud services can be deployed using one of the following four cloud deployment models:

Private cloud: In a private cloud, the cloud facilities are operated only for a specific organization. It can be owned or managed by the organization, a third party, or a combined entity, and can be deployed within the organization or at some other location.
Community cloud: A community cloud is shared by multiple organizations with similar concerns, such as goals, mission, and data. It can be owned or managed by several organizations from the community, a third party, or a combined entity, and can be deployed within the organizations or at some other location.
Public cloud: A public cloud is open for a large user group or can provide open access for general users. The public cloud is generally managed by business, academic, or government organizations. Hybrid cloud: A hybrid cloud is a blending of private, community, or public clouds. Oracle Cloud Services Oracle offers a range of services for all the cloud service models. These services are as follows: Oracle SaaS Services include customer relationship management, human capital management, and enterprise resource planning. These applications cover almost all of the requirements of any commercial application. Oracle PaaS Services offer Java Service, Database Service, and Developer Service. At IaaS level, Oracle offers Oracle servers, Oracle Linux, Oracle Enterprise Manager, and so on. Oracle also offers some common services such as Oracle Social Services, Oracle Management Services, Oracle Storage Services , and Oracle Messaging Services. Oracle Cloud Services have been organized by Oracle into various categories, such as Oracle Application Services, Oracle Platform Services, Oracle Social Services, Oracle Common Infrastructure Services and Oracle Managed Cloud Services. Oracle Application Services Oracle application services provide a broad range of industry-strength, commercial Oracle applications on the cloud. These cloud services enable its consumers to easily use, enhance, and administer the applications. The Oracle Application Services are part of a full suite of business applications. It includes: Enterprise Resource Planning (ERP) Service: Oracle's ERP Cloud Service offers a full set of financial and operational capabilities. It contains almost all the applications required by an organization of any size. Planning and budgeting: This application supports planning and budgeting of work flow in the organization. Financial reporting: This application provides timely, accurate financial and management reports required for decision making. Human capital management: This application supports work force management and simplifies the human resource management process. Talent management: This application empowers businesses to recruit and retain the best resource using its effective functionalities. Sales and marketing: This application can be used to capture sales and customer data and presents analytic reports to improve sales. Customer service and support: This application supports various functionalities related to customer satisfaction and support, such as contact center, feedback management, incidence management, and other functions related to customer services. Oracle Platform Services Oracle platform services enable developers to develop rich applications for their organizations. These services support a wide range of technologies for application development, including: Java Service: This service provides an enterprise-level Java application development platform. The applications can be deployed on an Oracle WebLogic server. It provides flexibility to consumers without a vender lock-in problem. This service is accessible through various consoles and interfaces. Database Service: Using this service, consumers can access an Oracle database on a cloud by Oracle Application Services, RESTful web services, Java Services, and so on. It provides complete SQL and PL/SQL support over the cloud. It supports various application development tools, including SQL developer, an application builder named as Oracle Application Express bundled with the Oracle Cloud, and the RESTful web service wizard. 
Developer Service: This service provides software development Platforms as a Service to its consumers. The facilities provided by this service include project configuration, user management, source control repositories, defect tracking systems, and documentation systems through wiki. Oracle Social Services Oracle social services offer facilities and tools for social presence, social marketing, and social data research. Social Services include: Social networks: This service is a private network that provides tools to capture and store the information flow within the organization, among people, enterprise applications, and business processes. Social marketing: This service provides applications for team management, content management, information publication, and so on. The information can be managed from various social networking sites, including Facebook, Twitter, and Google+. Social engagement and monitoring: This service supports a new type of customer relationship by automatically identifying the consumer opportunity and threats to your organization based on the data available on social networks. Oracle Common Infrastructure Services This group of Oracle Services includes two common services that can be used by and integrated with the other services. These services are: Oracle Storage Service: This service provides online storage facilities to store and manage the data/contents on the cloud. This makes the application deployment cost effective and efficient. This service offers a single point of control, secure, scalable, and high performance access to consumer data. Oracle Messaging Service: This service enables communication between various software components using the common messaging API. It also provides an infrastructure for software components to communicate with each other by sending and receiving the messages to establish a dynamic and automated workflow environment. Oracle Managed Cloud Services Oracle Managed Cloud Services offer a variety of services to customers for smooth transition to the Oracle Cloud and its successful maintenance. These services provide Oracle's expertise in the form of management services to its user. They include the facility for transition, recovery, security, and testing of the services to be migrated. Transition Service: This service is a set of services to facilitate the smooth transition of normal non-cloud applications to the cloud. It includes a Transition Advisory, migration to proven configuration, CEMLI (Customizations, Extensions, Modifications, Localizations, and Integrations) migrations, upgrade assistance, and DBA support services. Disaster Recovery Service: This service provides robust solutions for preventing, detecting, and recovering from sudden outages to keep the functionality running. Security Service: This service assures its customers about the security and compliance of their data. It provides federal security services, Payment Card Industry (PCI) compliance , Health Insurance Portability and Accountability Act(HIPAA) compliance, identity management, strong authentication, Single Sign-On, identity analytic services, and so on. Testing Service: This service helps customers to ensure that the provider's infrastructure is capable of fulfilling their needs at peak load time and assures customers that the system will meet their expectations. Summary In this article we have discussed the various services offered by the Oracle Public Cloud. We have described the categorization of these services. 
Resources for Article: Further resources on this subject: Remote Job Agent in Oracle 11g Database with Oracle Scheduler [Article] Introduction to Oracle Service Bus & Oracle Service Registry [Article] Configuration, Release and Change Management with Oracle [Article]

Deploying and Synchronizing Azure Active Directory

Packt
08 Feb 2017
5 min read
In this article by Jan-henrik Damaschke and Oliver Michalski, authors of the book Implementing Azure Solutions, we will learn that as cloud services grow, identity management, security, and access policies within cloud environments and cloud services become more and more essential. Microsoft's central instance for this is Azure Active Directory. Every security policy or identity Microsoft provides for their cloud services is based on Azure Active Directory. In this article you will learn the basics of Azure Active Directory, how to implement Azure AD, and how to build a hybrid Azure Active Directory with a connection to Active Directory Domain Services. We are going to explore the following topics:

Azure Active Directory overview
Azure Active Directory subscription options
Azure Active Directory deployment
Azure Active Directory user and subscription management
How to deploy Azure Active Directory hybrid identities with Active Directory Domain Services
Azure Active Directory hybrid highly available and non-highly available deployments

(For more resources related to this topic, see here.)

Azure Active Directory

Azure Active Directory, or Azure AD, is a multi-tenant, cloud-based directory and identity management service developed by Microsoft. Azure AD also includes a full suite of identity management capabilities, including multi-factor authentication, device registration, self-service password management, self-service group management, privileged account management, role-based access control, application usage monitoring, rich auditing, and security monitoring and alerting. Azure AD can be integrated with an existing Windows Server Active Directory, giving organizations the ability to leverage their existing on-premises identities to manage access to cloud-based SaaS applications. After this article you will know how to set up Azure AD and Azure AD Connect. You will also be able to design a highly available infrastructure for identity replication.

The following figure describes the general structure of Azure AD in a hybrid deployment with Active Directory Domain Services:

Source: https://azure.microsoft.com/en-us/documentation/articles/active-directory-whatis/

Customers who are already using Office 365, CRM Online, or Intune are using Azure AD for their service. You can easily tell that you are using Azure AD if you have a username like user@domain.onmicrosoft.com (.de or .cn are also possible if you are using Microsoft Cloud Germany or Azure China). Azure AD is a multi-tenant, geo-distributed, highly available service running in 28 or more datacenters around the world. Microsoft has implemented automated failover and keeps at least two copies of your Active Directory service in other regional or global datacenters. Normally your directory runs in your primary datacenter and is replicated to another two datacenters in your region. If there are only two Azure datacenters in your region, as in Europe, the copy is distributed to another region outside yours:

Azure Active Directory options

There are currently three selectable options for Azure Active Directory, each with different features. A fourth option, Azure Active Directory Premium P2, will become available later in 2016.
Azure AD Free

Supports common features such as:

Directory objects with up to 500,000 objects
User/group management (add/update/delete), user-based provisioning, and device registration
Single Sign-On for up to 10 applications per user
Self-service password change for cloud users
Connect and sync with on-premises Active Directory Domain Services
Up to 3 basic security and usage reports

Azure AD Basic

Supports common features such as:

Directory objects with unlimited objects
User/group management (add/update/delete), user-based provisioning, and device registration
Single Sign-On for up to 10 applications per user
Self-service password change for cloud users
Connect and sync with on-premises Active Directory Domain Services
Up to 3 basic security and usage reports

Basic features such as:

Group-based access management/provisioning
Self-service password reset for cloud users
Company branding (logon pages/access panel customization)
Application Proxy
Service level agreement of 99.9%

Azure AD Premium P1

Supports common features such as:

Directory objects with unlimited objects
User/group management (add/update/delete), user-based provisioning, and device registration
Single Sign-On for up to 10 applications per user
Self-service password change for cloud users
Connect and sync with on-premises Active Directory Domain Services
Up to 3 basic security and usage reports

Basic features such as:

Group-based access management/provisioning
Self-service password reset for cloud users
Company branding (logon pages/access panel customization)
Application Proxy
Service level agreement of 99.9%

Premium features such as:

Self-service group and app management, self-service application additions, and dynamic groups
Self-service password reset/change/unlock with on-premises write-back
Multi-factor authentication (in the cloud and on-premises with MFA Server)
MIM CAL with MIM server
Cloud App Discovery
Connect Health
Automatic password rollover for group accounts

In Q3 2016, Microsoft will also enable customers to use the Azure Active Directory P2 plan, which includes all the capabilities in Azure AD Premium P1 as well as the new identity protection and privileged identity management capabilities. That is a necessary step for Microsoft to extend its offering for Windows 10 device management with Azure AD. Currently, Azure AD enables Windows 10 customers to join a device to Azure AD, implement SSO for desktops, use Microsoft Passport for Azure AD, and perform central administrator BitLocker recovery. It also adds automatic classification to the Active Directory Rights Management Service in Azure AD. Depending on what you plan to do with your Azure environment, you should choose your Azure Active Directory option accordingly.
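As a hedged, minimal sketch of how such a tenant can be inspected and managed from PowerShell, the following commands use the MSOnline module. The user principal name, tenant name, and license identifier (contoso, AAD_PREMIUM) are placeholder assumptions and will differ in your own directory:

# Sign in to the tenant and take a quick inventory
Connect-MsolService
Get-MsolDomain
Get-MsolUser -MaxResults 10

# List the Azure AD plans (SKUs) purchased for this tenant
Get-MsolAccountSku

# Create a cloud user and assign an Azure AD Premium P1 license
# (the SKU string "contoso:AAD_PREMIUM" is an assumed example)
New-MsolUser -UserPrincipalName "jane.doe@contoso.onmicrosoft.com" -DisplayName "Jane Doe" -UsageLocation "DE"
Set-MsolUserLicense -UserPrincipalName "jane.doe@contoso.onmicrosoft.com" -AddLicenses "contoso:AAD_PREMIUM"

Equivalent operations are also available through the newer AzureAD PowerShell module and the Azure portal; which interface you standardize on is an operational choice rather than a technical requirement.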

HDFS and MapReduce

Packt
17 Jul 2014
11 min read
(For more resources related to this topic, see here.) Essentials of HDFS HDFS is a distributed filesystem that has been designed to run on top of a cluster of industry standard hardware. The architecture of HDFS is such that there is no specific need for high-end hardware. HDFS is a highly fault-tolerant system and can handle failures of nodes in a cluster without loss of data. The primary goal behind the design of HDFS is to serve large data files efficiently. HDFS achieves this efficiency and high throughput in data transfer by enabling streaming access to the data in the filesystem. The following are the important features of HDFS: Fault tolerance: Many computers working together as a cluster are bound to have hardware failures. Hardware failures such as disk failures, network connectivity issues, and RAM failures could disrupt processing and cause major downtime. This could lead to data loss as well slippage of critical SLAs. HDFS is designed to withstand such hardware failures by detecting faults and taking recovery actions as required. The data in HDFS is split across the machines in the cluster as chunks of data called blocks. These blocks are replicated across multiple machines of the cluster for redundancy. So, even if a node/machine becomes completely unusable and shuts down, the processing can go on with the copy of the data present on the nodes where the data was replicated. Streaming data: Streaming access enables data to be transferred in the form of a steady and continuous stream. This means if data from a file in HDFS needs to be processed, HDFS starts sending the data as it reads the file and does not wait for the entire file to be read. The client who is consuming this data starts processing the data immediately, as it receives the stream from HDFS. This makes data processing really fast. Large data store: HDFS is used to store large volumes of data. HDFS functions best when the individual data files stored are large files, rather than having large number of small files. File sizes in most Hadoop clusters range from gigabytes to terabytes. The storage scales linearly as more nodes are added to the cluster. Portable: HDFS is a highly portable system. Since it is built on Java, any machine or operating system that can run Java should be able to run HDFS. Even at the hardware layer, HDFS is flexible and runs on most of the commonly available hardware platforms. Most production level clusters have been set up on commodity hardware. Easy interface: The HDFS command-line interface is very similar to any Linux/Unix system. The commands are similar in most cases. So, if one is comfortable with the performing basic file actions in Linux/Unix, using commands with HDFS should be very easy. The following two daemons are responsible for operations on HDFS: Namenode Datanode The namenode and datanodes daemons talk to each other via TCP/IP. Configuring HDFS All HDFS-related configuration is done by adding/updating the properties in the hdfs-site.xml file that is found in the conf folder under the Hadoop installation folder. The following are the different properties that are part of the hdfs-site.xml file: dfs.namenode.servicerpc-address: This specifies the unique namenode RPC address in the cluster. Services/daemons such as the secondary namenode and datanode daemons use this address to connect to the namenode daemon whenever it needs to communicate. 
This property is shown in the following code snippet: <property> <name>dfs.namenode.servicerpc-address</name> <value>node1.hcluster:8022</value> </property> dfs.namenode.http-address: This specifies the URL that can be used to monitor the namenode daemon from a browser. This property is shown in the following code snippet: <property> <name>dfs.namenode.http-address</name> <value>node1.hcluster:50070</value> </property> dfs.replication: This specifies the replication factor for data block replication across the datanode daemons. The default is 3 as shown in the following code snippet: <property> <name>dfs.replication</name> <value>3</value> </property> dfs.blocksize: This specifies the block size. In the following example, the size is specified in bytes (134,217,728 bytes is 128 MB): <property> <name>dfs.blocksize</name> <value>134217728</value> </property> fs.permissions.umask-mode: This specifies the umask value that will be used when creating files and directories in HDFS. This property is shown in the following code snippet: <property> <name>fs.permissions.umask-mode</name> <value>022</value> </property> The read/write operational flow in HDFS To get a better understanding of HDFS, we need to understand the flow of operations for the following two scenarios: A file is written to HDFS A file is read from HDFS HDFS uses a single-write, multiple-read model, where the files are written once and read several times. The data cannot be altered once written. However, data can be appended to the file by reopening it. All files in the HDFS are saved as data blocks. Writing files in HDFS The following sequence of steps occur when a client tries to write a file to HDFS: The client informs the namenode daemon that it wants to write a file. The namenode daemon checks to see whether the file already exists. If it exists, an appropriate message is sent back to the client. If it does not exist, the namenode daemon makes a metadata entry for the new file. The file to be written is split into data packets at the client end and a data queue is built. The packets in the queue are then streamed to the datanodes in the cluster. The list of datanodes is given by the namenode daemon, which is prepared based on the data replication factor configured. A pipeline is built to perform the writes to all datanodes provided by the namenode daemon. The first packet from the data queue is then transferred to the first datanode daemon. The block is stored on the first datanode daemon and is then copied to the next datanode daemon in the pipeline. This process goes on till the packet is written to the last datanode daemon in the pipeline. The sequence is repeated for all the packets in the data queue. For every packet written on the datanode daemon, a corresponding acknowledgement is sent back to the client. If a packet fails to write onto one of the datanodes, the datanode daemon is removed from the pipeline and the remainder of the packets is written to the good datanodes. The namenode daemon notices the under-replication of the block and arranges for another datanode daemon where the block could be replicated. After all the packets are written, the client performs a close action, indicating that the packets in the data queue have been completely transferred. The client informs the namenode daemon that the write operation is now complete. 
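Because the HDFS command-line interface mirrors familiar Linux file commands, the write and read flows described above can be exercised with a few shell commands. The directory and file names below are illustrative placeholders rather than values from the text:

# Create a directory in HDFS and copy a local file into it (write path)
hdfs dfs -mkdir -p /user/analytics/input
hdfs dfs -put /tmp/sales.csv /user/analytics/input/

# List the directory and stream the file back (read path)
hdfs dfs -ls /user/analytics/input
hdfs dfs -cat /user/analytics/input/sales.csv | head

# Show how the file was split into blocks and where the replicas were placed
hdfs fsck /user/analytics/input/sales.csv -files -blocks -locations

The fsck output lists each block along with the datanodes holding its replicas, which corresponds directly to the replication pipeline described in the write steps above.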
The following diagram shows the data block replication process across the datanodes during a write operation in HDFS: Reading files in HDFS The following steps occur when a client tries to read a file in HDFS: The client contacts the namenode daemon to get the location of the data blocks of the file it wants to read. The namenode daemon returns the list of addresses of the datanodes for the data blocks. For any read operation, HDFS tries to return the node with the data block that is closest to the client. Here, closest refers to network proximity between the datanode daemon and the client. Once the client has the list, it connects the closest datanode daemon and starts reading the data block using a stream. After the block is read completely, the connection to datanode is terminated and the datanode daemon that hosts the next block in the sequence is identified and the data block is streamed. This goes on until the last data block for that file is read. The following diagram shows the read operation of a file in HDFS: Understanding the namenode UI Hadoop provides web interfaces for each of its services. The namenode UI or the namenode web interface is used to monitor the status of the namenode and can be accessed using the following URL: http://<namenode-server>:50070/ The namenode UI has the following sections: Overview: The general information section provides basic information of the namenode with options to browse the filesystem and the namenode logs. The following is the screenshot of the Overview section of the namenode UI: The Cluster ID parameter displays the identification number of the cluster. This number is same across all the nodes within the cluster. A block pool is a set of blocks that belong to a single namespace. The Block Pool Id parameter is used to segregate the block pools in case there are multiple namespaces configured when using HDFS federation. In HDFS federation, multiple namenodes are configured to scale the name service horizontally. These namenodes are configured to share datanodes amongst themselves. We will be exploring HDFS federation in detail a bit later. Summary: The following is the screenshot of the cluster's summary section from the namenode web interface: Under the Summary section, the first parameter is related to the security configuration of the cluster. If Kerberos (the authorization and authentication system used in Hadoop) is configured, the parameter will show as Security is on. If Kerberos is not configured, the parameter will show as Security is off. The next parameter displays information related to files and blocks in the cluster. Along with this information, the heap and non-heap memory utilization is also displayed. The other parameters displayed in the Summary section are as follows: Configured Capacity: This displays the total capacity (storage space) of HDFS. DFS Used: This displays the total space used in HDFS. Non DFS Used: This displays the amount of space used by other files that are not part of HDFS. This is the space used by the operating system and other files. DFS Remaining: This displays the total space remaining in HDFS. DFS Used%: This displays the total HDFS space utilization shown as percentage. DFS Remaining%: This displays the total HDFS space remaining shown as percentage. Block Pool Used: This displays the total space utilized by the current namespace. Block Pool Used%: This displays the total space utilized by the current namespace shown as percentage. 
As you can see in the preceding screenshot, in this case, the value matches that of the DFS Used% parameter. This is because there is only one namespace (one namenode) and HDFS is not federated. DataNodes usages% (Min, Median, Max, stdDev): This displays the usages across all datanodes in the cluster. This helps administrators identify unbalanced nodes, which may occur when data is not uniformly placed across the datanodes. Administrators have the option to rebalance the datanodes using a balancer. Live Nodes: This link displays all the datanodes in the cluster as shown in the following screenshot: Dead Nodes: This link displays all the datanodes that are currently in a dead state in the cluster. A dead state for a datanode daemon is when the datanode daemon is not running or has not sent a heartbeat message to the namenode daemon. Datanodes are unable to send heartbeats if there exists a network connection issue between the machines that host the datanode and namenode daemons. Excessive swapping on the datanode machine causes the machine to become unresponsive, which also prevents the datanode daemon from sending heartbeats. Decommissioning Nodes: This link lists all the datanodes that are being decommissioned. Number of Under-Replicated Blocks: This represents the number of blocks that have not replicated as per the replication factor configured in the hdfs-site.xml file. Namenode Journal Status: The journal status provides location information of the fsimage file and the state of the edits logfile. The following screenshot shows the NameNode Journal Status section: NameNode Storage: The namenode storage table provides the location of the fsimage file along with the type of the location. In this case, it is IMAGE_AND_EDITS, which means the same location is used to store the fsimage file as well as the edits logfile. The other types of locations are IMAGE, which stores only the fsimage file and EDITS, which stores only the edits logfile. The following screenshot shows the NameNode Storage information:
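Most of the capacity and datanode health information surfaced in the namenode UI can also be pulled from the command line, which is convenient for scripting and monitoring. For example:

# Print configured capacity, DFS used/remaining, and per-datanode usage
hdfs dfsadmin -report

# Rebalance data across datanodes if the usage spread is too wide
# (a threshold of 10 allows datanode utilization to deviate up to 10% from the cluster average)
hdfs balancer -threshold 10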

The Software-defined Data Center

Packt
14 Nov 2016
33 min read
In this article by Valentin Hamburger, author of the book Building VMware Software-Defined Data Centers, we are introduced to the software-defined data center (SDDC), a term introduced by VMware to describe the move to a cloud-like IT experience. The term software-defined is the important bit of information. It basically means that every key function in the data center is performed and controlled by software, instead of hardware. This opens a whole new way of operating, maintaining, but also innovating in a modern data center. (For more resources related to this topic, see here.)

But what does a so-called SDDC look like, and why is a whole industry pushing so hard towards its adoption? This question might also be a reason why you are reading this article, which is meant to provide a deeper understanding of it and give practical examples and hints on how to build and run such a data center. It will also provide the knowledge needed to map business challenges to IT solutions, a practice which is becoming more and more important these days. IT has come a long way from a pure back-office, task-oriented role in the early days to a business-relevant asset which can help organizations compete with their competition. There has been a major shift from a pure infrastructure provider role to a business enablement function. Today, most organizations' business is only as good as their internal IT's agility and ability to innovate. There are many examples in various markets where a whole business branch was built on IT innovations, such as Netflix, Amazon Web Services, Uber, and Airbnb, just to name a few.

However, it is unfair to compare any startup with a traditional organization. A startup has one application to maintain and a customer base to build; a traditional organization has a proven and wide customer base and many applications to maintain. So traditional organizations need to adapt their internal IT to become a digital enterprise, with all the flexibility and agility of a startup, while also maintaining the trust and control over their legacy services.

This article will cover the following points:

Why is there a demand for SDDC in IT
What is SDDC
Understand the business challenges and map them to SDDC deliverables
The relation of an SDDC and an internal private cloud
Identify new data center opportunities and possibilities
Become a center of innovation to empower your organization's business

The demand for change

Today, organizations face various challenges in staying relevant in their markets. The biggest change was clearly introduced by smartphones and tablets. They were not just computers in a smaller device; they changed the way IT is delivered and consumed by end users. These devices proved that it can be simple to consume and install applications: just search in an app store, choose what you like, and use it as long as you like it. If you do not need it any longer, simply remove it. All with very simple commands and easy-to-use gestures. More and more people rely on IT services by using a smartphone as their terminal to almost everything. These devices created a demand for fast and easy application and service delivery. So, in a way, smartphones have not only transformed the whole mobile market, they have also transformed how modern applications and services are delivered from organizations to their customers.
Although it would be quite unfair to compare a large enterprise data center with an app store, or enterprise service delivery with app installs on a mobile device, there are startups and industries which rely solely on the smartphone as the target for their services, such as Uber or WhatsApp. On the other side, smartphone apps also introduce a whole new way of delivering IT services, since a company never knows how many people will use an app simultaneously. In the backend, they still have to use web servers and databases to continuously provide content and data for these apps. This also introduces a new value model for all other companies. People have started to judge a company by the quality of the smartphone apps it makes available, and they have started to migrate to companies which offer better smartphone integration than the one they previously used. This is not bound to a single industry, but affects a broad spectrum of industries today, such as the financial industry, car manufacturers, insurance groups, and even food retailers, just to name a few.

A classic data center structure might not be ideal for quick and seamless service delivery. These architectures are created by projects to serve a particular use case for a couple of years. Examples of such bigger application environments are web server farms, traditional SAP environments, or a data warehouse. Traditionally, these were designed with an assumption about their growth and use. Special project teams have set them up across the data center pillars, as shown in the following figure. Typically, those project teams separate after such an application environment has been completed.

All these pillars in the data center are required to work together, but every one of them also needs to mind its own business. Mostly, those different divisions also have their own processes, which then may integrate into a data-center-wide process. There was a good reason to structure a data center in this way: the simple fact that nobody can be an expert in every discipline. Companies started to create groups to operate certain areas of the data center, each building its own expertise for its own subject. This evolved to become the most widely applied model for IT operations within organizations. Many, if not all, bigger organizations have adopted this approach, and people build their careers on these definitions. It served IT well for decades and ensured that each party was adding its best knowledge to any given project. However, this setup has one flaw: it was not designed for massive change and scale. The bigger these divisions get, the slower they can react to requests from other groups in the data center. This introduces a bi-directional issue: since all groups may grow at a similar rate, the overall service delivery time might also increase exponentially. Unfortunately, this also introduces a cost factor when it comes to service deployments across these pillars. Each new service an organization might introduce or develop will require each area of IT to contribute. Traditionally, this is done by human handovers from one department to the other. Each of these handovers will delay the overall project or service delivery time, which is often referred to as time to market. It reflects the time interval from the request of a new service to its actual delivery. It is important to mention that this is a level of complexity every modern organization has to deal with when it comes to application deployment today.
The difference between organizations might be in the size of the separate units, but the principle is always the same. Most organizations try to bring their overall service delivery time down to become quicker and more agile. This is often driven by business reasons as well as IT cost reasons. In some organizations, the time to deliver a brand new service from request to final rollout may take 90 working days. This means a requestor might wait 18 weeks, more than four and a half months, from requesting a new business service to its actual delivery. Do not forget that this reflects the complete service delivery, across all groups, until it is ready for production. Also, after these 90 days the requirements of the original request might have changed, which would lead to repeating the entire process. Often a quicker time to market is driven by the lines of business (LOB) owners to respond to a competitor in the market who might already deliver their services faster. This means that today's IT has changed from a pure internal service provider to a business enabler, supporting its organization in fighting the competition with advanced and innovative services. While this is a great chance for the IT department to enable and support its organization's business, it also introduces a threat at the same time. If the internal IT struggles to deliver what the business is asking for, it may lead to shadow IT within the organization.

The term shadow IT describes a situation where either the LOBs of an organization or its application developers have grown so disappointed with the internal IT delivery times that they actually use an external provider for their requirements. This behavior is not sanctioned by IT security and can lead to serious business or legal trouble. It happens more often than one might expect, and it can be as simple as putting some internal files on a public cloud storage provider. These services deliver quick results: it is as simple as register, download, use. They are very quick at enrolling new users and sometimes provide limited use for free. The developer or business owner might not even be aware that something non-compliant is going on while using these services. So besides the business demand for quicker service delivery and the security aspect, an organization's IT department now also faces the pressure of staying relevant. But an SDDC can provide much more value to IT than just staying relevant. The automated data center will be an enabler for innovation and trust and introduce a new era of IT delivery. Not only can it provide faster service delivery to the business, it can also enable new services or offerings to help the whole organization be innovative for its customers or partners.

Business challenges: the use case

Today's business strategies often involve digital delivery of services of any kind. This implies that the requirements a modern organization has towards its internal IT have changed drastically. Unfortunately, the business owners and the IT department tend to have communication issues in some organizations. Sometimes they even operate completely disconnected from each other, as if each of them were their own small company within the organization. Nevertheless, a lot of data center automation projects are driven by enhanced business requirements. In some of these cases, the IT department has not been made aware of what these business requirements look like, or even what the actual business challenges are.
Sometimes IT gets as little information as: "We are doing cloud now." This is a dangerous simplification, since the use case is key when it comes to designing and identifying the right solution to solve the organization's challenges. It is important to get the requirements from both sides: the IT delivery side as well as the business requirements and expectations. Here is a simple example of how a use case might be identified and mapped to a technical implementation.

The business view

John works as a business owner in an insurance company. He recognizes that their biggest competitor in the market has started to offer a mobile application to their clients. The app is simple, allows online contract management, and tells the clients which products they have enrolled in, as well as giving rich information about contract timelines and possible consolidation options. He asks his manager to start a project to also deliver such an application to their customers. Since it is only a simple smartphone application, he expects that its development might take a couple of weeks and that they can then start a beta phase. To be competitive, he estimates that they should have something usable for their customers within a maximum of five months. Based on these facts, he got approval from his manager to request such a product from the internal IT.

The IT view

Tom is the data center manager of this insurance company. He was informed that the business wants a smartphone application to do all kinds of things for new and existing customers. He is responsible for creating a project and bringing all necessary people on board to support this project and finally deliver the service to the business. The programming of the app will be done by an external consulting company. Tom discusses a couple of questions regarding this request with his team:

How many users do we need to serve?
How much time do we need to create this environment?
What is the expected level of availability?
How much compute power/disk space might be required?

After a round of brainstorming and intense discussion, the team is still quite unsure how to answer these questions. For every question there are a couple of variables the team cannot predict. Will only a few of their thousands of users adopt the app, and what if they undersize the middleware environment? What if user adoption rises within a couple of days, or what if it drops and the environment is overpowered and therefore the cost is too high? Tom and his team identified that they need a dynamic solution to be able to serve the business request. He creates a mapping to match possible technical capabilities to the use case. After this mapping was completed, he uses it to discuss with his CIO if and how it can be implemented.

Business challenge | Question | IT capability
Easy-to-use app to win new customers/keep existing ones | How many users do we need to serve? | Dynamic scaling of an environment based on actual performance demand.
 | How much time do we need to create this environment? | To fulfill the expectations, the environment needs to be flexible. Start small, scale big.
 | What is the expected level of availability? | Analytics and monitoring over all layers, including a possible self-healing approach.
 | How much compute power/disk space might be required? | Create compute nodes on demand, based on actual performance requirements. Introduce a capacity-on-demand model for required resources.
Given this table, Tom realized that with their current data center structure it is quite difficult to deliver what the business is asking for. He also got a couple of requirements from other departments which are going in a similar direction. Based on these mappings, he identified that they need to change their way of deploying services and applications. They will need to use a fair amount of automation. They also have to span these functionalities across each data center department as a holistic approach, as shown in the following diagram:

In this example, Tom actually identified a very strong use case for an SDDC in his company. Based on the actual business requirements of a "simple" application, the whole IT delivery of this company needs to adapt. While this may sound like pure fiction, these are the challenges modern organizations need to face today. It is very important to identify the required capabilities for the entire data center and not just for a single department. You will also have to serve the legacy applications and bring them onto the new model. Therefore, it is important to find a solution that serves the new business case as well as the legacy applications. In the first stage of any SDDC introduction in an organization, it is key to always keep an eye on the big picture.

Tools to enable SDDC

There is a basic and broadly accepted declaration of what an SDDC needs to offer. It can be considered the second evolutionary step after server virtualization. It offers an abstraction layer from the infrastructure components such as compute, storage, and network by using automation and tools such as a self-service catalog. In a way, it represents a virtualization of the whole data center with the purpose of simplifying the request and deployment of complex services. Other capabilities of an SDDC are:

Automated infrastructure/service consumption
Policy-based services and applications deployment
Changes to services can be made easily and instantly
All infrastructure layers are automated (storage, network, and compute)
No human intervention is needed for infrastructure/service deployment
A high level of standardization is used
Business logic for chargeback or showback functionality

All of the preceding points define an SDDC technically. But it is important to understand that an SDDC is meant to solve the business challenges of the organization running it. That means that, based on the actual business requirements, each SDDC will serve a different use case. Of course there is a main setup you can adopt and roll out, but it is important to understand your organization's business challenges in order to prevent any planning or design shortcomings. Also, to realize this functionality, an SDDC needs a couple of software tools. These are designed to work together to deliver a seamless environment. The different parts can be seen like gears in a watch, where each gear has an equally important role in making the clockwork function correctly. It is important to remember this when building your SDDC, since missing one part can make another very complex or even impossible afterwards. This is a list of VMware tools building an SDDC:

vRealize Business for Cloud
vRealize Operations Manager
vRealize Log Insight
vRealize Automation
vRealize Orchestrator
vRealize Automation Converged Blueprint
vRealize Code Stream
VMware NSX
VMware vSphere

vRealize Business for Cloud is a chargeback/showback tool. It can be used to track the cost of services as well as the cost of a whole data center.
Since the agility of an SDDC is much higher than that of a traditional data center, it is important to also track and show the cost of adding new services. This is not only important from a financial perspective; it also serves as a control mechanism to ensure users are not deploying uncontrolled services and leaving them running even if they are not required anymore.

vRealize Operations Manager serves basically two functions. One is to help with the troubleshooting and analytics of the whole SDDC platform. It has an analytics engine, which applies machine learning to the behavior of its monitored components. The other important function is capacity management. It is capable of providing what-if analysis and informs about possible resource shortages well before they occur. These functions also use the machine learning algorithms and get more accurate over time. This becomes very important in a dynamic environment where on-demand provisioning is granted.

vRealize Log Insight is a unified log management. It offers rich functionality and can search and profile a large number of log files in seconds. It is recommended to use it as a universal log endpoint for all components in your SDDC. This includes all OSes as well as applications and also your underlying hardware. In the event of an error, it is much simpler to have a central log management system that is easily searchable and delivers an outcome in seconds.

vRealize Automation (vRA) is the base automation tool. It provides the cloud portal used to interact with your SDDC. The portal offers the business logic such as service catalogs, service requests, approvals, and application life cycles. However, it relies strongly on vRealize Orchestrator for its technical automation part. vRA can also tap into external clouds to extend the internal data center. Extending an SDDC in this way is mostly referred to as a hybrid cloud. There are a couple of supported cloud offerings vRA can manage.

vRealize Orchestrator (vRO) provides the workflow engine and the technical automation part of the SDDC. It is literally the orchestrator of your new data center. vRO can easily be bound together with vRA to form a very powerful automation suite, where anything with an application programming interface (API) can be integrated. It is also required to integrate third-party solutions into your deployment workflows, such as a configuration management database (CMDB), IP address management (IPAM), or ticketing systems via IT service management (ITSM).

vRealize Automation Converged Blueprint, formerly known as vRealize Automation Application Services, is an add-on functionality to vRA which takes care of application installations. It can be used with pre-existing scripts (like Windows PowerShell or Bash on Linux), but also with variables received from vRA. This makes it very powerful when it comes to on-demand application installations. This tool can also make use of vRO to provide even better capabilities for complex application installations.

vRealize Code Stream is an addition to vRA and serves specific use cases in the DevOps area of the SDDC. It can be used with various development frameworks such as Jenkins. It can also be used as a tool for developers to build and operate their own software test, QA, and deployment environments. Not only can the developer build these separate stages, the migration from one stage into another can also be fully automated by scripts.
This makes it a very powerful tool when it comes to staging and deploying modern and traditional applications within the SDDC.

VMware NSX is the network virtualization component. Given the complexity some applications and services might introduce, NSX provides a solid solution to help address it. The challenges include:

Dynamic network creation
Microsegmentation
Advanced security
Network function virtualization

VMware vSphere is mostly the base infrastructure and is used as the hypervisor for server virtualization. You are probably familiar with vSphere and its functionalities. However, since the SDDC introduces a change to your data center architecture, it is recommended to revisit some of the vSphere functionalities and configurations. By using the full potential of vSphere, it is possible to save effort when it comes to the automation aspects as well as the service/application deployment part of the SDDC.

This represents the toolbox required to build the platform for an automated data center. All of these tools will bring tremendous value and possibilities, but they will also introduce change. It is important that this change is addressed as part of the overall SDDC design and installation effort. Embrace the change.

The implementation journey

While a big part of this article focuses on building and configuring the SDDC, it is important to mention that there are also non-technical aspects to consider. Creating a new way of operating and running your data center will always involve people, so it is important to also briefly touch on this part of the SDDC. Basically, there are three major players when it comes to a fundamental change in any data center, as shown in the following image. These are the three major topics relevant for every successful SDDC deployment. As with the tools principle, these three disciplines need to work together in order to enable the change and make sure that all benefits can be fully leveraged. These three categories are:

People
Process
Technology

The process category

Data center processes are as established and settled as IT itself. From the first operator tasks, such as changing tapes or starting procedures, up to highly sophisticated processes that ensure service deployment and management work as expected, they have already come a long way. However, some of these processes might not be fit for purpose anymore once automation is applied to a data center. To build an SDDC it is very important to revisit data center processes and adapt them to work with the new automation tasks. The tools will offer integration points into processes, but it is equally important to remove bottlenecks from the processes as well. Keep in mind that if you automate a bad process, the process will still be bad, just fully automated. So it is also necessary to revisit those processes so that they become slim and effective as well.

Remember Tom, the data center manager. He has successfully identified that they need an SDDC to fulfill the business requirements and has also mapped the use case to IT capabilities. While this mapping is mainly about what IT needs to deliver technically, it also implies that the current IT processes need to adapt to this new delivery model.

The process change example in Tom's organization

If the compute department works on a service involving OS deployment, they need to fill out an Excel sheet with IP addresses and server names and send it to the networking department.
The network admins will ensure that there is no double booking by reserving the IP address and approving the requested host name. After successfully proving the uniqueness of this data, the name and IP get added to the organization's DNS server. The manual part of this process is no longer feasible once the data center enters the automation era: imagine that every time somebody orders a service involving a VM/OS deployment, the network department gets an e-mail containing the Excel sheet with the IP and host name combination. The whole process would have to stop until this step is manually finished. To overcome this, the process has to be changed to use an automated solution for IPAM. The new process has to track IPs and host names programmatically to ensure there is no duplication within the entire data center. Also, after successfully checking the uniqueness of the data, it has to be added to the Domain Name System (DNS); a small sketch of such a programmatic DNS step follows at the end of this section. While this is a simple example of one small process, normally there is a large number of processes involved which need to be reviewed for a fully automated data center. This is a very important task and should not be underestimated, since it can be a differentiator for the success or failure of an SDDC. Think about all the other processes in place which are used to control the deploy/enable/install mechanics in your data center. Here is a small example list of questions to ask regarding established processes:

What is our current IPAM/DNS process?
Do we need to consider a CMDB integration?
What is our current ticketing process? (ITSM)
What is our process to get resources from network, storage, and compute?
What OS/VM deployment process is currently in place?
What is our process to deploy an application (handovers, steps, or departments involved)?
What does our current approval process look like?
Do we need a technical approval to deliver a service?
Do we need a business approval to deliver a service?
What integration process do we have for a service/application deployment? DNS, Active Directory (AD), Dynamic Host Configuration Protocol (DHCP), routing, Information Technology Infrastructure Library (ITIL), and so on

As for the approval questions, these are normally an exception to the automation part, since approvals are meant to be manual in the first place (either technical or business). If all the other answers to these example questions involve human interaction as well, consider changing those processes to be fully automated by the SDDC. Since human intervention creates waiting times, it has to be avoided during service deployments in any automated data center. Think of it as the robotic assembly lines today's car manufacturers use. The processes they have implemented, developed over years of experience, are all designed to stop the line only in case of an emergency. The same holds true for the SDDC: try to enable the automated deployment through your processes and stop the automation only in case of an emergency. Identifying processes is the simple part; changing them is the tricky part. However, keep in mind that this is an all-new model of IT delivery, therefore there is no golden way of doing it. Once you have committed to changing those processes, keep monitoring whether they truly fulfill their purpose. This leads to another process principle in the SDDC: Continual Service Improvement (CSI). Revisit what you have changed from time to time and make sure that those processes are still working as expected; if they don't, change them again.
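As a small illustration of the programmatic DNS step described in the process example above, the following lines show how a deployment workflow could register a freshly assigned host name and IP using the standard nsupdate utility. The server address, zone, key file, and host values are purely hypothetical assumptions, and a real environment would typically query its IPAM product's own API first to reserve the address:

# Hypothetical values; in practice these would come from the IPAM system
HOSTNAME=web01.example.com
IPADDR=192.168.10.25

# Push an A record via a dynamic DNS update
# (assumes the zone allows authenticated updates, for example with a TSIG key)
nsupdate -k /etc/bind/ddns.key <<EOF
server 192.168.10.1
zone example.com
update add ${HOSTNAME} 3600 A ${IPADDR}
send
EOF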
The people category

Since every data center is run by people, it is important to also consider that a change of technology will impact those people. There are claims that an SDDC can be run with only half of the staff, or that it saves a couple of employees, since everything is automated. The truth is that an SDDC will transform IT roles in a data center. This means that some classic roles might vanish, while others will be added by this change. It is unrealistic to say that you can run an automated data center with half the staff you had before. But it is realistic to say that your staff can concentrate on innovation and development instead of spending 100% of their time keeping the lights on. And this is the change an automated data center introduces. It opens up the possibility for current administrators to evolve into more architecture- and design-focused roles.

The people example in Tom's organization

Currently there are two admins in the compute department working for Tom. They manage and maintain the virtual environment, which is largely VMware vSphere. They create VMs manually, deploy an OS by a network install routine (which was a requirement for physical installs, so they kept the process), and then hand the ready VMs over to the next department to finish installing the service they are meant for. Recently they have experienced a lot of demand for VMs, and each of them configures 10 to 12 VMs per day. Given this, they cannot concentrate on other aspects of their job, like improving OS deployments or the handover process. At first look it seems like the SDDC might replace these two employees, since the tools will largely automate their work. But that is like saying a jackhammer will replace a construction worker. Actually, their roles will shift to a more architectural aspect. They need to come up with a template for OS installations and a way to further automate the deployment process. They might also need to add new services or parts to the SDDC in order to continuously fulfill the business needs. So instead of creating all the VMs manually, they are now focused on designing a blueprint that can be replicated as easily and efficiently as possible. While their tasks might have changed, their workforce is still important to operate and run the SDDC. However, given that they focus on design and architectural tasks now, they also have the time to introduce innovative functions and additions to the data center.

Keep in mind that an automated data center affects all departments in an IT organization. This means that the tasks of the network and storage as well as the application and database teams will change, too. In fact, in an SDDC it is quite impossible to keep operating the departments disconnected from each other, since a deployment will affect all of them. This also implies that all of these departments will have admins shifting to higher-level functions in order to make the automation possible. In the industry, this shift is often referred to as Operational Transformation. This basically means that not only do the tools have to be in place, you also have to change the way the staff operates the data center. In most cases organizations decide to form a so-called center of excellence (CoE) to administer and operate the automated data center. This virtual group of admins in a data center is very similar to project groups in traditional data centers. The difference is that these people should be permanently assigned to the CoE for an SDDC.
Typically, you might have one champion from each department taking part in this virtual team. Each person acts as an expert and ambassador for their department. With this principle, it can be ensured that decisions and overlapping processes are well defined and ready to function across the departments. Also, as an ambassador, each participant should advertise the new functionalities within their department and enable their colleagues to fully support the new data center approach. Each member of the CoE needs good technical expertise as well as good communication skills.

The technology category

This is the third aspect of the triangle needed to successfully implement an SDDC in your environment. Often this is the part where people focus most of their attention, sometimes by ignoring one of the other two parts. However, it is important to note that all three topics need to be considered equally. Think of it like a three-legged chair: if one leg is missing, it can never stand. The term technology does not necessarily refer only to new tools required to deploy services. It also refers to already established technology, which has to be integrated with the automation toolset (often referred to as third-party integration). This might be your AD, DHCP server, e-mail system, and so on. There might also be technology that is not enabling or empowering the data center automation, so instead of only thinking about adding tools, there might also be tools to remove or replace. This is a normal IT lifecycle task and has gone through many iterations already. Think of things like the fax machine or the telex: you might not use them anymore, as they have been replaced by e-mail and messaging.

The technology example in Tom's organization

The team uses some tools to make their daily work easier when it comes to new service deployments. One of these tools is a little graphical user interface to quickly add content to AD. The admins use it to insert the host name and organizational unit, and to create the computer account. This was meant to save admin time, since they don't have to open all the various menus in the AD configuration to accomplish these tasks. With automated service delivery, this has to be done programmatically. Once a new OS is deployed, the deployment tool has to add it to AD, including all required attributes. Since AD offers an API, this can be easily automated and integrated into the deployment automation. Instead of painfully integrating the graphical tool, this is now done directly by interfacing with the organization's AD, ultimately replacing the old graphical tool (a rough sketch of such a call is shown a little further below).

The automated deployment of a service across the entire data center requires a fair amount of communication: not in the traditional way, but machine-to-machine communication leveraging programmable interfaces. Using such APIs is another important aspect of the applied data center technologies. Most of today's data center tools, from backup all the way up to web servers, do come with APIs. The better the API is documented, the easier the integration into the automation tool. In some cases you might need the vendors to support you with the integration of their tools. If you have identified a tool in the data center that does not offer any API or even a command-line interface (CLI) option at all, try to find a way around this software or even consider replacing it with a new tool. APIs are the equivalent of handovers in the manual world.
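As a rough sketch of the programmatic AD integration described in Tom's technology example, a deployment workflow could create the computer account over LDAP instead of using the old graphical tool. The host name, OU, bind account, and attribute values below are illustrative assumptions only; a real Active Directory integration usually needs additional attributes, TLS configuration, and appropriately delegated permissions:

# Illustrative LDIF describing a new computer object (all values are assumptions)
cat > newcomputer.ldif <<'EOF'
dn: CN=web01,OU=Servers,DC=example,DC=com
objectClass: computer
cn: web01
sAMAccountName: WEB01$
userAccountControl: 4096
EOF

# Add the object with the standard OpenLDAP client tools
ldapadd -H ldaps://dc01.example.com \
    -D "CN=svc-automation,OU=ServiceAccounts,DC=example,DC=com" -W \
    -f newcomputer.ldif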
The better the communication between tools works, the faster and easier the deployment will be completed. To coordinate and control all this communication, you will need far more than scripts. This is a task for an orchestrator, which can run all necessary integration workflows from a central point. This orchestrator will act like the conductor of a big orchestra and will form the backbone of your SDDC.

Why are these three topics so important?

The technology aspect closes the triangle and brings the people and process parts together. If the processes are not altered to fit the new deployment methods, automation will be painful and complex to implement. If the deployment stops at some point because the processes require manual intervention, the people will have to fill in this gap. This means that they now have new roles, but also need to maintain some of their old tasks to keep the process running. By introducing such an unbalanced implementation of an automated data center, the workload for people can actually increase, while the service delivery times may not dramatically decrease. This may lead to avoidance of the automated tasks, since manual intervention might be seen as faster by individual admins. So it is very important to accept all three aspects as the main parts of the SDDC implementation journey. They all need to be addressed equally and thoughtfully to unveil the benefits and improvements an automated data center has to offer. However, keep in mind that this truly is a journey: an SDDC is not implemented in days but in months. Given this, the implementation team in the data center also has this time to adapt itself and its processes to this new way of delivering IT services. All necessary departments and their leads need to be involved in this procedure. An SDDC implementation is always a team effort.

Additional possibilities and opportunities

All the previously mentioned topics serve the sole goal of installing and using the SDDC within your data center. However, once you have the SDDC running, the real fun begins, since you can start to introduce additional functionalities impossible for any traditional data center. Let's briefly touch on some of the possibilities from an IT view.

The self-healing data center

This is a concept where the automatic deployment of services is connected to a monitoring system. Once the monitoring system detects that a service or environment may be facing constraints, it can automatically trigger an additional deployment for this service to increase the throughput. While this is application dependent, for infrastructure services this can become quite handy. Think of ESXi host auto deployments if compute power is becoming a constraint, or datastore deployments if disk space is running low. If this automation acts too aggressively for your organization, it can be used with an approval function. Once the monitoring detects a shortcoming, it will ask for approval to fix it with a deployment action. Instead of getting an e-mail from your monitoring system saying that a constraint has been identified, you get an e-mail with the constraint and the resolving action. All you need to do is approve the action.

The self-scaling data center

A similar principle is to use a capacity management tool to predict the growth of your environment. If it approaches a trigger, the system can automatically generate an order letter containing all needed components to satisfy the growing capacity demands.
This can then be sent to finance or purchasing management for approval, and before you even run into any capacity constraints, the new gear might be available and ready to run. However, consider the regular turnaround time for ordering hardware, which might affect how far in the future you have to set the trigger for such functionality. Both of these opportunities are more than just nice-to-haves; they enable your data center to be truly flexible and proactive. Because an SDDC offers a high amount of agility, it will also need some self-monitoring to stay flexible and usable and to fulfill unpredicted demand.

Summary

In this article we discussed the main principles and declarations of an SDDC. It provided an overview of the opportunities and possibilities this new data center architecture provides. It also covered the changes which will be introduced by this new approach. Finally, it discussed the implementation journey and its involvement with people, processes, and technology.

Resources for Article:

Further resources on this subject:

VM, It Is Not What You Think! [article]
Introducing vSphere vMotion [article]
Creating a VM using VirtualBox - Ubuntu Linux [article]
Keystone – OpenStack Identity Service

In this article by Cody Bunch, Kevin Jackson, and Egle Sigler, the authors of OpenStack Cloud Computing Cookbook, Third Edition, we will cover the following topics:

Creating tenants in Keystone
Configuring roles in Keystone
Adding users to Keystone

(For more resources related to this topic, see here.)

The OpenStack Identity service, known as Keystone, provides services for authenticating and managing user accounts and role information for our OpenStack cloud environment. It is a crucial service that underpins the authentication and verification between all of our OpenStack cloud services and is the first service that needs to be installed within an OpenStack environment. The OpenStack Identity service authenticates users and tenants by sending a validated authorization token between all OpenStack services. This token is used for authentication and verification so that you can use that service, such as OpenStack Storage and Compute. Therefore, configuration of the OpenStack Identity service must be completed first, consisting of creating appropriate roles for users and services, tenants, the user accounts, and the service API endpoints that make up our cloud infrastructure.

In Keystone, we have the concepts of tenants, roles, and users. A tenant is like a project and has resources such as users, images, and instances, as well as networks in it that are only known to that particular project. A user can belong to one or more tenants and is able to switch between these projects to gain access to those resources. Users within a tenant can have various roles assigned. In the most basic scenario, a user can be assigned either the role of admin or just be a member. When a user has admin privileges within a tenant, they are able to utilize features that can affect the tenant (such as modifying external networks), whereas a normal user is assigned the member role, which generally allows user-level tasks such as spinning up instances, creating volumes, and creating tenant-only networks.

Creating tenants in Keystone

A tenant in OpenStack is a project, and the two terms are generally used interchangeably. Users can't be created without having a tenant assigned to them, so these must be created first. Here, we will create a tenant called cookbook for our users.

Getting ready

We will be using the keystone client to operate Keystone. If the python-keystoneclient tool isn't available, follow the steps described at http://bit.ly/OpenStackCookbookClientInstall. Ensure that we have our environment set correctly to access our OpenStack environment for administrative purposes:

export OS_TENANT_NAME=cookbook
export OS_USERNAME=admin
export OS_PASSWORD=openstack
export OS_AUTH_URL=https://192.168.100.200:5000/v2.0/
export OS_NO_CACHE=1
export OS_KEY=/vagrant/cakey.pem
export OS_CACERT=/vagrant/ca.pem

You can use the controller node if no other machines are available on your network, as this has the python-keystoneclient and the relevant access to the OpenStack environment. If you are using the Vagrant environment, issue the following command to get access to the Controller:

vagrant ssh controller

How to do it...
To create a tenant in our OpenStack environment, perform the following steps:

We start by creating a tenant called cookbook:

keystone tenant-create \
    --name cookbook \
    --description "Default Cookbook Tenant" \
    --enabled true

This will produce output similar to:

+-------------+----------------------------------+
|   Property  |              Value               |
+-------------+----------------------------------+
| description |     Default Cookbook Tenant      |
|   enabled   |               True               |
|      id     | fba7b31689714d1ab39a751bc9483efd |
|     name    |             cookbook             |
+-------------+----------------------------------+

We also need an admin tenant so that when we create users in this tenant, they have access to our complete environment. We do this in the same way as in the previous step:

keystone tenant-create \
    --name admin \
    --description "Admin Tenant" \
    --enabled true

How it works...

Creation of the tenants is achieved by using the keystone client, specifying the tenant-create option with the following syntax:

keystone tenant-create \
    --name tenant_name \
    --description "A description" \
    --enabled true

The tenant_name is an arbitrary string and must not contain spaces. On creation of the tenant, this returns an ID associated with it that we use when adding users to this tenant. To see a list of tenants and the associated IDs in our environment, we can issue the following command:

keystone tenant-list

Configuring roles in Keystone

Roles are the permissions given to users within a tenant. Here, we will configure two roles: an admin role that allows for the administration of our environment, and a member role that is given to ordinary users who will be using the cloud environment.

Getting ready

We will be using the keystone client to operate Keystone. If the python-keystoneclient tool isn't available, follow the steps described at http://bit.ly/OpenStackCookbookClientInstall. Ensure that we have our environment set correctly to access our OpenStack environment for administrative purposes:

export OS_TENANT_NAME=cookbook
export OS_USERNAME=admin
export OS_PASSWORD=openstack
export OS_AUTH_URL=https://192.168.100.200:5000/v2.0/
export OS_NO_CACHE=1
export OS_KEY=/vagrant/cakey.pem
export OS_CACERT=/vagrant/ca.pem

You can use the controller node if no other machines are available on your network, as this has the python-keystoneclient and the relevant access to the OpenStack environment. If you are using the Vagrant environment, issue the following command to get access to the Controller:

vagrant ssh controller

How to do it...

To create the required roles in our OpenStack environment, perform the following steps:

Create the admin role as follows:

# admin role
keystone role-create --name admin

You will get an output like this:

+----------+----------------------------------+
| Property |              Value               |
+----------+----------------------------------+
|    id    | 625b81ae9f024366bbe023a62ab8a18d |
|   name   |              admin               |
+----------+----------------------------------+

To create the Member role, we repeat the step and specify the Member role:

# Member role
keystone role-create --name Member

How it works...

Creation of the roles is simply achieved by using the keystone client and specifying the role-create option with the following syntax:

keystone role-create --name role_name

The role_name attribute can't be arbitrary for admin and Member roles.
The admin role has been set by default in /etc/keystone/policy.json as having administrative rights:

{
    "admin_required": [["role:admin"], ["is_admin:1"]]
}

The Member role is also configured by default in the OpenStack Dashboard, Horizon, for a non-admin user created through the web interface. On creation of the role, the ID associated with it is returned, and we can use it when assigning roles to users. To see a list of roles and the associated IDs in our environment, we can issue the following command:

keystone role-list

Adding users to Keystone

Adding users to the OpenStack Identity service requires that the user has a tenant that they can exist in and there is a defined role that can be assigned to them. Here, we will create two users. The first user will be named admin and will have the admin role assigned to them in the cookbook tenant. The second user will be named demo and will have the Member role assigned to them in the same cookbook tenant.

Getting ready

We will be using the keystone client to operate Keystone. If the python-keystoneclient tool isn't available, follow the steps described at http://bit.ly/OpenStackCookbookClientInstall. Ensure that we have our environment set correctly to access our OpenStack environment for administrative purposes:

export OS_TENANT_NAME=cookbook
export OS_USERNAME=admin
export OS_PASSWORD=openstack
export OS_AUTH_URL=https://192.168.100.200:5000/v2.0/
export OS_NO_CACHE=1
export OS_KEY=/vagrant/cakey.pem
export OS_CACERT=/vagrant/ca.pem

You can use the controller node if no other machines are available on your network, as this has the python-keystoneclient and the relevant access to the OpenStack environment. If you are using the Vagrant environment, issue the following command to get access to the Controller:

vagrant ssh controller

How to do it...

To create the required users in our OpenStack environment, perform the following steps:

To create a user in the cookbook tenant, we first need to get the cookbook tenant ID. To do this, issue the following command, which we conveniently store in a variable named TENANT_ID with the tenant-list option:

TENANT_ID=$(keystone tenant-list \
    | awk '/\ cookbook\ / {print $2}')

Now that we have the tenant ID, the admin user in the cookbook tenant is created using the user-create option and a password is chosen for the user:

PASSWORD=openstack
keystone user-create \
    --name admin \
    --tenant_id $TENANT_ID \
    --pass $PASSWORD \
    --email root@localhost \
    --enabled true

The preceding code will produce the following output:

+----------+----------------------------------+
| Property |              Value               |
+----------+----------------------------------+
|  email   |          root@localhost          |
| enabled  |               True               |
|    id    | 2e23d0673e8a4deabe7c0fb70dfcb9f2 |
|   name   |              admin               |
| tenantId | 14e34722ac7b4fe298886371ec17cf40 |
| username |              admin               |
+----------+----------------------------------+

As we are creating the admin user, which we are assigning the admin role, we need the admin role ID. We pick out the ID of the admin role and conveniently store it in a variable to use it when assigning the role to the user with the role-list option:

ROLE_ID=$(keystone role-list \
    | awk '/\ admin\ / {print $2}')

To assign the role to our user, we need to use the user ID that was returned when we created that user.
To get this, we can list the users and pick out the ID for that particular user with the following user-list option:

USER_ID=$(keystone user-list \
    | awk '/\ admin\ / {print $2}')

With the tenant ID, user ID, and an appropriate role ID available, we can assign that role to the user with the following user-role-add option:

keystone user-role-add \
    --user $USER_ID \
    --role $ROLE_ID \
    --tenant_id $TENANT_ID

Note that there is no output produced on successfully running this command. The admin user also needs to be in the admin tenant for us to be able to administer the complete environment. To do this, we need to get the admin tenant ID and then repeat the previous step using this new tenant ID:

ADMIN_TENANT_ID=$(keystone tenant-list \
    | awk '/\ admin\ / {print $2}')
keystone user-role-add \
    --user $USER_ID \
    --role $ROLE_ID \
    --tenant_id $ADMIN_TENANT_ID

To create the demo user in the cookbook tenant with the Member role assigned, we repeat the process defined in steps 1 to 5:

# Get the cookbook tenant ID
TENANT_ID=$(keystone tenant-list \
    | awk '/\ cookbook\ / {print $2}')

# Create the user
PASSWORD=openstack
keystone user-create \
    --name demo \
    --tenant_id $TENANT_ID \
    --pass $PASSWORD \
    --email demo@localhost \
    --enabled true

# Get the Member role ID
ROLE_ID=$(keystone role-list \
    | awk '/\ Member\ / {print $2}')

# Get the demo user ID
USER_ID=$(keystone user-list \
    | awk '/\ demo\ / {print $2}')

# Assign the Member role to the demo user in cookbook
keystone user-role-add \
    --user $USER_ID \
    --role $ROLE_ID \
    --tenant_id $TENANT_ID

How it works...

Adding users in the OpenStack Identity service involves a number of steps and dependencies. First, a tenant is required for the user to be part of. Once the tenant exists, the user can be added. At this point, the user has no role associated, so the final step is to designate the role to this user, such as Member or admin.

Use the following syntax to create a user with the user-create option:

keystone user-create \
    --name user_name \
    --tenant_id TENANT_ID \
    --pass PASSWORD \
    --email email_address \
    --enabled true

The user_name attribute is an arbitrary name but cannot contain any spaces. A password attribute must be present. In the previous examples, these were set to openstack. The email_address attribute must also be present.

To assign a role to a user with the user-role-add option, use the following syntax:

keystone user-role-add \
    --user USER_ID \
    --role ROLE_ID \
    --tenant_id TENANT_ID

This means that we need to have the ID of the user, the ID of the role, and the ID of the tenant in order to assign roles to users. These IDs can be found using the following commands:

keystone tenant-list
keystone user-list
keystone role-list

Summary

In this article, we have looked at the basic operations with respect to Keystone, such as creating tenants, configuring roles, and adding users. To know everything else about cloud computing with OpenStack, check out OpenStack Cloud Computing Cookbook, Third Edition, also currently being used at CERN! The book has chapters on the Identity Service, Image Service, Networking, Object Storage, Block Storage, as well as how to manage OpenStack in production environments! It's everything you need and more to make your job so much easier!
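As a final sanity check for this recipe, you can confirm that the role assignments were applied as expected. This is an optional sketch; the user-role-list subcommand and its flag names can vary slightly between python-keystoneclient releases:

# Confirm which roles the admin and demo users hold on the cookbook tenant
keystone user-role-list --user admin --tenant cookbook
keystone user-role-list --user demo --tenant cookbook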
Resources for Article:

Further resources on this subject:

Introducing OpenStack Trove [article]
OpenStack Performance, Availability [article]
Concepts for OpenStack [article]
Monitoring Physical Network Bandwidth Using OpenStack Ceilometer

In this article by Chandan Dutta Chowdhury and Sriram Subramanian, authors of the book OpenStack Networking Cookbook, we will explore various means to monitor network resource utilization using Ceilometer.

(For more resources related to this topic, see here.)

Introduction

Due to the dynamic nature of virtual infrastructure and multitenancy, OpenStack administrators need to monitor the resources used by tenants. The resource utilization data can be used to bill the users of a public cloud and to debug infrastructure-related problems. Administrators can also use the utilization reports for better capacity planning. In this post, we will look at OpenStack Ceilometer to meter the physical network resource utilization.

Ceilometer components

The OpenStack Ceilometer project provides you with telemetry services. It can collect the statistics of resource utilization and also provide threshold monitoring and alarm services. The Ceilometer project is composed of the following components:

ceilometer-agent-central: This component is a centralized agent that collects the monitoring statistics by polling individual resources
ceilometer-agent-notification: This agent runs on a central server and consumes notification events on the OpenStack message bus
ceilometer-collector: This component is responsible for dispatching events to the data stores
ceilometer-api: This component provides the REST API server that the Ceilometer clients interact with
ceilometer-agent-compute: This agent runs on each compute node and collects resource utilization statistics for the local node
ceilometer-alarm-notifier: This component runs on the central server and is responsible for sending alarms
ceilometer-alarm-evaluator: This component monitors the resource utilization and evaluates the conditions to raise an alarm when the resource utilization crosses predefined thresholds

The following diagram describes how the Ceilometer components are installed on a typical OpenStack deployment:

Installation

An installation of any typical OpenStack service consists of the following steps:

Creating a catalog entry in keystone for the service
Installing the service packages that provide you with a REST API server to access the service

Creating a catalog entry

The following steps describe the creation of a catalog entry for the Ceilometer service:

An OpenStack service requires a user associated with it for authorization and authentication. To create a service user, use the following command and provide the password when prompted:

openstack user create --password-prompt ceilometer

This service user will need to have administrative privileges. Use the following command in order to add an administrative role to the service user:

openstack role add --project service --user ceilometer admin

Next, we will create a service record for the metering service:

openstack service create --name ceilometer --description "Telemetry" metering

The final step in the creation of the service catalog is to associate this service with its API endpoints. These endpoints are REST URLs to access the OpenStack service and provide access for public, administrative, and internal use:

openstack endpoint create --publicurl http://controller:8777 --internalurl http://controller:8777 --adminurl http://controller:8777 --region RegionOne metering

Installation of the service packages

Let's now install the packages required to provide the metering service.
On the Ubuntu system, use the following command to install the Ceilometer packages:

apt-get install ceilometer-api ceilometer-collector ceilometer-agent-central ceilometer-agent-notification ceilometer-alarm-evaluator ceilometer-alarm-notifier python-ceilometerclient

Configuration

The configuration of an OpenStack service requires the following information:

Service database-related details such as the database server address, database name, and login credentials in order to access the database. Ceilometer uses MongoDB to store the resource monitoring data. A MongoDB database user with appropriate access for this service should be created. Conventionally, the user and database name match the name of the OpenStack service; for example, for the metering service, we will use ceilometer as both the database user and the database name.
A keystone user associated with the service (the one that we created in the installation section).
The credentials to connect to the OpenStack message bus, such as rabbitmq.

All these details need to be provided in the service configuration file. The main configuration file for Ceilometer is /etc/ceilometer/ceilometer.conf. The following is a sample template for the Ceilometer configuration. The placeholders mentioned here must be replaced appropriately:

CEILOMETER_PASS: This is the password for the Ceilometer service user
RABBIT_PASS: This is the message queue password for an OpenStack user
CEILOMETER_DBPASS: This is the MongoDB password for the Ceilometer database
TELEMETRY_SECRET: This is the secret key used by Ceilometer to sign messages

Let's have a look at the following code:

[DEFAULT]
...
auth_strategy = keystone
rpc_backend = rabbit

[keystone_authtoken]
...
auth_uri = http://controller:5000/v2.0
identity_uri = http://controller:35357
admin_tenant_name = service
admin_user = ceilometer
admin_password = CEILOMETER_PASS

[oslo_messaging_rabbit]
...
rabbit_host = controller
rabbit_userid = openstack
rabbit_password = RABBIT_PASS

[database]
...
connection = mongodb://ceilometer:CEILOMETER_DBPASS@controller:27017/ceilometer

[service_credentials]
...
os_auth_url = http://controller:5000/v2.0
os_username = ceilometer
os_tenant_name = service
os_password = CEILOMETER_PASS
os_endpoint_type = internalURL
os_region_name = RegionOne

[publisher]
...
telemetry_secret = TELEMETRY_SECRET

Starting the metering service

Once the configuration is done, make sure that all the Ceilometer components are restarted in order to use the updated configuration:

# service ceilometer-agent-central restart
# service ceilometer-agent-notification restart
# service ceilometer-api restart
# service ceilometer-collector restart
# service ceilometer-alarm-evaluator restart
# service ceilometer-alarm-notifier restart

The Ceilometer compute agent must be installed and configured on the Compute node to monitor the OpenStack virtual resources. In this article, we will focus on using the SNMP daemon running on the Compute and Network nodes to monitor the physical resource utilization data.

Physical Network monitoring

The SNMP daemon must be installed and configured on all the OpenStack Compute and Network nodes for which we intend to monitor the physical network bandwidth. The Ceilometer central agent polls the SNMP daemon on the OpenStack Compute and Network nodes to collect the physical resource utilization data.
The SNMP configuration

The following steps describe the process of installing and configuring the SNMP daemon:

To configure the SNMP daemon on the OpenStack nodes, you will need to install the following packages:

apt-get install snmpd snmp

Update the SNMP configuration file, /etc/snmp/snmpd.conf, to bind the daemon on all the network interfaces:

agentAddress udp:161

Update the /etc/snmp/snmpd.conf file to create an SNMP community and a view that provides access to the required MIBs. The following configuration assumes that the OpenStack nodes, where the Ceilometer central agent runs, have addresses in the 192.168.0.0/24 subnet:

com2sec OS_net_sec 192.168.0.0/24 OS_community
group OS_Group any OS_net_sec
view OS_view included .1 80
access OS_Group "" any noauth 0 OS_view none none

Next, restart the SNMP daemon with the updated configuration:

service snmpd restart

You can verify that the SNMP data is accessible from the Ceilometer central agent by running the following snmpwalk command from the node that runs the central agent. In this example, 192.168.0.2 is the IP of the Compute node running the SNMP daemon, while the snmpwalk command itself is executed on the node running the central agent. You should see a list of SNMP OIDs and their values:

snmpwalk -mALL -v2c -cOS_community 192.168.0.2

The Ceilometer data source configuration

Once the SNMP configuration is completed on the Compute and Network nodes, a Ceilometer data source must be configured for these nodes. A pipeline defines the transformation of the data collected by Ceilometer. The pipeline definition consists of a data source and a sink. A source defines the start of the pipeline and is associated with meters, resources, and a collection interval. It also defines the associated sink. A sink defines the end of the transformation pipeline, the transformation applied to the data, and the method used to publish the transformed data. Sending a notification over the message bus is the commonly used publishing mechanism.

To use the data samples provided by the SNMP daemon on the Compute and Network nodes, we will configure a meter_snmp data source with the predefined meter_sink in the Ceilometer pipeline on the Central node. We will use the OpenStack Controller node to run the central agent. The data source will also map the central agent to the Compute and Network nodes. In the /etc/ceilometer/pipeline.yaml file, add the following lines. This configuration will allow the central agent to poll the SNMP daemon on three OpenStack nodes with the IPs 192.168.0.2, 192.168.0.3, and 192.168.0.4:

- name: meter_snmp
  interval: 60
  resources:
      - snmp://OS_community@192.168.0.2
      - snmp://OS_community@192.168.0.3
      - snmp://OS_community@192.168.0.4
  meters:
      - "hardware.cpu*"
      - "hardware.memory*"
      - "hardware.disk*"
      - "hardware.network*"
  sinks:
      - meter_sink

Restart the Ceilometer central agent in order to load the pipeline configuration change:

# service ceilometer-agent-central restart

Sample Network usage data

With the SNMP daemon configured, the data source in place, and the central agent reloaded, the statistics of the physical network utilization will be collected by Ceilometer. Use the following command to view the utilization data:

ceilometer statistics -m hardware.network.incoming.datagram -q resource=<node-ip>

You can use various meters such as hardware.network.incoming.datagram and hardware.network.outgoing.datagram.
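Beyond the plain query above, the samples can also be aggregated into time buckets, and a threshold alarm can be defined on any of these meters. The following is a rough sketch only: the period, threshold, and log:// action are arbitrary example values, and the exact option names depend on the python-ceilometerclient release in use:

# Average the incoming datagram samples into one-hour buckets for a single node
ceilometer statistics -m hardware.network.incoming.datagram -q resource=192.168.0.2 -p 3600

# Example threshold alarm on the same meter (values are illustrative only)
ceilometer alarm-threshold-create --name net-in-high \
    --meter-name hardware.network.incoming.datagram \
    --statistic avg --period 600 --evaluation-periods 1 \
    --comparison-operator gt --threshold 100000 \
    --alarm-action 'log://'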
The following image shows some sample data: All the meters associated with the physical resource monitoring for a host can be viewed using the following command, where 192.168.0.2 is the node IP as defined in the pipeline SNMP source: ceilometer meter-list|grep 192.168.0.2 The following is the output of the preceding command: How does it work The SNMP daemon runs on each Compute and Network node. It provides a measure of the physical resources usage on the local node. The data provided by the SNMP daemon on the various Compute and Network nodes is collected by the Ceilometer central agent. The Ceilometer central agent must be told about the various nodes that are capable of providing an SNMP-based usage sample; this mapping is provided in the Ceilometer pipeline definition. A single central agent can collect the data from multiple nodes or multiple central agents can be used to partition the data collection task with each central agent collecting the data for a few nodes. Summary In this article, we learned the Ceilometer components, its installation and configuration. We also learned about monitoring physical network. You can also refer to the following books by Packt Publishing that can help you build your knowledge on OpenStack: • OpenStack Cloud Computing Cookbook, Second edition by Kevin Jackson and Cody Bunch• Learning OpenStack Networking (Neutron) by James Denton• Implementing Cloud Storage with OpenStack Swift by Amar Kapadia, Sreedhar Varma, and Kris Rajana• VMware vCloud Director Cookbook by Daniel Langenhan Resources for Article: Further resources on this subject: Using the OpenStack Dash-board[article] The orchestration service for OpenStack[article] Setting up VPNaaS in OpenStack [article]
Troubleshooting

Packt
25 Oct 2013
20 min read
(For more resources related to this topic, see here.) OpenStack is a complex suite of software that can make tracking down issues and faults quite daunting to beginners and experienced system administrators alike. While there is no single approach to troubleshooting systems, understanding where OpenStack logs vital information or what tools are available to help track down bugs will help resolve issues we may encounter. However, OpenStack like all software will have bugs that we are not able to solve ourselves. In that case, we will show you how gathering the required information so that the OpenStack community can identify bugs and suggest fixes is important in ensuring those bugs or issues are dealt with quickly and efficiently. Understanding logging Logging is important in all computer systems, but the more complex the system, the more you rely on logging to be able to spot problems and cut down on troubleshooting time. Understanding logging in OpenStack is important to ensure your environment is healthy and you are able to submit relevant log entries back to the community to help fix bugs. Getting ready Log in as the root user onto the appropriate servers where the OpenStack services are installed. This makes troubleshooting easier as root privileges are required to view all the logs. How to do it... OpenStack produces a large number of logs that help troubleshoot our OpenStack installations. The following details outline where these services write their logs: OpenStack Compute services logs Logs for the OpenStack Compute services are written to /var/log/nova/, which is owned by the nova user, by default. To read these, log in as the root user (or use sudo privileges when accessing the files). The following is a list of services and their corresponding logs. Note that not all logs exist on all servers. For example, nova-compute.log exists on your compute hosts only: nova-compute: /var/log/nova/nova-compute.log Log entries regarding the spinning up and running of the instances nova-network: /var/log/nova/nova-network.log Log entries regarding network state, assignment, routing, and security groups nova-manage: /var/log/nova/nova-manage.log Log entries produced when running the nova-manage command nova-conductor: /var/log/nova/nova-conductor.log Log entries regarding services making requests for database information nova-scheduler: /var/log/nova/nova-scheduler.log Log entries pertaining to the scheduler, its assignment of tasks to nodes, and messages from the queue nova-api: /var/log/nova/nova-api.log Log entries regarding user interaction with OpenStack as well as messages regarding interaction with other components of OpenStack nova-cert: /var/log/nova/nova-cert.log Entries regarding the nova-cert process nova-console: /var/log/nova/nova-console.log Details about the nova-console VNC service nova-consoleauth: /var/log/nova/nova-consoleauth.log Authentication details related to the nova-console service nova-dhcpbridge: /var/log/nova/nova-dhcpbridge.log Network information regarding the dhcpbridge service OpenStack Dashboard logs OpenStack Dashboard (Horizon) is a web application that runs through Apache by default, so any errors and access details will be in the Apache logs. These can be found in /var/log/apache2/*.log, which will help you understand who is accessing the service as well as the report on any errors seen with the service. OpenStack Storage logs OpenStack Object Storage (Swift) writes logs to syslog by default. On an Ubuntu system, these can be viewed in /var/log/syslog. 
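Because all of the Swift services multiplex into syslog, it helps to filter by the syslog tag when chasing a specific component. A small sketch; the tags shown here are common defaults and depend on the log_name setting in each Swift service's configuration:

# Follow only the Swift proxy and object server entries as they arrive
tail -f /var/log/syslog | grep -E 'proxy-server|object-server'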
On other systems, these might be available at /var/log/messages. The OpenStack Block Storage service, Cinder, produces logs in /var/log/cinder by default. The following is a breakdown of the log files:

cinder-api: /var/log/cinder/cinder-api.log
Details about the cinder-api service

cinder-scheduler: /var/log/cinder/cinder-scheduler.log
Details related to the operation of the Cinder scheduling service

cinder-volume: /var/log/cinder/cinder-volume.log
Log entries related to the Cinder volume service

OpenStack Identity logs
The OpenStack Identity service, Keystone, writes its logging information to /var/log/keystone/keystone.log. Depending on how you have Keystone configured, the information in this log file can range from very sparse to extremely verbose, including complete plaintext requests.

OpenStack Image Service logs
The OpenStack Image Service, Glance, stores its logs in /var/log/glance/*.log, with a separate log file for each service. The following is a list of the default log files:

api: /var/log/glance/api.log
Entries related to the Glance API

registry: /var/log/glance/registry.log
Log entries related to the Glance registry service. Events such as metadata updates and access are stored here, depending on your logging configuration.

OpenStack Network Service logs
The OpenStack Networking Service, formerly Quantum, now Neutron, stores its log files in /var/log/quantum/*.log, with a separate log file for each service. The following is a list of the corresponding logs:

dhcp-agent: /var/log/quantum/dhcp-agent.log
Log entries pertaining to the dhcp-agent

l3-agent: /var/log/quantum/l3-agent.log
Log entries related to the l3 agent and its functionality

metadata-agent: /var/log/quantum/metadata-agent.log
Log entries related to requests Quantum has proxied to the Nova metadata service

openvswitch-agent: /var/log/quantum/openvswitch-agent.log
Entries related to the operation of Open vSwitch. If you implement OpenStack Networking with a different plugin, its log file will be named accordingly.

server: /var/log/quantum/server.log
Details and entries related to the Quantum API service

Open vSwitch server: /var/log/openvswitch/ovs-vswitchd.log
Details and entries related to the Open vSwitch switch daemon

Changing log levels
By default, each OpenStack service has a sane level of logging, determined by the log level being set to Warning. That is, it logs enough information to report the status of the running system along with some basic troubleshooting information. However, there will be times when you need to adjust the logging verbosity up or down to help diagnose an issue or to reduce logging noise. As each service can be configured similarly, we will show you how to make these changes on the OpenStack Compute service.

Log-level settings in OpenStack Compute services
To do this, log into the box where the OpenStack Compute service is running and edit the logging configuration file: sudo vim /etc/nova/logging.conf
In this file, change the level entries of the listed loggers (for example, in the [logger_root] or per-service logger sections) to DEBUG, INFO, or WARNING as required.

Log-level settings in other OpenStack services
Other services, such as Glance and Keystone, currently keep their log-level settings in their main configuration files, such as /etc/glance/glance-api.conf. Adjust the log levels there by setting the debug and verbose options to achieve INFO or DEBUG output.
Restart the relevant service to pick up the log-level change.

How it works...
Logging is an important activity in any software, and OpenStack is no different.
It allows an administrator to track down problematic activity that can be used in conjunction with the community to help provide a solution. Understanding where the services log and managing those logs to allow someone to identify problems quickly and easily are important. Checking OpenStack services OpenStack provides tools to check on its services. In this section, we'll show you how to check the operational status of these services. We will also use common system commands to check whether our environment is running as expected. Getting ready To check our OpenStack Compute host, we must log into that server, so do this now before following the given steps. How to do it... To check that OpenStack Compute is running the required services, we invoke the nova-manage tool and ask it various questions about the environment, as follows: Checking OpenStack Compute Services To check our OpenStack Compute services, issue the following command: sudo nova-manage service list You will see an output similar to the following. The :-) indicates that everything is fine. nova-manage service list The fields are defined as follows: Binary: This is the name of the service that we're checking the status of. Host: This is name of the server or host where this service is running. Zone: This refers to the OpenStack Zone that is running that service. A zone can run different services. The default zone is called nova. Status: This states whether or not an administrator has enabled or disabled that service. State: This refers to whether that running service is working or not. Updated_At: This indicates when that service was last checked. If OpenStack Compute has a problem, you will see XXX in place of :-). The following command shows the same: nova-compute compute.book nova enabled XXX 2013-06-18 16:47:35 If you do see XXX, the answer to the problem will be in the logs at /var/log/nova/. If you get intermittent XXX and :-) for a service, first check whether the clocks are in sync. OpenStack Image Service (Glance) The OpenStack Image Service, Glance, while critical to the ability of OpenStack to provision new instances, does not contain its own tool to check the status of the service. Instead, we rely on some built-in Linux tools. OpenStack Image Service (Glance) doesn't have a tool to check its running services, so we can use some system commands instead, as follows: ps -ef | grep glance netstat -ant | grep 9292.*LISTEN These should return process information for Glance to show it's running, and 9292 is the default port that should be open in the LISTEN mode on your server, which is ready for use. 
The output of these commands will be similar to the following: ps -ef | grep glance This produces output like the following: To check if the correct port is in use, issue the following command: netstat -ant | grep 9292 tcp 0 0 0.0.0.0:9292 0.0.0.0:* LISTEN Other services that you should check Should Glance be having issues while the above services are in working order, you will want to check the following services as well: rabbitmq: For rabbitmq, run the following command: sudo rabbitmqctl status For example, output from rabbitmqctl (when everything is running OK) should look similar to the following screenshot: If rabbitmq isn't working as expected, you will see output similar to the following indicating that the rabbitmq service or node is down: ntp: For ntp (Network Time Protocol, for keeping nodes in time-sync), run the following command: ntpq -p ntp is required for multi-host OpenStack environments but it may not be installed by default. Install the ntp package with sudo apt-get install -y ntp) This should return output regarding contacting NTP servers, for example: MySQL Database Server: For MySQL Database Server, run the following commands: PASSWORD=openstack mysqladmin -uroot –p$PASSWORD status This will return some statistics about MySQL, if it is running, as shown in the following screenshot: Checking OpenStack Dashboard (Horizon) Like the Glance Service, the OpenStack Dashboard service, Horizon, does not come with a built-in tool to check its health. Horizon, despite not having a built-in utility to check service health, does rely on the Apache web server to serve pages. To check the status of the service then, we check the health of the web service. To check the health of the Apache web service, log into the server running Horizon and execute the following command: ps -ef | grep apache This command produces output like the following screenshot: To check that Apache is running on the expected port, TCP Port 80, issue the following command: netstat -ano | grep :80 This command should show the following output: tcp 0 0 0.0.0.0:80 0.0.0.0:* LISTEN off (0.00/0/0) To test access to the web server from the command line issue the following command: telnet localhost 80 This command should show the following output: Trying 127.0.0.1... Connected to localhost. Escape character is '^]'. Checking OpenStack Identity (Keystone) Keystone comes with a client side implementation called the python-keystone client. We use this tool to check the status of our Keystone services. To check that Keystone is running the required services, we invoke the keystone command: # keystone user-list This produces output like the following screenshot: Additionally, you can use the following commands to check the status of Keystone. The following command checks the status of the service: # ps -ef | grep keystone This should show output similar to the following: keystone 5441 1 0 Jun20 ? 00:00:04 /usr/bin/python /usr/bin/keystone-all Next you can check that the service is listening on the network. The following command can be used: netstat -anlp | grep 5000 This command should show output like the following: tcp 0 0 0.0.0.0:5000 0.0.0.0: LISTEN 54421/python Checking OpenStack Networking (Neutron) When running the OpenStack Networking service, Neutron, there are a number of services that should be running on various nodes. 
These are depicted in the following diagram: On the Controller node, check the Quantum Server API service is running on TCP Port 9696 as follows: sudo netstat -anlp | grep 9696 The command brings back output like the following: tcp 0 0 0.0.0.0:9696 0.0.0.0:* LISTEN 22350/python On the Compute nodes, check the following services are running using the ps command: ovsdb-server ovs-switchd quantum-openvswitch-agent For example, run the following command: ps -ef | grep ovsdb-server On the Network node, check the following services are running: ovsdb-server ovs-switchd quantum-openvswitch-agent quantum-dhcp-agent quantum-l3-agent quantum-metadata-agent To check our Neutron agents are running correctly, issue the following command from the Controller host when you have the correct OpenStack credentials sourced into your environment: quantum agent-list This will bring back output like the following screenshot when everything is running correctly: Checking OpenStack Block Storage (Cinder) To check the status of the OpenStack Block Storage service, Cinder, you can use the following commands: Use the following command to check if Cinder is running: ps -ef | grep cinder This command produces output like the following screenshot: Use the following command to check if iSCSI target is listening: netstat -anp | grep 3260 This command produces output like the following: tcp 0 0 0.0.0.0:3260 0.0.0.0:* LISTEN 10236/tgtd Use the following command to check that the Cinder API is listening on the network: netstat -an | grep 8776 This command produces output like the following: tcp 0 0.0.0.0:8776 0.0.0.0:* LISTEN To validate the operation of the Cinder service, if all of the above is functional, you can try to list the volumes Cinder knows about using the following: cinder list This produces output like the following: Checking OpenStack Object Storage (Swift) The OpenStack Object Storage service, Swift, has a few built-in utilities that allow us to check its health. To do so, log into your Swift node and run the following commands: Use the following command for checking the Swift Service Using Swift Stat: swift stat This produces output like the following: Using PS: There will be a service for each configured container, account, object-store. ps -ef | grep swift This should produce output like the following screenshot: Use the following command for checking the Swift API: ps -ef | grep swift-proxy This should produce the following screenshot: Use the following command for checking if Swift is listening on the network: netstat -anlp | grep 8080 This should produce output like the following: tcp 0 0 0.0.0.0:8080 0.0.0.0:* LISTEN 9818/python How it works... We have used some basic commands that communicate with OpenStack services to show they're running. This elementary level of checking helps with troubleshooting our OpenStack environment. Troubleshooting OpenStack Compute services OpenStack Compute services are complex, and being able to diagnose faults is an essential part of ensuring the smooth running of the services. Fortunately, OpenStack Compute provides some tools to help with this process, along with tools provided by Ubuntu to help identify issues. How to do it... Troubleshooting OpenStack Compute services can be a complex issue, but working through problems methodically and logically will help you reach a satisfactory outcome. Carry out the following suggested steps when encountering the different problems presented. 
Steps for when you cannot ping or SSH to an instance When launching instances, we specify a security group. If none is specified, a security group named default is used. These mandatory security groups ensure security is enabled by default in our cloud environment, and as such, we must explicitly state that we require the ability to ping our instances and SSH to them. For such a basic activity, it is common to add these abilities to the default security group. Network issues may prevent us from accessing our cloud instances. First, check that the compute instances are able to forward packets from the public interface to the bridged interface. Use the following command for the same: sysctl -A | grep ip_forward net.ipv4.ip_forward should be set to 1. If it isn't, check that /etc/sysctl.conf has the following option uncommented. Use the following command for it: net.ipv4.ip_forward=1 Then, run to following command to pick up the change: sudo sysctl -p Other network issues could be routing issues. Check that we can communicate with the OpenStack Compute nodes from our client and that any routing to get to these instances has the correct entries. We may have a conflict with IPv6, if IPv6 isn't required. If this is the case, try adding --use_ipv6=false to your /etc/nova/nova.conf file, and restart the nova-compute and nova-network services. We may also need to disable IPv6 in the operating system, which can be achieved using something like the following line in /etc/modprobe.d/ipv6.conf: install ipv6 /bin/true If using OpenStack Neutron, check the status of the neutron services on the host and the correct IP namespace is being used (see Troubleshooting OpenStack Networking). Reboot your host. Methods for viewing the Instance Console log When using the command line, issue the following commands: nova list nova console-log INSTANCE_ID For example: nova console-log ee0cb5ca-281f-43e9-bb40-42ffddcb09cd When using Horizon, carry out the following steps: Navigate to the list of instance and select an instance. You will be taken to an Overview screen. Along the top of the Overview screen is a Log tab. This is the console log for the instance. When viewing the logs directly on a nova-compute host, look for the following file: The console logs are owned by root, so only an administrator can do this. They are placed at: var/lib/nova/instances/<instance_id>/console.log. Instance fails to download meta information If an instance fails to communicate to download the extra information that can be supplied to the instance meta-data, we can end up in a situation where the instance is up but you're unable to log in, as the SSH key information is injected using this method. Viewing the console log will show output like in the following screenshot: If you are not using Neutron, ensure the following: nova-api is running on the Controller host (in a multi_host environment, ensure there's a nova-api-metadata and a nova-network package installed and running on the Compute host). Perform the following iptables check on the Compute node: sudo iptables -L -n -t nat We should see a line in the output like in the following screenshot: If not, restart your nova-network services and check again. Sometimes there are multiple copies of dnsmasq running, which can cause this issue. Ensure that there is only one instance of dnsmasq running: ps -ef | grep dnsmasq This will bring back two process entries, the parent dnsmasq process and a spawned child (verify by the PIDs). 
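A slightly more informative variant of the same check, showing the parent/child relationship directly, is sketched below using standard ps options:

# Show PID, parent PID, and command line for every dnsmasq process;
# the expected result is one parent and one child spawned from it
ps -eo pid,ppid,cmd | grep [d]nsmasq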
If there are any other instances of dnsmasq running, kill the dnsmasq processes. When killed, restart nova-network, which will spawn dnsmasq again without any conflicting processes. If you are using Neutron: The first place to look is in the /var/log/quantum/metadata_agent.log on the Network host. Here you may see Python stack traces that could indicate a service isn't running correctly. A connection refused message may appear here suggesting the metadata agent running on the Network host is unable to talk to the Metadata service on the Controller host via the Metadata Proxy service (also running on the Network host). The metadata service runs on port 8775 on our Controller host, so checking that is running involves checking the port is open and it's running the metadata service. To do this on the Controller host, run the following: sudo netstat -antp | grep 8775 This will bring back the following output if everything is OK: tcp 0 0 0.0.0.0:8775 0.0.0.0:* LISTEN If nothing is returned, check that the nova-api service is running and if not, start it. Instance launches; stuck at Building or Pending Sometimes, a little patience is needed before assuming the instance has not booted, because the image is copied across the network to a node that has not seen the image before. At other times though, if the instance has been stuck in booting or a similar state for longer than normal, it indicates a problem. The first place to look will be for errors in the logs. A quick way of doing this is from the controller server and by issuing the following command: sudo nova-manage logs errors A common error that is usually present is usually related to AMQP being unreachable. Generally, these errors can be ignored unless, that is, you check the time stamp and these errors are currently appearing. You tend to see a number of these messages related to when the services first started up so look at the timestamp before reaching conclusions. This command brings back any log line with the ERROR as log level, but you will need to view the logs in more detail to get a clearer picture. A key log file, when troubleshooting instances that are not booting properly, will be available on the controller host at /var/log/nova/nova-scheduler.log. This file tends to produce the reason why an instance is stuck in Building state. Another file to view further information will be on the compute host at /var/log/nova/nova-compute.log. Look here at the time you launch the instance. In a busy environment, you will want to tail the log file and parse for the instance ID. Check /var/log/nova/nova-network.log (for Nova Network) and /var/log/quantum/*.log (for Neutron) for any reason why instances aren't being assigned IP addresses. It could be issues around DHCP preventing address allocation or quotas being reached. Error codes such as 401, 403, 500 The majority of the OpenStack services are web services, meaning the responses from the services are well defined. 40X: This refers to a service that is up but responding to an event that is produced by some user error. For example, a 401 is an authentication failure, so check the credentials used when accessing the service. 500: These errors mean a connecting service is unavailable or has caused an error that has caused the service to interpret a response to cause a failure. Common problems here are services that have not started properly, so check for running services. 
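These status codes can also be probed directly, without going through the client tools, which quickly separates "service down" from "credentials wrong". A hedged sketch, assuming the Nova API runs on a host named controller on its default port 8774:

# An unauthenticated request should return 401 (service up, credentials required);
# connection refused or a 500 points at the service itself
curl -s -o /dev/null -w "%{http_code}\n" http://controller:8774/v2/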
If all avenues have been exhausted when troubleshooting your environment, reach out to the community, using the mailing list or IRC, where there is a raft of people willing to offer their time and assistance. See the Getting help from the community recipe at the end of this article for more information. Listing all instances across all hosts From the OpenStack controller node, you can execute the following command to get a list of the running instances in the environment: sudo nova-manage vm list To view all instances across all tenants, as a user with an admin role execute the following command: nova list --all-tenants These commands are useful in identifying any failed instances and the host on which it is running. You can then investigate further. How it works... Troubleshooting OpenStack Compute problems can be quite complex, but looking in the right places can help solve some of the more common problems. Unfortunately, like troubleshooting any computer system, there isn't a single command that can help identify all the problems that you may encounter, but OpenStack provides some tools to help you identify some problems. Having an understanding of managing servers and networks will help troubleshoot a distributed cloud environment such as OpenStack. There's more than one place where you can go to identify the issues, as they can stem from the environment to the instances themselves. Methodically working your way through the problems though will help lead you to a resolution.
Apache CloudStack Architecture

Packt
17 Jun 2013
17 min read
(For more resources related to this topic, see here.) Introducing cloud Before embarking on a journey to understand and appreciate CloudStack, let's revisit the basic concepts of cloud computing and how CloudStack can help us in achieving our private, public, or hybrid cloud objectives. Let's start this article with a plain and simple definition of cloud. Cloud is a shared multi-tenant environment built on a highly efficient, highly automated, and preferably virtualized IT infrastructure where IT resources can be provisioned on demand from anywhere over a broad network, and can be metered. Virtualization is the technology that has made the enablement of these features simpler and convenient. A cloud can be deployed in various models; including private, public, community or hybrid clouds. These deployment models can be explained as follows: Private cloud: In this deployment model, the cloud infrastructure is operated solely for an organization and may exist on premise or off premise. It can be managed by the organization or a third-party cloud provider. Public cloud: In this deployment model, the cloud service is provided to the general public or a large industry group, and is owned and managed by the organization providing cloud services. Community cloud: In this deployment model, the cloud is shared by multiple organizations and is supported by a specific community that has shared concerns. It can be managed by the organization or a third party provider, and can exist on premise or off premise. Hybrid cloud: This deployment model comprises two or more types of cloud (public, private, or community) and enables data and application portability between the clouds. A cloud—be it private, public, or hybrid—has the following essential characteristics: On-demand self service Broad network access Resource pooling Rapid elasticity or expansion Measured service Shared by multiple tenants Cloud has three possible service models, which means there are three types of cloud services that can be provided. They are: Infrastructure as a service (IaaS): This type of cloud service model provides IT infrastructure resources as a service to the end users. This model provides the end users with the capability to provision processing, storage, networks, and other fundamental computing resources that the customer can use to run arbitrary software including operating systems and applications. The provider manages and controls the underlying cloud infrastructure and the user has control over the operating systems, storage and deployed applications. The user may also have some control over the networking services. Platform as a service (PaaS): In this service model, the end user is provided with a platform that is provisioned over the cloud infrastructure. The provider manages the network, operating system, or storage and the end user has control over the applications and may have control over the hosting environment of the applications. Software as a service (SaaS): This layer provides software as a service to the end users, such as providing an online calculation engine for their end users. The end users can access these software using a thin client interface such as a web browser. The end users do not manage the underlying cloud infrastructure such as network, servers, OS, storage, or even individual application capabilities but may have some control over the application configurations settings. As depicted in the preceding diagram, the top layers of cloud computing are built upon the layer below it. 
In this book, we will be mainly dealing with the bottom layer—Infrastructure as a service. Thus providing Infrastructure as a Service essentially means that the cloud provider assembles the building blocks for providing these services, including the computing resources hardware, networking hardware and storage hardware. These resources are exposed to the consumers through a request management system which in turn is integrated with an automated provisioning layer. The cloud system also needs to meter and bill the customer on various chargeback models. The concept of virtualization enables the provider to leverage and pool resources in a multi-tenant model. Thus, the features provided by virtualization resource pooling, combined with modern clustering infrastructure, enable efficient use IT resources to provide high availability and scalability, increase agility, optimize utilization, and provide a multi-tenancy model. One can easily get confused about the differences between the cloud and a virtualized Datacenter; well, there are many differences, such as: The cloud is the next stage after the virtualization of datacenters. It is characterized by a service layer over the virtualization layer. Instead of bare computing resources, services are built over the virtualization platforms and provided to the users. Cloud computing provides the request management layer, provisioning layer, metering and billing layers along with security controls and multi-tenancy. Cloud resources are available to consumers on an on demand model wherein the resources can be provisioned and de-provisioned on an as needed basis. Cloud providers typically have huge capacities to serve variable workloads and manage variable demand from customers. Customers can leverage the scaling capabilities provided by cloud providers to scale up or scale down the IT infrastructure needed by the application and the workload. This rapid scaling helps the customer save money by using the capacity only when it is needed. The resource provisioning in the cloud is governed by policies and rules, and the process of provisioning is automated. Metering, Chargeback, and Billing are essential governance characteristics of any cloud environment as they govern and control the usage of precious IT resources. Thus setting up a cloud is basically building capabilities to provide IT resources as a service in a well-defined manner. Services can be provided to end users in various offerings, depending upon the amount of resources each service offering provides. The amount of resources can be broken down to multiple resources such as the computing capacity, memory, storage, network bandwidth, storage IOPS, and so on. A cloud provider can provide and meter multiple service offerings for the end users to choose from. Though the cloud provider makes upfront investments in creating the cloud capacity, however from a consumer's point of view the resources are available on demand on a pay per use model. Thus the customer gets billed for consumption just like in case of electricity or telecom services that individuals use. The billing may be based on hours of compute usage, the amount of storage used, bandwidth consumed, and so on. Having understood the cloud computing model, let's look at the architecture of a typical Infrastructure as a Service cloud environment. Infrastructure layer The Infrastructure layer is the base layer and comprises of all the hardware resources upon which IT is built upon. 
These include computing resources, storage resources, network resources, and so on. Computing resources Virtualization is provided using a hypervisor that has various functions such as enabling the virtual machines of the hosts to interact with the hardware. The physical servers host the hypervisor layer. The physical server resources are accessed through the hypervisor. The hypervisor layer also enables access to the network and storage. There are various hypervisors on the market such as VMware, Hyper-V, XenServer, and so on. These hypervisors are responsible for making it possible for one physical server to host multiple machines, and for enabling resource pooling and multi tenancy. Storage Like the Compute capacity, we need storage which is accessible to the Compute layer. The Storage in cloud environments is pooled just like the Compute and accessed through the virtualization layer. Certain types of services just offer storage as a service where the storage can be programmatically accessed to store and retrieve objects. Pooled, virtualized storage is enabled through technologies such as Network Attached Storage (NAS) and Storage Area Network (SAN) which helps in allowing the infrastructure to allocate storage on demand that can be based on policy, that is, automated. The storage provisioning using such technologies helps in providing storage capacity on demand to users and also enables the addition or removal of capacity as per the demand. The cost of storage can be differentiated according to the different levels of performance and classes of storage. Typically, SAN is used for storage capacity in the cloud where statefulness is required. Direct-attached Storage (DAS) can be used for stateless workloads that can drive down the cost of service. The storage involved in cloud architecture can be redundant and prevent the single point of failure. There can be multiple paths for the access of disk arrays to provide redundancy in case connectivity failures. The storage arrays can also be configured in a way that there is incremental backup of the allocated storage. The storage should be configured such that health information of the storage units is updated in the system monitoring service, which ensures that the outage and its impact are quickly identified and appropriate action can be taken in order to restore it to its normal state. Networks and security Network configuration includes defining the subnets, on-demand allocation of IP addresses, and defining the network routing tables to enable the flow of data in the network. It also includes enabling high availability services such as load balancing. Whereas the security configuration aims to secure the data flowing in the network that includes isolation of data of different tenants among each other and with the management data of cloud using techniques such as network isolation and security groups. Networking in the cloud is supposed to deal with the isolation of resources between multiple tenants as well as provide tenants with the ability to create isolated components. Network isolation in the cloud can be done using various techniques of network isolation such as VLAN, VXLAN, VCDNI, STT, or other such techniques. Applications are deployed in a multi-tenant environment and consist of components that are to be kept private, such as a database server which is to be accessed only from selected web servers and any other traffic from any other source is not permitted to access it. 
This is enabled using network isolation, port filtering, and security groups. These services help with segmenting and protecting various layers of application deployment architecture and also allow isolation of tenants from each other. The provider can use security domains, layer 3 isolation techniques to group various virtual machines. The access to these domains can be controlled using providers' port filtering capabilities or by the usage of more stateful packet filtering by implementing context switches or firewall appliances. Using network isolation techniques such as VLAN tagging and security groups allows such configuration. Various levels of virtual switches can be configured in the cloud for providing isolation to the different networks in the cloud environment. Networking services such as NAT, gateway, VPN, Port forwarding, IPAM systems, and access control management are used in the cloud to provide various networking services and accessibility. Some of these services are explained as follows: NAT: Network address translation can be configured in the environment to allow communication of a virtual machine in private network with some other machine on some other network or on the public Internet. A NAT device allows the modification of IP address information in the headers of IP packets while they are transformed from a routing device. A machine in a private network cannot have direct access to the public network so in order for it to communicate to the Internet, the packets are sent to a routing device or a virtual machine with NAT configured which has direct access to the Internet. NAT modifies the IP packet header so that the private IP address of the machine is not visible to the external networks. IPAM System/DHCP: An IP address management system or DHCP server helps with the automatic configuration of IP addresses to the virtual machines according to the configuration of the network and the IP range allocated to it. A virtual machine provisioned in a network can be assigned an IP address as per the user or is assigned an IP address from the IPAM. IPAM stores all the available IP addresses in the network and when a new IP address is to be allocated to a device, it is taken from the available IP pool, and when a device is terminated or releases the IP address, the address is given back to the IPAM system. Identity and access management: A access control list describes the permissions of various users on different resources in the cloud. It is important to define an access control list for users in a multi-tenant environment. It helps in restricting actions that a user can perform on any resource in the cloud. A role-based access mechanism is used to assign roles to users' profile which describes the roles and permissions of users on different resources. Use of switches in cloud A switch is a LAN device that works at the data link layer (layer 2) of the OSI model and provides multiport bridge. Switches store a table of MAC addresses and ports. Let us see the various types of switches and their usage in the cloud environment: Layer 3 switches: A layer-3 switch is a special type of switch which operates at layer 3—the Network layer of the OSI model. It is a high performance device that is used for network routing. A layer-3 switch has a IP routing table for lookups and it also forms a broadcast domain. Basically, a layer-3 switch is a switch which has a router's IP routing functionality built in. 
A layer-3 switch is used for routing and is used for better performance over routers. The layer-3 switches are used in large networks like corporate networks instead of routers. The performance of the layer-3 switch is better than that of a router because of some hardware-level differences. It supports the same routing protocols as network routers do. The layer-3 switch is used above the layer-2 switches and can be used to configure the routing configuration and the communication between two different VLANs or different subnets. Layer 4-7 switches: These switches use the packet information up to OSI layer 7 and are also known as content switches, web-switches, or application switches. These types of switches are typically used for load balancing among a group of servers which can be performed on HTTP, HTTPS, VPN, or any TCP/IP traffic using a specific port. These switches are used in the cloud for allowing policy-based switching—to limit the different amount of traffic on specific end-user switch ports. It can also be used for prioritizing the traffic of specific applications. These switches also provide forwarding decision making like NAT services and also manages the state of individual sessions from beginning to end thus acting like firewalls. In addition, these switches are used for balancing traffic across a cluster of servers as per the configuration of the individual session information and status. Hence these types of switches are used above layer-3 switches or above a cluster of servers in the environment. They can be used to forward packets as per the configuration such as transferring the packets to a server that is supposed to handle the requests and this packet forwarding configuration is generally based on the current server loads or sticky bits that binds the session to a particular server. Layer-3 traffic isolation provides traffic isolation across layer-3 devices. It's referred to as Virtual Routing and Forwarding (VRF). It virtualizes the routing table in a layer-3 switch and has set of virtualized tables for routing. Each table has a unique set of forwarding entries. Whenever traffic enters, it is forwarded using the routing table associated with the same VRF. It enables logical isolation of traffic as it crosses a common physical network infrastructure. VRFs provide access control, path isolation, and shared services. Security groups are also an example of layer-3 isolation capabilities which restricts the traffic to the guests based on the rules defined. The rules are defined based on the port, protocol, and source/destination of the traffic. Virtual switches: The virtual switches are software program that allows one guest VM to communicate with another and is similar to the Ethernet switch explained earlier. Virtual switches provide a bridge between the virtual NICs of the guest VMs and the physical NIC of the host. Virtual switches have port groups on one side which may or may not be connected to the different subnets. There are various types of virtual switches used with various virtualization technologies such as VMware Vswitch, Xen, or Open Vswitch. VMware also provides a distributed virtual switch which spans multiple hosts. The virtual switches consists of port groups at one end and an uplink at the other. The port groups are connected to the virtual machines and the uplink is mapped to the physical NIC of the host. The virtual switches function as a virtual switch over the hypervisor layer on the host. 
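To make the virtual switch concept concrete, the following is a minimal Open vSwitch sketch; br0 and eth0 are placeholder names for the integration bridge and the host's physical uplink, so substitute the names used in your environment:

# Create a bridge, attach the physical NIC as its uplink, and inspect the result
ovs-vsctl add-br br0
ovs-vsctl add-port br0 eth0
ovs-vsctl show
# Note: move the host's IP configuration onto br0 after attaching eth0,
# or connectivity through that NIC will be lost

In a production deployment these commands are normally driven by the networking agent or the hypervisor's own tooling rather than run by hand, but the underlying bridge and port model is the same.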
Management layer The Management layer in a cloud computing space provides management capabilities to manage the cloud setup. It provides features and functions such as reporting, configuration for the automation of tasks, configuration of parameters for the cloud setup, patching, and monitoring of the cloud components. Automation The cloud is a highly automated environment and all tasks such as provisioning the virtual machine, allocation of resources, networking, and security are done in a self-service mode through automated systems. The automation layer in cloud management software is typically exposed through APIs. The APIs allow the creation of SDKs, scripts, and user interfaces. Orchestration The Orchestration layer is the most critical interface between the IT organization and its infrastructure, and helps in the integration of the various pieces of software in the cloud computing platform. Orchestration is used to join together various individual tasks which are executed in a specified sequence with exception handling features. Thus a provisioning task for a virtual machine may involve various commands or scripts to be executed. The orchestration engine binds these individual tasks together and creates a provisioning workflow which may involve provisioning a virtual machine, adding it to your DNS, assigning IP Addresses, adding entries in your firewall and load balancer, and so on. The orchestration engine acts as an integration engine and also provides the capabilities to run an automated workflow through various subsystems. As an example, the service request to provision cloud resources may be sent to an orchestration engine which then talks to the cloud capacity layer to determine the best host or cluster where the workload can be provisioned. As a next step, the orchestration engine chooses the component to call to provision the resources. The orchestration platform helps in easy creation of complex workflows and also provides ease of management since all integrations are handled by a specialized orchestration engine and provide loose coupling. The orchestration engine is executed in the cloud system as an asynchronous job scheduler which orchestrates the service APIs to fulfill and execute a process. Task Execution The Task execution layer is at the lower level of the management operations that are performed using the command line or any other interface. The implementation of this layer can vary as per the platform on which the execution takes place. The activity of this layer is activated by the layers above in the management layer. Service Management The Service Management layer helps in compliance and provides means to implement automation and adapts IT service management best practices as per the policies of the organization, such as the IT Infrastructure Library (ITIL). This is used to build processes to implement different types of incident resolutions and also provide change management. The self service capability in cloud environment helps in providing users with a self-service catalog which consists of various service options that the user can request and provision resources from the cloud. The service layer can be comprised of various levels of services such as basic provisioning of virtual machines with some predefined templates/configuration, or can be of an advanced level with various options for provisioning servers with configuration options as well.
Introduction to Microsoft Azure Cloud Services

Packt
04 Jun 2015
10 min read
In this article by Gethyn Ellis, author of the book Microsoft Azure IaaS Essentials, we will understand cloud computing and the various services offered by it. (For more resources related to this topic, see here.) Understanding cloud computing What do we mean when we talk about cloud from an information technology perspective? People mention cloud services; where do we get the services from? What services are offered? The Wikipedia definition of cloud computing is as follows: "Cloud computing is a computing term or metaphor that evolved in the late 1990s, based on utility and consumption of computer resources. Cloud computing involves application systems which are executed within the cloud and operated through internet enabled devices. Purely cloud computing does not rely on the use of cloud storage as it will be removed upon users download action. Clouds can be classified as public, private and [hybrid cloud|hybrid]." If you have worked with virtualization, then the concept of cloud is not completely alien to you. With virtualization, you can group a bunch of powerful hardware together, using a hypervisor. A hypervisor is a kind of software, operating system, or firmware that allows you to run virtual machines. Some of the popular Hypervisors on the market are VMware ESX or Microsoft's Hyper-V. Then, you can use this powerful hardware to run a set of virtual servers or guests. The guests share the resources of the host in order to execute and provide the services and computing resources of your IT department. The IT department takes care of everything from maintaining the hypervisor hosts to managing and maintaining the virtual servers and guests. The internal IT department does all the work. This is sometimes termed as a private cloud. Third-party suppliers, such as Microsoft, VMware, and Amazon, have a public cloud offering. With a public cloud, some computing services are provided to you on the Internet, and you can pay for what you use, which is like a utility bill. For example, let's take the utilities you use at home. This model can be really useful for start-up business that might not have an accurate demand forecast for their services, or the demand may change very quickly. Cloud computing can also be very useful for established businesses, who would like to make use of the elastic billing model. The more services you consume, the more you pay when you get billed at the end of the month. There are various types of public cloud offerings and services from a number of different providers. The TechNet top ten cloud providers are as follows: VMware Microsoft Bluelock Citrix Joyent Terrmark Salesforce.com Century Link RackSpace Amazon Web Services It is interesting to read that in 2013, Microsoft was only listed ninth in the list. With a new CEO, Microsoft has taken a new direction and put its Azure cloud offering at the heart of the business model. To quote one TechNet 2014 attendee: "TechNet this year was all about Azure, even the on premises stuff was built on the Azure model" With a different direction, it seems pretty clear that Microsoft is investing heavily in its cloud offering, and this will be further enhanced with further investment. This will allow a hybrid cloud environment, a combination of on-premises and public cloud, to be combined to offer organizations that ultimate flexibility when it comes to consuming IT resources. Services offered The term cloud is used to describe a variety of service offerings from multiple providers. 
You could argue, in fact, that the term cloud doesn't actually mean anything specific in terms of the service that you're consuming. It is, in fact, just a term that means you are consuming an IT service from a provider. Be it an internal IT department in the form of a private cloud or a public offering from some cloud provider, a public cloud, or it could be some combination of both in the form of a hybrid cloud. So, then what are the services that cloud providers offer? Virtualization and on-premises technology Most business even in today's cloudy environment has some on-premises technology. Until virtualization became popular and widely deployed several years ago, it was very common to have a one-to-one relationship between a physical hardware server with its own physical resources, such as CPU, RAM, storage, and the operating system installed on the physical server. It became clear that in this type of environment, you would need a lot of physical servers in your data center. An expanding and sometimes, a sprawling environment brings its own set of problems. The servers need cooling and heat management as well as a power source, and all the hardware and software needs to be maintained. Also, in terms of utilization, this model left lots of resources under-utilized: Virtualization changed this to some extent. With virtualization, you can create several guests or virtual servers that are configured to share the resources of the underlying host, each with their own operating system installed. It is possible to run both a Windows and Linux guest on the same physical host using virtualization. This allows you to maximize the resource utilization and allows your business to get a better return on investment on its hardware infrastructure: Virtualization is very much a precursor to cloud; many virtualized environments are sometimes called private clouds, so having an understanding of virtualization and how it works will give you a good grounding in some of the concepts of a cloud-based infrastructure. Software as a service (SaaS) SaaS is a subscription where you need to pay to use the software for the time that you're using it. You don't own any of the infrastructures, and you don't have to manage any of the servers or operating systems, you simply consume the software that you will be using. You can think of SaaS as like taking a taxi ride. When you take a taxi ride, you don't own the car, you don't need to maintain the car, and you don't even drive the car. You simply tell the taxi driver or his company when and where you want to travel somewhere, and they will take care of getting you there. The longer the trip, that is, the longer you use the taxi, the more you pay. An example of Microsoft's Software as a service would be the Azure SQL Database. The following diagram shows the cloud-based SQL database: Microsoft offers customers a SQL database that is fully hosted and maintained in Microsoft data centers, and the customer simply has to make use of the service and the database. So, we can compare this to having an on-premises database. To have an on-premises database, you need a Windows Server machine (physical or virtual) with the appropriate version of SQL Server installed. 
The server would need enough CPU, RAM, and storage to fulfill the needs of your database, and you need to manage and maintain the environment, applying various patches to the operating systems as they become available, installing, and testing various SQL Server service packs as they become available, and all the while, your application makes use of the database platform. With the SQL Azure database, you have no overhead, you simply need to connect to the Microsoft Azure portal and request a SQL database by following the wizard: Simply, give the database a name. In this case, it's called Helpdesk, select the service tier you want. In this example, I have chosen the Basic service tier. The service tier will define things, such as the resources available to your database, and impose limits, in terms of database size. With the Basic tier, you have a database size limit of 2 GB. You can specify the server that you want to create your database with, accept the defaults on the other settings, click on the check button, and the database gets created: It's really that simple. You will then pay for what you use in terms of database size and data access. In a later section, you will see how to set up a Microsoft Azure account. Platform as a service (PaaS) With PaaS, you rent the hardware, operating system, storage, and network from the public cloud service provider. PaaS is an offshoot of SaaS. Initially, SaaS didn't take off quickly, possibly because of the lack of control that IT departments and business thought they were going to suffer as a result of using the SaaS cloud offering. Going back to the transport analogy, you can compare PaaS to car rentals. When you rent a car, you don't need to make the car, you don't need to own the car, and you have no responsibility to maintain the car. You do, however, need to drive the car if you are going to get to your required destination. In PaaS terms, the developer and the system administrator have slightly more control over how the environment is set up and configured but still much of the work is taken care of by the cloud service provider. So, the hardware, operating system, and all the other components that run your application are managed and taken care of by the cloud provider, but you get a little more control over how things are configured. A geographically dispersed website would be a good example of an application offered on a PaaS offering. Infrastructure as a service (IaaS) With IaaS, you have much more control over the environment, and everything is customizable. Going with the transport analogy again, you can compare it to buying a car. The service provides you with the car upfront, and you are then responsible for using the car to ensure that it gets you from A to B. You are also responsible to fix the car if something goes wrong, and also ensure that the car is maintained by servicing it regularly, adding fuel, checking the tyre pressure, and so on. You have more control, but you also have more to do in terms of maintenance. Microsoft Azure has an offering. You can deploy a virtual machine, you can specify what OS you want, how much RAM you want the virtual machine to have, you can specify where the server will sit in terms of Microsoft data centers, and you can set up and configure recoverability and high availability for your Azure virtual machine: Hybrid environments With a hybrid environment, you get a combination of on-premises infrastructure and cloud services. 
This allows you to flexibly add resilience and high availability to your existing infrastructure; it's perfectly possible, for example, for the cloud to act as a disaster recovery site for your on-premises systems.

Microsoft Azure

In order to work with the examples in this article, you need to sign up for a Microsoft account. Visit http://azure.microsoft.com/ and create an account by completing the necessary form. Simply enter your details; you can use your e-mail address as your username. Sign in with the credentials you specified, return to the Azure website and, if you want to make use of the free trial, click on the free trial link. Currently, you get $125 worth of free Azure services. Once you have clicked on the free trial link, you will have to verify your details. You will also need to enter a credit card number and its details; Microsoft assures that you won't be charged during the free trial. Enter the appropriate details and click on Sign Up.

Summary

In this article, we looked at some of the terminology around the cloud, from the services offered to some of the specific features available in Microsoft Azure. You should now be able to differentiate between a public and a private cloud, and between some of the public cloud offerings.

Resources for Article:
Further resources on this subject:
- Windows Azure Service Bus: Key Features [article]
- Digging into Windows Azure Diagnostics [article]
- Using the Windows Azure Platform PowerShell Cmdlets [article]


AIO setup of OpenStack – preparing the infrastructure code environment

Packt
07 Jul 2016
5 min read
Viewing your OpenStack infrastructure deployment as code will not only simplify node configuration, but also improve the automation process. Despite the existence of numerous system-management tools to bring our OpenStack up and running in an automated way, we have chosen Ansible for the automation of our infrastructure. (For more resources related to this topic, see here.) At the end of the day, you can choose any automation tool that fits your production needs. The key point to keep in mind is that to manage a big production environment, you must simplify operations by:

- Automating deployment and operation as much as possible
- Tracking your changes in a version control system
- Continuously integrating code to keep your infrastructure updated and bug-free
- Monitoring and testing your infrastructure code to make it robust

We have chosen Git to be our version control system. Let's go ahead and install the Git package on our development system and check the correctness of the Git installation. If you decide to use an IDE such as Eclipse for your development, it might be easier to install a Git plugin to integrate Git into your IDE. For example, the EGit plugin can be used to develop with Git in Eclipse. We do this by navigating to the Help | Install new software menu entry. You will need to add the following URL to install EGit: http://download.eclipse.org/egit/updates.

Preparing the development setup

The install process is divided into the following steps:

- Checkout the OSA repository
- Install and bootstrap Ansible
- Initial host bootstrap
- Run playbooks

Configuring your setup

The AIO development environment uses the configuration file test/roles/bootstrap-host/defaults/main.yml. This file describes the default values for the host configuration. In addition to the configuration file, configuration options can be passed through shell environment variables. The BOOTSTRAP_OPTS variable is read by the bootstrap script as space-separated key-value pairs. It can be used to pass values that override the default ones in the configuration file:

export BOOTSTRAP_OPTS="${BOOTSTRAP_OPTS} bootstrap_host_loopback_cinder_size=512"

OSA also allows overriding default values for service configuration. These override values are provided in the etc/openstack_deploy/user_variables.yml file. The following is an example of overriding values in nova.conf using the override file:

nova_nova_conf_overrides:
  DEFAULT:
    remove_unused_original_minimum_age_seconds: 43200
  libvirt:
    cpu_mode: host-model
    disk_cachemodes: file=directsync,block=none
  database:
    idle_timeout: 300
    max_pool_size: 10

This override file will populate the nova.conf file with the following options:

[DEFAULT]
remove_unused_original_minimum_age_seconds = 43200

[libvirt]
cpu_mode = host-model
disk_cachemodes = file=directsync,block=none

[database]
idle_timeout = 300
max_pool_size = 10

The override variables can also be passed using a per-host configuration stanza in /etc/openstack_deploy/openstack_user_config.yml. The complete set of configuration options is described in the OpenStack Ansible documentation at http://docs.openstack.org/developer/openstack-ansible/install-guide/configure-openstack.html.

Building the development setup

To start the installation process, execute the Ansible bootstrap script. This script will download and install the correct Ansible version. It also creates a wrapper script around ansible-playbook, called openstack-ansible, that always loads the OpenStack user variable files.
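Putting these steps together end to end, a minimal sketch of the workflow might look like the following. The repository URL, destination path, and script names (bootstrap-ansible.sh, bootstrap-aio.sh, run-playbooks.sh) are assumptions based on a Mitaka-era openstack-ansible checkout and an Ubuntu development host; verify them against the branch you are actually deploying.

# Install Git and verify the installation (assumes an Ubuntu-based development host)
sudo apt-get update && sudo apt-get install -y git
git --version

# Check out the OSA repository (URL and destination path are assumptions)
git clone https://git.openstack.org/openstack/openstack-ansible /opt/openstack-ansible
cd /opt/openstack-ansible

# Optionally override host bootstrap defaults before running anything
export BOOTSTRAP_OPTS="${BOOTSTRAP_OPTS} bootstrap_host_loopback_cinder_size=512"

# Install and bootstrap Ansible; this creates the openstack-ansible wrapper
scripts/bootstrap-ansible.sh

# Initial host bootstrap for the all-in-one node
scripts/bootstrap-aio.sh

# Run the playbooks to build the containers and deploy the OpenStack services
scripts/run-playbooks.sh

If you use Vagrant for the development machine, the same commands can go into its provisioning script so that the environment stays reproducible.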
The next step is to configure the system for the all-in-one setup by running the host bootstrap script. This script performs the following tasks:

- Applies Ansible roles to install the basic software requirements, such as OpenSSH and pip
- Applies the bootstrap_host role to check the hard disk and swap space
- Creates various loopback volumes for use with Cinder, Swift, and Nova
- Prepares networking

Finally, we run the playbooks to bring up the AIO development environment. This step executes the following tasks:

- Creates the LXC containers
- Applies security hardening to the host
- Reinitiates the network bridges
- Installs the infrastructure services such as MySQL, RabbitMQ, memcached, and more
- Installs the various OpenStack services

Running the playbooks takes a long time, as they build the containers and start the OpenStack services. Once finished, you will have all the OpenStack services running in their own private containers. You can use the lxc-ls command to list the service containers on the development machine. Use the lxc-attach command to connect to any container, as shown here:

lxc-attach --name <name_of_container>

Use the name of the container from the output of lxc-ls to attach to the container. LXC commands can be used to start and stop the service containers. The AIO environment brings up a MySQL cluster, which needs special care to start if the development machine is rebooted. Details of operating the AIO environment are available in the OpenStack Ansible QuickStart guide at http://docs.openstack.org/developer/openstack-ansible/developer-docs/quickstart-aio.html.

Tracking your changes

The OSA project itself maintains its code under version control at the OpenStack git server (http://git.openstack.org/cgit/openstack/openstack-ansible/tree/). The configuration files of OSA are stored at /etc/openstack_ansible/ on the deployment host. These files define the deployment environment and the user override variables. To make sure that you control the deployment environment, it is important that changes to these configuration files are tracked in a version control system. Similarly, to keep the development environment reproducible, make sure that the Vagrant configuration files are also tracked in the version control system.

Summary

So far, we've deployed a basic AIO setup of OpenStack. Mastering OpenStack Second Edition will take you through the process of extending our design by clustering, defining the various infrastructure nodes, controller, and compute hosts.

Resources for Article:
Further resources on this subject:
- Concepts for OpenStack [article]
- Introducing OpenStack Trove [article]
- OpenStack Performance, Availability [article]

Digging into Windows Azure Diagnostics

Packt
11 Aug 2011
14 min read
Diagnostic data can be used to identify problems with a hosted service. The ability to view the data from several sources and across different instances eases the task of identifying a problem. Diagnostic data can also be used to identify when service capacity is either too high or too low for the expected workload, which can guide capacity decisions such as whether to scale the number of instances up or down.

The configuration of Windows Azure Diagnostics is performed at the instance level. The code to do that configuration is at the role level, but the diagnostics configuration for each instance is stored in individual blobs in a container named wad-control-container, located in the storage service account configured for Windows Azure Diagnostics. (Read more: Windows Azure Diagnostics: Initializing the Configuration and Using a Configuration File.)

There is no need for application data and diagnostics data to be located in the same storage service account. Indeed, a best practice from both security and performance perspectives would be to host application data and diagnostic data in separate storage service accounts.

The configuration of Windows Azure Diagnostics is centered on the concept of data buffers, with each data buffer representing a specific type of diagnostic information. Some of the data buffers have associated data sources, which represent a further refining of the data captured and persisted. For example, the performance counter data buffer has individual data sources for each configured performance counter. Windows Azure Diagnostics supports record-based data buffers that are persisted to Windows Azure tables and file-based data buffers that are persisted to Windows Azure blobs. In the Accessing data persisted to Windows Azure Storage recipe, we see that we can access the diagnostic data in the same way we access other data in Windows Azure storage.

Windows Azure Diagnostics supports the following record-based data buffers:

- Windows Azure basic logs
- Performance counters
- Windows Event Logs
- Windows Azure Diagnostic infrastructure logs

The Windows Azure basic logs data buffer captures information written to a Windows Azure trace listener. In the Using the Windows Azure Diagnostics trace listener recipe, we see how to configure and use the basic logs data buffer. The performance counters data buffer captures the data of any configured performance counters. The Windows Event Logs data buffer captures the events from any configured Windows Event Log. The Windows Azure Diagnostic infrastructure logs data buffer captures diagnostic data produced by the Windows Azure Diagnostics process.

Windows Azure Diagnostics supports the following file-based data sources for the Directories data buffer:

- IIS logs
- IIS Failed Request Logs
- Crash dumps
- Custom directories

The Directories data buffer copies new files in a specified directory to blobs in a specified container in the Windows Azure Blob Service. The data captured by IIS logs, IIS Failed Request Logs, and crash dumps is self-evident. With the custom directories data source, Windows Azure Diagnostics supports the association of any directory on the instance with a specified container in Windows Azure storage. This allows for the coherent integration of third-party logs into Windows Azure Diagnostics. We see how to do this in the Implementing custom logging recipe.
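To make the data buffer concept concrete, the following is a minimal sketch of how the basic logs and performance counters buffers might be configured and persisted at role startup using the SDK 1.x API that this article targets. The counter specifier, sample rate, and transfer periods are illustrative values only, and the connection-string setting name assumes the standard Diagnostics plugin setting; adjust both to your own service.

using System;
using Microsoft.WindowsAzure.Diagnostics;
using Microsoft.WindowsAzure.ServiceRuntime;

public class WebRole : RoleEntryPoint
{
    public override bool OnStart()
    {
        // Start from the default configuration, which captures data but does not persist it.
        DiagnosticMonitorConfiguration config =
            DiagnosticMonitor.GetDefaultInitialConfiguration();

        // Basic logs data buffer: persist trace messages every minute, warnings and above only.
        config.Logs.ScheduledTransferPeriod = TimeSpan.FromMinutes(1);
        config.Logs.ScheduledTransferLogLevelFilter = LogLevel.Warning;

        // Performance counters data buffer: one data source sampling overall CPU usage.
        config.PerformanceCounters.DataSources.Add(new PerformanceCounterConfiguration
        {
            CounterSpecifier = @"\Processor(_Total)\% Processor Time",
            SampleRate = TimeSpan.FromSeconds(30)
        });
        config.PerformanceCounters.ScheduledTransferPeriod = TimeSpan.FromMinutes(5);

        // Start the Diagnostics Agent with the storage account configured for diagnostics.
        DiagnosticMonitor.Start(
            "Microsoft.WindowsAzure.Plugins.Diagnostics.ConnectionString", config);

        return base.OnStart();
    }
}

With a configuration along these lines in place, the Diagnostics Agent persists trace messages to the WADLogsTable and counter samples to the WADPerformanceCountersTable on the configured schedule.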
The implementation of Windows Azure Diagnostics was changed in Windows Azure SDK v1.3 and it is now one of the pluggable modules that have to be explicitly imported into a role in the service definition file. As Windows Azure Diagnostics persists both its configuration and data to Windows Azure storage, it is necessary to specify a storage service account for diagnostics in the service configuration file. The default configuration for Windows Azure Diagnostics captures some data but does not persist it. Consequently, the diagnostics configuration should be modified at role startup. In the Initializing the configuration of Windows Azure Diagnostics recipe, we see how to do this programmatically, which is the normal way to do it. In the Using a configuration file with Windows Azure Diagnostics recipe, we see how to use a configuration file to do this, which is necessary in a VM role. In normal use, diagnostics data is captured all the time and is then persisted to the storage service according to some schedule. In the event of a problem, it may be necessary to persist diagnostics data before the next scheduled transfer time. We see how to do this in the Performing an on-demand transfer recipe. Both Microsoft and Cerebrata have released PowerShell cmdlets that facilitate the remote administration of Windows Azure Diagnostics. We see how to do this in the Using the Windows Azure Platform PowerShell cmdlets to configure Windows Azure Diagnostics recipe. There are times, especially early in the development process, when non-intrusive diagnostics monitoring is not sufficient. In the Using IntelliTrace to Diagnose Problems with a Hosted Service recipe, we see the benefits of intrusive monitoring of a Windows Azure role instance. Using the Windows Azure Diagnostics trace listener Windows Azure Diagnostics supports the use of Trace to log messages. The Windows Azure SDK provides the DiagnosticMonitorTraceListener trace listener to capture the messages. The Windows Azure Diagnostics basic logs data buffer is used to configure their persistence to the Windows Azure Table Service. The trace listener must be added to the Listeners collection for the Windows Azure hosted service. This is typically done through configuration in the appropriate app.config or web.config file, but it can also be done in code. When it creates a worker or web role, the Windows Azure tooling for Visual Studio adds the DiagnosticMonitorTraceListener to the list of trace listeners specified in the Configuration section of the relevant configuration file. Methods of the System.Diagnostics.Trace class can be used to write error, warning and informational messages. When persisting the messages to the storage service, the Diagnostics Agent can filter the messages if a LogLevel filter is configured for the BasicLogsBufferConfiguration. The Compute Emulator in the development environment adds an additional trace listener, so that trace messages can be displayed in the Compute Emulator UI. In this recipe, we will learn how to trace messages using the Windows Azure trace listener. How to do it... We are going to see how to use the trace listener provided in the Windows Azure SDK to trace messages and persist them to the storage service. We do this as follows: Ensure that the DiagnosticMonitorTraceListener has been added to the appropriate configuration file: app.config for a worker role and web.config for a web role. 
2. If necessary, add the following to the Configuration section of the app.config or web.config file:

<system.diagnostics>
  <trace>
    <listeners>
      <add type="Microsoft.WindowsAzure.Diagnostics.DiagnosticMonitorTraceListener, Microsoft.WindowsAzure.Diagnostics, Version=1.0.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35" name="AzureDiagnostics">
        <filter type="" />
      </add>
    </listeners>
  </trace>
</system.diagnostics>

3. Use the following to write an informational message:

System.Diagnostics.Trace.TraceInformation("Information");

4. Use the following to write a warning message:

System.Diagnostics.Trace.TraceWarning("Warning");

5. Use the following to write an error message:

System.Diagnostics.Trace.TraceError("Error");

6. Ensure that the DiagnosticMonitorConfiguration.Logs property is configured with an appropriate ScheduledTransferPeriod and ScheduledTransferLogLevelFilter when DiagnosticMonitor.Start() is invoked.

How it works...

In steps 1 and 2, we ensure that the DiagnosticMonitorTraceListener is added to the collection of trace listeners for the web role or worker role. In steps 3 through 5, we see how to write messages to the trace listener. In step 6, we ensure that the Diagnostic Agent has been configured to persist the messages to the storage service. Note that they can also be persisted through an on-demand transfer. This configuration is described in the recipe Initializing the configuration of Windows Azure Diagnostics.

There's more...

The Windows Azure SDK v1.3 introduced full IIS in place of the hosted web core used previously for web roles. With full IIS, the web role entry point and IIS are hosted in separate processes. Consequently, the trace listener must be configured separately for each process. The configuration using web.config configures the trace listener for IIS, not the web role entry point. Note that Windows Azure Diagnostics needs to be configured only once in each role, even though the trace listener is configured separately in both the web role entry point and in IIS.

The web role entry point runs under a process named WaIISHost.exe. Consequently, one solution is to create a special configuration file for this process named WaIISHost.exe.config and add the trace listener configuration to it. A more convenient solution is to add the DiagnosticMonitorTraceListener trace listener programmatically to the list of trace listeners for the web role entry point. The following demonstrates an overridden OnStart() method in a web role entry point, modified to add the trace listener and write an informational message:

public override bool OnStart()
{
    System.Diagnostics.Trace.Listeners.Add(
        new Microsoft.WindowsAzure.Diagnostics.DiagnosticMonitorTraceListener());
    System.Diagnostics.Trace.AutoFlush = true;
    System.Diagnostics.Trace.TraceInformation("Information");
    return base.OnStart();
}

The AutoFlush property is set to true to indicate that messages should be flushed through the trace listener as soon as they are written.

Performing an on-demand transfer

The Windows Azure Diagnostics configuration file specifies a schedule in which the various data buffers are persisted to the Windows Azure Storage Service. The on-demand transfer capability in Windows Azure Diagnostics allows a transfer to be requested outside this schedule. This is useful if a problem occurs with an instance and it becomes necessary to look at the captured logs before the next scheduled transfer. An on-demand transfer is requested for a specific data buffer in a specific instance.
This request is inserted into the diagnostics configuration for the instance, stored in a blob in wad-control-container. This is an asynchronous operation whose completion is indicated by the insertion of a message in a specified notification queue. The on-demand transfer is configured using an OnDemandTransferOptions instance that specifies the DateTime range for the transfer, a LogLevelFilter that filters the data to be transferred, and the name of the notification queue. The RoleInstanceDiagnosticManager.BeginOnDemandTransfer() method is used to request the on-demand transfer with the configured options for the specified data buffer.

Following the completion of an on-demand transfer, the request must be removed from the diagnostics configuration for the instance by using the RoleInstanceDiagnosticManager.EndOnDemandTransfer() method. The completion message in the notification queue should also be removed. The GetActiveTransfers() and CancelOnDemandTransfers() methods of the RoleInstanceDiagnosticManager class can be used to enumerate and cancel active on-demand transfers. Note that it is not possible to modify the diagnostics configuration for the instance if there is a current request for an on-demand transfer, even if the transfer has completed.

Note that requesting an on-demand transfer does not require a direct connection with the hosted service. The request merely modifies the diagnostic configuration for the instance. This change is then picked up when the Diagnostic Agent on the instance next polls the diagnostic configuration; the default polling interval is 1 minute. This means that a request for an on-demand transfer needs to be authenticated only against the storage service account containing the diagnostic configuration for the hosted service.

In this recipe, we will learn how to request an on-demand transfer and clean up after it completes.

How to do it...

We are going to see how to request an on-demand transfer and clean up after it completes. We do this as follows:

1. Use Visual Studio to create a WPF project.
2. Add the following assembly references to the project:
   - Microsoft.WindowsAzure.Diagnostics.dll
   - Microsoft.WindowsAzure.ServiceRuntime.dll
   - Microsoft.WindowsAzure.StorageClient.dll
   - System.Configuration.dll
3. Add a class named OnDemandTransferExample to the project.
Add the following using statements to the class: using Microsoft.WindowsAzure; using Microsoft.WindowsAzure.Diagnostics; using Microsoft.WindowsAzure.Diagnostics.Management; using Microsoft.WindowsAzure.ServiceRuntime; using Microsoft.WindowsAzure.StorageClient; using System.Configuration; Add the following private member to the class: String wadNotificationQueueName = "wad-transfer-queue"; Add the following method, requesting an on-demand transfer, to the class: public void RequestOnDemandTransfer( String deploymentId, String roleName, String roleInstanceId) { CloudStorageAccount cloudStorageAccount = CloudStorageAccount.Parse( ConfigurationManager.AppSettings[ "DiagnosticsConnectionString"]); OnDemandTransferOptions onDemandTransferOptions = new OnDemandTransferOptions() { From = DateTime.UtcNow.AddHours(-1), To = DateTime.UtcNow, LogLevelFilter = Microsoft.WindowsAzure.Diagnostics.LogLevel.Verbose, NotificationQueueName = wadNotificationQueueName }; RoleInstanceDiagnosticManager ridm = cloudStorageAccount.CreateRoleInstanceDiagnosticManager( deploymentId, roleName, roleInstanceId); IDictionary<DataBufferName, OnDemandTransferInfo> activeTransfers = ridm.GetActiveTransfers(); if (activeTransfers.Count == 0) { Guid onDemandTransferId = ridm.BeginOnDemandTransfer( DataBufferName.PerformanceCounters, onDemandTransferOptions); } } Add the following method, cleaning up after an on-demand transfer, to the class: public void CleanupOnDemandTransfers() { CloudStorageAccount cloudStorageAccount = CloudStorageAccount.Parse( ConfigurationManager.AppSettings[ "DiagnosticsConnectionString"]); CloudQueueClient cloudQueueClient = cloudStorageAccount.CreateCloudQueueClient(); CloudQueue cloudQueue = cloudQueueClient.GetQueueReference( wadNotificationQueueName); CloudQueueMessage cloudQueueMessage; while ((cloudQueueMessage = cloudQueue.GetMessage()) != null) { OnDemandTransferInfo onDemandTransferInfo = OnDemandTransferInfo.FromQueueMessage( cloudQueueMessage); String deploymentId = onDemandTransferInfo.DeploymentId; String roleName = onDemandTransferInfo.RoleName; String roleInstanceId = onDemandTransferInfo.RoleInstanceId; Guid requestId = onDemandTransferInfo.RequestId; RoleInstanceDiagnosticManager ridm = cloudStorageAccount.CreateRoleInstanceDiagnosticManager( deploymentId, roleName, roleInstanceId); Boolean result = ridm.EndOnDemandTransfer(requestId); cloudQueue.DeleteMessage(cloudQueueMessage); } } Add the following Grid declaration to the Window element of MainWindow.xaml: <Grid> <Label Content="DeploymentId:" Height="28" HorizontalAlignment="Left" VerticalAlignment="Top" Margin="30,60,0,0" Name="label1" /> <Label Content="Role name:" Height="28" HorizontalAlignment="Left" VerticalAlignment="Top" Margin="30,110,0,0" Name="label2" /> <Label Content="Instance Id:" Height="28" HorizontalAlignment="Left" VerticalAlignment="Top" Margin="30,160,0,0" Name="label3" /> <TextBox HorizontalAlignment="Left" VerticalAlignment="Top" Margin="120,60,0,0" Name="DeploymentId" Height="23" Width="120" Text="24447326eed3475ca58d01c223efb778" /> <TextBox HorizontalAlignment="Left" VerticalAlignment="Top" Margin="120,110,0,0" Width="120" Name="RoleName" Text="WebRole1" /> <TextBox Height="23" HorizontalAlignment="Left" VerticalAlignment="Top" Margin="120,160,0,0" Width="120" Name="InstanceId" Text="WebRole1_IN_0" /> <Button Content="Request On-Demand Transfer" Height="23" HorizontalAlignment="Left" VerticalAlignment="Top" Margin="60,220,0,0" Width="175" Name="RequestTransfer" Click="RequestTransfer_Click" /> <Button 
Content="Cleanup On-Demand Transfers" Height="23" HorizontalAlignment="Left" VerticalAlignment="Top" Margin="300,220,0,0" Width="175" Name="CleanupTransfers" Click="CleanupTransfers_Click" /> </Grid> Add the following event handler to MainWindow.xaml.cs: private void RequestTransfer_Click( object sender, RoutedEventArgs e) { String deploymentId = DeploymentId.Text; String roleName = RoleName.Text; String roleInstanceId = InstanceId.Text; OnDemandTransferExample example = new OnDemandTransferExample(); example.RequestOnDemandTransfer( deploymentId, roleName, roleInstanceId); } Add the following event handler to MainWindow.xaml.cs: private void CleanupTransfers_Click( object sender, RoutedEventArgs e) { OnDemandTransferExample example = new OnDemandTransferExample(); example.CleanupOnDemandTransfers(); } Add the following to the configuration element of app.config: <appSettings> <add key="DiagnosticsConnectionString" value="DefaultEndpointsProtocol=https;AccountName={ ACCOUNT_NAME};AccountKey={ACCESS_KEY}"/> </appSettings> How it works... We create a WPF project in step 1 and add the required assembly references in step 2. We set up the OnDemandTransferExample class in steps 3 and 4. We add a private member to hold the name of the Windows Azure Diagnostics notification queue in step 5. In step 6, we add a method requesting an on-demand transfer. We create an OnDemandTransferOptions object configuring an on-demand transfer for data captured in the last hour. We provide the name of the notification queue Windows Azure Diagnostics inserts a message indicating the completion of the transfer. We use the deployment information captured in the UI to create a RoleInstanceDiagnosticManager instance. If there are no active on-demand transfers, then we request an on-demand transfer for the performance counters data buffer. In step 7, we add a method cleaning up after an on-demand transfer. We create a CloudStorageAccount object that we use to create the CloudQueueClient object with which we access to the notification queue. We then retrieve the transfer-completion messages in the notification queue. For each transfer-completion message found, we create an OnDemandTransferInfo object describing the deploymentID, roleName, instanceId, and requestId of a completed on-demand transfer. We use the requestId to end the transfer and remove it from the diagnostics configuration for the instance allowing on-demand transfers to be requested. Finally, we remove the notification message from the notification queue. In step 8, we add the UI used to capture the deployment ID, role name, and instance ID used to request the on-demand transfer. We can get this information from the Windows Azure Portal or the Compute Emulator UI. This information is not needed for cleaning up on-demand transfers, which uses the transfer-completion messages in the notification queue. In steps 9 and 10, we add the event handlers for the Request On-Demand Transfer and Cleanup On-Demand Transfers buttons in the UI. These methods forward the requests to the methods we added in steps 6 and 7. In step 11, we add the DiagnosticsConnectionString to the app.config file. This contains the connection string used to interact with the Windows Azure Diagnostics configuration. We must replace {ACCOUNT_NAME} and {ACCESS_KEY} with the storage service account name and access key for the storage account in which the Windows Azure Diagnostics configuration is located.


An Introduction to VMware Horizon Mirage

Packt
17 Dec 2013
12 min read
(For more resources related to this topic, see here.) What is Horizon Mirage? Horizon Mirage is a Windows desktop image management solution that centralizes a copy of each desktop image onto the Horizon Mirage Server infrastructure hosted in the datacenter. Apart from copying the image, Horizon Mirage also separates and categorizes it into logical layers. It's like adding a level of abstraction but without a hypervisor. These layers fall into two categories: those owned and managed by the end user, and those owned and managed by the IT department. This allows the IT managed layers, such as the operating system, to be independently updated while leaving the end users' files, profiles, and applications which they have installed all intact. These layers are continuously synchronized with the image in the datacenter, creating either full desktop snapshots or snapshots based on the upload policy applied to that particular device. These snapshots are backed up and ready for recovery or rolled back in case of failure of the endpoint device. So, the question that I get asked all the time is, "How is this different from VDI, and does that mean I don't need Horizon View anymore?" To answer this question there are a couple of points to raise. Firstly, Horizon Mirage does not require a hypervisor; it's an image management tool that can manage an image on a physical desktop PC or laptop. In fact, one of the use cases for Mirage is to provide a user with an IT-managed Windows desktop when there is limited connectivity or bandwidth between the datacenter and the local site, or the end user needs to work offline. The latest version of Horizon Mirage (4.3) launched on 19 November, 2013 also supports a managed image running as a persistent VDI desktop on Horizon View. To summarize, Horizon Mirage has the following three key areas it will deliver against: Horizon View desktops allowing management of VDI-based desktop images Physical desktop and laptop image management and backup and recovery for Microsoft Windows-based endpoints The Bring Your Own Device (BYOD) feature delivering a virtual Windows desktop machine onto a device running VMware Fusion Professional or VMware Workstation The following diagram illustrates the three areas of Horizon Mirage integration: The second and the biggest difference is where the image executes. By that I mean where does that desktop actually run? In a VDI environment, the desktop runs as a virtual machine centrally on a server in the datacenter—central management and central execution. In a Horizon Mirage deployment, the desktop is a physical endpoint device, and so it runs either natively or on that device—central management and local execution. The following diagram shows how Horizon Mirage executes locally and Horizon View executes centrally: Horizon Mirage terminologies and naming conventions Before we start to describe some of the use cases and how Horizon Mirage works, it's worth spending five minutes on covering some of the terminologies used. Endpoint In this context, an endpoint refers to an end user's device that is being managed by Mirage; so, in this case, it can either be a Windows desktop PC/laptop or a virtual instance of Windows running inside VMware Workstation or Fusion Professional. Centralized Virtual Desktop (CVD) The name CVD is a bit misleading really, because the desktop is not actually a virtual machine in the true sense of a VM running on a hypervisor. 
It more than likely refers to the level of abstraction used in creating the different layers of a desktop image. A CVD is a centralized copy or backup of a desktop or laptop machine that has been copied onto the Horizon Mirage Server in the datacenter. The CVD comprises the following core components: Base layers Driver profiles User-installed applications Machine states User settings and data Application layers Collections A collection is a logical grouping of similar endpoints or CVDs. This could be based on departmental machines, for example, the machines used by the Sales and Marketing departments. Collections can be static or dynamic. Reference CVD A reference CVD or reference machine is effectively the centralized copy of the endpoint that you use to build your "gold image" or "master image". It is used to create your base layers. These base layers are then deployed to a user's CVD which in turn synchronizes with the endpoint. All of your endpoints will now be running the same base layer, that is, the core operating systems and applications. Base layer A base layer is effectively a template from which to build a desktop. By removing any existing identity information, you are able to centrally deploy a base layer to multiple endpoints. A base layer comprises the core operating system, service packs, patches, and any core application. It is captured from a reference CVD. The best practice is to have as few base layers as possible. The ideal solution is being able to have just one single base layer for all endpoints. Application layer An application layer is a feature introduced in Version 4.0 that allows applications to be captured and deployed as separate layers. This allows you to manage and deliver applications independent of the base layer by assigning them to a user's CVD. Driver library / driver profile The driver library is where the Horizon Mirage Server stores the drivers from the different hardware vendors/makes/models of endpoints you have in your environment. The best practice is to download these directly from the vendor's website or driver DVD. A driver profile will contain details of all the drivers required for a particular piece of hardware. For example, you may have a driver profile for a Dell Latitude E6400 containing all the relevant drivers for that particular hardware platform. Managing drivers separately effectively decouples the base layer from the hardware, allowing it to build base layers that are independent of the endpoint hardware. It also means that driver conflicts are prevented when you refresh or migrate users to new or different hardware platforms. Hardware-specific device drivers are stored in the driver library and correct drivers are automatically applied to endpoints based on a set of rules that you create to match drivers to endpoints. The driver library can also detect missing or broken drivers on endpoints and fix them; however, it does not upgrade or take other actions on existing healthy drivers. Single Instance Store (SIS) The SIS contains the deduplicated data from all the uploaded CVDs, including operating systems, applications, and user files/data. It significantly reduces the storage requirements for CVDs, because it only holds one copy of each unique file. For example, if you centralize a Windows 7 endpoint which is 100 GB in size, the first upload will be 100 GB. 
Any subsequent uploads will only store the files that are different; so, in this case, if the second endpoint is also Windows 7, those files that are already stored will not be copied and, instead, pointers to those files will be created. So, in this example, maybe only 200 MB will be copied. The Horizon Mirage Management Server will show you the dedupe ratio for a centralized CVD. Horizon Mirage use cases In this section, we are going to cover, at a high level, some of the key use cases for Horizon Mirage. These are reflected in the Horizon Mirage Common Wizards page in the Management Server. I have broken these down into three categories: manage, migrate, and protect. Manage These features and components are all about delivering image management to your endpoint devices. Endpoint repair As the user's desktop is backed up in the datacenter, it's very easy to replace missing or corrupt files from the backed up image down to the endpoint device. This could be fixing a single file, application, or directory folder, or a complete restore of the endpoint. For example, a user accidentally deletes the Excel executable file and Excel now fails to load. The IT helpdesk can compare the image in the datacenter with the current state of the endpoint and work out that it's the Excel.exe file that is missing. This file will now be copied back to the endpoint and Excel is now up and running again. Single image management With Horizon Mirage you have the ability to create a layered approach to the operating environment by abstracting operating systems, applications, drivers, user data, and so on. This allows you to manage fewer images; in fact, the idea (in Horizon Mirage-speak) is that you have only one core operating system image or base layer. To build a complete endpoint, you assign any additional layers to a user's CVD, for example, assigning an application layer or driver profile. The layers are merged together and then synchronized with the endpoint device. It's like ordering a pizza. You start with the base, choose your toppings, bake it, and then it gets delivered to you. Application layering A new feature in Horizon Mirage is the ability to deliver an application as an individual layer. Application layers can be added to a user's CVD to create a complete desktop with the correct applications for the user based on their entitlement or the department that they work in. For those familiar with VMware ThinApp and the capturing process, Horizon Mirage captures an application layer in a similar way; however, Mirage does not build a virtual bubble like ThinApp. A Horizon Mirage application layer natively installs the application onto the endpoint device, so you first need to make sure the application is compatible with the operating system. Remote office management with Mirage Branch Reflectors Remote office locations have always been a problem for a traditional VDI solution because connectivity is usually limited or, in some cases, non-existent. So how can you deliver a centralized image with limited connectivity or slow WAN connections? With the Branch Reflectors you can "nominate" a desktop PC in the remote location to act as an intermediary between the local endpoints and the datacenter. The local endpoints connect locally to a Branch Reflector rather than the Horizon Mirage Server and, in turn, the Branch Reflector synchronizes with the datacenter, downloading only the differences between the IT-managed layers and the local endpoints. 
These newly built layers can then be distributed to the local endpoints. One thing to remember is that this feature does not back up the endpoint devices. It is purely for updating an image and so, for example, is ideal for remotely migrating an operating system. Migrate The second category, migrate, covers two migration areas, one for the desktop operating system and the second for hardware refresh projects or migrating between platforms. Windows XP / Windows Vista to Windows 7 Probably the most important feature of Horizon Mirage today is its ability to migrate operating systems, especially as we are rapidly approaching April 8, 2014 when Windows XP goes end-of-support. By using the layered approach to desktop images, Mirage is able to manage the operating system as a separate layer and can therefore update this layer while leaving the user data, profile, and settings all in place. The migration process is also less intrusive to the end user because the files are copied in the background, keeping the user downtime to complete the migration to a minimum. Hardware refresh Similar to the way that Horizon Mirage can migrate an operating system using a layered approach, it can also manage drivers as a separate layer. This means that Horizon Mirage can also be used if you are refreshing your hardware platform, allowing you to build specific driver profiles to match different hardware models from multiple vendors. It also means that, if your endpoint device became corrupt, unusable, or was even stolen, you could replace it with something entirely different, yet still use your same image. Protect The third and final category is protection. This is something that customers don't typically deploy for their desktop and laptop estates. It's usually left to the user to make sure their machine is backed up and protected. Centralized endpoint backup By installing the Horizon Mirage Client onto an endpoint, meaning that the endpoint will now be managed by Horizon Mirage, the first thing that happens is that the endpoint is copied to the Horizon Mirage Server in the datacenter in the form of a CVD for that endpoint/user. In addition to this, Horizon Mirage can create snapshots of the image and create points in time from which to roll back. Desktop recovery Backing up desktops is only half the story; you also need to think about restoring. Horizon Mirage offers the option of restoring specific layers, while preserving the remaining layers. You can restore an endpoint to a previous snapshot without overwriting user data. If a computer is stolen, damaged, or lost, you can restore the entire image to a replacement desktop computer or laptop. It doesn't even need to be the same make and model. Or, you could just select which layers to restore. For example, if a particular application became corrupted, you could just replace that application layer. In this use case, it doesn't just apply to the physical machines. You could restore a physical computer to a virtual desktop machine either temporarily or as a permanent migration, maybe as a stepping-stone to deploying a full VDI solution. Summary In this article we learned about what Horizon Mirage is and the terminologies used along with the three use case categories: Manage, Migrate, and Protect. Resources for Article: Application Packaging in VMware ThinApp 4.7 Essentials [Article] VMware View 5 Desktop Virtualization [Article] Windows 8 with VMware View [Article]