Learning VMware vRealize Automation

By Sriram Rajendran
About this book

With the growing interest in Software Defined Data Centers (SDDC), vRealize Automation offers data center users an organized service catalog and governance for administrators. This way, end users gain autonomy while the IT department stays in control, making sure security and compliance requirements are met. Learning what each component does and how they dovetail with each other will bolster your understanding of vRealize Automation.

The book starts off with an introduction to the distributed architecture that has been tested and installed in large-scale deployments. Implementing and configuring a distributed architecture with custom certificates is unarguably a demanding task, and it will be covered next, after which we will progress with the installation. A vRealize Automation blueprint can be prepared in multiple ways; we will focus solely on the vSphere endpoint blueprint. After this, we will discuss the high-availability configuration of vRealize Orchestrator via the NSX load balancer. Finally, we end with Advanced Service Designer, which gives service architects the ability to create advanced services and publish them as catalog items.

Publication date:
February 2016


Chapter 1. vRealize Automation and the Deconstruction of Components

Welcome to the world of automation! I am sure you have heard about the vRealize Automation (vRA) product, formerly vCloud Automation Center (vCAC). In this chapter, we will focus on what the vRealize Automation solution is and its use cases. We will further discuss the following topics:

  • The conceptual diagram of vRealize Automation

  • CAFÉ Appliance and component deep dive

    • vPostgres

    • RabbitMQ

    • vCAC server

    • tcServer (Tomcat)

    • Telemetry

  • IaaS – architecture and component deep dive

    • Model Manager Web (a.k.a. repository)

    • Model Manager Data

    • MSSQL database

    • Manager Service

    • DEM (Orchestrator and Worker)

    • Proxy agents

    • Management agents (starting vRA 6.2)


What is vRealize Automation?

vRealize Automation (a.k.a vRA) is a complete Cloud Management Platform (CMP) that can be used to build and manage a multi-vendor cloud infrastructure. Using an automation solution, end users can self-provision virtual machines in private and public cloud environments, physical machines (install OEM images), applications, and IT services according to policies defined by administrators.

Using a sales pitch definition—it is a self-serviced, policy-driven orchestration and cloud automation engine with integration capabilities built into the core of the product.

Before you move forward, I would like you to take a closer look at the preceding image, which depicts how various VMware products dovetail to complement the automation solution. The automation construct is built upon a layer of management that can automate tasks and activities across compute, storage, and network components; this layer is referred to as the Software-Defined Data Center (SDDC).

Key capabilities

I have listed a few key capabilities of vRealize Automation. While there are many more, the following list covers the most important ones:

  • A single solution of abstracted service models

  • Model once, deploy anywhere

  • Personalization through policies

A single solution of abstracted service models

vRealize Automation provides the ability to create models of services. These services are abstracted from each other within a single solution. For example, application services (PaaS) are abstracted from infrastructure services (IaaS), which are further abstracted from resource pools. I like to think about these as layers of services, layered on top of each other like the layers of an onion. If you think about it, VMware has always been in the business of providing abstraction.

Model once – deploy anywhere

The power of the abstracted service model is that you can treat a model the same irrespective of where the service gets deployed. You can model a service once, and deploy it anywhere; be it production or development or a cloud (private or public) setup.

Personalization through policies (governance)

Governance is considered a high priority item in the world of automation. Even though we want to create models of abstracted services that can be treated the same irrespective of where they get deployed, we do not want to provide each consumer with the same service. Each consumer demands a personalized service for specific business needs. vRealize Automation provides this ability through policies. Fine-grained policies work in conjunction with the personalization of a service. For example, if a developer requests a service, they will receive a development environment with a small footprint, without the need for approvals, perhaps deployed into the public cloud. If a business analyst requests the same service, their personalized service may get deployed into the private cloud with approvals in place.
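The developer-versus-analyst example above can be sketched as a simple lookup from requester role to deployment policy. Note that the role names and policy fields below are invented purely for illustration; they are not vRealize Automation APIs or policy objects:

```python
# Illustrative sketch only: role names and policy fields are invented to
# mirror the governance idea; they are NOT vRealize Automation constructs.

POLICIES = {
    "developer":        {"footprint": "small",  "approval": False, "cloud": "public"},
    "business_analyst": {"footprint": "medium", "approval": True,  "cloud": "private"},
}

def personalize(role, service_name):
    """Return the personalized deployment settings for a catalog request."""
    policy = POLICIES[role]
    return {
        "service": service_name,
        "footprint": policy["footprint"],
        "requires_approval": policy["approval"],
        "cloud": policy["cloud"],
    }

dev = personalize("developer", "LAMP stack")
print(dev["cloud"], dev["requires_approval"])   # public False
```

The same catalog item thus yields two different deployments depending on who asks, which is the essence of personalization through policies.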

These key capabilities of vRealize Automation are unique. They are critical for providing agility by automating the delivery of personalized services.


Common use cases of vRealize Automation

From my experience, here are the most common use cases of vRealize Automation solution:

  • Create a catalog of standard operating systems that can be consumed by an organization with a single click

  • Offer other services beyond infrastructure, for example—PaaS, XaaS

  • Integration with CMDB or ITSM tools to track activities on a machine when it is provisioned

  • Integration with an IPAM system to assign an IP address when provisioning a machine

  • Advanced governance as a priority


  • Hybrid cloud deployments

Many other use cases can be listed here, but I will limit the list, since our primary aim is to help you understand the core capabilities of the product.

Do spend some time recognizing what this product offers:


vRealize Automation – a conceptual diagram

Just for the ease of understanding, we have broken down the components of vRealize Automation into the following:

  • An identity source used for authentication (can be a Windows- or Linux-based identity management server)

  • A vRealize Automation appliance, a.k.a. a Cloud Automation Framework for Extensibility (CAFÉ) appliance

  • An IaaS server


Identity management appliance or SSO or PSC

The identity management appliance or vSphere 5.5 SSO or vSphere 6.0 PSC provides Single Sign-On (SSO) capabilities that allow connectivity to Active Directory (AD) or Open LDAP-compatible directory services.

Identity management appliance

This is a preconfigured virtual appliance that serves as the heart of the SSO system, released specifically for the vRealize Automation product with limited capabilities. It serves all authentication requests, handles multiple identity sources, and uses a routing layer to route requests to the appropriate subsystem (a configuration or authentication interface). It is important to note that the IDM appliance is recommended only for small-scale deployments. If your design demands high availability, you could use vSphere features such as HA and FT, since the IDM appliance has no native capability to cluster or join existing SSO deployments.

vSphere 5.5 SSO

vSphere 5.5 SSO is available both as a Windows-based installation and as a Linux-based appliance, and it can be added to an existing SSO domain. If your design demands that the SSO configuration be highly available behind a load balancer, you are limited to the Windows version of SSO; be aware that it supports only active/passive failover mode.

vSphere 6.0 PSC

Since the release of vSphere 6, the SSO configuration has been built into the Platform Service Controller (PSC), which is available in both Linux- and Windows-based flavors. If your design demands an SSO configuration that is highly available behind a load balancer, you have the flexibility to choose either Linux or Windows:


vRealize Automation or CAFÉ appliance

A vRealize Automation or CAFÉ appliance is a preconfigured virtual appliance that deploys vRealize Automation services and related components. The virtual appliance is built on top of the SUSE Linux Enterprise Server 11 (SLES) operating system. The CAFÉ appliance is focused on the business logic behind vRA that allows the IaaS component to focus on provisioning.

The server includes the vRealize Automation services, which provide the following:

  • A single portal for self-service provisioning

  • The management of cloud services

  • Authoring and administration

  • Governance-related tasks

It includes an embedded vPostgres database, vRealize Orchestrator (server/configurator), the RabbitMQ messaging server, and the vFabric tcServer:


vPostgres

The CAFÉ appliance has an embedded vPostgres database for catalog persistence. The appliance has the option to use either the embedded vPostgres database or an external one. Some of the contents of the database include the following:

  • Catalog item details

  • Entitlements

  • Approval policies

  • Advanced Service Designer information

  • Service definitions


RabbitMQ

This is a message broker that uses the Advanced Message Queuing Protocol (AMQP). Since the RabbitMQ service starts before the vcac-server service, it is important that the RabbitMQ service starts successfully; otherwise, some of the vRA services will fail. While the services in the CAFÉ appliance use the REST API to communicate with each other, RabbitMQ is used to handle the following:

  • Work queues

  • Buffer and batch operations

  • Request offloading

  • Workload distribution

To check whether the RabbitMQ server is connected, execute the following command on the vRealize Automation server: rabbitmqctl list_queues.

If the RabbitMQ server is connected, the output lists the queues and their depths.

If the RabbitMQ server is down, the command returns an error as shown:

Listing queues...
Error: unable to connect to node rabbit@localhost: nodedown
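A monitoring script can classify the two outcomes shown above with plain string checks; here is a minimal sketch in Python (the healthy-output sample is an assumed shape, since exact queue names vary per deployment):

```python
def broker_is_up(output: str) -> bool:
    """Classify `rabbitmqctl list_queues` output: a down broker reports a
    'nodedown' error, while a healthy one lists queue names and depths."""
    return "nodedown" not in output and "Error" not in output

down = "Listing queues...\nError: unable to connect to node rabbit@localhost: nodedown"
up = "Listing queues...\nvmps.queue\t0\n...done."   # assumed healthy-output shape
print(broker_is_up(down), broker_is_up(up))   # False True
```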

vCAC server

This service is the core of vRealize Automation. It starts the tcServer component when it is initialized.

tcServer (Tomcat)

VMware vFabric tcServer is a web application server based on open source Apache Tomcat. With its lean architecture and small memory footprint, tcServer requires significantly fewer resources than conventional Tomcat servers and allows you to have a greater server density in virtual and cloud environments. vRealize Automation deploys all the web applications inside the vFabric tcServer:

  • Shell-UI: This is the web interface that users hit when they connect to the CAFÉ UI.

  • Component registry: This is similar to the SSO lookup service in vCenter. It acts as a central repository that manages all the common services and endpoints. Since all services are registered to component registry, a lookup is performed against it to find the URI and its certificates.

    • A central repository for all the services and stores endpoint-related information

    • A central repository for clients to get the required service and endpoint information

    • A central repository that provides the health status of every service—checks whether a service is alive or dead
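The three registry roles above can be modeled as a tiny in-memory class. This is only a conceptual sketch of register/lookup/health; the class, method names, and the example URI are invented and do not reflect the real component registry's API:

```python
# Minimal in-memory model of the component-registry idea: register a
# service's endpoint, look it up, and report health. NOT the real vRA API.

class ComponentRegistry:
    def __init__(self):
        self._services = {}                      # name -> {"uri", "alive"}

    def register(self, name, uri):
        """Central repository role 1: store a service and its endpoint."""
        self._services[name] = {"uri": uri, "alive": True}

    def lookup(self, name):
        """Role 2: clients fetch the endpoint information they need."""
        return self._services[name]["uri"]

    def mark_down(self, name):
        self._services[name]["alive"] = False

    def health(self):
        """Role 3: report whether each registered service is alive or dead."""
        return {n: ("alive" if s["alive"] else "dead")
                for n, s in self._services.items()}

reg = ComponentRegistry()
reg.register("iaas-service", "https://web.example.local/WAPI")  # hypothetical URI
print(reg.lookup("iaas-service"))
print(reg.health())   # {'iaas-service': 'alive'}
```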


Telemetry

Telemetry was introduced with vRealize Automation 6.2 and is also known as the Customer Experience Improvement Program (CEIP). The intent of this feature is to allow customers to opt in to sending information back to VMware for the purpose of improving the product. This functionality lives within the CAFÉ appliance and is turned off by default when the VA is deployed. To access it, navigate to the vRA VA VAMI page and click the new Telemetry tab. Within this screen, you can set when and how often the data is sent back to VMware, along with any data-masking rules that you want to set up.


IaaS – architecture and component deep dive

vRA IaaS is made up of several components, and it is important to understand how they all relate to each other. The following conceptual diagram will help to illustrate what is being installed on the Windows Server:

  • Model Manager Web (a.k.a. repository)

    • IaaS Web UI

    • WAPI

    • Reports

  • Model Manager Data

  • MSSQL database

  • Manager Service

  • DEM (orchestrator and worker)

  • Proxy agents

  • Management agents (starting with vRA 6.2)

All of the preceding components are installed on the Windows OS:


Model Manager

The Model Manager role actually refers to two types of data—Model Manager Data and Model Manager Web a.k.a. repository.

Model Manager Data

The Model Manager Data holds the business logic required to connect/manage endpoints and execute workflows.

Since the business logic is uploaded to the database during the installation of the first web node, the successive web node installation in a distributed install does not allow us to install the Model Manager Data.

Please note that at runtime the business logic is always referenced from the database and never from the Model Manager Data folder stored in the filesystem of the first web node. The folder itself is used only during upgrades or when executing the Register Solution User and RepoUtil commands.

Model Manager Web a.k.a. repository

Model Manager is designed for Microsoft IIS and therefore needs to be installed on a Microsoft IIS web server. Model Manager Web is also referred to as the repository. It exposes the IaaS data model as a service and allows Create/Read/Update/Delete (CRUD) operations on the IaaS database. It implements the business logic that is executed by the Distributed Execution Manager (DEM), triggering DEM workflows (more details later in this chapter) on create/update/delete.

The website component communicates with the Model Manager, which provides the component with updates from the DEM, proxy agents, and database.

There are four websites that are configured while installing the Model Manager Web component:

  • IaaS Web UI: https://FQDN-of-IAAS-Web-Server/vcac

    When a user requests to log in to this website, the IaaS Web UI presents the form in a frame on the Infrastructure tab on the CAFÉ UI.

  • WAPI portal: https://FQDN-of-IAAS-Web-Server/WAPI

    This is an IIS web application that exposes a private API through a REST interface. WAPI is a proxy layer that exists on the web machine; it is a service-oriented API developed using .NET and acts as the integration point between the CAFÉ appliance and the repository. WAPI is registered in the component registry against the IaaS service. The important point to note is that the vCAC service uses the WAPI endpoint registered in the component registry to communicate with the IaaS components. WAPI is also used to check the health status of the IaaS service. In short, all communication with IIS goes through WAPI.

  • Reporting website: https://FQDN-of-IaaS-Web-Server/vcacReports

    As the name suggests, it is used for any reporting-related information.

  • Repository website: https://FQDN-of-IaaS-Server/Repository

    Connecting to this website will fetch the details related to the repository. However, the only catch is that you should connect to the nodes (WEB1 or WEB2) directly and not via the load balancer virtual IP.


MSSQL database

This database is used by the IaaS component to store the following:

  • Business groups

  • Fabric groups

  • Endpoint definitions

  • IaaS resources

  • Reservation policies

  • Blueprints


Microsoft Distributed Transaction Coordinator (MSDTC) must be enabled on the database machine. The only supported high-availability model for the database is Windows Server Failover Clustering, which requires shared block storage. The newer Always On replication model is not supported today because vRA depends on MSDTC, which does not work with Always On.


Manager Service

The vCloud Automation Center service (commonly called the Manager Service) is a Windows .NET service that coordinates communication between DEMs, agents including guest agents (over SOAP), the IaaS database, AD (or LDAP), and SMTP. The Manager Service communicates with the repository to queue external workflows in the SQL database that will be later picked up by either a DEM worker or a proxy agent or a guest agent.

Some of the key functionalities of the Manager Service are listed here:

  • Triggers inventory/state/performance data collection for the managed compute resources.

  • Processes data collection responses ONLY for proxy agent-based hypervisors.

  • The master workflows (machine transition states from requested to destroyed) are handled by the Manager Service. For details on the life cycle states, refer to the following diagram:


The grayed-out states give extensibility hooks.

A Virtual Machine Observer (VMO) task is another important task performed by the Manager Service. This task is scheduled to execute every 60 minutes; it checks whether any machine has expired or reached the archive state and initiates the required operations. In versions earlier than vRA 6.2, the VMO task was executed every 10 seconds. Since it is a resource-intensive action, the interval has been added as a configuration parameter in the Manager Service config file (ProcessLeaseWorkflowTimerCallbackIntervalMiliSeconds).

For additional details, please refer to http://pubs.vmware.com/vra-62/index.jsp?topic=%2Fcom.vmware.vra.system.administration.doc%2FGUID-5FBB7C73-2AAD-4106-9C0D-DE7B416A4716.html.
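The VMO lease scan described above can be sketched as follows. The machine names and the scan logic are illustrative stand-ins, not vRA internals; only the config key name and the 60-minute default come from the text:

```python
# Sketch of the VMO idea: periodically scan managed machines for expired
# leases. Machines and the scan function are invented for illustration.

from datetime import datetime, timedelta

INTERVAL_MS = 60 * 60 * 1000   # 60-minute default, expressed in the same
# milliseconds unit as ProcessLeaseWorkflowTimerCallbackIntervalMiliSeconds

def expired_machines(machines, now):
    """Return the machines whose lease has passed and need expiry handling."""
    return [name for name, lease_end in machines.items() if lease_end <= now]

now = datetime(2016, 2, 1, 12, 0)
machines = {
    "dev-vm-01": now - timedelta(hours=2),    # lease already over
    "prod-vm-07": now + timedelta(days=30),   # still within lease
}
print(expired_machines(machines, now))   # ['dev-vm-01']
```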


Distributed Execution Manager

The Distributed Execution Manager (DEM) is of two types. It can act either as a DEM Worker or as a DEM Orchestrator.

DEM Orchestrator (DEO)

One of the main tasks of a DEM Orchestrator is to monitor the status and health of DEM Workers:

  • If the Worker service stops or loses its connection to the repository (Model Manager Web), the DEM Orchestrator clears all workflow instances associated with the non-functional DEM Worker, allowing the other DEM Workers to pick up those workflows. This is why we do not need to explicitly create high availability for DEM Workers.

  • Performs the scheduling of daily recurring workflows, such as inventory data collections, state data collection, and performance collections, by creating new workflow instances at the scheduled time.

  • The RunOneOnly feature in the DEM Orchestrator ensures that only one instance of any workflow is executed at a given time by the DEM Worker.

  • It pre-processes workflows before they are executed, including checking the preconditions required for the workflows.

  • From an availability standpoint, the DEM Orchestrator works in an active/standby configuration. At least one DEM Orchestrator instance must be available whenever workflows run. It is also recommended to install an additional DEM Orchestrator instance on a separate machine to provide HA in case of failure. If the active DEM Orchestrator fails, the standby Orchestrator takes over automatically, since it monitors the status of the active instance.
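The RunOneOnly behavior mentioned above can be sketched as a scheduler that refuses to hand out a workflow while an instance of it is already in flight. The class and its data structures are invented for illustration; they are not DEM internals:

```python
# Sketch of the RunOneOnly idea: at most one instance of any given
# workflow may execute at a time. Invented structures, not DEM code.

class RunOneOnlyScheduler:
    def __init__(self):
        self._in_flight = set()        # workflows currently executing

    def try_start(self, workflow):
        """Start the workflow unless an instance is already running."""
        if workflow in self._in_flight:
            return False               # RunOneOnly: refuse a second instance
        self._in_flight.add(workflow)
        return True

    def finish(self, workflow):
        self._in_flight.discard(workflow)

sched = RunOneOnlyScheduler()
print(sched.try_start("inventory-data-collection"))   # True
print(sched.try_start("inventory-data-collection"))   # False, already running
sched.finish("inventory-data-collection")
print(sched.try_start("inventory-data-collection"))   # True again
```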

DEM Worker

The Distributed Execution Manager (DEM) Worker executes the core business logic of custom models by interacting with the database and with systems such as vRealize Orchestrator (vRO), vCloud Director, and vCloud Air. Multiple DEMs can be deployed for scalability, availability, and distribution. DEMs can also manage physical machine-related life cycle events.


Infrastructure agent

The agents are designed to interact with external systems. There are different types of agents, each having specific functions:

  • Virtualization proxy agents: These interact with hypervisors (vSphere, KVM, Hyper-V) to provision virtual machines and collect inventory data. There can be multiple proxy agents for the same hypervisor (N:1).

  • Integration agents: Virtual desktop integration (VDI) and external provisioning integration (EPI) fall under this purview.

  • WMI agents: These enable data collection from the Windows machines managed by vRealize Automation.


If you have more than one vCenter endpoint, you have to install an additional vSphere agent; it is a 1:1 mapping between a vCenter endpoint and a vSphere agent. For high availability, you can have multiple agents talking to the same vCenter endpoint, but each agent should be on a different server, and they all must have the same name, or you will run into many issues. For example, if you have three agents pointing to a single vCenter Server, all three agents pointing to that vCenter Server should be named Agent. Please refer to KB 2052062 for additional details.
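The naming rule above lends itself to a quick pre-flight check. This sketch (with made-up agent and endpoint names) flags any endpoint whose agents disagree on the agent name:

```python
# Sketch of the agent-naming rule: every vSphere agent that points at the
# same vCenter endpoint must carry the same agent name. Names invented.

def naming_conflicts(agents):
    """agents: list of (agent_name, endpoint) pairs. Return the endpoints
    whose agents do not all share one name."""
    names_by_endpoint = {}
    for name, endpoint in agents:
        names_by_endpoint.setdefault(endpoint, set()).add(name)
    return [ep for ep, names in names_by_endpoint.items() if len(names) > 1]

ok = [("Agent", "vcenter01"), ("Agent", "vcenter01"), ("Agent", "vcenter01")]
bad = [("Agent", "vcenter01"), ("AgentB", "vcenter01")]
print(naming_conflicts(ok))    # []
print(naming_conflicts(bad))   # ['vcenter01']
```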


Management agent

The management agent helps to collect support and telemetry log information through a cluster collection process. The management agent is automatically installed as a part of deployment in every IaaS node. The management agent pings the CAFÉ appliance via port 5480 every three minutes to check whether any work item (telemetry information or a log bundle collection) is pending. If a work item is pending, it will execute the work item and send back the response. So even if the vCAC server service in the CAFÉ appliance has been stopped, the management agent will be able to function.
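The check-in cycle described above amounts to a simple poll-and-drain loop. The sketch below simulates it with an in-memory queue; a real management agent talks HTTPS to the CAFÉ appliance's port 5480, and the work-item names here are invented:

```python
# Sketch of the management-agent polling loop: every cycle the agent asks
# the CAFÉ appliance for pending work items and executes any it finds.
# The queue and executor are stand-ins; no real network calls are made.

POLL_INTERVAL_SECONDS = 180   # the agent checks in every three minutes

def poll_once(pending_items, execute):
    """Drain pending work items (e.g. a log bundle or telemetry request)
    and return the responses sent back to the appliance."""
    responses = []
    while pending_items:
        item = pending_items.pop(0)
        responses.append(execute(item))
    return responses

queue = ["collect-log-bundle", "send-telemetry"]
done = poll_once(queue, lambda item: f"{item}: completed")
print(done)   # ['collect-log-bundle: completed', 'send-telemetry: completed']
```

Because the agent pulls work rather than being pushed to, it keeps functioning even when the vCAC server service on the appliance is stopped, as noted above.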

The roadmap for the management agent in the upcoming releases of vRA will be the following:

  • Enable vRealize Automation to automate the installation of IaaS components (Windows-based)

  • Can be installed via command line or UI


Startup order

As you now understand, the vRealize Automation solution has multiple components, so the startup order of every component plays an important role when recovering from a power outage or an orchestrated shutdown. Here is the recommended startup method.

Start with the load balancer; ensure that it is fully functional before moving on to the next step. If NSX LB is used, connect to the NSX Edge appliance via SSH and execute the command (show service loadbalancer monitor) and check whether the configuration is intact and enabled:

  1. Power on the PostgreSQL and MSSQL database machines if external to your servers. If embedded with the CAFÉ nodes, these must come up first in the boot order.

  2. Power on the identity appliance or vSphere SSO/PSC, and wait until the startup finishes. Before moving on to the next step, do the following:

    1. Connect to the SSO/PSC web portal using its virtual IP (only for SSO and PSC, if they are behind a load balancer). For example, https://psc.pkct.local (this should take you to the SSO/PSC page).

    2. For the identity appliance, connect to the VAMI page. For example, https://FQDN-of-Identity-Appliance:5480

  3. Power on the primary vRealize Appliance. If you are running a distributed deployment, start the secondary virtual appliances next and wait until the startup finishes.

  4. Power on vRealize Orchestrator. If you are running a distributed deployment, start the secondary appliances next and wait until the startup finishes.

  5. Power on the primary web node and wait until the startup finishes:

    1. If you are running a distributed deployment, start all the secondary web nodes.

  6. Power on all the Manager Service nodes.

  7. Power on the Distributed Execution Manager Orchestrator, Workers, and all vRealize Automation agents. You can start these components in any order, and you do not need to wait for one startup to finish before you start another.

Once all these steps are completed, perform the following steps:

  1. In a simple deployment, connect to https://FQDN-or-IP-of-CAFE:5480/ and wait until all the services show up as REGISTERED.

  2. In a distributed deployment, connect to each https://FQDN-or-IP-of-CAFE(1/2..n):5480 and wait until all the services show up as REGISTERED.
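The numbered steps above can be sketched as an ordered checklist where each tier must report ready before the next is powered on. The readiness checks below are simulated; a real script would probe the load balancer, databases, and the VAMI/SSO pages listed earlier:

```python
# Sketch of the startup sequence as a dependency-ordered checklist.
# Tier names summarize the steps above; readiness checks are simulated.

STARTUP_ORDER = [
    "load balancer",
    "external databases (vPostgres / MSSQL)",
    "identity appliance / SSO / PSC",
    "primary CAFE appliance (then secondaries)",
    "vRealize Orchestrator",
    "primary IaaS web node (then secondaries)",
    "Manager Service nodes",
    "DEM Orchestrator, DEM Workers, agents (any order)",
]

def bring_up(is_ready):
    """Power tiers on in order, stopping at the first tier that never
    becomes ready. Returns the tiers successfully started."""
    started = []
    for tier in STARTUP_ORDER:
        if not is_ready(tier):
            break                  # do not start later tiers on failure
        started.append(tier)
    return started

# Simulate the identity tier failing its readiness check:
result = bring_up(lambda tier: "identity" not in tier)
print(result)   # only the first two tiers come up
```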


Shutdown order

To preserve data integrity, observe the following shutdown order for the vRealize Automation components:

  1. Shut down the primary vRO appliance. Once the shutdown completes, shut down the secondary nodes.

  2. Shut down the Distributed Execution Manager Orchestrator, Workers, and all the vRealize Automation agents in any order and wait until all components finish shutting down.

  3. Shut down the Manager Service machine. Once the shutdown finishes, continue to shut down additional nodes if they exist.

  4. Shut down the primary web node. Once the shutdown finishes, continue to shut down additional nodes if they exist.

  5. Shut down the primary vRealize CAFÉ appliance. Once the shutdown finishes, continue to shut down additional nodes if they exist.

  6. Shut down the PostgreSQL and MSSQL virtual machines in any order and wait until the shutdown finishes.

  7. Shut down your SSO appliance, which could be an identity appliance or a vSphere SSO/PSC if this is dedicated for the vRA deployment.

  8. Shutting down the load balancer will be the last step.



Summary

This chapter was intended to refresh your understanding of the vRA architecture, depicting the high-level details of every component involved. It is important for us to understand the functionality of each component and its workflow before we set out to build the distributed vRealize Automation installation. In the next chapter, we will focus on planning the requirements and the steps involved in building the distributed vRealize Automation infrastructure.

About the Author
  • Sriram Rajendran

    Sriram Rajendran is a member of the CTO Ambassador program at VMware. He is a veteran of the IT industry with more than 12 years of experience, and a focused technologist with expertise in cloud computing, networking, storage, and server virtualization technologies.

    Sriram wears multiple hats at VMware. As a solution architect, he provides technical leadership and expertise to design, deploy, and scale the VMware SDDC stack for its Fortune 500 customers. His primary focus areas for the VMware SDDC are automation, operations, and third-party integration.

    As a senior escalations manager, he is the go-to person for handling critical executive escalations that have outgrown traditional GSS escalation processes. His focus is not just managing escalations through various internal VMware organizations, but also through external partner organizations and multivendor support processes such as TSANET.

    As a CTO Ambassador, he is responsible for connecting the research and development team with customers, partners, and the field as a global VMware evangelist. His focus is on defining and communicating VMware's vision and strategy, and acting as an advisor for VMware's vRealize Automation solutions, product roadmap, and portfolio.

    Previously, as a staff escalation engineer, he worked on customer escalations and prioritizing the requests for the team. He was also the lead on recruitment and talent management for the support and escalations team. He also worked closely with various engineering teams within VMware to help provide early feedback on the design and architecture of products based on escalations and his other field interactions.

    Prior to joining VMware, he worked at Slash Support and HP in their support organizations in technical leadership roles.

    Sriram has devoted much of his professional career to the design, implementation, and maintenance of large physical and virtual networks, storage and servers, and cloud architectures based on VMware, Microsoft, and other leading enterprise technologies.
