This might be an old adage but it most certainly doesn't make it any less true: before we learn to run, we must first learn to walk, and even before that, we must learn to crawl. Effectively, this chapter will progress through a little crawling and then some walking in the world of Cloud Technologies, SSH utility, Git Source Control Management software, and onto OpenShift. You'll be running in no time. If you are well versed in the topics leading up to the OpenShift specifics, please feel free to simply skim through the sections offering this background information or skip them altogether as they will most likely be review material. However, if this is new territory to you, please proceed. For those who have some experience in this area, hopefully the following passages will be a helpful refresher.
The mythical creature known as "The Cloud" has become a juggernaut of marketing collateral that often makes those who are technologically inclined want to laugh hysterically or run for the hills. However, it is in fact a paradigm of Information Technology that has taken the market by storm and has no inclination of leaving any time soon. Aside from the marketing hype, this concept of the cloud is truly an evolution of IT that aims to make lives easier for those who use, manage, design, and implement technology. Within the notion of "The Cloud", there are three main Service Models, as per the National Institute of Standards and Technology (NIST) definition of Cloud Computing (http://csrc.nist.gov/publications/nistpubs/800-145/SP800-145.pdf), or areas of the cloud that are different in their advantages and disadvantages as well as their goals and feature sets. The three service models are:
Infrastructure as a Service (IaaS)
Platform as a Service (PaaS)
Software as a Service (SaaS)
Each of these is listed "as a Service" because the cloud is largely about taking traditional components from Information Technology and offering them as a service either to customers or users within an organization in order to provide a more flexible environment. One thing to note here is that these service models are loosely coupled such that we may use them together, but we do not inherently require all the layers in order to create a cloud architecture.
It was mentioned that the cloud has layers. This is mostly an attempt to help us understand how it all fits together, where the distinction between the service models exists, the roles they play, and how each can be applicable to their target user base. We can visualize these different service models as layers built upon one another, not unlike that of a stack. The lower you are in the stack, the more components you, as a user, are responsible for managing, and the higher you are in the stack, the more your service provider is responsible for. The following diagram will show this example, and further explanation will follow in the coming sections in this chapter:
Beginning at the bottom of our stack, we will find the foundation upon which other layers will often be built. This layer is known as Infrastructure as a Service (IaaS), and it has become a part of the natural evolution to traditional virtualization technologies largely deployed in data centers all over the world. Within an IaaS cloud environment, all the aspects of an infrastructure are virtualized into an abstraction structure. With the introduction of this abstraction, we allow for these components to be utilized in a more flexible manner. Often found within IaaS Clouds are virtualized compute nodes, which are equivalent to traditional virtual machines but are more dynamic or ephemeral in nature. Storage is considered to be virtualized as well and is deployed in a scaled-out approach, generally offering block storage both as ephemeral resources or as persistent disks. Also common among IaaS environments are virtual networks and virtual firewalls allowing for the separation of resources on the network by creating network security zones.
As a user of IaaS Cloud, there are no ticketing systems for which we have to file requests in order to retrieve the resources that the systems administration or operations team provide. Instead, the service model offers the ability to simply provide what is needed. IaaS offers its power and flexibility at this point where we, as a user, are left to make decisions on criteria such as:
Operating System Deployment (OSD)
Service Daemon Configuration (SDC)
Storage Provisioning (SP)
Network Configuration (NC)
While these items are criteria that attribute to the flexibility of IaaS, they also incurthe overhead of needing a DevOps team or, at a minimum, someone on the staff knowledgeable in the area of DevOps and dedicated to the project at hand. There are a number of open source IaaS solutions that have gained considerable popularity, which will provide great examples and a wealth of documentation for readers who would like to continue on their education in this space. This list is alphabetical and possibly not all-inclusive:
DevOps is a new paradigm where the Dev and Ops teams work together in order to solve the need for increased release cycles. The term has been coined by a movement in response to the widening gap between the Dev and Ops teams. It is aimed to solve the problems where a Dev team would write code and hand it over to the Ops team and there was very little coordination between the two. DevOps utilizes the aspects of the cloud, configuration management, and automation tools to satisfy the Dev team's requirement for fast-moving environments, and the Ops team's requirement of a stable and controlled infrastructure.
Moving up one layer in our stack example, we find ourselves at Platform as a Service (PaaS). This service model aims to offer some of the flexibility of an IaaS while removing a great deal of the overhead such as the need to maintain the operating system, storage, deployment, provisioning, and configuration management. The offerings in this space will take the abstraction one level higher, and instead of virtualizing every component of the infrastructure that would normally be provisioned as hardware, PaaS effectively offers the pieces of a puzzle, which when put together, provide the platform on which applications can run. In a PaaS environment, the administrators, developers, or deployment managers of web applications can select the components upon which their application will run, such as the service daemon, programming language, web framework, and database. At this point, the end user's decision should focus more on whether the PaaS being reviewed offers features needed by the individual interested in hosting, deploying, or developing a particular application, along with its capacity, scaling, backup, and any other potential concerns.
Now, there are a number of PaaS providers available and anyone looking to select one should indeed spend some time with their favorite search engine to find candidates interested in becoming their provider. The top contenders should also be taken for a test drive before making any hard decisions. However, since this book is about OpenShift, I hope the reader has decided to use OpenShift, and other PaaS providers will not be discussed as such. One thing to note as a tie-in with the stack analogy is how some PaaS architectures are tightly integrated with IaaS using an Application Programming Interface (API). The API can be used as a means of automating tasks within IaaS from the perspective of PaaS, such as launching a new compute node, auto-configuring its storage and services daemons, and adding these new resources to the PaaS environment to increase capacity. Also note that even if PaaS Clouds are not integrated directly to IaaS, they are often deployed on top of IaaS because of the flexible nature of IaaS Clouds.
Sitting on the top layer of the stack, Software as a Service (SaaS) is the cloud evolution of hosted web applications. This layer of abstraction removes the largest amount of control from the user or customer as they take upon the role of simply a user, or possibly as an application administrator, and the entire platform upon which it runs is managed by the service provider as well as all the infrastructure concerns. However, as with all things where there is "give", there must be "take", and in this scenario, the "give" is a loss of control and flexibility in terms of architectural decisions, choice of backend programming languages, frameworks, databases, and any other selections of technology used. The "take" side of this and why this service model gains such widespread adoption is that some organizations, companies, or teams do not have the expertise, the desire to take on the technical aspects of a hosted web application, or might consider such functions as a burden. Common examples of SaaS hosting are Customer Relations Management (CRM), Enterprise Resource Planning (ERP), Management Information Systems (MIS), as well as other essential business-focused software solutions.
Where did all the clouds go? Why are we talking about SSH all of a sudden? Well, we're talking about SSH because it is an important component of OpenShift as well as other PaaS Clouds. SSH is an acronym for Secure Shell and it is a network communications protocol that creates encrypted connections for remote command executions, shell sessions, and data transfer. From a user's standpoint, SSH is quite simple to use, but do not let that be an indication of its potential as it is quite powerful. We will briefly discuss some simple SSH commands in context to the OpenShift use cases, but first, we need to understand a couple of things about how SSH works so that we can set up some prerequisites. The first thing on our list of prerequisites is the fact that SSH offers public- or private-key-based authentication, which is extremely common and is also used by OpenShift. The most popular implementation of SSH is arguably OpenS SH (http://www.openssh.com/), which is used by OpenShift. OpenSSH can also use other methods for authentication, such as passwords, Single sign-on mechanisms, and even Two-factor authentication. These alternatives are not covered here as they are not applicable to our coverage of OpenShift.
Once public keys are in place, something that OpenShift's client utilities will set up for us, we can simply run the following command to connect to a remote server in order to run commands in an interactive shell.
If we are doing this against a server that is not an OpenShift Gear, we will have to verify whether the configurations are in place to allow for passwordless SSH; there are many guides on this online so we won't discuss it here.
Gears will be explained at length in a later section, but it's effectively a GNU/Linux sandbox environment that is resource constrained and secured with SELinux.
If you are using a GNU/Linux distribution or Mac OS X, you will most likely have an SSH client preinstalled; however, if you are a Windows user, you will need to install a third-party SSH client application such as PuTTY (http://www.putty.org/).
In the preceding example, the shell prompt,
[email protected]$, is used to signify a shell on the local machine, and once the SSH connection is established, the prompt changes to
[email protected]$, signifying that the shell session that is currently at our fingertips is on a different machine. While shell prompts will vary greatly in the wild because of the flexibility of their configuration, this should serve as a decent placeholder to understand that once we are typing into a shell prompt at
[email protected]$, these commands are happening remotely.
The following diagram shows a simple layout of a client computer (such as a laptop) and a server system, along with a sample user account that resides on the server system, cleverly named
user that will offer itself as a high-level overview of the introductory example we previously covered.
There are actually a lot of steps going on in the background of this diagram that have to do with setting up the encrypted connection, but an in-depth coverage of these is not within the scope of this publication.
Another thing we can do with SSH, other than logging in to a remote shell, is execute single commands remotely and receive their output in the local terminal. The following example will display how to obtain our quota information from an OpenShift Gear using the
quota command, without actually entering into an interactive shell session remotely.
OpenShift shell prompts do not actually look like this in real usage; the prompt in the example was modified to maintain consistency with the previous examples. The actual OpenShift prompts and SSH username formatting will be covered in later sections.
[email protected]$ ssh [email protected] 'quota' Disk quotas for user [email protected] (uid 6017): Filesystem blocks quota limit grace files quota limit grace /dev/mapper/EBSStore01-user_home01 604 0 1048576 172 0 40000
As we can see here, the command was executed remotely and the output was sent back to us providing seamless interaction, almost as though we ran the command locally.
SSH is often just used outside interactive shells and remote-command execution. Many utilities in traditional Unix and Unix-like operating systems use or have the option to use SSH as their data transport in order to provide secure transmission of whatever data they need to move between two hosts. Common utilities in this category are
git, which leads us into the next section based on
Once upon a time, developers would maintain complex directory structures of source code that would live on a central server. Members of the development team would mount the directory over a shared file system or develop collectively on the same server, both of which posed a laundry list of problems. There is a classification of utility known as Version Control Systems (VCSs), which solve these issues. VCSs create the ability to maintain a manifest of differentials between "commits" or "versions" of a code base and much more. The VCS of choice for code management and deployment with OpenShift is named Git. The following is an excerpt from the Git website (http://git-scm.com/):
"Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency.
Git is easy to learn and has a tiny footprint with lightning fast performance. It outclasses SCM tools like Subversion, CVS, Perforce, and ClearCase with features like cheap local branching, convenient staging areas, and multiple workflows".
Before we go too deep into the details of Git, there needs to be some discussion about a few Git concepts that are essential to understanding how Git functions and why it is so powerful for developers. The first of which is the notion of a branch. In Git, there is the source-code repository that has been initialized to be tracked, and within that repository, there can be many branches. A branch is effectively a sub-repository snapshot that maintains its own change logs, snapshots, metadata, and so on. A Git branch is not a unique concept as other version control systems share this feature, but many who have never experienced it might find it difficult to follow at first, so hopefully the following diagram will help to clarify:
In the preceding diagram, there are three lines, each representing a branch. A focal point to make note of is what is known as the master branch, which is created by default when you create and initialize a Git repository. It stands to note that at the time a branch is created, it is a point-in-time snapshot of the code base from where the branch originates, and each branch can receive code commits independently from one another. Within the diagram, in this hypothetical Git repository, there are two other branches. One is called
dev and another is called
some_feature, both of which are meant to show that this is all the same code base but has deviations during the development timeline. The arrows moving between the branches introduce another concept from Git called a merge. In Git, when you merge from one branch to another, you are applying the change set or differential from another branch upon the current one. Git has a number of clever methods for accomplishing this task, but it should be mentioned that there is a possibility of a conflict that would have to be resolved before the merge operation can be completed. There are methods for mitigating the risk of merge conflicts, which will be discussed later in this section. The manner in which developers perform their branch-and-merge process is up to their respective development team. There are many approaches to branch/merge development cycles, each with advantages and disadvantages, and discussions of these exist far and wide on the Internet. It is advisable to spend some time researching to find the one that best fits a project's development style.
This has been a very rapid discussion of Git concepts, and we have only scraped the surface of its power and distributed nature. It would be advisable to spend some time with the Git project's documentation (http://git-scm.com/doc) for users who are interested in the breadth of capability that Git offers.
Hopefully, there is enough background information covered up to this point in order to start working with Git, so we will first want to set up a couple of global parameters for good measure.
While it was not covered here, it is assumed that Git is installed on the user's system. For GNU/Linux users of debian-based distributions, this can be done with
apt-get install git as the root (or the
git-all package to pull in all subpackages) or from a Fedora- or Red Hat-based system, it can be accomplished using
yum install git as the root. Other Linux distributions are likely to have the installable package name of
git in their respective repositories. For users of Mac OS X or Windows, please visit Git's download site (http://git-scm.com/downloads) in order to obtain your installation medium.
When using Git for the first time, the first order of business is to set a few global Git settings such as developer identity, editor of choice, and diff tool (for merges). Run the following commands as the system user (that is, as a non-root user), which will be used for development, replacing the name and e-mail address with your own:
$ git config --global user.name "John Smith" $ git config --global user.email [email protected]
Next up on the list will be to configure the editor of choice. Most developers like to use either
emacs, but these are certainly not the only editors in town, so use what fits best. We can configure the editor as follows:
$ git config --global core.editor vim
After these are in place, it would also be wise to configure a merge tool, which is used to assist when handling the merge conflicts. On my system, which is Fedora 19, at the time of writing, the command
git mergetool –tool-help lists the following as valid entries as a merge tool:
xxdiff. These tools are simple examples of merge utilities that can be used, and we should select one we feel comfortable with, or accept the defaults for your system if this is uncharted territory. For those using a GNU/Linux distribution as their development platform of choice, and who enjoy graphical environments,
kdiff3 have both received a lot of positive feedback and would likely be a decent place to start. As a
vimdiff is the merge tool of choice and we'll configure it as follows:
$ git config --global merge.tool vimdiff
There are also a number of other configurable Git variables, which may be found using either the Git documentation found on their website or via the
git-config main page.
Moving on, for the sake of the example, let's assume that there is an application we are going to write named
my_app. For simplicity, it will just be a simple "Hello World" example in Ruby, but it will be enough to cover the basic usage commands. First, we need a directory that we will turn into a Git repository using the following commands:
$ mkdir my_app $ cd my_app $ git init Initialized empty Git repository in ~/myapp/.git/
That's it. That's the magic; we did it! See how easy that was? It is truly amazing how powerful Git is, considering how simple it is to use. Next up, we need to create a file named
app.rb with the following contents:
#!/usr/bin/env ruby puts "Hello world!"
Downloading the example code
You can download the example code files for all Packt books you have purchased from your account at http://www.packtpub.com.If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to havethe files e-mailed directly to you.
#!/usr/bin/env ruby line is what is called a shebang, and it defines the environment in which the file should be executed. This is a common Unix-ism and will have no effect on the Windows environments.
$ git add app.rb
Then to check the status of our Git repository, run the following command and you should get a similar output:
$ git status # On branch master # # Initial commit # # Changes to be committed: # (use "git rm --cached <file>..." to unstage) # # new file: app.rb #
The portion of these lines of commands that is of interest is the
Changes to be committed part. This means we've added changes to a "staging" status and it is ready to be committed to the Git log. Also, we can add a
commit message to provide some context to what the contents of this commit are. We will commit and then check the Git log; remember, Git maintains a log of all the code that is committed to the repository. Commit the code and view the Git logs with the following commands:
$ git commit -m "Initial commit of app.rb, Hello World example" [master (root-commit) 77839fd] Initial commit of app.rb, Hello World example 1 file changed, 3 insertions(+) create mode 100644 app.rb $ git log commit 77839fdef6f17012797e93f05516d342570d31d6 Author: Adam Miller <[email protected]> Date: Wed Jan 9 23:21:04 2013 -0600 Initial commit of app.rb, Hello World example
One thing to note here is that if you were to run the command,
git show, it will show you the latest entry in the Git log, including the changes committed as follows. We will see the line start with two paths that don't really exist,
b/app.rb, these are effectively placeholders that show the differential between what
app.rb used to be and what it is now within this Git branch:
$ git show commit 77839fdef6f17012797e93f05516d342570d31d6 Author: Adam Miller <[email protected]> Date: Wed Jan 9 23:21:04 2013 -0600 Initial commit of app.rb, Hello World example diff --git a/app.rb b/app.rb new file mode 100644 index 0000000..2966711 --- /dev/null +++ b/app.rb @@ -0,0 +1,3 @@ +#!/usr/bin/env ruby + +puts "Hello world!"
In the preceding output, there is a commit ID, which is a unique identifier for this commit, followed by the Author and Date stamp for the commit.
A quick side mention that should be considered is that date stamps are not always chronologically ordered as we might think they should be, and this can happen in a number of ways, but most commonly, are going to be time zones of commits in a distributed development model or merges intermingling commits.
After the commit ID, the Author, and the Date stamp, is the commit message andthe diff. For those familiar with the
patch tools, they will feel right at home with this output formatting and its meanings. If this is new territory, fret not as the output is relatively straightforward: the lines with a
+ character prepended are additions to the file, lines with a
- character prepended are removals from the file, lines without any prepended characters are not modified, and lines with the
@@ characters are offsets in the file.
If the Git repository we were working with had not been initialized on our local machine, but instead had been cloned from a remote repository, which is what happens when you use OpenShift, there would be one more command needed to propagate this commit to the remote server:
git push. Do you remember we have mentioned before that Git is distributed, and therefore, the commit we made previously was only to our local repository? By performing a
git push, we are "pushing" those changes out to a remote location. The default remote location in Git nomenclature is known as
origin, but we need not supply that information to the command because by default it is assumed.
Note that the following output is from an OpenShift Git repository and will contain some output that might not be very meaningful, but don't worry as this will be covered at length in the later sections.
$ git push Counting objects: 4, done. Delta compression using up to 4 threads. Compressing objects: 100% (2/2), done. Writing objects: 100% (3/3), 290 bytes, done. Total 3 (delta 1), reused 0 (delta 0) remote: restart_on_add=false remote: Waiting for stop to finish remote: Done remote: restart_on_add=false remote: ~/git/sinatra.git ~/git/sinatra.git remote: ~/git/sinatra.git remote: Running .openshift/action_hooks/pre_build remote: Running .openshift/action_hooks/build remote: Running .openshift/action_hooks/deploy remote: hot_deploy_added=false remote: Done remote: Running .openshift/action_hooks/post_deploy To ssh://[email protected]/~/git/sinatra.git/ bab6f7c..e703aa8 master -> master
In this section we will discuss OpenShift at a very high level, showing certain components of the backend, explaining a little about how everything fits together and how it works, without getting too much in depth with each. That level of granularity will be explored in later sections and might not be applicable to everyone's interests.
Following the preceding diagram, we will walk through the path where traffic will flow, starting with the perspective of a user utilizing the OpenShift Client Tools. From there, we will proceed by stepping through the components so that we can get a basic feel of the way the platform works before diving deeper into the individual levels. Both the DNS server and the MongoDB system will not receive much focus at this time as DNS and database servers are conceptually extremely widespread technologies and this is meant to be a high-level discussion; they will, however, receive focus in later sections. OpenShift is written in the Ruby programming language (http://www.ruby-lang.org) using the very popular Ruby on Rails (http://rubyonrails.org/) web framework. This will be noteworthy for any reader who may be interested in joining the OpenShift Origin upstream development community.
Ruby is an open source, multi-platform, object-oriented programming language that has gained considerable popularity in recent years, especially in the realm of web development in large, thanks to the Ruby on Rails web framework, which is also open source. More information on each of these, respectively, can be found at http://www.ruby-lang.org/en/http://rubyonrails.org/.
The layer where OpenShift users will spend most of their time developing or hosting their web application is the client tools. The client tools will communicate with the server known as a broker, which we will cover shortly. Within the category of client tools, there are a number of options: first is the command-line interface that is distributed as a RubyGem. The Gem itself is distributed on https://rubygems.org/gems/rhc, and more information about it and its code base can be found on GitHub (https://github.com/openshift/rhc/).
A RubyGem is a package management format that is the standard for distributing and consuming software, and software libraries written in the Ruby programming language. These packages are often repackaged into GNU/Linux distribution's native package format such as
deb so that it can be easily deployed and installed. For more information about RubyGems, visit https://rubygems.org/.
Aside from the RubyGem version of the command-line client tools, there are also various Integrated Development Environments (IDEs) that offer integration as well as the Web Console, all of which allow for the developer to utilize the OpenShift platform. At the time of this writing, the Red Hat JBoss Developer Studio, which is an IDE from Red Hat based on Eclipse (https://devstudio.jboss.com), an Eclipse plugin, and the Zend Studio IDEs, offer OpenShift client-integration plugins. However, it should be noted that no matter the utility, all end user tools utilize the Representational State Transfer Application Programming Interface (REST API) on the backend. This will be covered in more detail later.
Interaction with OpenShift happens through the broker. This component of OpenShift can be thought of as the facilitator, as it handles REST API calls and translates them into actions. These actions can be DNS updates, user authentication, or an application action such as creation, deletion, or other state changes. When these actions are sent to the broker, it will make its decisions and utilize a message-passing mechanism to instruct other components within OpenShift to carry out a task. Depending on the task required, the broker will send a message such as the following to a supporting system:
Name server record updates
Application state transition
Node tasks: performing actions against a user's application environment
When an OpenShift user creates an application, they do so within a container that has been titled as Gear, and multiple OpenShift gears will live on a single Node. Multiple nodes can exist in an OpenShift environment, but the important point to be observed here, which makes the OpenShift PaaS unique, is that it's multitenant at the operating system level or at the platform layer, which offers high density. These nodes are the work horses. This is where the applications run, the databases run, the Git repositories live, as well as what is known as a Cartridge will execute (more on cartridges later), and the location that we will land at when we SSH into our application or Gear.
This has been a whirlwind take on all things in cloud computing and most notably that of Platform as a Service. In this chapter, we have also covered some background utilities such as SSH and Git that are essential for using OpenShift, and we even took a high-level look into the OpenShift architecture in order to see how all the components fit together. Next up, we will move on to actually using OpenShift by utilizing the hosted free OpenShift Online service, as we explore all the ways we are able to deploy our applications into "The Cloud" using OpenShift.