Machine Learning Toolkit
In this chapter, we're going to look at the following topics:
- Installing Docker
- Building a machine learning Docker file
- Sharing data back and forth between your host computer and your Docker container
- Building a REST service that uses the machine learning infrastructure running inside your Docker container
Installing Docker
We'll need to download Docker to get it installed. In this section, you'll see how to install Docker on Windows, and how to use a script that's suitable for installation on Linux.
Let's install Docker from https://www.docker.com/. The quickest way to get this done is to head up to the menu. Here, we'll choose to download the version for Windows. Give it a click, which will take you right over to the Docker store, where you can download the specific installer for your platform, as shown in the following screenshot:
All the platforms are available here. We'll just download the MSI for Windows. It downloads relatively quickly, and once it's on your PC, you can just click the MSI installer and the installation will proceed quickly.
Installing on Ubuntu is best done with a script. So, I've provided a sample installation script (install-docker.sh) that will point your local package manager at the official Docker distribution repositories, and then use apt to get the installation completed.
Getting Docker installed on Linux is pretty straightforward: you just run the install-docker shell script I've provided. The packages will update, download, and then install. When you get to the end of it, you just have to type docker --help to make sure that everything is installed:
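The script's contents aren't reproduced here, but a minimal sketch of what such a script might look like, assuming an Ubuntu host and Docker's official convenience installer, is:

#!/bin/bash
# Refresh the package index and make sure curl is available
sudo apt-get update
sudo apt-get install -y curl
# Fetch and run Docker's official convenience installer, which points
# the package manager at the official Docker repositories
curl -fsSL https://get.docker.com | sh
# Confirm that the installation worked
docker --help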
Now, for GPU support, which will make your Keras and TensorFlow models run faster, there is a special version called nvidia-docker, which exposes devices on Ubuntu to your Docker containers to allow GPU acceleration. There's an install script for this as well (install-nvidia-docker.sh). Now, assuming that you do have an actual NVIDIA graphics card, you can use NVIDIA Docker in place of Docker.
Here, we're running a test command that uses nvidia-smi, which is the status program that shows you the GPU status on your machine:
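A typical test invocation looks something like the following; the CUDA image tag here is an assumption and may differ on your system:

nvidia-docker run --rm nvidia/cuda nvidia-smi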
As you can see, our TITAN X is fully exposed to Docker. Getting Docker installed is a relatively easy operation.
In the next section, we're going to take a look at authoring a Docker file to set up a complete machine learning environment.
The machine learning Docker file
Now, let's dive into preparing a machine learning Docker file. In this section, we will take a look at cloning the source files, the base images that are needed for Docker, installing additional required packages, exposing a volume so that you can share your work, and exposing ports so that you'll be able to see Jupyter Notebooks, which is the tool that we'll be using to explore machine learning.
Now, you'll need to get the source code that goes with these sections. Head on over to https://github.com/wballard/kerasvideo/tree/2018, where you can quickly clone the repository. Here, we're just using GitHub for Windows as a relatively quick way to clone the repository, but you can use Git in any fashion you're comfortable with. It doesn't matter what directory you put these files in; we're just downloading them into our local work directory. Then, we're going to use this location as the place to begin the build of the actual Docker container.
In the cloned repository, take a look at the Docker file:
This is what we'll be using to create our environment. We're starting off with the base NVIDIA image that has the CUDA and cuDNN drivers, which will enable GPU support in the future. In the next section of the file, we're updating the package manager on the container to make sure that we have git and wget, along with updated graphics packages, so that we'll be able to draw charts in our notebooks:
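The Dockerfile itself isn't reproduced here, but the opening section might look something like the following sketch; the exact CUDA image tag and package list are illustrative assumptions:

FROM nvidia/cuda:9.0-cudnn7-runtime-ubuntu16.04

# Update the package manager and install git, wget, and the graphics
# libraries needed to draw charts in our notebooks
RUN apt-get update && apt-get install -y \
    git \
    wget \
    libfreetype6-dev \
    libpng12-dev \
    && rm -rf /var/lib/apt/lists/*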
Now, we're going to be installing Anaconda Python. We're downloading it from the internet, and then running it as a shell script, which will place Python on the machine. We'll clean up after we're done:
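Again as a sketch, with the installer version and install location being assumptions, this step might look like:

# Download the Anaconda installer, run it in batch mode into /opt/anaconda,
# and then delete the installer to keep the image small
RUN wget https://repo.continuum.io/archive/Anaconda3-5.0.1-Linux-x86_64.sh && \
    bash Anaconda3-5.0.1-Linux-x86_64.sh -b -p /opt/anaconda && \
    rm Anaconda3-5.0.1-Linux-x86_64.sh
ENV PATH /opt/anaconda/bin:$PATH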
Anaconda is a convenient Python distribution to use for machine learning and data science tasks because it comes with pre-built math libraries, particularly Pandas, NumPy, SciPy, and scikit-learn, which are built with the optimized Intel Math Kernel Library. This means that, even if you don't have a GPU, you can generally get better performance by using Anaconda. It also has the advantage of installing, not as root or globally under your system, but in your home directory. Therefore, you can add it on to an existing system without worrying about breaking system components that might rely on Python, say, in /usr/bin or what's been installed by your global package manager.
Now, we're going to be setting up a user on our container called keras:
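A sketch of this step, assuming a standard useradd invocation:

# Create a non-root user named keras with a home directory and bash shell
RUN useradd -ms /bin/bash keras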
When we're running notebooks, they're going to be running as this user, so you'll know who owns the files at all times. Creating a specific user in order to set up your container isn't strictly necessary, but it is convenient to guarantee that you have a consistent setup. As you use these techniques with Docker more, you'll likely explore different base images, and those user directories set up on those images may not be exactly as you expect. For example, you may be using a different shell or have a different home directory path. Setting up your own allows this to be consistent.
Now, we're actually going to be installing the packages for our conda environment:
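As a sketch, with the exact package list being an assumption:

# Install the scientific Python stack with conda, then add
# TensorFlow and Keras on top with pip
RUN conda install -y numpy scipy pandas scikit-learn matplotlib && \
    pip install tensorflow keras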
This will be the Python we're using here, and we'll be installing TensorFlow and Keras on top of it in order to have a complete environment. You'll notice here that we're using both conda and pip. conda is the package manager that comes with Anaconda Python, but you can also add packages that aren't available as prebuilt conda packages by using the normal pip command. In this fashion, you can always mix and match to get the packages you need.
In the last few lines of the file, we're setting up what's called a VOLUME:
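A sketch of these final lines; the exact notebook flags are assumptions:

# Share /src with the host, publish the notebook port, and set sensible
# defaults for the user, working directory, and command
VOLUME /src
EXPOSE 8888
USER keras
WORKDIR /src
CMD jupyter notebook --ip=0.0.0.0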
This is going to allow access to the local hard drive on your machine so that your files, as you're editing them and working on them, are not lost inside the container. Then, we're exposing a port that the IPython Notebooks will be shared over. So, the container is going to be serving up port 8888, running the IPython Notebook on the container, and then you'll be able to access it directly from your PC.
Remember that these settings are from the point of view of the container: when we say VOLUME /src, what we're really saying is that on the container, create a /src that's ready to receive a mount from your host computer, which we'll do in a later section when we actually run the container. Then, we say USER keras: this is the user we created before. Afterwards, we say WORKDIR, which says to use the /src directory as the current working directory when we finally run our command, that is, jupyter notebook. This sets everything up so that we have some reasonable defaults: we're running as the user we expect, we're in the directory we expect, and we're going to run the command that's exposed on a network port from the container.
Now that we've prepared our Docker file, let's take a look at some security settings and how we can share data with our container.
Sharing data
In this section, we will take a look at sharing data between your Docker container and your desktop. We're going to cover some necessary security settings to allow access. We will then run a self-test to make sure that we've got those security settings correct, and finally, we're going to build our actual Docker file.
Now, assuming you have Docker installed and running, you need to get into the Docker settings from the cute little whale icon on your taskbar. Go to the lower right of your taskbar, right-click the whale, and select Settings...:
There are a few security settings we need to get right in order for our VOLUME to work, so that our Docker container can look at our local hard drive. I've opened this settings page from the whale; here, we're going to select and copy the test command we'll be using later, and click on Apply:
This is going to pop up a new window asking for a password so that Docker can map a shared drive back to our PC, making our PC's hard drive visible from within the container. This shared location is where we're going to be working and editing files so that we can save our work.
Now that we have the command that we copied from the dialog, we're going to go ahead and paste it into the Command Prompt (or you can just type it in). We'll run a test container, just to make sure that our Docker installation can actually see local hard drives:
C:\11519>docker run --rm -v c:/Users:/data alpine ls /data
With the -v switch, we're mapping c:/Users, which is on our local PC, to /data, which is the volume on the container, in the alpine test image. What you can see is that Docker downloads the alpine test container, then runs the ls command, and that we have access:
Note that if you are running on Linux, you won't have to do any of these steps; you just have to run your Docker command with sudo, depending upon which filesystem you're actually sharing. Here, we're running both docker and nvidia-docker to make sure that we have access to our home directories:
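For example, a quick check along these lines, mirroring the Windows test from earlier, lists your home directory from inside a container (the exact paths here are illustrative):

sudo docker run --rm -v ~/:/data alpine ls /data
sudo nvidia-docker run --rm -v ~/:/data alpine ls /data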
Now, we're actually going to build our container with the docker build command. We're going to use -t in order to give it a name called keras, and then go ahead and run the following command:
C:\11519>docker build -t keras .
This will actually run relatively quickly because I have in fact built it before on this computer, and a lot of the files are cached:
Conveniently, the command to build on Linux is exactly the same as on Windows with Docker. However, you may choose to build with nvidia-docker if you're working with GPU support on your Linux host. So, what does docker build do? Well, it takes the Docker file and executes it, downloading the packages, creating the filesystem, running commands, and then saving all of those changes as image layers so that you can reuse them later. Every time you run the Docker container, it starts from the state you were at when you ran the build. That way, every run is consistent.
Now that we have our Docker container running, we'll move on to the next section where we'll set up and run a REST service with the Jupyter Notebook.
Machine learning REST service
Now that we've got our Docker container built and ready, we're going to run a REST service inside of it. In this section, we will take a look at running Docker with the correct command-line arguments, the exposed URL from our REST service, and then finally verifying that Keras is fully installed and operational.
And now for the payoff: we're actually going to run our container using the docker run command. There are a couple of switches we're going to pass here. -p tells Docker to map port 8888 on the container to port 8888 on our PC, and -v mounts our local work directory (which is where we cloned the source code from GitHub) into the volume on the container:
C:\11519>docker run -p 8888:8888 -v C:/11519/:/src keras
Press Enter, and suddenly you'll be presented with a token that we're going to use to test logging in to the IPython container with our web browser:
Now, if you have a GPU on a Linux-based machine, there is a separate Docker file in the gpu folder that you can use to build a Docker container with accelerated GPU support. So, as you can see here, we're just building that Docker container and calling it keras-gpu:
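Assuming the GPU Dockerfile lives in a folder named gpu inside the cloned repository, the build command looks something like this:

sudo docker build -t keras-gpu gpu/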
It takes a little while to build the container. There's really nothing important to notice in the output; you just need to make sure that the container was actually built successfully at the end:
Now, with the container built, we're going to go ahead and run it. We're going to run it with nvidia-docker, which exposes the GPU device through to your Docker container:
sudo nvidia-docker run -p 8888:8888 -v ~/kerasvideo/:/src keras-gpu
Otherwise, the command-line switches are the same as when we ran the straight Keras container, except that the command is nvidia-docker and the image is keras-gpu. Now, once the container is up and running, you'll get a URL; take this URL and paste it into your browser to access the IPython Notebook being served by the container:
Now, we'll go ahead and make a new IPython Notebook really quick. When it launches, we'll import keras to make sure it loads, which takes a second to come up:
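A minimal check looks like this; if everything is wired up correctly, importing keras prints a message naming the TensorFlow backend:

# Importing keras prints 'Using TensorFlow backend.' when the
# TensorFlow backend is configured and working
import keras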
Then, we'll use the following code that uses TensorFlow in order to detect GPU support:
from tensorflow.python.client import device_lib

# List every compute device TensorFlow can see, including any GPUs
print(device_lib.list_local_devices())
So, we'll be running the preceding bit of code in order to see the libraries and devices:
Now, we can see that we have a GPU available.
Flipping over to our web browser, go ahead and paste that URL and go:
Oops! It can't be reached because 0.0.0.0 is not a real computer; we'll switch that to localhost, hit Enter, and sure enough we have an IPython Notebook:
We'll go ahead and create a new Python 3 Notebook, and give it a quick test by seeing if we can import the keras library and make sure everything's okay.
Looks like we're all set. Our TensorFlow backend is good to go!
This is the environment that we'll be running throughout this book: a Docker container fully prepared and ready to go, so that all you need to do is start it and work with the Keras and IPython Notebooks hosted inside, giving you an easy, repeatable environment every time.
Summary
In this chapter, we had a look at how to install Docker, including acquiring it from https://www.docker.com/, setting up a machine learning Docker file, sharing data back and forth with your host computer, and finally, running a REST service to provide the environment we'll be using throughout this book.
In the next chapter, we're going to dive in and start looking at actual data, beginning with understanding how to take image data and prepare it for use in machine learning models.