TensorFlow is a new machine learning and graph computation library recently released by Google. Its Python interface lets you design common models elegantly, while its compiled backend ensures speed.

Let's take a glimpse at the techniques you'll learn and the models you'll build as you apply TensorFlow.

In this section, you will learn what TensorFlow is, how to install it, and how to build simple models and do simple computations. Further, you will learn how to build a logistic regression model for classification, and we'll introduce a machine learning problem to help us learn TensorFlow.

We're going to learn what kind of library TensorFlow is and install it on our own Linux machine, or a free instance of CoCalc if you don't have access to a Linux machine.

First, what is TensorFlow? TensorFlow is a new machine learning library put out by Google. It is designed to be very easy to use and is very fast. If you go to the TensorFlow website, tensorflow.org, you will have access to a wealth of information about what TensorFlow is and how to use it. We'll be referring to this often, particularly the documentation.

Before we get started with TensorFlow, note that you need to install it, as it probably doesn't come preinstalled on your operating system. So, if you go to the **Install** tab on the TensorFlow web page, click on **Installing TensorFlow on Ubuntu**, and then click on **"native" pip**, you will learn how to install TensorFlow.

Installing TensorFlow can be very challenging, even for experienced system administrators, so I highly recommend using something like the `pip` installation; alternatively, if you're familiar with Docker, use the Docker installation. You can install TensorFlow from source, but this can be very difficult. We will install TensorFlow using a precompiled binary called a **wheel file**, which you can install with Python's `pip` module installer.

For the `pip` installation, you have the option of using either a Python 2 or Python 3 version. Also, you can choose between the CPU and GPU versions. If your computer has a powerful graphics card, the GPU version may be for you.

However, you need to check that your graphics card is compatible with TensorFlow. If it's not, it's fine; everything in this series can be done with just the CPU version.

### Note

We can install TensorFlow by using the `pip install tensorflow` command (choosing the package that matches your CPU or GPU support and `pip` version), as shown in the preceding screenshot.

So, if you copy the following line for TensorFlow, you can install it as well:

```bash
# Python 3.4 installation
sudo pip3 install --upgrade \
 https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.2.1-cp34-cp34m-linux_x86_64.whl
```

If you don't have Python 3.4, which is what the wheel file calls for, that's okay. You can probably still use the same wheel file. Let's take a look at how to do this for Python 3.5. First, download the wheel file directly, either by putting the following URL in your browser or by using a command-line program such as `wget`, as we're doing here:

**wget https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.2.1-cp34-cp34m-linux_x86_64.whl**

The download should complete very quickly.

Now all you need to do is change the name of the file from `cp34`, which stands for Python 3.4, to whichever version of Python 3 you're using. In this case, we'll change it to Python 3.5, so we'll change the `4` to a `5`:

**mv tensorflow-1.2.1-cp34-cp34m-linux_x86_64.whl tensorflow-1.2.1-cp35-cp35m-linux_x86_64.whl**

Now you can install TensorFlow for Python 3.5 by simply changing the installation line to `pip3 install` followed by the name of the renamed wheel file:

**sudo pip3 install ./tensorflow-1.2.1-cp35-cp35m-linux_x86_64.whl**

We can see this works just fine. Now you've installed TensorFlow.
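
Before moving on, it's worth confirming that the installation actually works. Here's a minimal check (the exact version string will depend on the wheel you installed):

```python
# Quick sanity check that TensorFlow imports and reports its version
import tensorflow as tf
print(tf.__version__)   # e.g. 1.2.1 for the wheel used above
```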

If your installation somehow becomes corrupted later, you can always jump back to this segment to remind yourself about the steps involved in the installation.

If you don't have administrative or installation rights on your computer but still want to try TensorFlow, you can try running TensorFlow over the web in a CoCalc instance. If you go to https://cocalc.com/ and create a new account, you can create a new project. This will give you a sort of a virtual machine that you can play around with. Conveniently, TensorFlow is already installed in the Anaconda 3 kernel.

Let's create a new project called `TensorFlow`. Click on **+Create new project…**, enter a title for your project, and click on **Create Project**. Now we can go into our project by clicking on the title; it will take a couple of seconds to load.

Click on **+New** to create a new file. Here, we'll create a Jupyter notebook:

Jupyter is a convenient way to interact with IPython and is the primary means of using CoCalc for these computations. It may take a few seconds to load.

When you get to the interface shown in the following screenshot, the first thing you need to do is change the kernel to Anaconda Python 3 by going to **Kernel** | **Change kernel…** | **Python 3 (Anaconda)**:

This will give you the proper dependencies to use TensorFlow. It may take a few seconds for the kernel to change. Once you are connected to the new kernel, you can type `import tensorflow` in the cell and go to **Cell** | **Run Cells** to check whether it works:

If your Jupyter notebook takes a long time to load, you can instead create a Terminal in CoCalc using the button shown in the following screenshot:

Once there, type `anaconda3` to switch environments, then type `ipython3` to launch an interactive Python session, as shown in the following screenshot:

You can easily work here, although you won't be able to visualize the output. Type `import tensorflow` in the Terminal and off you go.

So far in this section, you've learned what TensorFlow is and how to install it, either locally or on a virtual machine on the web. Now we're ready to explore simple computations in TensorFlow.

First, we're going to take a look at the tensor object type. Then we'll see how TensorFlow defines computations as a graph. Finally, we'll run graphs with sessions and show how to substitute intermediate values.

The first thing you need to do is download the source code pack for this book and open the `simple.py` file. You can either use this file to copy and paste lines into TensorFlow or CoCalc, or type them in directly yourself. First, let's import `tensorflow` as `tf`; this is a convenient way to refer to it in Python. You'll want to hold your constant numbers in `tf.constant` calls. For example, let's do `a = tf.constant(1)` and `b = tf.constant(2)`:

```python
import tensorflow as tf

# You can create constants in TF to hold specific values
a = tf.constant(1)
b = tf.constant(2)
```

Of course, you can add and multiply these to get other values, namely `c` and `d`:

```python
# Of course you can add, multiply, and compute on these as you like
c = a + b
d = a * b
```

TensorFlow numbers are stored in **tensors**, a fancy term for multidimensional arrays. If you pass a Python list to TensorFlow, it does the right thing and converts it into an appropriately dimensioned tensor. You can see this illustrated in the following code:

```python
# TF numbers are stored in "tensors", a fancy term for
# multidimensional arrays. If you pass TF a Python list, it can convert it
V1 = tf.constant([1., 2.])              # Vector, 1-dimensional
V2 = tf.constant([3., 4.])              # Vector, 1-dimensional
M = tf.constant([[1., 2.]])             # Matrix, 2d
N = tf.constant([[1., 2.], [3., 4.]])   # Matrix, 2d
K = tf.constant([[[1., 2.], [3., 4.]]]) # Tensor, 3d+
```

The `V1` vector, a one-dimensional tensor, is passed as the Python list `[1., 2.]`. The dots here just force Python to store the numbers as decimal values rather than integers. The `V2` vector is another Python list, `[3., 4.]`. The `M` variable is a two-dimensional matrix made from a list of lists in Python, creating a two-dimensional tensor in TensorFlow. The `N` variable is also a two-dimensional matrix; note that this one actually has multiple rows in it. Finally, `K` is a true tensor, containing three dimensions. Note that the final dimension contains just one entry, a single two-by-two box.

Don't worry if this terminology is a bit confusing. Whenever you see a strange new variable, you can jump back to this point to understand what it might be.
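
If you ever lose track of a tensor's dimensionality, you can also ask TensorFlow directly. Here's a quick sketch using the tensors defined previously:

```python
# get_shape() reports a tensor's dimensions without running anything
print(V1.get_shape())   # (2,)       -- 1D vector
print(M.get_shape())    # (1, 2)     -- 2D matrix with a single row
print(N.get_shape())    # (2, 2)     -- 2D matrix with two rows
print(K.get_shape())    # (1, 2, 2)  -- 3D tensor
```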

You can also do simple things, such as add tensors together:

```python
V3 = V1 + V2
```

Alternatively, you can multiply them element-wise, so each corresponding position is multiplied together:

```python
# Operations are element-wise by default
M2 = M * M
```

For true matrix multiplication, however, you need to use `tf.matmul`, passing in your two tensors as arguments:

```python
NN = tf.matmul(N, N)
```
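
To see the difference concretely, here's what the two operations produce for our `N` matrix, worked out by hand:

```python
# For N = [[1., 2.], [3., 4.]]:
# N * N (element-wise)  -> [[ 1.,  4.], [ 9., 16.]]
# tf.matmul(N, N)       -> [[ 7., 10.], [15., 22.]]
```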

Everything so far has just specified the TensorFlow graph; we haven't yet computed anything. To do this, we need to start a session in which the computations will take place. The following code creates a new session:

```python
sess = tf.Session()
```

Once you have a session open, `sess.run(NN)` will evaluate the given expression and return an array. We can easily save this to a variable by doing the following:

```python
output = sess.run(NN)
print("NN is:")
print(output)
```

If you run this cell now, you should see the correct tensor array for `NN` in the output on the screen:

When you're done using your session, it's good to close it, just like you would close a file handle:

```python
# Remember to close your session when you're done using it
sess.close()
```

For interactive work, we can use `tf.InteractiveSession()` like so:

```python
sess = tf.InteractiveSession()
```

You can then easily compute the value of any node. For example, entering the following code and running the cell will output the value of `M2`:

```python
# Now we can compute any node
print("M2 is:")
print(M2.eval())
```

Of course, not all our numbers are constant. To update the weights in a neural network, for example, we need to use `tf.Variable` to create the appropriate object:

```python
W = tf.Variable(0, name="weight")
```

Note that variables in TensorFlow are not initialized automatically. To do so, we need to use a special call, namely `tf.global_variables_initializer()`, and then run that operation with `sess.run()`:

```python
init_op = tf.global_variables_initializer()
sess.run(init_op)
```

This puts a value into the variable; in this case, it stuffs a `0` value into the `W` variable. Let's verify that `W` has that value:

```python
print("W is:")
print(W.eval())
```

You should see an output value of `0` for `W` in your cell:

Let's see what happens when you add `a` to it:

```python
W += a
print("W after adding a:")
print(W.eval())
```

Recall that `a` is `1`, so you get the expected value of `1` here:

Let's add `a` again, just to make sure we can increment and that it's truly a variable:

```python
W += a
print("W after adding a:")
print(W.eval())
```

Now you should see that `W` is holding `2`, as we have incremented it twice with `a`:

You can return or supply arbitrary nodes when doing a TensorFlow computation. Let's define a new node while also returning another node at the same time in a fetch call. First, let's define our new node, `E`, as shown here:

```python
E = d + b # 1*2 + 2 = 4
```

Let's take a look at what `E` starts as:

```python
print("E as defined:")
print(E.eval())
```

You should see that, as expected, `E` equals `4`. Now let's see how we can pass in multiple nodes, `E` and `d`, to return multiple values from a single `sess.run` call:

```python
# Let's see what d was at the same time
print("E and d:")
print(sess.run([E, d]))
```

You should see multiple values, namely `4` and `2`, returned in your output:

Now suppose we want to use a different intermediate value, say for debugging purposes. We can use `feed_dict` to supply a custom value for any node in our computation when returning a value. Let's do that now, with `d` equal to `4` instead of `2`:

```python
# Use a custom d by specifying a dictionary
print("E with custom d=4:")
print(sess.run(E, feed_dict={d: 4.}))
```

Remember that `E` equals `d + b`, and the values of `d` and `b` are both `2`. Although we've inserted a new value of `4` for `d`, you should see that the value of `E` is now output as `6`:

You have now learned how to do core computations with TensorFlow tensors. It's time to take the next step forward by building a logistic regression model.

Okay, let's get started with building a real machine learning model. First, we'll see the proposed machine learning problem: font classification. Then, we'll review a simple algorithm for classification, called **logistic regression**. Finally, we'll implement logistic regression in TensorFlow.

Before we jump in, let's load all the necessary modules:

```python
import tensorflow as tf
import numpy as np
```

If you're copying and pasting into IPython, make sure your `autoindent` property is set to `OFF`:

```python
%autoindent
```

The `tqdm` module is optional; it just shows nice progress bars:

```python
try:
    from tqdm import tqdm
except ImportError:
    # Fall back to a pass-through if tqdm isn't installed
    def tqdm(x, *args, **kwargs):
        return x
```

Next, we'll set a seed of `0`, just to get consistent data splitting from run to run:

```python
# Set random seed
np.random.seed(0)
```

In this book, we've provided a dataset of images of characters in five fonts. For convenience, these are stored in a compressed NumPy file (`data_with_labels.npz`), which can be found in the download package for this book. You can easily load them into Python with `numpy.load`:

```python
# Load data
data = np.load('data_with_labels.npz')
train = data['arr_0']/255.
labels = data['arr_1']
```

The `train` variable here holds the actual pixel values, scaled from 0 to 1, and `labels` holds the type of font; it will be 0, 1, 2, 3, or 4, as there are five fonts in total. You can print out these values and look at them using the following code:

```python
# Look at some data
print(train[0])
print(labels[0])
```

However, that's not very instructive, as most of the values are zeroes and only the central part of the image contains the data:
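
A quicker way to get oriented is to check the array shapes; the exact number of images depends on the dataset file, so treat the first dimension here as illustrative:

```python
# Each image is 36x36 pixels; labels holds one integer per image
print(train.shape)   # e.g. (N, 36, 36)
print(labels.shape)  # e.g. (N,)
```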

If you have Matplotlib installed, now is a good place to import it. We'll use `plt.ion()` to automatically bring up figures when needed:

```python
# If you have matplotlib installed
import matplotlib.pyplot as plt
plt.ion()
```

Here are some example images of characters from each font:

Yeah, they're pretty flashy. In the dataset, each image is represented as a 36 x 36 two-dimensional matrix of pixel darkness values. The 0 value represents a white pixel, while 255 represents a black pixel. Everything in between is a shade of gray. Here's the code to display these fonts on your own machine:

```python
# Let's look at a subplot of the letter A in each font
f, plts = plt.subplots(5, sharex=True)
c = 91
for i in range(5):
    plts[i].pcolor(train[c + i * 558], cmap=plt.cm.gray_r)
```

If your plot appears really wide, you can easily resize the window just using your mouse. It's often much more work to resize it ahead of time in Python if you're simply plotting interactively. Our goal is to decide which font an image belongs to, given that we have many other labeled images of the fonts. To expand the dataset and help avoid overfitting, we have also *jittered* each character around in the 36 x 36 area, giving us nine times as many data points.
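
The jittering was done ahead of time when the dataset was built, but if you're curious, here is a minimal sketch of the idea using a hypothetical helper built on `np.roll`; the actual preprocessing may have differed in its handling of edges:

```python
def shift_image(img, dx, dy):
    # Hypothetical jitter helper: shift a 36x36 image by (dx, dy) pixels,
    # wrapping values around at the edges
    return np.roll(np.roll(img, dy, axis=0), dx, axis=1)

# The original plus eight one-pixel shifts gives nine times the data
shifts = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)]
jittered = [shift_image(train[0], dx, dy) for dx, dy in shifts]
```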

It may be helpful to come back to this after working with later models. It's important to keep the original data in mind, no matter how advanced the final model is.

If you're familiar with linear regression, you're halfway toward understanding logistic regression. Basically, we're going to assign a weight to each pixel in the image and then take the weighted sum of those pixels (beta for weights and *X* for pixels). This gives us a score for that image being a particular font. Every font will have its own set of weights, as they will value pixels differently. To convert these scores into proper probabilities (represented by *Y*), we will use the `softmax` function, which forces each value to lie between 0 and 1 and all the values to sum to exactly 1, as illustrated next. Whichever class has the greatest probability for a particular image is the one we classify it into.

You can read more about the theory of logistic regression in most statistical modeling textbooks. Here is its formula:
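
$$
Y = \operatorname{softmax}(X\beta), \qquad
\operatorname{softmax}(z)_i = \frac{e^{z_i}}{\sum_{j} e^{z_j}}
$$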

One good reference that focuses on applications is William H. Greene's *Econometric Analysis* (Pearson, 2012).

Implementing logistic regression is pretty easy in TensorFlow and will serve as scaffolding for more complex machine learning algorithms. First, we need to convert our integer labels into a *one-hot* format. This means that, instead of labeling an image with font class 2, we transform the label into [0, 0, 1, 0, 0]; that is, we stick a `1` in position two (note that 0-up counting is common in computer science) and a `0` for every other class. Here's the code for our `to_onehot` function:

```python
def to_onehot(labels, nclasses=5):
    '''
    Convert labels to "one-hot" format.

    >>> a = [0, 1, 2, 3]
    >>> to_onehot(a, 5)
    array([[ 1.,  0.,  0.,  0.,  0.],
           [ 0.,  1.,  0.,  0.,  0.],
           [ 0.,  0.,  1.,  0.,  0.],
           [ 0.,  0.,  0.,  1.,  0.]])
    '''
    outlabels = np.zeros((len(labels), nclasses))
    for i, l in enumerate(labels):
        outlabels[i, l] = 1
    return outlabels
```

With this done, we can go ahead and call the function:

```python
onehot = to_onehot(labels)
```

For the pixels, we don't really want a matrix in this case, so we'll flatten each 36 x 36 image into a one-dimensional vector of length 1,296; this will come a little bit later. Also, recall that we've rescaled the pixel values from 0-255 so that they fall between 0 and 1.
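
As a preview, the flattening is just a NumPy `reshape`; the very same call appears in the training code later:

```python
# Flatten each 36x36 image into a length-1296 vector
flat = train.reshape([-1, 1296])
print(flat.shape)   # (number of images, 1296)
```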

Okay, our final piece of preparation is to split our dataset into training and validation sets. This will help us catch overfitting later on. The training set will help us determine the weights in our logistic regression model, and the validation set will just be used to confirm that those weights are reasonably correct on new data:

```python
# Split data into training and validation
indices = np.random.permutation(train.shape[0])
valid_cnt = int(train.shape[0] * 0.1)
test_idx, training_idx = indices[:valid_cnt], indices[valid_cnt:]
test, train = train[test_idx,:], train[training_idx,:]
onehot_test, onehot_train = onehot[test_idx,:], onehot[training_idx,:]
```

Okay, let's kick off the TensorFlow code by creating an interactive session:

```python
sess = tf.InteractiveSession()
```

With this, we've started our first model in TensorFlow.

We're going to use a placeholder variable for `x`, which represents our input images. This just tells TensorFlow that we will supply the value for this node via `feed_dict` later on:

```python
# These will be inputs
## Input pixels, flattened
x = tf.placeholder("float", [None, 1296])
```

Also, note that we can specify the shape of this tensor; here we have used `None` as one of the sizes. The `None` size allows us to send an arbitrary number of data points into the algorithm at once for batch processing. We'll likewise use the variable `y_` to hold our known labels, to be used for training later on:

```python
## Known labels
y_ = tf.placeholder("float", [None, 5])
```

To perform logistic regression, we need a set of weights (`W`). In fact, we need 1,296 weights for each of the five font classes, which gives us our shape. Note that we also want to include an extra weight for each class as a bias (`b`); this is the same as adding an extra input variable that always takes the value `1`:

```python
# Variables
W = tf.Variable(tf.zeros([1296, 5]))
b = tf.Variable(tf.zeros([5]))
```

With all these TensorFlow variables floating around, we need to make sure they get initialized. Let's do that now:

```python
# Just initialize
sess.run(tf.global_variables_initializer())
```

Good job! You've got everything prepared. Now you can implement the `softmax` formula to compute probabilities. Because we set up our weights and input very carefully, TensorFlow makes this task very easy with just a call to `tf.matmul` and `tf.nn.softmax`:

```python
# Define model
y = tf.nn.softmax(tf.matmul(x, W) + b)
```
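
As a quick sanity check, you can already evaluate the untrained model; since the weights and biases are all zeros, every class should get exactly the same probability:

```python
# With W and b at zero, all logits are zero, so softmax is uniform
probs = y.eval(feed_dict={x: train.reshape([-1, 1296])[:3]})
print(probs)   # each row is [0.2, 0.2, 0.2, 0.2, 0.2]
```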

That's it! You've implemented an entire machine learning classifier in TensorFlow. Nice work. But where do we get the values for the weights? Let's take a look at using TensorFlow to train the model.

First, you'll learn about the loss function for our machine learning classifier and implement it in TensorFlow. Then, we'll quickly train the model by evaluating the right TensorFlow node. Finally, we'll verify that our model is reasonably accurate and the weights make sense.

Optimizing our model really means minimizing how wrong we are. With our labels in *one-hot* style, it's easy to compare them with the class probabilities predicted by the model. The categorical `cross_entropy` function is a formal way to measure this. While the exact statistics are beyond the scope of this course, you can think of it as punishing the model more for less accurate predictions. To compute it, we multiply our *one-hot* real labels element-wise with the natural log of the predicted probabilities, then sum these values and negate them. Conveniently, TensorFlow already includes this function as `tf.nn.softmax_cross_entropy_with_logits()`, and we can just call that:

```python
# Climb on cross-entropy
cross_entropy = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(
        logits=y + 1e-50, labels=y_))
```

Note that we are adding a small value of `1e-50` here to avoid numerical instability problems.
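
To make the formula concrete, here is a minimal NumPy sketch of the same computation for a single five-class prediction; this is purely illustrative, since TensorFlow handles it for us:

```python
# One-hot label and predicted probabilities for a single example
y_true = np.array([0., 0., 1., 0., 0.])
y_pred = np.array([0.1, 0.1, 0.6, 0.1, 0.1])

# Multiply element-wise with the log probabilities, sum, and negate
loss = -np.sum(y_true * np.log(y_pred))
print(loss)   # about 0.51; a confident correct prediction scores lower
```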

TensorFlow is convenient in that it provides built-in optimizers to take advantage of the loss function we just wrote. Gradient descent is a common choice and will slowly nudge our weights toward better results. This is the node that will update our weights:

```python
# How we train
train_step = tf.train.GradientDescentOptimizer(
        0.02).minimize(cross_entropy)
```
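
Under the hood, gradient descent repeatedly nudges each weight in the direction that decreases the loss, with each step scaled by the learning rate (the `0.02` above). Here's a toy one-dimensional sketch of the idea, unrelated to our font model:

```python
# Toy gradient descent: minimize L(w) = (w - 3)**2, whose gradient is 2*(w - 3)
w, learning_rate = 0.0, 0.02
for _ in range(200):
    w -= learning_rate * 2 * (w - 3)
print(w)   # converges toward the minimum at w = 3
```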

Before we actually start training, we should specify a few more nodes to assess how well the model does:

```python
# Define accuracy
correct_prediction = tf.equal(tf.argmax(y, 1),
                              tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(
        correct_prediction, "float"))
```

The `correct_prediction` node is `1` if our model assigns the highest probability to the correct class, and `0` otherwise. The `accuracy` variable averages these predictions over the available data, giving us an overall sense of how well the model did.

When training in machine learning, we often want to use the same data point multiple times to squeeze all the information out. Each pass through the entire training data is called an **epoch**. Here, we're going to save both the training and validation accuracy every 10 epochs:

```python
# Actually train
epochs = 1000
train_acc = np.zeros(epochs//10)
test_acc = np.zeros(epochs//10)
for i in tqdm(range(epochs)):
    # Record summary data, and the accuracy
    if i % 10 == 0:
        # Check accuracy on train set
        A = accuracy.eval(feed_dict={
            x: train.reshape([-1, 1296]),
            y_: onehot_train})
        train_acc[i//10] = A
        # And now the validation set
        A = accuracy.eval(feed_dict={
            x: test.reshape([-1, 1296]),
            y_: onehot_test})
        test_acc[i//10] = A
    train_step.run(feed_dict={
        x: train.reshape([-1, 1296]),
        y_: onehot_train})
```

Note that we use `feed_dict` to pass in different types of data to get different output values. Finally, `train_step.run` updates the model every iteration. This should take only a few minutes on a typical computer, much less if you're using a GPU, and a bit more on an underpowered machine.

You just trained a model with TensorFlow; awesome!

After 1,000 epochs, let's take a look at the model. If you have Matplotlib installed, you can view the accuracies in a graphical plot; if not, you can still look at the numbers. For the final results, use the following code:

```python
# Notice that accuracy flattens out
print(train_acc[-1])
print(test_acc[-1])
```

If you do have Matplotlib installed, you can use the following code to display the plot:

```python
# Plot the accuracy curves
plt.figure(figsize=(6, 6))
plt.plot(train_acc, 'bo')
plt.plot(test_acc, 'rx')
```

You should see something like the following plot (note that we used some random initialization, so it might not be exactly the same):

It seems like the validation accuracy flattens out after about 400-500 iterations; beyond this, our model may either be overfitting or not learning much more. Also, even though the final accuracy of about 40 percent might seem poor, recall that, with five classes, a totally random guess would only have 20 percent accuracy. With this limited dataset, the simple model is doing all it can.

It's also often helpful to look at computed weights. These can give you a clue as to what the model thinks is important. Let's plot them by pixel position for a given class:

```python
# Look at a subplot of the weights for each font
f, plts = plt.subplots(5, sharex=True)
for i in range(5):
    plts[i].pcolor(W.eval()[:,i].reshape([36, 36]))
```

This should give you a result similar to the following (again, if the plot comes out very wide, you can squeeze in the window size to square it up):

We can see that the weights near the interior are important in some models, while the weights on the outside are essentially zero. This makes sense, since none of the font characters reach the corners of the images.

Again, note that your final results might look a little different due to random initialization effects. Always feel free to experiment and change the parameters of the model; that's how you'll learn new things.

In this chapter, we installed TensorFlow on a machine we can use. After some small steps with basic computations, we jumped into a machine learning problem, successfully building a decent model with just logistic regression and a few lines of TensorFlow code.

In the next chapter, we'll see TensorFlow in its prime with deep neural networks.