Machine Learning Using TensorFlow Cookbook

By Alexia Audevart, Konrad Banachewicz, Luca Massaron

About this book

The independent recipes in Machine Learning Using TensorFlow Cookbook will teach you how to perform complex data computations and gain valuable insights into your data. You will work through recipes on training models, model evaluation, sentiment analysis, regression analysis, artificial neural networks, and deep learning - each using Google’s machine learning library, TensorFlow.

This cookbook begins by introducing you to the fundamentals of the TensorFlow library, including variables, matrices, and various data sources. You'll then take a deep dive into some real-world implementations of Keras and TensorFlow and learn how to use estimators to train linear models and boosted trees, both for classification and for regression, to provide a baseline for tabular data problems.

As you progress, you’ll explore the practical applications of a variety of deep learning architectures, such as recurrent neural networks and Transformers, and see how they can be applied to computer vision and natural language processing (NLP) problems. Once you are familiar with the TensorFlow ecosystem, the final chapter will teach you how to take a project to production.

By the end of this machine learning book, you will be proficient in using TensorFlow 2. You’ll also understand deep learning from the fundamentals and be able to implement machine learning algorithms in real-world scenarios.

Publication date:
February 2021


Getting Started with TensorFlow 2.x

Google's TensorFlow engine has a unique way of solving problems, allowing us to solve machine learning problems very efficiently. Nowadays, machine learning is used in almost all areas of life and work, with famous applications in computer vision, speech recognition, language translation, healthcare, and many more. We will cover the basic steps to understand how TensorFlow operates and eventually build up to production code techniques later in this book. For the moment, the fundamentals presented in this chapter are paramount in order to provide you with a core understanding for the recipes found in the rest of this book.

In this chapter, we'll start by covering some basic recipes and helping you to understand how TensorFlow 2.x works. You'll also learn how to access the data used to run the examples in this book, and how to get additional resources. By the end of this chapter, you should have knowledge of the following:

  • Understanding how TensorFlow 2.x works
  • Declaring and using variables and tensors
  • Working with matrices
  • Declaring operations
  • Implementing activation functions
  • Working with data sources
  • Finding additional resources

Without any further ado, let's begin with the first recipe, which presents in an easy fashion the way TensorFlow deals with data and computations.


How TensorFlow works

TensorFlow started as an internal project by researchers and engineers from the Google Brain team, initially named DistBelief, and was released in November 2015 as an open source framework for high-performance numerical computations (tensors are a generalization of scalars, vectors, and matrices to higher dimensions). The original paper on the project is available online. After version 1.0 appeared in 2017, Google released TensorFlow 2.0 in 2019, which continues the development and improvement of TensorFlow by making it more user-friendly and accessible.

Production-oriented and capable of handling different computational architectures (CPUs, GPUs, and now TPUs), TensorFlow is a framework for any kind of computation that requires high performance and easy distribution. It excels at deep learning, making it possible to create everything from shallow networks (neural networks made of a few layers) to complex deep networks for image recognition and natural language processing.

In this book, we're going to present a series of recipes that will help you use TensorFlow for your deep learning projects in a more efficient way, cutting through complexities and helping you achieve both a wider scope of applications and much better results.

At first, computation in TensorFlow may seem needlessly complicated. But there is a reason for it: because of how TensorFlow deals with computation, when you become accustomed to TensorFlow style, developing more complicated algorithms becomes relatively easy. This recipe will guide us through the pseudocode of a TensorFlow algorithm.

Getting ready

Currently, TensorFlow is tested and supported on the following 64-bit systems: Ubuntu 16.04 or later, macOS 10.12.6 (Sierra) or later (no GPU support, though), Raspbian 9.0 or later, and Windows 7 or later. The code for this book has been developed and tested on an Ubuntu system, but it should run fine on any other system as well. The code for the book is available on GitHub, in the book repository, which holds all the code and some data.

Throughout this book, we'll only concern ourselves with the Python library wrapper of TensorFlow, although most of the original core code for TensorFlow is written in C++. TensorFlow works well with Python versions 3.7 to 3.8. This book uses Python 3.7 and TensorFlow 2.2.0. The plain Python interpreter and the TensorFlow installation instructions are both available from the projects' official websites.

TensorFlow can run on the CPU, but most algorithms run faster on a GPU, which is preferable when running complex networks that are more computationally intensive. GPU support requires Nvidia graphics cards with Compute Capability 3.5 or higher.
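If you are unsure whether TensorFlow can see a GPU on your machine, the following quick check may help. It is a minimal sketch that assumes TensorFlow 2.1 or later, where tf.config.list_physical_devices() is available. On a machine without a supported GPU, the GPU list is simply empty.

```python
import tensorflow as tf

# List the physical devices TensorFlow can use (available since TF 2.1).
cpus = tf.config.list_physical_devices("CPU")
gpus = tf.config.list_physical_devices("GPU")

print(f"TensorFlow {tf.__version__}: {len(cpus)} CPU(s), {len(gpus)} GPU(s)")
for gpu in gpus:
    # Each entry is a PhysicalDevice with a name and a device_type.
    print("Found GPU:", gpu.name)
```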

All the recipes you'll find in the book are compatible with TensorFlow 2.2.0. Where necessary, we'll point out the differences in syntax and execution with the previous 2.1 and 2.0 versions.

Popular GPUs for running TensorFlow scripts on a workstation are the Nvidia Titan RTX and Nvidia Quadro RTX models, whereas in data centers, we commonly find Nvidia Tesla architectures with at least 24 GB of memory (for instance, Google Cloud Platform offers Nvidia Tesla K80, P4, T4, P100, and V100 GPUs). To run properly on a GPU, you will also need to download and install the Nvidia CUDA toolkit, version 5.x+.

Some of the recipes in this chapter will rely on an installation of the current versions of the SciPy, NumPy, and scikit-learn Python packages. These accompanying packages are also included in the Anaconda distribution.

How to do it…

Here, we'll introduce the general flow of TensorFlow algorithms. Most recipes will follow this outline:

  1. Import or generate datasets: All of our machine learning algorithms will depend on datasets. In this book, we'll either generate data or use an outside source of datasets. Sometimes, it's better to rely on generated data because we can control how to vary and verify the expected outcome. Most of the time, we will access public datasets for the given recipe. The details on accessing these datasets can be found in the Additional resources recipe at the end of this chapter:
    import tensorflow as tf
    import tensorflow_datasets as tfds
    import numpy as np
    data = tfds.load("iris", split="train")
  2. Transform and normalize data: Generally, input datasets do not come in the exact form our algorithms expect. The data is usually not in the correct dimension or type, so we have to transform it into the accepted shape and data type before we can use it. Most algorithms also expect normalized data (variables whose mean is zero and whose standard deviation is one), and we will look at how to accomplish this here as well. TensorFlow offers built-in functions that can load your data and split it into batches, and you can then transform variables and normalize each batch using simple NumPy functions, as follows:
    for batch in data.batch(batch_size, drop_remainder=True):
        labels = tf.one_hot(batch['label'], 3)
        X = batch['features']
        X = (X - np.mean(X)) / np.std(X) 
  3. Partition the dataset into training, test, and validation sets: We generally want to test our algorithms on sets different from the ones we trained them on. Many algorithms also require hyperparameter tuning, so we set aside a validation set for determining the best set of hyperparameters.
  4. Set algorithm parameters (hyperparameters): Our algorithms usually have a set of parameters that we hold constant throughout the procedure. For example, this could be the number of iterations, the learning rate, or other fixed parameters of our choice. It's considered good practice to initialize these together using global variables, so that the reader or user can easily find them, as follows:
    epochs = 1000 
    batch_size = 32
    input_size = 4
    output_size = 3
    learning_rate = 0.001
  5. Initialize variables: TensorFlow depends on knowing what it can and cannot modify. During optimization, TensorFlow modifies the variables (model weights and biases) to minimize a loss function, while the data is fed in as input tensors. We need to initialize the variables with a size and a type so that TensorFlow knows what to expect. For most of this book, we will use float32; TensorFlow also provides the float64 and float16 data types. Using more bytes for precision makes algorithms slower, while using fewer bytes makes the results less precise. Refer to the following code for a simple example of how to set up an array of weights and a vector of biases in TensorFlow:
    weights = tf.Variable(tf.random.normal(shape=(input_size, output_size),
                                           dtype=tf.float32))
    biases  = tf.Variable(tf.random.normal(shape=(output_size,),
                                           dtype=tf.float32))
  6. Define the model structure: After we have the data, and have initialized our variables, we have to define the model. This is done by building a computational graph. The model for this example will be a logistic regression model (logit E(Y) = bX + a):
    logits = tf.add(tf.matmul(X, weights), biases) 
  7. Declare the loss functions: After defining the model, we must be able to evaluate the output. This is where we declare the loss function. The loss function is very important as it tells us how far off our predictions are from the actual values. The different types of loss function are explored in greater detail in the Implementing Backpropagation recipe in Chapter 2, The TensorFlow Way. Here, as an example, we implement the cross entropy with logits, which computes softmax cross entropy between logits and labels:
    loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels, logits)) 
  8. Initialize and train the model: Now that we have everything in place, we need to feed in the data and let TensorFlow change the variables to predict our training data better. Here is one way to do it: over multiple iterations across batches of data, we converge the weights in the model structure using the SGD optimizer:
    optimizer = tf.optimizers.SGD(learning_rate)
    for epoch in range(epochs):
        for batch in data.batch(batch_size, drop_remainder=True):
            labels = tf.one_hot(batch['label'], 3)
            X = batch['features']
            X = (X - np.mean(X)) / np.std(X)
            with tf.GradientTape() as tape:
                logits = tf.add(tf.matmul(X, weights), biases)
                loss = tf.reduce_mean(
                    tf.nn.softmax_cross_entropy_with_logits(labels, logits))
            gradients = tape.gradient(loss, [weights, biases])
            optimizer.apply_gradients(zip(gradients, [weights, biases]))
  9. Evaluate the model: Once we've built and trained the model, we should evaluate it by looking at how well it does with new data, using some specified criteria. We evaluate on the training and test sets, and these evaluations will allow us to see whether the model is underfitting or overfitting. We will address this in later recipes. In this simple example, we evaluate the final loss and compare the fitted values against the ground truth training ones:
    print(f"final loss is: {loss.numpy():.3f}")
    preds = tf.math.argmax(tf.add(tf.matmul(X, weights), biases), axis=1)
    ground_truth = tf.math.argmax(labels, axis=1)
    for y_true, y_pred in zip(ground_truth.numpy(), preds.numpy()):
        print(f"real label: {y_true} fitted: {y_pred}")
  10. Tune hyperparameters: Most of the time, we will want to go back and change some of the hyperparameters, checking the model's performance based on our tests. We then repeat the previous steps with different hyperparameters and evaluate the model on the validation set.
  11. Deploy/predict new outcomes: It is also a key requirement to know how to make predictions on new and unseen data. Once our models are trained, TensorFlow makes this easy to achieve.
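The following is a minimal, self-contained sketch that strings steps 1 to 9 together. To avoid depending on a download, it assumes a synthetic, iris-like dataset generated with NumPy in place of tfds.load("iris"). The 70/15/15 split, the learning rate of 0.1, the 100 epochs, and the helper names make_split(), model(), loss_fn(), and accuracy() are illustrative choices, not the book's. Two further departures from the steps above: each feature is normalized with training-set statistics, a common variation on the per-batch normalization in step 2, and the SGD update is applied by hand with assign_sub(). For plain SGD without momentum, that manual update should be equivalent to the optimizer.apply_gradients() call in step 8.

```python
import numpy as np
import tensorflow as tf

tf.random.set_seed(42)  # reproducible weight initialization

# 4. Hyperparameters (illustrative values, not tuned)
epochs = 100
batch_size = 32
input_size = 4
output_size = 3
learning_rate = 0.1

# 1. Synthetic, iris-like data (an assumption): three Gaussian blobs in 4-D
rng = np.random.default_rng(seed=42)
n_samples = 300
centers = rng.normal(scale=3.0, size=(output_size, input_size))
y_all = rng.integers(0, output_size, size=n_samples)
X_all = (centers[y_all] + rng.normal(size=(n_samples, input_size))).astype(np.float32)

# 3. Partition into training (70%), validation (15%), and test (15%) sets
idx = rng.permutation(n_samples)
train_idx, val_idx, test_idx = np.split(idx, [210, 255])

# 2. Normalize every split with statistics from the training set only
mean = X_all[train_idx].mean(axis=0)
std = X_all[train_idx].std(axis=0)

def make_split(indices):
    X = tf.constant((X_all[indices] - mean) / std)
    labels = tf.one_hot(y_all[indices], output_size)
    return X, labels

X_train, y_train = make_split(train_idx)
X_val, y_val = make_split(val_idx)
X_test, y_test = make_split(test_idx)

# 5. Trainable float32 variables for weights and biases
weights = tf.Variable(tf.random.normal(shape=(input_size, output_size), dtype=tf.float32))
biases = tf.Variable(tf.random.normal(shape=(output_size,), dtype=tf.float32))

# 6. Model: softmax (multinomial logistic) regression producing logits
def model(X):
    return tf.add(tf.matmul(X, weights), biases)

# 7. Loss: mean softmax cross entropy between labels and logits
def loss_fn(X, labels):
    return tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=model(X)))

def accuracy(X, labels):
    preds = tf.math.argmax(model(X), axis=1)
    truth = tf.math.argmax(labels, axis=1)
    return float(tf.reduce_mean(tf.cast(tf.equal(preds, truth), tf.float32)))

# 8. Train with mini-batch gradient descent and a gradient tape
train_ds = tf.data.Dataset.from_tensor_slices((X_train, y_train))
initial_loss = float(loss_fn(X_train, y_train))
for epoch in range(epochs):
    for X, labels in train_ds.shuffle(len(train_idx), seed=epoch).batch(batch_size):
        with tf.GradientTape() as tape:
            loss = loss_fn(X, labels)
        gradients = tape.gradient(loss, [weights, biases])
        for var, grad in zip([weights, biases], gradients):
            var.assign_sub(learning_rate * grad)  # plain SGD step
final_loss = float(loss_fn(X_train, y_train))

# 9. Evaluate on every split (step 10 would tune against the validation score)
print(f"training loss: {initial_loss:.3f} -> {final_loss:.3f}")
splits = {"train": (X_train, y_train), "validation": (X_val, y_val),
          "test": (X_test, y_test)}
for name, (X, labels) in splits.items():
    print(f"{name} accuracy: {accuracy(X, labels):.2f}")
```

Keeping the validation set out of training is what makes step 10 possible: you can rerun the loop with different hyperparameters and compare validation accuracy before you look at the test set.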

How it works…

In TensorFlow, we have to set up the data, input variables, and model structure before we can tell the program to train and tune its weights to improve predictions. TensorFlow accomplishes this through computational graphs. These are directed acyclic graphs (they contain no cycles), which allows for computational parallelism.
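To see the graph that still exists behind TensorFlow 2.x code, you can trace a Python function with tf.function and inspect the operations in the resulting graph. This is a small sketch. The affine() function and its input shapes are made up for illustration. The exact list of operation names may vary between TensorFlow versions, although it should contain a MatMul node.

```python
import tensorflow as tf

@tf.function  # traces the Python function into a TensorFlow graph
def affine(x, w, b):
    return tf.matmul(x, w) + b

# Trace for float32 inputs: any number of rows, 2 features, 3 outputs.
concrete = affine.get_concrete_function(
    tf.TensorSpec(shape=[None, 2], dtype=tf.float32),
    tf.TensorSpec(shape=[2, 3], dtype=tf.float32),
    tf.TensorSpec(shape=[3], dtype=tf.float32))
op_types = [op.type for op in concrete.graph.get_operations()]
print(op_types)  # should include 'MatMul' among input and add nodes

# Running the graph gives the same values as running the ops eagerly.
x, w, b = tf.ones([4, 2]), tf.ones([2, 3]), tf.zeros([3])
graph_result = affine(x, w, b)
eager_result = tf.matmul(x, w) + b
```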

To train the model, we need to create a loss function for TensorFlow to minimize. TensorFlow accomplishes this by modifying the variables in the computational graph. TensorFlow knows how to modify the variables because it keeps track of the computations in the model and automatically computes the variable gradients (how to change each variable) to minimize the loss. This is what makes it easy to change the model and try different data sources.
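The automatic gradient computation can be seen on a toy problem. The following sketch uses a one-parameter loss, (w - 3)^2, chosen only for illustration. It computes the gradient with tf.GradientTape and applies plain gradient-descent updates by hand, so w should move toward 3.

```python
import tensorflow as tf

# A single trainable variable and a toy loss: loss(w) = (w - 3)^2
w = tf.Variable(0.0)
learning_rate = 0.1

with tf.GradientTape() as tape:
    loss = (w - 3.0) ** 2          # the tape records these operations
first_grad = tape.gradient(loss, w)  # d(loss)/dw = 2 * (w - 3) = -6 at w = 0

for step in range(50):
    with tf.GradientTape() as tape:
        loss = (w - 3.0) ** 2
    grad = tape.gradient(loss, w)
    w.assign_sub(learning_rate * grad)  # plain gradient-descent step

print(first_grad.numpy(), w.numpy())  # -6.0 and a value close to 3.0
```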


Declaring variables and tensors

Tensors are the primary data structure that TensorFlow uses to operate on the computational graph. Although this aspect is hidden in TensorFlow 2.x, the data flow graph is still operating behind the scenes. This means that the logic of building a neural network doesn't change all that much between TensorFlow 1.x and TensorFlow 2.x. The most eye-catching difference is that you no longer have to deal with placeholders, the entry gates for data in a TensorFlow 1.x graph.

Now, you simply declare tensors as variables and proceed to build your graph.

A tensor is a mathematical term that refers to generalized vectors or matrices. If vectors are one-dimensional and matrices are two-dimensional, a tensor is n-dimensional (where n could be 1, 2, or even larger).
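As a quick illustration of the n in n-dimensional, the following sketch creates tensors of rank 0 to 3 and prints their rank (number of dimensions) and shape. The values are arbitrary.

```python
import tensorflow as tf

scalar = tf.constant(7.0)                       # rank 0: a single number
vector = tf.constant([1.0, 2.0, 3.0])           # rank 1
matrix = tf.constant([[1.0, 2.0], [3.0, 4.0]])  # rank 2
cube = tf.zeros([2, 3, 4])                      # rank 3

for name, t in [("scalar", scalar), ("vector", vector),
                ("matrix", matrix), ("cube", cube)]:
    print(f"{name}: rank={int(tf.rank(t))}, shape={t.shape}")
```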

We can declare these tensors as variables and use them for our computations. To do this, first, we must learn how to create tensors.

Getting ready

When we create a tensor and declare it as a variable, TensorFlow 2.x evaluates it eagerly: the tensor is created, and the variable is initialized, as soon as the line declaring it runs. Graph structures are still built behind the scenes when code is traced into a computational graph (for example, by tf.function), but there is no longer a separate step to initialize variables, as there was in TensorFlow 1.x.

How to do it…

Here, we will cover the four main ways in which we can create tensors in TensorFlow.

We will not be unnecessarily exhaustive in this recipe or others. We will tend to illustrate only the mandatory parameters of the different API calls, unless an optional parameter is worth covering for the recipe; when that happens, we'll justify the reasoning behind it.

  1. Fixed size tensors:
    • In the following code, we create a zero-filled tensor:
    row_dim, col_dim = 3, 3
    zero_tsr = tf.zeros(shape=[row_dim, col_dim], dtype=tf.float32) 
    • In the following code, we create a one-filled tensor:
    ones_tsr = tf.ones([row_dim, col_dim]) 
    • In the following code, we create a constant-filled tensor:
    filled_tsr = tf.fill([row_dim, col_dim], 42) 
    • In the following code, we create a tensor out of an existing constant:
    constant_tsr = tf.constant([1,2,3])

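    Note that the tf.constant() function can be used to broadcast a value into an array, mimicking the behavior of tf.fill(), by writing tf.constant(42, shape=[row_dim, col_dim]). The shape must be passed as a keyword, because the second positional argument of tf.constant() is dtype.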

  2. Tensors of similar shape: We can also initialize variables based on the shape of other tensors, as follows:
    zeros_similar = tf.zeros_like(constant_tsr) 
    ones_similar = tf.ones_like(constant_tsr) 

    Note that since these tensors depend on prior tensors, we must create them in order: the tensor whose shape we copy has to exist before we call zeros_like() or ones_like() on it, otherwise we will get an error.

  3. Sequence tensors: In TensorFlow, all parameters are documented as tensors. Even when scalars are required, the API describes them as zero-dimensional tensors. It therefore won't be a surprise that TensorFlow allows us to specify tensors that contain defined intervals. The following functions behave very similarly to NumPy's linspace() and arange() functions. See the following function:
    linear_tsr = tf.linspace(start=0.0, stop=1.0, num=3)

    Note that the start and stop parameters should be float values, and that num should be an integer.

    The resultant tensor has a sequence of [0.0, 0.5, 1.0] (the print(linear_tsr) command will provide the necessary output). Note that this function includes the specified stop value. See the following tf.range function for comparison:

    integer_seq_tsr = tf.range(start=6, limit=15, delta=3) 

    The result is the sequence [6, 9, 12]. Note that this function does not include the limit value and it can operate with both integer and float values for the start and limit parameters.

  4. Random tensors: The following generated random numbers are from a uniform distribution:
    randunif_tsr = tf.random.uniform([row_dim, col_dim], 
                                     minval=0, maxval=1) 

Note that this random uniform distribution draws from the interval that includes minval but not maxval (minval <= x < maxval). Therefore, in this case, the output range is [0, 1). If, instead, you need to draw only integers and not floats, add the dtype=tf.int32 parameter when calling the function and choose a maxval greater than 1. Because maxval is excluded, integer draws from [0, 1) would all be zeros.

To get a tensor with random draws from a normal distribution, you can run the following code:

randnorm_tsr = tf.random.normal([row_dim, col_dim], 
                                 mean=0.0, stddev=1.0) 

There are also times where we want to generate normal random values that are assured within certain bounds. The truncated_normal() function always picks normal values within two standard deviations of the specified mean:

truncnorm_tsr = tf.random.truncated_normal([row_dim, col_dim], 
                                          mean=0.0, stddev=1.0) 

We might also be interested in randomizing entries of arrays. To accomplish this, two functions can help us: random.shuffle() and image.random_crop(). In the following code, input_tensor is the tensor to randomize and crop_size is the shape of the crop you want:

shuffled_output = tf.random.shuffle(input_tensor) 
cropped_output = tf.image.random_crop(input_tensor, crop_size) 

Later on in this book, we'll be interested in randomly cropping images of size (height, width, 3), where the three channels hold the red, green, and blue color components. To fix a dimension in the cropped output, you must give it the maximum size in that dimension:

height, width = (64, 64)
my_image = tf.random.uniform([height, width, 3], minval=0,
         maxval=255, dtype=tf.int32)
cropped_image = tf.image.random_crop(my_image, 
       [height//2, width//2, 3]) 

This code snippet will generate random noise images that will be cropped, halving both the height and width, but the depth dimension will be untouched because you fixed its maximum value as a parameter.
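To check the behaviors described in this recipe on your own installation, the following sketch recreates a few of the tensors and inspects them. The seed passed to tf.random.set_seed() is an addition for reproducibility. The 1,000-element truncated normal sample is there only to make the two-standard-deviation bound visible.

```python
import tensorflow as tf

row_dim, col_dim = 3, 3

# Fixed-size tensors: tf.fill and tf.constant(..., shape=...) give the same values
filled_tsr = tf.fill([row_dim, col_dim], 42)
const_tsr = tf.constant(42, shape=[row_dim, col_dim])

# Tensors shaped like an existing one keep its dtype (int32 here)
constant_tsr = tf.constant([1, 2, 3])
ones_similar = tf.ones_like(constant_tsr)

# Sequence tensors: linspace includes stop, range excludes limit
linear_tsr = tf.linspace(start=0.0, stop=1.0, num=3)   # [0.0, 0.5, 1.0]
integer_seq_tsr = tf.range(start=6, limit=15, delta=3)  # [6, 9, 12]

# Random tensors
tf.random.set_seed(0)
randunif_tsr = tf.random.uniform([row_dim, col_dim], minval=0, maxval=1)
truncnorm_tsr = tf.random.truncated_normal([1000], mean=0.0, stddev=1.0)
shuffled = tf.random.shuffle(tf.range(10))  # same elements, new order

# Random crop of a 64x64 RGB noise image down to 32x32, keeping all 3 channels
my_image = tf.random.uniform([64, 64, 3], minval=0, maxval=255, dtype=tf.int32)
cropped_image = tf.image.random_crop(my_image, [32, 32, 3])

print(linear_tsr, integer_seq_tsr, shuffled, cropped_image.shape, sep="\n")
```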

How it works…

Once we have decided how to create the tensors, we may also create the corresponding variables by wrapping the tensor in the Variable() function, as follows:

my_var = tf.Variable(tf.zeros([row_dim, col_dim])) 

There's more on this in the following recipes.
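Once a tensor is wrapped in a Variable, its values can be changed in place while its shape and dtype stay fixed. The following sketch shows assign() and assign_add(), plus a non-trainable counter. The step_counter name is hypothetical.

```python
import tensorflow as tf

row_dim, col_dim = 3, 3
my_var = tf.Variable(tf.zeros([row_dim, col_dim]))

# A variable keeps its shape and dtype, but its values can be updated in place.
my_var.assign(tf.ones([row_dim, col_dim]))           # every entry is now 1.0
my_var.assign_add(tf.fill([row_dim, col_dim], 2.0))  # every entry is now 3.0

# Variables are trainable (watched by gradient tapes) unless told otherwise.
step_counter = tf.Variable(0, trainable=False)
step_counter.assign_add(1)

print(my_var.numpy(), my_var.dtype, step_counter.numpy())
```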

There's more…

We are not limited to the built-in functions: we can convert any NumPy array, Python list, or constant into a tensor using the convert_to_tensor() function. Note that this function also accepts tensors as input, in case we wish to generalize a computation inside a function.
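Here is a short sketch of convert_to_tensor() with the different kinds of input mentioned above. The double() helper is hypothetical and only shows why accepting tensors as input is convenient. The dtypes in the comments are TensorFlow's defaults for each kind of input.

```python
import numpy as np
import tensorflow as tf

from_numpy = tf.convert_to_tensor(np.array([[1.0, 2.0], [3.0, 4.0]]))  # float64
from_list = tf.convert_to_tensor([1, 2, 3])                             # int32
from_const = tf.convert_to_tensor(3.5)                                  # float32
from_list_f32 = tf.convert_to_tensor([1, 2, 3], dtype=tf.float32)      # requested dtype

# Tensors pass through unchanged, so one helper can accept either kind of input.
def double(x):
    x = tf.convert_to_tensor(x)
    return x * 2

print(double([1.0, 2.0]), double(tf.constant([1.0, 2.0])))
```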


Using eager execution

When developing deep and complex neural networks, you need to continuously experiment with architectures and data. This proved difficult in TensorFlow 1.x because you always needed to run your code from beginning to end in order to check whether it worked. TensorFlow 2.x works in eager execution mode by default, which means that you develop and check your code step by step as you progress into your project. This is great news; now we just have to understand how to experiment with eager execution, so we can use this TensorFlow 2.x feature to our advantage. This recipe will provide you with the basics to get started.

Getting ready

TensorFlow 1.x performed optimally because it executed its computations after compiling a static computational graph. All computations were distributed and connected into a graph as you compiled your network, and that graph helped TensorFlow to execute computations, leveraging the available resources (multi-core CPUs or multiple GPUs) in the best way, and splitting operations between the resources in the most timely and efficient way. That also meant that once you defined and compiled your graph, you could not change it at runtime but had to instantiate it from scratch, thereby incurring some extra work.

In TensorFlow 2.x, you can still define your network, compile it, and run it optimally, but the team of TensorFlow developers has now favored, by default, a more experimental approach, allowing immediate evaluation of operations, thus making it easier to debug and to try network variations. This is called eager execution. Operations now return concrete values instead of pointers to parts of a computational graph to be built later. More importantly, you can now have all the functionality of the host language available while your model is executing, making it easier to write more complex and sophisticated deep learning solutions.
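Here is a small sketch of what immediate evaluation means in practice. Results are concrete values that can be read as NumPy arrays, and ordinary Python control flow can branch on them. The matrix and the threshold of 50 are arbitrary.

```python
import tensorflow as tf

x = tf.constant([[1.0, 2.0], [3.0, 4.0]])
y = tf.matmul(x, x)  # evaluated immediately, no session needed

# The result is a concrete value that can be read as a NumPy array...
print(y.numpy())

# ...so plain Python control flow can depend on it.
total = tf.reduce_sum(y)
if total > 50:
    print("large result:", total.numpy())
else:
    print("small result:", total.numpy())
```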

How to do it…

You basically don't have to do anything; eager execution is the default way of operating in TensorFlow 2.x. When you import TensorFlow and start using its functions, you are operating in eager execution mode, which you can confirm at any time with tf.executing_eagerly():

import tensorflow as tf
print(tf.executing_eagerly())
True

That's all you need to do.

How it works…

Just run TensorFlow operations and the results will return immediately:

x = [[2.]]
m = tf.matmul(x, x)
print("the result is {}".format(m))
the result is [[4.]]

That's all there is to it!

There's more…

As TensorFlow now uses eager execution by default, you won't be surprised to hear that tf.Session has been removed from the TensorFlow API. You no longer need to build a computational graph before running a computation; all you have to do now is build your network and test it along the way. This opens the road to common software best practices, such as documenting the code, using object-oriented programming when scripting your code, and organizing it into reusable self-contained modules.


Working with matrices

Understanding how TensorFlow works with matrices is very important when developing the flow of data through computational graphs. In this recipe, we will cover the creation of matrices and the basic operations that can be performed on them with TensorFlow.

It is worth emphasizing the importance of matrices in machine learning (and mathematics in general): machine learning algorithms are computationally expressed as matrix operations. Knowing how to perform matrix computations is a plus when working with TensorFlow, though you may not need it often: its high-level API, Keras, can deal with most of the matrix algebra behind the scenes (more on Keras in Chapter 3, Keras).

This book does not cover the mathematical background on matrix properties and matrix algebra (linear algebra), so the unfamiliar reader is strongly encouraged to learn enough about matrices to be comfortable with matrix algebra. In the See also section, you can find a couple of resources to help you revise your math skills or build them from scratch, and get even more out of TensorFlow.

Getting ready

Many algorithms depend on matrix operations. TensorFlow gives us easy-to-use operations to perform such matrix calculations. You just need to import TensorFlow and follow this section to the end; if you're not a matrix algebra expert, please first have a look at the See also section of this recipe for resources to help you to get the most out of the following recipe.

How to do it…

We proceed as follows:

  1. Creating matrices: We can create two-dimensional matrices from NumPy arrays or nested lists, as described in the Declaring variables and tensors recipe at the beginning of this chapter. We can use the tensor creation functions and specify a two-dimensional shape for functions such as zeros(), ones(), and truncated_normal(). TensorFlow also allows us to create a diagonal matrix from a one-dimensional array or list using the tf.linalg.diag() function, as follows:
    identity_matrix = tf.linalg.diag([1.0, 1.0, 1.0]) 
    A = tf.random.truncated_normal([2, 3]) 
    B = tf.fill([2,3], 5.0) 
    C = tf.random.uniform([3,2]) 
    D = tf.convert_to_tensor(np.array([[1., 2., 3.],
                                       [-3., -7., -1.],
                                       [0., 5., -2.]]),
                             dtype=tf.float32)
    print(identity_matrix)
    [[ 1.  0.  0.] 
     [ 0.  1.  0.] 
     [ 0.  0.  1.]] 
    print(A)
    [[ 0.96751703  0.11397751 -0.3438891 ] 
     [-0.10132604 -0.8432678   0.29810596]] 
    print(B)
    [[ 5.  5.  5.] 
     [ 5.  5.  5.]] 
    print(C)
    [[ 0.33184157  0.08907614] 
     [ 0.53189191  0.67605299] 
     [ 0.95889051  0.67061249]] 

    Please note that the A and C tensors are created in a random way, so they will probably differ in your session from what is represented in this book.

    print(D)
    [[ 1.  2.  3.] 
     [-3. -7. -1.] 
     [ 0.  5. -2.]] 
  2. Addition, subtraction, and multiplication: To add or subtract matrices of the same dimension, we can use the standard + and - operators (equivalent to tf.add() and tf.subtract()). For matrix multiplication, TensorFlow provides the matmul() function:
    print(A+B)
    [[ 4.61596632  5.39771316  4.4325695 ] 
     [ 3.26702736  5.14477345  4.98265553]] 
    print(B-B)
    [[ 0.  0.  0.] 
     [ 0.  0.  0.]] 
    print(tf.matmul(B, identity_matrix)) 
    [[ 5.  5.  5.] 
     [ 5.  5.  5.]] 

    It is important to note that the matmul() function has arguments that specify whether or not to transpose the arguments before multiplication (the Boolean parameters, transpose_a and transpose_b), or whether each matrix is sparse (a_is_sparse and b_is_sparse).

    If, instead, you need element-wise multiplication between two matrices, use the tf.multiply function. The matrices must have the same shape (or shapes compatible under broadcasting) and the same dtype, otherwise you will get an error:

    print(tf.multiply(D, identity_matrix))
    [[ 1.  0.  0.] 
     [-0. -7. -0.] 
     [ 0.  0. -2.]] 

    Note that matrix division is not explicitly defined. While many define matrix division as multiplying by the inverse, it is fundamentally different from real-numbered division.

  3. The transpose: Transpose a matrix (flip the columns and rows) as follows:
    print(tf.transpose(C))
    [[0.33184157 0.53189191 0.95889051]
     [0.08907614 0.67605299 0.67061249]]

    Again, since C was drawn at random, your values will differ from the ones shown here.

  4. Determinant: To calculate the determinant, use the following code. For D, the result is -38 (up to floating-point rounding):
    print(tf.linalg.det(D))
  5. Inverse: To find the inverse of a square matrix, see the following:
    print(tf.linalg.inv(D))
    [[-0.5        -0.5        -0.5       ] 
     [ 0.15789474  0.05263158  0.21052632] 
     [ 0.39473684  0.13157895  0.02631579]] 

    The inverse method is based on Cholesky decomposition only if the matrix is symmetric positive definite. If the matrix is not symmetric positive definite, then it is based on LU decomposition.

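  6. Decompositions: For Cholesky decomposition, use the following code. Cholesky decomposition requires a symmetric positive definite matrix, which is why we apply it to the identity matrix rather than to D (whose determinant is negative):
    print(tf.linalg.cholesky(identity_matrix))
    [[ 1.  0.  0.] 
     [ 0.  1.  0.] 
     [ 0.  0.  1.]] 
  7. Eigenvalues and eigenvectors: For eigenvalues and eigenvectors, use the following code:
    print(tf.linalg.eigh(D))
    [[-10.65907521  -0.22750691   2.88658212] 
     [  0.21749542   0.63250104  -0.74339638] 
     [  0.84526515   0.2587998    0.46749277] 
     [ -0.4880805    0.73004459   0.47834331]] 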

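Note that the tf.linalg.eigh() function outputs two tensors: the first holds the eigenvalues (the first row above), and the second holds the eigenvectors, one per column (the remaining rows). In mathematics, such an operation is known as the eigendecomposition of a matrix. Also note that eigh() is meant for self-adjoint (symmetric) matrices and reads only the lower triangular part of its input. For a non-symmetric matrix such as D, the result is therefore the eigendecomposition of the symmetric matrix built from D's lower triangle.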
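The following sketch recomputes some of these results and checks them numerically. Some of it departs from the recipe above. D is created directly with tf.constant() rather than from a NumPy array. The right-hand side rhs, the 2 x 2 matrix spd, and the symmetrized matrix S are illustrative additions. tf.linalg.solve() is shown as the usual alternative to "matrix division", because it solves D x = rhs without forming the inverse explicitly.

```python
import tensorflow as tf

D = tf.constant([[1., 2., 3.],
                 [-3., -7., -1.],
                 [0., 5., -2.]])
identity_matrix = tf.linalg.diag([1.0, 1.0, 1.0])

det_D = tf.linalg.det(D)       # -38 up to float32 rounding
inv_D = tf.linalg.inv(D)
product = tf.matmul(D, inv_D)  # should be (numerically) the identity

# Instead of "dividing" by D, solve D @ x = rhs; rhs is D @ [1, 1, 1]^T.
rhs = tf.constant([[6.], [-11.], [3.]])
x = tf.linalg.solve(D, rhs)    # close to [[1.], [1.], [1.]]

# Cholesky needs a symmetric positive definite matrix: spd == L @ L^T.
spd = tf.constant([[4., 2.], [2., 3.]])
L = tf.linalg.cholesky(spd)    # lower triangular factor

# eigh expects a symmetric matrix, so symmetrize D before decomposing it.
S = (D + tf.transpose(D)) / 2.0
eigenvalues, eigenvectors = tf.linalg.eigh(S)
rebuilt = eigenvectors @ tf.linalg.diag(eigenvalues) @ tf.transpose(eigenvectors)

print(det_D.numpy(), x.numpy().ravel(), L.numpy(), eigenvalues.numpy(), sep="\n")
```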

How it works…

TensorFlow provides all the tools for us to get started with numerical computations and adding these computations to our neural networks.

See also

If you need to build your math skills quickly and understand more about TensorFlow operations, we suggest the following resources:

  • The free book Mathematics for Machine Learning, available online, contains everything you need to know if you want to operate successfully with machine learning in general.
  • For an even more accessible source, watch the lessons about vectors and matrices from Khan Academy to get to work with the most basic data elements of a neural network.

Declaring operations

Apart from matrix operations, there is a host of other TensorFlow operations we must at least be aware of. This recipe will provide you with a quick and essential glance at what you really need to know.

Getting ready

Besides the standard arithmetic operations, TensorFlow provides us with more operations that we should be aware of. We should acknowledge them and learn how to use them before proceeding. Again, we just import TensorFlow:

import tensorflow as tf

Now we're ready to run the code to be found in the following section.

How to do it…

TensorFlow has the standard operations on tensors, that is, add(), subtract(), multiply(), and divide(), in its math module. Note that all of the operations in this section will evaluate the inputs elementwise, unless specified otherwise:

  1. TensorFlow provides some variations of divide() and related functions.
  2. It is worth mentioning how divide() treats integers. In TensorFlow 1.x, the division function returned the same type as its inputs, which meant that it really returned the floor of the division (akin to Python 2) if the inputs were integers. In TensorFlow 2.x, divide() follows Python 3 semantics instead, and so does truediv(), which casts integers into floats before dividing and always returns a float, as follows:
    print(tf.math.divide(3, 4))
    tf.Tensor(0.75, shape=(), dtype=float64) 
    print(tf.math.truediv(3, 4)) 
    tf.Tensor(0.75, shape=(), dtype=float64) 
  3. If we have floats and want integer division, we can use the floordiv() function. Note that this will still return a float, but it will be rounded down to the nearest integer. This function is as follows:
    print(tf.math.floordiv(3.0, 4.0))
    tf.Tensor(0.0, shape=(), dtype=float32) 
  4. Another important function is mod(). This function returns the remainder after division. It is as follows:
    print(tf.math.mod(22.0, 5.0))
    tf.Tensor(2.0, shape=(), dtype=float32) 
  5. The cross product between two tensors is achieved by the cross() function. Remember that the cross product is only defined for two three-dimensional vectors, so it only accepts two three-dimensional tensors. The following code illustrates this use:
    print(tf.linalg.cross([1., 0., 0.], [0., 1., 0.]))
    tf.Tensor([0. 0. 1.], shape=(3,), dtype=float32) 
  6. Here's a compact list of the more common math functions, all found in the tf.math module. All of these functions operate elementwise:
    • abs(): Absolute value of one input tensor
    • ceil(): Ceiling function of one input tensor
    • cos(): Cosine function of one input tensor
    • exp(): Base e exponential of one input tensor
    • floor(): Floor function of one input tensor
    • reciprocal(): Multiplicative inverse (1/x) of one input tensor
    • log(): Natural logarithm of one input tensor
    • maximum(): Elementwise maximum of two tensors
    • minimum(): Elementwise minimum of two tensors
    • negative(): Negative of one input tensor
    • pow(): The first tensor raised to the second tensor elementwise
    • round(): Rounds one input tensor
    • rsqrt(): The reciprocal of the square root of one tensor
    • sign(): Returns -1, 0, or 1, depending on the sign of the tensor
    • sin(): Sine function of one input tensor
    • sqrt(): Square root of one input tensor
    • square(): Square of one input tensor

  7. Specialty mathematical functions: Some special math functions are often used in machine learning and are worth mentioning, and TensorFlow has built-in functions for them. Again, these functions operate elementwise, unless specified otherwise:

    digamma()              Psi function, the derivative of the lgamma() function
    erf()                  Gaussian error function of one tensor
    erfc()                 Complementary error function of one tensor
    igamma()               Lower regularized incomplete gamma function
    igammac()              Upper regularized incomplete gamma function
    lbeta()                Natural logarithm of the absolute value of the beta function (reduces along the last dimension)
    lgamma()               Natural logarithm of the absolute value of the gamma function
    squared_difference()   Square of the differences between two tensors
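The following sketch shows how a few of these functions behave at the edges. It covers integer and negative inputs for the division variants and ties for round(). It assumes TensorFlow 2.x. floormod() is used here for the remainder because it makes the sign convention explicit. The input values are arbitrary.

```python
import tensorflow as tf

# Division variants (TensorFlow 2.x semantics)
true_div = tf.math.divide(7, 2)        # 3.5: int32 inputs are cast to float64 first
floor_div = tf.math.floordiv(-7, 2)    # -4: rounds toward negative infinity, stays int32
floor_mod = tf.math.floormod(-7, 2)    # 1: the remainder takes the sign of the divisor
print(true_div, floor_div, floor_mod)

# A few elementwise functions from the tables above
x = tf.constant([-2.5, -0.5, 0.5, 1.5, 2.5])
rounded = tf.math.round(x)             # ties go to the even integer
recip = tf.math.reciprocal(tf.constant([1.0, 2.0, 4.0]))
sq_diff = tf.math.squared_difference(tf.constant([1.0, 2.0]),
                                     tf.constant([3.0, 5.0]))
print(rounded, recip, sq_diff)
```

The round-half-to-even behavior can surprise anyone who expects 2.5 to become 3. It is worth checking when a model's output is rounded to class indices.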

How it works…

It is important to know which functions are available to us so that we can add them to our computational graphs. We will mainly be concerned with the preceding functions. We can also generate many different custom functions as compositions of the preceding, as follows:

# Tangent function: tan(pi/4) = 1 (approximately, since 3.1416 only approximates pi)
def pi_tan(x):
    return tf.tan(3.1416/x)
print(pi_tan(4))
tf.Tensor(1.0000036, shape=(), dtype=float32)

The complex layers that constitute a deep neural network are ultimately built from functions like these, so this recipe gives you the basic building blocks for creating your own operations.
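To see how such a composition ends up in a computational graph, the function can be wrapped in tf.function. The sketch below does that and then lists the operations TensorFlow recorded while tracing it. It is a variant of pi_tan() that uses math.pi, and the name pi_tan_graph is made up for this illustration.

```python
import math
import tensorflow as tf

@tf.function
def pi_tan_graph(x):
    # Same composition as pi_tan(), but using math.pi instead of 3.1416
    return tf.math.tan(math.pi / x)

result = pi_tan_graph(tf.constant(4.0))
print(result)  # close to 1.0

# List the operations TensorFlow recorded when tracing the function into a graph
concrete = pi_tan_graph.get_concrete_function(tf.TensorSpec(shape=[], dtype=tf.float32))
op_types = [op.type for op in concrete.graph.get_operations()]
print(op_types)  # includes a division op and a Tan op
```

tf.function traces the Python body into a graph for each new input signature. This is why the graph can be inspected through a concrete function for a float32 scalar.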

There's more…

If we wish to add other operations to our graphs that are not listed here, we must create our own from the preceding functions. Here is an example of an operation that wasn't used previously: a custom polynomial function, 3 * x^2 - x + 10, which we can add using the following code:

def custom_polynomial(value): 
    return tf.math.subtract(3 * tf.math.square(value), value) + 10
print(custom_polynomial(11))
tf.Tensor(362, shape=(), dtype=int32)
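A function composed from TensorFlow operations is automatically differentiable, which is what lets an optimizer fit it later. The sketch below checks this with tf.GradientTape. The derivative of 3x^2 - x + 10 is 6x - 1, so we expect 65 at x = 11. A float variable is used because gradients are not computed for integer tensors.

```python
import tensorflow as tf

def custom_polynomial(value):
    return tf.math.subtract(3 * tf.math.square(value), value) + 10

x = tf.Variable(11.0)  # float32: integer tensors do not get gradients
with tf.GradientTape() as tape:
    y = custom_polynomial(x)
dy_dx = tape.gradient(y, x)
print(y.numpy(), dy_dx.numpy())  # 362.0 65.0
```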

There's no limit to the custom functions you can create now, though I always recommend that you first consult the TensorFlow documentation. Often, you don't need to reinvent the wheel; you can find that what you need has already been coded.
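Polynomial evaluation is one such case. tf.math.polyval() takes a list of coefficients ordered from the highest degree down. The sketch below compares it with the handmade custom_polynomial() on a few arbitrary inputs.

```python
import tensorflow as tf

def custom_polynomial(value):
    return tf.math.subtract(3 * tf.math.square(value), value) + 10

x = tf.constant([0.0, 1.0, 11.0])
# Coefficients from the highest degree down: 3x^2 + (-1)x + 10
builtin = tf.math.polyval([3.0, -1.0, 10.0], x)
handmade = custom_polynomial(x)
print(builtin)   # [ 10.  12. 362.]
print(handmade)
```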


Implementing activation functions

Activation functions are key to letting neural networks approximate non-linear outputs and adapt to non-linear features. They introduce non-linear operations into neural networks. If we're careful about which activation functions we select and where we put them, they're very powerful operations that we can tell TensorFlow to fit and optimize.

Getting ready

When we start to use neural networks, we'll use activation functions regularly because they are an essential part of any neural network. An activation function transforms the output of a layer's linear step (the inputs multiplied by the weights, plus the bias); the weights and biases themselves are what training adjusts. In TensorFlow, activation functions are non-linear operations that act on tensors, working much like the previous mathematical operations. Activation functions serve many purposes, but the main concept is that they introduce a non-linearity into the graph, and many of them also constrain the range of the outputs.
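To make the position of the activation concrete, here is a minimal sketch of one dense layer computed by hand. The linear step xW + b comes first, and the activation (ReLU here) is applied to its result. The weights, bias, and input are hypothetical values chosen only so that some units end up negative.

```python
import tensorflow as tf

# Hypothetical toy layer: 2 input features -> 3 units
x = tf.constant([[1.0, -2.0]])                 # one sample
W = tf.constant([[0.5, -1.0, 2.0],
                 [1.0,  0.5, 0.0]])
b = tf.constant([0.1, 0.2, 0.5])

pre_activation = tf.matmul(x, W) + b           # linear step: xW + b
activated = tf.nn.relu(pre_activation)         # non-linearity applied afterwards
print(pre_activation.numpy())                  # approximately [[-1.4 -1.8  2.5]]
print(activated.numpy())                       # [[0.  0.  2.5]]
```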

How to do it…

The activation functions live in the neural network (nn) library in TensorFlow. Besides using built-in activation functions, we can also design our own using TensorFlow operations. We can import the predefined activation functions (from tensorflow import nn) or be explicit and write tf.nn in our function calls. Here, we'll choose to be explicit with each function call:

  1. The rectified linear unit, known as ReLU, is the most common and basic way to introduce non-linearity into neural networks. It is defined as max(0,x). It is continuous, but not smooth. It appears as follows:
    print(tf.nn.relu([-3., 3., 10.]))
    tf.Tensor([ 0.  3. 10.], shape=(3,), dtype=float32) 
  2. There are times when we'll want to cap the linearly increasing part of the preceding ReLU activation function. We can do this by nesting the max(0,x) function in a min() function. The implementation that TensorFlow has is called the ReLU6 function, defined as min(max(0,x),6). Its clipped, piecewise-linear shape resembles a hard-sigmoid function, it is computationally cheap, and capping the output keeps activations from growing without bound (exploding). This will come in handy when we discuss deeper neural networks in later chapters on convolutional and recurrent neural networks. It appears as follows:
    print(tf.nn.relu6([-3., 3., 10.]))
    tf.Tensor([0. 3. 6.], shape=(3,), dtype=float32)
  3. The sigmoid function is the classic continuous and smooth activation function. It is also called a logistic function and has the form 1 / (1 + exp(-x)). The sigmoid function is used less often in hidden layers because it saturates for large positive or negative inputs, which tends to zero out the backpropagation terms during training. It appears as follows:
    print(tf.nn.sigmoid([-1., 0., 1.]))
    tf.Tensor([0.26894143 0.5 0.7310586 ], shape=(3,), dtype=float32) 

    We should be aware that some activation functions, such as the sigmoid, are not zero-centered. This will require us to zero-mean data prior to using it in most computational graph algorithms.

  4. Another smooth activation function is the hyperbolic tangent (tanh). It is very similar to the sigmoid except that instead of having a range between 0 and 1, it has a range between -1 and 1. This function is the ratio of the hyperbolic sine over the hyperbolic cosine. Another way to write this is as follows:
    (exp(x) - exp(-x)) / (exp(x) + exp(-x))

    This activation function is as follows:

    print(tf.nn.tanh([-1., 0., 1.]))
    tf.Tensor([-0.7615942  0. 0.7615942], shape=(3,), dtype=float32) 
  5. The softsign function is also used as an activation function. The form of this function is x/(|x| + 1). The softsign function is supposed to be a continuous (but not smooth) approximation to the sign function. See the following code:
    print(tf.nn.softsign([-1., 0., -1.]))
    tf.Tensor([-0.5  0.  -0.5], shape=(3,), dtype=float32) 
  6. Another function, the softplus function, is a smooth version of the ReLU function. The form of this function is log(exp(x) + 1). It appears as follows:
    print(tf.nn.softplus([-1., 0., -1.]))
    tf.Tensor([0.31326166 0.6931472  0.31326166], shape=(3,), dtype=float32) 

    The softplus function goes to infinity as the input increases, whereas the softsign function goes to 1. As the input gets smaller, however, the softplus function approaches zero and the softsign function goes to -1.

  7. The Exponential Linear Unit (ELU) is very similar to the softplus function except that the bottom asymptote is -1 instead of 0. The form is (exp(x) - 1) if x < 0, else x. It appears as follows:
    print(tf.nn.elu([-1., 0., -1.])) 
    tf.Tensor([-0.63212055  0. -0.63212055], shape=(3,), dtype=float32) 
  8. Now, from this recipe, you should understand the basic key activations. Our list of the existing activation functions is not exhaustive, and you may discover that for certain problems, you need to try some of the lesser known among them. Apart from the activations from this recipe, you can find even more activations on the Keras activation pages:
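One way to check the formulas given above is to rebuild each activation from basic tf.math operations and compare it with the tf.nn version on a grid of inputs. This is a minimal sketch; the grid from -8 to 8 is an arbitrary choice.

```python
import tensorflow as tf

x = tf.linspace(-8.0, 8.0, 33)   # float32 grid of test inputs

# Each activation written out from its formula with basic tf.math ops
manual = {
    "relu":     tf.math.maximum(0.0, x),
    "relu6":    tf.math.minimum(tf.math.maximum(0.0, x), 6.0),
    "sigmoid":  1.0 / (1.0 + tf.math.exp(-x)),
    "tanh":     (tf.math.exp(x) - tf.math.exp(-x)) / (tf.math.exp(x) + tf.math.exp(-x)),
    "softsign": x / (tf.math.abs(x) + 1.0),
    "softplus": tf.math.log(tf.math.exp(x) + 1.0),
    "elu":      tf.where(x < 0.0, tf.math.exp(x) - 1.0, x),
}
builtin = {
    "relu": tf.nn.relu(x), "relu6": tf.nn.relu6(x), "sigmoid": tf.nn.sigmoid(x),
    "tanh": tf.nn.tanh(x), "softsign": tf.nn.softsign(x),
    "softplus": tf.nn.softplus(x), "elu": tf.nn.elu(x),
}
for name in manual:
    max_err = float(tf.reduce_max(tf.abs(manual[name] - builtin[name])))
    print(f"{name:9s} max abs difference: {max_err:.2e}")
```

The naive formulas are fine on this range. For very large inputs, however, exp(x) overflows in float32, which is one reason to prefer the built-in versions.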

How it works…

These activation functions are the ways we can introduce non-linearity into neural networks and other computational graphs. It is important to note where in our network we are using activation functions. If the final activation function has a range between 0 and 1 (sigmoid), then the computational graph can only output values between 0 and 1. If the activation functions are inside and hidden between nodes, then we want to be aware of the effect that their range can have on our tensors as we pass them through. If our tensors were scaled to have a mean of zero, we will want to use an activation function that preserves as much variance as possible around zero.

This would imply that we want to choose an activation function such as the hyperbolic tangent (tanh) or the softsign. If the tensors were all scaled to be positive, then we would ideally choose an activation function that preserves variance in the positive domain.
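The effect of the output range can be seen by passing zero-mean, symmetric inputs through a few activations. In this sketch the evenly spaced grid stands in for standardized data. The sigmoid shifts the mean to 0.5, while tanh and softsign keep it at zero.

```python
import tensorflow as tf

# Zero-mean, symmetric inputs (a stand-in for standardized data)
x = tf.linspace(-3.0, 3.0, 61)

stats = {}
for name, fn in [("sigmoid", tf.nn.sigmoid), ("tanh", tf.nn.tanh), ("softsign", tf.nn.softsign)]:
    y = fn(x)
    stats[name] = (float(tf.reduce_min(y)), float(tf.reduce_max(y)), float(tf.reduce_mean(y)))
    print(f"{name:8s} min {stats[name][0]:+.3f}  max {stats[name][1]:+.3f}  mean {stats[name][2]:+.3f}")
```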

There's more…

We can even easily create custom activations such as Swish, which is x*sigmoid(x) (see Swish: a Self-Gated Activation Function, Ramachandran et al., 2017). It can be used as a better-performing replacement for ReLU activations in image and tabular data problems:

def swish(x):
    return x * tf.nn.sigmoid(x)
print(swish([-1., 0., 1.]))
tf.Tensor([-0.26894143  0.  0.7310586 ], shape=(3,), dtype=float32)
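Because swish() is built from TensorFlow operations, TensorFlow can differentiate it without extra work. The sketch below compares the automatic gradient with the analytic derivative, sigmoid(x) + x * sigmoid(x) * (1 - sigmoid(x)). The sample points are arbitrary.

```python
import tensorflow as tf

def swish(x):
    return x * tf.nn.sigmoid(x)

x = tf.constant([-2.0, 0.0, 2.0])
with tf.GradientTape() as tape:
    tape.watch(x)          # x is a constant, so ask the tape to track it
    y = swish(x)
autodiff_grad = tape.gradient(y, x)

# Analytic derivative of x * sigmoid(x)
s = tf.nn.sigmoid(x)
analytic_grad = s + x * s * (1.0 - s)
print(autodiff_grad.numpy(), analytic_grad.numpy())
```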

After having tried the activations provided by TensorFlow, your next natural step will be to replicate the ones you find in deep learning papers or that you create yourself.


Working with data sources

For most of this book, we will rely on the use of datasets to fit machine learning algorithms. This section has instructions on how to access each of these datasets through TensorFlow and Python.

Some of the data sources rely on the maintenance of outside websites so that you can access the data. If these websites change or remove this data, then some of the following code in this section may need to be updated. You can find the updated code on this book's GitHub page:

Getting ready

Throughout the book, the majority of the datasets that we will be using are accessible using TensorFlow Datasets, whereas some others will require some extra effort by using a Python script to download, or by manually downloading them through the internet.

TensorFlow Datasets (TFDS) is a collection of ready-to-use datasets (the complete list is available in the TFDS catalog). It automatically handles downloading and preparing the data and, being a wrapper around tf.data, constructs efficient and fast data pipelines.

In order to install TFDS, just run the following installation command on your console:

pip install tensorflow-datasets

We can now move on to explore the core datasets that you will be using in this book. Not all of the datasets are included here, just the most common ones; some other very specific datasets will be introduced in different chapters throughout the book.

How to do it…

  1. Iris data: This dataset is arguably the classic structured dataset used in machine learning and perhaps in all examples of statistics. It is a dataset that measures sepal length, sepal width, petal length, and petal width of three different types of iris flowers: Iris setosa, Iris virginica, and Iris versicolor. There are 150 measurements in total, which means that there are 50 measurements for each species. To load the dataset in Python, we will use TFDS functions, as follows:
    import tensorflow_datasets as tfds
    iris = tfds.load('iris', split='train')

    When you import a dataset for the first time, a progress bar shows how far along the download is. If you prefer, you can deactivate it by typing the following:
    tfds.disable_progress_bar()

  2. Birth weight data: This data was originally from Baystate Medical Center, Springfield, Mass, 1986. This dataset contains measurements including childbirth weight and other demographic and medical measurements of the mother and the family history. There are 189 observations of eleven variables. The following code shows you how to access this data in TensorFlow:
    import tensorflow as tf
    import tensorflow_datasets as tfds
    birthdata_url = '' 
    path = tf.keras.utils.get_file(birthdata_url.split("/")[-1], birthdata_url)
    def map_line(x):
        return tf.strings.to_number(tf.strings.split(x))
    birth_file = (tf.data.TextLineDataset(path)
                  .skip(1)     # Skip first header line
                  .map(map_line)
                 )
  3. Boston housing data: Carnegie Mellon University maintains a library of datasets in their StatLib Library. This data is easily accessible via the University of California at Irvine's machine learning repository. There are 506 observations of house worth, along with various demographic data and housing attributes (14 variables). The following code shows you how to access this data in TensorFlow:
    import tensorflow as tf
    import tensorflow_datasets as tfds
    housing_url = ''
    path = tf.keras.utils.get_file(housing_url.split("/")[-1], housing_url)
    def map_line(x):
        return tf.strings.to_number(tf.strings.split(x))
    housing = (tf.data.TextLineDataset(path)
               .map(map_line)
              )
  4. MNIST handwriting data: The Modified National Institute of Standards and Technology (MNIST) dataset is a subset of the larger NIST handwriting database. The MNIST handwriting dataset is hosted on Yann LeCun's website. It is a database of 70,000 images of single-digit numbers (0-9), with 60,000 annotated for a training set and 10,000 for a test set. This dataset is used so often in image recognition that TensorFlow provides built-in functions to access this data. In machine learning, it is also important to hold out validation data to detect overfitting. TFDS only provides the training and test splits, so if we want a validation set (for example, 5,000 images), we have to carve it out of the training set ourselves. The following code shows you how to access this data in TensorFlow:
    import tensorflow_datasets as tfds
    mnist = tfds.load('mnist', split=None)
    mnist_train = mnist['train']
    mnist_test = mnist['test']
  5. Spam-ham text data: UCI's machine learning dataset library also holds a spam-ham text message dataset. We can access this .zip file and get the spam-ham text data as follows:
    import os
    import tensorflow as tf
    import tensorflow_datasets as tfds
    zip_url = ''
    path = tf.keras.utils.get_file(zip_url.split("/")[-1], zip_url, extract=True)
    # The archive is extracted next to the downloaded .zip file
    path = os.path.join(os.path.dirname(path), "SMSSpamCollection")
    def split_text(x):
        return tf.strings.split(x, sep='\t')
    text_data = (tf.data.TextLineDataset(path)
                 .map(split_text)
                )
  6. Movie review data: Bo Pang from Cornell has released a movie review dataset that classifies reviews as good or bad. You can find the data on the Cornell University website. To download, extract, and transform this data, we can run the following code:
    import os
    import tensorflow as tf
    import tensorflow_datasets as tfds
    movie_data_url = ''
    path = tf.keras.utils.get_file(movie_data_url.split("/")[-1], movie_data_url, extract=True)
    path = path.replace('.tar.gz', '')
    # Merge negative (label 0) and positive (label 1) reviews into one tab-separated file
    with open(path + '.txt', 'w', encoding='utf-8') as review_file:
        for response, filename in enumerate(['rt-polarity.neg', 'rt-polarity.pos']):
            with open(os.path.join(path, filename), 'r', encoding='utf-8', errors='ignore') as movie_file:
                for line in movie_file:
                    review_file.write(str(response) + '\t' + line)
    def split_text(x):
        return tf.strings.split(x, sep='\t')
    movies = (tf.data.TextLineDataset(path + '.txt')
              .map(split_text)
             )
  7. CIFAR-10 image data: The Canadian Institute for Advanced Research has released an image set that contains 80 million labeled colored images (each image is scaled to 32 x 32 pixels). There are 10 different target classes (airplane, automobile, bird, and so on). CIFAR-10 is a subset that includes 60,000 images. There are 50,000 images in the training set, and 10,000 in the test set. Since we will be using this dataset in multiple ways, and because it is one of our larger datasets, we will not run a script each time we need it. To get this dataset, just execute the following code to download the CIFAR-10 dataset (this may take a long time):
    import tensorflow_datasets as tfds
    ds, info = tfds.load('cifar10', shuffle_files=True, with_info=True)
    cifar_train = ds['train']
    cifar_test = ds['test'] 
  8. The works of Shakespeare text data: Project Gutenberg is a project that releases electronic versions of free books. They have compiled all of the works of Shakespeare together. The following code shows you how to access this text file through TensorFlow:
    import tensorflow as tf
    import tensorflow_datasets as tfds
    shakespeare_url = ''
    path = tf.keras.utils.get_file(shakespeare_url.split("/")[-1], shakespeare_url)
    def split_text(x):
        return tf.strings.split(x, sep='\n')
    shakespeare_text = (tf.data.TextLineDataset(path)
                        .map(split_text)
                       )
  9. English-German sentence translation data: The Tatoeba project collects sentence translations in many languages. Their data has been released under the Creative Commons license. From this data, sentence-to-sentence translations have been compiled into text files that are available for download. Here, we will use the English-German translation file, but you can change the URL to whichever languages you would like to use:
    import os
    from zipfile import ZipFile
    from urllib.request import urlopen, Request
    import tensorflow as tf
    import tensorflow_datasets as tfds
    sentence_url = ''
    r = Request(sentence_url, headers={'User-Agent': 'Mozilla/5.0 (X11; U; Linux i686) Gecko/20071127 Firefox/'})
    b2 = [z for z in sentence_url.split('/') if '.zip' in z][0] # gets just the '.zip' part of the url
    with open(b2, "wb") as target:
        target.write(urlopen(r).read()) # saves the file to disk
    with ZipFile(b2) as z:
        deu = [line.split('\t')[:2] for line in z.open('deu.txt').read().decode().split('\n')]
    os.remove(b2) # removes the zip file
    # saving the prepared en-de sentence file to disk (tab-separated, since sentences can contain commas)
    with open("deu.txt", "w", encoding="utf-8") as deu_file:
        for line in deu:
            if len(line) == 2:  # skip empty trailing lines
                deu_file.write("\t".join(line) + '\n')
    def split_text(x):
        return tf.strings.split(x, sep='\t')
    text_data = (tf.data.TextLineDataset("deu.txt")
                 .map(split_text)
                )
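Most of the preceding loaders share the same two parsing patterns. The first is map_line(), which turns a whitespace-separated line of numbers into a float vector. The second is split_text(), which splits a tab-separated "label<TAB>text" line. The offline sketch below replays both on a few in-memory lines. from_tensor_slices() stands in for TextLineDataset, and all of the line contents are hypothetical.

```python
import tensorflow as tf

# Hypothetical lines mimicking a whitespace-separated numeric file with a header
numeric_lines = tf.data.Dataset.from_tensor_slices([
    "LOW AGE LWT",          # header row, skipped below
    "0 19 182",
    "1 23 110",
])

def map_line(x):
    return tf.strings.to_number(tf.strings.split(x))   # float32 vector

numbers = numeric_lines.skip(1).map(map_line)
for row in numbers:
    print(row.numpy())

# Hypothetical lines mimicking a tab-separated "label<TAB>text" file
text_lines = tf.data.Dataset.from_tensor_slices([
    "ham\tSee you at lunch",
    "spam\tYou won a prize, call now",
])

def split_text(x):
    return tf.strings.split(x, sep='\t')

messages = text_lines.map(split_text)
for parts in messages:
    label, message = parts.numpy()
    print(label.decode(), "|", message.decode())
```

The second example shows why a tab is a safer separator than a comma for free text: the comma inside the spam message stays in its field.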

With this last dataset, we have completed our review of the datasets that you will most frequently encounter when using the recipes you will find in this book. At the start of each recipe, we'll remind you how to download the relevant dataset and explain why it is relevant for the recipe in question.

How it works…

When it comes to using one of these datasets in a recipe, we'll refer you to this section and assume that the data is loaded in the ways we've just described. If further data transformation or preprocessing is necessary, then that code will be provided in the recipe itself.

Usually, the approach will simply be as follows when we use data from TensorFlow datasets:

import tensorflow_datasets as tfds
dataset_name = "..."
data = tfds.load(dataset_name, split=None)
train = data['train']
test = data['test']
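Some datasets, such as MNIST in TFDS, come with only train and test splits. In that case a validation set can be held out from the training data. TFDS accepts slicing strings for this, such as split=['train[:55000]', 'train[55000:]', 'test']. The sketch below shows the equivalent idea with take() and skip() on an already loaded dataset. The 100 in-memory examples and the validation size of 20 are hypothetical stand-ins.

```python
import tensorflow as tf

# Stand-in for a training split: 100 hypothetical (feature, label) pairs
train = tf.data.Dataset.from_tensor_slices((tf.range(100), tf.range(100) % 10))

n_val = 20                      # hypothetical validation size
val_ds = train.take(n_val)      # first 20 examples -> validation
train_ds = train.skip(n_val)    # remaining 80 examples -> training

print(int(train_ds.cardinality()), int(val_ds.cardinality()))  # 80 20
```

If you shuffle before splitting this way, pass reshuffle_each_iteration=False to shuffle(). Otherwise the order changes every epoch and examples migrate between the two splits.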

In any case, depending on the location of the data, it may turn out to be necessary to download it, extract it, and transform it.

See also

Here are some additional references for the data resources we use in this book:


Additional resources

In this section, you will find additional links, documentation sources, and tutorials that will be of great assistance when learning and using TensorFlow.

Getting ready

When learning how to use TensorFlow, it helps to know where to turn for assistance or pointers. This section lists some resources to get TensorFlow running and to troubleshoot problems.

How to do it…

Here is a list of TensorFlow resources:

  • The code for this book is available online in the Packt repository.
  • The official TensorFlow Python API documentation contains documentation and examples of all of the functions, objects, and methods in TensorFlow.
  • TensorFlow's official tutorials are very thorough and detailed. They start by covering image recognition models, and work through Word2Vec, RNN models, and sequence-to-sequence models. They also have additional tutorials for generating fractals and solving PDE systems. Note that they are continually adding more tutorials and examples to this collection.
  • TensorFlow's official GitHub repository lets you view the open source code and even fork or clone the most current version of the code if you want. You can also see currently filed issues if you navigate to the issues directory.
  • A public Docker container that is kept up to date by TensorFlow is available on Docker Hub.
  • A great source for community help is Stack Overflow, which has a tag for TensorFlow. This tag seems to be growing in interest as TensorFlow gains in popularity.
  • While TensorFlow is very agile and can be used for many things, the most common use of TensorFlow is deep learning. To understand the basis of deep learning, how the underlying mathematics works, and to develop more intuition on deep learning, Google has created an online video lecture course that's available on Udacity.
  • TensorFlow has also made a site where you can visually explore training a neural network while changing the parameters and datasets, to see how different settings affect the training of neural networks.
  • Andrew Ng teaches an online course called Neural Networks and Deep Learning.
  • Stanford University has an online syllabus and detailed course notes for Convolutional Neural Networks for Visual Recognition.

About the Authors

  • Alexia Audevart

    Alexia Audevart, also a Google Developer Expert in machine learning, is the founder of datactik. She is a data scientist and helps her clients solve business problems by making their applications smarter. Her first book is a collaboration on artificial intelligence and neuroscience.

  • Konrad Banachewicz

    Konrad Banachewicz holds a PhD in statistics from Vrije Universiteit Amsterdam. He is a lead data scientist at eBay and a Kaggle Grandmaster. He worked in a variety of financial institutions on a wide array of quantitative data analysis problems. In the process, he became an expert on the entire lifetime of a data product cycle.

  • Luca Massaron

    Luca Massaron is a Google Developer Expert in machine learning with more than a decade of experience in data science. He is also the author of several best-selling books on AI and a Kaggle master who reached number 7 for his performance in data science competitions.
