You're reading from Deep Learning with MXNet Cookbook

Published in: Dec 2023
Publisher: Packt
ISBN-13: 9781800569607
Edition: 1st Edition
Reading level: Beginner

Author: Andrés P. Torres

Andrés P. Torres is the Head of Perception at Oxa, a global leader in industrial autonomous vehicles, where he leads the design and development of state-of-the-art algorithms for autonomous driving. Before that, Andrés was an advisor and Head of AI at Maekersuite, an early-stage content generation startup, where he developed several AI-based algorithms for mobile phones and the web. Prior to this, Andrés was a Software Development Manager at Amazon Prime Air, developing software to optimize operations for autonomous drones.


NumPy and MXNet ND arrays

If you have worked with data in Python before, chances are you have found yourself working with NumPy and its N-dimensional arrays (ND arrays). These are also known as tensors: the 0D variants are called scalars, the 1D variants are called vectors, and the 2D variants are called matrices.
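
For instance, in plain NumPy the dimensionality of each variant can be inspected through the `ndim` attribute:

```python
import numpy as np

# A 0D, 1D, and 2D ND array, inspected via the ndim attribute
scalar = np.array(3.14)          # 0D: a scalar
vector = np.array([1.0, 2.0])    # 1D: a vector
matrix = np.eye(2)               # 2D: a matrix (2 x 2 identity)
print(scalar.ndim, vector.ndim, matrix.ndim)  # 0 1 2
```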

MXNet provides its own ND array type, and there are two different ways to work with it. On one hand, there is the nd module, MXNet’s native and optimized way to work with MXNet ND arrays. On the other hand, there is the np module, which has the same interface and syntax as the NumPy ND array type and has also been optimized, although it is limited by the constraints of that interface. With MXNet ND arrays, we can leverage MXNet’s underlying engine, with compute optimizations such as Intel MKL and/or NVIDIA CUDA, if our hardware configuration is compatible. This means we will be able to use almost the same syntax as when working with NumPy, but accelerated by the MXNet engine and our GPUs, which NumPy does not support.

Moreover, as we will see in the next chapters, a very common operation that we will execute on MXNet is automatic differentiation on these ND arrays. By using MXNet ND array libraries, this operation will also leverage our hardware for optimum performance. NumPy does not provide automatic differentiation out of the box.
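
As a purely illustrative sketch of what automatic differentiation does (this is plain Python, not MXNet's autograd API), here is a minimal forward-mode implementation based on dual numbers:

```python
# Minimal forward-mode automatic differentiation via dual numbers.
# This is NOT MXNet code; it only illustrates the core idea that
# derivatives can be propagated mechanically alongside values.
class Dual:
    def __init__(self, val, der=0.0):
        self.val, self.der = val, der

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.der + other.der)

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)

x = Dual(3.0, 1.0)    # seed the derivative: dx/dx = 1
y = x * x + x         # f(x) = x^2 + x
print(y.val, y.der)   # 12.0 7.0  (f(3) = 12, f'(x) = 2x + 1, so f'(3) = 7)
```

MXNet's autograd works on whole ND arrays and uses reverse mode, which scales to the many-parameter case typical of neural networks; the dual-number sketch only conveys the underlying idea.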

Getting ready

If you have already installed MXNet, as described in the previous recipe, the only remaining step before using MXNet ND arrays to execute accelerated code is importing the libraries:

import numpy as np
import mxnet as mx

However, it is worth noting an important underlying difference between NumPy ND array operations and MXNet ND array operations. NumPy follows an eager evaluation strategy – that is, all operations are evaluated at the moment of execution. Conversely, MXNet uses a lazy evaluation strategy, which is better suited to large compute loads, where the actual calculation is deferred until the values are actually needed.

Therefore, when comparing performance, we will need to force MXNet to finalize all calculations before measuring the time they take. As we will see in the examples, this is achieved by calling the wait_to_read() function. Furthermore, when accessing the data with functions such as print() or .asnumpy(), execution is completed before these functions are called, which gives the wrong impression that these functions themselves are time-consuming:

  1. Let’s check a specific example and start by running it on the CPU:
    import time
    x_mx_cpu = mx.np.random.rand(1000, 1000, ctx=mx.cpu())
    start_time = time.time()
    mx.np.dot(x_mx_cpu, x_mx_cpu).wait_to_read()
    print("Time of the operation: ", time.time() - start_time)

    This will yield an output similar to the following:

    Time of the operation: 0.04673886299133301
  2. However, let’s see what happens if we measure the time without the call to wait_to_read():
    x_mx_cpu = mx.np.random.rand(1000, 1000, ctx=mx.cpu())
    start_time = time.time()
    x_2 = mx.np.dot(x_mx_cpu, x_mx_cpu)
    print("(FAKE, MXNet has lazy evaluation)")
    print("Time of the operation : ", time.time() - start_time)
    start_time = time.time()
    print(x_2)
    print("(FAKE, MXNet has lazy evaluation)")
    print("Time to display: ", time.time() - start_time)

    The following will be the output:

    (FAKE, MXNet has lazy evaluation)
    Time of the operation : 0.00118255615234375
    [[256.59583 249.70404 249.48639 ... 251.97151 255.06744 255.60669]
     [255.22629 251.69475 245.7591  ... 252.78784 253.18878 247.78052]
     [257.54187 254.29262 251.76346 ... 261.0468  268.49127 258.2312 ]
     ...
     [256.9957  253.9823  249.59073 ... 256.7088  261.14255 253.37457]
     [255.94278 248.73282 248.16641 ... 254.39209 252.4108  249.02774]
     [253.3464  254.55524 250.00716 ... 253.15712 258.53894 255.18658]]
    (FAKE, MXNet has lazy evaluation)
    Time to display: 0.042133331298828125

As we can see, the first experiment indicated that the computation took ~50 ms to complete; however, the second experiment indicated that the computation took ~1 ms (50 times less!), while displaying the result took more than 40 ms. This is misleading: in the second experiment we measured the performance incorrectly, because the deferred computation was only executed when print() accessed the result. Refer to the first experiment and the call to wait_to_read() for a proper performance measurement.
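
This pitfall is not specific to MXNet: any lazily evaluated system behaves the same way. The following toy sketch in plain Python (not MXNet code) shows why timing the call, rather than the evaluation, is misleading:

```python
import time

# A toy "lazy value": the expensive work only runs when .get() is
# called, mimicking how MXNet defers computation until the result is
# needed (wait_to_read(), print(), .asnumpy(), ...).
class Lazy:
    def __init__(self, fn):
        self.fn, self.result = fn, None

    def get(self):
        if self.result is None:
            self.result = self.fn()  # pay the cost here, not earlier
        return self.result

def expensive():
    time.sleep(0.05)  # stands in for a large matrix product
    return 42

t0 = time.time()
v = Lazy(expensive)               # returns instantly: nothing computed yet
enqueue_time = time.time() - t0
t0 = time.time()
answer = v.get()                  # the ~50 ms cost is paid here
eval_time = time.time() - t0
print(answer, enqueue_time < eval_time)  # 42 True
```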

How to do it...

In this section, we will compare performance in terms of computation time for two compute-intensive operations:

  • Matrix creation
  • Matrix multiplication

We will compare five different compute profiles for each operation:

  • Using the NumPy library (no CPU or GPU acceleration)
  • Using the MXNet np module with CPU acceleration but no GPU
  • Using the MXNet np module with CPU acceleration and GPU acceleration
  • Using the MXNet nd module with CPU acceleration but no GPU
  • Using the MXNet nd module with CPU acceleration and GPU acceleration

To finalize, we will plot the results and draw some conclusions.

Timing data structures

We will store the computation times in five dictionaries, one for each compute profile (timings_np, timings_mx_np_cpu, timings_mx_np_gpu, timings_mx_nd_cpu, and timings_mx_nd_gpu). The initialization of the data structures is as follows:

timings_np = {}
timings_mx_np_cpu = {}
timings_mx_np_gpu = {}
timings_mx_nd_cpu = {}
timings_mx_nd_gpu = {}

We will run each operation (matrix creation and matrix multiplication) with matrices of different orders, namely the following:

matrix_orders = [1, 5, 10, 50, 100, 500, 1000, 5000, 10000]

Matrix creation

We define three functions to generate matrices; the first function will use the NumPy library to generate a matrix, and it will receive the matrix order as an input parameter. The second function will use the MXNet np module, and the third function will use the MXNet nd module. For the second and third functions, apart from the matrix order, we will provide as an input parameter the context where the matrix needs to be created. This context specifies whether the result (the created matrix, in this case) must be computed on the CPU or the GPU (and on which GPU if there are multiple devices available):

def create_matrix_np(n):
    """
    Given n, creates a square n x n matrix,
    with each matrix value taken from a random
    uniform distribution between [0, 1].
    Returns the created matrix a.
    Uses NumPy.
    """
    a = np.random.rand(n, n)
    return a
def create_matrix_mx_np(n, ctx=mx.cpu()):
    """
    Given n, creates a square n x n matrix,
    with each matrix value taken from a random
    uniform distribution between [0, 1].
    Returns the created matrix a.
    Uses MXNet NumPy syntax and context ctx.
    """
    a = mx.np.random.rand(n, n, ctx=ctx)
    a.wait_to_read()
    return a
def create_matrix_mx_nd(n, ctx=mx.cpu()):
    """
    Given n, creates a square n x n matrix,
    with each matrix value taken from a random
    uniform distribution between [0, 1].
    Returns the created matrix a.
    Uses MXNet ND native syntax and context ctx.
    """
    a = mx.nd.random.uniform(shape=(n, n), ctx=ctx)
    a.wait_to_read()
    return a

To store the necessary data for our performance comparison later, we use the structures created previously, with the following code:

timings_np["create"] = []
for n in matrix_orders:
    result = %timeit -o create_matrix_np(n)
    timings_np["create"].append(result.best)
timings_mx_np_cpu["create"] = []
for n in matrix_orders:
    result = %timeit -o create_matrix_mx_np(n)
    timings_mx_np_cpu["create"].append(result.best)
timings_mx_np_gpu["create"] = []
ctx = mx.gpu()
for n in matrix_orders:
    result = %timeit -o create_matrix_mx_np(n, ctx)
    timings_mx_np_gpu["create"].append(result.best)
timings_mx_nd_cpu["create"] = []
for n in matrix_orders:
    result = %timeit -o create_matrix_mx_nd(n)
    timings_mx_nd_cpu["create"].append(result.best)
timings_mx_nd_gpu["create"] = []
ctx = mx.gpu()
for n in matrix_orders:
    result = %timeit -o create_matrix_mx_nd(n, ctx)
    timings_mx_nd_gpu["create"].append(result.best)
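
Note that %timeit is an IPython/Jupyter magic. If you run these measurements from a plain Python script instead, the standard timeit module provides the same best-of-several-runs measurement; a sketch for the NumPy creation case (mirroring the function defined above):

```python
import timeit
import numpy as np

def create_matrix_np(n):
    """Square n x n matrix drawn from a uniform [0, 1] distribution."""
    return np.random.rand(n, n)

# timeit.repeat mirrors `%timeit -o ...` followed by `result.best`:
# run the statement `number` times per trial, keep `repeat` trials,
# and take the best (minimum) one.
times = timeit.repeat(lambda: create_matrix_np(100), number=10, repeat=3)
best = min(times) / 10  # best time per call, in seconds
print(best > 0)  # True
```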

Matrix multiplication

We define three functions to compute the matrix multiplications; the first function will use the NumPy library and will receive the matrices to multiply as input parameters. The second function will use the MXNet np module, and the third function will use the MXNet nd module. The second and third functions take the same parameters. The context where the multiplication will happen is given by the context where the matrices were created; no extra parameter needs to be added. Both matrices need to have been created in the same context, or an error will be triggered:

def multiply_matrix_np(a, b):
    """
    Multiplies two square matrices a and b
    and returns the result c.
    Uses NumPy.
    """
    c = np.dot(a, b)
    return c
def multiply_matrix_mx_np(a, b):
    """
    Multiplies two square matrices a and b
    and returns the result c.
    Uses MXNet NumPy syntax.
    """
    c = mx.np.dot(a, b)
    c.wait_to_read()
    return c
def multiply_matrix_mx_nd(a, b):
    """
    Multiplies two square matrices a and b
    and returns the result c.
    Uses MXNet ND native syntax.
    """
    c = mx.nd.dot(a, b)
    c.wait_to_read()
    return c

To store the necessary data for our performance comparison later, we will use the structures created previously, with the following code:

timings_np["multiply"] = []
for n in matrix_orders:
    a = create_matrix_np(n)
    b = create_matrix_np(n)
    result = %timeit -o multiply_matrix_np(a, b)
    timings_np["multiply"].append(result.best)
timings_mx_np_cpu["multiply"] = []
for n in matrix_orders:
    a = create_matrix_mx_np(n)
    b = create_matrix_mx_np(n)
    result = %timeit -o multiply_matrix_mx_np(a, b)
    timings_mx_np_cpu["multiply"].append(result.best)
timings_mx_np_gpu["multiply"] = []
ctx = mx.gpu()
for n in matrix_orders:
    a = create_matrix_mx_np(n, ctx)
    b = create_matrix_mx_np(n, ctx)
    result = %timeit -o multiply_matrix_mx_np(a, b)
    timings_mx_np_gpu["multiply"].append(result.best)
timings_mx_nd_cpu["multiply"] = []
for n in matrix_orders:
    a = create_matrix_mx_nd(n)
    b = create_matrix_mx_nd(n)
    result = %timeit -o multiply_matrix_mx_nd(a, b)
    timings_mx_nd_cpu["multiply"].append(result.best)
timings_mx_nd_gpu["multiply"] = []
ctx = mx.gpu()
for n in matrix_orders:
    a = create_matrix_mx_nd(n, ctx)
    b = create_matrix_mx_nd(n, ctx)
    result = %timeit -o multiply_matrix_mx_nd(a, b)
    timings_mx_nd_gpu["multiply"].append(result.best)

Drawing conclusions

The first step before making any assessments is to plot the data we have captured in the previous steps. For this step, we will use the pyplot module from a library called Matplotlib, which will allow us to create charts easily. The following code plots the runtime (in seconds) for the matrix generation and all the matrix orders computed:

import matplotlib.pyplot as plt
fig = plt.figure()
plt.plot(matrix_orders, timings_np["create"], color='red', marker='s')
plt.plot(matrix_orders, timings_mx_np_cpu["create"], color='blue', marker='o')
plt.plot(matrix_orders, timings_mx_np_gpu["create"], color='green', marker='^')
plt.plot(matrix_orders, timings_mx_nd_cpu["create"], color='yellow', marker='p')
plt.plot(matrix_orders, timings_mx_nd_gpu["create"], color='orange', marker='*')
plt.title("Matrix Creation Runtime", fontsize=14)
plt.xlabel("Matrix Order", fontsize=14)
plt.ylabel("Runtime (s)", fontsize=14)
plt.grid(True)
ax = fig.gca()
ax.set_xscale("log")
ax.set_yscale("log")
plt.legend(["NumPy", "MXNet NumPy (CPU)", "MXNet NumPy (GPU)", "MXNet ND (CPU)", "MXNet ND (GPU)"])
plt.show()

Similarly to the previous code block, the following code plots the runtime (in seconds) for the matrix multiplication across all the matrix orders computed:

import matplotlib.pyplot as plt
fig = plt.figure()
plt.plot(matrix_orders, timings_np["multiply"], color='red', marker='s')
plt.plot(matrix_orders, timings_mx_np_cpu["multiply"], color='blue', marker='o')
plt.plot(matrix_orders, timings_mx_np_gpu["multiply"], color='green', marker='^')
plt.plot(matrix_orders, timings_mx_nd_cpu["multiply"], color='yellow', marker='p')
plt.plot(matrix_orders, timings_mx_nd_gpu["multiply"], color='orange', marker='*')
plt.title("Matrix Multiplication Runtime", fontsize=14)
plt.xlabel("Matrix Order", fontsize=14)
plt.ylabel("Runtime (s)", fontsize=14)
plt.grid(True)
ax = fig.gca()
ax.set_xscale("log")
ax.set_yscale("log")
plt.legend(["NumPy", "MXNet NumPy (CPU)", "MXNet NumPy (GPU)", "MXNet ND (CPU)", "MXNet ND (GPU)"])
plt.show()

These are the plots displayed (the results will vary according to the hardware configuration):

Figure 1.5 – Runtimes – a) Matrix creation, and b) Matrix multiplication


Important note

Note that the charts use a logarithmic scale for both axes, horizontal and vertical (the differences are larger than they seem). Furthermore, the actual values depend on the hardware architecture that the computations are run on; therefore, your specific results will vary.

There are several conclusions that can be drawn, both from each individual operation and collectively:

  • For smaller matrix orders, using NumPy is much faster in both operations. This is because MXNet works in a different memory space, and the time needed to move the data to that memory space is longer than the actual compute time.
  • In matrix creation, for larger matrix orders, the difference between NumPy (remember, it’s CPU only) and MXNet with the np module and CPU acceleration is negligible, but MXNet with the nd module on the CPU is ~2x faster. For matrix multiplication, and depending on your hardware, MXNet with CPU acceleration can be ~2x faster (regardless of the module). This is because MXNet uses Intel MKL to optimize CPU computations.
  • In the ranges that are interesting for deep learning – that is, large computational loads involving matrix orders > 1,000 (which can represent data such as images composed of several megapixels or large language dictionaries) – GPUs deliver typical gains of several orders of magnitude (~200x for creation and ~40x for multiplication, growing with every increase of matrix order). This is by far the most compelling reason to work with GPUs when running deep learning experiments.
  • When using the GPU, the MXNet np module is faster than the MXNet nd module for creation (~7x), but the difference is negligible for multiplication. Typically, deep learning algorithms are closer to multiplication in terms of computational load, and therefore, a priori, there is no significant advantage in using the np module over the nd module. However, MXNet recommends using the native MXNet nd module (and the author subscribes to this recommendation) because some operations in the np module are not supported by autograd (MXNet’s auto-differentiation module). We will see in the upcoming chapters, when we train neural networks, how the autograd module is used and why it is critical.
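
For reference, speedup figures such as those quoted above are simple ratios between the stored timings. A sketch with made-up numbers (the real values come from the %timeit measurements and vary by hardware):

```python
# Hypothetical best times (in seconds) for a single matrix order; the
# real dictionaries hold one entry per order in matrix_orders.
timings_np = {"multiply": [2.0]}          # illustrative NumPy time
timings_mx_nd_gpu = {"multiply": [0.05]}  # illustrative MXNet GPU time

# Element-wise ratio of CPU (NumPy) time to GPU time per matrix order
speedups = [t_cpu / t_gpu for t_cpu, t_gpu in
            zip(timings_np["multiply"], timings_mx_nd_gpu["multiply"])]
print(speedups)  # [40.0]
```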

How it works...

MXNet provides two optimized modules to work with ND arrays, including one that is a drop-in replacement for NumPy. The advantages of operating with MXNet ND arrays are twofold:

  • MXNet ND array operations support automatic differentiation. As we will see in the following chapters, automatic differentiation is a key feature that allows developers to concentrate on the forward pass of the models, letting the backward pass be automatically derived.
  • Furthermore, operations with MXNet ND arrays are optimized for the underlying hardware, yielding impressive results with GPU acceleration. We computed results for matrix creation and matrix multiplication to validate this conclusion experimentally.

There’s more…

In this recipe, we have barely scratched the surface of MXNet operations with ND arrays. If you want to read more about MXNet and ND arrays, this is the link to the official MXNet API reference: https://mxnet.apache.org/versions/1.0.0/api/python/ndarray/ndarray.html.

Furthermore, a very interesting tutorial can be found in the official MXNet documentation: https://gluon.mxnet.io/chapter01_crashcourse/ndarray.html.

Moreover, we have taken a glimpse at how to measure performance on MXNet. We will revisit this topic in the following chapters; however, a good deep-dive into the topic is given in the official MXNet documentation: https://mxnet.apache.org/versions/1.8.0/api/python/docs/tutorials/performance/backend/profiler.html.
