# Introduction to Deep Learning in Go

This book will very quickly jump into the practicalities of implementing **Deep Neural Networks** (**DNNs**) in Go. Simply put, this book's title contains its aim. This means there will be a lot of technical detail, a lot of code, and (not too much) math. By the time you finally close this book or turn off your Kindle, you'll know how (and why) to implement modern, scalable DNNs and be able to repurpose them for your needs in whatever industry or mad science project you're involved in.

Our choice of Go reflects the maturing of the landscape of Go libraries built for the kinds of operations our DNNs perform. There is, of course, much debate about the trade-offs made when selecting languages or libraries, and we will devote a section of this chapter to our views and argue for the choices we've made.

However, what is code without context? Why do we care about this seemingly convoluted mix of linear algebra, calculus, statistics, and probability? Why use computers to recognize things in images or identify aberrant patterns in financial data? And, perhaps most importantly, what do the approaches to these tasks have in common? The initial sections of this book will try to provide some of this context.

Scientific endeavor, when broken up into the disciplines that represent its institutional and industry specializations, is governed by an idea of progress. By this, we mean a kind of momentum, a moving forward, toward some end. For example, the ideal goal of medicine is to be able to identify and cure any ailment or disease. Physicists aim to understand completely the fundamental laws of nature. Progress trends in this general direction. Science is itself an optimization method. So, what might the ultimate goal of **Machine Learning** (**ML**) be?

We'll be upfront. We think it's the creation of **Artificial General Intelligence** (**AGI**). That's the prize: a general-purpose learning computer to take care of the jobs and leave life to people. As we will see when we cover the history of **Deep Learning** (**DL**) in detail, founders of the top **Artificial Intelligence** (**AI**) labs agree that AGI represents a *meta-solution* to many of the complex problems in our world today, from economics to medicine to government.

This chapter will cover the following topics:

- Why DL?
- DL—history and applications
- Overview of ML in Go
- Using Gorgonia

# Introducing DL

We will now offer a high-level view of why DL is important and how it fits into the discussion about AI. Then, we will look at the historical development of DL, as well as current and future applications.

# Why DL?

So, who are you, dear reader? Why are you interested in DL? Do you have your private vision for AI? Or do you have something more modest? What is your *origin story*?

In our survey of colleagues, teachers, and meetup acquaintances, the origin story of someone with a more formal interest in machines has a few common features. It doesn't matter much if you grew up playing games against the computer, an invisible enemy who sometimes glitched out, or if you chased down actual bots in *id Software's Quake* back in the late 1990s; the idea of some combination of software and hardware thinking and acting independently had an impact on each of us early on in life.

And then, as time passed, with age, education, and exposure to pop culture, your ideas grew refined and maybe you ended up as a researcher, engineer, hacker, or hobbyist, and now you're wondering how you might participate in booting up the grand machine.

If your interests are more modest, say you are a data scientist looking to understand cutting-edge techniques, but are ambivalent about all of this talk of sentient software and science fiction, you are, in many ways, better prepared for the realities of ML in 2019 than most. Each of us, regardless of the scale of our ambition, must understand the logic of code and hard work through trial and error. Thankfully, we have very fast graphics cards.

And what is the result of these basic truths? Right now, in 2019, DL has had an impact on our lives in numerous ways. Hard problems are being solved. Some trivial, some not. Yes, Netflix has a model of your most embarrassing movie preferences, but Facebook has automatic image annotation for the visually impaired. Understanding the potential impact of DL is as simple as watching the expression of joy on the face of someone who has just seen a photo of a loved one for the first time.

# DL – a history

We will now briefly cover the history of DL and the historical context from which it emerged, including the following:

- The idea of **AI**
- The beginnings of computer science/information theory
- Current academic work about the state/future of DL systems

While we are specifically interested in DL, the field didn't emerge out of nothing. It is a group of models/algorithms within ML, which is itself a branch of computer science, and it forms one approach to AI. The other, so-called **symbolic AI**, revolves around hand-crafted (rather than learned) features and rules written in code, rather than a weighted model that contains patterns extracted from data algorithmically.

The idea of thinking machines, before becoming a science, was very much a fiction that began in antiquity. The Greek god of arms manufacturing, *Hephaestus*, built automatons out of gold and silver. They served his whims and are an early example of human imagination naturally considering what it might take to replicate an embodied form of itself.

Bringing the history forward a few thousand years, there are several key figures in 20th-century information theory and computer science who built the platform that allowed the development of AI as a distinct field, including the recent work in DL we will be covering.

The first major figure, Claude Shannon, offered us a general theory of communication. Specifically, he described, in his landmark paper, *A Mathematical Theory of Communication*, how to ensure against information loss when transmitting over an imperfect medium (like, say, using vacuum tubes to perform computation). This notion, particularly his noisy-channel coding theorem, proved crucial for handling arbitrarily large quantities of data and algorithms reliably, without the errors of the medium itself being introduced into the communications channel.

Alan Turing described his *Turing machine* in 1936, offering us a universal model of computation. With the fundamental building blocks he described, he defined the limits of what a machine might compute. His work, in turn, influenced John Von Neumann's idea of the *stored-program* computer. The key insight from Turing's work is that digital computers can simulate any process of formal reasoning (the *Church-Turing* hypothesis). The following diagram shows the Turing machine process:

*So, you mean to tell us, Mr. Turing, that computers might be made to reason…like us?!*

John Von Neumann was himself influenced by Turing's 1936 paper. Before the development of the transistor, when vacuum tubes were the only means of computation available (in systems such as ENIAC and its derivatives), John Von Neumann published his final work. It remained incomplete at his death and is entitled *The Computer and the Brain*. Despite remaining incomplete, it gave early consideration to how models of computation may operate in the brain as they do in machines, including observations from early neuroscience around the connections between neurons and synapses.

Since AI was first conceived as a discrete field of research in 1956, with ML coined in 1959, the field has gone through a much-discussed ebb and flow—periods where hype and funding were plentiful, and periods where private sector money was non-existent and research conferences wouldn't even accept papers that emphasized neural network approaches to building AI systems.

Within AI itself, these competing approaches cannibalized research dollars and talent. Symbolic AI met its limitations in the sheer impossibility of handcrafting rules for advanced tasks such as image classification, speech recognition, and machine translation. ML sought to radically reconfigure this process. Instead of applying a bunch of human-written rules to data and hoping to get answers, human labor was, instead, to be spent on building a machine that could infer rules from data when the answers were known. This is an example of *supervised learning*, where the machine learns an essential *cat-ness* after processing thousands of example images with an associated *cat* label.

Quite simply, the idea was to have a system that could generalize. After all, the goal is AGI. Take a picture of your family's newest furry feline and the computer, using its understanding of *cat-ness*, correctly identifies a *cat*! An active area of research within ML, one thought essential for building a general AI, is *transfer learning*, where we might take the machine that understands *cat-ness* and plug it into a machine that, in turn, acts when *cat-ness* is identified. This is the approach many AI labs around the world are taking: building systems out of systems, augmenting algorithmic weakness in one area with statistical near certainty in another, and, hopefully, building a system that better serves human (or business) needs.

The notion of *serving human needs* brings us to an important point regarding the ethics of AI (and the DL approaches we will be looking at). There has been much discussion in the media and academic or industry circles around the ethical implications of these systems. What does it mean for our society if we have easy, automated, widespread surveillance thanks to advances in computer vision? What about automated weapons systems or manufacturing? It is no longer a stretch to imagine vast warehouses staffed by nothing with a pulse. What then for the people who used to do those jobs?

Of course, full consideration of these important issues lies outside the scope of this book, but this is the context in which our work takes place. You will be one of the privileged few able to build these systems and move the field forward. The work of the **Future of Humanity Institute** at Oxford University, run by Nick Bostrom, and the **Future of Life Institute**, run by MIT physicist, Max Tegmark, are two examples of where the kind of academic debate around AI ethics issues is taking place. This debate is not limited to academic or non-profit circles, however; DeepMind, an Alphabet company whose goal is to be an *Apollo program for AGI*, launched *DeepMind Ethics & Society* in October 2017.

This may seem far removed from the world of code and CUDA and neural networks to recognize cat pictures, but, as progress is made and these systems become more advanced and have more wide-ranging applications, our societies will face real consequences. As researchers and developers, it behooves us to have some answers, or at least ideas of how we might deal with these challenges when we have to face them.

# DL – hype or breakthrough?

DL and the associated hype is a relatively recent development. Most discussion of its *emergence* centers around the ImageNet benchmarks of 2012, where a deep convolutional neural network beat the previous year's error rate by 9%, a significant improvement where previous winners had made incremental improvements at best with techniques that used hand-crafted features in their models. The following diagram shows this improvement:

Despite the recent hype, the components that make DL work—the techniques that allow us to train deep models—were developed in the 1980s by Geoffrey Hinton and his group at the University of Toronto, and have since proven very effective in image classification and various other tasks. Their early work took place during one of the *ebb* periods discussed earlier in this chapter; indeed, it was wholly dependent on funding from the **Canadian Institute for Advanced Research** (**CIFAR**).

As the 21st century began in earnest, after the tech bubble that had burst in March 2000 began to inflate again, the availability of high-performance GPUs and the growth in computational power more generally meant that these techniques, which had been developed decades earlier but had gone unused due to a lack of funding and industry interest, suddenly became viable. Benchmarks that previously saw only incremental improvements in image recognition, speech recognition, natural language processing, and sequence modeling all had their *y*-axes adjusted.

It was not just massive advances in hardware paired with old algorithms that got us to this point. There have also been algorithmic advances that allow us to train particularly deep networks. The most well-known of these is batch normalization, introduced in 2015. It stabilizes the numeric values flowing between layers and can prevent exploding gradients, reducing training time dramatically. There is still active debate about *why* batch normalization is so effective. For example, a paper published in May 2018 refuted the central premise of the original paper, arguing that it is not the *internal covariate shift* that is reduced; rather, batch normalization *makes the optimization landscape smoother*, that is, gradients propagate more reliably, and the effect of a given learning rate on training time and stability becomes more predictable.

Collectively, these threads—from the folk science of ancient Greek myths to the very real breakthroughs in information theory, neuroscience, and computer science, specifically in models of computation—have combined to produce network architectures, and the algorithms needed to train them, that scale well to solving many fundamental AI tasks that had proven intractable for decades.

# Defining deep learning

Now, let's take a step back and start with a simple, working definition of DL. As we work through this book, our understanding of this term will evolve, but, for now, let's consider a simple example. We have an image of a person. How can we *show* this image to a computer? How can we *teach* the computer to associate this image with the word *person*?

First, we figure out a representation of this image, say the RGB values for every pixel in the image. We then feed that array of values (together with several trainable parameters) into a series of operations we're quite familiar with (multiplication and addition). This produces a new representation that we can use to compare against a representation we know maps to the label, *person*. We automate this process of comparison and update the values of our parameters as we go.

This description covers a simple, shallow ML system. We'll get into more detail in a later chapter devoted to neural networks but, for now, to make this system deep, we increase the number of operations on a greater number of parameters. This allows us to capture more information regarding the thing we're representing (the person's image). The biological model that influences the design of this system is the human nervous system, including neurons (the things we fill with our representations) and synapses (the trainable parameters).
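That loop of multiply, add, compare, and update can be sketched in a few lines of Go. This is a deliberately tiny, hypothetical example—one input, one trainable parameter, and a squared-error comparison—rather than a real network:

```go
package main

import "fmt"

func main() {
    x := 2.0      // our "representation" (a single feature)
    target := 6.0 // the answer we already know
    w := 0.0      // the trainable parameter
    lr := 0.1     // how far to nudge the parameter each step

    for i := 0; i < 100; i++ {
        pred := w * x                   // the familiar operation: multiply
        grad := 2 * (pred - target) * x // gradient of squared error w.r.t. w
        w -= lr * grad                  // update the parameter
    }
    fmt.Printf("learned w = %.4f\n", w) // prints: learned w = 3.0000
}
```

A deep network repeats this same compare-and-update cycle over millions of parameters and many layers of operations, but the skeleton is unchanged.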

The following diagram shows the ML system in progress:

So, DL is just an evolutionary twist on the 1957 *perceptron*, the simplest and original binary classifier. This twist, together with dramatic increases in computing power, is the difference between a system that doesn't work and a system that allows a car to drive autonomously.
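To ground that claim, here is a minimal Go sketch of the 1957 perceptron learning rule, trained on the logical AND function. The type and method names are our own illustration, not any library's API:

```go
package main

import "fmt"

// perceptron is the original binary classifier: a weighted sum of the
// inputs plus a bias, thresholded at zero.
type perceptron struct {
    w    []float64
    bias float64
}

func (p *perceptron) predict(x []float64) int {
    sum := p.bias
    for i, v := range x {
        sum += p.w[i] * v
    }
    if sum > 0 {
        return 1
    }
    return 0
}

// train applies Rosenblatt's rule: nudge the weights only when the
// prediction disagrees with the label.
func (p *perceptron) train(inputs [][]float64, labels []int, lr float64, epochs int) {
    for e := 0; e < epochs; e++ {
        for i, x := range inputs {
            delta := float64(labels[i] - p.predict(x))
            for j := range p.w {
                p.w[j] += lr * delta * x[j]
            }
            p.bias += lr * delta
        }
    }
}

func main() {
    inputs := [][]float64{{0, 0}, {0, 1}, {1, 0}, {1, 1}}
    labels := []int{0, 0, 0, 1} // logical AND
    p := &perceptron{w: make([]float64, 2)}
    p.train(inputs, labels, 0.1, 20)
    for i, x := range inputs {
        fmt.Println(x, "->", p.predict(x), "want", labels[i])
    }
}
```

The "evolutionary twist" of DL amounts to stacking many such units into layers, replacing the hard threshold with smoother activation functions, and training the whole stack with gradient descent.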

Beyond self-driving cars, there are numerous applications for DL and related approaches in farming, crop management, and satellite image analysis. Advanced computer vision powers machines that remove weeds and reduce pesticide use. We have near-real-time voice search that is fast and accurate. This is the fundamental stuff of society, from food production to communication. Additionally, we are also on the cusp of compelling, real-time video and audio generation, which will make today's privacy debates or drama about what is *fake news* look minor.

Long before we get to AGI, we can improve the world around us using the discoveries we make along the way. DL is one of these discoveries. It will drive an increase in automation, which, as long as the political change that accompanies it is supportive, can offer improvements across any number of industries, meaning goods and services will get cheaper, faster, and more widely available. Ideally, this means people will be set increasingly free from the routines of their ancestors.

The darker side of progress is not to be forgotten either. Machine vision that can identify victims can also identify targets. Indeed, the Future of Life Institute's open letter on autonomous weapons (*Autonomous Weapons: an Open Letter from AI & Robotics Researchers*), endorsed by science and tech luminaries such as Stephen Hawking and Elon Musk, is an example of the interplay and tensions between academic departments, industry labs, and governments about what the right kind of progress is. In our world, the nation-state has traditionally controlled the guns and the money. Advanced AI can be weaponized, and this is a race where perhaps one group wins and the rest of us lose.

More concretely, the field of ML is progressing incredibly fast. How might we measure this? The premier ML conference, **Neural Information Processing Systems** (**NIPS**), had over seven times as many registrations in 2017 as it did in 2010.

Registrations for 2018 happened more in the manner of a rock concert than a dry technical conference, reflected in the following statistic tweeted out by the organizers themselves:

The *de facto* central repository of ML preprints, **arXiv**, has a growth chart so extreme it resembles a hockey stick, and tools have emerged to help researchers track all of the new work. One example is arxiv-sanity (http://www.arxiv-sanity.com/), built by Andrej Karpathy, director of AI at Tesla. The site lets us sort and group papers, pulling the research we're interested in from the stream of publications with relative ease.

We cannot predict what will happen to the rate of progress over the next five years. The professional guesses of venture capitalists and pundits range from exponential to *the next AI winter is nigh*. But we have techniques and libraries and compute power *now,* and knowing how to use them at their limits for a natural language processing or computer vision task can help to solve real-world problems.

This is what our book aims to show you how to do.

# Overview of ML in Go

This section will take a look at the ML ecosystem in Go, first discussing the essentials we want from a library, and then assessing each of the main Go ML libraries in turn.

Go's ML ecosystem has historically been quite limited. The language was introduced in 2009, well before the DL revolution that brought many new programmers into the fold. You might assume that Go would have seen the same growth in libraries and tools as other languages. History, instead, determined that many of the higher-level APIs for the mathematical operations that underpin our networks appeared as Python libraries (or gained complete Python bindings). There are numerous well-known examples of this, including PyTorch, Keras, TensorFlow, Theano, and Caffe (you get the idea).

Unfortunately, these libraries have either zero or incomplete bindings for Go. For example, TensorFlow can do inference (*Is this a cat or not?*), but not training (*What is a cat anyway? Oh, okay, I'll take a look at these examples and figure it out*). While this takes advantage of Go's strengths when it comes to deployment (compilation down to single binary, compiler speed, and low memory footprint), from a developer's perspective, you're then forced to work across two languages (Python for training your model and Go for running it).

Issues you may face, beyond the cognitive hit of switching syntax when designing, implementing, or troubleshooting, extend to environment and configuration problems. These problems include questions such as: *Is my Go environment configured properly*? *Is my Python 2 binary symlinked to Python or is it Python 3*? *Is TensorFlow GPU working properly*? If our interest is in designing the best model and getting it trained and deployed in the minimum amount of time, none of the Python or Go bindings libraries are suitable.

It is important, at this point, to ask: so, what do we want out of a *DL library* in Go? Quite simply, we want as much unnecessary complication abstracted away as possible while preserving flexibility and control over our model and how it is trained.

What does this mean in practice? The following list outlines the answers to this query:

- We do not want to interface with **Basic Linear Algebra Subprograms** (**BLAS**) directly to construct basic operations such as multiplication and addition.
- We do not want to define a tensor type and associated function(s) each time we implement a new network.
- We do not want to write an implementation of **Stochastic Gradient Descent** (**SGD**) from scratch each time we want to train our network.

The following are some of the things that will be covered in this book:

- **Automatic or symbolic differentiation**: Our DNN is trying to learn some function. It iteratively *solves* the problem of *what is the function that will take an input image and output the label cat?* by calculating the gradient (the *gradient descent optimizations*) with respect to the loss (*how wrong is our function?*). This allows us to understand whether to change the weights in our network and by how much, with the specific mode of differentiation *breaking up* the calculation of these gradients (effectively using the chain rule), giving us the performance we need to be able to train deep networks with millions of parameters.
- **Numerical stabilization functions**: These are essential for DL, as we will explore in later sections of this book. A primary example is **Batch Normalization**, or BatchNorm, as the attendant function is often called. It aims to put our data on the same scale to increase training speed, and it reduces the possibility of maximum values cascading through the layers and causing gradient explosion (something we will discuss in greater detail in Chapter 2, *What is a Neural Network and How Do I Train One?*).
- **Activation functions**: These are mathematical operations that introduce nonlinearities into the various layers of our neural network and help to determine which neurons in a layer will be *activated*, passing their values down to the next layer in the network. Examples include Sigmoid, **Rectified Linear Unit** (**ReLU**), and Softmax. These will be considered in greater detail in Chapter 2, *What is a Neural Network and How Do I Train One?*
- **Gradient descent optimizations**: We will also cover these extensively in Chapter 2, *What is a Neural Network and How Do I Train One?* But, as the primary optimization method used in DNNs, we consider this a core function necessary for any library that has DL as its purpose.
- **CUDA support**: Nvidia's drivers allow us to offload the fundamental operations involved in our DNNs to our GPU. GPUs are really great for parallel workloads involving matrix transformations (indeed, this was their original purpose: computing the world-geometry of games) and can reduce the time it takes to train your model by an order of magnitude or more. Suffice to say, CUDA support is essential for modern DNN implementations and is therefore available in the major Python libraries described previously.
- **Deployment tools**: As we will cover in Chapter 9, *Building a Deep Learning Pipeline*, deployment of a model for training or inference is often overlooked in discussions about DL. With neural network architectures growing more complex, and with the availability of vast amounts of data, training your network on, say, AWS GPUs, or deploying your trained model to other systems (for example, a recommendation system integrated into a news website) is a critical step. It improves your training time and extends the amount of computing that can be used, which means being able to experiment with more complex models too. Ideally, we want a library that makes it easy to integrate with existing tools or has tools of its own.

Now that we've got a reasonable set of requirements for our ideal library, let's take a look at a number of the popular options out there in the community. The following list is by no means exhaustive; however, it covers most of the major ML-related Go projects on GitHub, from the most narrow to the most general.

# ML libraries

We will now consider each of the main ML libraries, assessing their utility based on the criteria we defined earlier, including any negative aspects or shortcomings.

# Using Gorgonia

At the time of writing this book, there are two libraries that would typically be considered for DL in Go, TensorFlow and Gorgonia. However, while TensorFlow is definitely well regarded and has a full-featured API in Python, this is not the case in Go. As discussed previously, the Go bindings for TensorFlow are only suited to loading models that have already been created in Python, but not for creating models from scratch.

Gorgonia has been built from the ground up to be a Go library that is able to both train ML models and perform inference. This is a particularly valuable property, especially if you have an existing Go application or you are looking to build a Go application. Gorgonia allows you to develop, train, and maintain your DL model in your existing Go environment. For this book, we will be using Gorgonia exclusively to build models.

Before we go on to build models, let's go through some basics of Gorgonia and learn how to build simple equations in it.

# The basics of Gorgonia

**Gorgonia** is a lower-level library, which means that we need to build the equations and the architecture for models ourselves. This means that there isn't a built-in DNN classifier function that will magically create an entire model with several hidden layers and immediately be ready to apply to your dataset.

Gorgonia facilitates DL by being a library that makes working with multidimensional arrays easy. It does this by providing loads of operators to work with so you can build the underlying mathematical equations that make up layers in a DL model. We can then proceed to use these layers in our model.

Another important feature of Gorgonia is performance. By removing the need to think about how to optimize tensor operations, we can focus on building the model and ensuring the architecture is correct, rather than worrying about whether or not our model will be performant.

As Gorgonia is a little lower-level than a typical ML library, building a model takes a few more steps. However, this does not mean that building a model in Gorgonia is difficult. It requires the following three basic steps:

- Create a computation graph
- Input the data
- Execute the graph

Wait, what's a computation graph? A **computation graph** is a directed graph where each of the nodes is either an operation or a variable. Variables can be fed into operations, which will then produce a value. This value can then be fed into another operation. In more familiar terms, a graph is like a function that takes all of the variables and then produces a result.

A variable can be anything; we can make it a single scalar value, a vector (that is, an array), or a matrix. In DL, we typically work with a more generalized structure called a tensor; a tensor can be thought of as something similar to an *n*-dimensional matrix.

The following screenshot shows a visual representation of *n*-dimensional tensors:

We represent equations as graphs because this makes it easier for us to optimize the performance of our model. This is enabled by the fact that, by putting each node in a directed graph, we have a good idea of what its dependencies are. Since we model each node as an independent piece of code, we know that all it needs to execute are its dependencies (which can be other nodes or other variables). Also, as we traverse the graph, we can know which nodes are independent of each other and run those concurrently.

For example, take the following diagram:

As **A**, **B**, and **C** are independent, we can easily compute them concurrently. The computation of **V** requires both **A** and **B** to be ready, while **W** only requires **B**. This pattern continues down the following levels until we are ready to compute the final output, **Z**.
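This scheduling idea can be sketched with goroutines. In the following toy version (the node names and values are hypothetical), **A**, **B**, and **C** run concurrently, and the later nodes use only the inputs they depend on:

```go
package main

import (
    "fmt"
    "sync"
)

func main() {
    var a, b, c float64
    var wg sync.WaitGroup

    // A, B, and C have no dependencies, so they can run concurrently.
    wg.Add(3)
    go func() { defer wg.Done(); a = 1.0 }()
    go func() { defer wg.Done(); b = 2.0 }()
    go func() { defer wg.Done(); c = 3.0 }()
    wg.Wait()

    // V needs A and B; W needs only B. Once their inputs are ready,
    // they too are independent of each other.
    v := a + b
    w := b * b
    z := v + w + c // the final output draws on everything above
    fmt.Println(z) // 10
}
```

A real graph engine does this systematically, walking the dependency graph and dispatching every node whose inputs are ready.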

# Simple example – addition

The easiest way to understand how this all fits together is by building a simple example.

To start, let's implement a simple graph to add two numbers together—basically, this would be: *c = a + b*:

- First, let's import some libraries—most importantly, Gorgonia, as follows:

```go
package main

import (
    "fmt"
    "log"

    . "gorgonia.org/gorgonia"
)
```

- Then, let's start our main function, like so:

```go
func main() {
    g := NewGraph()
}
```

- To that, let's add our scalars, as shown here:

```go
a = NewScalar(g, Float64, WithName("a"))
b = NewScalar(g, Float64, WithName("b"))
```

- Then, very importantly, let's define our operation node, as follows:

```go
c, err = Add(a, b)
if err != nil {
    log.Fatal(err)
}
```

Note that `c` will not actually have a value now; we've just defined a new node of our computation graph, so we need to execute it before it will have a value.

- To execute it, we need to create a virtual machine object for it to run in, as follows:

```go
machine := NewTapeMachine(g)
```

- Then, set the initial values of `a` and `b`, and proceed to get the machine to execute our graph, as shown here:

```go
Let(a, 1.0)
Let(b, 2.0)
if err = machine.RunAll(); err != nil {
    log.Fatal(err)
}
```

The complete code is as follows:

```go
package main

import (
    "fmt"
    "log"

    . "gorgonia.org/gorgonia"
)

func main() {
    g := NewGraph()

    var a, b, c *Node
    var err error

    // define the expression
    a = NewScalar(g, Float64, WithName("a"))
    b = NewScalar(g, Float64, WithName("b"))
    c, err = Add(a, b)
    if err != nil {
        log.Fatal(err)
    }

    // create a VM to run the program on
    machine := NewTapeMachine(g)

    // set initial values then run
    Let(a, 1.0)
    Let(b, 2.0)
    if err = machine.RunAll(); err != nil {
        log.Fatal(err)
    }

    fmt.Printf("%v", c.Value())
    // Output: 3.0
}
```

Now, we have built our first computation graph in Gorgonia and executed it!

# Vectors and matrices

Of course, being able to add two numbers isn't why we're here; we're here to work with tensors, and eventually, DL equations, so let's take the first step toward something a little more complicated.

The goal here is to now create a graph that will compute the following simple equation:

*z = Wx*

Note that *W* is an *n* x *n* matrix, and *x* is a vector of size *n*. For the purposes of this example, we will use *n = 2*.

Again, we start with the same basic main function, as shown here:

```go
package main

import (
    "fmt"
    "log"

    G "gorgonia.org/gorgonia"
    "gorgonia.org/tensor"
)

func main() {
    g := G.NewGraph()
}
```

You'll notice that we've chosen to alias the Gorgonia package as `G`.

We then create our first tensor, the matrix, `W`, like so:

```go
matB := []float64{0.9, 0.7, 0.4, 0.2}
matT := tensor.New(tensor.WithBacking(matB), tensor.WithShape(2, 2))
mat := G.NewMatrix(g,
    tensor.Float64,
    G.WithName("W"),
    G.WithShape(2, 2),
    G.WithValue(matT),
)
```

You'll notice that we've done things a bit differently this time around, as listed here:

- We've started by declaring an array with the values that we want in our matrix
- We've then created a tensor from that matrix with a shape of 2 x 2, as we want a 2 x 2 matrix
- After all of that, we've then created a new node in our graph for the matrix, given it the name
`W`, and initialized it with the value of the tensor

We then create our second tensor and input node, the vector `x`, in the same way, as follows:

```go
vecB := []float64{5, 7}
vecT := tensor.New(tensor.WithBacking(vecB), tensor.WithShape(2))
vec := G.NewVector(g,
    tensor.Float64,
    G.WithName("x"),
    G.WithShape(2),
    G.WithValue(vecT),
)
```

Just like last time, we then add an operator node, `z`, that will multiply the two (instead of an addition operation), checking the error as before:

```go
z, err := G.Mul(mat, vec)
if err != nil {
    log.Fatal(err)
}
```

Then, as last time, we create a new tape machine, run it, and print the result, as shown here:

```go
machine := G.NewTapeMachine(g)
if err = machine.RunAll(); err != nil {
    log.Fatal(err)
}

fmt.Println(z.Value().Data())
// Output: [9.4 3.4]
```
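As a sanity check, we can compute *z = Wx* by hand in plain Go for the same *W* and *x*, and confirm the result matches:

```go
package main

import "fmt"

func main() {
    // The same W and x used in the Gorgonia example above.
    W := [][]float64{{0.9, 0.7}, {0.4, 0.2}}
    x := []float64{5, 7}

    // z[i] = sum over j of W[i][j] * x[j]
    z := make([]float64, len(W))
    for i, row := range W {
        for j, v := range row {
            z[i] += v * x[j]
        }
    }
    fmt.Printf("%.1f\n", z) // [9.4 3.4]
}
```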

# Visualizing the graph

In many cases, it is also very useful to visualize the graph; you can easily do this by adding `io/ioutil` to your imports and the following line to your code:

```go
ioutil.WriteFile("simple_graph.dot", []byte(g.ToDot()), 0644)
```

This will produce a DOT file; you can open this in GraphViz, or, more conveniently, convert it to an SVG. You can view it in most modern browsers by installing GraphViz and entering the following in the command line:

`dot -Tsvg simple_graph.dot -O`

This will produce `simple_graph.dot.svg`; you can open this in your browser to see a rendering of the graph, as follows:

You can see, in our graph, that we have two inputs, `W` and `x`, which feed into our operator: a matrix-vector multiplication that produces the result—another vector.

# Building more complex expressions

Of course, we've mostly covered how to build simple equations; however, what happens if your equation is a little bit more complicated, for example, like the following:

*z = Wx + b*

We can also very easily do this by changing our code a bit to add the following line:

```go
b := G.NewScalar(g,
    tensor.Float64,
    G.WithName("b"),
    G.WithValue(3.0),
)
```

Then, we can change our definition for `z` slightly, as shown here:

```go
a, err := G.Mul(mat, vec)
if err != nil {
    log.Fatal(err)
}

z, err := G.Add(a, b)
if err != nil {
    log.Fatal(err)
}
```

As you can see, we've created a multiplication operator node, and then created an addition operator node on top of that.

Alternatively, you can also just do it in a single line, as follows:

`z, err := G.Add(G.Must(G.Mul(mat, vec)), b)`

Notice that we use `Must` here to handle the error (it panics if the operation fails); we are merely doing it here for convenience, as we know that the operation to add this node to the graph will work. You may instead want to restructure this code to create the addition node separately so that you can handle errors explicitly at each step.

If you now proceed to build and execute the code, you will find that it will produce the following:

`// Output: [12.4 6.4]`

The computation graph now looks like the following screenshot:

You can see that `W` and `x` both feed into the first operation (our multiplication operation) and then, later, it feeds into our addition operation to produce our results.

That's an introduction to using Gorgonia! As you can now hopefully see, it is a library that contains the necessary primitives that will allow us to build the first simple, and then more complicated, neural networks in the following chapters.

# Summary

This chapter offered a brief introduction to DL, covering both its history and applications. It then discussed why Go is a great language for DL, and how Gorgonia, the library we will use, compares to other options in Go.

The next chapter will cover the magic that makes neural networks and DL work, which includes activation functions, network structure, and training algorithms.