Learn Unity ML-Agents - Fundamentals of Unity Machine Learning

By Micheal Lanham

About this book

Unity Machine Learning agents allow researchers and developers to create games and simulations using the Unity Editor, which serves as an environment where intelligent agents can be trained with machine learning methods through a simple-to-use Python API.

This book takes you from the basics of Reinforcement and Q Learning to building Deep Recurrent Q-Network agents that cooperate or compete in a multi-agent ecosystem. You will start with the basics of Reinforcement Learning and how to apply it to problems. Then you will learn how to build self-learning advanced neural networks with Python and Keras/TensorFlow. From there you move on to more advanced training scenarios where you will learn further innovative ways to train your network with A3C, imitation, and curriculum learning models. By the end of the book, you will have learned how to build more complex environments by building a cooperative and competitive multi-agent ecosystem.

Publication date:
June 2018


Chapter 1. Introducing Machine Learning and ML-Agents

All around us, our perception of learning and intellect is being challenged daily by new and emerging technologies. From self-driving cars and programs that play Go and Chess to computers that beat humans at classic Atari games, the group of technologies we colloquially call Machine Learning has come to dominate a new era of technological growth: an era whose importance has been compared to the discovery of electricity and that has already been described as the next human technological age.

This book is intended to introduce you to a very small slice of that new era in a fun and informative way using the Machine Learning Agents platform called ML-Agents from Unity. We will first explore some basics of Machine Learning and ML-Agents. Then, we will cover training and specifically Reinforcement Learning and Q Learning. After that, we will learn how to use Keras to build a Neural Network that we will evolve into a Deep Q-Network. From there, we will look at various ways to improve the Deep Q-Network with different training strategies. This will lead us to our first example, where we train an agent to play a more complex game. Then, finally, we will finish with a look at a multi-agent example that allows agents to cooperate with or compete against each other.


Machine Learning is a big subject and could certainly take years to master. You certainly won't learn everything you need to know from this book. This book is intended only as an enjoyable introduction to a complex and frustrating topic. We will try to point out other resources for learning more about certain techniques or backgrounds.

In our first chapter, we will take a gradual introduction to ML and ML-Agents. Here is what we will cover in this chapter:

  • Machine Learning
  • ML-Agents
  • Running an example
  • Creating an environment
  • Academy, Agent, and Brain

Let's get started, and in the next section, we will introduce what Machine Learning is and the particular aspect of ML we plan to focus on in this book.


If you have not already done so, be sure to download and install the latest version of Unity (https://unity3d.com/). Make sure you have the latest released version of the software and avoid any beta versions. We will use the Personal version in this book, but any version of Unity should work fine.



Machine Learning

Games and simulations are no strangers to AI technologies, and there are numerous assets available to the Unity developer for providing simulated machine intelligence. These technologies include Behavior Trees, Finite State Machines, navigation meshes, A*, and other heuristic methods game developers use to simulate intelligence. So, why Machine Learning and why now? After all, many of the base ML techniques we will use later in this book, like neural nets, have been used in games before.

The reason is due in large part to the OpenAI initiative, which encourages academia and industry to share ideas and research on AI and ML. This has resulted in an explosion of growth in new ideas, methods, and areas for research. For games and simulations, this means that we no longer have to fake or simulate intelligence. Now, we can build agents that learn from their environment and even learn to beat their human builders.


Machine Learning is an implementation of Artificial Intelligence. It is a way for a computer to assimilate data or state and provide a learned solution or response. We often think of AI now as a broader term to reflect a "smart" system. A full game AI system, for instance, may incorporate ML tools combined with more classic AIs like Behavior Trees in order to simulate a richer, more unpredictable AI. We will use AI to describe a system and ML to describe the implementation.

Training models

Machine Learning is aptly named because it uses various forms of training to analyze data or state and provide a trained response. These methods are worth mentioning, and we will focus on one particular method of learning that is currently showing good success. Before we get to that, though, let's break down the types of training we frequently see in ML and will meet in later chapters:

  • Unsupervised Training: This method of training examines a dataset on its own and performs a classification. The classification may be based on certain metrics and can be discovered by the training itself. Most people used to think that all AI or ML worked this way, but of course, it does not:
    • ESRI, a major provider of GIS mapping software and data, provides a demographic dataset called Tapestry. This dataset is derived from a combination of US census data and other resources. It is processed through an ML algorithm that classifies the data into 68 consumer segments using Unsupervised Training. The Tapestry data is not free but can be invaluable for anyone building ML for a consumer or retail application.
  • Supervised Training: This is the typical training method most data science ML methods use to perform prediction or classification. It is a type of training that requires input and output data be labelled. As such, it requires a set of training data in order to build a model. Oftentimes, depending on the particular ML technique, it can require vast amounts of data:
    • Google Inception is a freely available image classification ML model. It has been trained on millions of images to recognize a wide range of classes. The Inception model is small enough to fit on a mobile device in order to provide real-time image classification.
  • Reinforcement Learning: This is based on control theory and provides a method of learning without any initial state or model of the environment. This is a powerful concept because it eliminates the need to model the environment or undertake the tedious data labeling often required by Supervised Training. Instead, agents are modeled in the environment and receive rewards based on their actions. Of course, that also means that this advanced method of training is not without its pitfalls and frustrations. We will start learning the details of RL in Chapter 2, The Bandit and Reinforcement Learning:
    • DeepMind built the bot that was able to play classic Atari 2600 games better than a human.
  • Imitation Learning: This is a technique where agents are trained by watching a demonstration of the desired actions and then imitating them. This is a powerful technique and has plenty of applications. We will explore this type of training in Chapter 4, Going Deeper with Deep Learning.
  • Curriculum Learning: This is an advanced form of learning that works by breaking down a problem into levels of complexity, which allows the agent or ML to overcome each level of complexity before moving on to more advanced activities. For example, an agent waiter may first need to learn to balance a tray, then the tray with a plate of food, then walking with the tray and food, and finally delivering the food to a table. We will explore this form of training in Chapter 5, Playing the Game.
  • Deep Learning: This uses various forms of internal training mechanisms to train a multi-layer neural network. We will spend more time on neural networks and Deep Learning in Chapter 3, Deep Reinforcement Learning with Python.
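
As a rough illustration of the Curriculum Learning idea from the list above, the following Python sketch advances an agent through the waiter lessons only once it performs well enough on the current one. The lesson names, the success rates, and the 0.8 threshold are all assumptions made up for this example; it is not how ML-Agents configures a curriculum.

```python
# Hypothetical curriculum; lesson names follow the waiter example in the text.
LESSONS = [
    "balance tray",
    "balance tray with food",
    "walk with tray and food",
    "deliver food to table",
]


def next_lesson(current, success_rate, threshold=0.8):
    """Return the index of the lesson to train on next.

    The agent advances only when its recent success rate on the current
    lesson reaches the threshold, and it never moves past the last lesson.
    """
    if success_rate >= threshold and current < len(LESSONS) - 1:
        return current + 1
    return current


lesson = 0
# Made-up success rates, one per evaluation round.
for rate in [0.4, 0.85, 0.9, 0.5, 0.95]:
    lesson = next_lesson(lesson, rate)

print(LESSONS[lesson])
```

The point of the design is that a poor evaluation (0.4 or 0.5 here) keeps the agent on the same lesson rather than pushing it toward a harder task it is not ready for.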

You may have already noticed that we use the terms ML and agent interchangeably to denote the thing that is learning. It is helpful to think of things in these terms for now. Later in this chapter, we will start to distinguish between an agent and its brain or ML. For now, though, let's get back to some basics and explore a simple ML example in the next section.

A Machine Learning example

In order to demonstrate some of these concepts in a practical manner, let's look at an example scenario where we use ML to solve a game problem. In our game, we have a cannon that shoots a projectile at a specific velocity in a physics-based world. The object of the game is to choose the velocity to hit the target at a specific distance. We have already fired the cannon ten times and recorded the results in a table and chart, as shown in the following screenshot:

Record and chart of cannon shots

Since the data is labelled already, this problem is well-suited for Supervised Training. We will use a very simple method called linear regression in order to give us a model that can predict a velocity in order to hit a target at a certain distance. Microsoft Excel provides a quick way for us to model linear regression on the chart by adding a trendline, as follows:

Linear Regression applied with a trendline

By using this simple feature in Excel, you can quickly analyze your data and see the equation that best fits that data. Now, this is a rudimentary example of data science, but hopefully you can appreciate how this can be used to make predictions about complex environments based on the data alone. While the linear regression model can provide us with an answer, it obviously is not very good, and the R² value reflects that. The problem with our model is that we are using a linear model to try to solve a nonlinear problem. This is reflected by the arrows to the points, whose length shows the amount of error from the trendline. Our goal with any ML method will be to minimize the errors in order to find the solution of best fit. In most cases, that is all ML is: finding an equation that best predicts or classifies a value or action.
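
For readers who would rather see the trendline computed than read off a chart, here is a minimal Python sketch of ordinary least-squares fitting. The shot data is made up for illustration and is not the data from the table in the screenshot; the function name fit_line is likewise our own.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up cannon shots: NOT the values from the book's table.
velocities = [10, 20, 30, 40, 50, 60]
distances = [12, 45, 95, 170, 260, 370]

slope, intercept = fit_line(velocities, distances)

# Residuals are the vertical gaps between each shot and the trendline,
# which is what the arrows in the chart represent.
residuals = [d - (slope * v + intercept) for v, d in zip(velocities, distances)]
print(slope, intercept)
print(residuals)
```

Because the made-up distances grow faster than linearly, the residuals change sign along the line, which is the same symptom of fitting a linear model to a nonlinear problem described above.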

Getting back to our earlier question, we can now solve the velocity using some simple algebraic substitution, as shown in the following equation:

Where d = distance and v = velocity:

Our final answer would be 56.05, but as we already mentioned, we may still miss because our model is not entirely accurate. However, if you look at the graph, our errors appear to be smallest around a distance of 300, so in this specific case our model fits well. Looking closer at the graph, though, you can see that at a distance of around 100, our error gets quite large and it is unlikely that we will hit our target.
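
The substitution step can be sketched in a few lines of Python. We assume here that the trendline has the form d = m * v + b; the slope and intercept below are hypothetical placeholders rather than the coefficients from the Excel chart, so the result will not match the 56.05 quoted above.

```python
def velocity_for_distance(distance, slope, intercept):
    """Invert the linear model d = slope * v + intercept to solve for v."""
    if slope == 0:
        raise ValueError("slope must be non-zero to solve for velocity")
    return (distance - intercept) / slope


# Hypothetical coefficients, chosen only for illustration.
SLOPE = 6.0
INTERCEPT = -30.0

v = velocity_for_distance(300, SLOPE, INTERCEPT)
print(v)  # 55.0 with these made-up coefficients
```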


R² or R squared is a measure of fit between 0 and 1, with 1 being the best fit. R² attempts to summarize the quality of fit. In some cases it works well; in others, different measures of fit work better. We will use different measures of fit quality, but the concepts are similar.
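
A minimal sketch of how R² can be computed, using the common definition of one minus the ratio of the residual sum of squares to the total sum of squares. The function name and the sample values are our own. For a least-squares line with an intercept, evaluated on the data it was fitted to, the result falls between 0 and 1.

```python
def r_squared(actual, predicted):
    """Return R^2 = 1 - SS_res / SS_tot for paired actual/predicted values."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all actual values are equal")
    return 1 - ss_res / ss_tot


# A perfect prediction scores 1; predicting the mean every time scores 0.
print(r_squared([1, 2, 3], [1, 2, 3]))
print(r_squared([1, 2, 3], [2, 2, 2]))
```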

The example we just looked at is quite simple and doesn't take into account many other factors, such as elevation differences or movement speed. If we wanted to add those inputs, we would just add more columns to our table. Each new column would expand our data space and consequently increase the complexity of the model. As you can see, our model could quickly expand and become impractical. These are essentially the shortcomings the gaming industry experienced with ML techniques at the turn of the century when implementing game AI. Any other industry faces the same shortcoming when implementing supervision-based models: the need to constantly re-sample and relabel data and consequently retrain models. This is why Reinforcement Learning and other methods of learning have become so significant. They provide a method of learning whereby autonomous agents or ML with no previous knowledge of an environment can successfully explore it.

ML uses in gaming

Unity has embraced the idea of incorporating ML into all aspects of its product, not just for use as game AI. While most developers may think of ML first as a way to build game AI, it can help game development in the following areas:

  • Map/Level Generation: There are already plenty of examples where developers have used ML to auto-generate everything from dungeons to realistic terrain. Getting this right can provide a game with endless replayability, but it can be some of the most challenging ML to develop.
  • Texture/Shader Generation: Another area that is getting the attention of ML is texture and shader generation. These technologies are getting a boost brought on by the attention of advanced generative adversarial networks, or GAN. There are plenty of great and fun examples of this tech in action; just do a search for DEEP FAKES in your favorite search engine.
  • Model Generation: There are a few projects coming to fruition in this area that could greatly simplify 3D object construction through enhanced scanning and/or auto-generation. Imagine being able to textually describe a simple model and having ML build it for you, in real-time, in a game or other AR/VR/MR app, for example.
  • Audio Generation: Being able to generate audio sound effects or music on the fly is already being worked on for other areas, not just games. Yet, just imagine being able to have a custom designed soundtrack for your game developed by ML.
  • Artificial Players: This encompasses many uses from the gamer themselves using ML to play the game on their behalf to the developer using artificial players as enhanced test agents or as a way to engage players during low activity. If your game is simple enough, this could also be a way of auto testing levels, for instance. We will explore an example of using ML to play a game in Chapter 5, Playing the Game.
  • NPCs or Game AI: Currently, there are better patterns out there to model basic behavioral intelligence in the form of Behavior Trees. While it's unlikely that BTs or other similar patterns will go away any time soon, imagine being able to model an NPC that may actually exhibit an unpredictable but rather cool behavior. This opens all sorts of possibilities that excite not only developers but players as well. We will look at ways of modeling behavioral patterns using ML in Chapter 6, Terrarium Revisited – Building a Multi-Agent Ecosystem.

Our interest in this book will be in the area of artificial players and game AI, as it tends to be the broadest topic in scope. The reader is encouraged to explore the other areas mentioned in the preceding list on their own, as and when they relate to their own project.

It is highly recommended that you take a course, read a book, or watch a video on Data Science. The area of data science deals primarily with Supervised and Unsupervised Training on ML against known datasets. However, you will or should learn data scrubbing, data labeling, the mathematics of ML, and calculating errors to name just a few important concepts. Having a background in Data Science will help you model problems as well as help you uncover possible issues when things don't work as expected.

That overview of ML certainly won't rival any Data Science course, but it should get us started for the rest of the good stuff starting in the next section, where we start looking at ML in action with Unity ML-Agents.



ML-Agents

For the rest of this book, we will be using the ML-Agents platform with Unity to build ML models that can learn to play and simulate in various environments. Before we do that, though, we need to pull down the ML-Agents package from GitHub using git. Jump on your computer, open up a command prompt or shell window, and follow along:


If you have never used git before, make sure to install it from https://git-scm.com/. You will need to install git before continuing with the following exercises and thus the rest of this book.

  1. Navigate to your work or root folder (on Windows, we will assume that this is C:\).
  2. Execute the following command:
      mkdir ML-Agents
  3. This will create the folder ML-Agents. Now, execute the following:
      cd ML-Agents
      git clone https://github.com/Unity-Technologies/ml-agents.git
  4. This uses git to pull down the required files for ML-Agents into a new folder called ml-agents. git will show the files as they are getting pulled into the folder. You can verify that the files have been pulled down successfully by changing to the new folder and listing its contents (dir on Windows; use ls on macOS or Linux):
      cd ml-agents
      dir
  5. Right now, we are doing this only to make sure that the files are there. We will get to the specifics later.
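
If you prefer to script the check from the last step, the following Python sketch does the same thing. The helper name has_files and the relative path are assumptions for this example; adjust the path to wherever you cloned the repository.

```python
from pathlib import Path


def has_files(folder):
    """Return True if the folder exists and contains at least one entry."""
    path = Path(folder)
    return path.is_dir() and any(path.iterdir())


# Assumed location, relative to the work folder used in the steps above.
print(has_files("ML-Agents/ml-agents"))
```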

Good. That should have been fairly painless. If you had issues pulling the code down, you can always visit the ML-Agents page on GitHub at https://github.com/Unity-Technologies/ml-agents and manually pull the code down. Of course, we will be using more of git to manage and pull files, so you should resolve any problems you may have encountered.


If you are not familiar with GitHub and git, then you really should be. git completely dominates source control across all areas of software development now and is widely used, even at Microsoft, who abandoned their own source control for it. Do yourself a favor, even if you develop your code just for yourself: use source control.

Now that we have ML-Agents installed, we will take a look at one of Unity's sample projects that ships with the toolkit in the next section.


Running a sample

Unity ships the ML-Agents package with a number of prepared samples that demonstrate various aspects of learning and training scenarios. Let's open up Unity and load up a sample project and get a feel for how the ML-Agents run by following this exercise:

  1. Open the Unity editor and go to the starting Project dialog.
  2. Click the Open button at the top of the dialog and navigate to and select the ML-Agents/ml-agents/unity-environment folder, as shown in the following screenshot:

Loading the unity-environment project into the editor

  3. This will load the unity-environment project into the Unity editor. Depending on the Unity version you are using, you may get a warning that the version needs to be upgraded. As long as you are using a recent version of Unity, you can just click Continue. If you do experience problems, try upgrading or downgrading your version of Unity.
  4. Locate the Scene file in the Assets/ML-Agents/Examples/3DBall folder of the Project window, as shown in the following screenshot:

Locating the example scene file in the 3DBall folder

  5. Double-click the 3DBall scene file to open the scene in the editor.
  6. Press the Play button at the top center of the editor to run the scene. You will see that the scene starts running and that balls are being dropped, but the balls just fall off the platforms. This is because the scene starts up in Player mode, which means you can control the platforms with keyboard input. Try to balance the balls on the platform using the arrow keys on the keyboard.
  7. When you are done running the scene, click the Play button again to stop the scene.

Setting the agent Brain

As you witnessed, the scene is currently set for Player control, but obviously we want to see how some of this ML-Agents stuff works. In order to do that, we need to change the Brain type the agent is using. Follow along to switch the Brain type in the 3D Ball agent:

  1. Locate the Ball3DAcademy object in the Hierarchy window and expand it to reveal the Ball3DBrain object.
  2. Select the Ball3DBrain object and then look to the Inspector window, as shown in the following screenshot:

Switching the Brain on the Ball3DBrain object

  3. Switch the Brain component, as shown in the preceding excerpt, to the Heuristic setting. The Heuristic brain setting is for agents whose behavior is coded directly in Unity scripts in a heuristic manner. Heuristic programming is nothing more than selecting a simpler, quicker solution when a classic approach (in our case, an ML algorithm) may take longer. Writing a Heuristic brain can often help you better define a problem, and it is a technique we will use later in this chapter. The majority of current game AIs fall within the category of heuristic algorithms.
  4. Press Play to run the scene. Now, you will see the platforms balancing each of the balls, which is very impressive for a heuristic algorithm. Next, we want to open the script with the heuristic brain and take a look at some of the code.


You may need to adjust the Rotation Speed property, up or down, on the Ball 3D Decision (Script). Try a value of 0.5 for the rotation speed if the Heuristic brain seems unable to effectively balance the balls. The Rotation Speed is hidden in the preceding screen excerpt.

  5. Click the Gear icon beside the Ball 3D Decision (Script), and from the context menu, select Edit Script, as shown in the following screenshot:

Editing the Ball 3D Decision script

  6. Take a look at the Decide method in the script as follows:
      public float[] Decide(
              List<float> vectorObs,
              List<Texture2D> visualObs,
              float reward,
              bool done,
              List<float> memory)
      {
          if (gameObject.GetComponent<Brain>().brainParameters.vectorActionSpaceType
                 == SpaceType.continuous)
          {
              List<float> act = new List<float>();

              // state[5] is the velocity of the ball in the x orientation.
              // We use this number to control the Platform's z axis rotation
              // so that the Platform is tilted in the x orientation
              act.Add(vectorObs[5] * rotationSpeed);

              // state[7] is the velocity of the ball in the z orientation.
              // We use this number to control the Platform's x axis rotation
              // so that the Platform is tilted in the z orientation
              act.Add(-vectorObs[7] * rotationSpeed);

              return act.ToArray();
          }

          // If the vector action space type is discrete, then we don't do anything
          return new float[1] { 1f };
      }
  7. We will cover more details about what the inputs and outputs of this method mean later. For now though, look at how simple the code is. This is the heuristic brain that is balancing the balls on the platform, which is fairly impressive when you see the code. The question that may just hit you is: why are we bothering with ML programming, then? The simple answer is that the 3D ball problem is deceptively simple and can be easily modeled with eight state values. Take a look at the code again and you can see that the observation vector holds only eight values (indices 0 to 7), and the heuristic needs just two of them: the velocity of the ball in the x and z orientations. As you can see, this works well for this problem, but when we get to more complex examples, we may have millions upon billions of states, which is hardly anything we could easily solve using heuristic methods.
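
To make the logic of the heuristic easier to experiment with outside Unity, here is a small Python sketch of the same rule: tilt the platform in proportion to the ball's velocity. The function name decide and the sample observation are our own assumptions; only the use of indices 5 and 7 and the rotation speed factor come from the script above.

```python
def decide(vector_obs, rotation_speed=1.0):
    """Return the two rotation actions of the ball-balancing heuristic.

    Index 5 holds the ball's x velocity and index 7 its z velocity,
    following the comments in the Decide method.
    """
    return [vector_obs[5] * rotation_speed, -vector_obs[7] * rotation_speed]


# Made-up observation with eight values; only indices 5 and 7 matter here.
obs = [0.0, 0.0, 0.0, 0.0, 0.0, 0.5, 0.0, -0.2]
actions = decide(obs, rotation_speed=0.5)
print(actions)
```

Changing rotation_speed scales both actions, which is why adjusting the Rotation Speed property in the Inspector changes how aggressively the platform corrects.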

Heuristic brains should not be confused with Internal brains, which we will get to in Chapter 6, Terrarium Revisited – Building a Multi-Agent Ecosystem. While you could replace the heuristic code in the 3D ball example with an ML algorithm, that is not the best practice for running advanced ML such as Deep Learning algorithms, which we will discover in Chapter 3, Deep Reinforcement Learning with Python.

In the next section, we are going to modify the Basic example in order to get a better feel for how ML-Agents components work together.


Creating an environment

One thing you may have noticed while looking over the last example was that an ML-Agent environment requires a bit of custom setup. Unity documentation recommends that an ML environment be constructed of Academy, Agent, and Brain objects with associated scripts. There is a Template folder in the ML-Agents project which we will use to configure and set up a simple environment. Let's jump back to the Unity editor and get started setting up our first simple ML environment:

  1. Locate the Template folder in the ML-Agents folder within the Project window of the editor.  
  2. Right-click (Control-click on macOS) on the Template folder and select Show in Explorer (Reveal in Finder on macOS) from the context menu. This will open a file browser window with the files.
  3. Select and copy the Template folder.
  4. Navigate up two levels to the Assets folder and paste the copied folder. This will add the Template folder to the root Assets folder.
  5. Rename the Template folder to Simple.


When you return to the editor, you will see a few namespace errors due to the duplicate Template scripts. We will fix that shortly.

  6. Return to the Unity editor and confirm the folder and files have been copied to the new Simple folder, as shown in the following screenshot:

Verifying that the Simple folder was created

  7. Double-click on the Scene to open it in the editor.

Renaming the scripts

That sets up the simple scene, but you may have noticed that there are still a few duplicated naming errors. We will need to rename the Template scripts in the Simple/Scripts folder. Follow this next exercise to rename each of the scripts:

  1. Open the Scripts folder.
  2. Rename each of the files from Template to Simple, as shown in the following excerpt of the Project window:

Renaming the Template scripts to Simple

  3. Double-click on the SimpleAcademy script file to open it in your code editor. Rename the class from TemplateAcademy to SimpleAcademy so that it matches the file name, as shown in the following code:
       public class SimpleAcademy : Academy {
  4. Repeat this process for the Agent and Decision scripts. The objects in the scene are still pointing to the template scripts, so we will update that next. Make sure to save all the scripts with your changes before returning to the editor. If all the files are renamed correctly, the naming errors will go away.
  5. Select and rename the Ball3DAcademy to just Academy in the Hierarchy window.
  6. Select the Academy object in the Hierarchy window. Click the Gear icon beside the TemplateAcademy component in the Inspector window and select Remove Component to remove the script.
  7. Click the Add Component button and type Simple in the component search bar, as shown in the following screenshot:

Adding the SimpleAcademy object to the Academy object

  8. Click on the Simple Academy item, as shown in the preceding excerpt, to add the component to the Academy object.
  9. Repeat the process for the Agent object: remove the TemplateAgent script and add the SimpleAgent script.
  10. After you are done, be sure to save the scene and the project.


It is surprising that Unity didn't provide a better set of editor tools to build a new ML Agent environment, at least not at the time of writing this book. In the source code download for this book (Chapter_1/Editor_Tools), an asset package has been provided that can automate this setup for you. We may decide to put this package and some others from this book on the asset store.

That sets up a new ML environment for us to start implementing our own Academy, Agent, and Decision (Brain) scripts. We will get into the details of these scripts in the next section when we set up our first learning problem.


Academy, Agent, and Brain

In order to demonstrate the concepts of each of the main components (Academy, Agent, and Brain/Decision), we will construct a simple example based on the classic multi-armed bandit problem. The bandit problem is so named because of its similarity to the slot machine that is colloquially known in Vegas as the one-armed bandit, a machine notorious for taking the money of the poor tourists who play it. While a traditional slot machine has only one arm, our example will feature four arms, or actions a player can take, with each action providing the player with a given reward. Open up Unity to the Simple project we started in the last section:

  1. From the menu, select GameObject | 3D Object | Cube and rename the new object Bandit.
  2. Click the Gear icon beside the Transform component and select Reset from the context menu. This will reset our object to (0,0,0), which works well since it is the center of our scene.
  3. Expand the Materials section on the Mesh Renderer component and click the Target icon. Select the NetMat material, as shown in the following screenshot:

Selecting the NetMat material for the Bandit

  4. Open the Assets/Simple/Scripts folder in the Project window.
  5. Right-click (Control-click on macOS) in a blank area of the window and from the Context menu, select Create | C# Script. Name the script Bandit and replace the code with the following:
      using UnityEngine;

      public class Bandit : MonoBehaviour
      {
        public Material Gold;
        public Material Silver;
        public Material Bronze;
        private MeshRenderer mesh;
        private Material reset;

        // Use this for initialization
        void Start()
        {
          mesh = GetComponent<MeshRenderer>();
          reset = mesh.material;
        }

        // Sets the material for the chosen arm and returns its reward
        public int PullArm(int arm)
        {
          var reward = 0;
          switch (arm)
          {
            case 1:
              mesh.material = Gold;
              reward = 3;
              break;
            case 2:
              mesh.material = Bronze;
              reward = 1;
              break;
            case 3:
              mesh.material = Bronze;
              reward = 1;
              break;
            case 4:
              mesh.material = Silver;
              reward = 2;
              break;
          }
          return reward;
        }

        // Restores the original material
        public void Reset()
        {
          mesh.material = reset;
        }
      }
  6. This code simply implements our four-armed bandit. The first part declares the class Bandit, which extends MonoBehaviour. Scripts that are attached to GameObjects in Unity as components extend MonoBehaviour. Next, we define some public fields for the materials we will use to display the reward value back to us. Then, we have a couple of private fields: the MeshRenderer, called mesh, and the original Material, which we call reset. The Start method comes next; it is a default Unity method that runs when the object starts up. This is where we set our two private fields based on the object's MeshRenderer. Next comes the PullArm method, which is just a simple switch statement that sets the appropriate material and reward. Finally, we finish with the Reset method, where we restore the original material.
  7. When you are done entering the code, be sure to save the file and return to Unity.
  8. Drag the Bandit script from the Assets/Simple/Scripts folder in the Project window and drop it on the Bandit object in the Hierarchy window. This will add the Bandit component to the object.
  9. Select the Bandit object in the Hierarchy window and then in the Inspector window click the Target icon and select each of the material slots (Gold, Silver, Bronze), as shown in the following screenshot:

Setting the Gold, Silver and Bronze materials on the Bandit

This will set up our Bandit object as a visual placeholder. You could, of course, add the arms and make it look more like a multi-armed slot machine, but for our purposes, the current object will work fine. Remember that our Bandit has four arms, each with its own reward (3, 1, 1, and 2 for arms 1 to 4 in the preceding code).
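
Outside Unity, the reward logic of the Bandit can be sketched in a few lines of Python, which is handy for checking what an agent should eventually learn. The names REWARDS, pull_arm, and best_arm are our own; the reward values mirror the PullArm switch statement, including the reward of 0 for an arm number that does not exist.

```python
# arm -> reward, mirroring the PullArm switch in the Bandit script
REWARDS = {1: 3, 2: 1, 3: 1, 4: 2}


def pull_arm(arm):
    """Return the reward for the given arm, or 0 for an unknown arm."""
    return REWARDS.get(arm, 0)


# Try every arm once and record what it pays.
payouts = {arm: pull_arm(arm) for arm in REWARDS}

# The arm a well-trained agent should end up preferring.
best_arm = max(payouts, key=payouts.get)
print(payouts)
print(best_arm)
```

Here we can read the best arm straight from the table because the rewards are fixed and known; the agent we train will have to discover the same answer by trial and error.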

Setting up the Academy

The Academy object and its component represent the training environment where we define the training configuration for our agents. You can think of an Academy as the school or classroom in which our agents will be trained. Open up the Unity editor and select the Academy object in the Hierarchy window. Then, follow these steps to configure the Academy component:

  1. Set the properties for the Academy component, as shown in the following screenshot:

Setting the properties on the Academy component of the Academy object

  2. The following is a quick summary of the initial Academy properties we will cover:
    • Max Steps: This limits the number of actions your Academy will let each Agent execute before resetting itself. In our current example, we can leave this at 0, because we are only doing a single step. When it is set to zero, our agent will continue forever until Done is called.
    • Training Configuration: In ML problems, we often split the data into a training set and a test set. This allows us to build an ML or agent model on a training environment or dataset. Then, we can take the trained model and exercise it on a real dataset using inference. The Training Configuration section is where we configure the environment for training.
    • Inference Configuration: Inference is where we infer or exercise our model against a previously unseen environment or dataset. This configuration area is where we set the parameters used when our model is running in this type of environment.

The Academy setup is quite straightforward for this simple example. We will get to the more complex options in later chapters, but do feel free to expand the options and look at the properties.
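
The Max Steps rule described above can be sketched as a small loop. This is a plain-Python illustration of the semantics only, not ML-Agents code; `run_episode` and its `step` callback are hypothetical names.

```python
def run_episode(step, max_steps=0):
    """Call step() until it reports done or the step limit is reached.

    ASSUMPTION: `step` is a hypothetical callable that performs one agent
    action and returns True when the agent has called Done.
    max_steps == 0 means "no limit", matching the Max Steps description.
    Returns the number of steps taken.
    """
    steps = 0
    while True:
        done = step()
        steps += 1
        if done or (max_steps > 0 and steps >= max_steps):
            return steps
```

With `max_steps=0`, an agent that finishes on its first action (like our bandit agent) ends the episode after one step, while an agent that never finishes is only stopped when a positive limit is set.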

Setting up the Agent

Agents represent the actors that we are training to perform some task, or set of tasks, based on some reward. We will cover more about actors, actions, state, and rewards when we talk more about Reinforcement Learning in Chapter 2, The Bandit and Reinforcement Learning. For now, all we need to do is set the Brain the agent will be using. Open up the editor and follow these steps:

  1. Locate the Agent object in the Hierarchy window and select it.
  2. Click the Target icon beside the Brain property on the Simple Agent component and select the Brain object in the scene, as shown in the following screenshot:

Setting the Agent Brain

  1. Click the Target icon on the Simple Agent component and from the context menu select Edit Script. The agent script is what we use to observe the environment and collect observations. In our current example, we always assume that there is no previous observation.
  2. Enter the highlighted code in the CollectObservations method as follows:
      public override void CollectObservations()
      { AddVectorObs(0f); } // a single constant observation of 0
  1. CollectObservations is the method called to set what the Agent observes about the environment. This method will be called on every agent step or action. We use AddVectorObs to add a single float value of 0 to the agent's observation collection. At this point, we are not currently using any observations and will assume our bandit provides no visual clues as to what arm to pull.   The agent will also need to evaluate the rewards and when they are collected. We will need to add four slots, one for each arm to our agent, in order to represent the reward when that arm is pulled.
  2. Enter the following code in the SimpleAgent class:
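      public Bandit bandit;
      public override void AgentAction(float[] vectorAction, string textAction)
      {
        var action = (int)vectorAction[0];   // the arm to pull
        AddReward(bandit.PullArm(action));   // reward returned by the bandit
        Done();                              // one pull completes the episode
      }

      public override void AgentReset()
      {
        bandit.Reset();                      // restore the bandit's starting state
      }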
  1. The code in our AgentAction method takes the current action and applies it to the Bandit with the PullArm method, passing in the arm to pull. The reward returned from the bandit is added using AddReward. After that, we implement the AgentReset method, which resets the Bandit back to its starting state. AgentReset is called when the agent is done or runs out of steps. Notice how we call Done after each step; this is because our bandit problem consists of only a single state and action.
  2. Add the following code just below the last section:
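      public Academy academy;
      public float timeBetweenDecisionsAtInference;
      private float timeSinceDecision;

      public void FixedUpdate()
      {
        WaitTimeInference();
      }

      // The RequestDecision() calls are assumed from the ML-Agents Basic example.
      private void WaitTimeInference()
      {
        if (!academy.GetIsInference())
          RequestDecision();
        else if (timeSinceDecision >= timeBetweenDecisionsAtInference)
        {
          timeSinceDecision = 0f;
          RequestDecision();
        }
        else
          timeSinceDecision += Time.fixedDeltaTime;
      }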
  1. We need to add the preceding code in order for our brain to wait long enough for it to accept Player decisions. Our first example that we will build will use player input. Don't worry too much about this code, as we only need it to allow for player input. When we develop our Agent Brains, we won't need to put a delay in.
  2. Save the script when you are done editing.
  3. Return to the editor and set the properties on the Simple Agent, as shown in the following screenshot:

Setting the Simple Agent properties

We are almost done. The agent is now able to interpret our actions and execute them on the Bandit. Actions are sent to the agent from the Brain. The Brain is responsible for making decisions and we will cover its setup in the next section.
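
The observe, act, and reset cycle of the agent can be sketched in plain Python as follows. This is a simplified stand-in for the Unity component, not the ML-Agents API: `StubBandit`, its reward table, and the method names are hypothetical and exist only so the example runs on its own.

```python
class StubBandit:
    """Hypothetical stand-in for the Bandit; the rewards are invented."""

    REWARDS = {1: 3.0, 2: 2.0, 3: 1.0, 4: 0.5}

    def __init__(self):
        self.pulled = None

    def pull_arm(self, arm):
        self.pulled = arm
        return self.REWARDS.get(arm, 0.0)

    def reset(self):
        self.pulled = None


class SimpleAgent:
    def __init__(self, bandit):
        self.bandit = bandit
        self.reward = 0.0
        self.done = False

    def collect_observations(self):
        # The bandit gives no clues, so the only observation is a constant 0.
        return [0.0]

    def agent_action(self, vector_action):
        # The first element of the action vector selects the arm.
        arm = int(vector_action[0])
        self.reward += self.bandit.pull_arm(arm)
        # A single pull is a complete episode, so mark the agent as done.
        self.done = True

    def agent_reset(self):
        # Put the bandit and the agent back in their starting state.
        self.bandit.reset()
        self.reward = 0.0
        self.done = False
```

Whatever supplies the action vector (here, the caller) plays the role of the Brain; the agent only translates that vector into an arm pull and records the reward.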

Setting up the Brain

We saw the basics of how a Brain functions when we looked at the earlier Unity example. There are several types of brains: Player, Heuristic, Internal, and External. For our simple example, we are going to set up a Player brain. Follow these steps to configure the Brain object to accept input from the player:

  1. Locate the Brain object in the Hierarchy window; it is a child of the Academy.
  2. Select the Brain object and set the Player inputs, as shown in the following screenshot:

Setting the Player inputs on the Brain

  1. Save your scene and project.
  2. Press Play to run the scene. Type any of the keys A, S, D, or F to pull each of the arms from 1 to 4. As you pull the arm, the Bandit will change color based on the reward. This is a very simple game and a human pulling the right arm each time should be a fairly simple exercise.

Now, we have a simple Player brain that lets us test our simple four armed bandit. We could take this a step further and implement a Heuristic brain, but we will leave that as an exercise to the reader. For now though, until we get to the next chapter, you should have enough to run with to get comfortable with some of the basic concepts of ML-Agents.
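
Conceptually, the Player brain is just a lookup from key presses to actions. The following Python sketch shows that mapping for the A, S, D, and F keys; the names `KEY_TO_ARM` and `player_decide` are hypothetical, and the real mapping is configured in the Inspector rather than in code.

```python
# Keys A, S, D, and F map to arms 1 to 4, as in the Play step above.
KEY_TO_ARM = {"a": 1, "s": 2, "d": 3, "f": 4}


def player_decide(key):
    """Return the arm for a key press, or None if the key is not mapped."""
    return KEY_TO_ARM.get(key.lower())
```

A Heuristic brain would replace this lookup with a rule that chooses the action itself instead of waiting for a key press.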


 Complete these exercises on your own for additional learning:

  1. Change the materials the agent uses to signal a reward – bonus points if you create a new material.
  2. Add an additional arm to the Bandit.
  3. In our earlier cannon example, we used a Linear Regression ML algorithm to predict the velocity needed for a specific distance. As we saw, our cannon problem could be better fit with another algorithm. Can you pick a better method to do this regression? 


Access to Excel can make this fairly simple.

  4. Implement a SimpleDecision script that uses a Heuristic algorithm to always pick the best solution.


You can refer to the 3DBall example we looked at earlier. You will need to add the SimpleDecision script to the Brain in order to set up a Heuristic brain.



In this chapter, we covered the basics of Machine Learning and ML-Agents, starting with an introduction to Machine Learning and the more common learning models, including Reinforcement Learning. After that, we looked at a game example with a cannon, where simple ML can be applied to solve for the velocity required to strike a target at a specific distance. Next, we quickly introduced ML-Agents and pulled the required code down from GitHub. This allowed us to run one of the more interesting examples in this book and explore the inner workings of the Heuristic brain. Then, we laid the foundations for a simple scene and set up the environment we will use over the next couple of chapters. Finally, we completed the chapter by setting up a simple Academy, Agent, and Brain, which were used to operate a multi-armed bandit using a Player brain.

In the next chapter, we will continue with our Bandit example and extend the problem to a contextual bandit, which is our first step toward Reinforcement Learning and building ML algorithms.



About the Author

  • Micheal Lanham

    Micheal Lanham is a proven software and tech innovator with 20 years of experience. During that time, he has developed a broad range of software applications in areas including games, graphics, web, desktop, engineering, artificial intelligence, GIS, and machine learning applications for a variety of industries as an R&D developer. At the turn of the millennium, Micheal began working with neural networks and evolutionary algorithms in game development. He was later introduced to Unity and has been an avid developer, consultant, manager, and author of multiple Unity games, graphic projects, and books ever since.
