RL in Robotics

This chapter is a bit unusual in comparison to the other chapters in this book for the following reasons:

  • It took me almost four months to gather all the materials, do the experiments, write the examples, and so on
  • This is the only chapter in which we will try to step beyond emulated environments into the physical world
  • In this chapter, we will build a small robot from accessible and cheap components to be controlled using reinforcement learning (RL) methods

Robotics is an amazing and fascinating field, far too large to be covered in a whole book, much less in a short chapter. So, this chapter doesn't pretend to offer anything close to complete coverage of the field. It is just a short introduction that shows what can be done with commodity components and outlines directions for your own experiments and research. In addition, I have to admit that I'm not an expert in robotics and have never worked...

Robots and robotics

I'm sure you know what the word "robot" means and have seen them both in real life and in science fiction movies. Putting aside fictitious ones, there are many robots in industry (check "Tesla assembly line" on YouTube), the military (if you haven't seen the Boston Dynamics videos, you should stop reading and check them out), agriculture, medicine, and our homes. Automatic vacuum cleaners, modern coffee machines, 3D printers, and many other specialized mechanisms with complicated logic driven by some kind of software are all examples. At a high level, all those robots share common features, which we're going to discuss. Of course, this kind of classification is not perfect. As is often the case, there are lots of outliers that might fulfill the given criteria but could still hardly be considered robots.

Firstly, robots are connected to the world around them with some kind of sensors or other communication...

The first training objective

Let's now discuss what we want our robot to do and how we're going to get there. It's not very hard to notice that the potential capabilities of the hardware described are quite limited:

  • We have only four servos with a constrained angle of rotation: This makes our robot's movements highly dependent on friction with the surface, as it can't lift its individual legs, in contrast to the Minitaur robot, which has two motors attached to each leg.
  • Our hardware capacity is small: The memory is limited, the central processing unit (CPU) is not very fast, and no hardware accelerators are present. In the subsequent sections, we will take a look at how to deal with those limitations to some extent.
  • We have no external connectivity besides a micro-USB port: Some boards might have Wi-Fi hardware, which could be used to offload the NN inference to a larger machine, but in this chapter's example, I'm...

The emulator and the model

In this section, we will cover the process of obtaining the policy that we will deploy on the hardware. As mentioned, we will use a physics emulator (PyBullet in our case) to simulate our robot. I won't describe in detail how to set up PyBullet, as it was covered in the previous chapter. Let's jump into the code and the model definition.

In the previous chapter, we used robot models already prepared for us, like Minitaur and HalfCheetah, which exposed the familiar and simple Gym interface with the reward, observations, and actions. Now we have custom hardware and have formulated our own reward objective, so we need to build everything ourselves; a minimal sketch of such a wrapper follows the list below. In my personal experiments, implementing a low-level robot model and wrapping it in a Gym environment turned out to be surprisingly complex. There were several reasons for that:

  • PyBullet classes are quite complicated and poorly designed from a software engineering point of view. They contain a lot...
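
To make the earlier point concrete, here is a minimal, hypothetical sketch of such a Gym wrapper around PyBullet. This is not the book's actual Chapter18 code: the URDF file name, joint indexing, observation layout, and reward are illustrative assumptions.

    import numpy as np
    import gym
    from gym import spaces
    import pybullet as p

    class FourServoEnv(gym.Env):
        """Hypothetical minimal wrapper around a PyBullet robot model."""

        def __init__(self, urdf_path="robot.urdf"):
            self._client = p.connect(p.DIRECT)   # headless physics server
            self._urdf_path = urdf_path
            # Four servos, each commanded with a target angle scaled to [-1, 1]
            self.action_space = spaces.Box(-1.0, 1.0, shape=(4,), dtype=np.float32)
            # 7 values per frame, 4 stacked frames = 28 inputs (assumed layout)
            self.observation_space = spaces.Box(-np.inf, np.inf, shape=(28,), dtype=np.float32)
            self._robot = None

        def reset(self):
            p.resetSimulation(physicsClientId=self._client)
            p.setGravity(0, 0, -9.8, physicsClientId=self._client)
            self._robot = p.loadURDF(self._urdf_path, physicsClientId=self._client)
            return self._observe()

        def step(self, action):
            # Drive every servo toward its commanded position
            for joint, target in enumerate(action):
                p.setJointMotorControl2(self._robot, joint, p.POSITION_CONTROL,
                                        targetPosition=float(target),
                                        physicsClientId=self._client)
            p.stepSimulation(physicsClientId=self._client)
            pos, _ = p.getBasePositionAndOrientation(self._robot,
                                                     physicsClientId=self._client)
            reward = pos[2]   # illustrative: reward the height of the body
            return self._observe(), reward, False, {}

        def _observe(self):
            # Placeholder: a real model would stack IMU readings and servo angles
            return np.zeros(28, dtype=np.float32)

Even this skeleton hides the tricky parts: joint discovery, collision flags, and observation normalization are where most of the complexity mentioned above lives.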

DDPG training and results

To train the policy using our model, we will use the deep deterministic policy gradient (DDPG) method, which we covered in detail in Chapter 17, Continuous Action Space. I won't spend time here showing the code, which is in Chapter18/train_ddpg.py and Chapter18/lib/ddpg.py. For exploration, the Ornstein-Uhlenbeck process was used in the same way as for the Minitaur model.
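
As a quick refresher, the Ornstein-Uhlenbeck process produces temporally correlated noise that is added to the actor's deterministic actions. The following is a minimal sketch of such a noise generator; the theta and sigma values are common defaults, not necessarily those used in the book's code.

    import numpy as np

    class OUNoise:
        """Ornstein-Uhlenbeck process: mean-reverting noise with
        Gaussian increments, discretized with a time step of 1."""

        def __init__(self, size, mu=0.0, theta=0.15, sigma=0.2):
            self.size = size
            self.mu = mu
            self.theta = theta
            self.sigma = sigma
            self.reset()

        def reset(self):
            # Start every episode from the mean
            self.state = np.full(self.size, self.mu)

        def sample(self):
            dx = self.theta * (self.mu - self.state) + \
                 self.sigma * np.random.randn(self.size)
            self.state += dx
            return self.state

During training, a sample is added to the actor's output on every step, and the process is reset between episodes.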

The only thing I'd like to emphasize is the size of the model: the actor part was intentionally reduced to meet our hardware limitations. The actor has one hidden layer with 20 neurons, giving just two matrices (not counting the biases) of 28×20 and 20×4. The input dimensionality is 28 due to observation stacking: the four most recent observations are concatenated and passed to the model. A model this small trains very quickly, without needing a GPU.
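
Given those dimensions, the actor might look like the following PyTorch sketch. The layer sizes follow the text; the choice of activations is my assumption, not necessarily what the book's code uses.

    import torch.nn as nn

    OBS_SIZE = 28   # 4 stacked observation frames
    HID_SIZE = 20   # single small hidden layer
    ACT_SIZE = 4    # one target position per servo

    class TinyActor(nn.Module):
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(OBS_SIZE, HID_SIZE),   # the 28x20 matrix
                nn.ReLU(),
                nn.Linear(HID_SIZE, ACT_SIZE),   # the 20x4 matrix
                nn.Tanh(),                       # actions bounded to [-1, 1]
            )

        def forward(self, x):
            return self.net(x)

Such a network has fewer than 700 parameters in total, which is why training is fast and why the weights can later fit comfortably into a microcontroller's memory.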

To train the model, you should run the train_ddpg.py program, which accepts the following arguments...

Controlling the hardware

In this section, I will describe how we can use the trained model on the real hardware.

MicroPython

For a very long time, the only option in embedded software development was using low-level languages like C or assembly. There are good reasons behind this: limited hardware capabilities, power-efficiency constraints, and the necessity of dealing with real-world events predictably. Using a low-level language, you normally have full control over the program execution and can optimize every tiny detail of your algorithm, which is great.

The downside is complexity: the development process becomes tricky, error-prone, and lengthy. Even for hobbyist projects that don't have very high efficiency requirements, platforms like Arduino offer quite a limited set of languages, normally just C and C++.

MicroPython (http://micropython.org) provides an alternative to this low-level development by bringing the Python interpreter to microcontrollers...
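
To give a feel for the language, here is a hypothetical MicroPython snippet that drives a hobby servo with a 50 Hz PWM signal. The machine.PWM API and the pin number vary between boards, so treat this as an illustrative assumption rather than the chapter's actual hardware code.

    from machine import Pin, PWM

    # Hobby servos expect a 50 Hz signal with a pulse width of roughly 1-2 ms
    servo = PWM(Pin(2), freq=50)

    def set_angle(deg):
        # Map 0..180 degrees to a 1..2 ms pulse; duty_u16 is out of 65535
        pulse_ms = 1.0 + deg / 180.0
        servo.duty_u16(int(pulse_ms / 20.0 * 65535))

    set_angle(90)   # move the servo to its middle position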

Policy experiments

The first model that I trained was with the Height objective and without zeroing the yaw component. A video of the robot executing the policy is available here: https://www.youtube.com/watch?v=u5rDogVYs9E. The movements are not very natural. In particular, the front-right leg is not moving at all. This model is available in the source tree as Chapter18/hw/libhw/t1.py.

As this might be related to the yaw observation component, which differs between training and inference, the model was retrained with the --zero-yaw command-line option. The result is a bit better: all legs are now moving, but the robot's actions are still not very stable. The video is here: https://www.youtube.com/watch?v=1JVVnWNRi9k. The model used is in Chapter18/hw/libhw/t1zyh.py.

The third experiment was done with a different training objective, HeightOrient, which not only takes into account the height of the model, but also checks that the body of the robot is parallel to the...

Summary

Thanks for reaching the end! I hope you enjoyed reading this chapter as much as I enjoyed writing it. This field is very interesting; we have only touched on it a little, but I hope that this chapter has shown you a direction for your own experiments and projects. The goal of the chapter wasn't to build a robot that can stand, as that could be done in a much easier and more efficient way; the true goal was to show how the RL way of thinking can be applied to robotics problems, and how you can run your own experiments with real hardware without access to expensive robotic arms, complex robots, and so on.

At the same time, I see real potential for the RL approach to be applied to complex robots, and who knows, maybe you will build the next iRobot and bring more robots into our lives. If you are interested in buying the kits for the robot platform described in this chapter, it would be really helpful if you could fill out this form: https:...
