Picking Up and Putting Away Toys using Reinforcement Learning and Genetic Algorithms

This chapter is where the robots start to get challenging – and fun. What we want to do now is have the robot’s manipulator arm start picking up objects. Not only that, but instead of preprogramming arm moves and grasping actions, we want the robot to be able to learn how to pick up objects, and how to move its arm without hitting itself.

How would you teach a child to pick up toys in their room? Would you offer a reward for completing the task, such as “If you pick up your toys, you will get a treat”? Or would you threaten a punishment, such as “If you don’t pick up your toys, you can’t play games on your tablet”? This concept, offering positive feedback for good behavior and negative feedback for undesirable actions, is called reinforcement learning. That is one of the ways we will train our robot in this chapter.

If this sounds...

Technical requirements

The exercise in this chapter does not require any software or tools beyond those we have already used in previous chapters. We will start with Python and ROS 2, and you will need an IDE for Python (IDLE or Visual Studio Code) to edit the source code.

Since this chapter is all about moving the robot arm, you will need a robot arm to execute the code. The one I used is the LewanSoul Robot xArm, which I purchased from Amazon.com. This arm uses digital servos, which make the programming much easier and provide position feedback, so we know what position the arm is in. At the time of publication, the arm I purchased could be found at http://tinyurl.com/xarmRobotBook.

Note

If you don’t want to buy a robot arm (or can’t), you can run this code against a simulation of a robot arm using ROS 2 and Gazebo, a simulation engine. You can find instructions at https://community.arm.com/arm-research/b/articles/posts/do-you-want-to-build-a...

Task analysis

Our tasks for this chapter are pretty straightforward. We will use a robot arm to pick up the toys we identified in the previous chapter. This can be divided into the following tasks:

  • First, we build an interface to control the robot arm. We are using ROS 2 to connect the various parts of the robot together, so this interface is how the rest of the system sends commands and receives data from the arm. Then we get into teaching the arm to perform its function, which is picking up toys. The first level of capability is picking up or grasping toys. Each toy is slightly different, and the same strategy won’t work every time. Also, the toy might be in different orientations, so we have to adapt to how the toy is presented to the robot’s end effector (a fancy name for its hand). So rather than write a lot of custom code that may or may not work all the time, we want to create a structure so that the robot can learn for itself.
  • The next problem we face...

Designing the software

The first step in designing the robot arm control software is to establish a coordinate frame (how we measure movement); we then set up our solution space by defining states (arm positions) and actions (movements that change positions). The following diagram shows the coordinate frame for the robot arm:

Figure 5.2 – Robot arm coordinate frame

Let’s define the coordinate frame of the robot – our reference that we use to measure movement – as shown in the preceding diagram. The X direction is toward the front of the robot, so movement forward and backward is along the X-axis. Horizontal movement (left or right) is along the Y-axis. Vertical movement (up and down) is in the Z direction. We place the zero point – the origin of our coordinates – down the center of the robot arm with zero Z (Z=0) on the floor. So, if I say the robot hand is moving positively in X, then it is moving away...
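
To make the convention concrete, here is a minimal Python sketch of a point in this frame. The ArmPoint class and the sample values are my own illustration for this discussion, not code from the arm software we will build:

    # A minimal sketch of the coordinate frame convention described above.
    from dataclasses import dataclass

    @dataclass
    class ArmPoint:
        x: float  # forward (+) / backward (-) from the base, in meters
        y: float  # left/right of the base, in meters
        z: float  # height above the floor, in meters (z = 0 is the floor)

    # The origin sits at the center of the arm's base, on the floor.
    origin = ArmPoint(0.0, 0.0, 0.0)

    # A hand position 20 cm in front of the robot, centered, 15 cm above the floor.
    hand = ArmPoint(x=0.20, y=0.0, z=0.15)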

Setting up the solution

We will call the act of setting the motors to a different position an action, and we will call the position of the robot arm and hand the state. An action applied to a state results in the arm being in a new state.

We are going to have the robot associate states (a beginning position of the hand) and actions (the motor commands used when at that state) with the probability of generating either a positive or negative outcome – we will be training the robot to figure out which sets of actions maximize the reward. What’s a reward? It’s just an arbitrary value that we use to define whether the outcome of an action was positive – something we wanted – or negative – something we did not want. If an action produced a desirable outcome, we increment the reward, and if it did not, we decrement it. The robot will use an algorithm both to try to maximize the reward and to incrementally...
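
As a rough sketch of these ideas in code, the snippet below shows a state, an action, and the reward bookkeeping. It is purely illustrative; the servo angles and the apply_action helper are my assumptions, not the chapter's actual representation:

    # State: a tuple of servo positions; action: a small change to each servo.
    state = (90, 45, 120)       # current servo angles in degrees (hypothetical)
    action = (+5, 0, -5)        # nudge servo 1 up and servo 3 down

    def apply_action(state, action):
        """An action applied to a state results in a new state."""
        return tuple(s + a for s, a in zip(state, action))

    new_state = apply_action(state, action)

    # Reward bookkeeping: increment for outcomes we want, decrement otherwise.
    reward = 0
    moved_closer_to_toy = True  # would come from sensing; just a stand-in here
    reward += 1 if moved_closer_to_toy else -1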

Creating the interface to the arm

As previously noted, we are using ROS 2 as our interface service, which creates a Modular Open System Architecture (MOSA). This turns our components into plug-and-play devices that can be added, removed, or modified, much like the apps on a smartphone. The secret to making that happen is to create a useful, generic interface, which we will do now.

Note

I’m creating my own interface to ROS 2 just for this book. We won’t be using any other ROS packages with this arm – just what we create – so I wanted the bare minimum interface needed to get the job done.

We’ll be creating this interface in Python. Follow these steps:

  1. First, create a package for the robot arm in ROS 2. A package is a portable unit for organizing related functionality in ROS 2. Since we have multiple programs and multiple functions for the robot arm, we can bundle them together:
    cd ~/ros2_ws/src
    ros2 pkg create --build-type ament_cmake ros_xarm...
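
To give a feel for what such an interface node might look like, here is a minimal rclpy sketch. The topic names (arm_command and arm_state) and the message type are assumptions for illustration only; the package we build will define its own interfaces:

    # arm_interface.py - a minimal sketch of an arm interface node
    import rclpy
    from rclpy.node import Node
    from std_msgs.msg import Float32MultiArray

    class ArmInterface(Node):
        def __init__(self):
            super().__init__('arm_interface')
            # Other nodes send desired servo positions on this topic...
            self.cmd_sub = self.create_subscription(
                Float32MultiArray, 'arm_command', self.on_command, 10)
            # ...and we report the arm's current positions on this one.
            self.state_pub = self.create_publisher(
                Float32MultiArray, 'arm_state', 10)

        def on_command(self, msg):
            # Here we would translate the command into servo writes for the xArm.
            self.get_logger().info(f'Received command: {list(msg.data)}')
            # Echo the command back as a stand-in for real servo feedback.
            self.state_pub.publish(msg)

    def main():
        rclpy.init()
        rclpy.spin(ArmInterface())
        rclpy.shutdown()

    if __name__ == '__main__':
        main()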

Introducing Q-learning for grasping objects

Training a robot arm end effector to pick up an oddly shaped object using the Q-learning RL technique involves several steps. Here’s a step-by-step explanation of the process:

  1. Define the state space and action space:
    • Define the state space: This includes all the relevant information about the environment and the robot arm, such as the position and orientation of the object, the position and orientation of the end effector, and any other relevant sensor data
    • Define the action space: These are the possible actions the robot arm can take, such as rotating the end effector, moving it in different directions, or adjusting its gripper
  2. Set up the Q-table: Create a Q-table that represents the state-action pairs and initialize it with random values. The Q-table will have a row for each state and a column for each action. As we test each position that the arm moves to, we will store the reward that was computed by the Q-learning equation...
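
To make step 2 concrete, here is a short sketch of a tabular Q setup and the standard Q-learning update rule. The state and action counts are placeholders, and this is the textbook form of the update rather than the exact code we will write later in the chapter:

    import numpy as np

    n_states, n_actions = 100, 27   # placeholder sizes for illustration
    alpha, gamma = 0.1, 0.9         # learning rate and discount factor

    # One row per state, one column per action, initialized with small random values.
    Q = np.random.rand(n_states, n_actions) * 0.01

    def q_update(state, action, reward, next_state):
        # Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        best_next = np.max(Q[next_state])
        Q[state, action] += alpha * (reward + gamma * best_next - Q[state, action])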

Introducing GAs

Moving the robot arm requires the coordination of three motors simultaneously to create a smooth movement. We need a mechanism to create different combinations of motor movement for the robot to test. We could just use random numbers, but that would be inefficient and could take thousands of trials to get to the level of training we want.

What if we had a way of trying different combinations of motor movement, and then pitting them against one another to pick the best one? It would be a sort of Darwinian survival of the fittest for arm movement scripts – which is exactly what a GA process gives us. Let’s explore how we can apply this concept to our use case.

Understanding how the GA process works

Here are the steps involved in our GA process:

  1. We do a trial run to go from position 1 (neutral carry) to position 2 (pickup). The robot moves the arm 100 times before getting the hand into the right position. Why 100? We need a large enough sample space to allow the...
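
As a preview of the mechanics described in these steps, here is a compact GA sketch over arm movement scripts, where each individual is a sequence of action indices. Every name and parameter (population size, script length, the stub fitness function) is an illustrative assumption, not the chapter's implementation:

    import random

    POP_SIZE, SCRIPT_LEN, N_ACTIONS = 100, 20, 27

    def random_script():
        return [random.randrange(N_ACTIONS) for _ in range(SCRIPT_LEN)]

    def fitness(script):
        # On the real robot this would run the script and score how close the
        # hand ends up to the pickup position; here it is only a stub.
        return -sum(script)

    def crossover(a, b):
        cut = random.randrange(1, SCRIPT_LEN)
        return a[:cut] + b[cut:]

    def mutate(script, rate=0.05):
        return [random.randrange(N_ACTIONS) if random.random() < rate else gene
                for gene in script]

    population = [random_script() for _ in range(POP_SIZE)]
    for generation in range(50):
        population.sort(key=fitness, reverse=True)      # survival of the fittest
        parents = population[:POP_SIZE // 2]            # keep the best half
        children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                    for _ in range(POP_SIZE - len(parents))]
        population = parents + children
    best_script = max(population, key=fitness)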

Alternative robot arm ML approaches

The realm of robot arm control via machine learning is really just getting started. There are a couple of research avenues I want to bring to your attention as you look for further study. One way to approach our understanding of robot movement is to consider the balance between exploitation and exploration. Exploitation is getting the robot to its goal as quickly as possible. Exploration is using the space around the robot to try new things. The path-planning program may have been stuck in a local minimum (think of this as a blind alley), and there could be better solutions available that had not been considered.

There is also more than one way to teach a robot. We have been using a form of self-exploration in our training. What if we could show the robot what to do and have it learn by example? We could let the robot observe a human doing the same task, and have it try to emulate the results. Let’s discuss some alternative...
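
For reference, a common way to trade off exploitation against exploration is epsilon-greedy action selection, where the agent usually takes the best-known action but occasionally tries a random one to escape blind alleys. The snippet below is a generic illustration, not code from this chapter:

    import random

    def select_action(q_row, epsilon=0.1):
        """q_row: list of Q-values for the current state, one per action."""
        if random.random() < epsilon:
            return random.randrange(len(q_row))                  # explore
        return max(range(len(q_row)), key=q_row.__getitem__)     # exploit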

Summary

Our task for this chapter was to use machine learning to teach the robot how to use its arm. We used two techniques with some variations. The first was reinforcement learning in the form of Q-learning, which developed a movement path by selecting individual actions based on the robot’s arm state. Each motion was scored individually as a reward, and as part of the overall path as a value. The process stored the results of the learning in a Q-matrix that could be used to generate a path. We improved our first cut of the reinforcement learning program by indexing, or encoding, the 27 possible combinations of motor moves as numbers from 0 to 26, and likewise indexing the robot state into a state lookup table. This resulted in a 40x speedup of the learning process. Even so, our Q-learning approach struggled with the large number of states the robot arm could be in.
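
As an illustration of that encoding idea (my own sketch, not necessarily the exact scheme used in the program): with 3 motors and 3 possible moves per motor (-1, 0, +1), each combination maps to a single base-3 index from 0 to 26:

    MOVES = (-1, 0, +1)

    def encode(m0, m1, m2):
        return MOVES.index(m0) * 9 + MOVES.index(m1) * 3 + MOVES.index(m2)

    def decode(index):
        return (MOVES[index // 9], MOVES[(index // 3) % 3], MOVES[index % 3])

    assert decode(encode(+1, 0, -1)) == (+1, 0, -1)
    assert encode(-1, -1, -1) == 0 and encode(+1, +1, +1) == 26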

Our second technique was a GA. We created individual random paths to make a...

Questions

  1. In Q-learning, what does the Q stand for?

    Hint: You will have to research this yourself.

  2. What could we do to limit the number of states that the Q-learning algorithm has to search through?
  3. What effect does changing the learning rate have on the learning process?
  4. What function or parameter serves to penalize longer paths in the Q-learning equation? What effect does increasing or decreasing this function have?
  5. In the genetic algorithm, how would you go about penalizing longer paths so that shorter paths (fewer steps) are preferred?
  6. Look up the SARSA variation of Q-learning. How would you implement the SARSA technique in program 2?
  7. What effect does changing the learning rate in the genetic algorithm have? What are the upper and lower bounds of the learning rate?
  8. In a genetic algorithm, what effect does reducing the population have?

Further reading
