Chapter 7. Programming Vision Sensors Using Python and ROS
In the previous chapter, we looked at some of the robotic sensors used in our robot and their interfacing with the Launchpad board. In this chapter, we will mainly discuss vision sensors and their interfacing with our robot.
The robot we are designing will have a 3D sensor, and we can interface it with vision libraries such as OpenCV, OpenNI, and Point Cloud Library (PCL). Some applications of the 3D vision sensor in our robot are autonomous navigation, obstacle avoidance, object detection, people tracking, and so on.
We will also discuss the interfacing of vision sensors and image processing libraries with ROS. In the last section of the chapter, we will see a navigational algorithm for our robot called SLAM (Simultaneous Localization and Mapping) and its implementation using a 3D sensor, ROS, and image processing libraries.
In the first section, we will see some 2D and 3D vision sensors available on the market that we will use...
List of robotic vision sensors and image processing libraries
A 2D vision sensor or an ordinary camera delivers 2D image frames of the surroundings, whereas a 3D vision sensor delivers 2D image frames plus an additional parameter: the depth of each image point. From a 3D sensor, we can find the x, y, and z distance of each point with respect to the sensor axis.
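As a sketch of how a depth value becomes a 3D point, the standard pinhole camera model back-projects a pixel (u, v) with metric depth d using the camera intrinsics fx, fy, cx, and cy. The intrinsic values below are illustrative placeholders, not the calibration of any specific sensor:

```python
def depth_to_point(u, v, depth_m, fx, fy, cx, cy):
    """Back-project pixel (u, v) with metric depth into camera coordinates."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    z = depth_m
    return x, y, z

# Illustrative intrinsics (roughly Kinect-class values; not from a real calibration)
FX = FY = 525.0
CX, CY = 319.5, 239.5

# A pixel at the image center maps straight ahead along the optical axis
print(depth_to_point(CX, CY, 2.0, FX, FY, CX, CY))  # (0.0, 0.0, 2.0)
```

Drivers such as the OpenNI nodelets perform exactly this kind of conversion over the whole depth image to produce a point cloud.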
There are quite a few vision sensors available on the market. Some of the 2D and 3D vision sensors that can be used in our robot are mentioned in this chapter.
The following figure shows a recent 2D vision sensor called Pixy/CMUcam5 (http://www.cmucam.org/), which can detect colored objects with high speed and accuracy and can be interfaced with an Arduino board. Pixy can be used for fast object detection, and the user can teach it which objects to track. The Pixy module has a CMOS image sensor and an NXP (http://www.nxp.com/) processor for image processing:
The commonly available 2D vision sensors are webcams. They contain...
Introduction to OpenCV, OpenNI, and PCL
Let's discuss the software frameworks and libraries that we are using in our robot. First, we will look at OpenCV. This is one of the libraries we will use in this robot for object detection and other image processing functionality.
OpenCV is an open source, BSD-licensed computer vision library that includes hundreds of computer vision algorithms. The library, aimed mainly at real-time computer vision, was developed by Intel's research center in Russia and is now actively supported by Itseez (http://itseez.com/).
OpenCV is written mainly in C and C++, and its primary interface is C++. It also has good interfaces in Python, Java, and Matlab/Octave, and wrappers in other languages such as C# and Ruby.
Newer versions of OpenCV support CUDA (http://www.nvidia.com/object/cuda_home_new.html) and OpenCL for GPU acceleration.
OpenCV runs on most OS platforms (such as Windows, Linux, Mac OS...
Programming Kinect with Python using ROS, OpenCV, and OpenNI
Let's look at how we can interface and work with the Kinect sensor in ROS. ROS is bundled with the OpenNI driver, which can fetch the RGB and depth images from Kinect. This package can be used with Microsoft Kinect, PrimeSense Carmine, Asus Xtion Pro, and Xtion Pro Live.
This driver mainly publishes raw depth, RGB, and IR image streams. Installing the openni_launch package will also install packages such as openni_camera. The openni_camera package is the Kinect driver that publishes the raw data and sensor information, whereas the openni_launch package contains ROS launch files. A launch file is basically an XML file that launches multiple nodes at a time and publishes data such as point clouds.
How to launch OpenNI driver
The following command will open the OpenNI device and load all the nodelets needed to convert the raw depth/RGB/IR streams into depth images, disparity images, and point clouds. The ROS nodelet package is designed to provide a way to run multiple algorithms...
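Assuming the openni_launch package is installed, the standard way to start the driver is with its bundled launch file (a sketch; your topic names may differ depending on the driver version):

```shell
# Start the OpenNI driver and its conversion nodelets
roslaunch openni_launch openni.launch

# In another terminal, list the image and point cloud topics it publishes
rostopic list
```

This requires a supported depth sensor to be plugged in; without hardware, the driver will report that no device was found.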
Working with Point Clouds using Kinect, ROS, OpenNI, and PCL
A Point Cloud is a data structure used to represent a collection of multidimensional points and is commonly used to represent 3D data. In a 3D Point Cloud, the points usually represent the x, y, and z geometric coordinates of an underlying sampled surface. When the color information is present, the Point Cloud becomes 4D.
Point Clouds can be acquired from hardware sensors (such as stereo cameras, 3D scanners, or time-of-flight cameras), or generated synthetically by a computer program. PCL supports the OpenNI 3D interfaces natively; thus, it can acquire and process data from devices such as PrimeSense's 3D cameras, Microsoft Kinect, or Asus Xtion PRO.
PCL is installed along with the ROS Indigo full desktop installation. Let's see how we can generate and visualize a Point Cloud in RViz, a data visualization tool in ROS.
Opening device and Point Cloud generation
Open a new terminal and launch the ROS OpenNI driver along...
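A sketch of the typical workflow, assuming the openni_launch defaults (the topic and frame names below are the usual defaults, but may vary with driver version):

```shell
# Terminal 1: start the OpenNI driver; it publishes the point cloud
# on /camera/depth/points by default
roslaunch openni_launch openni.launch

# Terminal 2: start RViz
rosrun rviz rviz
```

In RViz, set the Fixed Frame to `camera_depth_optical_frame`, add a PointCloud2 display, and point it at the `/camera/depth/points` topic to see the live cloud.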
Conversion of Point Cloud to laser scan data
We are using Kinect in this robot to replicate the function of an expensive laser range scanner. Kinect can deliver Point Cloud data, which contains the depth of each point in the surroundings. The Point Cloud data is processed and converted into data equivalent to a laser scan using the ROS depthimage_to_laserscan package. The main function of this package is to slice a section of the Point Cloud data and convert it to a laser scan equivalent data type. The Point Cloud data type is sensor_msgs/PointCloud2, and for the laser scanner, the data type is sensor_msgs/LaserScan. This package performs the processing and fakes a laser scanner. The laser scanner output can be viewed using RViz. In order to run the conversion, we have to start the converter nodelets that will perform this operation. We have to specify this in our launch file to start the conversion. The following is the required code in the launch file to start the depthimage_to_laserscan...
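A hedged sketch of such a launch file, loading the depthimage_to_laserscan nodelet into a nodelet manager; the topic and frame names assume the openni_launch defaults and may need to be adapted to your setup:

```xml
<launch>
  <!-- A nodelet manager to host the conversion nodelet -->
  <node pkg="nodelet" type="nodelet" name="laserscan_nodelet_manager"
        args="manager" />

  <!-- Load the depth-image-to-laser-scan conversion nodelet -->
  <node pkg="nodelet" type="nodelet" name="depthimage_to_laserscan"
        args="load depthimage_to_laserscan/DepthImageToLaserScanNodelet laserscan_nodelet_manager">
    <!-- Input depth image (openni_launch default topic) -->
    <remap from="image" to="/camera/depth/image_raw" />
    <!-- Number of pixel rows sliced from the depth image to form the scan -->
    <param name="scan_height" value="10" />
    <param name="output_frame_id" value="camera_depth_frame" />
  </node>
</launch>
```

The resulting sensor_msgs/LaserScan is published on the `scan` topic and can be added to RViz as a LaserScan display.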
Working with SLAM using ROS and Kinect
The main aim of deploying vision sensors in our robot is to detect objects and perform robot navigation in an environment. SLAM is a technique used in mobile robots and vehicles to build up a map of an unknown environment or update a map within a known environment by tracking the current location of a robot.
Maps are used to plan the robot's trajectory and to navigate along this path. Using maps, the robot gets an idea of the environment. The two main challenges in mobile robot navigation are mapping and localization. Mapping involves generating a profile of the obstacles around the robot; through mapping, the robot understands what the world looks like. Localization is the process of estimating the pose of the robot relative to the map we build.
SLAM fetches data from different sensors and uses it to build maps. A 2D or 3D vision sensor can be used as an input to SLAM. 2D sensors such as laser range finders and 3D sensors such as Kinect are mainly...
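One common SLAM backend in ROS is gmapping; a sketch of running it against our faked laser scan (assuming the depthimage_to_laserscan conversion above is publishing on /scan and the robot's odometry and tf are available):

```shell
# Run the gmapping SLAM node, consuming sensor_msgs/LaserScan on /scan;
# it publishes the occupancy grid map on the /map topic
rosrun gmapping slam_gmapping scan:=/scan

# Once mapping is done, save the generated map to disk
rosrun map_server map_saver -f my_map
```

The saved map (an image plus a YAML metadata file) can later be served by map_server for localization and path planning.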
In this chapter, we looked at the vision sensors to be used in our robot. We chose Kinect for our robot and discussed OpenCV, OpenNI, and PCL and their applications. We also discussed the role of vision sensors in robot navigation, and a popular SLAM technique and its application using ROS. In the next chapter, we will discuss speech processing and synthesis to be used in this robot.