Computer vision is one of the most important fields in machine learning and artificial intelligence. With smartphones widely used for capturing, sharing, and uploading images every day, the amount of image data being generated is growing rapidly. As a result, the need for experts specializing in computer vision is at an all-time high. Industries such as health care are on the verge of a revolution due to the progress made in medical imaging. This chapter introduces you to computer vision and the various industries in which it is used. You will also learn about Convolutional Neural Networks (CNNs), which are the most widely used neural networks for image processing. Like other neural networks, CNNs are made up of neurons that receive inputs and process them using weighted sums and activation functions. However, unlike ANNs, which take vectors as inputs, a CNN takes images as its input. In this chapter, we will...
To understand computer vision, let's first understand what human vision is. Human vision is the ability of the human eye and brain to see and recognize objects. Computer vision is the process of giving a machine a similar, if not better, ability to see and identify objects in the real world. It is fairly simple for a human to tell whether an animal is a tiger or a lion, but it takes a lot of training for a computer system to distinguish such objects. Computer vision can also be defined as building mathematical models that mimic the function of the human eye and brain. Basically, it is about training computers to understand and process images and videos.
Computer vision is an integral part of many cutting-edge fields: robotics, health care and medical imaging (X-ray, MRI, and CT scans, and so on), drones, self-driving cars, sports and recreation, and so on. Almost all businesses need computer vision to run successfully. Imagine the large amount...
Convolutional Neural Networks
When you talk about computer vision, you talk about CNNs in the same breath. A CNN is a class of deep neural network that is mostly used in the field of computer vision and imaging. CNNs are used to classify images, cluster them by similarity, and recognize objects within scenes. A CNN has an input layer, an output layer, and multiple hidden layers. The hidden layers consist of convolutional layers, ReLU layers that act as activation functions, pooling layers, normalization layers, and fully connected layers. On a very simple level, CNNs help to identify images and label them appropriately; for example, a tiger image will be identified as a tiger:
An example of a CNN classifying a tiger:
The main components of a CNN architecture are as follows:
Input image
Convolutional layer
Pooling layer
Flattening
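To give a rough feel for the later stages in this list, the following is a minimal sketch in plain Python, not a definitive implementation. It assumes a small, made-up feature map (the kind of output a convolutional layer produces), applies a ReLU activation, then 2x2 max pooling, and finally flattens the result into a vector. The function names, the 4x4 values, and the choice of 2x2 max pooling are all assumptions made for illustration.

```python
def relu(feature_map):
    """Replace every negative value with zero."""
    return [[max(0, value) for value in row] for row in feature_map]


def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2; assumes even height and width."""
    pooled = []
    for top in range(0, len(feature_map), 2):
        row = []
        for left in range(0, len(feature_map[0]), 2):
            window = [feature_map[top + i][left + j]
                      for i in range(2) for j in range(2)]
            row.append(max(window))
        pooled.append(row)
    return pooled


def flatten(feature_map):
    """Turn a 2D feature map into a 1D vector, row by row."""
    return [value for row in feature_map for value in row]


# Hypothetical 4x4 feature map (illustrative values only).
feature_map = [
    [1, -2, 0, 3],
    [4, 0, -1, 2],
    [-3, 5, 2, -4],
    [0, 1, -1, 6],
]

activated = relu(feature_map)
pooled = max_pool_2x2(activated)
vector = flatten(pooled)

print(pooled)   # 2x2 map after pooling
print(vector)   # 1D vector that a fully connected layer could consume
```

Pooling shrinks the 4x4 map to 2x2 while keeping the strongest response in each window, and flattening turns that map into the kind of vector that an ordinary fully connected layer expects.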
An input image forms the first component of a CNN architecture. An image can be of any type: a human, an animal, scenery, a medical X-ray image, and so on. Each image is converted into a mathematical matrix of numbers. In the simplified example in the following figure, the matrix contains only zeros and ones and shows, at a very high level, how a computer views an image of the letter T. All the blocks that have a value of one represent the data, while the zeros represent blank space:
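The same idea can be sketched in a few lines of plain Python. The 5x5 grid below is a hypothetical stand-in for the letter T (it is not the exact grid from the figure), and the `render` helper is an illustrative name introduced only for this example.

```python
# Hypothetical 5x5 binary image of the letter T (1 = data, 0 = blank space).
letter_t = [
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]


def render(image):
    """Return a text rendering: '#' for ones, '.' for zeros."""
    return "\n".join(
        "".join("#" if pixel else "." for pixel in row) for row in image
    )


height = len(letter_t)
width = len(letter_t[0])
ink_pixels = sum(sum(row) for row in letter_t)  # number of cells set to one

print(render(letter_t))
print(f"{height}x{width} image, {ink_pixels} data pixels")
```

Real photographs work the same way, except that each cell usually holds a brightness value (for example, 0 to 255) rather than just zero or one, and color images carry one such matrix per color channel.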
The convolutional layer is the place where the image processing starts. A convolutional layer involves two components:
Feature detector or filter
Feature map
Feature detector or filter: This is a small matrix, or pattern, that you place on an image to transform it into a feature map:
Now, as highlighted, this feature detector is put (superimposed) on the original image and...
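As a rough illustration of how a feature detector can produce a feature map, here is a minimal sketch in plain Python. It assumes the common approach of multiplying the overlapping values element by element, summing them, and moving the filter one pixel at a time with no padding. The `convolve2d` function, the 5x5 letter T image, and the vertical-line filter are all hypothetical and chosen only for this example.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding).

    Each output cell is the sum of element-wise products between the
    kernel and the image patch under it. As in most deep learning
    libraries, the kernel is not flipped.
    """
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = len(image) - k_h + 1
    out_w = len(image[0]) - k_w + 1
    feature_map = []
    for top in range(out_h):
        row = []
        for left in range(out_w):
            total = 0
            for i in range(k_h):
                for j in range(k_w):
                    total += image[top + i][left + j] * kernel[i][j]
            row.append(total)
        feature_map.append(row)
    return feature_map


# Hypothetical 5x5 binary image of the letter T.
image = [
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]

# Hypothetical 3x3 feature detector that responds to vertical lines.
vertical_line = [
    [0, 1, 0],
    [0, 1, 0],
    [0, 1, 0],
]

feature_map = convolve2d(image, vertical_line)
for row in feature_map:
    print(row)
```

The largest values in the feature map appear in the middle column, where the stem of the T lines up with the vertical pattern in the filter. A 5x5 image and a 3x3 filter give a 3x3 feature map, so the output is smaller than the input.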
The word augmentation means the action or process of making or becoming greater in size or amount. Image or data augmentation works in a similar manner. It creates many batches of our images and then applies random transformations to random images inside the batches. These transformations can include rotating images, shifting them, flipping them, and so on. By applying them, we get more diverse images inside the batches, and we also end up with much more data than we had originally.
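The idea can be sketched in plain Python as follows. This is only an illustration under simple assumptions, not how any particular library implements augmentation: the three transformations, the `augment` helper, and the tiny 3x3 image are hypothetical, and a fixed seed is used so that the run is repeatable.

```python
import random


def flip_horizontal(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def shift_right(image, pixels=1, fill=0):
    """Shift every row right by `pixels`, padding the left edge with `fill`."""
    width = len(image[0])
    return [([fill] * pixels + row)[:width] for row in image]


def rotate_90(image):
    """Rotate the image clockwise by 90 degrees."""
    return [list(row) for row in zip(*image[::-1])]


TRANSFORMS = [flip_horizontal, shift_right, rotate_90]


def augment(images, copies_per_image, seed=0):
    """Create `copies_per_image` randomly transformed copies of each image."""
    rng = random.Random(seed)
    augmented = []
    for image in images:
        for _ in range(copies_per_image):
            transform = rng.choice(TRANSFORMS)
            augmented.append(transform(image))
    return augmented


# Hypothetical 3x3 binary image.
original = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 0, 0],
]

batch = augment([original], copies_per_image=4)
print(f"1 original image -> {len(batch)} augmented images")
```

Each transformation returns a new image and leaves the original untouched, so the augmented copies can be added to the training data alongside the images we started with.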
A cylinder looks different when it is viewed from different angles. In the following figure, a single cylinder is seen from five different angles. So, we have effectively created five different images from a single image:
The following is example code for image augmentation; here, the Keras ImageDataGenerator class is used for processing. The shear_range, zoom_range, and horizontal_flip arguments each control one kind of image transformation:
In this chapter, we studied why we need computer vision and how it works. We saw why computer vision is one of the hottest fields in machine learning. Then, we worked with convolutional neural networks, studied their architecture, and learned how to build CNNs for real-life applications. We also tried to improve our algorithms by adding more ANN and CNN layers and by trying different activation, optimizer, and loss functions. In the end, we were able to successfully classify new images of cats and dogs. Remember, the images of dogs and cats can be substituted with any other images, such as tigers and deer, or MRI scans of brains with and without a tumor. Any binary image-classification problem can be approached in the same way.
In the next chapter, we will study an even more efficient technique for working on computer vision, which is less time-consuming and easier to implement.