Mastering Computer Vision with TensorFlow 2.x

Computer Vision and TensorFlow Fundamentals

Computer vision is rapidly expanding in many different applications as traditional techniques, such as image thresholding, filtering, and edge detection, have been augmented by deep learning methods. TensorFlow is a widely used, powerful machine learning tool created by Google. It has user configurable APIs available to train and build complex neural network model in your local PC or in the cloud and optimize and deploy at scale in edge devices.

In this chapter, you will gain an understanding of advanced computer vision concepts using TensorFlow. This chapter discusses the foundational concepts of computer vision and TensorFlow to prepare you for the later, more advanced chapters of this book. We will look at how to perform image hashing and filtering. Then, we will learn about various methods of feature extraction and image retrieval. Moving on, we will learn about visual search in applications, its methods, and the challenges we might face. Then, we will look at an overview of the high-level TensorFlow software and its different components and subsystems.

The topics we will be covering in this chapter are as follows:

Detecting edges using image hashing and filtering
Extracting features from an image
Object detection using Contours and the HOG detector
An overview of TensorFlow, its ecosystem, and installation

Detecting edges using image hashing and filtering

Image hashing is a method used to find similarity between images. Hashing involves modifying an input image to a fixed size of binary vector through transformation. There are different algorithms for image hashing using different transformations:

Perpetual hash (phash): A cosine transformation
Difference hash (dhash): The difference between adjacent pixels

After a hash transformation, images can be compared quickly with the Hamming distance. The Python code for applying a hash transformation is shown in the following code. A hamming distance of 0 shows an identical image (duplicate), whereas a larger hamming distance shows that the images are different from each other. The following snippet imports Python packages, such as PIL, imagehash, and distance. imagehash is a Python package that supports various types of hashing algorithms. PIL is a Python imaging library, and distance is a Python package that calculates the hamming distance between two hashed images:

from PIL import Image
import imagehash
import distance
import scipy.spatial
hash1 = imagehash.phash(Image.open(…/car1.png))
hash2 = imagehash.phash(Image.open(…/car2.png))
print hamming_distance(hash1,hash2)

Image filtering is a fundamental computer vision operation that modifies the input image by applying a kernel or filter to every pixel of the input image. The following are the steps involved in image filtering, starting from light entering a camera to the final transformed image:

Using a Bayer filter for color pattern formation
Creating an image vector
Transforming the image
Linear filtering—convolution with kernels
Mixing Gaussian and Laplacian filters
Detecting edges in the image

Using a Bayer filter for color pattern formation

A Bayer filter transforms a raw image into a natural, color-processed image by applying a demosaic algorithm. The image sensor consists of photodiodes, which produce electrically charged photons proportional to the brightness of the light. The photodiodes are grayscale in nature. Bayer filters are used to convert the grayscale image to color. The color image from the Bayer filter goes through an Image Signal Processing (ISP) which involves several weeks of manual adjustment of various parameters to produce desired image quality for human vision. Several research work are currently ongoing to convert the manual ISP to a CNN based processing to produce an image and then merge the CNN with image classification or object detection model to produce one coherent neural network pipeline that takes Bayer color image and detects object with bounding boxes. Details of such work can be found in the 2019 paper by Sivalogeswaran Ratnasingam titled Deep Camera: A Fully Convolutional Neural Network for Image Signal Processing. The link for the paper is shown here: http://openaccess.thecvf.com/content_ICCVW_2019/papers/LCI/Ratnasingam_Deep_Camera_A_Fully_Convolutional_Neural_Network_for_Image_Signal_ICCVW_2019_paper.pdf.

Here is an example of a Bayer filter:

In the preceding diagram, we can observe the following:

The Bayer filter consists of Red (R), Green (G), and Blue (B) channels in a predefined pattern, such that there is twice the number of G channels compared to B and R.
The G, R, and B channels are alternately distributed. Most channel combinations are RGGB, GRGB, or RGBG.
Each channel will only let a specific color to pass through, the combination of colors from different channels produce a pattern as shown in the preceding image.

Creating an image vector

Color images are a combination of R, G, and B. Colors can be represented as an intensity value, ranging from 0 to 255. So, each image can be represented as a three-dimensional cube, with the x and y axis representing the width and height and the z axis representing three color channels (R, G, B) representing the intensity of each color. OpenCV is a library with built-in programming functions written for Python and C++ for image processing and object detection.

We will start by writing the following Python code to import an image, and then we will see how the image can be broken down into a NumPy array of vectors with RGB. We will then convert the image to grayscale and see how the image looks when we extract only one component of color from the image:

import numpy as np
import cv2
import matplotlib.pyplot as plt
%matplotlib inline
import matplotlib.pyplot as plt
from PIL import Image
image = Image.open('../car.jpeg'). # enter image path in ..
plt.imshow(image)
image_arr = np.asarray(image)  # convert image to numpy array
image_arr.shape

The preceding code will return the following output:

Output:
(296, 465, 4)
gray = cv2.cvtColor(image_arr, cv2.COLOR_BGR2GRAY)
plt.imshow(gray, cmap=‘gray')

The following figure shows the colored image and the corresponding grayscale image based on the preceding transformation:

The following is the Python code that we will use to turn the image into R, G, and B color components:

plt.imshow(image_arr[:,:,0]) # red channel
plt.imshow(image_arr[:,:,1]) # green channel
plt.imshow(image_arr[:,:,2]) # blue channel

The following figure shows the transformed image of the car after extracting only one channel (either R, G, or B):

The preceding figure can be represented as a 3D volume with the following axes:

The x axis, representing the width.
The y axis, representing the height.
Each color channel represents the depth of the image.

Let's take a look at the following figure. It shows the R, G, and B pixel values for the image of the car at different x and y coordinates as a 3D volume; a higher value indicates a brighter image:

Transforming an image

Image transformation involves the translation, rotation, magnification, or shear of an image. If (x,y) is the coordinate of the pixel of an image, then the transformed image coordinate (u,v) of the new pixel can be represented as follows:

Translation: Some examples of translation constant values are c11 = 1, c12 = 0, and c13 = 10; c21 = 0, c22 = 1, and c23 = 10. The resulting equation becomes u = x + 10 and v = y + 10:

Rotation: Some examples of rotation constant values are c11 = 1, c12 =0.5, and c13 = 0; c21 = -0.5, c22 = 1, and c23 = 0.

The resulting equation becomes u = x + 0.5 y and v = -0.5 x + y:

Rotation + Translation: Some examples of rotation and translation combined constant values are c11 = 1, c12 =0.5, and c13 = 10; c21 = -0.5, c22 = 1, and c23 = 10. The resulting equation becomes u = x + 0.5 y + 10 and v = -0.5 x + y +10:

Shear: Some examples of shear constant values are c11 = 10, c12 =0, and c13 = 0; c21 = 0, c22 = 10, and c23 = 0. The resulting equation becomes u = 10 x and v = 10 y:

Image transformation is particularly helpful in computer vision for getting different images from the same image. This helps the computer develop a neural network model that is robust to translation, rotation, and shear. For example, if we only input an image of the front of a car in the convoluted neural network (CNN) during the training phase, the model will not be able to detect the image of a car rotated by 90 degrees during the test phase.

Next, we will discuss the mechanics of the convolution operation and how filters are applied to transform an image.

Linear filtering—convolution with kernels

Convolution in computer vision is a linear algebra operation of two arrays (one of them is an image and the other one is a small array) to produce a filtered image array whose shape is different than the original image array. Convolution is cumulative and associative. It can be represented mathematically as follows:

The preceding formula is explained as follows:

F(x,y) is the original image.
G(x,y) is the filtered image.
U is the image kernel.

Depending on the kernel type, U, the output image will be different. The Python code for the conversion is as follows:

import numpy as np
import cv2
import matplotlib.pyplot as plt
%matplotlib inline
import matplotlib.pyplot as plt
from PIL import Image
image = Image.open(‘…/carshort.png')
plt.imshow(image)
image_arr = np.asarray(image)  # convert image to numpy array
image_arr.shape 
gray = cv2.cvtColor(image_arr, cv2.COLOR_BGR2GRAY)
plt.imshow(gray, cmap='gray')
kernel = np.array([[-1,-1,-1],
                 [2,2,2],
                 [-1,-1,-1]])
blurimg = cv2.filter2D(gray,-1,kernel)
plt.imshow(blurimg, cmap='gray')

The image output for the preceding code is as follows:

To the left is the input image and to the right is the image obtained by applying a horizontal kernel to the image. A horizontal kernel detects only the horizontal edges, which can be seen by the white streaks of horizontal lines. Details about the horizontal kernel can be seen in the Image gradient section.

The preceding code imports the necessary Python libraries for machine learning and computer vision work, such as NumPy to process an array, cv2 for openCV computer vision work, PIL to process images in the Python code, and Matplotlib to plot results. It then imports the image using PIL and converts it to grayscale using the OpenCV BGr2GRAY scale function. It creates a kernel for edge filtering using a NumPy array, blurs the image using the kernel, and then displays it using the imshow() function.

The filtering operation is broken down into three distinct classes:

Image smoothing
Image gradient
Image sharpening

Image smoothing

In image smoothing, the high-frequency noise from an image is removed by applying low-pass filters, such as the following:

A mean filter
A median filter
A Gaussian filter

This blurs the image and is performed by applying a pixel whose end values do not change sign and do not differ in value appreciably.

Image filtering is typically done by sliding a box filter over an image. A box filter is represented by an kernel divided by (n*m), where n is the number of rows and m is the number of columns. For a 3 x 3 kernel this looks as follows:

Let's say this kernel is applied to the RGB image described previously. For reference, the 3 x 3 image value is shown here:

The mean filter

The mean filter filters the image with an average value after the convolution operation of the box kernel is carried out with the image. The resulting array after matrix multiplication will be as follows:

The mean value is 42 and replaces the center intensity value of 166 in the image, as you can see in the following array. The remaining values of the image will be converted in a similar manner:

The median filter

The median filter filters the image value with the median value after the convolution operation of the box kernel is carried out on the image. The resulting array after matrix multiplication will be as follows:

The median value is 48 and replaces the center intensity value of 166 in the image, as shown in the following array. The remaining values of the image will be converted in a similar manner:

The Gaussian filter

The Gaussian kernel is represented by the following equation:

is the standard deviation of the distribution and k is the kernel size.

For the standard deviation () of 1, and the 3 x 3 kernel (k=3), the Gaussian kernel looks as follows:

In this example, when the Gaussian kernel is applied, the image is transformed as follows:

So, in this case, the center intensity value is 54. Compare this value with the median and mean filter values.

Image filtering with OpenCV

The image filtering concepts described previously can be better understood by applying a filter to a real image. OpenCV provides a method to do that. The OpenCV code we will use can be found at https://github.com/PacktPublishing/Mastering-Computer-Vision-with-TensorFlow-2.0/blob/master/Chapter01/Chapter1_imagefiltering.ipynb.

The important code is listed in the following snippet. After importing the image, we can add noise. Without noise, the image filtering effect can not be visualized very well. After that, we need to save the image. This is not necessary for the mean and Gaussian filter, but if we don't save the image with the median filter and import it back again, Python displays an error.

Note that we use plt.imsave to save the image, rather than OpenCV. A direct save using imwrite will result in a black image as the image needs to be normalized to a 255 scale before saving. plt.imsave does not have that restriction.

After this, we use blur, medianBlur, and GaussianBlur to convert the image using the mean, median, and Gaussian filters:

img = cv2.imread('car.jpeg')
imgnoise = random_noise(img, mode='s&p',amount=0.3)
plt.imsave("car2.jpg", imgnoise)
imgnew = cv2.imread('car2.jpg')
meanimg = cv2.blur(imgnew,(3,3))
medianimg = cv2.medianBlur(imgnew,3)
gaussianimg = cv2.GaussianBlur(imgnew,(3,3),0)

The following figure shows the resulting image plotted using matplotlib pyplot:

Note that in each of the three cases, the filter removes the noise from the image. In this example, it appears the median filter is the most effective of the three methods in removing the noise from the image.

Image gradient

The image gradient calculates the change in pixel intensity in a given direction. The change in pixel intensity is obtained by performing a convolution operation on an image with a kernel, as shown here:

The kernel is chosen such that the two extreme rows or columns have opposite signs (positive and negative) so it produces a difference operator when multiplying and summing across the image pixel. Let's take a look at the following example:

The horizontal kernel:

The vertical kernel:

The image gradient described here is a fundamental concept for computer vision:

The image gradient can be calculated in both the x and y directions.
By using the image gradient, edges and corners are determined.
The edges and corners pack a lot of information about the shape or feature of an image.
So, the image gradient is a mechanism that converts lower-order pixel information to higher-order image features, which is used by convolution operation for image classification.

Image sharpening

In image sharpening, the low-frequency noise from an image is removed by applying a high-pass filter (difference operator), which results in the line structure and edges becoming more visible. Image sharpening is also known as a Laplace operation, which is represented by the second derivative, shown here:

Because of the difference operator, the four adjacent cells relative to the midpoint of the kernel always have opposite signs. So, if the midpoint of the kernel is positive, the four adjacent cells are negative, and vice versa. Let's take a look at the following example:

Note that the advantage of the second-order derivative over the first-order derivative is that the second-order derivative will always go through zero crossings. So, the edges can be determined by looking at the zero-crossing point (0 value) rather than the magnitude of the gradients (which can change from image to image and within a given image) for the first-order gradient.

Mixing the Gaussian and Laplacian operations

So far, you have learned that the Gaussian operation blurs the image and the Laplacian operation sharpens the image. But why do we need each operation, and in what situation is each operation used?

An image consists of characteristics, features, and other non-feature objects. Image recognition is all about extracting features from an image and eliminating the non-feature objects. We recognize an image as a particular object, such as a car, because its features are more prominent compared to its non-features. Gaussian filtering is the method of suppressing the non-features from the features, which blurs the image.

Applying it multiple times blurs the image more and suppresses both the features and the non-features. But since the features are stronger, they can be extracted by applying Laplacian gradients. This is the reason why we convolve two or more times with a Gaussian kernel of sigma and then apply the Laplacian operation to distinctly show the features. This is a common technique used in most convolution operations for object detection.

The following figure shows the input 3 x 3 image section, the kernel value, the output value after the convolution operation, and the resulting image:

The preceding figure shows various Gaussian and oblique kernels and how a 3 x 3 section of the image is transformed by applying the kernel. The following figure is a continuation of the preceding one:

The preceding representation clearly shows how the image becomes more blurred or sharp based on the type of convolution operation. This comprehension of the convolution operation is fundamental as we learn more about using the CNN to optimize kernel selection in various stages of the CNN.

Detecting edges in an image

Edge detection is the most fundamental way of processing in computer vision to find features in an image based on the change in brightness and image intensity. A change in brightness results from discontinuity in depth, orientation, illumination, or corners. The edge detection method can be based on the first or second order:

The following graph illustrates the edge detection mechanism graphically:

Here, you can see that the intensity of the image changes from dark to bright around the midway point, so the edge of the image is at the middle point. The first derivative (the intensity gradient) goes up and then down at the midway point, so the edge detection can be calculated by looking at the maximum value of the first derivative. However, the problem with the first derivative method is, depending on the input function, the maximum value can change, so the threshold value of the maximum value cannot be predetermined. However, the second derivative, as shown, always goes through zero points at the edges.

Sobel and Canny are the first-order edge detection methods, while the second-order method is a Laplacian edge detector.

The Sobel edge detector

The Sobel operator detects edges by calculating the gradient (Sobelx and Sobely in the following code) of the image intensity function. The gradient is calculated by applying a kernel to the image. In the following code, the kernel size (ksize) is 5. After this, the Sobel gradient (SobelG) is calculated by taking the ratio of the gradients (sobely/sobelx):

Sobelx=cv2.Sobel(gray,cv2.CV_64F,1,0,ksize=5)
Sobely=cv2.Sobel(gray,cv2.CV_64F,0,1,ksize=5)
mag,direction = cv2.cartToPolar(sobelx,sobely,angleInDegrees =True)
sobelG = np.hypot(sobelx,sobely)

The Canny edge detector

The Canny edge detector uses a two-dimensional Gaussian filter to remove the noise, then applies Sobel edge detection with non-maximum suppression to pick out the maximum ratio value between the x and y gradients at any pixel point and, finally, applies edge thresholding to detect whether or not there is an edge. The following code shows Canny edge detection on a grayscale image. The min and max values are the thresholding values that compare the image gradient to determine the edges:

Canny = cv2.Canny(gray,minVal=100,maxVal=200)

The following figure shows the image of the car after applying Sobel-x, Sobel-y, and the Canny edge detector:

As we can see, Canny performs much better than Sobel in detecting the car. This is because Canny uses a two-dimensional Gaussian filter to remove the noise, then applies Sobel edge detection with non-maximum suppression to pick out the maximum ratio value between the x and y gradients at any pixel point and, finally, applies edge thresholding to detect whether or not there is an edge.

Extracting features from an image

Once we know how to detect edges, the next task is to detect features. Many edges combine to form features. Feature extraction is the process of recognizing visual patterns in an image and extracting any discriminating local features that match with the image of an unknown object. Before performing feature extraction, it is important to understand the image histogram. An image histogram is the distribution of the color intensity of the image.

An image feature matches with the test image if the histograms are similar. The following is the Python code used to create an image histogram of the car:

import numpy as np
import cv2
import matplotlib.pyplot as plt
%matplotlib inline
import matplotlib.pyplot as plt
from PIL import Image
image = Image.open('../car.png')
plt.imshow(image)
image_arr = np.asarray(image) # convert image to numpy array
image_arr.shape
color = ('blue', 'green', 'red')
for i,histcolor in enumerate(color):
 carhistogram = cv2.calcHist([image_arr],[i],None,[256],[0,256])
 plt.plot(carhistogram,color=histcolor)
 plt.xlim([0,256])

The preceding Python code first imports the necessary Python libraries, such as cv2 (OpenCV), NumPy (for array calculation), PIL (to import an image), and Matplotlib (to plot graphs). After that, it converts the image into an array and loops through each color and plots the histogram for each color (R, G, and B).

The following graph shows the histogram output of the car image. The x axis represents the color intensity value from 0 (black) to 256 (white) and the y axis represents the frequency of occurrence:

The histogram shows that the peak color intensity of R, G, and B is at around 100, with a second peak at around 150. This means that the average color of the car is gray. A frequency of 0 at an intensity value of 200 (seen on the far-right side of the image) shows that the car is definitely not white. Similarly, a frequency of 0 at an intensity value of 50 shows that the image is not completely black.

Image matching using OpenCV

Image matching is a technique that matches two different images to find common features. Image matching techniques have many practical applications, such as matching fingerprints, matching your carpet color with your floor or wall color, matching photographs to find two images of the same person, or comparing manufacturing defects to group them into similar categories for faster analysis. This section provides a high-level overview of the image matching techniques available in OpenCV. Two commonly used methods are described here: BruteForce (BFMatcher) and Fast Library for Approximate Nearest Neighbors (FLANN). Later on in this book, we will also discuss other types of matching techniques, such as histogram matching and local binary pattern in Chapter 2, Content Recognition Using Local Binary Pattern, and visual search in Chapter 6, Visual Search Using Transfer Learning.

In BFMatcher, the hamming distance between the test image and every section of the target image is compared for the best possible match. FLANN, on the other hand, is faster but will only find the approximate nearest neighbors—so, it finds good matches, but not necessarily the best possible one. The KNN tool assumes that similar things are flocked next to each other. It finds the first approximate nearest neighbors based on the distance between the target and the source. The Python code for image matching can be found at https://github.com/PacktPublishing/Mastering-Computer-Vision-with-TensorFlow-2.0/blob/master/Chapter01/Chapter1_SIFT.ipynb.

Note in the following figure that BFMatcher finds a more similar image. This figure is the output returned by the preceding code (preface_SIFT.ipynb). Let's have a look:

The preceding figure shows how we can apply BFMatcher and FLANN's KNN matcher to match a single tile to the whole bathroom floor. It is clear that BFMatcher (the blue lines) finds more tile points compared to the FLANN matcher (the red lines).

The image matching technique described preceding can also be used to find relative distance between two points - one point can be a reference point such as your car from where the image is taken and the other one can be another car in the road. Such distance is then used to develop a collision avoidance system.

Object detection using Contours and the HOG detector

Contours are closed regions within an image that has a similar shape. In this section, we will use Contours to classify and detect simple objects within an image. The image we will use consists of apples and oranges and we will use the Contour and the Canny edge detection method to detect the object and write the image class name on the bounding box. The code for this section can be found at https://github.com/PacktPublishing/Mastering-Computer-Vision-with-TensorFlow-2.0/blob/master/Chapter01/Chapter1_contours_opencv_object_detection_HOG.ipynb.

The methodology is described in the following subsections.

Contour detection

We first need to import the image and then use the Canny edge detector to find the edges in the image. This works very well as the shape of our object is a circle with a rounded edge. The following is the detailed code required:

threshold =100
canny_output = cv2.Canny(img, threshold, threshold * 2)
contours, hierarchy = cv2.findContours(canny_output, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

As the preceding code shows, after Canny edge detection, we applied the OpenCV findContours() method. This method has three arguments:

The image, which in our case is a Canny edge detector output.
The retrieval method, which has many options. The one we are using is an external method as we are interested in drawing a bounding box around the object.
The contour approximation method.

Detecting a bounding box

This method essentially consists of understanding the characteristics of an image and its various classes and developing methods to classify the image class.

Note that the OpenCV method does not involve any training. For every contour, we define a bounding box using the OpenCV boundingRect property.

We will be using two important characteristics to select the bounding box:

The size of the region of interest: We will eliminate all contours with a size that is less than 20.

Note that 20 is not a universal number, it just works for this image. For a larger image, the value can be larger.

The color of the region of interest: Within each bounding box, we need to define the region of interest from 25% to 75% of the width to ensure that we do not consider the empty region of rectangles outside the circle. This is important to minimize variation. Next, we define the mean color by using CV2.mean.

We will determine the color's mean and max thresholds by looking at the three images of oranges that encircle it. The following code uses OpenCV's built-in method to draw the bounding box using cv2.boundingRect. It then draws a region of interest (ROI) based on the width and height selection and finds the mean color within the region:

count=0
font = cv2.FONT_HERSHEY_SIMPLEX
for c in contours:
    x,y,w,h = cv2.boundingRect(c)
    if (w >20 and h >20):
        count = count+1
        ROI = img[y+int(h/4):y+int(3*h/4), x+int(h/4):x+int(3*h/4)]
        ROI_meancolor = cv2.mean(ROI)
        print(count,ROI_meancolor)
        if (ROI_meancolor[0] > 30 and ROI_meancolor[0] < 40 and ROI_meancolor[1] > 70 and ROI_meancolor[1] < 105
            and ROI_meancolor[2] > 150 and ROI_meancolor[2] < 200):
                cv2.putText(img, 'orange', (x-2, y-2), font, 0.8, (255,255,255), 2, cv2.LINE_AA)
                cv2.rectangle(img,(x,y),(x+w,y+h),(255,255,255),3)
                cv2.imshow('Contours', img)
        else:
                cv2.putText(img, 'apple', (x-2, y-2), font, 0.8, (0,0,255), 2, cv2.LINE_AA)
                cv2.rectangle(img,(x,y),(x+w,y+h),(0,0,255),3)
                cv2.imshow('Contours', img)

In the preceding code, pay attention to the two if statements—size-based (w,h) and color-based (ROI_meancolor[0,1,2]):

The size-based statement eliminates all contours of less than 20.
ROI_meancolor [0,1,2] indicates the RGB value of the mean color.

Here, the third, fourth, and eighth lines represent the orange and the if statement constrains the color to look between 30 and 40 for the B component, 70 and 105 for the G component, and 150 and 200 for the R component.

The output is as follows. In our example, 3, 4, and 8 are oranges:

1 (52.949200000000005, 66.38640000000001, 136.2072, 0.0)
2 (43.677693761814744, 50.94659735349717, 128.70510396975425, 0.0)
3 (34.418282548476455, 93.26246537396122, 183.0893351800554, 0.0)
4 (32.792241946088104, 78.3931623931624, 158.78238001314926, 0.0)
5 (51.00493827160494, 55.09925925925926, 124.42765432098766, 0.0)
6 (66.8863771564545, 74.85960737656157, 165.39678762641284, 0.0)
7 (67.8125, 87.031875, 165.140625, 0.0)
8 (36.25, 100.72916666666666, 188.67746913580245, 0.0)

Note that OpenCV processes images as BGR not RGB.

The HOG detector

The Histogram of Oriented Gradients (HOG) is a useful feature that can be used to determine the localized image intensity of an image. This technique can be used to find objects within an image. The localized image gradient information can be used to find similar images. In this example, we will use scikit-image to import the HOG and use it to plot the HOG of our image. You may have to install scikit-image, if it's not already installed, using pip install scikit-image:

from skimage.feature import hog
from skimage import data, exposure
fruit, hog_image = hog(img, orientations=8, pixels_per_cell=(16, 16),
cells_per_block=(1, 1), visualize=True, multichannel=True)
hog_image_rescaled = exposure.rescale_intensity(hog_image, in_range=(0, 10))
cv2.imshow('HOG_image', hog_image_rescaled)

The result of the preceding code on our sample image is illustrated in the following figure:

In the preceding figure, we can observe the following:

The left side shows a bounding box, whereas the right side shows the HOG gradients for each object in the image.
Note that each of the apples and oranges is correctly detected, with a bounding box encircling the fruit without any overlap.
The HOG descriptors show a rectangular bounding box with gradients showing the circular patterns.
The gradients between the oranges and apples show a similar pattern, with the only distinction being the size.

Limitations of the contour detection method

The example shown in the previous subsection looks very good in terms of object detection. We did not have to do any training and, with a little adjustment of a few parameters, we were able to detect the oranges and apples correctly. However, we will add the following variations and see if our detector is still able to detect the objects correctly:

We will add objects other than apples and oranges.
We will add another object with a similar shape to apples and oranges.
We will change the light intensity and reflection.

If we execute the same code from the previous subsection, it will detect every object as if it is an apple. This is because the width and height parameters selected were too broad and included all objects as well as the RGB values, which appear differently in this image than before. In order to detect the objects correctly, we will introduce the following changes to the if statements for the size and color, as in the following code:

  if (w >60 and w < 100 and h >60 and h <120):
  if (ROI_meancolor[0] > 10 and ROI_meancolor[0] < 40 and ROI_meancolor[1] > 65 and ROI_meancolor[1] < 105

Note that the preceding changes place constraints on the if statement that were not there before.

The RGB colors are as follows:

1 (29.87429111531191, 92.01890359168242, 182.84026465028356, 0.0) 82 93
2 (34.00568181818182, 49.73605371900827, 115.44163223140497, 0.0) 72 89
3 (39.162326388888886, 62.77256944444444, 148.98133680555554, 0.0) 88 96
4 (32.284938271604936, 53.324444444444445, 141.16493827160494, 0.0) 89 90
5 (12.990362811791384, 67.3078231292517, 142.0997732426304, 0.0) 84 84
6 (38.15, 56.9972, 119.3528, 0.0) 82 100
7 (47.102716049382714, 80.29333333333334, 166.3264197530864, 0.0) 86 90
8 (45.76502082093992, 68.75133848899465, 160.64901844140394, 0.0) 78 82
9 (23.54432132963989, 98.59972299168975, 191.97368421052633, 0.0) 67 76

The result of the preceding code on our changed image is shown here:

A remote control, fork, knife, and a plastic portion cup can be seen in the preceding figure. Note how the HOG features for the apples, oranges, and plastic cup are similar, which is as expected as each of these is circular in shape:

The plastic cup does not have a bounding box around it as it was not detected.
The fork and knife have a very different angular HOG shape compared to the apples and oranges.
The remote control has a rectangular HOG shape.

This simple example suggests that this method of object detection will not work for larger image datasets and we will need to adjust parameters to take into account various lighting, shape, size, and orientation conditions. This is why we will be discussing the CNN throughout the rest of this book. Once we use this method to train the image on different conditions, it will detect objects correctly on the new set of conditions, regardless of the shape of the object. However, despite the limitations of the preceding method, we learned how to use the color and size to separate one image from another.

ROI_meancolor is a powerful method for detecting an object's average color within a bounding box. You can use it, for example, to differentiate players from one team from another based on their jersey color within a bounding box, a green apple from a red apple, or any type of color-based separation method.

An overview of TensorFlow, its ecosystem, and installation

In the previous sections, we covered the basics of computer vision techniques, such as image conversion, image filtering, convolution using a kernel, edge detection, histograms, and feature matching. This understanding and its various applications should develop a solid foundation for the advanced concept of deep learning, which will be introduced later on in this book.

Deep learning in computer vision is the cumulative learning of many different image features (such as edges, colors, boundaries, shapes, and so on) through a convolution operation of many intermediate (hidden) layers to gain a complete understanding of the image type. Deep learning augments computer vision techniques because it stacks many layers of calculations about how neurons behave. This is done by combining various inputs to produce outputs based on mathematical functions and computer vision methods, such as edge detection.

TensorFlow is an End to End (E2E) machine learning platform, where images and data are transformed into tensors to be processed by the neural network. For example, an image with a size of 224 x 224 can be represented as a tensor of rank 4 as 128, 224, 224, 3, where 128 is the batch size of the neural network, 224 is the height and width, and 3 is the color channel (R, G, and B).

If your code is based on TensorFlow 1.0, then converting it to version 2.0 can be one of the biggest challenges. Follow the instructions found at https://www.tensorflow.org/guide/migrate to convert to version 2.0. Most of the time, problems in conversion happen in a low-level API when you execute the Python code in TensorFlow using the terminal.

Keras is the high-level API for TensorFlow. The following three lines of code are the starting point for installing Keras:

from __future__ import absolute_import, division, print_function, unicode_literals
import tensorflow as tf
from tensorflow import keras

If you don't use the last line, then you may have to use the from tensorflow.keras import for all of your functions.

TensorFlow uses tf.data to build a complex input pipeline from simple code, which simplifies and speeds up the data input process. You will learn about this in Chapter 6, Visual Search Using Transfer Learning.

In Keras, layers of the model are stacked together in what is known as a sequential. This is introduced by model=tf.keras.Sequential(), and every layer is added by using the model.add statement. First, we need to compile the model using model.compile and then we can begin training using the model.train function.

TensorFlow models are saved as checkpoints and saved models. Checkpoints capture the value of parameters, filters, and weights used by the model. The checkpoint is associated with source code. The saved model, on the other hand, can be deployed to a production setting and does not need source code.

TensorFlow offers distributed training against multiple GPUs. The TensorFlow model output can be visualized using a Keras API or a TensorFlow graph.

TensorFlow versus PyTorch

PyTorch is another deep learning library similar to TensorFlow. It is based on Torch and has been developed by Facebook. While TensorFlow creates a static graph, PyTorch creates a dynamic graph. In TensorFlow, the entire computational graph has to be defined first and then the model is run whereas, in PyTorch, the graph can be defined parallel to model building.

TensorFlow Installation

To install TensorFlow 2.0 on your PC, type the following command in your Terminal. Make sure you hit Enter after each command:

pip install --upgrade pip                                                 
pip install tensorflow

The preceding command will download and extract the following packages in Terminal in addition to TensorFlow:

Keras (a high-level neural network API written in Python that is capable of running over the top of TensorFlow)
protobuf (a serializing protocol for structured data)
TensorBoard (TensorFlow's data visualization tool)
PyGPU (a Python feature used for image processing a GPU calculation for performance increase)
cctools (the native IDE for Android)
c-ares (the library function)
clang (the compiler frontend for C, C++, Objective-C, OpenCL, and OpenCV)
llvm (the compiler architecture used to produce frontend and backend binary code)
theano (the Python library used to manage multi-dimensional arrays)
grpcio (the gRPC package for Python used to implement a remote procedure call)
libgpuarray (a common n-dimensional GPU array that can be used by all packages in Python)
termcolor (a color formatting output in Python)
absl (a collection of Python library code used to build Python applications)
mock (substitutes a real object with the virtual environment to aid testing)
gast (a library used to process Python abstract syntax)

During installation, press y for yes when asked:

Downloading and Extracting Packages
Preparing transaction: done
Verifying transaction: done
Executing transaction: done

If everything is installed correctly, you will see the preceding messages.

After installation, check the TensorFlow version by entering either of the following commands based on whether your PC has just a CPU or both a CPU and a GPU. Note that for all computer vision work, a GPU is preferred to speed up the calculation of images. Use pip3 for Python 3.6 or higher and pip for Python 2.7:

pip3 show tensorflow
pip3 show tensorflow-gpu
pip show tensorflow

The output should show the following:

Name: tensorflow
Version: 2.0.0rc0
Summary: TensorFlow is an open source machine learning framework for everyone.
Home-page: https://www.tensorflow.org/
Author: Google Inc.
Author-email: packages@tensorflow.org
License: Apache 2.0
Location: /home/.../anaconda3/lib/python3.7/site-packages
Requires: gast, google-pasta, tf-estimator-nightly, wrapt, tb-nightly, protobuf, termcolor, opt-einsum, keras-applications, numpy, grpcio, keras-preprocessing, astor, absl-py, wheel, six
Required-by: gcn

At times, you may notice that even after the installation of TensorFlow, the Anaconda environment doesn't recognize that TensorFlow is installed. In that case, it is best to uninstall TensorFlow using the following command in Terminal and then reinstall it:

python3 -m pip uninstall protobuf
python3 -m pip uninstall tensorflow-gpu

Kindle Customer Dec 29, 2020

Good variety of examples to work through. The examples and concepts go beyond the typical introductory book. The code is generic enough to run locally or on AWS SageMaker. The author provides great support when problems were encountered. I'm glad I purchased this book.Would love to see another advanced book that covers TensorFlow model zoo, model/research, object detection API, TensorRT and deploying models to IoT edge devices (e.g. NVIDIA Jetsons)

Amazon Verified review

maninblack Jul 22, 2020

This book is a good starting point for computer science professionals who are new to the field of Computer Vision. I personally have not worked with CV a lot, but have the general prerequisite knowledge of machine learning concepts, and found this book to be a good starting point. This book had a good balance between theoretical concepts and working code snippets that can be tried out. The GitHub repo linked in the book was very helpful to replicate the code snippets as well, and I have not come across many books with this kind of functionality. All the code snippets I tried out worked well without any problems. From a practical implementation point of view, the section on deploying the model on a phone was quite interesting and fun to explore. Each concept mentioned in the book was very well illustrated and supported with thorough mathematical explanations as well.On the downside, I felt the book was focused mostly on image applications and could have also included more applications with videos as input in addition to images. I would have also liked a little more intuition when explaining certain concepts, in addition to the mathematical formulations already mentioned in the book. That really helps readers grasp the concept well.Overall, this was a well-written and instructive book for people wanting to gain knowledge in the CV domain!

MIke R Jul 04, 2020

There are many books out there / but this book stands out - very clear explanation of codes and contents, lots of detailed explanations for object detection, classification, visual search, matching and training in cloud. All the codes work and and their screen video has been really helpful. I find deployment of model in phone and Raspberry PI example has many practical usage. +4minus point: some sections like action recognition and semantic segmentation could be expanded more with training a custom model. -1bonus point: the author has been very helpful to connect and answer questions through LinkedIn +1

c Jun 24, 2022

Says it teaches image in painting but the linked notebook is just a GAN to generate MNIST digits

Mastering Computer Vision with TensorFlow 2.x: Build advanced computer vision applications using machine learning and deep learning techniques

What do you get with Print?

Contact Details

Shipping Address

Billing Address

Key benefits

Description

Who is this book for?

What you will learn

Product Details

What do you get with Print?

Contact Details

Shipping Address

Billing Address

Product Details

Packt Subscriptions

Frequently bought together

Table of Contents

Recommendations for you

Customer reviews

People who bought this also bought

About the author

FAQs

Create a Free Account To Continue Reading

Sign in to activate your 7-day free access