Loading an image into a matrix
The JSFeat library uses its own data structure for matrices. First, we load an image using regular HTML and JavaScript operations. We then place a canvas on our webpage:
Then we need to place an image here. We do this with just a few lines of code:
This is just a common way of displaying an image on a canvas. We define the image source path, and when the image is loaded, we set the canvas dimensions to those of an image and draw the image itself. Let's move on. Loading a canvas' content into a matrix is a bit tricky. Why is that? We need to use a jsfeat.data_t
method, which is a data structure that holds a binary representation of an array. Anyway, since it is just a wrapper for the JavaScript ArrayBuffer, it should not be a problem:
Here, we create a matrix as we did earlier, but in addition to that we add a new parameter, matrix buffer, which holds all the necessary data.
Probably, you already noticed that the third parameter for the matrix construction looks strange. It sets the type of matrix. Matrices have two properties:
The first part represents the type of data in the matrix. In our example, it is U8_t
; it states that we use unsigned byte array. Usually, an image uses 0-255 range for a color representation, that is why we need bytes here.
Remember that an image consists of 3 main channels (red, green, and blue) and an alpha channel. The second part of the parameter shows the number of channels we use for the matrix. If there is only one channel, then it is a grayscale image.
How do we convert a colored image into a grayscale image? For the answer, we must move to the next section.
Working with matrices is not easy. Who are we to fear the difficulties? With the help of this section, you will learn how to combine different matrices to produce interesting results.
Basic operations are really useful when you need to implement something new. Usually, Computer Vision uses grayscale images to work with them, since most Computer Vision algorithms do not need color information to track the object. As you may already know, Computer Vision mostly relies on the shape and intensity information to produce the results. In the following code, we will see how to convert a color matrix into a grayscale (one channel) matrix:
Just a few lines of code! First, we create an object, which will hold our grayscale image. Next, we apply the JSFeat
function to that image. You may also define matrix boundaries for conversion, if you want. Here is the result of the conversion:
For this type of operation, you do not actually need to load a color image into the matrix; instead of mat.data
, you can use imageData.data
from the context—it's up to you.
To see how to display a matrix, refer to the Matrix displaying section.
One of the useful operations in Computer Vision is a matrix transpose, which basically just rotates a matrix by 90 degrees counter-clockwise. You need to keep in mind that the rows and columns of the original matrix are reflected during this operation:
Again, we need to predefine the resulting matrix, and only then we can apply the transpose operation:
Another operation that can be helpful is a matrix multiplication. Since it is hard to see the result on an image, we will fill matrices manually. The following code works by the formula C = A * B, the number of rows of the first matrix must be equal to the number of columns of the second matrix, e.g. MxN and NxK, those are dimensions for the first and the second matrices accordingly:
Here, the M = K = 3 and N = 2. Keep in mind that during the matrix creation, we place columns as a first parameter, and only as the second do we place rows. We populate matrices with dummy values and call the multiply function. After displaying the result in the console, you will see this:
Here the first column is matrix A, the second – matrix B and the third column is the result matrix of C.
JSFeat also provides such functions for matrix multiplication as multiply_ABt
, multiply_AAt
, and so on, where t means transposed. Use these functions when you do not want to write additional lines of code for the transpose method. In addition to this, there are matrix operations for 3 x 3 matrices, which are faster and optimized for this dimension. Besides, they are useful when, for example, you need to work with coordinates.
In the two-dimensional world, we use only x and y for coordinates. However, for more complex algorithms, when we need to define a point of intersection between two parallel lines, we need to add z (third) coordinate to a point, this system of coordinates is called homogeneous coordinates. They are especially helpful when you need to project a three-dimensional object onto a two-dimensional space.
Consider find features on an image, these features are usually used for object detection. There are many algorithms for this but you need a robust approach, which has to work with different object sizes. Moreover, you may need to reduce the redundancy of an image or search something the size of which you are unsure of. In that case, you need a set of images. The solution to this is a pyramid of an image. An image pyramid is a collection of several images, which are downsampled from the original.
The code for creating an image pyramid will look like this:
First, we define the number of levels for the pyramid; here, we set it to 4. In JSFeat, the first level is skipped by default, since it is the original image. Next, we define the starting dimensions and output types. Then, we allocate space for the pyramid levels and build the pyramid itself. A pyramid is generally downsampled by a factor of 2:
JSFeat pyramid is just an array of matrices, it shows different pyramid layers starting from the original image and ending with the smallest image in the pyramid.
What we did not discuss in the previous section is how to display output matrices. It is done in different ways for grayscale and colored images. Here is the code for displaying matrices for a colored image:
We just need to cast the matrix data to the appropriate format and put the resulting ImageData
function into the context. It is harder to do so for a grayscale image:
This is a binary data representation. We populate the ImageData
function with the alpha channel, which is constant for all pixels as well as for red, green, and blue channels. For a gray image, they have the same value, which is set as the pix
variable. Finally, we need to put the ImageData
function into the context as we did in the previous example.