Building a Feedforward Neural Network
In this chapter we will cover the following recipes:
- Feed-forward propagation from scratch in Python
- Building back-propagation from scratch in Python
- Building a neural network in Keras
Introduction
A neural network is a supervised learning algorithm that is loosely inspired by the way the brain functions. Similar to the way neurons are connected to each other in the brain, a neural network takes an input and passes it through a function, which excites certain subsequent neurons and consequently produces the output.
In this chapter, you will learn the following:
- Architecture of a neural network
- Applications of a neural network
- Setting up a feedforward neural network
- How forward-propagation works
- Calculating loss values
- How gradient descent works in back-propagation
- The concepts of epochs and batch size
- Various loss functions
- Various activation functions
- Building a neural network from scratch
- Building a neural network in Keras
Architecture of a simple neural network
An artificial neural network is loosely inspired by the way the human brain functions. Technically, it is an improvement over linear and logistic regression as neural networks introduce multiple non-linear measures in estimating the output. Additionally, neural networks provide a great flexibility in modifying the network architecture to solve the problems across multiple domains leveraging structured and unstructured data.
The more complex the function that a network can represent, the more closely it can tune itself to the data that is given as input, which typically improves the accuracy of its predictions on that data.
The typical structure of a feed-forward neural network is as follows:
A layer is a collection of one or more nodes (computation units), where each node in a layer is connected to every node in the next layer. The input layer consists of the input variables that are required to predict the output values.
The number of nodes in the output layer depends on whether we are trying to predict a continuous variable or a categorical variable. If the output is a continuous variable, the output has one unit.
If the output is categorical with n possible classes, there will be n nodes in the output layer. The hidden level/layer is used to transform the input layer values into values in a higher-dimensional space, so that we can learn more features from the input. The hidden layer transforms the output as follows:
In the preceding diagram, x_{1}, x_{2}, ..., x_{n} are the independent variables, and x_{0} is the bias term (similar to the way we have bias in linear/logistic regression).
Note that w_{1}, w_{2}, ..., w_{n} are the weights given to each of the input variables, with w_{0} being the weight on the bias term. If a is one of the units in the hidden layer, it will be equal to the following:
a = f(\sum_{i=0}^{n} w_{i} x_{i})
The f function is the activation function that is used to apply non-linearity on top of the sum-product of the input and their corresponding weight values. Additionally, higher non-linearity can be achieved by having more than one hidden layer.
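The computation of a single hidden unit can be sketched in a few lines of NumPy. This is only an illustration: the input and weight values below are hypothetical, and the sigmoid is assumed as one possible choice for the activation function f.

```python
import numpy as np

def sigmoid(z):
    """Logistic activation, assumed here as one possible choice of f."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical values, chosen only for illustration.
x = np.array([1.0, 0.5, -1.5])   # x[0] = 1 plays the role of the bias input x_0
w = np.array([0.2, 0.4, 0.1])    # w[0] is the weight on the bias input

weighted_sum = np.dot(w, x)      # sum of w_i * x_i over i = 0..n
a = sigmoid(weighted_sum)        # value of the hidden unit
print(weighted_sum, a)
```

Swapping sigmoid for another activation changes only the last step; the sum-product of inputs and weights stays the same.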
In sum, a neural network is a collection of weights assigned to nodes with layers connecting them. The collection is organized into three main parts: the input layer, the hidden layer, and the output layer. Note that you can have n hidden layers, with the term deep learning implying multiple hidden layers. Hidden layers are necessary when the neural network has to make sense of something really complicated, contextual, or not obvious, such as image recognition. The intermediate layers (layers that are not input or output) are known as hidden, since they are practically not visible (there's more on how to visualize the intermediate layers in Chapter 4, Building a Deep Convolutional Neural Network).
Training a neural network
Training a neural network basically means calibrating all of the weights in a neural network by repeating two key steps: forward-propagation and back-propagation.
In forward-propagation, we apply a set of weights to the input data, pass it through the hidden layer, perform the nonlinear activation on the hidden layer output, and then connect the hidden layer to the output layer by multiplying the hidden layer node values with another set of weights. For the first forward-propagation, the values of the weights are initialized randomly.
In back-propagation, we try to decrease the error by measuring the margin of error of the output and then adjusting the weights accordingly. Neural networks repeat both forward- and back-propagation to predict an output until the weights are calibrated.
Applications of a neural network
Recently, we have seen a huge adoption of neural networks in a variety of applications. In this section, let's try to understand the reason why adoption might have increased considerably. Neural networks can be architected in multiple ways. Here are some of the possible ways:
The box at the bottom is the input, followed by the hidden layer (the middle box), and the box at the top is the output layer. The one-to-one architecture is a typical neural network with a hidden layer between the input and output layer. Examples of different architectures are as follows:
Architecture | Example |
One-to-many | The input is an image and the output is a caption for the image |
Many-to-one | The input is a movie review (multiple words) and the output is the sentiment associated with the review |
Many-to-many | Machine translation of a sentence in one language to a sentence in another language |
Apart from the preceding points, neural networks are also in a position to understand the content in an image and detect the position where the content is located using an architecture named Convolutional Neural Network (CNN), which looks as follows:
Here, we saw examples of recommender systems, image analysis, text analysis, and audio analysis, and we can see that neural networks give us the flexibility to solve a problem using multiple architectures, resulting in increased adoption as the number of applications increases.
Feed-forward propagation from scratch in Python
In order to build a strong foundation of how feed-forward propagation works, we'll go through a toy example of training a neural network where the input to the neural network is (1, 1) and the corresponding output is 0.
Getting ready
The strategy that we'll adopt is as follows: our neural network will have one hidden layer (with three neurons) connecting the input layer to the output layer. Note that we have more neurons in the hidden layer than in the input layer, as we want to enable the input layer to be represented in more dimensions:
Calculating the hidden layer unit values
We now assign weights to all of the connections. Note that these weights are selected randomly (based on Gaussian distribution) since it is the first time we're forward-propagating. In this specific case, let's start with initial weights that are between 0 and 1, but note that the final weights after the training process of a neural network don't need to be between a specific set of values:
In the next step, we perform the multiplication of the input with weights to calculate the values of hidden units in the hidden layer.
The hidden layer's unit values are obtained as follows:
The hidden layer's unit values are also shown in the following diagram:
Note that in the preceding output we calculated the hidden values. For simplicity, we excluded the bias terms that need to be added at each unit of a hidden layer.
Now, we will pass the hidden layer values through an activation function so that we attain non-linearity in our output.
Applying the activation function
Activation functions are applied at multiple layers of a network. They are used so that we achieve high non-linearity in input, which can be useful in modeling complex relations between the input and output.
The different activation functions are as follows:
For our example, let’s use the sigmoid function for activation. The sigmoid function looks like this, graphically:
By applying sigmoid activation, S(x), to the three hidden-layer sums, we get the following:
final_h_{1} = S(1.0) = 0.73
final_h_{2} = S(1.3) = 0.79
final_h_{3} = S(0.8) = 0.69
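The three activations can be checked with a short NumPy sketch. The function name sigmoid and the variable names are illustrative choices, not taken from the book's code file.

```python
import numpy as np

def sigmoid(x):
    """S(x) = 1 / (1 + e^(-x))"""
    return 1.0 / (1.0 + np.exp(-x))

hidden_sums = np.array([1.0, 1.3, 0.8])   # the three hidden-layer sums
final_h = sigmoid(hidden_sums)

# Rounded to two decimals, the values are 0.73, 0.79 and 0.69.
print(np.round(final_h, 2))
```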
Calculating the output layer values
Now that we have calculated the hidden layer values, we will be calculating the output layer value. In the following diagram, we have the hidden layer values connected to the output through the randomly-initialized weight values. Using the hidden layer values and the weight values, we will calculate the output values for the following network:
We perform the sum product of the hidden layer values and weight values to calculate the output value. For simplicity, we excluded the bias term that needs to be added at the output unit:
0.73 * 0.3 + 0.79 * 0.5 + 0.69 * 0.9 = 1.235
The values are shown in the following diagram:
Because we started with a random set of weights, the value of the output neuron is very different from the target, in this case by +1.235 (since the target is 0).
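The same sum product can be written as a dot product. The sketch below uses the rounded activations and the hidden-to-output weights from the calculation above; the variable names are illustrative.

```python
import numpy as np

hidden = np.array([0.73, 0.79, 0.69])          # rounded hidden layer activations
hidden_to_output = np.array([0.3, 0.5, 0.9])   # weights used in the sum product

output = np.dot(hidden, hidden_to_output)      # value of the output neuron
target = 0.0
difference = output - target                   # how far the output is from the target
print(output, difference)
```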
Calculating the loss values
Loss values (alternatively called cost functions) are values that we optimize in a neural network. In order to understand how loss values get calculated, let's look at two scenarios:
- Continuous variable prediction
- Categorical variable prediction
Calculating loss during continuous variable prediction
Typically, when the variable is a continuous one, the loss value is calculated as the squared error, that is, we try to minimize the mean squared error by varying the weight values associated with the neural network:
J = \frac{1}{m} \sum_{i=1}^{m} (h(x^{(i)}) - y^{(i)})^{2}
In the preceding equation, y^{(i)} is the actual value of the output for row i, h(x^{(i)}) is the transformation that we apply on the input x^{(i)} to obtain a predicted value of y, and m is the number of rows in the dataset.
Calculating loss during categorical variable prediction
When the variable to predict is a discrete one (that is, there are only a few categories in the variable), we typically use a categorical cross-entropy loss function. When the variable to predict has two distinct values within it, the loss function is binary cross-entropy, and when the variable to predict has multiple distinct values within it, the loss function is a categorical cross-entropy.
Here is binary cross-entropy for a single data point:
−(y log(p) + (1−y) log(1−p))
Here is categorical cross-entropy, averaged over the data points, where k is the number of classes:
-\frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{k} y_{ij} \log(p_{ij})
y is the actual value of the output, p is the predicted value of the output, and n is the total number of data points. For now, let's assume that the outcome that we are predicting in our toy example is continuous. In that case, the loss function value is the mean squared error, which is calculated as follows:
error = 1.235^{2} ≈ 1.53
In the next step, we will try to minimize the loss function value using back-propagation (which we'll learn about in the next section), where we update the weight values (which were initialized randomly earlier) to minimize the loss (error).
How to do it...
In the previous section, we learned about performing the following steps on top of the input data to come up with error values in forward-propagation (the code file is available as Neural_network_working_details.ipynb in GitHub):
- Initialize weights randomly
- Calculate the hidden layer unit values by multiplying input values with weights
- Perform activation on the hidden layer values
- Connect the hidden layer values to the output layer
- Calculate the squared error loss
A function to calculate the squared error loss values across all data points is as follows:
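import numpy as np

def feed_forward(inputs, outputs, weights):
    # Weighted sum of the inputs plus the hidden layer bias
    pre_hidden = np.dot(inputs, weights[0]) + weights[1]
    # Sigmoid activation on the hidden layer values
    hidden = 1 / (1 + np.exp(-pre_hidden))
    # Weighted sum of the hidden activations plus the output bias
    pred_out = np.dot(hidden, weights[2]) + weights[3]
    # Squared error for each row of the dataset
    squared_error = np.square(pred_out - outputs)
    return squared_error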
In the preceding function, we take the input variable values, weights (randomly initialized if this is the first iteration), and the actual output in the provided dataset as the input to the feed-forward function.
We calculate the hidden layer values by performing the matrix multiplication (dot product) of the input and weights. Additionally, we add the bias values in the hidden layer, as follows:
pre_hidden = np.dot(inputs,weights[0])+ weights[1]
The preceding scenario is valid when weights[0] is the weight value and weights[1] is the bias value, where the weight and bias are connecting the input layer to the hidden layer.
Once we calculate the hidden layer values, we perform activation on top of the hidden layer values, as follows:
hidden = 1/(1+np.exp(-pre_hidden))
We now calculate the value at the output layer by multiplying the output of the hidden layer with the weights that connect the hidden layer to the output, and then adding the bias term at the output, as follows:
pred_out = np.dot(hidden, weights[2]) + weights[3]
Once the output is calculated, we calculate the squared error loss at each row, as follows:
squared_error = (np.square(pred_out - outputs))
In the preceding code, pred_out is the predicted output and outputs is the actual output.
We are then in a position to obtain the loss value as we forward-pass through the network.
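As a rough check, the function can be run on the toy example with input (1, 1) and target 0. The input-to-hidden weights below are an assumption: the text does not list them, so they are picked only so that the hidden sums come out as 1.0, 1.3, and 0.8. The biases are set to zero, as in the hand calculation.

```python
import numpy as np

def feed_forward(inputs, outputs, weights):
    pre_hidden = np.dot(inputs, weights[0]) + weights[1]
    hidden = 1 / (1 + np.exp(-pre_hidden))
    pred_out = np.dot(hidden, weights[2]) + weights[3]
    return np.square(pred_out - outputs)

inputs = np.array([[1.0, 1.0]])
outputs = np.array([[0.0]])

weights = [
    # Assumed input -> hidden weights, shape (2, 3); each column sums to
    # one of the hidden sums 1.0, 1.3, 0.8 for the input (1, 1).
    np.array([[0.8, 0.4, 0.3],
              [0.2, 0.9, 0.5]]),
    np.zeros(3),                        # hidden biases (excluded in the example)
    np.array([[0.3], [0.5], [0.9]]),    # hidden -> output weights, shape (3, 1)
    np.zeros(1),                        # output bias (excluded in the example)
]

loss = feed_forward(inputs, outputs, weights)
print(loss)
```

The result is close to, but not identical to, the hand calculation, because the hand calculation rounds the hidden activations to two decimals before multiplying them by the output weights.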
While we considered the sigmoid activation on top of the hidden layer values in the preceding code, let's examine other activation functions that are commonly used.
Tanh
The tanh activation of a value (the hidden layer unit value) is calculated as follows:
def tanh(x):
    return (np.exp(x)-np.exp(-x))/(np.exp(x)+np.exp(-x))
ReLU
The Rectified Linear Unit (ReLU) of a value (the hidden layer unit value) is calculated as follows:
def relu(x):
    return np.where(x>0,x,0)
Linear
The linear activation of a value is the value itself.
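To compare the three activations side by side, the following sketch applies them to the same set of values. The sample values are hypothetical, and linear is an illustrative helper name for the identity function.

```python
import numpy as np

def tanh(x):
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))

def relu(x):
    return np.where(x > 0, x, 0)

def linear(x):
    return x

values = np.array([-2.0, 0.0, 3.0])   # hypothetical hidden layer unit values

tanh_values = tanh(values)       # squashed into the range (-1, 1)
relu_values = relu(values)       # negatives become 0, positives pass through
linear_values = linear(values)   # unchanged
print(tanh_values, relu_values, linear_values)
```

The hand-written tanh agrees with NumPy's built-in np.tanh, which is usually the more convenient choice in practice.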
Softmax
Typically, softmax is performed on top of a vector of values. This is generally done to determine the probability of an input belonging to one of the n number of the possible output classes in a given scenario. Let's say we are trying to classify an image of a digit into one of the possible 10 classes (numbers from 0 to 9). In this case, there are 10 output values, where each output value should represent the probability of an input image belonging to one of the 10 classes.
The softmax activation is used to provide a probability value for each class in the output and is calculated as follows:
def softmax(x):
    return np.exp(x)/np.sum(np.exp(x))
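One practical caveat is that np.exp overflows for large inputs. A common workaround, sketched below, is to subtract the maximum value before exponentiating, which leaves the resulting probabilities unchanged. The name stable_softmax and the sample scores are illustrative and not part of the book's code.

```python
import numpy as np

def stable_softmax(x):
    """Softmax that subtracts the maximum first to avoid overflow in np.exp."""
    shifted = x - np.max(x)
    exps = np.exp(shifted)
    return exps / np.sum(exps)

scores = np.array([1000.0, 1001.0, 1002.0])   # hypothetical large output values
probs = stable_softmax(scores)
print(probs, probs.sum())
```

The probabilities sum to 1, and the largest score receives the largest probability.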
Apart from the preceding activation functions, the loss functions that are generally used while building a neural network are as follows.
Mean squared error
The error is the difference between the actual and predicted values of the output. We take a square of the error, as the error can be positive or negative (when the predicted value is greater than the actual value and vice versa). Squaring ensures that positive and negative errors do not offset each other. We calculate the mean squared error so that the error over two different datasets is comparable when the datasets are not the same size.
The mean squared error between predicted values (p) and actual values (y) is calculated as follows:
def mse(p, y):
    return np.mean(np.square(p - y))
The mean squared error is typically used when trying to predict a value that is continuous in nature.
Mean absolute error
The mean absolute error works in a manner that is very similar to the mean squared error. The mean absolute error ensures that positive and negative errors do not offset each other by taking an average of the absolute difference between the actual and predicted values across all data points.
The mean absolute error between the predicted values (p) and actual values (y) is implemented as follows:
def mae(p, y):
    return np.mean(np.abs(p-y))
Similar to the mean squared error, the mean absolute error is generally employed on continuous variables.
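The difference between the two losses shows up when a few predictions are far off. The sketch below uses hypothetical predictions to compare a case where every prediction is slightly wrong with a case where a single prediction is badly wrong.

```python
import numpy as np

def mse(p, y):
    return np.mean(np.square(p - y))

def mae(p, y):
    return np.mean(np.abs(p - y))

y = np.array([2.0, 4.0, 6.0, 8.0])             # actual values
p_close = np.array([2.5, 3.5, 6.5, 7.5])       # every prediction is off by 0.5
p_outlier = np.array([2.0, 4.0, 6.0, 12.0])    # one prediction is off by 4

print(mse(p_close, y), mae(p_close, y))        # 0.25 and 0.5
print(mse(p_outlier, y), mae(p_outlier, y))    # 4.0 and 1.0
```

Because the errors are squared, the mean squared error grows much faster than the mean absolute error when one prediction is far from its actual value.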
Categorical cross-entropy
Cross-entropy is a measure of the difference between two different distributions: actual and predicted. It is applied to categorical output data, unlike the previous two loss functions that we discussed.
Cross-entropy between two distributions is calculated as follows:
-\sum (y \log_{2}(p) + (1 - y) \log_{2}(1 - p))
Here, y is the actual outcome of the event and p is the predicted outcome of the event.
Categorical cross-entropy between the predicted values (p) and actual values (y) is implemented as follows:
def cat_cross_entropy(p, y):
    return -np.sum((y*np.log2(p)+(1-y)*np.log2(1-p)))
Note that categorical cross-entropy loss has a high value when the predicted value is far away from the actual value and a low value when the values are close.
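This behavior can be seen with two hypothetical predictions for the same actual class. The sketch below reuses the function above and adds one assumption of its own: the predictions are clipped by a small eps so that the logarithm is never taken at exactly 0 or 1.

```python
import numpy as np

def cat_cross_entropy(p, y, eps=1e-12):
    # Clipping is an addition for numerical safety; it is not in the original function.
    p = np.clip(p, eps, 1 - eps)
    return -np.sum(y * np.log2(p) + (1 - y) * np.log2(1 - p))

y = np.array([1.0, 0.0, 0.0])            # actual class is the first one
p_good = np.array([0.9, 0.05, 0.05])     # hypothetical prediction close to the actual value
p_bad = np.array([0.1, 0.45, 0.45])      # hypothetical prediction far from the actual value

loss_good = cat_cross_entropy(p_good, y)
loss_bad = cat_cross_entropy(p_bad, y)
print(loss_good, loss_bad)
```

The loss for the good prediction is roughly 0.3, while the loss for the bad prediction is roughly 5.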
Building back-propagation from scratch in Python
In forward-propagation, we connected the input layer to the hidden layer to the output layer. In back-propagation, we take the reverse approach.
Getting ready
We change each weight within the neural network by a small amount – one at a time. A change in the weight value will have an impact on the final loss value (either increasing or decreasing loss). We'll update the weight in the direction of decreasing loss.
Additionally, in some scenarios, for a small change in weight, the error increases/decreases considerably, while in some cases the error decreases by a small amount.
By updating the weights by a small amount and measuring the change in error that the update in weights leads to, we are able to do the following:
- Determine the direction of the weight update
- Determine the magnitude of the weight update
Before implementing back-propagation, let's understand one additional detail of neural networks: the learning rate.
Intuitively, the learning rate helps us to build trust in the algorithm. For example, when deciding on the magnitude of the weight update, we would potentially not change it by a huge amount in one go, but take a more careful approach in updating the weights more slowly.
This results in obtaining stability in our model; we will look at how the learning rate helps with stability in the next chapter.
The whole process by which we update weights to reduce error is called a gradient-descent technique.
Stochastic gradient descent is the means by which error is minimized in the preceding scenario. More intuitively, gradient stands for the rate at which the error changes when a weight is changed by a small amount, and descent means moving the weight in the direction that reduces the error. Stochastic stands for the random selection of the samples on which each weight update is based.
Apart from stochastic gradient descent, there are many other optimization techniques that help to optimize for the loss values; the different optimization techniques will be discussed in the next chapter.
Back-propagation works as follows:
- Calculates the overall cost function from the feedforward process.
- Varies all the weights (one at a time) by a small amount.
- Calculates the impact of the variation of weight on the cost function.
- Depending on whether the change has increased or decreased the cost (loss) value, it updates the weight value in the direction of loss decrease. It then repeats this step across all the weights we have.
If the preceding steps are performed n number of times, it essentially results in n epochs.
In order to further cement our understanding of back-propagation in neural networks, let's start with a known function and see how the weights could be derived:
For now, we will have the known function as y = 2x, where we try to come up with the weight value and bias value, which are 2 and 0 in this specific case:
x | y |
1 | 2 |
2 | 4 |
3 | 6 |
4 | 8 |
If we formulate the preceding dataset as a linear regression, (y = a*x+b), where we are trying to calculate the values of a and b (which we already know are 2 and 0, but are checking how those values are obtained using gradient descent), let's randomly initialize the a and b parameters to values of 1.477 and 0 (the ideal values of which are 2 and 0).
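Before any update, it helps to see how far the initial parameters are from the ideal ones. The following sketch computes the mean squared error on the toy dataset for the initial values (a = 1.477, b = 0) and for the ideal values (a = 2, b = 0); the variable names are illustrative.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2 * x                                  # the known function y = 2x

a, b = 1.477, 0.0                          # initial parameter values
predictions = a * x + b
initial_mse = np.mean(np.square(predictions - y))

ideal_mse = np.mean(np.square(2.0 * x + 0.0 - y))
print(initial_mse, ideal_mse)
```

The initial loss is roughly 2.05 and the ideal loss is 0, so gradient descent has to move a from 1.477 toward 2 to close the gap.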
How to do it...
In this section, we will build the back-propagation algorithm by hand so that we clearly understand how weights are calculated in a neural network. In this specific case, we will build a simple neural network where there is no hidden layer (thus we are solving a regression equation). The code file is available as Neural_network_working_details.ipynb in GitHub.
- Initialize the dataset as follows:
x = [[1],[2],[3],[4]]
y = [[2],[4],[6],[8]]
- Initialize the weight and bias values randomly (we have only one weight and one bias value as we are trying to identify the optimal values of a and b in the y = a*x + b equation):
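import numpy as np
# NumPy arrays are used so that the in-place += in later steps is element-wise
w = [np.array([[1.477867]]), np.array([0.])]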
- Define the feed-forward network and calculate the squared error loss value:
import numpy as np
def feed_forward(inputs, outputs, weights):
    out = np.dot(inputs,weights[0]) + weights[1]
    squared_error = (np.square(out - outputs))
    return squared_error
In the preceding code, we performed a matrix multiplication of the input with the randomly-initialized weight value and summed it up with the randomly-initialized bias value.
Once the value is calculated, we calculate the squared error value of the difference between the actual and predicted values.
- Increase each weight and bias value by a very small amount (0.0001) and calculate the squared error loss value one at a time for each of the weight and bias updates.
If the squared error loss value decreases as the weight increases, the weight value should be increased. The magnitude by which the weight value should be increased is proportional to the amount of loss value the weight change decreases by.
Additionally, ensure that you do not increase the weight value as much as the loss decrease caused by the weight change, but weigh it down with a factor called the learning rate. This ensures that the loss decreases more smoothly (there's more on how the learning rate impacts the model accuracy in the next chapter).
In the following code, we are creating a function named update_weights, which performs the back-propagation process to update weights that were obtained in step 3. We are also mentioning that the function needs to be run for epochs number of times (where epochs is a parameter we are passing to update_weights function):
from copy import deepcopy
def update_weights(inputs, outputs, weights, epochs):
    for epoch in range(epochs):
- Pass the input through a feed-forward network to calculate the loss with the initial set of weights:
        org_loss = feed_forward(inputs, outputs, weights)
- Ensure that you deepcopy the list of weights, as the weights will be manipulated in further steps, and hence deepcopy takes care of any issues resulting from the change in the child variable impacting the parent variable that it is pointing to:
        wts_tmp = deepcopy(weights)
        wts_tmp2 = deepcopy(weights)
- Loop through all the weight values, one at a time, and change them by a small value (0.0001):
        for i in range(len(weights)):
            wts_tmp[-(i+1)] += 0.0001
- Calculate the updated feed-forward loss when the weight is updated by a small amount. Calculate the change in loss due to the small change in the weight. Divide the change in loss by the number of inputs, as we want to calculate the mean squared error across all the input samples we have:
            loss = feed_forward(inputs, outputs, wts_tmp)
            delta_loss = np.sum(org_loss - loss)/(0.0001*len(inputs))
- Update the weights by the change in loss that they are causing. Update the weights slowly by multiplying the change in loss by a very small number (0.01), which is the learning rate parameter (more about the learning rate parameter in the next chapter):
            wts_tmp2[-(i+1)] += delta_loss*0.01
            wts_tmp = deepcopy(weights)
- The updated weights and bias value are returned:
        weights = deepcopy(wts_tmp2)
    return wts_tmp2
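The fragments above can be assembled into a single runnable sketch. A few details are assumptions made for this sketch: the learning rate and the small step are exposed as the parameters lr and step, the data and weights are stored as NumPy arrays, and the function returns weights so that it also behaves sensibly when epochs is 0. The number of epochs (1,000) is an arbitrary choice.

```python
from copy import deepcopy
import numpy as np

def feed_forward(inputs, outputs, weights):
    out = np.dot(inputs, weights[0]) + weights[1]
    return np.square(out - outputs)

def update_weights(inputs, outputs, weights, epochs, lr=0.01, step=0.0001):
    for epoch in range(epochs):
        org_loss = feed_forward(inputs, outputs, weights)
        wts_tmp = deepcopy(weights)
        wts_tmp2 = deepcopy(weights)
        for i in range(len(weights)):
            wts_tmp[-(i + 1)] += step                    # nudge one parameter
            loss = feed_forward(inputs, outputs, wts_tmp)
            # Average decrease in loss per unit change of the parameter
            delta_loss = np.sum(org_loss - loss) / (step * len(inputs))
            wts_tmp2[-(i + 1)] += delta_loss * lr        # move toward lower loss
            wts_tmp = deepcopy(weights)                  # undo the nudge
        weights = deepcopy(wts_tmp2)
    return weights

x = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([[2.0], [4.0], [6.0], [8.0]])
w = [np.array([[1.477867]]), np.array([0.0])]

start_loss = np.mean(feed_forward(x, y, w))
w_trained = update_weights(x, y, w, epochs=1000)
end_loss = np.mean(feed_forward(x, y, w_trained))
print(w_trained, start_loss, end_loss)
```

After these epochs, the weight should be close to 2 and the bias close to 0. The bias converges more slowly than the weight, so it may still be slightly off.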
One of the other parameters in a neural network is the batch size considered in calculating the loss values.
In the preceding scenario, we considered all the data points in order to calculate the loss value. However, in practice, when we have thousands (or in some cases, millions) of data points, the incremental contribution of a greater number of data points while calculating loss value would follow the law of diminishing returns and hence we would be using a batch size that is much smaller compared to the total number of data points we have.
The typical batch size considered in building a model is anywhere between 32 and 1,024.
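Splitting a dataset into batches can be sketched as follows. The helper name iterate_minibatches, the fixed seed, and the 10-point dataset are hypothetical choices made for illustration; a batch size of 4 is used only because the dataset is tiny.

```python
import numpy as np

def iterate_minibatches(inputs, outputs, batch_size, seed=0):
    """Yield shuffled (inputs, outputs) batches that cover the dataset once."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(inputs))
    for start in range(0, len(inputs), batch_size):
        idx = order[start:start + batch_size]
        yield inputs[idx], outputs[idx]

x = np.arange(1, 11, dtype=float).reshape(-1, 1)   # 10 hypothetical data points
y = 2 * x

batches = list(iterate_minibatches(x, y, batch_size=4))
print([len(batch_x) for batch_x, _ in batches])
```

With 10 data points and a batch size of 4, one pass over the data produces batches of 4, 4, and 2 rows, and the loss (and hence the weight update) would be calculated once per batch.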
There's more...
In the previous section, we built a regression formula (Y = a*x + b) where we wrote a function to identify the optimal values of a and b. In this section, we will build a simple neural network with a hidden layer that connects the input to the output on the same toy dataset that we worked on in the previous section.
We define the model as follows (the code file is available as Neural_networks_multiple_layers.ipynb in GitHub):
- The input is connected to a hidden layer that has three units
- The hidden layer is connected to the output, which has one unit in output layer
Let us go ahead and code up the strategy discussed above, as follows:
- Define the dataset and import the relevant packages:
from copy import deepcopy
import numpy as np
x = [[1],[2],[3],[4]]
y = [[2],[4],[6],[8]]
We use deepcopy so that changing the values of a copied variable does not change the values of the original variable.
- Initialize the weight and bias values randomly. The hidden layer has three units in it. Hence, there are a total of three weight values and three bias values – one corresponding to each of the hidden units.
Additionally, the final layer has one unit that is connected to the three units of the hidden layer. Hence, a total of three weights and one bias dictate the value of the output layer.
The randomly-initialized weights are as follows:
w = [np.array([[-0.82203424, -0.9185806 , 0.03494298]]), np.array([0., 0., 0.]), np.array([[ 1.0692896 ],[ 0.62761235],[-0.5426246 ]]), np.array([0.])]
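A quick way to confirm the layout described above is to inspect the shape of each entry and count the parameters. The sketch below stores the same values as NumPy arrays; the names shapes and total_parameters are illustrative.

```python
import numpy as np

w = [
    np.array([[-0.82203424, -0.9185806, 0.03494298]]),     # input -> hidden weights, shape (1, 3)
    np.array([0., 0., 0.]),                                 # hidden biases, shape (3,)
    np.array([[1.0692896], [0.62761235], [-0.5426246]]),    # hidden -> output weights, shape (3, 1)
    np.array([0.]),                                         # output bias, shape (1,)
]

shapes = [layer.shape for layer in w]
total_parameters = sum(layer.size for layer in w)
print(shapes, total_parameters)
```

The network has 10 parameters in total: six for the hidden layer and four for the output layer.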
- Implement the feed-forward network where the hidden layer has a ReLU activation in it:
def feed_forward(inputs, outputs, weights):
    pre_hidden = np.dot(inputs,weights[0])+ weights[1]
    hidden = np.where(pre_hidden<0, 0, pre_hidden)
    out = np.dot(hidden, weights[2]) + weights[3]
    squared_error = (np.square(out - outputs))
    return squared_error
- Define the back-propagation function similarly to what we did in the previous section. The only difference is that we now have to update the weights in more layers.
In the following code, we are calculating the original loss at the start of an epoch:
def update_weights(inputs, outputs, weights, epochs):
    for epoch in range(epochs):
        org_loss = feed_forward(inputs, outputs, weights)
In the following code, we are copying weights into two sets of weight variables so that they can be reused later in the code:
        wts_tmp = deepcopy(weights)
        wts_tmp2 = deepcopy(weights)
In the following code, we are updating each weight value by a small amount and then calculating the loss value corresponding to the updated weight value (while every other weight is kept unchanged). Additionally, we are ensuring that the weight update happens across all weights and also across all layers in a network.
The change in the squared loss (del_loss) is attributed to the change in the weight value. We repeat the preceding step for all the weights that exist in the network:
        for i, layer in enumerate(reversed(weights)):
            for index, weight in np.ndenumerate(layer):
                wts_tmp[-(i+1)][index] += 0.0001
                loss = feed_forward(inputs, outputs, wts_tmp)
                del_loss = np.sum(org_loss - loss)/(0.0001*len(inputs))
The weight value is updated by weighing down by the learning rate parameter – a greater decrease in loss will update weights by a lot, while a lower decrease in loss will update the weight by a small amount:
                wts_tmp2[-(i+1)][index] += del_loss*0.01
                wts_tmp = deepcopy(weights)
Finally, we return the updated weights:
        weights = deepcopy(wts_tmp2)
    return wts_tmp2
- Run the function for the required number of epochs to update the weights that many times (one epoch in this case):
update_weights(x,y,w,1)
The output (updated weights) of preceding code is as follows:
In the preceding steps, we learned how to build a neural network from scratch in Python. In the next section, we will learn about building a neural network in Keras.
Building a neural network in Keras
In the previous section, we built a neural network from scratch, that is, we wrote functions that perform forward-propagation and back-propagation.
How to do it...
We will be building a neural network using the Keras library, which provides utilities that make the process of building a complex neural network much easier.
Installing Keras
TensorFlow and Keras can be installed on Ubuntu using the following commands:
$ pip install --no-cache-dir tensorflow-gpu==1.7
Note that it is preferable to install a GPU-compatible version, as neural networks work considerably faster when they are run on top of a GPU. Keras is a high-level neural network API, written in Python, and capable of running on top of TensorFlow, CNTK, or Theano.
It was developed with a focus on enabling fast experimentation, and it can be installed as follows:
$ pip install keras
Building our first model in Keras
In this section, let's understand the process of building a model in Keras by using the same toy dataset that we worked on in the previous sections (the code file is available as Neural_networks_multiple_layers.ipynb in GitHub):
- Instantiate a model that can be called sequentially to add further layers on top of it. The Sequential method enables us to perform the model initialization exercise:
from keras.models import Sequential
model = Sequential()
- Add a dense layer to the model. A dense layer ensures the connection between various layers in a model. In the following code, we are connecting the input layer to the hidden layer:
from keras.layers import Dense
model.add(Dense(3, activation='relu', input_shape=(1,)))
In the dense layer initialized with the preceding code, we ensured that we provide the input shape to the model (we need to specify the shape of data that the model has to expect as this is the first dense layer).
Additionally, we specified that there will be three connections made to each input (three units in the hidden layer) and that the activation to be performed in the hidden layer is the ReLU activation.
- Connect the hidden layer to the output layer:
model.add(Dense(1, activation='linear'))
Note that in this dense layer, we don't need to specify the input shape, as the model would already infer the input shape from the previous layer.
Also, given that each output is one-dimensional, our output layer has one unit and the activation that we are performing is the linear activation.
The model summary can now be visualized as follows:
model.summary()
A summary of model is as follows:
The preceding output confirms our discussion in the previous section: the connection from the input layer to the hidden layer has a total of six parameters (three weights and three bias terms, one pair for each of the three hidden units). In addition, three weights and one bias term connect the hidden layer to the output layer.
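The parameter counts follow a simple rule for a dense layer: one weight per input-unit pair, plus one bias per unit. The helper below is a hypothetical illustration of that rule and is not part of the Keras API.

```python
def dense_param_count(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

hidden_params = dense_param_count(n_inputs=1, n_units=3)   # input -> hidden
output_params = dense_param_count(n_inputs=3, n_units=1)   # hidden -> output
total_params = hidden_params + output_params
print(hidden_params, output_params, total_params)
```

For this network, the rule gives 6 parameters for the hidden layer, 4 for the output layer, and 10 in total.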
- Compile the model. This ensures that we define the loss function and the optimizer to reduce the loss function and the learning rate corresponding to the optimizer (we will look at different optimizers and loss functions in next chapter):
from keras.optimizers import SGD
sgd = SGD(lr=0.01)
In the preceding step, we specified that the optimizer is the stochastic gradient descent that we learned about in the previous section and the learning rate is 0.01. Pass the predefined optimizer and its corresponding learning rate as a parameter and reduce the mean squared error value:
model.compile(optimizer=sgd,loss='mean_squared_error')
- Fit the model. Update the weights so that the model is a better fit:
model.fit(np.array(x), np.array(y), epochs=1, batch_size = 4, verbose=1)
The fit method expects that it receives two NumPy arrays: an input array and the corresponding output array. Note that epochs represents the number of times the total dataset is traversed through, and batch_size represents the number of data points that need to be considered in an iteration of updating the weights. Furthermore, verbose specifies that the output is more detailed, with information about losses in training and test datasets as well as the progress of the model training process.
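The relationship between epochs, batch_size, and the number of weight updates can be sketched with plain Python. The helper name weight_updates is hypothetical, and the sketch assumes that a final, smaller batch still triggers an update.

```python
import math

def weight_updates(n_samples, batch_size, epochs):
    """Number of weight updates: batches per epoch multiplied by epochs."""
    updates_per_epoch = math.ceil(n_samples / batch_size)
    return updates_per_epoch * epochs

print(weight_updates(n_samples=4, batch_size=4, epochs=1))     # 1
print(weight_updates(n_samples=4, batch_size=2, epochs=1))     # 2
print(weight_updates(n_samples=4, batch_size=4, epochs=100))   # 100
```

With four data points and a batch size of 4, each epoch therefore corresponds to exactly one weight update.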
- Extract the weight values. The order in which the weight values are presented is obtained by accessing the weights attribute of the model, as follows:
model.weights
The order in which weights are obtained is as follows:
From the preceding output, we see that the order of weights is the three weights (kernel) and three bias terms in the dense_1 layer (which is the connection between the input to the hidden layer) and the three weights (kernel) and one bias term connecting the hidden layer to the dense_2 layer (the output layer).
Now that we understand the order in which weight values are presented, let's extract the values of these weights:
model.get_weights()
Notice that the weights are presented as a list of arrays, where each array corresponds to the value that is specified in the model.weights output.
The output of above lines of code is as follows:
You should notice that the output we are observing here matches the output we obtained while hand-building the neural network.
- Predict the output for a new set of input using the predict method:
x1 = [[5],[6]]
model.predict(np.array(x1))
Note that x1 is the variable that holds the values for the new set of examples for which we need to predict the value of the output. Similarly to the fit method, the predict method also expects an array as its input.
The output of preceding code is as follows:
Notice that, while the preceding output is incorrect, the output when we run for 100 epochs is as follows:
The preceding output will move closer to the expected output (10 and 12) as we run for an even higher number of epochs.
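To see what the prediction step computes for this architecture, the forward pass can be reproduced in NumPy from a list of weights in the same order as the get_weights output (kernel, bias, kernel, bias). The weights below are hypothetical and are not the result of any training run; they are chosen only because they represent y = 2x exactly for positive inputs.

```python
import numpy as np

def predict(inputs, weights):
    kernel_1, bias_1, kernel_2, bias_2 = weights
    hidden = np.maximum(np.dot(inputs, kernel_1) + bias_1, 0)   # ReLU hidden layer
    return np.dot(hidden, kernel_2) + bias_2                    # linear output layer

# Hypothetical weights: one hidden unit passes x through, the other two are unused.
ideal_weights = [
    np.array([[1.0, 0.0, 0.0]]), np.zeros(3),        # input -> hidden kernel and bias
    np.array([[2.0], [0.0], [0.0]]), np.zeros(1),    # hidden -> output kernel and bias
]

x1 = np.array([[5.0], [6.0]])
predictions = predict(x1, ideal_weights)
print(predictions)
```

With these weights, the predictions for 5 and 6 are 10 and 12. A trained model does not need to arrive at these particular weights, since many weight combinations represent the same function.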