Building a Reward Matrix – Designing Your Datasets
Experimenting and implementing are the two main approaches to artificial intelligence. Experimenting largely entails trying ready-to-use datasets and black box, ready-to-use Python examples. Implementation involves preparing a dataset, developing preprocessing algorithms, and then choosing a model along with the proper parameters and hyperparameters.
Implementation usually involves white box work that entails knowing exactly how an algorithm works and even being able to modify it.
In Chapter 1, Getting Started with Next-Generation Artificial Intelligence through Reinforcement Learning, the MDP-driven Bellman equation relied on a reward matrix. In this chapter, we will get our hands dirty in a white box process to create that reward matrix.
An MDP cannot run without a reward matrix. The reward matrix determines whether it is possible to move from one cell to another, from A to B. It is like a map of a city that...
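The city-map idea can be sketched in a few lines. Below is a hypothetical 6×6 reward matrix in the style of Chapter 1: each row is a starting location, each column a destination, 0 marks a move the map forbids, 1 a permitted move, and 100 the goal cell. The values are an illustrative assumption, not the matrix this chapter will derive.

```python
import numpy as np

# Hypothetical reward matrix: rows = current location, columns = destination.
# 0 = impossible move, 1 = possible move, 100 = goal cell.
R = np.array([
    [0,   0,   0,   0,   1,   0],
    [0,   0,   0,   1,   0,   1],
    [0,   0, 100,   1,   0,   0],
    [0,   1,   1,   0,   1,   0],
    [1,   0,   0,   1,   0,   0],
    [0,   1,   0,   0,   0,   0],
])

def can_move(a, b):
    """Return True if the map allows going from location a to location b."""
    return R[a, b] > 0

print(can_move(0, 4))  # True: this sketch allows A -> E
print(can_move(0, 1))  # False: A -> B is blocked
```

The matrix plays the role of the map: a zero entry means the street simply does not exist between those two points.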
Designing datasets – where the dream stops and the hard work begins
As in the previous chapter, bear in mind that a real-life project goes through a three-dimensional method in some form or other. First, it's important to think and talk about the problem in need of solving without jumping onto a laptop. Once that is done, bear in mind that the foundation of machine learning and deep learning relies on mathematics. Finally, once the problem has been discussed and mathematically represented, it is time to develop the solution.
First, think of a problem in natural language. Then, make a mathematical description of a problem. Only then should you begin the software implementation.
Designing datasets
The reinforcement learning program described in the first chapter can solve a variety of problems involving unlabeled classification in an unsupervised decision-making process. The Q function can be applied to drone, truck, or car deliveries. It...
Logistic activation functions and classifiers
Now that the availability of each location in L = {l1, l2, l3, l4, l5, l6} is stored in a vector, the locations can be sorted from the most available to the least available. From there, the reward matrix, R, for the MDP process described in Chapter 1, Getting Started with Next-Generation Artificial Intelligence through Reinforcement Learning, can be built.
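The sorting step can be sketched as follows. The raw availability scores below are an assumption for illustration; the logistic function squashes each score into (0, 1), and since it is monotonic, sorting the squashed values orders the locations the same way as the raw scores.

```python
import numpy as np

def sigmoid(x):
    # Logistic function: squashes each score into the (0, 1) interval
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical raw availability scores for L = {l1, ..., l6}
raw = np.array([2.0, -1.0, 0.5, 3.0, -0.5, 1.0])
availability = sigmoid(raw)

# Indices of locations sorted from most to least available
order = np.argsort(-availability)
labels = ["l" + str(i + 1) for i in order]
print(labels)  # ['l4', 'l1', 'l6', 'l3', 'l5', 'l2']
```

The sorted vector is the raw material for filling in the rows of R: the most available locations receive the most favorable reward values.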
Overall architecture
At this point, the overall architecture contains two main components:
- Chapter 1: A reinforcement learning program based on the value-action Q function using a reward matrix that will be finalized in this chapter. The reward matrix was provided in the first chapter as an experiment, but in the implementation phase, you'll often have to build it from scratch. It sometimes takes weeks to produce a good reward matrix.
- Chapter 2: Designing a set of 6×1 neurons that represents the flow of products at a...
Summary
Using a McCulloch-Pitts neuron with a logistic activation function in a one-layer network to build a reward matrix for reinforcement learning shows one way to preprocess a dataset.
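The core operation summarized above can be sketched in a few lines: a single unit computes a weighted sum of its inputs plus a bias, then passes the result through a logistic activation. The input, weight, and bias values are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    # Logistic activation: maps any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, w, b):
    # McCulloch-Pitts-style unit: weighted sum of inputs plus bias,
    # passed through the logistic activation
    return sigmoid(np.dot(w, x) + b)

x = np.array([1.0, 0.5, 0.0])   # hypothetical inputs
w = np.array([0.4, 0.3, 0.2])   # hypothetical weights
b = -0.5                        # hypothetical bias (threshold)
print(neuron(x, w, b))
```

One such unit per location, with inputs describing that location's state, yields the availability values used to build the reward matrix.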
Processing real-life data often requires a generalization of a logistic sigmoid function through a softmax function, and a one-hot function applied to logits to encode the data.
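The softmax and one-hot steps mentioned above can be sketched as follows, using hypothetical logits. Softmax generalizes the logistic sigmoid to a vector, producing probabilities that sum to 1; one-hot then encodes the winning entry as 1 and all others as 0.

```python
import numpy as np

def softmax(logits):
    # Subtract the max for numerical stability; the result sums to 1
    e = np.exp(logits - np.max(logits))
    return e / e.sum()

def one_hot(probs):
    # Encode the highest-probability entry as 1, all others as 0
    v = np.zeros_like(probs)
    v[np.argmax(probs)] = 1.0
    return v

logits = np.array([3.0, 1.0, 0.2])  # hypothetical logits
p = softmax(logits)
print(p.sum())     # 1.0
print(one_hot(p))  # [1. 0. 0.]
```

Unlike applying a sigmoid to each component independently, softmax couples the components, which is why a separate softmax step is needed when a vector must represent a probability distribution.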
Machine learning functions are tools that must be understood to be able to use all or parts of them to solve a problem. With this practical approach to artificial intelligence, a whole world of projects awaits you.
This neuronal approach is the parent of the multilayer perceptron that will be introduced starting in Chapter 8, Solving the XOR Problem with a Feedforward Neural Network.
This chapter moved from experimental black box machine learning and deep learning to a white box implementation. Implementation requires a full understanding of machine learning algorithms, which often require fine-tuning.
Questions
- Raw data can be the input to a neuron and transformed with weights. (Yes | No)
- Does a neuron require a threshold? (Yes | No)
- A logistic sigmoid activation function makes the sum of the weights larger. (Yes | No)
- A McCulloch-Pitts neuron sums the weights of its inputs. (Yes | No)
- A logistic sigmoid function is a log10 operation. (Yes | No)
- A logistic softmax is not necessary if a logistic sigmoid function is applied to a vector. (Yes | No)
- A probability is a value between –1 and 1. (Yes | No)
Further reading
- The original McCulloch-Pitts neuron 1943 paper: http://www.cse.chalmers.se/~coquand/AUTOMATA/mcp.pdf
- TensorFlow variables: https://www.tensorflow.org/beta/guide/variables