# Introduction to Mathematical Modeling

There is a great deal of interesting work happening in data sciences, especially in the realms of **Machine Learning** (**ML**) and **Deep Learning** (**DL**), and they are popular for good reason. However, the more tried and tested old-timer, mathematical modeling, is not much talked about. Mathematical modeling methods are no less relevant and are complementary to ML. To create successful data products that solve real business problems, we must often deploy the whole breadth of available mathematical tools, far beyond ML.

A model is a simplified representation of a real system that captures the essence of the system. A mathematical model uses variables, operators, functions, equations, and inequalities. Under the hood of mathematical models, there are first-principle models based on physical laws, stochastic models based on distributions and averages, and empirical models based on patterns or historical data. Based on the particular type of modeling, qualitative or quantitative recommendations can be made for the system under consideration. A mathematical model facilitates design and prototyping and substantiates decisions. To formulate a mathematical model, one needs the input and output, the constants and variables, the domain and boundary or initial conditions, and the constraints. The solution can be analytic or numerical; in either case, it determines the typical behavior and critical parameters of the system, trends, dependency, and operating regimes. Systems can be deterministic, wherein we know the cause-effect relationship, or they may be stochastic, involving probability distributions.

A few mature tools in mathematical modeling are in the following areas:

- Mathematical optimization
- Signal processing
- Control theory

We will explore these mathematical modeling approaches in the following sections. A narrow focus on ML misses out on many relevant features of pure mathematical optimization in many use cases. Successful solutions across disparate domains blend the new world of ML with classical mathematical modeling techniques. For example, one can combine state-space modeling methods with ML to infer unobserved parameters of systems in a parameter estimation problem.

# Mathematical optimization

Mathematical optimization, popularly known as mathematical programming, is a branch of applied mathematics. It finds applications in fields such as manufacturing, inventory control, scheduling, networks, economics, engineering, and financial portfolio allocation. Almost any classification, regression, or clustering problem can be cast as an optimization problem. Some problems are static, while some are dynamic, wherein the values of system variables change over time.

## Understanding the problem

Mathematical optimization is basically choosing inputs from a set of allowed options to obtain the optimized or best possible output in a given problem. There are variables, which are essentially the decisions we have to make; constraints, which are the business rules we have to adhere to; and objectives, which are the business goals we are aiming to achieve by representing the real-world business problem as an optimization problem. For example, a hospital’s business problem is equipment and facility capacity planning. Medical equipment including beds and testing kits comprise the decision variables in this case; constraints are conventional and crisis capacity levels and regulations; and finally, the objective is to maximize resource utilization and service performance and minimize operating costs at the same time.

The most basic optimization problem consists of an objective function or cost function, which is the output value we try to optimize, in other words, maximize or minimize. The inputs are variables that can be controlled. Variables can be either discrete or continuous. The scale of a problem is largely determined by its dimensionality, that is, the number of scalar variables (also called decision variables) on which the search is performed. Constraints place limits on how big or small some variables can get. Some problems have constraints, which can be equality or inequality constraints, while some problems do not have them at all, in which case we speak of the unconstrained optimization of the function.

A linear programming problem is an optimization problem wherein the objective function and all constraints are linear, that is, the variables have only first-order terms. It was linear programming that led to the development of optimization in the 1940s. If either the function or one or more constraint(s) is non-linear, then we have a non-linear programming problem. Among non-linear problems, optimizing smooth functions (continuous, with a well-defined gradient) is easier. Knowing the problem type enables the selection of the right tool to solve it.

## Formulation of the problem

The general formulation of a mathematical problem with an objective function *f(x)* represents questions in terms of variables and constraints. A typical form is as follows:

$$
\text{Minimize } f(x) \quad \text{such that } g_i(x) \le 0, \quad \text{where } i = 1, 2, \ldots, m
$$

The nature of variables and constraints can be quite diverse. The variables may be discrete, continuous, or sets (groups), and the constraints may be deterministic or stochastic. The objective function may also include dynamic aspects.

Sometimes we are interested in finding the global optimum point without any constraints or restrictions on the region in space. Such problems are unconstrained optimization problems. At other times, we have to solve problems subject to certain constraints, such as restrictions on control variables. For example, in the preceding case, we might have to minimize the function *f(x)* subject to its constraints. These are constrained optimization problems.

Example 1:

Let us have multiple (inequalities) constraints with two variables, *x* and *y*, as follows:

*2x + 3y ≤ 34*

*3x + 5y ≤ 54*

*0 ≤ x, y*

A graphical solution is the overlap (dark region) of the graphs of the constraints, shown in *Figure 1.1*. Here the constraints are linear, and therefore, for a linear objective function, the maximum and minimum must lie on the boundary. The optimum solution most likely occurs at one of the three specified corner points. With non-linear constraints, the optimum occurs either at the boundaries or between them. In unconstrained optimization, either the function has no boundaries, or if it has, they are soft.

Figure 1.1: Graphical representation of linear constrained optimization

When maximizing an objective function, typical constraints in business problems involve time, money, and resources. When minimizing an objective function, the constraints tend to be more particular to the use case at hand. Suppose that in the preceding problem the objective function is linear, such as *f(x, y) = 20x + 35y*; the optimum can then be found from the slope of the function. If *f(x, y)* is fixed at a value, that value defines a boundary line with this slope, and the line together with the constraints again forms a linear constraint system.
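As a minimal sketch of the corner-point idea, the following Python snippet enumerates the vertices of the feasible region of Example 1 and evaluates *f(x, y) = 20x + 35y* at each of them. It assumes that the objective is to be maximized; the helper names (`intersection`, `is_feasible`, `objective`) are illustrative choices, not part of any library.

```python
from itertools import combinations

# Each constraint has the form a*x + b*y <= c and is stored as (a, b, c).
constraints = [
    (2.0, 3.0, 34.0),   # 2x + 3y <= 34
    (3.0, 5.0, 54.0),   # 3x + 5y <= 54
    (-1.0, 0.0, 0.0),   # x >= 0, written as -x <= 0
    (0.0, -1.0, 0.0),   # y >= 0, written as -y <= 0
]


def objective(x, y):
    """Linear objective from the text: f(x, y) = 20x + 35y."""
    return 20.0 * x + 35.0 * y


def intersection(c1, c2):
    """Intersect the boundary lines of two constraints (Cramer's rule)."""
    a1, b1, r1 = c1
    a2, b2, r2 = c2
    det = a1 * b2 - a2 * b1
    if abs(det) < 1e-12:
        return None  # parallel boundaries never meet
    x = (r1 * b2 - r2 * b1) / det
    y = (a1 * r2 - a2 * r1) / det
    return x, y


def is_feasible(point, tol=1e-9):
    """A point is feasible if it satisfies every constraint."""
    x, y = point
    return all(a * x + b * y <= c + tol for a, b, c in constraints)


# Candidate optima are the feasible intersections of boundary lines.
vertices = []
for c1, c2 in combinations(constraints, 2):
    point = intersection(c1, c2)
    if point is not None and is_feasible(point):
        vertices.append(point)

# Assumption: we maximize the objective over the corner points.
best_vertex = max(vertices, key=lambda p: objective(*p))
best_value = objective(*best_vertex)

for vx, vy in vertices:
    print(f"vertex ({vx:.2f}, {vy:.2f}) -> f = {objective(vx, vy):.2f}")
```

Under these assumptions, the feasible corner points are (0, 0), (17, 0), (8, 6), and (0, 10.8), and the largest objective value, 378, is attained at (0, 10.8). Enumerating vertices only scales to small problems; a dedicated linear programming solver is the practical choice for larger ones.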

With linear constraints, the overlap region is considered to be feasible. Non-linear constraints can be very difficult to visualize as a distorted *x*-*y* plane makes it almost impossible to graph the feasible region.

Example 2:

In non-linear constrained optimization, the first step is to start on the boundary of the feasible region. To minimize the objective function, the vector direction should be chosen so that it decreases the function and stays in the feasible region. If the dot product of the gradient (slope) of the objective function with the vector itself is negative at a point on the boundary, then the vector is said to be moving in the descent direction. Also, a vector that does not violate the constraints is said to move in a feasible direction.

Figure 1.2: Feasible direction in non-linear constrained optimization

The constraint equation on the boundary is *g(x) = 0*, shown in *Figure 1.2*. A feasible vector cannot cause the value of *g(x)* to increase. It must either remain zero or decrease. If the dot product of the gradient of the constraint with the vector itself is negative or zero at the point, then the vector is said to be moving in a feasible direction. For example, say we have the following objective function:

Minimize

And the initial point (4, 2) on a single constraint:

Here, the variables can, in general, stand for a matrix or array. The vector `<-1, 0>` is in both descent and feasible directions. Since the initial point is randomly chosen, there is a good chance that the overlap between the set of all feasible vectors and the set of all descent vectors is large. However, as we approach the minimum, the overlap gets smaller, and at the minimum or optimum point, there is no overlap at all. At the optimum, one cannot minimize the objective function further without violating the constraint. We know we have reached the optimum when the gradient of the objective function and the gradient of the constraint point in exactly opposite directions, that is, when the dot product of the two gradients is negative and the matrix formed by the two gradient vectors has a determinant equal to zero.
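The two dot-product tests and the optimality check can be sketched in a few lines of Python. The objective *f(x, y) = (x - 1)² + (y + 1)²* and the constraint *g(x, y) = x/2 - y ≤ 0* used here are hypothetical, chosen only so that the point (4, 2) lies on the boundary; they are not the functions of the example above.

```python
def grad_f(x, y):
    """Gradient of the hypothetical objective f(x, y) = (x - 1)^2 + (y + 1)^2."""
    return (2.0 * (x - 1.0), 2.0 * (y + 1.0))


def grad_g(x, y):
    """Gradient of the hypothetical constraint g(x, y) = x/2 - y <= 0."""
    return (0.5, -1.0)


def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]


def is_descent(point, direction):
    """Descent direction: the objective decreases along the vector."""
    return dot(grad_f(*point), direction) < 0


def is_feasible_direction(point, direction):
    """Feasible direction: g does not increase along the vector."""
    return dot(grad_g(*point), direction) <= 0


def at_boundary_optimum(point, tol=1e-9):
    """Gradients are anti-parallel: negative dot product, zero determinant."""
    gf, gg = grad_f(*point), grad_g(*point)
    det = gf[0] * gg[1] - gf[1] * gg[0]
    return dot(gf, gg) < 0 and abs(det) < tol


start = (4.0, 2.0)        # on the boundary, since g(4, 2) = 0
direction = (-1.0, 0.0)   # the vector <-1, 0> from the text

print(is_descent(start, direction))             # descent test at the start
print(is_feasible_direction(start, direction))  # feasibility test at the start
print(at_boundary_optimum(start))               # start is not yet optimal
print(at_boundary_optimum((0.4, 0.2)))          # constrained minimum of this toy problem
```

For this toy problem, the vector `<-1, 0>` passes both tests at (4, 2), while the optimality check only succeeds at (0.4, 0.2), the point on the boundary closest to the unconstrained minimum (1, -1).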

Another possibility is that the optimum occurs in the interior of the feasible region rather than on the boundary. In such a case, the gradient of the objective function will be zero at that point. Whether the point is a minimum, a maximum, or a saddle point (that is, the local convexity or concavity of the function) is determined by the eigenvalues of the Hessian (the matrix of second derivatives) of the function.
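The eigenvalue test can be illustrated with a short NumPy sketch. The function name `classify_critical_point` and the example Hessians are assumptions made for illustration; the Hessians correspond to the simple quadratics named in the comments.

```python
import numpy as np


def classify_critical_point(hessian, tol=1e-12):
    """Classify a point with zero gradient from the eigenvalues of its Hessian."""
    # eigvalsh is meant for symmetric matrices, which Hessians of smooth functions are.
    eigenvalues = np.linalg.eigvalsh(np.asarray(hessian, dtype=float))
    if np.all(eigenvalues > tol):
        return "local minimum"      # positive definite: locally convex
    if np.all(eigenvalues < -tol):
        return "local maximum"      # negative definite: locally concave
    if np.any(eigenvalues > tol) and np.any(eigenvalues < -tol):
        return "saddle point"       # mixed signs
    return "inconclusive"           # at least one (near-)zero eigenvalue


# Hessian of f(x, y) = x^2 + 3y^2 at its critical point (0, 0)
bowl = classify_critical_point([[2.0, 0.0], [0.0, 6.0]])
# Hessian of f(x, y) = x^2 - y^2 at its critical point (0, 0)
saddle = classify_critical_point([[2.0, 0.0], [0.0, -2.0]])
# Hessian of f(x, y) = -x^2 - 2y^2 at its critical point (0, 0)
hill = classify_critical_point([[-2.0, 0.0], [0.0, -4.0]])

print(bowl, saddle, hill, sep=" | ")
```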

In optimization problems where the objective function is noisy, or where its gradient must be computed numerically because it is not given (complex boundary value problems, for instance), errors are introduced. Even if the objective functions themselves are not noisy, gradient-based optimization may turn out to be noisy. There are different optimizers available as library functions with Scientific Python, or `scipy` for short, to solve such optimization problems, and we will learn about a few of them in the following chapters. Now that we have learned about the concepts of mathematical optimization, we shall explore another concept in mathematical modeling, which is signal processing.

# Signal processing

Signal processing is another branch of applied mathematics. It finds its application in engineering and focuses on analyzing and processing signals such as sound, images, and scientific measurements, as well as on filtering out noise. Signal processing deals with signals ranging from time series to hyperspectral images, which are obtained from different electromagnetic measurements. Classic transformations of signals such as spectrograms and wavelets are often used with ML techniques. Such representations can also be used as inputs to deep neural networks. The Kalman filter is one classic signal processing filter that uses a series of measurements over time to produce estimates of unknown variables.
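To make the idea of estimating an unknown variable from a series of noisy measurements concrete, here is a minimal sketch of a scalar Kalman filter. It assumes the simplest possible setting, a hidden value that is (nearly) constant over time; the function name `kalman_1d`, the noise levels, and the true value are all hypothetical.

```python
import random


def kalman_1d(measurements, meas_var, process_var, x0=0.0, p0=1000.0):
    """Scalar Kalman filter for a (nearly) constant hidden value.

    x is the current estimate and p is its variance (uncertainty).
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        # Predict: the state model is x_k = x_{k-1}, so only uncertainty grows.
        p = p + process_var
        # Update: blend prediction and measurement using the Kalman gain.
        gain = p / (p + meas_var)
        x = x + gain * (z - x)
        p = (1.0 - gain) * p
        estimates.append(x)
    return estimates


random.seed(0)                      # reproducible synthetic data
true_value = 5.0                    # hypothetical hidden quantity
noise_std = 1.0                     # hypothetical sensor noise
measurements = [true_value + random.gauss(0.0, noise_std) for _ in range(200)]

estimates = kalman_1d(measurements, meas_var=noise_std**2, process_var=1e-5)
final_estimate = estimates[-1]
print(f"final estimate: {final_estimate:.3f}")
```

The large initial variance `p0` expresses that nothing is known about the value at the start, so the first measurements dominate; as more data arrive, the gain shrinks and the estimate settles near the hidden value.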

## Understanding the problem

A signal is a function of a continuous variable, such as time or space. An analog signal is transformed into a digital signal by sampling it at specified intervals of time called the sampling period, the inverse of which is the sampling rate (per second or Hertz). The sampling rate has to be at least twice as high as the maximum frequency of the analog signal. This requirement, known as the Nyquist-Shannon sampling theorem, establishes a sufficient condition that permits a discrete sequence of samples to capture all the information of a continuous-time signal in a discrete-time signal.
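What goes wrong when the sampling condition is violated can be seen in a small NumPy sketch. The frequencies and the sampling rate below are arbitrary illustrative values: a 7 Hz sine sampled at 10 Hz (below twice its frequency) produces exactly the same samples as a sign-flipped 3 Hz sine, so the two cannot be told apart after sampling.

```python
import numpy as np

sampling_rate = 10.0                 # samples per second (Hz), illustrative
n = np.arange(50)                    # sample indices
t = n / sampling_rate                # sampling instants in seconds


def sample_sine(freq_hz):
    """Sample a unit-amplitude sine of the given frequency at the instants t."""
    return np.sin(2.0 * np.pi * freq_hz * t)


well_sampled = sample_sine(3.0)      # 3 Hz < 10 / 2: condition satisfied
under_sampled = sample_sine(7.0)     # 7 Hz > 10 / 2: condition violated

# The 7 Hz samples coincide with those of a sign-flipped 3 Hz sine (aliasing).
aliased_identically = np.allclose(under_sampled, -well_sampled)
print(aliased_identically)
```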

Figure 1.3: 60 kHz sinusoidal (Hann-windowed) tone burst in the time domain and frequency domain of the signal

The frequency domain representation of a signal is obtained with the **Discrete Fourier Transform** (**DFT**). The **Fast Fourier Transform** (**FFT**) is an efficient method of computing the DFT. FFT is rarely applied over the entire signal (a speech signal, for example) at once but rather in frames due to the stochastic nature of the signal, an example of which is illustrated in *Figure 1.3*. FFT is available as a library function with `scipy` for the computation of the frequencies of each frame. Applying the Fourier transform to each individual frame in this way is called the **Short-time Fourier Transform** (**STFT**).
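As a minimal sketch of moving from the time domain to the frequency domain, the snippet below builds a synthetic two-tone signal and locates its strongest frequency with an FFT. It uses `numpy.fft` for self-containment (the `scipy.fft` module offers functions of the same names); the sampling rate and tone frequencies are assumptions chosen for illustration.

```python
import numpy as np

sampling_rate = 1000.0                       # Hz, illustrative
n_samples = 1000                             # one second of data
t = np.arange(n_samples) / sampling_rate

# Synthetic signal: a strong 50 Hz tone plus a weaker 120 Hz tone.
signal = np.sin(2.0 * np.pi * 50.0 * t) + 0.5 * np.sin(2.0 * np.pi * 120.0 * t)

# rfft returns the spectrum of a real signal for non-negative frequencies only.
spectrum = np.fft.rfft(signal)
freqs = np.fft.rfftfreq(n_samples, d=1.0 / sampling_rate)

magnitude = np.abs(spectrum)
dominant_frequency = freqs[np.argmax(magnitude)]
print(f"dominant frequency: {dominant_frequency:.1f} Hz")
```

With one second of data, the frequency bins are spaced 1 Hz apart, so both tones fall exactly on a bin and the strongest peak sits at 50 Hz.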

## Formulation of the problem

**Discrete-Time Signal Processing** (**DSP**) deals with sampled signals and provides the mathematical basis for analyzing and modifying a signal to improve (or optimize) its efficiency or performance. By using DFT, a discrete sequence can be represented in its equivalent frequency domain. The linearity property of the Fourier transform states that, for two signals $x_1(t)$ and $x_2(t)$ and constants $a$ and $b$:

$$
a\,x_1(t) + b\,x_2(t) \;\longleftrightarrow\; a\,X_1(\omega) + b\,X_2(\omega)
$$

Where $X_1(\omega)$ and $X_2(\omega)$ are the Fourier transforms of $x_1(t)$ and $x_2(t)$ respectively. This property is often used in the filtering of signals, which relies on the transformation from the time domain to the frequency domain. The duality property of the Fourier transform is useful as it enables solving complex transforms that would otherwise be difficult to compute directly. It states that if $x(t)$ has a Fourier transform $X(\omega)$, then one can form a new function of time, $X(t)$, that has the functional form of the transform, for example (in the angular-frequency convention):

$$
X(t) \;\longleftrightarrow\; 2\pi\,x(-\omega)
$$

A time shift affects the frequency, and a frequency shift affects the time of the functions. Let us take an example of a spectrogram to understand DSP.

A spectrogram displays the spectrum of frequencies of a waveform over time and is extensively used in the fields of music and speech processing and radars. It is generated by an optical spectrometer, a Fourier transform, or a wavelet transform and is usually depicted as a heat map wherein the strength or intensity of the signal changes with the color (brightness). To generate a spectrogram, a time-domain signal is divided into chunks of equal length that usually overlap, and FFT is applied to each chunk for the calculation of the frequency range. The spectrogram is a plot or graph of the spectrum on each segment or FFT frame, as a frequency *versus* time image (or a 3D surface), shown in *Figure 1.4*, and the third dimension (represented by the color bar) indicates the amplitude of a particular frequency at a particular time. This process corresponds to the computation of the squared magnitude of the STFT of the signal.

Figure 1.4: Spectrogram

Spectrograms can be used to identify characteristics of non-stationary or non-linear signals as a collection of time-frequency analyses. The parameters in a spectrogram typically are frame count (number of FFTs making it up), frequency range (minimum and maximum), FFT spacing, and FFT width (width of time each FFT represents).
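The framing procedure described above can be sketched with NumPy as follows. This is a simplified illustration, not a replacement for a library routine: the function name `spectrogram`, the frame length (FFT width), the hop length (FFT spacing), and the test signal are all assumptions made for the example.

```python
import numpy as np


def spectrogram(signal, sampling_rate, frame_length=256, hop_length=128):
    """Squared-magnitude STFT: one Hann-windowed FFT per overlapping frame."""
    window = np.hanning(frame_length)
    n_frames = 1 + (len(signal) - frame_length) // hop_length
    frames = np.stack([
        signal[i * hop_length: i * hop_length + frame_length] * window
        for i in range(n_frames)
    ])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2      # (frames, frequencies)
    freqs = np.fft.rfftfreq(frame_length, d=1.0 / sampling_rate)
    # Center time of each frame, in seconds.
    times = (np.arange(n_frames) * hop_length + frame_length / 2) / sampling_rate
    return freqs, times, power


# Non-stationary test signal: 100 Hz during the first second, 300 Hz afterwards.
sampling_rate = 2000.0
t = np.arange(4000) / sampling_rate
signal = np.where(t < 1.0,
                  np.sin(2.0 * np.pi * 100.0 * t),
                  np.sin(2.0 * np.pi * 300.0 * t))

freqs, times, power = spectrogram(signal, sampling_rate)

# Strongest frequency in the first and in the last frame.
first_peak = freqs[np.argmax(power[0])]
last_peak = freqs[np.argmax(power[-1])]
print(power.shape, first_peak, last_peak)
```

Plotting `power` as an image over `times` and `freqs` gives the familiar heat map; here the peak moves from roughly 100 Hz in the early frames to roughly 300 Hz in the late ones, which a single FFT over the whole signal would not localize in time.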

Spectrograms are used with **recurrent neural networks** (**RNNs**) in speech recognition, as a primary example. We learned about how digital signals are free (well, almost) of noise and less distorted in this sub-section, and in the next, we are going to explore control theory, another mathematical modeling technique widely used in industrial processes. Control theory is, in general, useful whenever feedback happens in either regulator or servo mechanisms, for example, navigation systems and industrial production processes.

# Control theory

Control theory is a branch of mathematics and engineering that has found use in social sciences as well, such as economics and psychology. It deals with the behavior or evolution of dynamical systems. It is particularly useful when the dynamics of a system are not arbitrary, that is, when we understand the physics of the system. The objective of control is to develop a model from measured data. This model is a mathematical description of the inputs applied to drive a system to a desired state while minimizing any delay or error and ensuring a level of stability.

The behavior of a dynamical system is influenced by a feedback loop: a controller manipulates the system inputs to obtain the desired effect on the output. An error-controlled regulation is typically carried out with a **proportional-integral-derivative** (**PID**) controller, and as the name suggests, the control signal is a weighted sum of the error signal, its integral, and its derivative. The error, which is the difference between the actual and the desired output, is applied as feedback to the input. The standard terminology for a system is a process, and for a controlled variable is a **process variable** (**PV**), and the objective remains the reduction of the deviation error. Using a negative feedback loop, a measurement of PV (E in *Figure 1.5*) is deducted from a desired value S (set point or SP) to estimate an error (SP minus PV) in the system, which is used by a regulator R (*Figure 1.5*) to reduce the gap between the measured value and desired value. An error may also be introduced into the system T by a disturbance D, as shown in the closed loop (*Figure 1.5*) of a controller.

Figure 1.5: Negative feedback controller

Control theory can be linear as well as non-linear. Linear control theory is applied to devices obeying the superposition principle, meaning the output is roughly proportional to the input. Such (close to ideal) systems are tractable by frequency domain mathematical techniques such as the Laplace transform, the Fourier transform, and the Nyquist stability criterion. Non-linear control theory, on the other hand, applies to real-world systems that do not obey the superposition principle. Such systems are often governed by non-linear differential equations and are studied numerically, typically by simulating their operation in a simulation language that mirrors the system processes. However, if only solutions in the vicinity of a stable or equilibrium point are of interest, non-linear control systems can be linearized into approximations using perturbation techniques.

## Understanding the problem

Control systems are analyzed with mathematical techniques in either the frequency domain or the time domain. The state variables in the frequency domain, representing the system’s input, output, and feedback, are functions of frequency. The transfer function, also called the system function or network function, is a mathematical model of the relationship between the input and output, on the basis of the differential equations governing or describing the system. The input and transfer functions are converted from functions of time to functions of frequency by a mathematical transformation. In this domain, the differential equations are replaced by algebraic equations, which are simpler to solve. The state variables in the time domain are functions of time, and the system is described by one or more differential equations.
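As a brief worked illustration of how a differential equation becomes an algebraic one, consider a hypothetical first-order process with input $u(t)$, output $y(t)$, gain $K$, and time constant $\tau$ (these symbols are assumptions introduced for this example only):

$$
\tau \frac{dy(t)}{dt} + y(t) = K\,u(t)
$$

Taking the Laplace transform with zero initial conditions replaces the derivative with multiplication by $s$:

$$
\tau s\,Y(s) + Y(s) = K\,U(s)
\quad\Longrightarrow\quad
G(s) = \frac{Y(s)}{U(s)} = \frac{K}{\tau s + 1}
$$

The transfer function $G(s)$ is thus obtained by simple algebra, and the output for any input follows from $Y(s) = G(s)\,U(s)$.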

Time domain techniques are used to explore and analyze real non-linear systems because frequency domain techniques can only be used to study (ideal) linear systems. Although the equations for non-linear systems are difficult to solve, computer simulation methods have made their analyses commonplace. A critical application of the control loop is in industrial process control systems design, as shown in *Figure 1.6*.

Figure 1.6: Industrial control showing continuously modulated process flow

The building block of industrial processes is the control loop, which consists of all the elements needed to measure and control a process value at a desired SP in the presence of disturbances. The controller may be an isolated piece of hardware or, within a large distributed control system, a **programmable logic controller** (**PLC**) system, and SP inputs can be manually set or cascaded from another source. The green text labels in *Figure 1.6* are tags that describe the function of a component and identify it; they are unique strings within a plant representing the equipment components or elements. An associated sensor essentially captures the data of such tags.

## Formulation of the problem

Modern control theory utilizes state-space methods (time-domain representation), unlike classical control theory, which uses transform methods (frequency-domain representation) such as the Laplace transform, which encodes all system information. In the state-space approach, a mathematical model is a set of first-order differential equations governing the related set of input, output, and state variables of the system. These variables are expressed as vectors, and the differential equations have a matrix format, which is more convenient to tackle. In particular, the differential and algebraic equations representing the behavior of a linear dynamical system are written in matrix form.

The state-space approach is not limited to linear systems and provides a convenient and compact way of modeling and analyzing mostly non-linear systems with multiple inputs and outputs. State space refers to a space whose axes are state variables, and the system state is expressed as a vector within that space.
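A linear state-space model is commonly written as a state equation, ẋ = Ax + Bu, and an output equation, y = Cx + Du. The sketch below simulates such a model with a simple forward-Euler scheme. The system, a mass-spring-damper with hypothetical parameters `m`, `c`, and `k`, and the function name `simulate` are assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical mass-spring-damper: m*q'' + c*q' + k*q = u
m, c, k = 1.0, 3.0, 2.0

# State vector x = [position, velocity]; input u = applied force; output y = position.
A = np.array([[0.0, 1.0],
              [-k / m, -c / m]])
B = np.array([[0.0],
              [1.0 / m]])
C = np.array([[1.0, 0.0]])
D = np.array([[0.0]])


def simulate(A, B, C, D, u, x0, dt, steps):
    """Forward-Euler integration of x' = A x + B u with output y = C x + D u."""
    x = x0.copy()
    outputs = []
    for _ in range(steps):
        y = C @ x + D @ u
        outputs.append(y.item())
        x = x + dt * (A @ x + B @ u)
    return np.array(outputs)


u = np.array([[1.0]])                 # constant unit force (step input)
x0 = np.zeros((2, 1))                 # start at rest
outputs = simulate(A, B, C, D, u, x0, dt=0.001, steps=20000)

final_position = outputs[-1]
print(f"final position: {final_position:.4f}")
```

For a constant force, the position settles at the static deflection `u / k`, which is 0.5 for the values above. The same `simulate` function works unchanged for systems with more states, inputs, or outputs, which is the practical appeal of the matrix form.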

A plant or process is the part of the system that is controlled, and the controller (or simply filter) makes up the rest. Inputs to the process have an effect on the outputs, and the effect is measured with sensors and processed by the controller. The control signal is fed back to the input, thus closing the loop. Such a typical architecture is the PID controller, which is by and large the most used industrial design, shown in *Figure 1.7*. It calculates an error value *e(t)* continuously, the error being the difference between the desired SP and measured PV, and applies a correction on the basis of proportional, integral, and derivative terms.

Figure 1.7: u(t) is the control signal sent to the system, e(t) = r(t) – y(t) is the error
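The PID loop can be sketched as a short discrete-time simulation. The process used here is a hypothetical first-order system with time constant `tau`, and the gains `kp`, `ki`, and `kd` are illustrative values rather than tuned recommendations; the function name `simulate_pid` is likewise an assumption.

```python
def simulate_pid(kp, ki, kd, setpoint=1.0, tau=1.0, dt=0.01, steps=3000):
    """Closed-loop simulation: PID controller driving tau * y' = -y + u."""
    y = 0.0                          # process variable (PV), starts at zero
    integral = 0.0                   # running integral of the error
    prev_error = setpoint - y        # avoids a derivative spike on the first step
    outputs = []
    for _ in range(steps):
        error = setpoint - y                     # e(t) = r(t) - y(t)
        integral += error * dt
        derivative = (error - prev_error) / dt
        # Control signal u(t): weighted sum of the P, I, and D terms.
        u = kp * error + ki * integral + kd * derivative
        # Forward-Euler step of the hypothetical first-order process.
        y += dt * (-y + u) / tau
        prev_error = error
        outputs.append(y)
    return outputs


outputs = simulate_pid(kp=2.0, ki=1.0, kd=0.1)
final_output = outputs[-1]
print(f"PV after 30 s: {final_output:.4f}")
```

With these values, the process variable converges to the set point of 1.0. Setting `ki=0` in the same simulation leaves a steady-state offset, which illustrates why the integral term is needed to remove persistent error.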

When such a process is monitored by multiple controllers, it becomes a distributed control system with a decentralized control loop. Decentralization is useful as it helps the control systems to operate over a large area while interaction happens through communication channels.

Some of the main control techniques extensively used in industries include adaptive control, hierarchical control, optimal control, robust control, and stochastic control. Apart from these, intelligent control uses **artificial intelligence** (**AI**) and ML approaches such as fuzzy logic, neural networks, and so on to control a dynamic system. Industry 4.0 is revolutionizing the way manufacturers are integrating AI into their operations and production facilities.

# Summary

In this chapter, we introduced the concepts of mathematical modeling via the important areas in which it is largely applied: optimization, signal processing, and control theory. Mathematical modeling is the art of transforming a problem into a clear mathematical formulation. Its subsequent algorithmic implementation generates actionable insights and helps build further knowledge about the domain.

The chapter helped us learn the formulation of a mathematical optimization problem in order to arrive at an optimal solution, the formulation being dependent on the domain we intend to investigate. A mathematical optimization model is like a digital twin of a real-world business scenario. It mirrors the business landscape in a strictly mathematical and programming setup, and such an environment becomes particularly relevant for the interpretability of business processes to support high-stake decisions.

In the next chapter, we will find out how mathematical models emphasize the importance of both data and domain knowledge. Additionally, we will learn how ML models can be cast as optimization problems.