# Introduction to Mathematical Modeling

There is a great deal of interesting work happening in data science, especially in the realms of **Machine Learning** (**ML**) and **Deep Learning** (**DL**), and these fields are popular for good reason. However, mathematical modeling, the more tried and tested old-timer, is not much talked about. Mathematical modeling methods are no less relevant and are complementary to ML. To create successful data products that solve real business problems, we must often deploy the whole breadth of available mathematical tools, far beyond ML.

A model is a simplified representation of a real system that captures the essence of the system. A mathematical model uses variables, operators, functions, equations, and inequalities. Under the hood of mathematical models, there are first-principle models based on physical laws, stochastic models based on distributions and averages, and empirical models based on patterns or historical data. Depending on the type of modeling, qualitative or quantitative recommendations can be made for the system under consideration. A mathematical model facilitates design and prototyping and substantiates decisions. To formulate a mathematical model, one needs the inputs and outputs, the constants and variables, the domain with its boundary or initial conditions, and the constraints. The solution can be analytic or numerical; in either case, it reveals the typical behavior and critical parameters of the system, along with its trends, dependencies, and operating regimes. Systems can be deterministic, wherein we know the cause-effect relationship, or stochastic, involving probability distributions.

A few mature tools in mathematical modeling are in the following areas:

- Mathematical optimization
- Signal processing
- Control theory

We will explore these mathematical modeling approaches in the following sections. A narrow focus on ML misses out on many relevant features of pure mathematical optimization in many use cases. Successful solutions across disparate domains blend the new world of ML with classical mathematical modeling techniques. For example, one can combine state-space modeling methods with ML to infer unobserved parameters of systems in a parameter estimation problem.

# Mathematical optimization

Mathematical optimization, popularly known as mathematical programming, is a branch of applied mathematics. It finds applications in fields such as manufacturing, inventory control, scheduling, networks, economics, engineering, and financial portfolio allocation. Almost any classification, regression, or clustering problem can be cast as an optimization problem. Some problems are static, while others are dynamic, wherein the values of system variables change over time.

## Understanding the problem

Mathematical optimization means choosing inputs from a set of allowed options to obtain the best possible output in a given problem. There are variables, which are the decisions we have to make; constraints, which are the business rules we have to adhere to; and objectives, which are the business goals we aim to achieve by representing the real-world business problem as an optimization problem. For example, a hospital’s business problem is equipment and facility capacity planning. The quantities of medical equipment, such as beds and testing kits, are the decision variables in this case; the constraints are conventional and crisis capacity levels and regulations; and the objective is to maximize resource utilization and service performance while minimizing operating costs.

The most basic optimization problem consists of an objective function or cost function, which is the output value we try to optimize, in other words, maximize or minimize. The inputs are variables that can be controlled. Variables can be either discrete or continuous. The scale of a problem is largely determined by its dimensionality, that is, the number of scalar variables (also called decision variables) over which the search is performed. Constraints place limits on how big or small some variables can get. Some problems have constraints, which can be equality or inequality constraints, while others have none at all, which means the optimization of the function is unconstrained.

A linear programming problem is an optimization problem wherein the objective function and all constraints are linear, that is, the variables have only first-order terms. It was linear programming that led to the development of optimization in the 1940s. If either the objective function or one or more of the constraints is non-linear, then we have a non-linear programming problem. The properties of the functions involved also matter: optimizing smooth functions (continuous, with a well-defined gradient) is easier than optimizing non-smooth ones. Knowing the problem type enables the selection of the right tool to solve it.

## Formulation of the problem

The general formulation of a mathematical problem with an objective function *f(x)* represents questions in terms of variables and constraints. A typical form is as follows:

Minimize *f(x)* such that *gᵢ(x) ≤ 0*, where *i = 1, 2, ..., m*

The nature of variables and constraints can be quite diverse. The variables may be discrete, continuous, or sets (groups), and the constraints may be deterministic or stochastic. The objective function may also include dynamic aspects.

Sometimes we are interested in finding the global optimum point without any constraints or restrictions on the region in space. Such problems are unconstrained optimization problems. At other times, we have to solve problems subject to certain constraints, such as restrictions on control variables. For example, in the preceding case, we have to minimize the function subject to the constraints stated in that formulation. These are constrained optimization problems.
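The difference between the two problem classes can be sketched with `scipy.optimize.minimize`. The objective and the constraint below are hypothetical and chosen only for illustration; they do not come from the text. Note that SciPy expects an inequality constraint as a function that must stay non-negative.

```python
import numpy as np
from scipy.optimize import minimize


def objective(v):
    """Hypothetical smooth objective with its unconstrained minimum at (1, 2)."""
    x, y = v
    return (x - 1.0) ** 2 + (y - 2.0) ** 2


start = np.array([0.0, 0.0])

# Unconstrained problem: search the whole plane.
unconstrained = minimize(objective, start, method="BFGS")

# Constrained problem: same objective, restricted to x + y <= 2.
# SciPy expects inequality constraints in the form fun(v) >= 0.
constraint = {"type": "ineq", "fun": lambda v: 2.0 - v[0] - v[1]}
constrained = minimize(objective, start, method="SLSQP", constraints=[constraint])

print("unconstrained optimum:", unconstrained.x)
print("constrained optimum:  ", constrained.x)
```

In this sketch the unconstrained minimum at (1, 2) violates the constraint, so the constrained solver settles on the closest point of the boundary line *x + y = 2*, which is (0.5, 1.5).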

Example 1:

Consider multiple inequality constraints on two variables, *x* and *y*, as follows:

*2x + 3y ≤ 34*

*3x + 5y ≤ 54*

*0 ≤ x, y*

Graphically, the feasible region is the overlap (dark region) of the regions defined by the individual constraints, shown in *Figure 1.1*. Here the constraints are linear, and therefore, the maximum and minimum of a linear objective must lie on the boundary. The optimum solution most likely occurs at one of the three specified points. With non-linear constraints, the optimum occurs either at the boundaries or between them. In unconstrained optimization, either the function has no boundaries, or if it has, they are soft.

Figure 1.1: Graphical representation of linear constrained optimization

Typical constraints in business problems involve time, money, and resources while attempting to maximize an objective function. While minimizing an objective function, the constraints are more particular to the use case at hand. Suppose in the preceding problem the objective function is linear, such as *f(x, y) = 20x + 35y*; the optimum is then found from the slope of the function. If *f(x, y)* is fixed at a value *c*, the line *20x + 35y = c* acts as an additional boundary, and sliding this line toward larger *c* until it last touches the feasible region locates the maximum.
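As a minimal sketch, the constraints of Example 1 together with the objective *f(x, y) = 20x + 35y* can be handed to `scipy.optimize.linprog`. Treating the problem as a maximization is an assumption made here for illustration; since `linprog` minimizes, the objective coefficients are negated.

```python
from scipy.optimize import linprog

# linprog minimizes, so negate the objective 20x + 35y to maximize it.
cost = [-20.0, -35.0]
lhs = [[2.0, 3.0], [3.0, 5.0]]   # left-hand sides of 2x + 3y <= 34, 3x + 5y <= 54
rhs = [34.0, 54.0]
bounds = [(0, None), (0, None)]  # 0 <= x, y

result = linprog(cost, A_ub=lhs, b_ub=rhs, bounds=bounds, method="highs")
best_x, best_y = result.x
best_value = -result.fun         # undo the sign flip
print(f"x = {best_x:.2f}, y = {best_y:.2f}, f = {best_value:.2f}")
```

Evaluating the objective by hand at the corner points (17, 0), (8, 6), and (0, 10.8) of the feasible region gives 340, 370, and 378 respectively, so the solver should settle on the last of these.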

With linear constraints, the overlap region is the feasible region. Non-linear constraints can be very difficult to visualize, as a distorted *x*-*y* plane makes it almost impossible to graph the feasible region.

Example 2:

In non-linear constrained optimization, the first step is to start on the boundary of the feasible region. To minimize the objective function, the direction vector should be chosen so that it decreases the function and stays in the feasible region. If the dot product of the gradient (slope) of the objective function with the vector is negative at a point on the boundary, then the vector is said to point in a descent direction. Likewise, a vector that does not violate the constraints is said to point in a feasible direction.

Figure 1.2: Feasible direction in non-linear constrained optimization

The constraint equation on the boundary is *g(x) = 0*, shown in *Figure 1.2*. A feasible vector cannot cause the value of *g(x)* to increase. It must either remain zero or decrease. If the dot product of the gradient of the constraint with the vector is negative or zero at the point, then the vector is said to point in a feasible direction. For example, say we have the following objective function:

Minimize

And the initial point (4, 2) on a single constraint:

Here the variables can, in general, stand for a matrix or array. The vector `<-1, 0>` is in both descent and feasible directions. Since the initial point is randomly chosen, there is a good chance that the overlap between the set of all feasible vectors and the set of all descent vectors is large. However, as we approach the minimum, the overlap gets smaller, and at the minimum or optimum point, there is no overlap at all. At the optimum, one cannot minimize the objective function further without violating the constraint. We know we have reached the optimum when the dot product of the two gradients is negative and the matrix formed by the two gradient vectors has a determinant equal to zero, that is, when the gradients of the objective and the constraint point in exactly opposite directions.
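The dot-product tests described above can be sketched numerically. The objective *f(x, y) = x² + y²* and the constraint *g(x, y) = 2 − y ≤ 0* used below are hypothetical stand-ins chosen so that the point (4, 2) lies on the boundary; they are not the functions of Example 2.

```python
import numpy as np


def grad_f(p):
    """Gradient of the hypothetical objective f(x, y) = x**2 + y**2."""
    x, y = p
    return np.array([2.0 * x, 2.0 * y])


def grad_g(p):
    """Gradient of the hypothetical constraint g(x, y) = 2 - y <= 0."""
    return np.array([0.0, -1.0])


def is_descent(p, d):
    # A direction decreases f when its dot product with grad f is negative.
    return float(np.dot(grad_f(p), d)) < 0.0


def is_feasible(p, d):
    # On the boundary g = 0, a direction is feasible when it does not increase g.
    return float(np.dot(grad_g(p), d)) <= 0.0


def is_boundary_optimum(p, tol=1e-9):
    # Gradients point in opposite directions: negative dot product, zero determinant.
    gf, gg = grad_f(p), grad_g(p)
    det = np.linalg.det(np.column_stack([gf, gg]))
    return bool(float(np.dot(gf, gg)) < 0.0 and abs(det) < tol)


start = np.array([4.0, 2.0])
direction = np.array([-1.0, 0.0])
print(is_descent(start, direction), is_feasible(start, direction))
print(is_boundary_optimum(start), is_boundary_optimum(np.array([0.0, 2.0])))
```

For these stand-in functions, the vector `<-1, 0>` slides along the boundary *y = 2* toward the point (0, 2), where the gradients (0, 4) and (0, −1) point in opposite directions and no direction is both descending and feasible.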

Another possibility is that the optimum occurs in the interior of the feasible region rather than on the boundary. In such a case, the gradient of the objective function will be zero at that point. Whether the point is a minimum, a maximum, or a saddle point is determined by the eigenvalues of the Hessian (the matrix of second derivatives) of the function: if all eigenvalues are positive, the function is locally convex and the point is a minimum.
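A minimal sketch of this eigenvalue test follows. The helper name `classify_stationary_point` and the two example Hessians are illustrative assumptions, not part of any library.

```python
import numpy as np


def classify_stationary_point(hessian, tol=1e-12):
    """Classify a stationary point from the eigenvalues of a symmetric Hessian."""
    eigenvalues = np.linalg.eigvalsh(hessian)
    if np.all(eigenvalues > tol):
        return "local minimum"
    if np.all(eigenvalues < -tol):
        return "local maximum"
    if np.any(eigenvalues > tol) and np.any(eigenvalues < -tol):
        return "saddle point"
    return "inconclusive"


# Hessians of two hypothetical functions at their stationary point (0, 0):
bowl = np.array([[2.0, 0.0], [0.0, 6.0]])     # f(x, y) = x**2 + 3*y**2
saddle = np.array([[2.0, 0.0], [0.0, -2.0]])  # f(x, y) = x**2 - y**2
print(classify_stationary_point(bowl), classify_stationary_point(saddle))
```

When some eigenvalues are zero, the second-order test alone cannot decide, which is why the sketch reports that case as inconclusive.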

In optimization problems where the objective function is noisy or its gradient has to be computed numerically because it is not given (complex boundary value problems, for instance), errors are introduced. Even if the objective function itself is not noisy, gradient-based optimization may turn out to be noisy. Different optimizers are available as library functions in Scientific Python, or `scipy` for short, to solve such optimization problems, and we will learn about a few of them in the following chapters. Now that we have learned about the concepts of mathematical optimization, we shall explore another concept in mathematical modeling, which is signal processing.

# Signal processing

Signal processing is another branch of applied mathematics. It finds its application in engineering and focuses on analyzing and processing signals such as sound, images, and scientific measurements, including filtering out noise. Signal processing deals with signals ranging from time series to hyperspectral images, which are obtained from different electromagnetic measurements. Classic transformations of signals such as spectrograms and wavelets are often used with ML techniques. Such representations can also be used as inputs to deep neural networks. The Kalman filter is one classic signal processing filter that uses a series of measurements over time to produce estimates of unknown variables.
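To make the Kalman filter idea concrete, here is a minimal scalar sketch under strong simplifying assumptions: the unknown variable is a constant, there is no process noise, and the measurement noise variance is known. The function name `kalman_constant` and all numerical values are hypothetical.

```python
import numpy as np


def kalman_constant(measurements, meas_var, init_est=0.0, init_var=1e3):
    """Scalar Kalman filter for a constant hidden value (no process noise)."""
    est, var = init_est, init_var
    for z in measurements:
        gain = var / (var + meas_var)   # how much to trust the new measurement
        est = est + gain * (z - est)    # correct the estimate with the residual
        var = (1.0 - gain) * var        # uncertainty shrinks after each update
    return est, var


rng = np.random.default_rng(0)
true_value = 5.0                        # hypothetical quantity being measured
noisy = true_value + rng.normal(0.0, 1.0, size=200)
estimate, variance = kalman_constant(noisy, meas_var=1.0)
print(f"estimate = {estimate:.3f}, variance = {variance:.4f}")
```

Each new measurement pulls the estimate toward itself in proportion to the gain, and the gain falls as the filter grows more certain, so later measurements adjust the estimate less than early ones.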

## Understanding the problem

A signal is a function of a continuous variable, such as time or space. An analog signal is transformed into a digital signal by sampling it at regular intervals of time called the sampling period, the inverse of which is the sampling rate (samples per second, or hertz). The sampling rate has to be at least twice as high as the maximum frequency of the analog signal. This requirement, known as the Nyquist-Shannon sampling theorem, establishes a sufficient condition that permits a discrete sequence of samples to capture all the information in a continuous-time signal.
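The effect of the sampling rate can be sketched with NumPy. The 7 Hz tone and the two sampling rates below are arbitrary illustrative values, and `dominant_frequency` is a hypothetical helper.

```python
import numpy as np


def dominant_frequency(tone_hz, sampling_rate_hz, duration_s=2.0):
    """Sample a sine tone and return the strongest frequency in its spectrum."""
    n_samples = int(round(sampling_rate_hz * duration_s))
    t = np.arange(n_samples) / sampling_rate_hz
    samples = np.sin(2.0 * np.pi * tone_hz * t)
    spectrum = np.abs(np.fft.rfft(samples))
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / sampling_rate_hz)
    return freqs[np.argmax(spectrum)]


tone = 7.0  # Hz, so the sampling rate should exceed 14 Hz
print(dominant_frequency(tone, sampling_rate_hz=50.0))  # adequately sampled
print(dominant_frequency(tone, sampling_rate_hz=10.0))  # undersampled: aliased
```

Sampled at 50 Hz, the tone shows up at its true frequency of 7 Hz. Sampled at only 10 Hz, the same tone is indistinguishable from a 3 Hz tone (10 − 7), which is the aliasing that the sampling condition is meant to prevent.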

Figure 1.3: 60 kHz sinusoidal (Hann-windowed) tone burst in the time domain and frequency domain of the signal

The frequency domain representation of a signal is obtained with the **Discrete Fourier Transform** (**DFT**). The **Fast Fourier Transform** (**FFT**) is an efficient method of computing the DFT. FFT is rarely applied over the entire signal (a speech signal, for example) at once but rather in frames, due to the stochastic nature of the signal; an example is illustrated in *Figure 1.3*. FFT is available as a library function in `scipy` for computing the frequencies of each frame. Applying the Fourier transform to each individual frame in this way is called the **Short-time Fourier Transform** (**STFT**).
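A tone burst like the one in Figure 1.3 can be sketched with `scipy.fft` and a Hann window. Only the 60 kHz tone frequency comes from the figure caption; the sampling rate, the number of cycles, and the FFT length are assumptions made for illustration.

```python
import numpy as np
from scipy.fft import rfft, rfftfreq
from scipy.signal.windows import hann

sampling_rate = 1.0e6   # Hz (assumed)
tone = 60.0e3           # Hz, as in the tone burst of Figure 1.3
n_cycles = 10           # burst length in cycles (assumed)

n_samples = int(round(sampling_rate * n_cycles / tone))
t = np.arange(n_samples) / sampling_rate
burst = hann(n_samples) * np.sin(2.0 * np.pi * tone * t)

n_fft = 4096  # zero-padding only interpolates the spectrum onto a finer grid
spectrum = np.abs(rfft(burst, n=n_fft))
freqs = rfftfreq(n_fft, d=1.0 / sampling_rate)
peak = freqs[np.argmax(spectrum)]
print(f"spectral peak near {peak / 1e3:.1f} kHz")
```

The window tapers the burst smoothly to zero at both ends, which concentrates the spectrum in a single lobe around the tone frequency instead of spreading energy into strong side lobes.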

## Formulation of the problem

**Discrete-Time Signal Processing** (**DSP**) deals with sampled signals and provides the mathematical basis for analyzing and modifying a signal to improve (or optimize) its efficiency or performance. By using DFT, a discrete sequence can be represented in its equivalent frequency domain. The linearity property of the Fourier transform states that, for two signals *x₁(t)* and *x₂(t)* and constants *a* and *b*:

*a x₁(t) + b x₂(t) ↔ a X₁(f) + b X₂(f)*

Where *X₁(f)* and *X₂(f)* are the Fourier transforms of *x₁(t)* and *x₂(t)* respectively. Linearity is often used in the filtering of signals, which involves transforming the signal from the time domain to the frequency domain. The duality property of the Fourier transform is useful as it enables computing complicated transforms that would otherwise be difficult to evaluate directly. It states that if *x(t)* has a Fourier transform *X(f)* (with the frequency *f* in hertz), then one can form a new function of time that has the functional form of the transform, for example:

*X(t) ↔ x(−f)*
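Both the linearity and the duality properties can be checked numerically on the DFT. The random test sequences and the scalars below are arbitrary; the duality check uses the discrete counterpart of the property, in which transforming a sequence twice returns the time-reversed sequence scaled by its length.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 64
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
a, b = 2.0, -0.5   # arbitrary scalars

# Linearity: the transform of a weighted sum is the weighted sum of transforms.
lhs = np.fft.fft(a * x1 + b * x2)
rhs = a * np.fft.fft(x1) + b * np.fft.fft(x2)
linearity_holds = np.allclose(lhs, rhs)

# Duality (DFT form): transforming twice returns n times the time-reversed input.
twice = np.fft.fft(np.fft.fft(x1))
reversed_x1 = np.roll(x1[::-1], 1)   # x1[(-k) mod n]
duality_holds = np.allclose(twice, n * reversed_x1)

print(linearity_holds, duality_holds)
```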

A shift in time changes the phase of the frequency-domain representation, and a shift in frequency modulates the function in the time domain. Let us take an example of a spectrogram to understand DSP.

A spectrogram displays the spectrum of frequencies of a waveform over time and is extensively used in the fields of music, speech processing, and radar. It is generated by an optical spectrometer, a Fourier transform, or a wavelet transform and is usually depicted as a heat map wherein the strength or intensity of the signal is indicated by the color (brightness). To generate a spectrogram, a time-domain signal is divided into chunks of equal length that usually overlap, and FFT is applied to each chunk to calculate its frequency content. The spectrogram is a plot of the spectrum of each segment or FFT frame, as a frequency *versus* time image (or a 3D surface), shown in *Figure 1.4*, and the third dimension (represented by the color bar) indicates the amplitude of a particular frequency at a particular time. This process corresponds to the computation of the squared magnitude of the STFT of the signal.

Figure 1.4: Spectrogram

Spectrograms can be used to identify characteristics of non-stationary or non-linear signals as a collection of time-frequency analyses. The parameters in a spectrogram typically are frame count (number of FFTs making it up), frequency range (minimum and maximum), FFT spacing, and FFT width (width of time each FFT represents).
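The procedure can be sketched with `scipy.signal.spectrogram`. The test signal (a linear chirp), the sampling rate, and the frame settings below are assumptions chosen for illustration; `nperseg` plays the role of the FFT width and `noverlap` sets how much neighbouring frames overlap.

```python
import numpy as np
from scipy.signal import chirp, spectrogram

sampling_rate = 8000.0                       # Hz (assumed)
t = np.arange(0.0, 2.0, 1.0 / sampling_rate)
# Hypothetical test signal: a tone sweeping linearly from 200 Hz to 2 kHz.
signal = chirp(t, f0=200.0, t1=2.0, f1=2000.0, method="linear")

freqs, times, power = spectrogram(
    signal,
    fs=sampling_rate,
    window="hann",
    nperseg=256,     # FFT width: samples per frame
    noverlap=128,    # overlap between neighbouring frames
)

# One column of `power` per frame; pick the strongest frequency in each.
peak_freqs = freqs[np.argmax(power, axis=0)]
print(power.shape, peak_freqs[0], peak_freqs[-1])
```

Each column of `power` is the scaled squared magnitude of one frame's spectrum, so plotting it as an image with `times` on one axis and `freqs` on the other gives a heat map in which the chirp appears as a rising line.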

Spectrograms are used with **recurrent neural networks** (**RNNs**) in speech recognition, as a primary example. In this subsection, we learned about how digital signals are free (well, almost) of noise and less distorted, and in the next, we are going to explore control theory, another mathematical modeling technique widely used in industrial processes. Control theory is, in general, useful whenever feedback happens in either regulator or servo mechanisms, for example, in navigation systems and industrial production processes.