You're reading from Hands-On Data Science with Anaconda

Product type: Book
Published: May 2018
Publisher: Packt
ISBN-13: 9781788831192
Pages: 364
Edition: 1st Edition
Languages: Python
Concepts: Data Science
Authors (2): Yuxing Yan, James Yan

Optimization in Anaconda

Optimization plays a central role in data science. In finance, for example, investors constantly seek a trade-off between risk and return. To diversify, they invest in different industries or buy several stocks. For a given expected portfolio return, how should they choose stocks to minimize their portfolio risk? To answer this question, we can apply a portfolio optimization technique.

Another application is linked to the government's tax policy. Cutting the corporate tax rate would encourage companies to make more capital investment (that is, long-term investment), such as in equipment. At the same time, however, the government's revenue might fall. If so, the government would have to cut many programs intended to help unfortunate people...

Why optimization is important

In our lives, we all face many kinds of choices. In a sense, we carry out implied optimization procedures, either consciously or subconsciously. For example, a high school junior or senior looking for a college might have many choices, such as good schools, local schools, public schools, or private schools. When deciding among a couple of offers, these students usually have some objectives in mind. These objectives might include the ranking of the school, the cost of attending, scholarships, the reputation and name recognition of the program, or even the fame of the football team. Corporations likewise have to make all kinds of optimal or reasonable decisions: what products to produce, in what quantities and at what prices, and which customers to target. Since...

General issues for optimization problems

There are several issues in optimization. The most important one is how to choose an appropriate objective function. In some cases, the objective function is obvious; in others, it is far less clear. Since a good objective depends on the specific situation, we will discuss it further, but remember that an appropriate objective function can make our task much easier.

In many cases, an inappropriate objective function might cause the following problems:

It is difficult to find a feasible solution
We might end up with a local solution
We might have a corner solution
It takes a long time to converge (that is, too much computation time to find a good solution)
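The second issue in the list, ending up with a local solution, can be illustrated with a minimal Python sketch. The function f(x) = x^4 - 3x^2 + x, the step size, and the starting points are hypothetical choices made for illustration only; they do not come from the book. Plain gradient descent started from two different points settles in two different minima, and only one of them is the global minimum.

```python
# Hypothetical non-convex objective with two minima (illustration only).
def f(x):
    return x**4 - 3 * x**2 + x

# First-order derivative of f.
def grad_f(x):
    return 4 * x**3 - 6 * x + 1

def gradient_descent(x0, step=0.01, iterations=2000):
    """Follow the negative gradient from x0 for a fixed number of steps."""
    x = x0
    for _ in range(iterations):
        x -= step * grad_f(x)
    return x

x_from_right = gradient_descent(2.0)   # settles in the local minimum
x_from_left = gradient_descent(-2.0)   # settles in the global minimum

print("start  2.0 -> x = %.4f, f(x) = %.4f" % (x_from_right, f(x_from_right)))
print("start -2.0 -> x = %.4f, f(x) = %.4f" % (x_from_left, f(x_from_left)))
```

The result depends entirely on the starting point, which is why a non-convex objective often calls for several starting values or a global search method.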

Let's look at a convex function; the code and corresponding graph are given here:

x<-seq(-10,10,0.1) 
a<-4 
b<- -2 
c<-10 
y...
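As a rough Python counterpart, the sketch below uses the same coefficients (a = 4, b = -2, c = 10) and the same grid from -10 to 10 in steps of 0.1. It assumes the truncated line defines the quadratic y = a*x^2 + b*x + c, which is a guess based on the variable names rather than something the excerpt shows. Under that assumption, the function is convex because a > 0, and its only minimum sits at x = -b/(2a).

```python
# Coefficients taken from the R snippet; the quadratic form is an assumption.
a, b, c = 4, -2, 10

def y(x):
    return a * x**2 + b * x + c

# For a convex quadratic (a > 0), the unique minimum is at the vertex.
x_star = -b / (2 * a)
y_star = y(x_star)

# Same grid as seq(-10, 10, 0.1) in R.
xs = [-10 + 0.1 * i for i in range(201)]
grid_min = min(y(x) for x in xs)

# Midpoint convexity check on a coarse sub-grid:
# y((u + v) / 2) <= (y(u) + y(v)) / 2 for every pair (u, v).
is_convex_on_grid = all(
    y((u + v) / 2) <= (y(u) + y(v)) / 2 + 1e-9
    for u in xs[::20]
    for v in xs[::20]
)

print("vertex: x =", x_star, "y =", y_star)
print("smallest grid value:", grid_min)
print("midpoint convexity holds on the grid:", is_convex_on_grid)
```

Because a convex function has no separate local minima, any minimum an optimizer finds for it is also the global one.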

Quadratic optimization

If the highest power is 1, we call it a linear model; if the highest power is 2, we call it a quadratic function. The R optim() function can be used to find a solution for a minimization problem. For example, using the coefficients that appear in the code below, we have the following objective function:

y = -2x^2 + 10x + 5

Since there is only one variable, we can solve it manually. Take the first-order derivative and set it to zero, that is, -4x + 10 = 0, which gives x = 2.5. The following code plots the function:

x<-seq(-10,10,0.1) 
a<- -2 
b<-10 
c<-5 
y<-a*x^2+b*x+c 
plot(x,y,type='l')

The related graph is shown here:

From the graph, we can see that y reaches its maximum near x = 2.5. To locate it numerically, we define the negative of the function:

a<- -2 
b<-10 
c<-5 
f<-function(x) -(a*x^2+b*x+c)   # negated objective

In the preceding code, we negate the function because the R optim() function searches for a minimum rather than a maximum:

> optim(0.3,f) 
$par 
[1] 2.500078 
$value 
[1] -17.5 ...
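The same trick of maximizing by minimizing the negated function can be sketched in Python with the standard library alone. The sketch uses a golden-section search on the interval [-10, 10], which is a different method from the one R's optim() applies by default, and the helper name golden_section_min is made up for this illustration.

```python
import math

# Same coefficients as in the R example: y = -2x^2 + 10x + 5.
a, b, c = -2, 10, 5

def neg_f(x):
    """Negated objective: minimizing neg_f maximizes the quadratic."""
    return -(a * x**2 + b * x + c)

def golden_section_min(func, lo, hi, tol=1e-6):
    """Hypothetical helper: golden-section search for the minimum of a
    unimodal function on the interval [lo, hi]."""
    inv_phi = (math.sqrt(5) - 1) / 2
    while hi - lo > tol:
        x1 = hi - inv_phi * (hi - lo)
        x2 = lo + inv_phi * (hi - lo)
        if func(x1) < func(x2):
            hi = x2   # the minimum lies in [lo, x2]
        else:
            lo = x1   # the minimum lies in [x1, hi]
    return (lo + hi) / 2

x_best = golden_section_min(neg_f, -10, 10)
max_value = -neg_f(x_best)

print("x at the maximum: %.4f" % x_best)
print("maximum value:    %.4f" % max_value)
```

The search returns an x close to 2.5, and flipping the sign of the minimized value recovers the maximum of about 17.5, in line with the $par and $value entries above.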

Example #1 – stock portfolio optimization

Sometimes we refer to single-period portfolio optimization as Markowitz portfolio optimization. Our input datasets include the expected returns, the standard deviations, and the correlation matrix between financial assets, and our output will be an efficient frontier formed by those assets. In the rest of the chapter, we will use historical returns to represent expected returns and use the historical correlation in place of expected correlation.
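To make the inputs and the output concrete, here is a minimal two-asset sketch in Python. All numbers (expected returns of 8% and 12%, standard deviations of 15% and 25%, and a correlation of 0.2) are hypothetical and are not taken from any dataset in this chapter. The code applies the standard two-asset formulas for portfolio return and variance and does not use fPortfolio.

```python
import math

# Hypothetical inputs for two assets (illustration only).
mu1, mu2 = 0.08, 0.12   # expected returns
sd1, sd2 = 0.15, 0.25   # standard deviations
rho = 0.2               # correlation between the two assets

cov12 = rho * sd1 * sd2  # covariance implied by the correlation

def portfolio_return(w):
    """Expected return with weight w in asset 1 and 1 - w in asset 2."""
    return w * mu1 + (1 - w) * mu2

def portfolio_risk(w):
    """Portfolio standard deviation for the same weights."""
    var = w**2 * sd1**2 + (1 - w)**2 * sd2**2 + 2 * w * (1 - w) * cov12
    return math.sqrt(var)

# Closed-form minimum-variance weight for the two-asset case.
w_min = (sd2**2 - cov12) / (sd1**2 + sd2**2 - 2 * cov12)

# Risk/return pairs for weights 0.0, 0.1, ..., 1.0.
frontier = [(portfolio_risk(i / 10), portfolio_return(i / 10)) for i in range(11)]

for risk, ret in frontier:
    print("risk = %.4f, return = %.4f" % (risk, ret))
print("minimum-variance weight in asset 1: %.4f" % w_min)
```

With these numbers, the minimum-variance portfolio has a lower risk than either asset alone, which is the diversification effect. The efficient frontier is the part of the curve at or above the minimum-variance point, that is, the portfolios with a weight in asset 1 no larger than w_min.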

In the following examples, we use an R package called fPortfolio. We use the following code to install the package:

install.packages("fPortfolio")

To load various embedded datasets, we use the data() function (see the following example code):

library(fPortfolio)
data(GCCINDEX.RET)
dim(GCCINDEX.RET)
 [1] 824  11

The following table lists the embedded datasets:

Name

Dimension...

Example #2 – optimal tax policy

Another example is the optimal taxation level in an LQ economy. Here, LQ stands for Linear Quadratic (model). This example is borrowed from Thomas J. Sargent and John Stachurski; their web page is at https://lectures.quantecon.org/py/lqramsey.html. They modify a well-known model of Robert Lucas and Nancy Stokey so that convenient formulas for solving linear-quadratic models can be applied to simplify the calculations. There are two types of players in the economy: the household and a benevolent government. The government finances an exogenous stream of government purchases with state-contingent loans and a linear tax on labor income. The household maximizes its utility function by choosing paths for consumption and labor, taking prices and the government's tax rate and borrowing plans as given. Note that to maximize attainable utility...

Packages for optimization in R

There are many R packages available for various types of optimization, such as optimization, MlBayesOpt, rgenoud, colf, and mize. The following table offers a partial list:

#	Name	Description
1	`dtwclust`	Time Series Clustering Along with Optimizations for the Dynamic Time Warping Distance
2	`CVXR`	Disciplined Convex Optimization
3	`IROmiss`	Imputation Regularized Optimization Algorithm
4	`subplex`	Unconstrained Optimization using the Subplex Algorithm
5	`GPareto`	Gaussian Processes for Pareto Front Estimation and Optimization
6	`OOR`	Optimistic Optimization in R
7	`ROI`	R Optimization Infrastructure
8	`lbreg`	Log-Binomial Regression with Constrained Optimization
9	`PEIP`	Geophysical Inverse Theory and Optimization
10	`dfoptim`	Derivative-Free Optimization
11	`SPOT`	Sequential...

Packages for optimization in Python

From Chapter 6, Managing Packages, we know that to find all Python packages, we go to the website at https://pypi.python.org/. The following table shows a list of Python packages/models related to optimization after we type Optimization as the keyword:

Package	Wt*	Description
`heuristic-optimization 0.4.3`	7	Heuristics for derivative-free optimization
`streams-optimization 1.0.0.dev3`	7	A library for LHCb trigger/stripping streams optimization
`adjointShapeOptimizationFlux 1.0`	6	Python frontend of the `adjointShapeOptimizationFoam`
`bayesian-optimization 0.6.0`	6	Bayesian Optimization package
`emzed_optimizations 0.6.0`	6	particular optimizations for speeding up `emzed`
`scikits.optimization 0.3`	6	A python module for numerical optimization
`asprin 3.0.2`	5	Qualitative and quantitative optimization...

Packages for optimization in Octave

To find packages for optimization in Octave, we go to the web page at https://octave.sourceforge.io/packages.php. Then, we can search these packages by using the keyword optimization; see the first package called ga in this screenshot:

The second one is the optim package:

Packages for optimization in Julia

Similarly, for packages for optimization in Julia, we go to the web page at https://pkg.julialang.org/. Then, we can search these packages by using the keyword optimization; see one package called JuMP in this screenshot:

There are about 45 matches for the keyword optimization. To save space, we won't show the other packages. This web page, titled Optimization packages for Julia language, might be quite useful: http://www.juliaopt.org/.

Summary

In this chapter, we have discussed several topics around optimization, such as general issues for optimization problems, expressing various kinds of optimization problems as LPPs, and quadratic optimization. Several examples were offered to make our discussion more practice-oriented, such as how to choose an optimal stock portfolio, how to optimize wealth and resources to promote sustainable development, and how much the government really should tax. In addition, we introduced several packages for optimization in R, Python, Julia, and Octave.

In the next chapter, we will discuss unsupervised learning. In particular, we will explain hierarchical clustering and k-means clustering. For R and Python, we will explain in detail several related packages. For R, we will discuss Rattle, randomUniformForest, and Rmixmod. For Python, we will cover SciPy, Contrastive, milk, Scikit-learn...

Review questions and exercises

What does optimization mean?
What is an LPP? What are its uses?
What is the difference between a global solution and a local solution?
In what situations would our LPP program not converge? Give a few simple examples and possible solutions.
Explain why we have the following weird result:

> f<-function(x)-2*x^2+3*x+1 
> optim(13,f) 
$par 
[1] 2.352027e+75 
$value 
[1] -1.106406e+151 
$counts 
function gradient  
     502       NA  
$convergence 
[1] 1 
$message 
NULL

What is a quadratic equation?
Where can we search for all the R packages that target optimization problems?
What is the usage of the task view related to optimization?
According to the related task view, how many R packages are associated with optimization, and how do we install them all at once?
From the Prof. French Data Library, download the return data for 10 industries...