Hands-On Data Science with Anaconda


Product type Book
Published in May 2018
Publisher Packt
ISBN-13 9781788831192
Pages 364 pages
Edition 1st Edition
Authors (2): Yuxing Yan, James Yan

Table of Contents (15 chapters)

  • Preface
  • Ecosystem of Anaconda
  • Anaconda Installation
  • Data Basics
  • Data Visualization
  • Statistical Modeling in Anaconda
  • Managing Packages
  • Optimization in Anaconda
  • Unsupervised Learning in Anaconda
  • Supervised Learning in Anaconda
  • Predictive Data Analytics – Modeling and Validation
  • Anaconda Cloud
  • Distributed Computing, Parallel Computing, and HPCC
  • References
  • Other Books You May Enjoy

Optimization in Anaconda

Optimization plays a very important role in the area of data science. For example, in finance, investors are constantly seeking a trade-off between risk and return. To diversify their investment, they would like to invest in different industries or buy several stocks. Thus, for an expected portfolio return, how do they choose appropriate stocks to minimize their portfolio risk? For this objective, we could apply some kind of portfolio optimization technique.

Another application is linked to the government's tax policy. We know that cutting the corporate tax rate would encourage companies that are considering more capital investment (that is, long-term investment), such as in equipment. However, at the same time, the government's revenue might fall. If this is true, the government would have to cut many programs intended to help unfortunate people...

Why optimization is important

People face all kinds of choices in their lives. In a sense, we apply various kinds of implicit optimization procedures, either consciously or subconsciously. For example, when a high school junior or senior is looking for a college, they might have many choices, such as good schools, local schools, public schools, or private schools. When choosing among several offers, these students usually have some objectives in mind. These objectives might include the ranking of the school, the cost of attending, scholarships, the reputation and name recognition of the program, or even the fame of the football team. Corporations also have to make all kinds of optimal or reasonable decisions: for instance, which products to produce, in what quantities and at what prices, and which customers to target. Since...

General issues for optimization problems

There are several issues in optimization. The most important one is how to choose an appropriate objective function. In some cases, the objective function is obvious. Unfortunately, in other cases, it is far from clear. Since choosing a good objective depends on the specific situation, we will discuss it further, but remember that an appropriate objective function can make our task much easier.

In many cases, an inappropriate objective function might cause the following problems:

  • It is difficult to find a feasible solution
  • We might end up with a local solution
  • We might have a corner solution
  • It takes a long time to converge (that is, too much computation time to find a good solution)
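
The second issue in the list, ending up with a local solution, can be illustrated with a small sketch. The function g below is a hypothetical example that does not come from the book; it has two minima, and optim() stops at whichever one lies in the basin of its starting value.

```r
# Hypothetical non-convex objective with two minima (not from the book)
g <- function(x) x^4 - 3 * x^2 + x

# The problem is one-dimensional, so use the gradient-based BFGS method
# rather than the default Nelder-Mead
left  <- optim(par = -1, fn = g, method = "BFGS")
right <- optim(par =  1, fn = g, method = "BFGS")

# The two runs stop at different points with different objective values;
# only the one with the lower value is the global solution
c(left$par, left$value)
c(right$par, right$value)
```

Trying several starting values and comparing the resulting objective values is a simple, if not foolproof, guard against accepting a local solution.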

Let's look at a convex function; the code and corresponding graph are given here:

x<-seq(-10,10,0.1) 
a<-4 
b<- -2 
c<-10 
y...
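
Because the listing above is truncated, the following is only a sketch under the assumption that the elided line evaluates the quadratic a*x^2 + b*x + c with the coefficients shown. With a > 0 such a function is convex, so its single stationary point at x = -b/(2a) is the global minimum, and optimize() should recover it.

```r
# Assumption: the elided line in the listing computes a*x^2 + b*x + c
a <- 4
b <- -2
c <- 10
h <- function(x) a * x^2 + b * x + c

# Closed-form minimizer of a convex quadratic (valid because a > 0)
x_star <- -b / (2 * a)

# Numerical check on the same range used for the plot
res <- optimize(h, interval = c(-10, 10))
c(analytic = x_star, numeric = res$minimum, value = res$objective)
```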

Quadratic optimization

If the highest power is 1, then we call it a linear model. On the other hand, if the highest power is 2, we call it a quadratic function. The R optim() function can be used to find a solution for a minimization problem. For example, we have the following objective function:

Since there is only one variable, we can solve it manually. Take the first-order derivative and set it to zero:
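
As a worked version of this step, assume the objective is the quadratic y = ax^2 + bx + c with the coefficients used in the code that follows (a = -2, b = 10, c = 5). The objective function itself is missing from this excerpt, so that pairing is an assumption.

```latex
\begin{aligned}
y &= ax^2 + bx + c = -2x^2 + 10x + 5 \\
\frac{dy}{dx} &= 2ax + b = -4x + 10 = 0 \\
x^* &= -\frac{b}{2a} = 2.5, \qquad y^* = -2(2.5)^2 + 10(2.5) + 5 = 17.5
\end{aligned}
```

Because a < 0, the second derivative 2a = -4 is negative, so this stationary point is a maximum.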

x<-seq(-10,10,0.1) 
a<--2 
b<-10 
c<-5 
y<-a*x^2+b*x+c 
plot(x,y,type='l') 

The related graph is shown here:

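From the graph, we can see that y has a single maximum. For y = 20 - 3.5x^2 that maximum is at x = 0; for the coefficients a = -2, b = 10, and c = 5 it is at x = -b/(2a) = 2.5:

y <- 20 - 3.5*x^2                       # quadratic with its maximum at x = 0
a <- -2
b <- 10
c <- 5
f <- function(x) -(a*x^2 + b*x + c)     # negated objective; y peaks at x = 2.5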

In the preceding code, we negate the function because the R optim() function minimizes by default. Minimizing -y is equivalent to maximizing y, so the $value reported is the negative of the maximum:

> optim(0.3,f) 
$par 
[1] 2.500078 
$value 
[1] -17.5 ...
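
For a one-dimensional problem such as this one, R warns that the default Nelder-Mead method used by optim() is unreliable. A minimal sketch of two base R alternatives, reusing the same coefficients: optim() with method = "Brent" on a bounded interval, and optimize() with maximum = TRUE, which avoids negating the objective. The interval [-10, 10] is an assumption taken from the plotting range.

```r
a <- -2
b <- 10
c <- 5

# Negated objective, as in the text, so that minimizing it maximizes y
f <- function(x) -(a * x^2 + b * x + c)

# Brent's method requires finite lower and upper bounds
res_brent <- optim(par = 0.3, fn = f, method = "Brent", lower = -10, upper = 10)

# optimize() can maximize the original function directly
y <- function(x) a * x^2 + b * x + c
res_max <- optimize(y, interval = c(-10, 10), maximum = TRUE)

c(res_brent$par, -res_brent$value)     # maximizer and maximum of y
c(res_max$maximum, res_max$objective)  # same quantities from optimize()
```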

Example #1 – stock portfolio optimization

Sometimes we refer to single-period portfolio optimization as Markowitz portfolio optimization. Our input datasets include the expected returns, the standard deviations, and the correlation matrix between financial assets, and our output will be an efficient frontier formed by those assets. In the rest of the chapter, we will use historical returns to represent expected returns and use the historical correlation in place of expected correlation.
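
To make these inputs concrete, here is a minimal base R sketch that builds a covariance matrix from standard deviations and a correlation matrix, then computes the global minimum-variance weights by solving Sigma w = 1 and rescaling the weights to sum to one. All numbers are hypothetical, short selling is allowed, and this closed form is a textbook shortcut for a single point on the frontier, not the fPortfolio workflow used in the rest of this example.

```r
# Hypothetical inputs for three assets (not taken from the book's datasets)
mu    <- c(0.08, 0.10, 0.12)                 # expected returns
sigma <- c(0.15, 0.20, 0.25)                 # standard deviations
rho   <- matrix(c(1.0, 0.3, 0.2,
                  0.3, 1.0, 0.4,
                  0.2, 0.4, 1.0), nrow = 3)  # correlation matrix

# Covariance matrix: Sigma[i, j] = sigma[i] * sigma[j] * rho[i, j]
Sigma <- diag(sigma) %*% rho %*% diag(sigma)

# Global minimum-variance weights: solve Sigma w = 1, then rescale to sum to 1
ones <- rep(1, 3)
w    <- solve(Sigma, ones)
w    <- w / sum(w)

port_return <- sum(w * mu)
port_sd     <- sqrt(as.numeric(t(w) %*% Sigma %*% w))
round(c(w, ret = port_return, sd = port_sd), 4)
```

The resulting portfolio standard deviation cannot exceed that of the least risky single asset, because holding only that asset is one of the feasible weightings.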

In the following examples, we use an R package called fPortfolio. We use the following code to install the package:

install.packages("fPortfolio") 

To load various embedded datasets, we use the data() function (see the following example code):

> library(fPortfolio)
> data(GCCINDEX.RET)
> dim(GCCINDEX.RET)
[1] 824 11

The following table lists the embedded datasets:

#
Name
Dimension...

Example #2 – optimal tax policy

Another example is the optimal taxation level in an LQ economy. Here, LQ stands for Linear Quadratic (model). This example is borrowed from Thomas J. Sargent and John Stachurski. Their webpage is at https://lectures.quantecon.org/py/lqramsey.html. They modify a well-known model of Robert Lucas and Nancy Stokey so that convenient formulas for solving linear-quadratic models can be applied to simplify the calculations. There are two types of players in the economy: the household and a benevolent government. The government finances an exogenous stream of government purchases with state-contingent loans and a linear tax on labor income. The household maximizes its utility function by choosing paths for consumption and labor, taking prices and the government's tax rate and borrowing plans as given. Note that to maximize attainable utility...

Packages for optimization in R

There are many R packages available for various types of optimization, such as optimization, MlBayesOpt, rgenoud, colf, and mize. The following table offers a partial list:

#    Name       Description
1    dtwclust   Time Series Clustering Along with Optimizations for the Dynamic Time Warping Distance
2    CVXR       Disciplined Convex Optimization
3    IROmiss    Imputation Regularized Optimization Algorithm
4    subplex    Unconstrained Optimization using the Subplex Algorithm
5    GPareto    Gaussian Processes for Pareto Front Estimation and Optimization
6    OOR        Optimistic Optimization in R
7    ROI        R Optimization Infrastructure
8    lbreg      Log-Binomial Regression with Constrained Optimization
9    PEIP       Geophysical Inverse Theory and Optimization
10   dfoptim    Derivative-Free Optimization
11   SPOT       Sequential...

Packages for optimization in Python

From Chapter 6, Managing Packages, we know that we can find Python packages on the website at https://pypi.python.org/. The following table lists the Python packages/models related to optimization that are returned when we search for the keyword Optimization:

Package                            Wt*   Description
heuristic-optimization 0.4.3       7     Heuristics for derivative-free optimization
streams-optimization 1.0.0.dev3    7     A library for LHCb trigger/stripping streams optimization
adjointShapeOptimizationFlux 1.0   6     Python frontend of the adjointShapeOptimizationFoam
bayesian-optimization 0.6.0        6     Bayesian Optimization package
emzed_optimizations 0.6.0          6     particular optimizations for speeding up emzed
scikits.optimization 0.3           6     A python module for numerical optimization
asprin 3.0.2                       5     Qualitative and quantitative optimization...

Packages for optimization in Octave

To find packages for optimization in Octave, we go to the web page at https://octave.sourceforge.io/packages.php. Then, we can search these packages by using the keyword optimization; see the first package called ga in this screenshot:

The second one is the optim package:

Packages for optimization in Julia

Similarly, for packages for optimization in Julia, we go to the web page at https://pkg.julialang.org/. Then, we can search these packages by using the keyword optimization; see one package called JuMP in this screenshot:

There are about 45 matches for the keyword optimization. To save space, we won't show the other packages. This web page, titled Optimization packages for Julia language, might be quite useful: http://www.juliaopt.org/.

Summary

In this chapter, we have discussed several topics around optimization, such as general issues for optimization problems, expressing various kinds of optimization problems as linear programming problems (LPPs), and quadratic optimization. Several examples were offered to make our discussion more practice-oriented, such as how to choose an optimal stock portfolio, how to optimize wealth and resources to promote sustainable development, and how much the government really should tax. In addition, we introduced several packages for optimization in R, Python, Julia, and Octave.

In the next chapter, we will discuss unsupervised learning. In particular, we will explain hierarchical clustering and k-means clustering. For R and Python, we will explain in detail several related packages. For R, we will discuss Rattle, randomUniformForest, and Rmixmod. For Python, we will cover SciPy, Contrastive, milk, Scikit-learn...

Review questions and exercises

  1. What does optimization mean?
  2. What is an LPP? What are its uses?
  3. What is the difference between a global solution and a local solution?
  4. In what situations would our LPP program not converge? Give a few simple examples and possible solutions.
  5. Explain why we have the following weird result:
> f<-function(x)-2*x^2+3*x+1 
> optim(13,f) 
$par 
[1] 2.352027e+75 
$value 
[1] -1.106406e+151 
$counts 
function gradient  
     502       NA  
$convergence 
[1] 1 
$message 
NULL
  6. What does a quadratic equation mean?
  7. Where can we search for all the R packages targeting optimization issues?
  8. What is the usage of the task view related to optimization?
  9. According to the related task view, how many R packages are associated with optimization, and how do we install them all at once?
  10. From the Prof. French Data Library, download the return data for 10 industries...