(For more resources related to this topic, see here.)
In this article, we will cover the following topics:

Generating random numbers from standard normal distribution and normal distribution

Generating random numbers from a uniform distribution

A simple application: estimate pi by the Monte Carlo simulation

Generating random numbers from a Poisson distribution

Bootstrapping with/without replacements

The lognormal distribution and simulation of stock price movements

Simulating terminal stock prices

Simulating an efficient portfolio and an efficient frontier
Generating random numbers from a standard normal distribution
Normal distributions play a central role in finance. A major reason is that many finance theories, such as option theory and applications, are based on the assumption that stock returns follow a normal distribution. It is quite often that we need to generate n random numbers from a standard normal distribution. For this purpose, we have the following two lines of code:
>>>import scipy as sp >>>x=sp.random.standard_normal(size=10)
The basic random numbers in SciPy/NumPy are created by Mersenne Twister PRNG in the numpy.random function. The random numbers for distributions in numpy.random are in cython/pyrex and are pretty fast. To print the first few observations, we use the print() function as follows:
>>>print x[0:5] [0.55062594 0.51338547 0.04208367 0.66432268 0.49461661] >>>
Alternatively, we could use the following code:
>>>import scipy as sp >>>x=sp.random.normal(size=10)
This program is equivalent to the following one:
>>>import scipy as sp >>>x=sp.random.normal(0,1,10)
The first input is for mean, the second input is for standard deviation, and the last one is for the number of random numbers, that is, the size of the dataset. The default settings for mean and standard deviations are 0 and 1. We could use the help() function to find out the input variables. To save space, we show only the first few lines:
>>>help(sp.random.normal) Help on builtin function normal: normal(...) normal(loc=0.0, scale=1.0, size=None)
Drawing random samples from a normal (Gaussian) distribution
The probability density function of the normal distribution, first derived by De Moivre and 200 years later by both Gauss and Laplace independently, is often called the bell curve because of its characteristic shape; refer to the following graph:
Again, the density function for a standard normal distribution is defined as follows:
(1)
Generating random numbers with a seed
Sometimes, we like to produce the same random numbers repeatedly. For example, when a professor is explaining how to estimate the mean, standard deviation, skewness, and kurtosis of five random numbers, it is a good idea that students could generate exactly the same values as their instructor. Another example would be that when we are debugging our Python program to simulate a stock's movements, we might prefer to have the same intermediate numbers. For such cases, we use the seed() function as follows:
>>>import scipy as sp >>>sp.random.seed(12345) >>>x=sp.random.normal(0,1,20) >>>print x[0:5] [0.20470766 0.47894334 0.51943872 0.5557303 1.96578057] >>>
In this program, we use 12345 as our seed. The value of the seed is not important. The key is that the same seed leads to the same random values.
Generating n random numbers from a normal distribution
To generate n random numbers from a normal distribution, we have the following code:
>>>import scipy as sp >>>sp.random.seed(12345) >>>x=sp.random.normal(0.05,0.1,50) >>>print x[0:5] [ 0.02952923 0.09789433 0.00194387 0.00557303 0.24657806] >>>
The difference between this program and the previous one is that the mean is 0.05 instead of 0, while the standard deviation is 0.1 instead of 1. The density of a normal distribution is defined by the following equation, where μ is the mean and σ is the standard deviation. Obviously, the standard normal distribution is just a special case of the normal distribution shown as follows:
(2)
Histogram for a normal distribution
A histogram is used intensively in the process of analyzing the properties of datasets. To generate a histogram for a set of random values drawn from a normal distribution with specified mean and standard deviation, we have the following code:
>>>import scipy as sp >>>import matplotlib.pyplot as plt >>>sp.random.seed(12345) >>>x=sp.random.normal(0.08,0.2,1000) >>>plt.hist(x, 15, normed=True) >>>plt.show()
The resultant graph is presented as follows:
Graphical presentation of a lognormal distribution
When returns follow a normal distribution, the prices would follow a lognormal distribution. The definition of a lognormal distribution is as follows:
(3)
The following code shows three different lognormal distributions with three pairs of parameters, such as (0, 0.25), (0, 0.5), and (0, 1.0). The first parameter is for mean (), while the second one is for standard deviation, :
import scipy.stats as sp import numpy as np import matplotlib.pyplot as plt x=np.linspace(0,3,200) mu=0 sigma0=[0.25,0.5,1] color=['blue','red','green'] target=[(1.2,1.3),(1.7,0.4),(0.18,0.7)] start=[(1.8,1.4),(1.9,0.6),(0.18,1.6)] for i in range(len(sigma0)): sigma=sigma0[i] y=1/(x*sigma*sqrt(2*pi))*exp((log(x)mu)**2/(2*sigma*sigma)) plt.annotate('mu='+str(mu)+', sigma='+str(sigma),
xy=target[i], xytext=start[i], arrowprops=dict(facecolor=color[i],shrink=0.01),) plt.plot(x,y,color[i]) plt.title('Lognormal distribution') plt.xlabel('x') plt.ylabel('lognormal density distribution') plt.show()
The corresponding three graphs are put together to illustrate their similarities and differences:
Generating random numbers from a uniform distribution
When we plan to randomly choose m stocks from n available stocks, we could draw a set of random numbers from a uniform distribution. To generate 10 random numbers between one and 100 from a uniform distribution, we have the following code. To guarantee that we generate the same set of random numbers, we use the seed() function as follows:
>>>import scipy as sp >>>sp.random.seed(123345) >>>x=sp.random.uniform(low=1,high=100,size=10)
Again, low, high, and size are the three keywords for the three input variables. The first one specifies the minimum, the second one specifies the high end, while the size gives the number of the random numbers we intend to generate. The first five numbers are shown as follows:
>>>print x[0:5] [ 30.32749021 20.58006409 2.43703988 76.15661293 75.06929084] >>>
Using simulation to estimate the pi value
It is a good exercise to estimate pi by the Monte Carlo simulation. Let's draw a square with 2R as its side. If we put the largest circle inside the square, its radius will be R. In other words, the areas for those two shapes have the following equations:
(4)
(5)
By dividing equation (4) by equation (5), we have the following result:
In other words, the value of pi will be 4* Scircle/Ssquare. When running the simulation, we generate n pairs of x and y from a uniform distribution with a range of zero and 0.5. Then we estimate a distance that is the square root of the summation of the squared x and y, that is, . Obviously, when d is less than 0.5 (value of R), it will fall into the circle. We can imagine throwing a dart that falls into the circle. The value of the pi will take the following form:
(6)
The following graph illustrates these random points within a circle and within a square:
The Python program to estimate the value of pi is presented as follows:
import scipy as sp n=100000 x=sp.random.uniform(low=0,high=1,size=n) y=sp.random.uniform(low=0,high=1,size=n) dist=sqrt(x**2+y**2) in_circle=dist[dist<=1] our_pi=len(in_circle)*4./n print ('pi=',our_pi) print('error (%)=', (our_pipi)/pi)
The estimated pi value would change whenever we run the previous code as shown in the following code, and the accuracy of its estimation depends on the number of trials, that is, n:
('pi=', 3.15) ('error (%)=', 0.0026761414789406262) >>>
Generating random numbers from a Poisson distribution
To investigate the impact of private information, Easley, Kiefer, O'Hara, and Paperman (1996) designed a (PIN) Probability of informed trading measure that is derived based on the daily number of buyerinitiated trades and the number of sellerinitiated trades. The fundamental aspect of their model is to assume that order arrivals follow a Poisson distribution. The following code shows how to generate n random numbers from a Poisson distribution:
import scipy as sp import matplotlib.pyplot as plt x=sp.random.poisson(lam=1, size=100) #plt.plot(x,'o') a = 5. # shape n = 1000 s = np.random.power(a, n) count, bins, ignored = plt.hist(s, bins=30) x = np.linspace(0, 1, 100) y = a*x**(a1.) normed_y = n*np.diff(bins)[0]*y plt.plot(x, normed_y) plt.show()
Selecting m stocks randomly from n given stocks
Based on the preceding program, we could easily choose 20 stocks from 500 available securities. This is an important step if we intend to investigate the impact of the number of randomly selected stocks on the portfolio volatility as shown in the following code:
import scipy as sp n_stocks_available=500 n_stocks=20 x=sp.random.uniform(low=1,high=n_stocks_available,size=n_stocks) y=[] for i in range(n_stocks): y.append(int(x[i])) #print y final=unique(y) print final print len(final)
In the preceding program, we select 20 numbers from 500 numbers. Since we have to choose integers, we might end up with less than 20 values, that is, some integers appear more than once after we convert real numbers into integers. One solution is to pick more than we need. Then choose the first 20 integers. An alternative is to use the randrange() and randint() functions. In the next program, we choose n stocks from all available stocks. First, we download a dataset from http://canisius.edu/~yany/yanMonthly.pickle:
n_stocks=10 x=load('c:/temp/yanMonthly.pickle') x2=unique(np.array(x.index)) x3=x2[x2<'ZZZZ'] # remove all indices sp.random.seed(1234567) nonStocks=['GOLDPRICE','HML','SMB','Mkt_Rf','Rf','Russ3000E_D','US_DEBT', 'Russ3000E_X','US_GDP2009dollar','US_GDP2013dollar'] x4=list(x3) for i in range(len(nonStocks)): x4.remove(nonStocks[i]) k=sp.random.uniform(low=1,high=len(x4),size=n_stocks) y,s=[],[] for i in range(n_stocks): index=int(k[i]) y.append(index) s.append(x4[index]) final=unique(y) print final print s
In the preceding program, we remove nonstock data items. These nonstock items are a part of data items. First, we load a dataset called yanMonthly.pickle that includes over 200 stocks, gold price, GDP, unemployment rate, SMB (Small Minus Big), HML (High Minus Low), riskfree rate, price rate, market excess rate, and Russell indices.
The .pickle extension means that the dataset has a type from Pandas. Since x.index would present all indices for each observation, we need to use the unique() function to select all unique IDs. Since we only consider stocks to form our portfolio, we have to move all market indices and other nonstock securities, such as HML and US_DEBT. Because all stock market indices start with a carat (^), we use less than ZZZZ to remove them. For other IDs that are between A and Z, we have to remove them one after another. For this purpose, we use the remove() function available for a list variable. The final output is shown as follows:
Bootstrapping with/without replacements
Assume that we have the historical data, such as price and return, for a stock. Obviously, we could estimate their mean, standard deviation, and other related statistics. What are their expected annual mean and risk next year? The simplest, maybe naïve way is to use the historical mean and standard deviation. A better way is to construct the distribution of annual return and risk. This means that we have to find a way to use historical data more effectively to predict the future. In such cases, we could apply the bootstrapping methodology. For example, for one stock, we have its last 20year monthly returns, that is, 240 observations.
To estimate next year's 12 monthly returns, we need to construct a return distribution. First, we choose 12 returns randomly from the historical return set without replacements and estimate their mean and standard deviations. We repeat this procedure 5,000 times. The final output will be our returnstandard distribution. Based on such a distribution, we could estimate other properties as well. Similarly, we could do so with replacements.
One of the useful functions present in SciPy is called permutation(). Assume that we have 10 numbers from one to 10 (inclusive of one and 10). We could call the permutation() function to reshuffle them as follows:
import numpy as np x=range(1,11) print x for i in range(5): y=np.random.permutation(x) print y
The output of this code is shown as follows:
Based on the permutation() function, we could define a function with three input variables: data, number of observations we plan to choose from the data randomly, and whether we choose to bootstrap with or without replacement as shown in the following code:
import numpy as np def boots_f(data,n_obs,replacement=None): n=len(data) if (n<n_obs): print "n is less than n_obs" else: if replacement==None: y=np.random.permutation(data) return y[0:n_obs] else: y=[] for i in range(n_obs): k=np.random.permutation(data) y.append(k[0]) return y
The constraint specified in the previous program is that the number of given observations should be larger than the number of random returns we plan to pick up. This is true for the bootstrapping without the replacement method. For the bootstrapping with the replacement method, we could relax this constraint; refer to the related exercise.
Distribution of annual returns
It is a good application to estimate annualized return distribution and represent it as a graph. To make our exercise more meaningful, we download Microsoft's daily price data. Then, we estimate its daily returns and convert them into annual ones. Based on those annual returns, we generate its distribution by applying bootstrapping with replacements 5,000 times as shown in the following code:
from matplotlib.finance import quotes_historical_yahoo import matplotlib.pyplot as plt import numpy as np import scipy as sp # Step 1: input area ticker='MSFT' # input value 1 begdate=(1926,1,1) # input value 2 enddate=(2013,12,31) # input value 3 n_simulation=5000 # input value 4 # Step 2: retrieve price data and estimate log returns x=quotes_historical_yahoo(ticker,begdate,enddate,
asobject=True,adjusted=True) logret = log(x.aclose[1:]/x.aclose[:1]) # Step 3: estimate annual returns date=[] d0=x.date for i in range(0,size(logret)): date.append(d0[i].strftime("%Y")) y=pd.DataFrame(logret,date,columns=['logret'],dtype=float64) ret_annual=exp(y.groupby(y.index).sum())1 ret_annual.columns=['ret_annual'] n_obs=len(ret_annual) # Step 4: estimate distribution with replacement sp.random.seed(123577) final=zeros(n_obs,dtype=float) for i in range(0,n_obs): x=sp.random.uniform(low=0,high=n_obs,size=n_obs) y=[] for j in range(n_obs): y.append(int(x[j])) z=np.array(ret_annual)[y] final[i]=mean(z) # step 5: graph plt.title('Mean return distribution: number
of simulations ='+str(n_simulation)) plt.xlabel('Mean return') plt.ylabel('Frequency') mean_annual=round(np.mean(np.array(ret_annual)),4) plt.figtext(0.63,0.8,'mean annual='+str(mean_annual)) plt.hist(final, 50, normed=True) plt.show()
The corresponding graph is shown as follows:
Simulation of stock price movements
We mentioned in the previous sections that in finance, returns are assumed to follow a normal distribution, whereas prices follow a lognormal distribution. The stock price at time t+1 is a function of the stock price at t, mean, standard deviation, and the time interval as shown in the following formula:
(7)
In this formula, is the stock price at t+1, is the expected stock return, is the time interval (), T is the time (in years), n is the number of steps, ε is the distribution term with a zero mean, and σ is the volatility of the underlying stock. With a simple manipulation, equation (4) can lead to the following equation that we will use in our programs:
(8)
In a riskneutral work, no investors require compensation for bearing risk. In other words, in such a world, the expected return on any security (investment) is the riskfree rate. Thus, in a riskneutral world, the previous equation becomes the following equation:
(9)
If you want to learn more about the riskneutral probability, refer to Options, Futures and Other Derivatives, 7th edition, John Hull, Pearson, 2009. The Python code to simulate a stock's movement (path) is as follows:
import scipy as sp stock_price_today = 9.15 # stock price at time zero T =1. # maturity date (in years) n_steps=100. # number of steps mu =0.15 # expected annual return sigma = 0.2 # volatility (annualized) sp.random.seed(12345) # seed() n_simulation = 5 # number of simulations dt =T/n_steps S = sp.zeros([n_steps], dtype=float) x = range(0, int(n_steps), 1) for j in range(0, n_simulation): S[0]= stock_price_today for i in x[:1]: e=sp.random.normal() S[i+1]=S[i]+S[i]*(mu0.5*pow(sigma,2))*dt+
sigma*S[i]*sp.sqrt(dt)*e; plot(x, S) figtext(0.2,0.8,'S0='+str(S0)+',mu='+str(mu)+',sigma='+str(sigma)) figtext(0.2,0.76,'T='+str(T)+', steps='+str(int(n_steps))) title('Stock price (number of simulations = %d ' % n_simulation +')') xlabel('Total number of steps ='+str(int(n_steps))) ylabel('stock price') show()
To make our graph more readable, we deliberately choose just five simulations. Since the seed() function is applied, you can replicate the following graph by running the previous code:
Graphical presentation of stock prices at options' maturity dates
Up to now, we have discussed that options are really pathindependent, which means the option prices depend on terminal values. Thus, before pricing such an option, we need to know the terminal stock prices. To extend the previous program, we have the following code to estimate the terminal stock prices for a given set of values: S0 (initial stock price), n_simulation (number of terminal prices), T (maturity date in years), n_steps (number of steps), mu (expected annual stock returns), and sigma (volatility):
from scipy import zeros, sqrt, shape import scipy as sp S0 = 9.15 # stock price at time zero T =1. # years n_steps=100. # number of steps mu =0.15 # expected annual return sigma = 0.2 # volatility (annual) sp.random.seed(12345) # fix those random numbers n_simulation = 1000 # number of simulation dt =T/n_steps S = zeros([n_simulation], dtype=float) x = range(0, int(n_steps), 1) for j in range(0, n_simulation): tt=S0 for i in x[:1]: e=sp.random.normal() tt+=tt*(mu0.5*pow(sigma,2))*dt+sigma*tt*sqrt(dt)*e; S[j]=tt title('Histogram of terminal price') ylabel('Number of frequencies') xlabel('Terminal price') figtext(0.5,0.8,'S0='+str(S0)+',mu='+str(mu)+',sigma='+str(sigma)) figtext(0.5,0.76,'T='+str(T)+', steps='+str(int(n_steps))) figtext(0.5,0.72,'Number of terminal prices='+str(int(n_simulation))) hist(S)
The histogram of our simulated terminal prices is shown as follows:
Finding an efficient portfolio and frontier
In this section, we show you how to use the Monte Carlo simulation to generate returns for a pair of stocks with known means, standard deviations, and correlation between them. By applying the maximize function, we minimize the portfolio risk of this twostock portfolio. Then, we change the correlations between the two stocks to illustrate the impact of correlation on our efficient frontier. The last one is the most complex one since it constructs an efficient frontier based on n stocks.
Finding an efficient frontier based on two stocks
The following program aims at generating an efficient frontier based on two stocks with known means, standard deviations, and correlation. We have just six input values: two means, two standard deviations, the correlation (), and the number of simulations. To generate the correlated y1 and y2 time series, we generate the uncorrelated x1 and x2 series first. Then, we apply the following formulae:
(10A)
(10B)
Another important issue is how to construct an objective function to minimize. Our objective function is the standard deviation of the portfolio in addition to a penalty that is defined as the scaled absolute deviation from our target portfolio mean. In other words, we minimize both the risk of the portfolio and the deviation of our portfolio return from our target return as shown in the following code:
import numpy as np import scipy as sp import pandas as pd from datetime import datetime as dt from scipy.optimize import minimize # Step 1: input area mean_0=(0.15,0.25) # mean returns for 2 stocks std_0= (0.10,0.20) # standard deviations for 2 stocks corr_=0.2 # correlation between 2 stocks n=1000 # number of simulations (returns) for each stock # Step 2: Generate two uncorrelated time series n_stock=len(mean_0) sp.random.seed(12345) # could generate the same random numbers x1=sp.random.normal(loc=mean_0[0],scale=std_0[0],size=n) x2=sp.random.normal(loc=mean_0[1],scale=std_0[1],size=n) if(any(x1)<=1.0 or any(x2)<=1.0): print ('Error: return is <=100%') # Step 3: Generate two correlated time series index_=pd.date_range(start=dt(2001,1,1),periods=n,freq='d') y1=pd.DataFrame(x1,index=index_) y2=pd.DataFrame(corr_*x1+sqrt(1corr_**2)*x2,index=index_) # step 4: generate a return matrix called R R0=pd.merge(y1,y2,left_index=True,right_index=True) R=np.array(R0) # Step 5: define a few functions def objFunction(W, R, target_ret): stock_mean=np.mean(R,axis=0) port_mean=np.dot(W,stock_mean) # portfolio mean cov=np.cov(R.T) # varcovar matrix port_var=np.dot(np.dot(W,cov),W.T) # portfolio variance penalty = 2000*abs(port_meantarget_ret) # penalty 4 deviation return np.sqrt(port_var) + penalty # objective function # Step 6: estimate optimal portfolio for a given return out_mean,out_std,out_weight=[],[],[] stockMean=np.mean(R,axis=0) for r in np.linspace(np.min(stockMean), np.max(stockMean), num=100): W = ones([n_stock])/n_stock # start equal w b_ = [(0,1) for i in range(n_stock)] # bounds c_ = ({'type':'eq', 'fun': lambda W: sum(W)1. })# constraint result=minimize(objFunction,W,(R,r),method='SLSQP',
constraints=c_, bounds=b_) if not result.success: # handle error raise BaseException(result.message) out_mean.append(round(r,4)) # a few decimal places std_=round(np.std(np.sum(R*result.x,axis=1)),6) out_std.append(std_) out_weight.append(result.x) # Step 7: plot the efficient frontier title('Simulation for an Efficient Frontier from given 2 stocks') xlabel('Standard Deviation of the 2stock Portfolio (Risk)') ylabel('Return of the 2stock portfolio') figtext(0.2,0.80,' mean = '+str(stockMean)) figtext(0.2,0.75,' std ='+str(std_0)) figtext(0.2,0.70,' correlation ='+str(corr_)) plot(np.array(std_0),np.array(stockMean),'o',markersize=8) plot(out_std,out_mean,'',linewidth=3)
The corresponding graph is shown as follows:
Summary
In this article, we discussed several types of distributions: normal, standard normal, lognormal, and Poisson. Since the assumption that stocks follow a lognormal distribution and returns follow a normal distribution is the cornerstone for option theory, the Monte Carlo simulation is used to price European options. Under certain scenarios, Asian options might be more effective in terms of hedging. Exotic options are more complex than the vanilla options since the former have no closedform solution, while the latter could be priced by the BlackScholesMerton option model.One way to price these exotic options is to use the Monte Carlo simulation. The Python programs to price an Asian option and lookback options are discussed in detail.
Resources for Article:
Further resources on this subject:
 Web Scraping with Python [Article]
 Python Testing: Installing the Robot Framework [Article]
 Python Multimedia: Fun with Animations using Pyglet [Article]