The stochastic model

We have discussed the deterministic model, where a single outcome is produced from quantitative input values with no randomness. The word stochastic is derived from the Greek word stochastikos, meaning "skillful at guessing" or "pertaining to chance"; its antonyms are "certain", "deterministic", and "sure". A stochastic model predicts a set of possible outcomes weighted by their likelihoods, or probabilities. For instance, a coin flipped into the air will "surely" land on earth eventually, but whether it lands heads or tails is "random".
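
As a minimal sketch, the coin flip is itself a stochastic model; each run of the following code produces a slightly different estimate of the 50 percent chance of heads:

import numpy as np

# 10,000 simulated coin flips: landing is certain, the face is random.
flips = np.random.randint(0, 2, 10000)   # 0 = tails, 1 = heads
print(flips.mean())   # close to 0.5, but different on every run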

Monte Carlo simulation

Monte Carlo simulation, which is also viewed as a probability simulation, is a technique used to understand the impact of risk and uncertainty in any forecasting model. The Monte Carlo method was invented in the 1940s by Stanislaw Ulam while he was working on nuclear weapons projects at the Los Alamos National Laboratory. Today, computers can generate random numbers and run simulations very quickly; it is remarkable that Ulam developed the method at a time when computing power was scarce.

In a forecasting model, or any model that plans ahead for the future, assumptions are made. These may be assumptions about the investment return on a portfolio, or about how long it will take to complete a certain task. As these are projections into the future, the best we can do is estimate the expected value.

What exactly is Monte Carlo simulation?

Monte Carlo simulation is a method to iteratively evaluate a deterministic model with sets of random numbers as inputs. This method is often used when the model is complex, nonlinear, or involves more than just a couple of uncertain parameters. A simulation can typically involve more than 100,000 or even a million evaluations of the model. Let's take a look at the difference between a deterministic model and a stochastic model. A deterministic model has fixed inputs and produces the same, consistent result every time, as shown in the following image:

[Image: a deterministic model with fixed inputs and a single output]

Let's see how a probabilistic model is different from the deterministic model.

Stochastic models have probabilistic inputs drawn from a probability density function, and they produce a result that is also probabilistic. Here is how a stochastic model looks:

[Image: a stochastic model with probabilistic inputs and a probabilistic output]

Now, how do we describe the preceding diagram in words?

First, create a model. Say you have identified three random inputs, x1, x2, and x3, defined a method f(x1, x2, x3), and generated a set of 10,000 random values for the inputs (in some cases, it could be fewer or more). Evaluate the model for each of these 10,000 random inputs and record the outputs as yi, where i runs from 1 to 10,000. Finally, analyze the distribution of the results and pick the outcome that is most likely.
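
A minimal sketch of this recipe, with a hypothetical model f and assumed input distributions (both are placeholders, not tied to any particular problem), looks as follows:

import numpy as np

def f(x1, x2, x3):
    # Hypothetical deterministic model; any function of the inputs works.
    return x1 + 2.0 * x2 - x3 ** 2

n = 10000
x1 = np.random.uniform(0, 1, n)    # the input distributions are assumed here;
x2 = np.random.normal(0, 1, n)     # in practice they come from the problem's
x3 = np.random.uniform(-1, 1, n)   # probability density functions

y = f(x1, x2, x3)                  # y_i for i = 1 to 10,000
print(y.mean(), np.percentile(y, [5, 95]))   # summarize the outcome distribution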

For instance, suppose we want to answer the question, "What is the probability that the Los Angeles Clippers will win the seventh game?" With some random inputs that make reasonable sense in the context of basketball, you can run a Monte Carlo simulation and obtain an answer: there is a 45 percent chance that they will win. (Well, actually, they lost.)

Monte Carlo simulation depends heavily on random number generators, so it makes sense to ask what the fastest and most efficient way to perform it is. Hans Petter Langtangen has shown that Monte Carlo simulation can be made much more efficient by porting the code to Cython at http://hplgit.github.io/teamods/MC_cython/sphinx/main_MC_cython.html, where he also compares it with a pure C implementation.
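
Short of leaving Python, much of that speed can be recovered simply by batching the random draws. The following sketch (timings vary by machine, and the comparison is ours, not Langtangen's benchmark) contrasts drawing numbers one at a time in a loop with a single vectorized NumPy call:

import time
import numpy as np

n = 10 ** 6

start = time.perf_counter()
total = 0.0
for _ in range(n):                   # one RNG call per draw: slow Python loop
    total += np.random.random()
print('loop:       %.3f s' % (time.perf_counter() - start))

start = time.perf_counter()
total = np.random.random(n).sum()    # one vectorized call for all draws
print('vectorized: %.3f s' % (time.perf_counter() - start))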

Let's consider several examples to illustrate Monte Carlo simulation. The first example shows an inventory problem. Later, we will discuss an example in sports (Monte Carlo simulation is applicable to many sports analytics problems).

An inventory problem in Monte Carlo simulation

A fruit retailer sells some fruit and places an order for Y units every day. Each unit sold gives a profit of 60 cents, and units not sold by the end of the day are thrown out at a loss of 40 cents per unit. The demand, D, on any given day is uniformly distributed on [80, 140]. How many units should the retailer order to maximize the expected profit?

Let's denote the profit as P. When we create equations from the preceding problem description, s denotes the number of units ordered (the stock), whereas d denotes the demand, as shown in the following equation:

$$P = \begin{cases} 0.6\,s & \text{if } d \ge s \\ 0.6\,d - 0.4\,(s - d) & \text{if } d < s \end{cases}$$

Using this representation of profit, the following Python program finds the maximum profit via simulation:

import numpy as np
from math import log
import matplotlib.pyplot as plt

x = []
y = []

# Equation that defines profit
def generateProfit(d):
    global s
    if d >= s:
        return 0.6 * s
    else:
        return 0.6 * d - 0.4 * (s - d)

# Although the stated demand is uniform on [80, 140], we run the
# simulation for s in [20, 305) and draw d from [10, 200)

maxprofit = 0

for s in range(20, 305):
    # Run a simulation for n = 1000 trials.
    # Even with n = 10,000 the result would be almost the same.
    for i in range(1000):
        # generate a random value of d
        d = np.random.randint(10, high=200)

        # for this random value of d, find the profit and
        # update maxprofit
        profit = generateProfit(d)
        if profit > maxprofit:
            maxprofit = profit

    # store the value of s to be plotted along the X axis
    x.append(s)

    # store the value of maxprofit to be plotted along the Y axis
    y.append(log(maxprofit))   # plotted on a log scale

plt.plot(x, y)
print("Max Profit:", maxprofit)

This will display the following:

Max Profit: 119.4

The following plot shows that the profit increases with the number of units ordered, but once the demand is met, the maximum profit stays constant:

[Image: maximum profit versus number of units ordered, on a log scale]

The preceding plot uses a log scale on the Y axis; the maximum profit from a simulation run with n=1000 was 119.4. Now, let's look at an analytical solution and see how close the simulation result is to it.

As the demand (D) is uniformly distributed in [80,140], the expected profit is derived from the following integral:

$$E[P(s)] = \frac{1}{60}\int_{80}^{s} \left[\,0.6\,x - 0.4\,(s - x)\,\right]dx + \frac{1}{60}\int_{s}^{140} 0.6\,s\,dx$$

The answer from the analytical method is 116, and the Monte Carlo simulation produces a figure in the same neighborhood: 119. Sometimes it produces 118 or 116, depending on the number of trials.
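
As a cross-check, the expected-profit integral can be maximized symbolically. Here is a small sketch using sympy (the uniform density 1/60 on [80, 140] comes from the problem statement; sympy itself is our addition, not part of the original program):

import sympy as sp

s, x = sp.symbols('s x', positive=True)

# Expected profit: leftover region on [80, s], sold-out region on [s, 140].
EP = (sp.integrate(0.6 * x - 0.4 * (s - x), (x, 80, s))
      + sp.integrate(0.6 * s, (x, s, 140))) / 60

s_opt = sp.solve(sp.diff(EP, s), s)[0]
print(s_opt)   # 116, matching the analytical answer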

Let's consider another simple example and seek an answer to the question: "In a classroom full of 30 students, what is the probability that more than one person will have the same birthday?" We will assume that it is not a leap year and, instead of using the month and day, we will just number the days of the calendar year from 1 to 365. The logic is pretty simple. The following code shows how one can calculate the probability of more than one person having the same birthday in a room of 30 students:

import numpy as np

numstudents = 30
numTrials = 10000
numWithSameBday = 0

for trial in range(numTrials):
    year = [0] * 365

    # assign each student a random birthday
    for i in range(numstudents):
        newBDay = np.random.randint(365)
        year[newBDay] += 1

    # check whether any day received more than one birthday
    haveSameBday = False
    for num in year:
        if num > 1:
            haveSameBday = True

    if haveSameBday:
        numWithSameBday += 1

prob = float(numWithSameBday) / float(numTrials)
print("The probability of a shared birthday in a class of",
      numstudents, "is", prob)

The probability of a shared birthday in a class of 30 is 0.7055
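
As a sanity check, this particular probability also has a closed form: the complement is the chance that all 30 birthdays are distinct, which takes only a few lines to compute:

# Exact complement: the probability that all 30 birthdays are distinct.
prob_distinct = 1.0
for i in range(30):
    prob_distinct *= (365 - i) / 365.0

print(1 - prob_distinct)   # ~0.7063, close to the simulated 0.7055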

In other words, there is roughly a 70 percent chance that at least two people share a birthday in a class of 30 students. The following example illustrates how Monte Carlo simulation can be applied to estimate the likelihood of winning a game, in a situation that occurs more often today than in the past.

Monte Carlo simulation in basketball

Let's consider an example from a basketball game, which is addressed at the Khan Academy using JavaScript. The question (posed by LeBron James) was, "When down by three and left with only 30 seconds, is it better to attempt a hard 3-point shot or an easy 2-point shot and get another possession?"

Most readers will know the game of basketball, but we will highlight only some of the important rules. Each team gets a 24-second possession of the ball; if they score within this time (even sooner), the opposing team gets possession for the next 24 seconds. However, as there are only 30 seconds left, if a player can make a quick 3-point shot, in probably less than 10 seconds, then about 20 seconds remain for the opposing team. Basketball, like any other sport, is very competitive, and since the player's goal is to erase the three-point deficit, it is in their best interest to attempt the 3-point shot. Let's try to write a Python program to answer the question.

Before showing the simulation program, let's take a look at some of the parameters involved in this problem domain. In order to determine whether shooting a 3-point shot is better in terms of the probability of winning, the following statistics of the player and of the opposing team are very important: the player's 3-point and 2-point percentages, labeled threePtPercent and twoPtPercent, characterize his strength, while the opposing team's 2-point scoring percentage, labeled oppTwoPtPercent, and its free throw percentage, labeled oppFtPercent, characterize the opposition.

There are other combinations too, but to keep it simple, we will stop here. The higher the opposing team's free throw percentage, the more our answer is inclined towards attempting the 3-point shot; you can lower the value of oppFtPercent and see the effect. For this example, we will show snippets in bits and pieces, and you can put them together in whichever way you are comfortable with and run them. First, we need the NumPy package, because we intend to use its random number generator, and matplotlib is required for plotting, as shown in the following code:

import numpy as np
import matplotlib.pyplot as plt

In many examples, we will use the standard colors used in Tableau, and you may want to put these in a separate file. The following array of color codes can be used in any visualization plot:

colors = [(31, 119, 180), (174, 199, 232), (255, 127,  14),
   (255, 187, 120), (44, 160, 44), (214,  39,  40), (148,103,189),
   (152, 223, 138), (255,152,150), (197, 176, 213), (140, 86, 75),
   (196, 156, 148), (227,119,194), (247, 182, 210), (127,127,127),
   (199, 199, 199),(188,189, 34),(219, 219, 141), (23, 190,207),
   (158, 218, 229),(217,217,217)]

# Scale RGB values to the [0, 1] range, format matplotlib accepts.
for i in range(len(colors)):
  r, g, b = colors[i]
  colors[i] = (r / 255., g / 255., b / 255.)

Let's take a look at the three-point attempt. If a random number falls below threePtPercent, the shot is made and the game goes to overtime; the overtime is then won with probability overtimePercent. Take a look at the following code:

def attemptThree():
  if np.random.randint(0, high=100) < threePtPercent:
    if np.random.randint(0, high=100) < overtimePercent:
      return True #We won!!
  return False #We either missed the 3 or lost in OT

The logic for the two-point attempt is a little more involved, because it is all about how much time is left and who has possession of the ball. Assuming that, on average, it takes only 5 seconds to attempt a two-point shot, the shot is made with probability twoPtPercent, and a made shot reduces the value of the pointsDown variable by 2. The following function simulates the two-point strategy:

def attemptTwo():
  havePossession = True
  pointsDown = 3
  timeLeft = 30
  while (timeLeft > 0):
    #What to do if we have possession
    if (havePossession):
      #If we are down by 3 or more, we take the
      #2 quickly.  If we are down by 2 or less,
      #we run down the clock first
      if (pointsDown >= 3):
        timeLeft -= timeToShoot2
      else:
        timeLeft = 0

      #Do we make the shot?
      if (np.random.randint(0, high=100) < twoPtPercent):
        pointsDown -= 2
        havePossession = False
      else:
        #We missed; does the opposing team rebound?
        #If so, we lose possession. This doesn't
        #really matter when we run the clock down
        if (np.random.randint(0, high=100) >= offenseReboundPercent):
          havePossession = False
    else:   #cases where we don't have possession
      if (pointsDown > 0):  #foul to get back possession
        #takes time to foul
        timeLeft -= timeToFoul

        #opponent takes 2 free throws
        if (np.random.randint(0, high=100) < oppFtPercent):
          pointsDown += 1

        if (np.random.randint(0, high=100) < oppFtPercent):
          pointsDown += 1
          havePossession = True
        else:
          if (np.random.randint(0, high=100) >= ftReboundPercent):
            #you were able to rebound the missed ft
            havePossession = True
      else:
        #tied or up, so we don't want to foul;
        #assume the opponent runs out the clock and takes a shot
        if (np.random.randint(0, high=100) < oppTwoPtPercent):
          pointsDown += 2 #They made the 2
        timeLeft = 0

  if (pointsDown > 0):
    return False
  else:
    if (pointsDown < 0):
      return True
    else:
      if (np.random.randint(0, high=100) < overtimePercent):
        return True
      else:
        return False

For the sake of comparison, we will choose five players who have a good 3-point average, a good 2-point average, or both, as shown in the following code:

plt.figure(figsize=(14,14))
names = ['LeBron James', 'Kyrie Irving', 'Steph Curry',
         'Kyle Korver', 'Dirk Nowitzki']
threePercents = [35.4, 46.8, 44.3, 49.2, 38.0]
twoPercents   = [53.6, 49.1, 52.8, 47.0, 48.6]
colind = 0

for i in range(5):  # can be run individually as well
  x = []
  y1 = []
  y2 = []
  trials = 400 # Number of trials to run for the simulation
  threePtPercent = threePercents[i] # % chance of making a 3-pt shot
  twoPtPercent = twoPercents[i] # % chance of making a 2-pt shot
  oppTwoPtPercent = 40 # Opponent's % chance of making a 2-pter
  oppFtPercent = 70 # Opponent's FT %
  timeToShoot2 = 5 # How many seconds elapse to shoot a 2
  timeToFoul = 5 # How many seconds elapse to foul the opponent
  offenseReboundPercent = 25 # % chance of an offensive rebound
  ftReboundPercent = 15 # % chance of an offensive rebound after a missed FT
  overtimePercent = 50 # % chance of winning in overtime

  winsTakingThree = 0
  lossTakingThree = 0
  winsTakingTwo = 0
  lossTakingTwo = 0
  curTrial = 1

  while curTrial < trials:
    # run a trial taking the 3
    if (attemptThree()):
      winsTakingThree += 1
    else:
      lossTakingThree += 1

    # run a trial taking a 2
    if (attemptTwo()):
      winsTakingTwo += 1
    else:
      lossTakingTwo += 1

    x.append(curTrial)
    y1.append(winsTakingThree)
    y2.append(winsTakingTwo)
    curTrial += 1

  plt.plot(x, y1, color=colors[colind],
           label=names[i]+" Wins Taking Three Point", linewidth=2)
  plt.plot(x, y2, color=colors[20],
           label=names[i]+" Wins Taking Two Point", linewidth=1.2)
  colind += 2

legend = plt.legend(loc='upper left', shadow=True)
for legobj in legend.legendHandles:
    legobj.set_linewidth(2.6)
plt.show()

This was run for individual players by setting the loop range to a single player and including only that player's name and statistics. In all the cases, as the opponent's free throw percentage was high (70 percent), the Monte Carlo simulation suggested wins by attempting a 3-point shot for every player. Let's take a look at the results when one of them is plotted individually and when all are plotted together.

We have picked players with a reasonably good 3-point percentage from the latest statistics available at http://www.basketball-reference.com/teams/. The statistics are current as of May 12, 2015. In all the cases, attempting to score 3 points has a better chance of winning. If the opposing team has a lower free throw percentage, the result will be different.

The following two plots show the results for an individual player and for the five chosen players from the NBA (2015):

[Image: wins taking three versus wins taking two for an individual player]

The preceding screenshot shows the three-point and two-point attempts by LeBron James. The following plot shows the attempts of the other four players for comparison:

[Image: wins taking three versus wins taking two for the other four players]

The volatility plot

We have seen many useful Python packages so far. Time and again, we have seen the use of matplotlib, but here we will show the use of pandas, with very few lines of code, so that you can achieve financial plots quickly. Standard deviation is a statistical measure of the variation or dispersion around an average; it is also a measure of volatility.

By definition, dispersion is the difference between the actual value and the average value. For plotting volatility along with the closing values, this example illustrates how, from a given start date, you can see how a particular stock (such as IBM) is performing and look at its volatility with the following code:

import pandas.io.data as stockdata
import numpy as np
r,g,b=(31,  119, 180)
colornow=(r/255.,g/255.,b/255.)
ibmquotes = stockdata.DataReader(name='IBM', data_source='yahoo', start='2005-10-1')
ibmquotes['Volatility'] = np.log(ibmquotes['Close']/
    ibmquotes['Close'].shift(1))
ibmquotes[['Close', 'Volatility']].plot(figsize=(12,10), 
    subplots=True, color=colornow) 

The following screenshot is a result of the volatility plot:

[Image: IBM closing price and daily volatility]

Now, let's see how our volatility is measured. Volatility is the measure of variation of price, which can have various peaks. The exponential function lets us plug in time and gives us growth; the logarithm (the inverse of the exponential) lets us plug in growth and gives us time. The following snippet measures volatility via log returns, one row at a time:

%time
ibmquotes['VolatilityTest'] = 0.0
for i in range(1, len(ibmquotes)):
  ibmquotes['VolatilityTest'][i] = \
      np.log(ibmquotes['Close'][i] / ibmquotes['Close'][i-1])

If we time this, the preceding snippet will take the following:

CPU times: user 1e+03 ns, sys: 0 ns, total: 1e+03 ns Wall time: 5.01 µs

To break this down: we use %time again and assign the volatility measure using the ratio of each closing value to the previous day's closing value, this time in one vectorized step, as shown here:

%time
ibmquotes['Volatility'] = np.log(ibmquotes['Close']/ ibmquotes['Close'].shift(1))

If we time this, the preceding snippet will take the following:

CPU times: user 2 µs, sys: 3 µs, total: 5 µs Wall time: 5.01 µs.
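
The Volatility column now holds daily log returns. As a sketch, a single annualized volatility figure follows from their standard deviation, assuming roughly 252 trading days per year:

# Annualize the standard deviation of the daily log returns.
daily_vol = ibmquotes['Volatility'].std()
annual_vol = daily_vol * np.sqrt(252)
print(annual_vol)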

The higher the variance of these returns, the more volatile the stock turns out to be. Before we attempt to plot the implied volatility of volatility against the exercise price, let's take a look at the VSTOXX data. This data can be downloaded from http://www.stoxx.com or http://www.eurexchange.com/advanced-services/. Sample rows of the VSTOXX data are shown here:

            V2TX   V6I1   V6I2   V6I3   V6I4   V6I5   V6I6   V6I7   V6I8 
  Date                                                                     
2015-05-18  21.01  21.01  21.04    NaN  21.12  21.16  21.34  21.75 21.84 
2015-05-19  20.05  20.06  20.15  17.95  20.27  20.53  20.83  21.38 21.50 
2015-05-20  19.57  19.57  19.82  20.05  20.22  20.40  20.63  21.25 21.44 
2015-05-21  19.53  19.49  19.95  20.14  20.39  20.65  20.94  21.38 21.55
2015-05-22  19.63  19.55  20.07  20.31  20.59  20.83  21.09  21.59 21.73

This data file consists of Euro Stoxx volatility indices, which can all be plotted via one simple filter mechanism on dates between December 31, 2011 and May 1, 2015. The following code can be used to plot the VSTOXX data:

import pandas as pd

url = 'http://www.stoxx.com/download/historical_values/h_vstoxx.txt'
vstoxx_index = pd.read_csv(url, index_col=0, header=2,
                           parse_dates=True, dayfirst=True,
                           sep=',')
vstoxx_short = vstoxx_index[('2011/12/31' < vstoxx_index.index)
                            & (vstoxx_index.index < '2015/5/1')]
# to plot all together
vstoxx_short.plot(figsize=(15,14))

When the preceding code is run, it creates a plot comparing the Euro Stoxx volatility indices:

[Image: the Euro Stoxx volatility indices plotted together]

The preceding plot shows all the indices plotted together, but if they are to be shown on separate subplots, you can set subplots to True, as follows:

# to plot in subplots separately
vstoxx_short.plot(subplots=True, grid=True, color='r',
     figsize=(20,20), linewidth=2)

[Image: the Euro Stoxx volatility indices on separate subplots]

Implied volatilities

The Black–Scholes–Merton model is a statistical model of a financial market. From this model, one can find an estimate of the price of European style options. This formula is widely used, and many empirical tests have shown that the Black–Scholes price is "fairly close" to the observed prices. Fischer Black and Myron Scholes first published this model in their 1973 paper, The Pricing of Options and Corporate Liabilities. The main idea behind the model is to hedge the option by buying and selling the underlying asset in just the right way.

In this model, the price of a European call option on a non-dividend-paying stock is as follows:

$$C(S, T) = S\,N(d_1) - X e^{-rT} N(d_2)$$

where

$$d_1 = \frac{\ln(S/X) + (r + \sigma^2/2)\,T}{\sigma\sqrt{T}}, \qquad d_2 = \frac{\ln(S/X) + (r - \sigma^2/2)\,T}{\sigma\sqrt{T}}$$

Here, S is the spot price of the underlying, X is the strike price, r is the risk-free interest rate, T is the time to maturity, sigma is the volatility of log returns, and N is the standard normal cumulative distribution function.

For a given European call option price, C*, the implied volatility is the value of sigma (the standard deviation of log returns) that makes the preceding equation reproduce that price. The partial derivative of the option pricing formula with respect to the volatility is called Vega. This number tells us in what direction and to what extent the option price will move if there is a positive 1 percent change in the volatility, and only in the volatility, as shown in the following equation:

$$\text{Vega} = \frac{\partial C}{\partial \sigma} = S\,N'(d_1)\sqrt{T}$$

where N' is the standard normal probability density function.

A volatility model (such as BSM) forecasts volatility, and the financial uses of such forecasts include characterizing the distribution of future returns. These forecasts are used in risk management and hedging, market timing, portfolio selection, and many other financial activities. An American call or put option gives you the right to exercise at any time, whereas a European call or put option can be exercised only on the expiration date.

There is no closed-form way to invert the Black-Scholes-Merton (BSM) formula for the implied volatility, but with Newton's method (also known as the Newton-Raphson method), one can obtain an approximate value by iteration. Whenever an iterative method is involved, a threshold determines the terminating condition of the iteration.
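
As a minimal sketch of such an iteration, here is a generic Newton root-finder with an explicit tolerance (the program below uses a fixed iteration count instead):

def newton(f, fprime, x0, tol=1e-8, maxit=100):
    # Repeat x <- x - f(x)/f'(x) until the step size drops below tol.
    x = x0
    for _ in range(maxit):
        step = f(x) / fprime(x)
        x -= step
        if abs(step) < tol:
            break
    return x

# Example: the square root of 2 as the positive root of x**2 - 2.
print(newton(lambda x: x * x - 2, lambda x: 2 * x, 1.0))   # ~1.41421356

Now, let's take a look at the Python code that finds the implied volatilities by iteration (Newton's method) and plots them: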

from math import log, sqrt, exp
from scipy import stats
import pandas as pd
import matplotlib.pyplot as plt

colors = [(31, 119, 180), (174, 199, 232), (255, 128, 0),
  (255, 15, 14), (44, 160, 44), (152, 223, 138), (214, 39, 40),
  (255, 152, 150), (148, 103, 189), (197, 176, 213), (140, 86, 75),
  (196, 156, 148), (227, 119, 194), (247, 182, 210), (127, 127, 127),
  (199, 199, 199), (188, 189, 34), (219, 219, 141), (23, 190, 207),
  (158, 218, 229)]

# Scale the RGB values to the [0, 1] range, which is the format matplotlib accepts.
for i in range(len(colors)):
  r, g, b = colors[i]
  colors[i] = (r / 255., g / 255., b / 255.)

def black_scholes_merton(S, r, sigma, X, T):

  S = float(S)  # convert to float
  logsoverx = log(S / X)
  halfsigmasquare = 0.5 * sigma ** 2
  sigmasqrtT = sigma * sqrt(T)

  d1 = (logsoverx + (r + halfsigmasquare) * T) / sigmasqrtT
  d2 = (logsoverx + (r - halfsigmasquare) * T) / sigmasqrtT

  # stats.norm.cdf --> cumulative distribution function
  value = (S * stats.norm.cdf(d1, 0.0, 1.0) -
     X * exp(-r * T) * stats.norm.cdf(d2, 0.0, 1.0))

  return value

def vega(S, r, sigma, X, T):

  S = float(S)
  logsoverx = log(S / X)
  halfsigmasquare = 0.5 * sigma ** 2
  sigmasqrtT = sigma * sqrt(T)
  d1 = (logsoverx + (r + halfsigmasquare) * T) / sigmasqrtT

  # Vega uses the standard normal density (pdf), not the cdf
  vega = S * stats.norm.pdf(d1, 0.0, 1.0) * sqrt(T)

  return vega

def impliedVolatility(S, r, sigma_est, X, T, Cstar, it):

  # Newton's method: sigma <- sigma - (C(sigma) - C*) / Vega(sigma)
  for i in range(it):
    numer = (black_scholes_merton(S, r, sigma_est, X, T) - Cstar)
    denom = vega(S, r, sigma_est, X, T)
    sigma_est -= numer / denom

  return sigma_est
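
As a quick sanity check with hypothetical parameters (not taken from the VSTOXX data), pricing an at-the-money call and then inverting that price should roughly recover the input volatility:

# Hypothetical inputs: S=100, r=5%, sigma=20%, X=100, T=1 year.
c = black_scholes_merton(100, 0.05, 0.2, 100, 1.0)
print(c)   # roughly 10.45

# Feeding the price back in should recover sigma close to 0.2.
print(impliedVolatility(100, 0.05, sigma_est=0.5, X=100, T=1.0,
                        Cstar=c, it=100))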

We now have these functions ready to use; they can either be kept in a separate file and imported, or embedded in the same code and run once. The input data comes from stoxx.com in a file called vstoxx_data_31032014.h5, as shown in the following code:

h5 = pd.HDFStore('myData/vstoxx_data_31032014.h5', 'r')

futures_data = h5['futures_data'] # VSTOXX futures data
options_data = h5['options_data'] # VSTOXX call option data

h5.close()

options_data['IMP_VOL'] = 0.0
V0 = 17.6639  # the closing value of the index
r=0.04        # risk free interest rate
sigma_est=2
tol = 0.5     # tolerance level for moneyness

Now, let's run the iteration over the options_data and futures_data values:

for option in options_data.index:
  # iterating over all option quotes
  futureval = futures_data[futures_data['MATURITY'] == 
      options_data.loc[option]['MATURITY']]['PRICE'].values[0]

  # picking the right futures value 
  if (futureval * (1 - tol) < options_data.loc[option]['STRIKE'] 
    < futureval * (1 + tol)):
    impliedVol = impliedVolatility(V0,r,sigma_est,
            options_data.loc[option]['STRIKE'], 
            options_data.loc[option]['TTM'], 
            options_data.loc[option]['PRICE'],  #Cn
            it=100)                             #iterations 
    options_data.loc[option, 'IMP_VOL'] = impliedVol  # avoid chained assignment
  
plot_data = options_data[options_data['IMP_VOL'] > 0]
maturities = sorted(set(options_data['MATURITY']))

plt.figure(figsize=(15, 10))

i=0
for maturity in maturities:

  data = plot_data[plot_data.MATURITY == maturity]

  # select data for this maturity
  plot_args = {'lw':3, 'markersize': 9}
  plt.plot(data['STRIKE'], data['IMP_VOL'], label=maturity.date(),
       marker='o', color=colors[i], **plot_args)
  i += 1

plt.grid(True)

plt.xlabel('Strike rate $X$', fontsize=18)
plt.ylabel(r'Implied volatility $\sigma$', fontsize=18)
plt.title('Short Maturity Window (Volatility Smile)', fontsize=22)

plt.legend()
plt.show()

The following plot, produced by running the preceding program, shows implied volatility against the strike price of Euro Stoxx options, with data downloaded from http://vstoxx.com. Alternatively, the data file can be downloaded from http://knapdata.com/python/vstoxx_data_31032014.h5.

[Image: implied volatility against strike price (the volatility smile)]

The portfolio valuation

Portfolio valuation, in the common-sense meaning, is the estimation of an entity's current worth. Valuations usually apply to a financial asset or liability and can be performed on stocks, options, business enterprises, or intangible assets. For the purpose of understanding valuation and its visualization methods, we will pick mutual funds, plot them, compare them, and find their correlation.

Let's assume that we value all the portfolios denominated in a single currency. This simplifies the aggregation of values in a portfolio significantly.
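
As a minimal sketch of what this buys us, with everything quoted in one currency a portfolio's value is just a sum of position values (the holdings and prices here are hypothetical):

# Hypothetical share counts and latest prices, all quoted in one currency.
holdings = {'vus.to': 120, 'vre.to': 200, 'vee.to': 150}
prices = {'vus.to': 34.20, 'vre.to': 30.10, 'vee.to': 28.50}

portfolio_value = sum(holdings[f] * prices[f] for f in holdings)
print(portfolio_value)   # a single number; no currency conversion needed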

We will pick three Vanguard funds: Vanguard US Total (vus.to), Vanguard Canadian Capped (vre.to), and Vanguard Emerging Markets (vee.to). The following code shows the comparison of the three Vanguard funds:

import pandas as pd   #gets numpy as pd.np

from pandas.io.data import get_data_yahoo
import matplotlib.pyplot as plt

# get data
data = get_data_yahoo(["vus.to","vre.to","vee.to"], 
  start = '2014-01-01')['Adj Close']

data.plot(figsize=(10,10), lw=2) 
plt.show()

Besides DataReader(), which we used earlier, get_data_yahoo() from pandas.io.data is another way to obtain the data; the resulting plot is shown in the following screenshot:

[Image: adjusted closing prices of the three Vanguard funds]

Besides plotting the prices, one may also compute the correlation matrix after converting prices to log returns in order to scale the values, as shown in the following code:

#convert prices to log returns
retn=data.apply(pd.np.log).diff()

# make corr matrix
retn.corr()

#make scatterplot to show correlation
pd.tools.plotting.scatter_matrix(retn, figsize=(10,10))
plt.show()

# some more stats
retn.skew()
retn.kurt()

# Output of retn.kurt()
vee.to    0.533157
vre.to    3.717143
vus.to    0.906644
dtype: float64

The correlation plot is shown in the following image. It was obtained using the scatter_matrix function from pandas on the log returns; skew() and kurt() report the skewness and kurtosis of each fund's returns:

[Image: scatter matrix of the three funds' log returns]

The simulation model

A model is a representation of the construction and working of a system; it is similar to, but simpler than, the system it represents. A simulation of a system is a working model of the system, and the model is usually reconfigurable to allow frequent experimentation. Operating the model lets us study the system it stands for. Simulation is also useful before a real system is built, to reduce the possibility of failing to meet specifications.

When is a particular system suitable for the simulation model? In general, whenever there is a need to model and analyze randomness in a system, simulation is the tool of choice.

Geometric Brownian simulation

Brownian motion is an example of a random walk, which is widely used to model physical processes (such as diffusion), biological processes, and social and financial processes (such as the dynamics of a stock market).

Brownian motion is a sophisticated technique. It is named after the process observed by the botanist Robert Brown in 1827 in pollen grains suspended in water. It has a range of applications, including modeling noise in images, generating fractals, the growth of crystals, and stock market simulation. For the purposes of this chapter, we will pick the latter, that is, stock market simulation.

M. F. M. Osborne studied the logarithms of common stock prices and showed that they form an ensemble in statistical equilibrium. Using statistics and the prices of the same stock at random times, he was able to derive a distribution function that closely resembles the distribution for a particle in Brownian motion.

Definition of geometric Brownian motion:

A stochastic process (St) is said to follow a geometric Brownian motion if it satisfies the following stochastic differential equation:

$$dS_t = \mu S_t\,dt + \sigma S_t\,dW_t$$

Here, $W_t$ is a Wiener process (Brownian motion), $\mu$ is the drift, and $\sigma$ is the volatility.

Integrating both sides and applying the initial condition that S equals S0 at t = 0, the solution to the preceding equation can be arrived at as follows:

$$S_t = S_0 \exp\left(\left(\mu - \frac{\sigma^2}{2}\right)t + \sigma W_t\right)$$

Using the preceding derivation, we can plug in the values to obtain the following Brownian motion:

import matplotlib.pyplot as plt
import numpy as np

'''
Geometric Brownian motion with drift!
mu: drift factor
sigma: volatility
T: time span
dt: length of steps
S0: stock price at t=0
W: Brownian motion (built from standard normal increments)
'''
fig = plt.figure(figsize=(10,10))

T = 2
mu = 0.1
sigma = 0.04
S0 = 20
dt = 0.01
N = int(round(T / dt))
t = np.linspace(0, T, N)

# Standard normal increments, accumulated and scaled by sqrt(dt)
W = np.random.standard_normal(size=N)
W = np.cumsum(W) * np.sqrt(dt)

X = (mu - 0.5 * sigma ** 2) * t + sigma * W

# Geometric Brownian motion
S = S0 * np.exp(X)

plt.plot(t, S, lw=2)
plt.xlabel("Time t", fontsize=16)
plt.ylabel("S", fontsize=16)
plt.title("Geometric Brownian Motion (Simulation)",
  fontsize=18)
plt.show()

The result of the Brownian motion simulation is shown in the following screenshot:

[Image: a simulated geometric Brownian motion path]

Simulating stock prices using Brownian motion is also shown in the following code:

import pylab, random

class Stock(object):
    def __init__(self, price, distribution):
        self.price = price
        self.history = [price]
        self.distribution = distribution
        self.lastChange = 0

    def setPrice(self, price):
        self.price = price
        self.history.append(price)

    def getPrice(self):
        return self.price

    def walkIt(self, marketBias, mo):
        oldPrice = self.price
        baseMove = self.distribution() + marketBias
        self.price = self.price * (1.0 + baseMove)
        if mo:
            self.price = self.price + random.gauss(.5, .5)*self.lastChange
        if self.price < 0.01:
            self.price = 0.0
        self.history.append(self.price)
        self.lastChange = oldPrice - self.price

    def plotIt(self, figNum):
        pylab.figure(figNum)
        pylab.plot(self.history)
        pylab.title('Closing Price Simulation Run-' + str(figNum))
        pylab.xlabel('Day')
        pylab.ylabel('Price')


def testStockSimulation():
    def runSimulation(stocks, fig, mo):
        mean = 0.0
        for s in stocks:
            for d in range(numDays):
                s.walkIt(bias, mo)
            s.plotIt(fig)
            mean += s.getPrice()
        mean = mean/float(numStocks)
        pylab.axhline(mean)
    pylab.figure(figsize=(12,12))
    numStocks = 20
    numDays = 400
    stocks = []
    bias = 0.0
    mo = False
    startvalues = [100,500,200,300,100,100,100,200,200, 300,300,400,500,00,300,100,100,100,200,200,300]
    for i in range(numStocks):
        volatility = random.uniform(0, 0.2)
        # bind the current volatility; a bare lambda would capture the
        # loop variable itself, so every stock would share the last value
        d1 = lambda v=volatility: random.uniform(-v, v)
        stocks.append(Stock(startvalues[i], d1))
    runSimulation(stocks, 1, mo)

testStockSimulation()
pylab.show()

The results of the closing price simulation using random data from the uniform distribution are shown in the following screenshot:

[Image: closing price simulation runs for 20 stocks]

The diffusion-based simulation

Stochastic models provide a more detailed understanding of reaction-diffusion processes. Such a description is often necessary for modeling biological systems. A variety of simulation models have been studied, and to stay within the context of this chapter, we will consider the square-root diffusion.

The square-root diffusion, popularized for finance by Cox, Ingersoll, and Ross (1985), is used to model mean-reverting quantities (such as interest rates and volatility). The stochastic differential equation of this process is as follows:

$$dx_t = k(\theta - x_t)\,dt + \sigma\sqrt{x_t}\,dW_t$$

Here, $\theta$ is the long-run mean of the process, $k$ is the speed of mean reversion, $\sigma$ is the volatility, and $W_t$ is a Brownian motion.

The values of x_t follow a (noncentral) chi-square distribution, but in the discrete version they can be approximated by a normal distribution. By "discrete version", we mean applying Euler's numerical method of approximation using an iterative approach, as shown in the following equation:

$$\tilde{x}_t = \tilde{x}_{t-\Delta t} + k\left(\theta - \tilde{x}_{t-\Delta t}^{+}\right)\Delta t + \sigma\sqrt{\tilde{x}_{t-\Delta t}^{+}}\sqrt{\Delta t}\,z_t, \qquad x_t = \tilde{x}_t^{+} = \max(\tilde{x}_t, 0)$$

Here, $z_t$ is a standard normal random variable. The following code implements this scheme:
import numpy as np
import matplotlib.pyplot as plt
import numpy.random as npr

S0 = 100      # initial value (kept from the option examples; not used below)
r = 0.05      # risk-free rate (not used below)
sigma = 0.25  # volatility
T = 2.0       # time horizon

x0 = 0        # initial level of the process
k = 1.8       # speed of mean reversion
theta = 0.24  # long-run mean
i = 100000    # number of simulated paths
M = 50        # number of time steps
dt = T / M

def srd_euler():
  xh = np.zeros((M + 1, i))
  x1 = np.zeros_like(xh)
  xh[0] = x0
  x1[0] = x0
  for t in range(1, M + 1):
    xh[t] = (xh[t - 1]
      + k * (theta - np.maximum(xh[t - 1], 0)) * dt
      + sigma * np.sqrt(np.maximum(xh[t - 1], 0)) * np.sqrt(dt)
      * npr.standard_normal(i))
  x1 = np.maximum(xh, 0)
  return x1

x1 = srd_euler()

plt.figure(figsize=(10,6))
plt.hist(x1[-1], bins=30, color='#98DE2f', alpha=0.85)
plt.xlabel('value')
plt.ylabel('frequency')
plt.grid(False)

plt.figure(figsize=(12,10))
plt.plot(x1[:, :10], lw=2.2)
plt.title("Square-Root Diffusion - Simulation")
plt.xlabel('Time', fontsize=16)
plt.ylabel('Index Level', fontsize=16)
plt.show()

The results of the square-root diffusion simulation, a histogram of the values at maturity and the first ten simulated paths, are shown in the following screenshots:

[Image: histogram of terminal values and sample paths of the square-root diffusion]