Inconsistent data formatting is not an issue with the data itself, but with how different sets of data are presented. The most common form of discrepancies comes in ordering data and in data frequency. Historical pricing data is typically recorded daily; however, some sources of data display the most recent trade date on top, while other sources list the information in an ascending-date format. Other data fields may be produced or provided on a weekly, monthly, or yearly basis. When aggregating source data to set up a model, typically a number of different sources of data are necessary, and organizing the data, even when all the data sets are of high quality, can be difficult. Most formatting issues can be handled by using sorting functions in Excel and by recording macros to implement them automatically. In the Model Builders for this chapter, we will cover techniques to deal with some common data-quality issues.
Many companies increase the number of equity shares they have outstanding through either secondary issuances, stock dividends, or share splits. All of these will influence the value of the shares and need to be handled differently depending on the type of model that is being implemented. Splits are the most simple. In a stock split, shares are exchanged in a defined ratio that does not change the economic position of the shareholder. So, for example, a holder of 10 shares of a $5 stock, after a 2-for-1 stock split, holds 20 shares of a $2.50 stock. The split needs to be accounted for by dividing all price history for the stock previous to the effective date of the split, also known as the 'ex-dividend date', by 2. Typically, most available stock price data sets have already made this change.
Dividends, whether in stock or in cash, reduce the value of a stock after the stock trades ex-dividend. Note that the ex-dividend date is not the date that the dividend 'settles' or is received by the stockholder, it is the date that the stock begins to trade without the dividend included, typically a few days earlier. The component of the price move that is accounted for in the dividend may be included in the analysis or excluded, depending on the goal of the analysis. For a risk analysis, for example, the reduction in equity price due to dividends should be included. A dividend payment will increase the risk for a bondholder because money that could otherwise go to pay debt service has “leaked out” to equity. As a result, there are fewer assets available to make the required payments, and the risk increases accordingly. However, for a price movement analysis such as value at risk, or VaR, price movements due to dividend payments should not be included. A holder of a stock that pays dividends will see a decrease in the nominal value of the stock due to dividends, but the holder will be entitled to this cash outflow, so it is inappropriate to assume that such a payment will increase the holder's loss on the position.
Stock dividends are when corporations grant additional shares of stock to shareholders. These should usually be treated as splits, because effectively additional stock is being issued without any additional economic consideration being received by the holder of the security. However, in companies with complex capital structures and the possibility of meaningful dilution, a stock dividend should be treated as cash dividends with the value of the stock on the day of the dividend (since the new stock could be sold that day).
While accounting for splits and dividends appears to be straightforward, a number of complexities need to be taken into account, especially if historical percentage changes in price are the desired output. The most accurate way to do this is to start with a single share of the stock at the beginning of the time period, and add dividends and extra stocks received (or fractional stocks lost in a reverse split). While this gives the most accurate historical view of the day-to-day returns on a position, it is difficult to implement in a VaR scenario because at a certain point the previously paid dividend stops being part of the basis on which we calculate percentage change.
This is a difficult concept, so we will walk through an example. Let us suppose we held 1 share of stock XYZ that was marked on March 31, 2009, at $4/share. Assume XYZ had previously announced that it would pay a $1/share dividend with an ex-dividend date of April 1, 2009 (the next day). The stock then closed trading on April 1 at a value of $3.12.
If we naively calculated the change in asset value as the difference in the closing prices, we would calculate that the change in price was equal to ($3.12 – $4.00)/($4.00) = –22 percent, a sizable loss! Of course, this isn't an accurate picture of the change in our investment's value, since we have received a $1 dividend (we will not discount the future dividend payment for simplicity). Including the dividend, the value of our investment will be $3.12 + $1.00 = $4.12, which translates to a gain of ($4.12 – $4.00)/($4.00) = 3 percent.
However, imagine that one month later, on April 30, the stock was marked at $2.50 and closes at $2.25. This is a drop of ($2.25 – $2.50)/($2.50) = –10 percent. However, if we are considering the long-term value of the investment and include previous dividend payments by adding $1.00 to the price before and after (since a long-term holder of the investment on April 30 has one stock worth $2.50 and $1.00 of paid dividends), we would say the change in the value of our investment in XYZ, “all-in,” was from $3.50 to $3.25, or about –7.14 percent. Additionally, marking the bonds at $3.25 will cause an inaccurate weighting when we calculate the value at risk in a portfolio, as we will overweight the value of high-dividend stocks compared to low-dividend stocks.
A simpler estimation is to start with the current stock price and to adjust the previous stock value for any splits or dividend payments. We will do this for a VaR implementation in Model Builder 7.1.
Sub correctPrices() Dim stockPrices(), dividendsSplits(), cumDividendsSplits() As Double Dim i, j, numDays, numDiv As Integer 'collect raw stock info Range(“a9780470931226b4”).Select Range(Selection, Selection.End(xlDown)).Select numDays = Selection.Rows.Count ReDim stockPrices(1 To numDays, 2) For i = 1 To numDays stockPrices(i, 1) = Range(“a9780470931226b3”).Offset(i, 0) stockPrices(i, 2) = Range(“a9780470931226b3”).Offset(i, 1) Next i 'collect raw dividend and split info Range(“F4”).Select Range(Selection, Selection.End(xlDown)).Select numDiv = Selection.Rows.Count ReDim dividendsSplits(1 To numDiv, 3) For i = 1 To numDiv dividendsSplits(i, 1) = Range(“F3”).Offset(i, 0) dividendsSplits(i, 2) = Range(“F3”).Offset(i, 1) dividendsSplits(i, 3) = Range(“F3”).Offset(i, 2) / Range(“F3”) .Offset(i, 3) Next i
We are creating two arrays, stockPrices() and dividendsSplits(), which will contain all of the information from the sheet. We use the “Range(Selection, Selection.End(xlDown)).Select” method to select the full set of data, no matter how many days of price data or dividend payments/splits there are.
ReDim cumDividendsSplits(0 To numDays, 2) j = 1 cumDividendsSplits(0, 1) = 0 cumDividendsSplits(0, 2) = 1 For i = 1 To numDays If stockPrices(i, 1) = dividendsSplits(j, 1) Then cumDividendsSplits(i, 2) = cumDividendsSplits(i-1, 2)
* dividendsSplits(j, 3) cumDividendsSplits(i, 1) = cumDividendsSplits(i-1, 1) + dividendsSplits(j, 2) * cumDividendsSplits(i, 2) j = j + 1 Else cumDividendsSplits(i, 2) = cumDividendsSplits(i-1, 2) cumDividendsSplits(i, 1) = cumDividendsSplits(i-1, 1) End If Next i
In this segment of code, we are tracking dividends in column 1 of the array, and splits in column 2. We expand the array to “0 to numDays” (most of our arrays start at 1, current day, and track back as we move down the array) because we want to be able to account for splits or dividends that may have happened today. We then go backwards through the days, and when a date matches the date of a split or dividend (which is tracked in dividendsSplits() column 1, we enter an adjustment. For a split, we take the previous factor and multiply by the inverse of the split. A 3:1 split reduces the factor of the stock before the split to one-third. This is because if we hold one stock today, it is the equivalent to holding one-third of a share before the split.
For the dividends, we sum together the paid amounts on the ex-dividends dates, but we again must adjust for the splits. We multiply the amount of the dividend by the cumulative split factor.
'adjust prices for splits and dividends For i = 1 To numDays stockPrices(i, 2) = (stockPrices(i, 2) * cumDividendsSplits (i, 2))-cumDividendsSplits(i, 1) Next i 'print Range(“K4”).Select Selection.Resize(numDays, 3).Select Selection = stockPrices End Sub
Additional data issues come from illiquid securities. Certain assets, typically securities issued by smaller or closely held entities, do not trade frequently. This can cause analysis problems, in particular when a full set of data is needed to estimate the volatility or the historical VaR of a security.
With illiquid products, one key qualitative decision that an analyst must make is whether the information provided by the market is enough to result in meaningful analysis. The assumption that markets give us useful data about the risk of a security is predicated on the existence of a wide base of investors who are informed about the security and are driving the security's price toward a fair value with some level of efficiency. If this is not the case and stock prices are dominated by the relationships between a few stakeholders, then pricing information is not useful in the determination of risk, and the analyst should treat the security as if there were no pricing data available.
If the determination is made that there is enough data to begin an analysis, there is a question as to how summary data can be obtained from the incomplete data sets. A security that trades only every other day may appear to have a lower volatility than a liquid security, since days with extreme price movements may not have included a trade that is listed in the data set. Additionally, since transaction costs are typically larger for illiquid securities, the net capital received in the sale of an illiquid security may be meaningfully less than the gross selling price.
The more conservative way to create matching datasets is 'Pairwise Deletion'. In this method, all data that is available for only one security is deleted; only the dates when there is data available for both securities do we use price information. This allows the analyst to compare volatilities on an apples-to-apples basis and to calculate correlations between assets, but it has the downside of eliminating much useful data. See Figure 7.2
In B4 enter the most recent date for the liquid security, and in C4 enter the price for that day. In our proxy data, the most recent date is 12/8/2010, and the most recent price for the liquid security is 5.18. Enter the dates and prices in descending order below this. In F4 enter the most recent valuation date for the illiquid security, and in G4 enter the price. In our example, the most recent date is 12/8/2010 and the most recent price is 22.90. Again, enter dates and prices for this security in descending order. Since the illiquid security does not trade every day, there are going to be gaps between dates.
Sub estCorrelation() Dim fullPrices(), illiquidPrices(), fullReturns(), illiquidReturns() As Double Dim i, j, numfullpricedays, numIlliquidDays As Integer 'collect full price info Range(“a9780470931226b4”).Select Range(Selection, Selection.End(xlDown)).Select numfullpricedays = Selection.Rows.Count ReDim fullPrices(1 To numfullpricedays, 2) For i = 1 To numfullpricedays fullPrices(i, 1) = Range(“a9780470931226b3”).Offset(i, 0) fullPrices(i, 2) = Range(“a9780470931226b3”).Offset(i, 1) Next i 'collect illiquid price info Range(“F4”).Select Range(Selection, Selection.End(xlDown)).Select numIlliquidDays = Selection.Rows.Count ReDim illiquidPrices(1 To numIlliquidDays, 5) ReDim fullReturns(1 To numIlliquidDays-1, 0) ReDim illiquidReturns(1 To numIlliquidDays-1, 0) For i = 1 To numIlliquidDays illiquidPrices(i, 1) = Range(“F3”).Offset(i, 0) illiquidPrices(i, 2) = Range(“F3”).Offset(i, 1) Next i
'Only take days where there are prices for both securities j = 1 For i = 1 To numfullpricedays If fullPrices(i, 1) = illiquidPrices(j, 1) Then illiquidPrices(j, 3) = fullPrices(i, 2) If j <> nullliquidDays then j = j + 1 End If End If
'Calculate changes in price For i = 1 To numIlliquidDays-1 illiquidReturns(i, 0) = (illiquidPrices(i, 2)-illiquidPrices (i + 1, 2)) / illiquidPrices(i + 1, 2) fullReturns(i, 0) = (illiquidPrices(i, 3)-illiquidPrices(i + 1, 3)) / illiquidPrices(i + 1, 3) Next i
'print Range(“J11”).Value = Application.WorksheetFunction.Correl(illiquidReturns, fullReturns) Range(“J12”).Value = Application.WorksheetFunction.StDev(fullReturns) Range(“J13”).Value = Application.WorksheetFunction.StDev(illiquidReturns) Range(“L4”).Select Selection.Resize(numIlliquidDays, 4).Select Selection = illiquidPrices
While the previously mentioned process is useful, we are eliminating a lot of very useful data, which is why many modelers use more aggressive data techniques. Imputation is the practice of inserting additional data to complete a data set. It is used in the social sciences via a number of different methods, and not without controversy! When there is relative confidence that the data available is complete (a data completion standard might be that there is price data for 75 percent of the dates in question, or a minimum of 500 holders of the security, or some other measure of liquidity or information), then we will impute data to allow us to use our models on securities with incomplete datasets.
In finance, there are two general methods by which data is imputed: using an expectation-maximization algorithm, and using a partial proxy. We will cover a partial proxy methodology here. This is a practitioner's attempt to make do with an incomplete dataset without running through a processing intensive procedure. We will build off of the estCorrelation() subroutine that we were working with previously.
Sub imputeValues() Dim allPrices(), allReturns(), trueVol(), estLiqVol, trueLiqVol, estIlliqVol, registry(), cholesky1, cholesky2 As Double Dim i, j, numfullpricedays, numIlliquidDays As Integer 'collect full price info Range(“a9780470931226b4”).Select Range(Selection, Selection.End(xlDown)).Select numfullpricedays = Selection.Rows.Count ReDim allPrices(1 To numfullpricedays, 3) ReDim registry(1 To numfullpricedays, 3) ReDim allReturns(1 To numfullpricedays-1, 3) ReDim trueVol(1 To numfullpricedays-1, 1) For i = 1 To numfullpricedays allPrices(i, 1) = Range(“a9780470931226b3”).Offset(i, 0) allPrices(i, 2) = Range(“a9780470931226b3”).Offset(i, 1) Next i 'collect illiquid price info Range(“F4”).Select Range(Selection, Selection.End(xlDown)).Select numIlliquidDays = Selection.Rows.Count j = 1 For i = 1 To numfullpricedays If Range(“F3”).Offset(j, 0) = allPrices(i, 1) Then allPrices(i, 3) = Range(“F3”).Offset(j, 1) registry(i, 1) = i j = j + 1 Else
allPrices(i, 3) = 0 registry(i, 1) = registry(i-1, 1) End If Next i
One important array that is new that we need to track is the registry() array. In this macro, when we are collecting price information in the allPrices() array, we are not deleting liquid security pricing. Instead, we are taking all of the liquid security prices and matching them up with illiquid security prices. Where we do not have illiquid security prices, we are leaving a blank space (specifically, the lines “Else allPrices(I,3) = 0”). When we do not have a price for the illiquid security, we want to track what period the next price will be in. We are tracking this in the registry() array in column 1 (remember, since we are going back in time as we go down the array, period i – 1 is the period after period i.
'Create Returns for Illiquid Security 'Create registry of 'previous price points' For i = 2 To numfullpricedays If registry(numfullpricedays + 1-i, 1) = numfullpricedays + 1-i Then registry(numfullpricedays + 1-i, 2) = registry(numfullpricedays + 1-i, 1) Else registry(numfullpricedays + 1-i, 2) = registry(numfullpricedays + 2-i, 2) End If Next i
'create mean prices for missing dates For i = 1 To numfullpricedays If registry(i, 1) = i Then registry(i, 3) = allPrices(i, 3) Else If registry(i, 2) = 0 Then registry(i, 3) = registry(i-1, 3) Else registry(i, 3) = allPrices(registry(i, 1), 3) + ((i-registry(i, 1)) / (registry(i, 2)-registry (i, 1)) * (allPrices(registry(i, 2), 3)-allPrices (registry(i, 1), 3))) End If End If Next i
The key line in this section of code is repeated here for further analysis:
registry(i, 3) = allPrices(registry(i, 1), 3) + ((i-registry(i, 1)) / (registry(i, 2)-registry(i, 1)) * (allPrices(registry(i, 2), 3)- allPrices(registry(i, 1), 3)))
What we are doing, for dates without a price, is creating a price based on the previous and next dates for which we have illiquid security prices. “Registry(i, 1)” is the next date when a price is available, and the value of that price is obtained by searching for that period in the allPrices() array via the command “allPrices(registry(i, 1), 3)”. Then we will fractionally increase this price so it is linearly moving toward the price at the previous value we have in our dataset, a process known as linear interpolation. So if we have a price in period 6, and the next price is in period 2, we determine the prices for periods 3, 4, and 5 through equations 7.1, 7.2, and 7.3, respectively, which are represented in this code.
In order to use this formula, we must find the volatility of the illiquid security. As discussed previously, analysts usually do this in two ways: by taking the historical volatility of the stock, or by using option prices to estimate future volatility. Unfortunately, we do not have daily historical volatility (if the stock traded daily, we would have no need for this exercise in the first place), and it is likely that there will not be a liquid options market for an illiquid stock. Instead, we will use the limited historical volatility that we determined using the dataset that incorporated pairwise deletion, and then we will take the relationship between the liquid stock's estimated volatility and its true volatility, and use that to calculate a daily volatility for the illiquid stock.
As we do this, we will also create and decompose the correlation matrix for the two securities. Since there are only two securities, we will only have to create a 2 × 2 matrix in I22:J23. In I22 and J23, we enter the number 1, in the cells not on the diagonal (J22 and I23) we enter “=J11” to take the previously estimated correlation. Below this we will create the Cholesky decomposition of this matrix by selecting I26:J27, entering “=Cholesky(I22:J23)” and holding Control and Shift and pressing Enter.
The resulting decomposed matrix in I26:J27 is what we would use to forecast the movements of correlated securities, as we did in Chapter 6. However, here we have a slightly different goal—we want to take the already observed movement of the liquid stock and use that to generate the missing prices for the illiquid security. To do this, we will take the second row of this matrix and make I27 the liquid security's factor. We will assign the variable name “cholesky1”. J27 will be the factor assigned to the illiquid security's uncorrelated draw, which we will call “cholesky2”.
'Collect Vol and correlation info and Estimate Volatility For i = 1 To numfullpricedays-1 allReturns(i, 1) = Range(“a9780470931226b3”).Offset(i, 0) allReturns(i, 2) = (allPrices(i, 2)-allPrices(i + 1, 2)) / allPrices(i + 1, 2) trueVol(i, 1) = (allPrices(i, 2)-allPrices(i + 1, 2)) / allPrices(i + 1, 2) Next i cholesky1 = Range(“I27”) cholesky2 = Range(“J27”) trueLiqVol = Application.WorksheetFunction.StDev(trueVol) estLiqVol = Range(“J12”).Value estIlliqVol = (Range(“J13”).Value * trueLiqVol) / estLiqVol
'add in error For i = 1 To numfullpricedays-1 If registry(i, 1) = i Then allPrices(i, 3) = registry(i, 3) Else allPrices(i, 3) = registry(i, 3) * (1 + estIlliqVol * (cholesky1 * Application.WorksheetFunction.NormDist (allReturns(i, 2), 0, trueLiqVol, True) + cholesky2 * Application.WorksheetFunction.NormSInv(Rnd()))) End If Next i
When we do not have a price, we will start with the mean price and then add on a correlated draw. However, we will not actually create a draw for the liquid security, because we already have a real return for each period. Instead, we use the volatility and the actual return, and we use that to figure out what draw would have resulted in that return. To do this we use the NormDist worksheet function, which is the inverse of the NormSInv function we used heavily in Chapter 6 for our Monte Carlo simulations.
'print Range(“J18”).Value = trueLiqVol Range(“J19”).Value = estIlliqVol Range(“P4”).Select Selection.Resize(numfullpricedays, 4).Select Selection = allPrices End Sub