16

Forecasting Portfolio Performance

Chapter Query

Softeck has offered an investment consultancy firm a software package. The company representative assures the investment firm of consistent returns, since the package uses non-linear tools to predict stock returns. The software can additionally incorporate intra-day prices into its database and provides possible buy and sell points for an expected return supplied as an input. The investment firm has been offered a two-week trial period, and the following portfolios have been suggested by the package. A hypothetical buy/sell scenario was built to examine the feasibility of the package in actual practice.

The following observations have been made during the first two weeks:

Should the consultancy firm buy the software package at Rs 3,00,000 with a yearly update fee of Rs 50,000?

Chapter Goal

The chapter introduces non-linear tools available to investors. Theories such as chaos theory, fuzzy logic, heuristics, artificial neural networks, and genetic algorithms are introduced, and the applicability and adoption of these models for stock market data are discussed. Though actual application in real markets is still scarce, these non-linear tools provide ample scope for the investor to understand and time the market accurately.

The investor can gain an insight into the functioning of these theories so that sophisticated software tools can be understood and interpreted easily.

Market theories such as the Markowitz Model, the CAPM, and the APT are based on the premise that the stock market is efficient in one form or another. Recent developments in capital market research have cast doubt on the measurement of beta and raised questions about whether beta actually measures the riskiness of a share. A second group of critics believes that non-linear dynamics, rather than linear models, will be better able to explain share price behaviour, and chaos theory is seen as an emerging expression of this view. Finally, there are also arguments questioning the rationality of investor behaviour: investors are not always rational, and irrational behaviour may also explain their actions.

This has led to increasing interest in non-linear dynamics, especially deterministic chaotic dynamics, because the frequency of large moves in stock markets is greater than would be expected under a normal distribution. The attraction of chaotic dynamics is its ability to generate large movements that appear to be random. Chaos theory can therefore explain the apparently random fluctuations in the economy and in financial markets, and several research papers have tried to establish the existence of chaotic and non-linear behaviour in share prices. The simplest chaotic process is the tent map, in which points move from 0 up to 1 and back to 0, as illustrated in Figure 16.1. It is a simplistic representation of a share price moving from a low to a high and back to the low level. Extensions of this basic process, according to the chaotic model, explain the movement of share prices.

Figure 16.1 Tent map
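To illustrate how such a simple rule can generate apparently random behaviour, the short Python sketch below iterates the standard tent map; the specific functional form and the starting value are assumptions chosen for illustration, not values taken from the text.

```python
# A minimal sketch of the tent map, assuming the standard form
# f(x) = 2x for x < 0.5 and f(x) = 2(1 - x) otherwise.

def tent_map(x: float) -> float:
    """One step of the tent map on the unit interval."""
    return 2 * x if x < 0.5 else 2 * (1 - x)

x = 0.2137  # arbitrary starting point in (0, 1)
trajectory = []
for _ in range(20):
    trajectory.append(round(x, 4))
    x = tent_map(x)

print(trajectory)  # successive iterates look erratic despite the simple rule
```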

The application of chaos theory to explain stock return variation is not a new field, and many researchers have tried to establish the chaotic behaviour of stock prices, for instance, Baumol W., and Benhabib J. (1989); Brock W., and Sayers C. (1988); Hinich M., and Patterson D. (1985); Pemberton J., and Tong H. (1981); Scheinkman J., and LeBaron B. (1984); Schwert G.W., and Seguin P.J. (1990); White H., and Hsieh D.A. (1991).

Many studies have used logistic functions to establish the chaotic behaviour of the stock market. The logistic function can take the following form: f(x) = mx(1 − x). This function can be used to display the behaviour of stock markets. The fixed points of the logistic function are 0 and 1 − 1/m. The graph of f(x) is given in Figure 16.2 for m = 3.3, and the fixed points can be obtained graphically as the points of intersection of the curve f(x) and the line y = x. In the graph, the x and y axes range from 0 to 1.

Figure 16.2 Function y = 3.3 x(1-x)

The fixed points for this graph are 0 and 1 − 1/3.3 = 0.69697, which is where the lines y = x and y = 3.3x(1 − x) cross. This is shown in the graph in Figure 16.3.

Figure 16.3 Fixed points

The behaviour of the iterations depends critically on the value of the parameter m. Figure 16.4 shows the behaviour of the logistic function for values of m ranging from 1.5 to 3.9.

The period-2 points of a particular logistic function can be found by solving the quadratic:

y² − (1 + 1/m)y + (1/m + 1/m²) = 0

To find further cycles, one can look at the graph, eliminate the known fixed points, and iterate the values of the remaining points where the two curves cross. Successive iterations will be attracted to the cycle so long as the starting value is in or near the cycle. An example of this is shown in Figure 16.5, in the graph of the fourth iterate where m = 3.5.

Figure 16.4 Function plots for different values of m

Figure 16.5 Mathematical market price replication
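The sketch below iterates the logistic map numerically for m = 3.5 and recovers the attracting period-4 cycle referred to above; the starting value and the number of warm-up iterations are arbitrary choices for illustration.

```python
# A numerical sketch of the logistic map f(x) = m*x*(1 - x), assuming m = 3.5.
# Repeated iteration from an arbitrary start settles onto a period-4 cycle,
# mirroring the attracting cycles described above.

def logistic(x: float, m: float = 3.5) -> float:
    return m * x * (1 - x)

x = 0.4  # arbitrary starting value
for _ in range(200):  # discard transient behaviour
    x = logistic(x)

cycle = []
for _ in range(4):  # collect one full period of the attracting cycle
    cycle.append(round(x, 4))
    x = logistic(x)

print(sorted(cycle))  # approximately 0.3828, 0.5009, 0.8269, 0.8750 for m = 3.5
```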

Investors can make use of this model in forecasting security and portfolio returns. The scope for applying computer technology to help investors identify meaningful and profitable opportunities in the market is vast. Computer technology helps investors apply individual trading strategies that fit specific risk-return characteristics, and it has made it possible to consider a large number of factors that influence share prices. Other approaches, such as artificial neural networks, fuzzy logic, and heuristics, including genetic algorithms, can be applied successfully in stock market investment decisions.

ARTIFICIAL NEURAL NETWORKS

Artificial neural networks are information processing models that attempt to mimic the way the human brain processes information. Neural networks use a distributed processing approach, in which many processing elements, or neurons, communicate through a network of interconnected links with associated weights. Information stored in the network is represented as a pattern of variable weights, and new information is incorporated through the learning process by making changes to these weights. Neural networks are trained to behave in a desired way. Similar to human learning, neural networks are capable of learning different types of behaviour by being exposed to examples of that behaviour, and they are also capable of generalising to previously unseen but related behaviour.

Neural networks are capable of solving problems by learning mathematical models. Data represented numerically can be used as inputs into a neural network. Therefore, technical and fundamental data related to a specific share, as well as related market information, can be incorporated as inputs into neural networks.

Various aspects of neural network development for financial forecasting require an understanding of the following topics:

  • Paradigms
  • Architecture
  • Fact selection
  • Training and testing

Paradigms

A security selection problem involves prediction. Two of the most widely used paradigms in financial analysis and forecasting are recurrent back-propagation networks and feed-forward back-propagation networks.

Recurrent Back-Propagation Networks

Recurrent back-propagation networks are useful for forecasting time-series data. This type of network consists of a single functional layer of neurons that are fully connected to themselves through a time delay. Figure 16.6 shows a two-layer representation of the network architecture. Each neuron in the first layer is fully connected to each neuron in the second layer, and the neurons in the second layer feed back with a one-to-one mapping into the first layer. The second layer represents a time delay of data through the network. This type of architecture helps the network learn chronological relationships.

Since the network can feed back upon itself, information is learned as a result of the sequential order in which the facts are presented. There is no need to encode time into the data when it is supplied as input, which eliminates the preprocessing steps associated with designing feed-forward back-propagation networks.

Hidden Layers In a recurrent network, the layers pass information on a one-to-one (or many-to-many) basis. The hidden layers, on the other hand, serve as a circular reference tool: a hidden layer passes information back to itself and is then connected to the next layer. In a way it acts as a pseudo layer. Hidden layers are illustrated in Figure 16.7.

The hidden layers, separating the input and the output layers, are so named because they are not directly accessible to the network’s user. Each neuron in the input layer has a connection to each neuron in the hidden layer, with similar connections between the neurons in the hidden and output layers. Hidden layers are used in both feed-forward and recurrent networks.

Figure 16.6 Recurrent back-propagation network

Figure 16.7 Hidden layer

Feed-forward Back-propagation Networks

A feed-forward network that trains through back-propagation of error in a multi-layered network is referred to as a back-propagation, or back-prop, network. This is the most popular network paradigm for financial market analysis. An illustrative back-prop network architecture is shown in Figure 16.8. The primary difference between feed-forward and recurrent back-prop networks is that a feed-forward network is generally not designed to handle time series directly. Hence, time must be encoded into the facts given to the network. To accomplish this, a data processing step converts the time-series data into the format required for training a feed-forward back-prop network.

Figure 16.8 Feed-forward back-prop network architecture

For instance, to present facts to the network that contain the differences in the daily close for the past six days, a fact set must be created by constructing an input vector containing six values (close price differences) and an output value for the next week. Such a fact would be created for each fact-week to be presented to the network, thus encoding the time-series data into the facts themselves.
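A minimal sketch of this encoding is given below; the closing prices, the six-day window, and the five-day “next week” horizon are illustrative assumptions.

```python
# A minimal sketch of encoding a time series into facts for a feed-forward
# back-prop network: six daily close differences as the input vector and the
# net change over the next week as the output. All values are hypothetical.

closes = [100.0, 101.5, 100.8, 102.3, 103.0, 102.5, 104.1, 105.0,
          104.2, 106.0, 107.1, 106.5, 108.0, 109.2]  # hypothetical daily closes

diffs = [b - a for a, b in zip(closes, closes[1:])]  # daily close differences

facts = []
window, horizon = 6, 5  # six past differences; output = change over next 5 trading days
for i in range(len(diffs) - window - horizon + 1):
    inputs = diffs[i:i + window]
    output = sum(diffs[i + window:i + window + horizon])  # next week's net change
    facts.append((inputs, output))

for inputs, output in facts:
    print([round(v, 2) for v in inputs], "->", round(output, 2))
```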

In the case of a recurrent net, this data would be presented sequentially as single differences. For every recurrent network there is a corresponding feed-forward network that can be designed with identical behaviour; hence, feed-forward back-propagation networks are the ones mostly used in forecasts of share prices.

A feed-forward network learns by being given examples of inputs and expected outputs. To arrive at the expected output, the network computes an error measure between its generated output and the desired output. This is computed for each output in the output layer. The error is then averaged over the entire set of facts and propagated backward through the network, layer by layer, to be used in altering the weight connections between neurons.

The errors are fed back and the facts are repeatedly presented to the network until the error associated with the set of outputs is reduced to an acceptable level. Error can be measured through the error sum of squares, the root mean square error, and so on.

Architecture

Neural network architecture determines the transfer function to be used between layers, the number of inputs to the network, the number of hidden layers that can be incorporated into the network, and the output format of the neural network.

Back-prop networks are constituted of an input layer, one or more hidden layers, and an output layer. For each individual neuron, the input data (X1 to Xn) is multiplied by the weights (V1 to Vn) associated with the connections to the neuron. The products are summed, and the result is then passed through a transfer function that converts the sum to a value in a specified interval, for instance between zero and one. The output from each neuron is then multiplied by another weight (W1 to Wn) and fed into the next neuron. If the next neuron is in the output layer, as shown in Figure 16.9, its output is compared with the expected output to measure the error level.

Figure 16.9 Network architecture

The transfer function connects an individual neuron’s inputs to an output. The neuron’s input signals are multiplied by their respective weights, summed, and then passed through the transfer function to produce an output. In general, the transfer function for a back-prop network is a non-linear function. This enables the network to perform the non-linear statistical modelling needed to forecast prices in the financial markets.

The commonly used non-linear transfer functions are the logistic function and the tanh function, also known as the hyperbolic tangent function. Both functions are very similar, being sigmoidal in shape. The logistic function varies in height from zero to one, whereas the tanh function ranges from minus one to one. Figure 16.10 shows the hyperbolic tangent function.
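The sketch below shows a single neuron’s forward pass, the weighted sum of its inputs passed through either of the two transfer functions mentioned above; the input values, weights, and bias are assumptions chosen for illustration.

```python
# A minimal sketch of a single neuron's forward pass with the two transfer
# functions discussed above. Weights, inputs, and bias are illustrative.
import math

def logistic(z: float) -> float:
    """Logistic (sigmoid) transfer function: output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def tanh(z: float) -> float:
    """Hyperbolic tangent transfer function: output in (-1, 1)."""
    return math.tanh(z)

inputs  = [0.40, -0.15, 0.25]   # e.g. scaled values of three indicators
weights = [0.80,  0.30, -0.50]  # connection weights V1..V3 (assumed)
bias = 0.1

z = sum(x * v for x, v in zip(inputs, weights)) + bias  # weighted sum
print("logistic output:", round(logistic(z), 4))
print("tanh output:    ", round(tanh(z), 4))
```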

Another decision is the number of layers, and the number of neurons per layer, to be used in the network. For instance, in order to predict the change in the price of a share based on, say, three criteria, the net could be given three input neurons (26-day MA, 12-day MA and 9-day MA) and one output neuron.

For predicting the prices of a share, the network would require at least one hidden layer. There are no standard rules for determining the best number of hidden layers in a back-propagation network. While a back-prop net can have more than one hidden layer, at least one is theoretically necessary to approximate any non-linear input-to-output mapping. Usually, a trial-and-error process helps in determining the best configuration for a specific network. Typically, more complex problems require a larger number of hidden layers. However, too many hidden layers may also cause a network to be over-fitted to the training data, with poor performance on forecast sets.

Figure 16.10 Hyperbolic tangent function

A neural network has to focus on defined outputs, and the more clarity there is in the output structure, the more easily it can be trained. For instance, instead of designing one share-forecast network with two outputs, the next month’s high and the next month’s low, it may be preferable to design two independent networks, each with its own single output.

The performance of neural networks depends on other aspects such as data selection and quality, data preprocessing techniques, optimisation of training parameters, and testing protocols.

Data inputs can vary according to the investment strategy used by the investor. A market theory such as CAPM or APT can be used for determining the input variables. Similarly, either fundamental analysis or technical analysis can be used as input sources. A neural network incorporating all these and other aspects, such as market sentiment, can also be designed.

Data selection identifies appropriate input data sources for a network. Data selection must be performed judiciously to avoid the inclusion of irrelevant information. A neural network’s performance is dependent on the quality and relevance of its input data. Failure to include relevant data inputs can pull down the network’s performance.

Market conditions and one’s perspective on markets influence the selection of input data. Technical analysis would suggest the use of only market price data as inputs. Fundamental analysis focuses on data that reflects macroeconomic factors, such as economic reports that have an effect on the financial market. Neither approach by itself might be sufficient for financial forecasting. Instead, a combination of both types of data, through the use of neural networks, could be used. The result would be a multidimensional quantitative framework that overcomes the shortcomings of a single viewpoint. The use of multiple data inputs, reflecting a broad range of interdependent and interrelated information, allows patterns and relationships in the data to be recognised, often before they become obvious.

For instance, a neural network designed to predict the next month’s share price would have as its inputs the index and volume data of the market. This can be expanded to include open, high, low, close, and open interest information, which helps in identifying general patterns and characteristics of the share market. Additionally, related fundamental information, such as the GDP growth rate, can be included as an input into the network. Other inputs can be identified through the application of various statistical analysis tools that determine correlations between data. Research involving sensitivity analysis, in which data inputs are varied, can be conducted to find the best mix of technical and fundamental data to use as inputs. Additionally, details such as currency rates, interest rates, world market indices and so on can also be included as inputs.

To help a neural network produce accurate forecasts, the selected raw input data must be preprocessed. Two widely used preprocessing methods are known as transformation and normalisation.

Transformation is used to manipulate one or more raw data inputs to generate a single network input. Normalisation is a transformation that distributes data evenly and scales it into an acceptable range for network usage. Two simple preprocessing methods involve the computation of differences between price points or ratios of inputs. These minimise the required number of input neurons and facilitate learning.

The noise content in actual share price data tends to obscure underlying relationships between input data sources and slows down the training process. Smoothing techniques, such as moving averages, which help reduce the noise entering the network, are useful transformations. One obvious disadvantage of smoothing, however, is that some relevant information may get lost. Additionally, smoothing can turn a leading indicator, such as an oscillator, into a lagging indicator.
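The short sketch below applies the simple transformations discussed above, differences, ratios, and a moving-average smoother, to a hypothetical price series; the prices and the window length are assumptions.

```python
# A short sketch of common preprocessing transformations: price differences,
# ratios, and a simple moving-average smoother. Values are illustrative.

closes = [100.0, 101.5, 100.8, 102.3, 103.0, 102.5, 104.1, 105.0]

differences = [b - a for a, b in zip(closes, closes[1:])]   # day-to-day changes
ratios      = [b / a for a, b in zip(closes, closes[1:])]   # day-to-day ratios

def moving_average(series, window=3):
    """Simple moving average; smooths noise but lags the raw series."""
    return [sum(series[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(series))]

print(differences)
print([round(r, 4) for r in ratios])
print([round(m, 2) for m in moving_average(closes)])
```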

Data normalisation is done to ensure that the statistical distribution of values for each network input and output is roughly uniform. If this is not done, and instead an input with a normal distribution and a small variance is used, the net will only identify a small number of fact occurrences away from the central tendency. Such a net will not perform well on actual data in the future. The input data, hence, should also be scaled to match the range of the input neurons.

Hence, in addition to data transformations performed on network inputs, data input should be normalised. Normalisation methods may be linear scaling, statistical normalisation, or exponential normalisation.

A simple method of normalisation is linear scaling of the data. Linear scaling requires that the minimum and maximum values associated with the facts for a single data input be found (Vmin and Vmax). Additionally, the input range required for the network must be determined. If it is assumed that the input range is from Imin to Imax, then the formula for transforming each data value X to an input value I is: I = Imin + (Imax − Imin)(X − Vmin)/(Vmax − Vmin). This method of normalisation scales input data into the appropriate range.
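A minimal sketch of this linear scaling formula is given below; the sample values and the [0, 1] target range are assumptions.

```python
# A minimal sketch of linear scaling as defined above:
# I = Imin + (Imax - Imin) * (X - Vmin) / (Vmax - Vmin).

def linear_scale(values, i_min=0.0, i_max=1.0):
    v_min, v_max = min(values), max(values)
    return [i_min + (i_max - i_min) * (x - v_min) / (v_max - v_min)
            for x in values]

returns = [2.5, 7.0, 11.5, 4.0, 16.0]                 # hypothetical raw input values
print([round(v, 3) for v in linear_scale(returns)])   # scaled into [0, 1]
```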

The second normalisation method (statistical normalisation) uses a statistical measure of central tendency and variance to help remove outliers and spread out the distribution of the data, which tends to increase uniformity. This, too, is a relatively simple method of normalisation. First, the mean and standard deviation for the input data are determined. Vmin is then set to the mean minus the desired number of standard deviations. For instance, if the mean is 50, the standard deviation is 20, and two standard deviations are chosen, then the Vmin value would be 10 (50 − 2 × 20). Vmax is, conversely, 90 (the mean plus two standard deviations). All data values less than Vmin are set to Vmin, while all data values greater than Vmax are set to Vmax. A linear scaling is then performed on the data. By reassigning the ends of the distribution in this manner, outliers are removed, causing the data to be more uniformly distributed. Assuming a normal distribution, two standard deviations would leave about 95 per cent of the data unchanged, while three standard deviations would leave about 99 per cent of the data unchanged.
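A sketch of this clipping-then-scaling procedure follows; the sample data and the choice of two standard deviations are assumptions.

```python
# A minimal sketch of statistical normalisation as described above: clip the
# data at mean +/- k standard deviations, then linearly scale it.
import statistics

def statistical_normalise(values, k=2, i_min=0.0, i_max=1.0):
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    v_min, v_max = mean - k * sd, mean + k * sd
    clipped = [min(max(x, v_min), v_max) for x in values]   # recode outliers
    return [i_min + (i_max - i_min) * (x - v_min) / (v_max - v_min)
            for x in clipped]

data = [50, 52, 47, 55, 49, 120, 51, 48]   # 120 is an outlier
print([round(v, 3) for v in statistical_normalise(data)])
```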

In order to normalise non-linear data, an “exponential normalisation” routine can be used. In this method, normalised log ratios are calculated by subtracting the value of a fitted line from the observed log ratio for each value.

Due to the normalisation procedure, the output produced also needs to be denormalised. Hence, normalisation should be reversible with little or no loss in accuracy. Normalisation methods that remove or recode outlier values are sometimes not possible to reverse. For instance, assume that during training all output values greater than 90 are assigned a value of 90 regardless of their original value. Then, during testing, if the net produces an output of 90, this simply indicates that the net’s output is 90 or more. If this level of detail is acceptable for a market situation, then the normalisation method used is acceptable. For example, certain oscillators use the 80 per cent and 20 per cent levels as the upper and lower bounds, respectively. Here, such normalisation procedures will not affect the investment decision, even though recoding has been done on the original data and, hence, the original data structure has been lost.

Transformation and normalisation are used to help improve a network’s performance. After the network architecture has been selected and the raw data inputs have been chosen and preprocessed, fact sets must be created.

Fact Selection

A fact, as stated earlier, is a single input vector with its associated output vector. A fact is represented as a row of numbers where the first n numbers correspond to the n network inputs and the last m numbers correspond to the m network outputs. For instance, assume that a network has been designed to predict the change in the price of the market index. The network can be built on the differences in both the highs and the lows for each month and a 20-day moving average of the closing share index. Each fact would be composed of a three-valued input vector (high difference, low difference, moving average) and a single-valued output vector (the change in the market index). The three input values would correspond to the difference in the highs and the difference in the lows for a month and a 20-day moving average of the closing index. The single-valued output vector would represent the change in the index over the next month.
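A compact sketch of one such fact is shown below; all of the figures are illustrative assumptions.

```python
# A minimal sketch of one fact for the index-prediction example above:
# inputs = [difference in monthly highs, difference in monthly lows,
#           20-day moving average of the closing index],
# output = [change in the index over the next month]. Numbers are hypothetical.

fact = {
    "inputs":  [55.0,     # this month's high minus last month's high
                -20.0,    # this month's low minus last month's low
                4815.6],  # 20-day moving average of the closing index
    "outputs": [130.0],   # index change over the next month (target)
}

# A fact set is simply a list of such rows, each flattened to n inputs + m outputs.
row = fact["inputs"] + fact["outputs"]
print(row)
```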

The decision about what data to include in a fact set is important, since the facts selected should best represent the problem space that the neural network is to model. For instance, the data set must consider both a bullish phase and a bearish phase. When the input data chosen does not span a long enough time frame, the likelihood of overlooking significant market characteristics, or of including too few examples of them, increases considerably. For instance, if a time duration that does not include a bear market is chosen as input data, the network model might not be able to predict the stock market accurately in bearish market periods.

If bearish market data is not represented in the fact set, the network may not be able to learn how to recognise it in the future. Its absence from the fact set might cause a bias that could reduce the overall accuracy of the system during actual trading periods. Data availability and sufficient representation of various market conditions are important considerations in decisions regarding neural network fact selection. Hence, the data for a fact set must be selected judiciously, also keeping in mind that data from a period should not represent an extremely infrequent occurrence.

Training and Testing Fact Sets

The fact set is divided into two subsets, one for training and the other for testing. Back-propagation networks operate in two modes: learning or training mode, and the recall or testing mode. In the learning mode, the network modifies the values of its interconnection weights between neurons to adapt its internal representation, in an effort to improve the mapping of inputs to outputs. In the testing mode the network is given new inputs and uses the representation it had previously learned to generate associated outputs without changing the weights. Since neural networks operate in these two modes, facts should be separated into at least two subsets: the training set and the testing set. The training set’s facts are used during the network’s learning mode, while the testing set’s facts are used during the network’s recall mode.

Care must be taken to determine the composition of the training and testing sets. First, they should be mutually exclusive, so that a specific fact does not appear in both subsets. Thus, if two facts have the same input and output values, one of them should be removed from the fact set before it is separated into subsets. For these reasons, tools that automatically split the fact set may not be very helpful.

For instance, in an 80/20 split, automatic data separation tools may place every fifth fact into the test set. If the facts are in chronological order before the split, all data representing one day of the week, such as a Monday or a Friday, could be assigned to the test set, while the rest of the data, representing the remaining trading days, would be assigned to the training set. Since this would adversely affect the network’s performance, the facts should be carefully arranged before splitting them into subsets.
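The sketch below contrasts the every-fifth-fact split described above with one simple alternative, shuffling the facts before an 80/20 split; the placeholder facts, the assumption that reordering is acceptable, and the fixed random seed are illustrative choices.

```python
# A short sketch contrasting a naive every-fifth-fact split with a shuffled
# 80/20 split. The facts are illustrative placeholders.
import random

facts = [f"fact_{day}_{week}" for week in range(4)
         for day in ("Mon", "Tue", "Wed", "Thu", "Fri")]  # chronological order

# Naive split: every fifth fact goes to the test set -> all Fridays.
naive_test = facts[4::5]

# Shuffled split: randomise order first, then take 20% for testing.
shuffled = facts[:]
random.seed(42)        # fixed seed so the example is repeatable
random.shuffle(shuffled)
cut = int(0.8 * len(shuffled))
train, test = shuffled[:cut], shuffled[cut:]

print("naive test set:", naive_test)   # only one weekday represented
print("shuffled test set:", test)      # a mix of weekdays
```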

The fact set can be split in such a way that the training and testing subsets will have similar distributions, relative to certain important characteristics such as cyclical movements. Statistical analysis or clustering algorithms can be used for this purpose. A careful analysis of the fact set also allows outliers to be identified and eliminated.

The training process requires a separate training fact set. The first task performed during training is to initialise the weights. Weights change during the training phase as the network adapts its internal representation to model a financial market. Initially, relatively small random weights are used in the network. Since the initial weights can affect the performance of a network, training can also be done with specific sets of initial weights. The weights are changed at each stage in order to reduce the output errors. Neural network training is represented in Figure 16.11. When the actual output is less than the desired output, the weights (A, B, C, D) are changed so that the actual output increases and finally comes near the desired output.

The learning process is based on error information propagated backward throughout the network from the output layer. One cycle of presenting all facts to the network is commonly referred to as an epoch. With each change in the weights, the network takes a step on a multidimensional surface, which is a representation of the overall error space. During training, the network searches this multidimensional surface to find the lowest point, or minimum error. Weight changes are proportional to a training parameter called the learning rate.

Figure 16.11 Neural network training

Initially, a learning rate should be selected that does not result in oscillation. A simple example of oscillation is a network whose current weights place it halfway down a valley on a two-dimensional error surface, as depicted in Figure 16.12. If the learning rate is too large, the network’s next step may place it on the other side of the valley, as opposed to moving it closer toward the bottom. The following step may return it to the original side. Due to this oscillation, the network tends to bounce back and forth from one side of the valley to the other without moving towards the bottom, where the minimum error is found. On the other hand, if the learning rate is too small, implying that the steps the network takes are very small, it could take a long time to reach the solution.

Figure 16.12 Learning rate (through reduction in oscillation)

Since each financial forecast has its own unique error surface, it is necessary to vary the learning rate to find the best balance between training time and overall error reduction.

Neural network training can be accomplished using different methods. Simulated annealing is one of them.

This method simulates the annealing process by including a step size that directly affects the learning rate. The step size begins relatively high, which allows the network to move quickly over the error surface. The step size then decreases as training progresses: learning slows down and the network settles on a near-optimum solution. The use of simulated annealing also reduces the likelihood of oscillation. Figure 16.13 depicts a two-dimensional example of simulated annealing, in which the step size is reduced to avoid oscillation while finding a minimum point on the error surface.
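A one-dimensional sketch of this shrinking-step idea is given below; the toy error surface, the initial step size, and the decay factor are assumptions.

```python
# A minimal sketch of an annealing-style step-size schedule on a toy,
# one-dimensional error surface with its minimum at w = 2. The surface,
# starting point, initial step, and decay factor are illustrative assumptions.

def error(w: float) -> float:
    return (w - 2.0) ** 2

def gradient(w: float) -> float:
    return 2.0 * (w - 2.0)

w, step = 5.0, 0.9           # a large early step overshoots the minimum
for epoch in range(15):
    w -= step * gradient(w)  # gradient step scaled by the current step size
    step *= 0.85             # shrink the step as training progresses
    print(f"epoch {epoch:2d}  step {step:.3f}  w {w:.4f}  error {error(w):.5f}")
# Early iterations oscillate around w = 2; as the step shrinks, w settles near 2.
```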

In network training, one of the major pitfalls to be avoided is over-training. This is similar to the problem of over-optimising rule-based trading systems. Over-training occurs when a network has learned not just the basic mapping associated with the input and output data presented to it, but also the subtle variations and even the errors specific to the training set (see Figures 16.14 and 16.15). An over-trained network performs very well on the training set by simply memorising it, but performs poorly on out-of-sample test data, and subsequently during actual trading, since the network is unable to generalise to new data.

Over-training can be avoided by using an automated training/testing routine in which testing is an integral part of the training process, rather than a procedure performed after training is complete. Here, network training is halted periodically at predetermined intervals. The network then operates in recall mode on the test set to evaluate its performance against selected error criteria. Thereafter, training is resumed from the point at which it was halted. This alternating process continues iteratively, with interim results that meet the error criteria being retained for later analysis. When the performance on the test set starts to show higher error levels, it can be assumed that the network is beginning to over-train. The best saved network configuration up to this point is then used as the network model.

Figure 16.13 Simulated annealing

Figure 16.14 Normal training

Figure 16.15 Over training
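A skeleton of the alternating train/test routine described above is sketched below. The functions train_one_interval and test_error are hypothetical placeholders standing in for a real back-prop implementation; the patience threshold and the shape of the test error are assumptions.

```python
# A minimal sketch of alternating training and testing with retention of the
# best configuration. Both helper functions are hypothetical placeholders.
import copy
import random

random.seed(1)
weights = [random.uniform(-0.1, 0.1) for _ in range(8)]  # small random initial weights

def train_one_interval(w):
    """Placeholder: one block of training epochs that nudges the weights."""
    return [wi + random.uniform(-0.02, 0.02) for wi in w]

def test_error(w, interval):
    """Placeholder: test-set error that falls, then rises as over-training sets in."""
    return abs(interval - 6) * 0.05 + 0.10

best_error, best_weights, patience = float("inf"), None, 0
for interval in range(20):
    weights = train_one_interval(weights)
    err = test_error(weights, interval)
    if err < best_error:                       # retain the best configuration so far
        best_error, best_weights, patience = err, copy.deepcopy(weights), 0
    else:
        patience += 1
    if patience >= 3:                          # test error rising -> over-training
        print(f"stopping at interval {interval}, best test error {best_error:.3f}")
        break
```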

There are numerous measures for evaluating a network’s performance on test data. For instance, assume that a network has been designed to predict the next day’s close. One error measure might be the difference between the actual close and the network’s output. This value would be determined for each fact in the test set, summed, and divided by the number of facts in the test set. This is the average error measure.

This method has a disadvantage since the positive errors could cancel the negative errors. Another measure is average absolute error, in which the absolute value of the error for each fact in the test set is summed and then divided by the number of facts in the test set.

Other error metrics based on the distance from the target value include sum-of-squares error and root mean squared (RMS) error. The sum-of-squares error is computed by squaring the error for each fact and then summing those squared errors over the entire test set. The RMS error is the square root of the average of the squared errors. The RMS and sum-of-squares error measures weight larger errors more heavily than the average absolute error. Other measures can be used to calculate how often the network predicts a movement in the right direction, or how well network predictions match the shape of the actual price movement over the same time period.
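The short sketch below computes the error measures just described for a hypothetical set of actual closes and network outputs; the figures are assumptions.

```python
# A short sketch of the error measures discussed above, computed for a
# hypothetical set of actual closes and network outputs.
import math

actual    = [101.2, 102.0, 100.5, 103.1, 104.0]
predicted = [100.8, 102.6, 101.0, 102.5, 104.4]

errors = [a - p for a, p in zip(actual, predicted)]
n = len(errors)

average_error   = sum(errors) / n                   # signed errors can cancel out
average_abs_err = sum(abs(e) for e in errors) / n   # average absolute error
sum_of_squares  = sum(e * e for e in errors)        # sum-of-squares error
rms_error       = math.sqrt(sum_of_squares / n)     # root mean squared error

print(f"average error:          {average_error:.3f}")
print(f"average absolute error: {average_abs_err:.3f}")
print(f"sum-of-squares error:   {sum_of_squares:.3f}")
print(f"RMS error:              {rms_error:.3f}")
```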

Additionally, if a neural network is developed to generate trading signals rather than make price predictions, criteria such as net profit and per cent profitable trades can be used as error measures.

Performance expectations for any type of financial forecasting application depend on one’s perspective concerning the underlying dynamics of the market. For instance, if a neural net is designed to forecast a completely random time series, then large prediction errors should be expected since such a time series is unpredictable.

A stock market could be driven by both random and deterministic factors. Most of the time, only the deterministic component is predictable. Certainly no investor expects to achieve zero error, since this would require a model that could account for every possible variable affecting the markets. On the other hand, just because something is currently unpredictable does not necessarily mean that its movements are random.

Neural networks can be used to devise an overall trading strategy for investors. They can pinpoint the timing of trades and increase value for investors.

Neural networks can be used to implement information technology systems that generate forecasts, such as price forecasts, predictions about market direction or turning points, or predictions of risk/return for various assets over a specific time period. In this context, the forecasted information can be used alone, or in conjunction with other available information. Information systems can be made up of a single neural network or a multi-network system.

In multi-network systems, each network is designed to include just one output, so a single large network is not needed to perform all the forecasts. Instead, predictions derived from networks at the primary level of the hierarchy are incorporated as inputs into a network at the secondary level. This kind of hierarchical architecture facilitates faster training, since all networks at the primary level of the hierarchy can be trained simultaneously while each network focuses only on a single output.

Neural network generated trend forecasts can be used to reduce the lag associated with traditional moving average methods. For instance, instead of calculating the value for today’s moving average, the forecasted moving average value for two to four days in the future can be used in the neural network system. This reduces the lag, since the future moving average is a prediction of its value at a point of time in the future, and not a value calculated as of the present.

Neural networks can also be developed to suit a specific investment style. Neural networks can be trained to forecast trading signals that are consistent with the investor’s style, risk propensity, investment time horizon, and market capitalisation.

Neural networks can be used to build hybrid trading systems. The neural network could generate predictive information that could be used along with a set of rules that generate trading signals. This approach combines an information system on the front end with a rule based system on the back end. The rule based portion of the system could range from relatively simple mathematical models to sophisticated expert systems.

Neural networks can also be used to minimise diversifiable risk in a portfolio comprising various global asset classes, by determining the non-linear relationships and correlations between these asset classes, and forecasting the risk and return for each asset class, including derivative instruments, over various time frames.

By using neural networks, the portfolio can be rebalanced to provide higher returns for equivalent risk, or lower risk for equivalent returns. Other technologies, such as expert systems, and genetic algorithms, can be used to identify investor characteristics, thus improving performance even further.

FUZZY THEORY

Fuzzy logic is a superset of conventional (Boolean) logic that has been extended to handle the concept of partial truth, wherein truth is evaluated as a value that ranges between “completely true” and “completely false”. It was introduced by Dr Lotfi Zadeh in the 1960s as a means of modelling the uncertainty of natural language.

Fuzzy Subsets

A fuzzy subset is represented as a set of ordered pairs, with exactly one ordered pair present for each element of the universe S. The first element of the ordered pair is an element of the set S, and the second element is a value from the interval [0, 1]. Here, the value zero is used to represent non-membership, the value one is used to represent membership, and the values in between represent intermediate degrees of membership. In conventional (crisp) set theory, the truth or falsity of the statement “x is in the subset” is determined by finding the ordered pair whose first element is x: the statement is true if the second element of the ordered pair is 1, and false if it is 0.

The set S is referred to as the universe for the fuzzy subset F. Frequently, the mapping is described as the membership function of F. The degree to which the statement “x is in F” is true is determined by finding the ordered pair whose first element is x; the degree of truth of the statement is the second element of that ordered pair. As an illustration of the fuzzy concept applied to investment decisions, consider securities and their “high” returns. In this case, the set S (the universe) is the set of securities. A fuzzy subset “high” is defined as the degree to which a security’s return is high. This variable “high” is described as a linguistic variable that represents the investors’ cognitive category of “high”. To each security in the universe, fuzzy theory assigns a degree of membership in the fuzzy subset “high”. The following membership function, based on the security’s return, explains this.

high(x) = 0, if return(x) < 5%
        = (return(x) − 5%)/12%, if 5% <= return(x) <= 17%
        = 1, if return(x) > 17%

A graph of this function is similar to the one in Figure 16.16.

Based on the above function, illustrations of determining the degree of “high” are given in the following table.
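The short sketch below evaluates this membership function for a few hypothetical security returns; the return figures are assumptions chosen for illustration.

```python
# A minimal sketch of the membership function high(x) defined above, evaluated
# for a few hypothetical security returns.

def high(return_pct: float) -> float:
    """Degree to which a return qualifies as 'high' (piecewise linear)."""
    if return_pct < 5:
        return 0.0
    if return_pct > 17:
        return 1.0
    return (return_pct - 5) / 12

for r in (3, 5, 8, 11, 17, 20):
    print(f"return {r:>2}%  ->  degree of 'high' = {high(r):.2f}")
```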

Besides a single criterion, a membership function can consider several criteria. In the security selection example, a membership function can be built on both “high” returns and “low” risk. This is exactly what investors desire, and it is occasionally used in practice. It is referred to as a two-dimensional membership function, or a “fuzzy relation”. It is also possible to have even more criteria, or to have the membership function depend on elements from two completely different universes.

Figure 16.16 Membership function

The standard definitions in fuzzy logic are:

                          truth (not x) = 1.0 – truth (x)

                        truth (x and y) = minimum (truth(x), truth(y))

                          truth (x or y) = maximum (truth(x), truth(y))

Assume, for example, that the fuzzy subset “low” risk is defined by the following membership function, whose value falls as the security’s risk rises:

low(x) = 1, if risk(x) < 18%
       = (60% − risk(x))/42%, if 18% <= risk(x) <= 60%
       = 0, if risk(x) > 60%

Also let us define the probable situations for a security X as:

                                        a = X is high and X is low

                                        b = X is high or X is low

                                        c = not (X is high)

We can tabulate the values as shown in the following table.
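As a sketch of how these definitions combine, the code below evaluates a, b, and c for a single hypothetical security X; the 11 per cent return and 32 per cent risk figures are assumptions, not values from the text.

```python
# A short sketch applying the standard fuzzy operators defined above to the
# statements a, b, and c for a single hypothetical security X.

def high(return_pct):               # membership in "high" return
    return max(0.0, min(1.0, (return_pct - 5) / 12))

def low(risk_pct):                  # membership in "low" risk (decreasing in risk)
    return max(0.0, min(1.0, (60 - risk_pct) / 42))

ret, risk = 11.0, 32.0              # hypothetical security X: 11% return, 32% risk
t_high, t_low = high(ret), low(risk)

a = min(t_high, t_low)              # truth(X is high and X is low)
b = max(t_high, t_low)              # truth(X is high or X is low)
c = 1.0 - t_high                    # truth(not (X is high))

print(f"high = {t_high:.2f}, low = {t_low:.2f}, a = {a:.2f}, b = {b:.2f}, c = {c:.2f}")
```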

Fuzzy numbers are fuzzy subsets of the real numbers. They have a peak or plateau with membership grade 1, over which the members of the universe are completely in the set. The membership function increases towards the peak and decreases away from it. Fuzzy numbers are used very widely in fuzzy control applications. A typical case is the triangular fuzzy number, which is shown in Figure 16.17.

Figure 16.17 Fuzzy movement

Slope and trapezoidal functions, as well as exponential curves similar to Gaussian probability densities, are also used. The methods for determining membership values can be stated as follows:

  1. Subjective evaluation and elicitation: Since fuzzy sets model people’s cognitive states, they can be determined from either simple or sophisticated response elicitation procedures. Here, investors simply draw or otherwise specify membership curves appropriate to a given market scenario.
  2. Ad-hoc forms: Even though there is a vast array of possible membership function forms, ad-hoc selection draws from a very small set of simple curves, for instance, fuzzy numbers defined by choosing just a central value and the slope on either side.
  3. Converted frequencies or probabilities: Here, information in the form of frequency histograms or other probability curves is used as the basis for constructing a membership function. There are a variety of possible conversion methods, each with its own mathematical and methodological strengths and weaknesses.
  4. Physical measurement: Fuzzy logic may draw on physical measurements, but the membership grades are not obtained from them directly. Instead, a membership function is provided by another method, and the membership grades of the data are then calculated from the physical measurements.

BEHAVIOURAL MODELS

While there are a number of such new financial techniques and tools available for investors to quantify their investment decisions, there is no single number or solution that will work wonders in the secondary market. This is mostly because the behaviour of individuals is itself chaotic, and because the study of individual behaviour in the share market is next to impossible.

Prospect Theory

There are a number of studies in the field of behavioural psychology that strongly suggest why investing in shares is uncomfortable. Work in this field encompasses two broad areas. The first is called Prospect Theory, which reveals how people react to judgments about uncertain events when the probability of the events is defined. The other is called heuristics; studies in this area investigate how people process information about the future to try to establish a range of possibilities for likely events. The remarkable insight that emerges from this work is that both of these cognitive processes, establishing a sense of the probability of future events and reacting to them, reveal systematic biases or distortions.

Secondary market investment, dominated by our orientation towards future events, is subject to our emotions and therefore also subject to cognitive biases. The psychology of the stock market is based on how investors form judgments about uncertain events and how they react to these judgments.

One of the most important findings of Prospect Theory is that people do not react consistently when faced with risk. Instead, they exhibit certain systematic biases. If given two options, either to accept Rs 3,000 right now with 100 per cent certainty, or the possibility of receiving Rs 4,000 with an 80 per cent probability of success, people tend to choose the former. This is not surprising, since it conforms to the traditional view of investment risk aversion: when seeking gains, people require more gain to compensate for an increase in risk.

In the first instance the expected gain is (3,000 × 1), ie, Rs 3,000, while in the latter case the expected gain is (4,000 × 0.8), ie, Rs 3,200. Though the expected gain is higher in the second instance, people tend to avoid it because the compensating extra gain, ie, Rs 200, is very little. If the options given to investors are a sure loss of Rs 3,000 with 100 per cent certainty and a loss of Rs 4,000 with 80 per cent probability, the majority show an inclination towards the latter choice, ie, a loss of Rs 4,000 with 80 per cent probability. The expected outcomes in these instances are (−3,000 × 1), ie, −3,000, and (−4,000 × 0.8), ie, −3,200. Here, people tend to choose the higher expected loss, thus choosing the risky option. In general, investors seek risk to avoid losses, yet avoid risk when seeking gains.

Investors’ tolerance for loss is very small compared to the pleasure of gains. The central tenet of Prospect Theory, which explains this phenomenon, is that individuals evaluate potential outcomes not from the perspective of total wealth but rather from the perspective of the change in wealth from the present. Kahneman D., and Tversky A. (1979); Tversky A., and Kahneman D. (1981); Sitkin S.B. (1992); Thaler R.H., and Johnson E.J. (1990); and Sitkin S.B., and Pablo A.L. (1992) provide further evidence for these dynamic effects on risk taking. This attitude itself can lead to inefficiencies in the stock market. The implication is that Prospect Theory explains, in part, the behaviour of investors when they purchase a share at a higher price assuming the immediate gains will be greater, and sell a share immediately if they feel the future loss is high.

Heuristics Theory

Heuristics enable the comparison of a number of alternatives and outcomes simultaneously. Although research results have been unable to identify any uniform decision heuristic used by individuals, they indicate that most heuristics, including some that ‘ignored’ probability information, regularly selected the alternatives with the highest expected values and almost never selected the alternatives with the lowest expected values. Thorngate W. (1980) and Kleinmuntz D.N. (1985) provide evidence in support of this.

Genetic Algorithms (GAs)

Genetic algorithms do not require the assignment of a priori probabilities. Rather, they determine the probability distributions implicitly by evolving a population of solutions over time. They use simple mechanisms analogous to those used in genetics to breed populations of superior solutions.

Genetic algorithms are highly recommended as a general search method for forecasting. They do not require any special initial conditions, and make no requirements on the smoothness of the data inputs. They are a very general class of optimisation algorithms that are quite robust and widely applicable.

Genetic algorithms can also be used to train a neural network by evolving populations of weight matrices. In this case, back-propagation of errors is not needed; only the forward propagation of facts through the net, and the subsequent evaluation of the fact errors, is required.
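A toy sketch of this idea is given below: a small population of weight vectors is evolved against a fitness function. The target mapping, population size, mutation rate, and number of generations are all assumptions for illustration, not a method prescribed in the text.

```python
# A minimal sketch of a genetic algorithm evolving a small weight vector.
# The fitness function (error against a toy target) and all parameters are
# illustrative assumptions.
import random

random.seed(7)
TARGET = [0.6, -0.3, 0.9]          # "true" weights the population should discover

def fitness(weights):
    """Negative squared error against the target (higher is better)."""
    return -sum((w - t) ** 2 for w, t in zip(weights, TARGET))

def crossover(a, b):
    """Single-point crossover of two parent weight vectors."""
    point = random.randrange(1, len(a))
    return a[:point] + b[point:]

def mutate(w, rate=0.2, scale=0.1):
    return [wi + random.gauss(0, scale) if random.random() < rate else wi for wi in w]

population = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(30)]
for generation in range(40):
    population.sort(key=fitness, reverse=True)      # rank by fitness
    parents = population[:10]                       # keep the fittest solutions
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(20)]
    population = parents + children                 # breed the next generation

best = max(population, key=fitness)
print("best weights:", [round(w, 3) for w in best], "fitness:", round(fitness(best), 5))
```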

SUMMARY

Forecasting using computerised algorithms and heuristics may help investors discover new trading rules in the market that could not be identified using traditional portfolio methods. Due to advancements in technology, the enormous volume of stock data, along with the various factors that influence price behaviour, can be analysed together. Several tools that help explain erratic movements or sudden disturbances have been tested by researchers on stock market data.

Chaotic theory, the fuzzy logic concept, heuristics, genetic algorithms, and artificial neural networks can be built on stock data to forecast portfolio performance. All these applications help in capturing the underlying pattern of movement in stock prices. Hence, there is scope for increasing market returns by examining price patterns in depth, using advanced technology.

CONCEPTS
• Fuzzy theory
• Chaotic Theory
• Neural network
• Heuristics
• Genetic algorithm
• Behavioural models
• Prospect theory
• Simulated annealing
SHORT QUESTIONS
  1. What is a hybrid trading system?
  2. What are genetic algorithms?
  3. What is simulated annealing?
  4. How is Fuzzy Theory different from Chaotic Theory?
  5. What are the steps to be undertaken to build a neural network?
  6. What is a learning rate?
  7. What are paradigms?
ESSAY QUESTIONS
  1. What are the general data processing requisites to run a neural network?
  2. Explain how fuzzy theory is relevant to investment decisions.
  3. Explain risk-return behaviour of investors using the Prospect Theory.