Support vector regression

Most of the applications using support vector machines are related to classification. However, the same technique can be applied to regression problems. Luckily, as with classification, LIBSVM supports two formulations for support vector regression:

  • ε-SVR (sometimes called C-SVR)
  • ν-SVR

For the sake of consistency with the two previous cases, the following test uses the ε (or C) formulation of the support vector regression.

An overview

The SVR introduces the concepts of an error insensitive zone and an insensitive error, ε. The insensitive zone defines a range of values around the predicted value, y(x). The penalization component C does not affect the data points {xi, yi} that belong to the insensitive zone [8:14].

The following diagram illustrates the concept of the error insensitive zone using a single variable feature x and an output y. In the case of a single variable feature, the error insensitive zone is a band of width 2ε (ε is known as the insensitive error). The insensitive error plays a role similar to that of the margin in the SVC.

[Figure] The visualization of the support vector regression and the insensitive error
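To make the insensitive zone concrete, here is a minimal Scala sketch of the ε-insensitive loss. The epsInsensitiveLoss helper is hypothetical, used purely for illustration; it is not part of the SVM class:

// epsilon-insensitive loss: deviations within the band are not penalized
def epsInsensitiveLoss(y: Double, yHat: Double, eps: Double): Double =
  math.max(0.0, math.abs(y - yHat) - eps)

// A data point inside the band |y - yHat| <= eps contributes no penalty
assert(epsInsensitiveLoss(y = 1.0, yHat = 1.4, eps = 0.5) == 0.0)
// A point outside the band is penalized by its distance to the band
assert(epsInsensitiveLoss(y = 1.0, yHat = 2.0, eps = 0.5) == 0.5)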

For the mathematically inclined, the maximization of the margin for nonlinear models introduces a pair of slack variables, ξ and ξ*. As you may remember, the C-support vector classifier uses a single slack variable. The preceding diagram illustrates the components of the minimization formula.

Note

M9: The ε-SVR formulation is defined as:

$$\min_{\mathbf{w},\,w_0,\,\xi,\,\xi^*}\;\frac{1}{2}\lVert\mathbf{w}\rVert^2 + C\sum_{i=1}^{n}\left(\xi_i + \xi_i^*\right)$$

$$\text{subject to}\quad y_i - \mathbf{w}^T\mathbf{x}_i - w_0 \le \varepsilon + \xi_i,\qquad \mathbf{w}^T\mathbf{x}_i + w_0 - y_i \le \varepsilon + \xi_i^*,\qquad \xi_i,\,\xi_i^* \ge 0$$

Here, ε is the insensitive error and ξi, ξi* are the pair of slack variables.

M10: The ε-SVR regression equation is given by:

$$y(\mathbf{x}) = \sum_{i=1}^{n}\left(\alpha_i - \alpha_i^*\right)K(\mathbf{x}_i, \mathbf{x}) + w_0$$

Here, αi and αi* are the Lagrange multipliers of the constrained minimization and K is the kernel function.
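As an illustration of M9/M10, the following sketch computes the prediction as a kernel expansion over the support vectors. The names alphas, supportVectors, and w0 are hypothetical; in practice, LIBSVM estimates these coefficients during training:

// Radial basis function kernel, consistent with the RbfKernel used later
def rbf(gamma: Double)(x: Array[Double], z: Array[Double]): Double = {
  val d2 = x.zip(z).map { case (a, b) => (a - b) * (a - b) }.sum
  math.exp(-gamma * d2)
}

// M10: y(x) = sum over support vectors of (alpha_i - alpha_i*).K(x_i, x) + w0
def predict(
    alphas: Vector[Double],                 // differences (alpha_i - alpha_i*)
    supportVectors: Vector[Array[Double]],  // support vectors x_i
    w0: Double,                             // intercept
    kernel: (Array[Double], Array[Double]) => Double)(x: Array[Double]): Double =
  alphas.zip(supportVectors).map { case (a, sv) => a * kernel(sv, x) }.sum + w0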

Let's reuse the SVM class to evaluate the capability of the SVR compared to that of the linear regression (refer to the Ordinary least squares regression section in Chapter 6, Regression and Regularization).

SVR versus linear regression

This test consists of reusing the example of single-variate linear regression (refer to the One-variate linear regression section in Chapter 6, Regression and Regularization). The purpose is to compare the output of the linear regression with the output of the SVR for predicting the price of a stock or the value of an index. We select the S&P 500 exchange-traded fund, SPY, which is a proxy for the S&P 500 index.

The model consists of the following:

  • One labeled output: SPY-adjusted daily closing price
  • One single-variable feature: the index of the trading session (that is, the index of the SPY values)

The implementation follows a familiar pattern:

  1. Define the configuration parameters for the SVR (the C cost/penalty factor, the GAMMA coefficient for the RBF kernel, EPS for the convergence criteria, and EPSILON for the regression insensitive error).
  2. Extract the labeled data (the SPY price) from the data source (DataSource), which is the Yahoo financials CSV-formatted data file.
  3. Create the linear regression, SingleLinearRegression, with the index of the trading session as the single variable feature and the SPY-adjusted closing price as the labeled output.
  4. Create the observations as a time series of indices, xt.
  5. Instantiate the SVR with the index of trading session as features and the SPY-adjusted closing price as the labeled output.
  6. Run the prediction methods for both the SVR and the linear regression, and compare their results in the collect method.

The code will be as follows:

val path = "resources/data/chap8/SPY.csv"
val C = 12          // cost/penalty factor
val GAMMA = 0.3     // gamma coefficient of the RBF kernel
val EPSILON = 2.5   // insensitive error of the regression

val config = SVMConfig(new SVRFormulation(C, EPSILON), 
    new RbfKernel(GAMMA)) //45
for {
  price <- DataSource(path, false, true, 1) get close
  (xt, y) <- getLabeledData(price.size)  //46
  linRg <- SingleLinearRegression[Double](price, y) //47
  svr <- SVM[Double](config, xt, price)
} yield {
  collect(svr, linRg, price)
}

The formulation in the configuration has the SVRFormulation type (line 45). The DataSource class extracts the price of the SPY ETF. The getLabeledData method generates the xt input features and the y labels (or expected values) (line 46):

import scala.util.Try

type LabeledData = (Vector[DblArray], DblVector)
def getLabeledData(numObs: Int): Try[LabeledData] = Try {
  val y = Vector.tabulate(numObs)(_.toDouble)         // labels: 0.0, 1.0, 2.0, ...
  val xt = Vector.tabulate(numObs)(Array[Double](_))  // one-feature observations
  (xt, y)
}

The single variate linear regression, SingleLinearRegression, is instantiated using the price input and y labels as inputs (line 47).

Finally, the collect method executes the two pfSvr and pfLinr regression partial functions:

def collect(svr: SVM[Double], 
    linr: SingleLinearRegression[Double], price: DblVector): Unit = {
  
  val pfSvr = svr |>    // prediction partial function of the SVR
  val pfLinr = linr |>  // prediction partial function of the linear regression
  for {
    n <- price.indices                // iterate over the trading sessions
    if pfSvr.isDefinedAt(n.toDouble)  // validate the SVR partial function
    x <- pfSvr(n.toDouble).toOption   // |> is assumed to return a Try
    if pfLinr.isDefinedAt(n)          // validate the regression partial function
    y <- pfLinr(n).toOption
  } yield { ... }
}

Note

isDefinedAt

It is a good practice to validate whether a partial function is defined for a specific value of the argument. This preemptive approach allows the developer to select an alternative method or a full function. It is an efficient alternative to catching a MatchError exception.
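As a minimal illustration of this practice (the sqrtPf partial function below is hypothetical and stands in for the |> operator used above):

// A partial function undefined for negative arguments
val sqrtPf: PartialFunction[Double, Double] = {
  case x if x >= 0.0 => math.sqrt(x)
}

// Validate before applying, and fall back to a default value
// instead of triggering a MatchError
val input = -4.0
val result = if (sqrtPf.isDefinedAt(input)) sqrtPf(input) else Double.NaN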

The results are displayed in the following graph, which is generated using the JFreeChart library. The code to plot the data is omitted because it is not essential to the understanding of the application.

[Figure] A comparative plot of linear regression and SVR

The support vector regression provides a more accurate prediction than the linear regression model. You can also observe that the L2 regularization term of the SVR penalizes the data points (the SPY prices) with a high deviation from the mean of the price. A lower value of C increases the L2-norm penalty factor, as λ = 1/C.

Note

SVR and L2 regularization

You are invited to run the use case with a different value of C to quantify the impact of the L2 regularization on the predictive values of the SVR.
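A minimal sketch of such an experiment, reusing the SVMConfig, SVRFormulation, and RbfKernel types from the listing above; the range of C values is arbitrary, and the sketch assumes SVRFormulation accepts a Double penalty factor:

// Sweep the penalty factor C; a lower C strengthens the L2 penalty (lambda = 1/C)
val cValues = List(0.5, 1.0, 12.0, 100.0)
val configs = cValues.map(c =>
  SVMConfig(new SVRFormulation(c, EPSILON), new RbfKernel(GAMMA))
)
// Each configuration would then be evaluated with SVM[Double](config, xt, price)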

There is no need to compare the SVR with the logistic regression, as the logistic regression is a classifier. However, the SVM is related to the logistic regression: the hinge loss in the SVM is similar to the loss in the logistic regression [8:15].
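To make this connection concrete, the following sketch compares the two losses as a function of the margin z = y.f(x); the helper names are illustrative:

// Hinge loss used by the SVM: zero once the margin exceeds 1
def hingeLoss(z: Double): Double = math.max(0.0, 1.0 - z)

// Logistic loss used by the logistic regression: a smooth,
// strictly decreasing function of the margin with a similar shape
def logisticLoss(z: Double): Double = math.log(1.0 + math.exp(-z))

Both losses vanish (or nearly vanish) for large positive margins and grow almost linearly for large negative margins, which explains the similarity noted above.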
