We can define optimization as the act, process, or method employed to find the "best" element from a set of alternatives, based on one or more objectives (such as maximizing yield), while satisfying all the constraints. Within the context of a business problem, this "best" element can be far-reaching: a system, a decision, or a technical design. "Problems" can include, but are not limited to:
For example, to optimize capital across portfolios, the allocation of credit risk mitigants to credit risk exposures under different regulatory regimes can be performed with a specific objective in mind: the allocation is optimized so that the firm reduces its regulatory capital requirements within the applicable regulatory framework. The regulatory requirements act as the constraints of the optimization problem.
In mathematical terms, optimization refers to minimizing or maximizing a value function subject to constraints, where both the function and the constraints can be linear or nonlinear. Mathematical optimization is integral to, and extensively used in, AI and machine learning (e.g., minimizing a "loss function" subject to constraints). Here, an optimization routine improves the accuracy of the algorithm on the provided training data by reducing the error in its predictions.1
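To make the idea of "reducing the error in predictions" concrete, here is a minimal sketch (not from the chapter, using illustrative data) of gradient descent minimizing a mean-squared-error loss to fit a one-parameter linear model:

```python
# Minimal sketch: fit y ~ w * x by gradient descent on the loss
# L(w) = mean((w*x - y)^2). Data and learning rate are illustrative.

def fit_slope(xs, ys, lr=0.01, steps=2000):
    """Minimize L(w) by repeatedly stepping against its gradient."""
    w = 0.0
    n = len(xs)
    for _ in range(steps):
        # dL/dw = (2/n) * sum((w*x - y) * x)
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= lr * grad  # step in the direction that reduces the loss
    return w

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]   # roughly y = 2x
w = fit_slope(xs, ys)       # converges to the least-squares slope (~1.99)
```

Each iteration reduces the prediction error, which is exactly the role optimization plays inside model training.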
Mathematicians have developed optimization algorithms for many decades. For the modern-day risk practitioner, selecting the best algorithm and specification can be a daunting task. The choice of algorithm impacts the processing time, efficiency, and accuracy of the outcome. But before we describe the use of optimization for machine learning, it is best to describe what is meant by “optimizing a value function that is subject to constraints” as mentioned earlier.
Taking a practical risk management example, a commercial bank may want to minimize the variance of returns (the objective function) while keeping its loss rates within risk appetite over t years (the constraints), and tune the AI or machine learning model accordingly.
Even though the range of optimization applications in AI and machine learning is wide, there are two main classes of optimization problems, depending on whether the loss function is convex or nonconvex.2
Convex optimization has only one "best" solution: a single local optimum exists, and it is also the global optimum. This minimum can be found using a variety of well-tested and validated methods. The search area for convex optimization (Figure 8.1) itself takes on a convex shape, so that following the negative gradient leads to the minimum.
Nonconvex optimization has multiple locally optimal solutions. Historically, most optimization problems in a machine learning context were considered convex; in recent years, however, with the rise of neural networks and deep learning, the use of nonconvex optimization methods has grown. These algorithms operate in a nonconvex landscape (Figure 8.2), where there is more than one local minimum; the goal is to find good local minima efficiently, because proving that a given minimum is global is generally impossible in this setting.
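The difficulty with a nonconvex landscape can be seen in a small sketch (illustrative, not from the chapter): gradient descent on f(x) = x⁴ − 3x² + x lands in different local minima depending on where it starts, so a common remedy is to restart from several points and keep the best result:

```python
# Illustrative nonconvex function with two local minima
# (near x = -1.30, the global one, and x = 1.13, a worse one).

def f(x):
    return x**4 - 3 * x**2 + x

def grad_descent(x, lr=0.01, steps=5000):
    for _ in range(steps):
        x -= lr * (4 * x**3 - 6 * x + 1)  # f'(x)
    return x

# Restart from several starting points; each converges to *some*
# local minimum, and we keep whichever has the lowest f-value.
starts = [-2.0, -0.5, 0.5, 2.0]
minima = [grad_descent(x0) for x0 in starts]
best = min(minima, key=f)   # close to the global minimum near -1.30
```

Starting points on the right of the hump converge to the worse minimum near 1.13; only restarts on the left find the global one, which is why single-start gradient descent is unreliable on nonconvex problems.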
Of course, there are also times when the objective function is concave. These are simply the negative of a convex function where the same principles apply.
When optimizing machine learning models, the model's fit function is approximated by tuning its parameters so that the objective function is either minimized or maximized. In doing so, the input variables may be restricted; these restrictions are constraints. Constraints define the feasible space over which the objective function is optimized. A constraint that must hold exactly is an equality constraint, whereas a constraint that only bounds the variables from one side is an inequality constraint. Applying these concepts together for machine learning optimization, we obtain the standard constrained-optimization problem.
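In standard textbook form (a generic formulation, not specific to this chapter), the constrained problem can be written as:

```latex
\begin{aligned}
\min_{x} \quad & f(x) \\
\text{subject to} \quad & g_i(x) \le 0, \quad i = 1, \dots, m
  && \text{(inequality constraints)} \\
& h_j(x) = 0, \quad j = 1, \dots, p
  && \text{(equality constraints)}
\end{aligned}
```

Here f is the objective function, and the g's and h's together carve out the feasible space over which it is optimized.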
Of course, there are myriad ways to solve convex or nonconvex optimization problems, because the choice of solver depends on the application as well as the analytical method. The intention of this section is not to explain every algorithm available as a solver for convex and nonconvex optimization. By way of an introduction, we focus on machine learning function optimization involving the use of solvers, the tuning of hyperparameter values, and model-fit optimization using stochastic gradient descent.
Hyperparameters broadly fall into two categories: model training parameters that relate to the model architecture (e.g., the number of layers of a neural network and neurons in each layer) and solver parameters such as the learning rate or momentum of an algorithm. The choice of solver and the solver settings will significantly affect the training process.
In the next sections, we will discuss each of the solvers in more detail.
As noted earlier, many solvers exist to optimize an objective function, and the appropriate choice depends on whether the function is convex or nonconvex. In the next sections, we detail selected examples of solvers for each case.
When the objective function is convex, a solver needs to find the single global minimum. Some solvers also satisfy constraints while optimizing the objective function, typically by using either an interior-point method or an active-set method. Interior-point methods start in the interior of the feasible region and use barrier functions to enforce the constraints. Active-set methods instead try to find a solution by quickly guessing the set of active constraints.
If the active constraints are known, the inactive constraints may be discarded and a much simpler problem can be solved efficiently. For this reason, despite a higher worst-case run time (referred to as "worst-case complexity"), active-set methods can outperform their interior-point counterparts. They are often the method of choice when only variable bounds are present (i.e., there are no other general constraints). The method stops when a feasible point is found at which any attempt to further improve the objective would violate at least one constraint. The weights that achieve this balance are known as dual variables: they multiply the constraint gradients so that the objective gradient equals the scaled sum of the constraint gradients. A Lagrangian function is often used to describe this system mathematically, and hence dual variables are often called Lagrange multipliers.
When the problem is nonconvex, there are two options to try:
Mutation operators randomly vary the solution so that the search over the solution space avoids premature convergence at a local minimum.
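A minimal sketch (illustrative, not from the chapter) of this idea is a (1+1)-style evolutionary search: Gaussian mutation randomly perturbs the current best candidate, which lets the search jump out of the basin of a poor local minimum of the same nonconvex f(x) = x⁴ − 3x² + x used earlier:

```python
import random

def f(x):
    return x**4 - 3 * x**2 + x

random.seed(0)          # reproducible illustration
best_x = 1.13           # start near the *worse* of the two local minima
best_f = f(best_x)
for _ in range(5000):
    cand = best_x + random.gauss(0, 0.7)   # mutation: random perturbation
    if f(cand) < best_f:                   # keep only improving mutations
        best_x, best_f = cand, f(cand)
```

Pure gradient descent from this starting point would stay trapped near x = 1.13; the random mutations eventually produce a candidate in the deeper basin near x = -1.30, after which the search refines toward the global minimum.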
There are two broad classes of parameters that require tuning during AI and machine learning model development; they can be optimized during model training or externally to it. Hyperparameters are typically associated with either the model itself (e.g., the number of layers in a neural network and the number of neurons at each layer) or the solvers used for training (e.g., learning rate, momentum).
Settings for both the model training parameters and the solver parameters can significantly influence the accuracy of the models. Table 8.1 lists examples of parameter entities and hyperparameter attributes across both categories for a selection of algorithms.
Table 8.1 Key Algorithms and Their Associated Parameter Entities, Including Model Training Parameters and Solver Parameters of Hyperparameters.

| Algorithm | Parameter entities | Hyperparameter attributes |
|---|---|---|
| Linear regression | | The stepwise regression approach to use (Backward, Elasticnet, Forward, Forwardswap, LAR, LASSO, None, or Stepwise) and the cut-off values for adding/removing terms, such as the significance level for entry and the significance level for removal when the significance level is used as the select or stop criterion. |
| Logistic regression | Coefficients of the logistic regression equation: the intercept term (constant) and the betas (coefficients of the independent variables). The main estimation methods are maximum likelihood or stochastic gradient descent (using both a learning rate and a number of epochs). | The model selection method (Backward, Forward, LASSO, None, or Stepwise), the significance levels for entry/removal, the base regularization parameter for the LASSO methods, and the number of steps for the LASSO method. |
| Quantile regression | For each quantile level, a distinct set of regression coefficients: the intercept and the betas (coefficients of the independent variables). | None |
| Generalized linear model | Coefficients of the linear relationship between the transformed response (in terms of the link function) and the independent variables: the intercept term (constant) and the betas. Maximum likelihood estimation (MLE), rather than ordinary least squares (OLS), estimates the parameters; MLE relies on large-sample approximations. | The stepwise regression approach to use (Backward, Elasticnet, Forward, Forwardswap, LAR, LASSO, None, or Stepwise) and the cut-off values for adding/removing terms, such as the significance levels for entry and removal when the significance level is used as the select or stop criterion. |
| Decision trees | The parameters of decision trees such as classification and regression trees (CART) are those that change the tree structure from the inside. Recursive partitioning parameters that define the tree splits are internal to the algorithm and determine how the tree grows, such as cost-complexity and reduced-error methods. | |
| Random forests | Parameters include how the trees are split internally (the class/interval target criterion to apply), the maximum number of branches, and how missing values are used as attributes. | |
| Gradient boosting machines | Parameters are either tree-specific, boosting, or miscellaneous. | |
| Neural networks | | |
| Support vector machines | Three classes of parameters. | |
| Bayesian network | The parameters depend on the complexity of the network. This complexity is based on the probability distribution over the n variables, that is, the probability of every combination of states that represents the relationships between the variables. | |
Tuning hyperparameter values is a critical aspect of the model training process. Finding the ideal values for hyperparameters (tuning a model to a particular dataset) has traditionally been a manual, and therefore time-consuming, effort. For guidance in setting these values, risk modelers often rely on their experience with machine learning. However, even with expertise in machine learning algorithms and their hyperparameters, the best settings change significantly with different data, so it is difficult to prescribe hyperparameter values based on previous experience alone. The ability to explore alternative configurations in a more guided and automated manner reduces the need for manual effort. Below are common approaches used for automated hyperparameter tuning:
For 9 hyperparameters, each with the same number of levels (say, 3), the grid ends up with 3^9 = 19,683 combinations.
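This combinatorial growth is easy to demonstrate (a sketch with hypothetical hyperparameter names and levels):

```python
# Enumerate a full tuning grid: nine hypothetical hyperparameters,
# each with three candidate levels, yield 3**9 configurations.
import itertools

levels = {f"hyperparam_{i}": ["low", "mid", "high"] for i in range(9)}
grid = list(itertools.product(*levels.values()))
print(len(grid))   # 19683
```

Each element of `grid` is one configuration to train and evaluate, which is why exhaustive grid search quickly becomes impractical and more guided search strategies are attractive.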
Optimization algorithms exist for specific machine learning models, including those for logistic regression and neural networks. In this section, we list each of these algorithms.
Each of the following techniques is for the nonlinear optimization encountered with logistic regression algorithms, where repeated computation is needed for the optimization criterion, the gradient vector (first-order partial derivatives), and, in some techniques, the Hessian matrix (second-order partial derivatives).
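As a minimal sketch of this gradient/Hessian loop (illustrative data, and a single-parameter model for simplicity, not any specific production technique), here is a Newton-Raphson fit of a logistic regression p(x) = sigmoid(b·x):

```python
# Each Newton iteration recomputes the gradient (first derivative) and
# the Hessian (second derivative) of the log-likelihood in b.
import math

xs = [-2.0, -1.0, 1.0, 2.0]   # illustrative feature values
ys = [0, 1, 0, 1]             # illustrative binary outcomes

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

b = 0.0
for _ in range(25):
    ps = [sigmoid(b * x) for x in xs]
    grad = sum((y - p) * x for x, y, p in zip(xs, ys, ps))     # dLL/db
    hess = -sum(p * (1 - p) * x * x for x, p in zip(xs, ps))   # d2LL/db2
    b -= grad / hess   # Newton step (hess < 0 when maximizing LL)
```

Because Newton's method uses curvature (the Hessian), it typically converges in far fewer iterations than first-order methods, at the cost of computing and inverting second-order information at each step.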
We have discussed mathematical optimization and how it is used in algorithms. However, as explained at the beginning of the chapter, optimization refers to an act, process, or method employed to make "something" as fully functional or effective as possible for a given problem. That "something" can be far-reaching in risk management: a system, a decision, a technical design, and of course the machine learning algorithms themselves. What follows are examples of business applications in risk management that require optimization to maximize effectiveness.
Although the policy rules applied to consumer loan applications represent a latent loss function of the individual credit risk profile, they also reflect the risk appetite of an organization. This means that policy rules expand and contract as credit policy is tightened or loosened over time. They also contribute to, or act as, reason-code generators for approving or declining loans. However, policy rules are often applied subjectively, which slows down decision response times.
Furthermore, although policy rules are critical for lending decisions, there are five overlooked aspects of policy rules that create opportunities for optimization:
A decision science optimization tool that uses function-approximation machine learning to simulate what the final decision will be could:
The decision science optimization tool carefully considers the lending process flow of a retail loan application. Policy rules are often implemented into a decision engine, but some banks run credit models (i.e., scorecards) against applications first and then policy rules to ensure that credit declines are not overridden by policy rules.
Irrespective of the sequence of running a policy rule or credit model on the loan application, decisions made are often a blend of models and policy rules, and these can be adjusted by manual assessors. Manual assessors can override automated decisions. In some cases, the override process prolongs the time to decision and can result in non-take-up of approved applications. Excessive amounts of policy overrides are often associated with too many policy rules and complexity.
The design principles shown in Figure 8.3 are explained below:
Importantly, the decision science optimization tool could allow both aspects of the originations decision process, namely the credit models and the policy rules, to be analyzed at the same time. The final decision is typically difficult to model because decisions on loan applications are driven by models, policy rules, and manual assessment interventions, and are thus both objective and subjective in nature.
The tool provides machine learning insights that, importantly, can be used to shorten decision times by issuing final approvals or rejections earlier in the lending process. Furthermore, the strength of the machine learning can be used to optimize revenue by identifying customer segments that are high net worth, have the ability to repay, and should not undergo unnecessarily long decisioning times.
Collateral refers to assets, guarantees, or other securities that a counterparty agrees to transfer to the seller and that equal or exceed the receivables and liabilities provided by the seller. Collateral mitigates the seller's risk that the counterparty does not complete its agreement, for example, if the counterparty defaults.5 Thus, collateral is the asset that the buyer transfers to the seller as security.6 Below are definitions to help further explain the terms:
Managing collateral requires multiple decisions that involve complexity, because the management depends on whether the bank is acting as the seller or counterparty (as is the case in the capital markets). At a general level, these decisions include the following:
When only a few available assets need to be distributed among a handful of exposures, collateral management is simple and intuitive and can easily be done manually.7 When the available assets, exposures, and counterparties grow beyond a few, it is faster to optimize collateral allocation using algorithmic approaches, including machine learning. The optimization exercise is then to find the allocation of collateral that results in the lowest cost to the bank while still meeting the agreements made between the seller and the counterparty and the margin call requirements of the assets.8 As an example, the objective functions of a set of models are listed below:
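At toy scale, the lowest-cost allocation can be found by brute force (a sketch with entirely hypothetical asset values, costs, and exposure requirements):

```python
# Brute-force collateral allocation: assign one asset per exposure so
# that each asset's value covers its exposure's requirement, and keep
# the feasible allocation with the lowest total cost to the bank.
import itertools

assets = {                      # asset: (collateral value, cost of posting)
    "cash":   (100, 5.0),
    "bond_a": (120, 3.0),
    "bond_b": (80,  1.0),
}
exposures = {"swap": 90, "repo": 70}   # required collateral value

best = None
for combo in itertools.permutations(assets, len(exposures)):
    # feasibility: each assigned asset's value must cover its exposure
    if all(assets[a][0] >= req for a, req in zip(combo, exposures.values())):
        cost = sum(assets[a][1] for a in combo)
        if best is None or cost < best[0]:
            best = (cost, dict(zip(exposures, combo)))
```

Here the cheapest feasible allocation reserves the inexpensive `bond_b` for the smaller exposure. Real portfolios have far too many asset-exposure combinations for enumeration, which is exactly where the algorithmic and machine learning approaches described above come in.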
The use of AI, machine learning, and advanced analytics for optimization is a rapidly evolving field. In this chapter, machine learning algorithmic optimization was explained, involving the use of solvers, the automated tuning of hyperparameters, and model training optimization using stochastic gradient descent. In addition, optimization algorithms with embedded machine learning are powerful tools for solving risk-specific business problems. Keep in mind that, although powerful, optimization is mathematically complex and may involve intensive compute resources. The good news is that there are automated ways to apply optimization using machine learning that absorb some of this complexity and help streamline decision-making, so that risk departments can focus on the business value that optimization can generate.