34
Troubleshooting Optimizers

34.1 Introduction

This chapter provides a collection of perspectives and insights that may help in identifying and solving issues when applying optimization.

34.2 DV Values Do Not Change

If a DV value does not change as iterations progress, remaining at its initial value, this is a signal of any of a number of undesired aspects. It could be that the optimizer coding does not include that DV. Perhaps the variable name is misspelled. It could be that the OF calculation is independent of that DV (either because the coding erroneously omits it from the OF calculation, or because it is irrelevant to the OF, or because the OF is effectively insensitive to the DV). It could be that the initial value of the DV is on a constraint or in a flat spot. Don’t accept DV* results if any DV does not change with iteration. Something is wrong. Investigate the surface topology. Check the coding, logic, and constraints.
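
A simple finite-difference check can confirm whether the OF actually responds to each DV. A minimal sketch, assuming a user-supplied objective function of(dv) and a trial DV vector dv0 (illustrative names):

```python
import numpy as np

def check_dv_sensitivity(of, dv0, rel_step=1e-3, tol=1e-12):
    """Perturb each DV slightly and report whether the OF responds."""
    dv0 = np.asarray(dv0, dtype=float)
    base = of(dv0)
    for i in range(dv0.size):
        dv = dv0.copy()
        dv[i] += rel_step * max(abs(dv[i]), 1.0)   # small relative perturbation
        change = abs(of(dv) - base)
        flag = "OK" if change > tol else "SUSPECT: OF insensitive to this DV"
        print(f"DV[{i}]: |delta OF| = {change:.3e}  {flag}")

# Hypothetical OF that (erroneously) ignores its second DV:
check_dv_sensitivity(lambda x: (x[0] - 3.0) ** 2, [0.0, 0.0])
```

If a DV is flagged as insensitive, check whether it was omitted from the OF coding, or whether its initial value sits on a constraint or in a flat spot.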

34.3 Multiple DV* Values for the Same OF* Value

Again, there could be several reasons that distinctly different DV* sets give the same OF* value:

  • Discretization: The objective function may have flat spots w.r.t. the DV value. This could be the result of an aspect that discretizes the continuum DV to integer or other finite values. Here, small changes in the continuum DV will lead to the same discretized value for the OF calculation, and at the optimum the continuum DV will have a range of values. You can let the optimizer use the continuum values, but report the discretized value (a minimal example of this case follows this list).
  • Local insensitivity: The OF may be dependent on the DV in general, but the optimum may be in a region of flat response, no impact of the DV on the OF. This may be the result of conditionals (IF–THEN statements in the OF calculation) or a functionality that has zero local sensitivity. Here, a DV value will change until it enters the local insensitivity region. You should report the range of equivalent DV* values, not one particular set. Further, consider that the local flexibility in the DV value indicates that the application is underspecified, and include an additional criterion in the optimization statement.
  • Multiple equivalent solutions: Consider the solution to a quadratic equation. Two distinct values of the variable satisfy the same equation. The OF may have similar features (see Section 34.11).
  • Underspecified: If there are more DVs than needed, or if DV structure makes some redundant, then many DV combinations will result in the same OF* value. In such cases, consider an additional measure of desirability and include it into the OF, or remove redundant DVs.
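
As an illustration of the discretization case in the first bullet, here is a minimal sketch (the function and values are illustrative) in which the continuum DV is rounded to one decimal place inside the OF; any x within the same rounding interval gives the identical OF value, so many DV* values map to one OF*:

```python
def of(x):
    x_disc = round(x, 1)          # discretization inside the OF calculation
    return (x_disc - 3.0) ** 2

print(of(2.96), of(3.00), of(3.04))   # all three return 0.0
```

Let the optimizer work with the continuum x, but report the discretized round(x, 1).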

34.4 EXE Error

It is not uncommon to encounter a non‐executable computer operation. These include divide by zero, numerical overflow (or underflow, in integers or reals), negative argument values in powers, roots, or log calculations, subscript out of range, etc. Overflow errors might be alleviated by switching to long‐integer or double‐precision variables. Similarly, a subscript‐out‐of‐range error might be alleviated by increasing the dimension of an array. If not, this probably indicates an error in the program coding or the optimizer assigning an excessive DV value. There are billions of situations that could lead to .exe errors. Mostly, fixing them is just classic computer code debugging. Check the coding. Check the optimization concept and its embodiment. Add error‐trapping lines that look at argument values prior to a calculation and return a “constraint hit” situation to the optimizer.
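
A minimal sketch of that error-trapping suggestion (the function and limit values are illustrative): test the argument values before the risky operations and return a large surrogate OF value as the “constraint hit” so the optimizer backs away instead of crashing:

```python
import math

BIG = 1.0e12   # surrogate OF value returned on a constraint hit

def of(dv):
    x, y = dv
    if y <= 0.0:                  # log of a non-positive argument would fail
        return BIG
    if abs(x - 1.0) < 1.0e-12:    # division by (x - 1) would fail
        return BIG
    return math.log(y) + 1.0 / (x - 1.0)
```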

When using second‐order methods (NR or SQ) on a surface that is linear (planar), the matrix inversion will result in a divide by zero. If the surface is nearly linear, it will result in an ill‐conditioned matrix. If this is the case, change the optimizer type.
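
A minimal sketch of a related safeguard, assuming a numerically estimated Hessian H and gradient g (illustrative names and limit value): check the condition number before taking a second-order step, and fall back to a simple gradient step when the matrix is nearly singular:

```python
import numpy as np

def safe_newton_step(H, g, cond_limit=1e8, fallback_step=0.1):
    """Return a DV increment; avoid inverting an ill-conditioned Hessian."""
    if np.linalg.cond(H) > cond_limit:
        return -fallback_step * np.asarray(g)   # ill-conditioned: steepest-descent step
    return -np.linalg.solve(H, g)               # well-conditioned: Newton-type step
```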

34.5 Extreme Values

The optimization may send a DV to a limit, perhaps to zero, or to unity, or to ±infinity. Observe the iteration‐to‐iteration DV trend. If the DV seems to be moving to a limit, then reconsider the choices in the OF, place constraints on the DV range, or initialize in the local region of importance (not in a range that sends the optimizer out of bounds).

34.6 DV* Is Dependent on Convergence Threshold

It is, of course. The DV* value is dependent on both the convergence criterion and its threshold. If the convergence threshold is made smaller, the DV* and OF* values will be more precise, more reproducible, and have a smaller variability. If the precision is not adequate, tighten the convergence threshold to get smaller variability. However, don’t expect perfect precision, zero variability in replicate DV* or OF* values from randomized initializations. The optimizations will not stop exactly at the same DV* values. Convergence tests will stop the iterations when the solution is in the proximity of the optimum. So, don’t seek perfect precision. Choose convergence threshold values to balance context‐required precision with computational burden. Understand your application context.
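
A minimal sketch of the threshold-versus-variability trade-off, using scipy’s Nelder-Mead options xatol and fatol as stand-ins for the chapter’s convergence threshold (the function and tolerance values are illustrative): replicate runs from randomized initializations with a loose threshold show a wider spread of OF* values than runs with a tight threshold:

```python
import numpy as np
from scipy.optimize import minimize

def of(x):
    return (x[0] - 2.0) ** 2 + (x[1] + 1.0) ** 2

rng = np.random.default_rng(0)
for tol in (1e-2, 1e-6):
    of_stars = [minimize(of, rng.uniform(-5, 5, size=2), method="Nelder-Mead",
                         options={"xatol": tol, "fatol": tol}).fun
                for _ in range(20)]
    print(f"threshold {tol:g}: OF* spread = {np.ptp(of_stars):.2e}")
```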

34.7 OF* Is Irreproducible

If replicate trials do not provide the same OF* value, it could be because the surface is stochastic or experimental, or the convergence criterion is too coarse, or the region surrounding the OF* value has multiple optima (such as due to numerical discretization), or the function has many local optima.

Noisy: If the OF response is noisy, because it is either calculated from a stochastic function or experimentally obtained, use an optimizer algorithm and a convergence criterion that can handle the stochastic aspect. Don’t report a single value, but acknowledge the uncertainty in the DV* and OF* ranges.

Coarse convergence: Consider reducing variation by tightening the convergence criterion threshold (see discussion in Sections 34.6 and 34.9).

Coarse discretization: If the numerical discretization is large (perhaps as a convenience to reduce computational work), then the discretization striations may be trapping the optimizer. Consider smaller discretization sizes or an alternate optimizer type.

Multi optima: Some test function applications use high‐frequency trigonometric functions that create a deterministic undulating surface response to the DV. Each local optimum is near the global one but traps the optimizer. Multiplayer optimizers seem to be a good solution.
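
A minimal sketch of this trapping behavior (the undulating function is illustrative, and best-of-many restarts is used here only as a crude stand-in for a multiplayer optimizer): replicate single-start runs return a scatter of local OF* values, while the best of the replicates is far more reliable:

```python
import numpy as np
from scipy.optimize import minimize

def of(x):
    return x[0] ** 2 + 0.5 * np.cos(10.0 * x[0])   # bowl with superimposed undulations

rng = np.random.default_rng(1)
of_stars = [minimize(of, rng.uniform(-3, 3, size=1), method="Nelder-Mead").fun
            for _ in range(15)]
print("replicate OF* values:", np.round(of_stars, 3))
print("best of the replicates:", round(min(of_stars), 3))
```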

34.8 Concern over Results

S.E.E.: If the OF statement is not comprehensive, or if the mathematical implementation is not true to the concept, then the solution will raise concern; it won’t feel right, and it will conflict with intuitive expectations. If so, reconsider the models, the issues, the context, the choices, the optimization statement, the optimizer, and the convergence criterion. Sketch, erase, evaluate, s., e., e., …

This almost seems like my message: “If you don’t get the answer that you want, then change the equations until you get the answer that you want.”

Every stage of defining the application and getting the optimization to work leads to progressive understanding of the application and its context. Let this experience guide both your expectations and the evolution of the details in the application.

Critically test: Seek to create concern. Don’t try to avoid it. Test from randomized initializations and coefficient values in the optimizers—the optimizer should find the same solution. Test over several cycles to ensure all internal variables are appropriately re‐initialized—the first and last solutions should be the same. Test on a variety of options in the function (e.g., ideal approximations, alternate given values, etc.)—trends should be as expected. Test for the impact of uncertainty in the givens—if uncertainty in the results is not acceptable, refine uncertainty issues on the givens.

34.9 CDF Features

The CDF(OF*) graph can provide troubleshooting clues.

Probability of the global: Ideally, the DV values from many optimizer trials lead to the true DV*, with precision, and then the CDF(OF*) graph will make a sharp rise from 0 to 1 at OF*. However, if there are multiple optima, and each is found with precision, then the CDF(OF*) graph will make sharp steps at each local OF* value. Intervals between the CDF values at the steps indicate the probability that each local optimum has been found. This knowledge would be useful in assessing confidence that the global best has been found.
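
A minimal sketch of constructing the CDF(OF*) graph from independent trials (the two-optimum function and the 50 Nelder-Mead runs are illustrative): sort the OF* values and plot rank/N against them; the step heights estimate the probability of landing in each optimum:

```python
import numpy as np
from scipy.optimize import minimize
import matplotlib.pyplot as plt

def of(x):
    # surface with a global optimum near x = 1 and a local optimum near x = -2
    return min((x[0] - 1.0) ** 2, 0.3 + (x[0] + 2.0) ** 2)

rng = np.random.default_rng(2)
of_stars = np.sort([minimize(of, rng.uniform(-4, 4, size=1),
                             method="Nelder-Mead").fun for _ in range(50)])
cdf = np.arange(1, of_stars.size + 1) / of_stars.size
plt.step(of_stars, cdf, where="post")
plt.xlabel("OF*")
plt.ylabel("CDF(OF*)")
plt.show()
```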

Precision: The CDF(OF*) graph, however, will not make sharp steps. Precision of the optimized DV* values, and of the OF* values, depends on the convergence criterion and threshold. If convergence is “loose” (if the threshold value is large), then DV* and OF* variation will be visible. Precision will be poor. In this case the CDF(OF*) graph will not make a sharp step at each transition, but it will make an S‐shaped transition between steps. If the width of the S‐transition is undesirably large, if the OF* values are not precise enough, then make the threshold for convergence a smaller value, or perhaps change the convergence criterion from one based on DVs to one based on the OF.

Note: If there is one OF*, and the scale of the OF* axis on the CDF(OF*) graph is based on the range of the OF* values, then regardless of the threshold in the convergence, the graph will be S shaped. So, consider the range of the resulting OF* values when changing the threshold.

If there is an approximation operation within the OF calculation (such as numerical root finding or a function approximation by a series) that has a convergence test, or if there is time or space discretization in a numerical procedure, then trial‐to‐trial differences in the internal function convergence would cause OF* variation. At each optimum, instead of sharp or S‐shaped steps in the CDF(OF*) graph, the result might appear as a ramp or series of many steps from the vicinity of one OF* value to the next. Consider tightening internal convergence thresholds or making the time or space discretization smaller.

Leading ramp or tail: If the first step on the CDF(OF*) graph has a ramp or tail to the left, then it may be that convergence is being claimed prior to the optimizer reaching the optimum. If the optimizer is hitting a hard constraint, or the DV* solution is in a valley with steep sides and a gently sloped bottom, or there is coarse discretization or loose convergence, then it could be stopping prior to reaching DV*. Consider changing the hard constraint to a soft constraint, replacing the DV with a slack variable, or using an optimizer with a conjugate gradient logic to better accommodate the valley. Alternately, the long valley may be a clue to redundant, or effectively redundant, parameters. See Section 34.10, and consider reducing the number of DVs in the application.

34.10 Parameter Correlation

Ideally, the DV* values end at exactly the true DV* spot, but because convergence criteria end iterations when the search is close enough, the DV* values end in close proximity to, not exactly on, the true DV* value. Further, if using randomized initializations, the small deviations in DV* from the true value will be random and independent perturbations. If this is so, a plot of the values of one DV* w.r.t. the associated set of another DV* values will reveal a scatterplot. If the axes are scaled to match the convergence criterion on the DVs, then the scatterplot will be a circular “shotgun” pattern. This is what you would hope to find.

Note: This cannot be an observation of DV* values that end in local, or equivalent but distinctly separate, optima. The DV* variation must be examined at the same DV* solution.

Note: If the convergence criterion threshold on one DV is larger than that on another, then the variation in one will be larger than in the other, and the shotgun pattern will not be circular; it will be an oval with axes aligned with the DVs. Although the spread may be larger w.r.t. one variable than the other, there is no correlation, and the ellipse will not be skewed in a joint, off‐axis direction.

Note: It may take 20–100 independent trials, each ending at the same OF*, to have enough data to confidently see the pattern.

Classic linear correlation analysis can reveal whether the parameter perturbations are independent or related. Hopefully (desirably), the DV* perturbations are independent, and there is no correlation between DVs.

If there are multiple DVs, the correlation between each pair should be explored. If there are N DVs, then there are N(N − 1)/2 cross correlations to consider. This is easy, since correlation analysis is a common software offering.
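
A minimal sketch of that pairwise check (the names and the 30-trial data set are illustrative): collect the converged DV* vectors as rows of an array, then report all N(N − 1)/2 cross correlations; values near zero suggest independent perturbations, and values near ±1 suggest parameter correlation:

```python
import numpy as np
from itertools import combinations

def report_dv_correlations(dv_stars):
    dv_stars = np.asarray(dv_stars, dtype=float)   # shape: (number of trials, N DVs)
    r = np.corrcoef(dv_stars, rowvar=False)
    for i, j in combinations(range(dv_stars.shape[1]), 2):
        print(f"DV{i + 1} vs DV{j + 1}: r = {r[i, j]:+.2f}")

# Hypothetical DV* results from 30 independent trials with 3 DVs:
report_dv_correlations(np.random.default_rng(3).normal(size=(30, 3)))
```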

If there is parameter correlation in the DV* values, it reveals the presence of any of a number of undesired aspects.

Redundant coefficients: If there are more coefficients than conditions, if the application is underspecified, then the optimizer has an extra degree of freedom. It can assign a value to one DV and then find values for the other DVs that end at the same OF* value. In this case there will be a definite relation between DV* values, a locus of points that clearly defines a curve. The view of the OF w.r.t. a correlated DV pair will have a valley with a common minimum. This extra degree of freedom means that another issue can be added to the optimization. Reconsider the desirables and undesirables associated with the extremes of the DV* values to help identify the missing condition in the OF.

Effectively redundant coefficients: It may be that the application is in a range in which there is a strong interrelation between DV values. An example is the hyper‐elastic spring model between tension and elongation (stress, σ, and strain, ε), σ = A(e^(Bε) − 1). Here the model coefficients A and B seem independent; however, if either the experiments all use small ε values, or if the material is such that the value of B is small, then the product Bε will be small, and the exponential e^(Bε) will be approximately the same as 1 + Bε, making the model effectively σ = ABε. Here it is seen that coefficients A and B are redundant. If the combined value for the σ(ε) relation is k = AB, then any A, B pair satisfying AB = k is equivalent to any other. Such effectively redundant coefficients arise in situations where there is a weak relation of one DV to the data or where the model choice is one step more complicated than the data justifies. In this case the correlation trend between redundant variables will not be a crisply defined line, but a fuzzy trend. The locus of DV* values on a graph of one DV w.r.t. another will reveal a trend, broadened by random variation. The view of the OF surface w.r.t. the two correlated DVs will reveal a valley with steep walls and a relatively gentle slope of the valley bottom. Use this to lead you to reconsider complexity in the model. Probably some simplification, removing a DV, is appropriate.
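
A minimal sketch of this effectively redundant case, assuming the model σ = A(e^(Bε) − 1) and simulated small-strain data: with small Bε the data determine only the product AB, so least-squares fits started from different initial guesses converge to different (A, B) pairs whose product is nearly constant:

```python
import numpy as np
from scipy.optimize import minimize

eps = np.linspace(0.0, 0.05, 20)                        # small strain values
rng = np.random.default_rng(4)
sigma = 40.0 * eps + rng.normal(0.0, 0.01, eps.size)    # nearly linear "data", true AB about 40

def sse(p):                                             # sum of squared residuals
    A, B = p
    return np.sum((sigma - A * (np.exp(B * eps) - 1.0)) ** 2)

for start in ([20.0, 1.0], [200.0, 0.5], [50.0, 2.0]):
    A, B = minimize(sse, start, method="Nelder-Mead").x
    print(f"A = {A:8.2f}  B = {B:7.4f}  A*B = {A * B:6.2f}")
```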

Root finding: If calculation of the OF requires a root‐finding procedure and the convergence threshold within it is too large, then it will find an approximate value. If the resulting value changes the OF response to the DVs, then DV* values will be correlated. Consider tightening the convergence criterion used in root finding, or increasing the number of terms in a series or function approximation.

Hard constraints: Hard constraints often block an optimizer, and it converges with DV* values that form a pattern on the constraint. Here the locus of points of converged DV* values will be a crisp line, but they will not have the same OF*, and they will be on a constraint. If this is the case, consider converting the hard constraint to a penalty function, or changing the optimizer to one that better handles the hard constraint, or just report the best OF* from the constrained set (if all OF* values are acceptable). In some cases, converting the search DVs to a slack variable can remove the hard constraint.
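
A minimal sketch of converting a hard constraint into a penalty (soft constraint); the constraint, target point, and weight of 1e3 are illustrative: instead of rejecting trial points with x[0] + x[1] > 10, add a quadratic penalty so the optimizer can feel the constraint and slide along it toward the best feasible point:

```python
def of_with_penalty(x):
    base = (x[0] - 8.0) ** 2 + (x[1] - 6.0) ** 2      # unconstrained objective
    violation = max(0.0, x[0] + x[1] - 10.0)          # amount the constraint is exceeded
    return base + 1.0e3 * violation ** 2              # soft (penalty) constraint
```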

Ridges, valleys, and striations: Discontinuities in the OF could arise from discretization of time or space, from conditionals that switch from one model or one variable selection to another, or from truncation or discretization of other variables in the OF (such as rounding to the nearest cent). These trap optimizer solutions in local striation features or, on a ridge, make it appear that there is no better downhill direction. Consider smoothing the transition between conditionals in a fuzzy logic manner. Consider making numerical discretization smaller. Consider a multiplayer optimizer.
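
A minimal sketch of smoothing a conditional transition (the two sub-models and the steepness value of 20 are illustrative): replace the hard IF–THEN switch with a sigmoid blend so the OF has no cliff at the switching point:

```python
import numpy as np

def hard_of(x):
    return x ** 2 if x < 1.0 else (x - 2.0) ** 2 + 3.0     # discontinuous at x = 1

def smooth_of(x, steepness=20.0):
    w = 1.0 / (1.0 + np.exp(-steepness * (x - 1.0)))       # blend weight: 0 -> 1 near x = 1
    return (1.0 - w) * x ** 2 + w * ((x - 2.0) ** 2 + 3.0)
```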

34.11 Multiple Equivalent Solutions

An application may have multiple solutions, each valid, and each with the same OF* value. Consider the solution to a quadratic equation. There are two independent solutions, and each is valid. Consider the desire to find the zero‐gravity points between three large bodies in space. There are two to four locations depending on the size and location of the masses. If such is the case, report that there are two (or more) equivalent solutions. Alternately, consider how stakeholders would view one or the other, and see if there is an additional criterion for desirability that would select one over the other. For instance, the negative root of the quadratic relation may be mathematically valid but not physically possible, or one zero‐gravity point may be less sensitive to spatial location than another, hence more desirable.

34.12 Takeaway

Debugging the initial attempt at defining the application statement and the mathematical formulation is often a long process. Don’t expect to get a new application right the first time. Critically look at many aspects of diverse variables to guide evolution of the application.

34.13 Exercises

  1. Revisit any of the optimization applications that you have explored, and look at the results in light of topics in this chapter.