Machine learning in dynamic environments

Making predictions in dynamic environments does not always produce the desired outcomes, particularly with complex and unstructured data.

There are several reasons for this. For example, how do you infer a realistic outcome from a small amount of data, or deal with unstructured and high-dimensional data that is tedious to process? Moreover, revising a model with strategies efficient enough to control a realistic environment is also costly.

Furthermore, the dimensionality of the input dataset is sometimes high, so the data may be too dense or very sparse. In that case, dealing with very large settings, and applying static models to emerging application areas such as robotics, image processing, deep learning, computer vision, or web mining, is challenging. On the other hand, ensemble methods, which select and combine models from existing ones, are becoming more popular for making ML models more adaptable. The hierarchy of learning in a dynamic environment is shown in Figure 10:


Figure 10: The hierarchy of machine learning in a dynamic environment

ML techniques such as neural networks and statistically-based learning are also becoming popular for their success in numerous industrial and research applications, such as biological systems. However, classical learning algorithms such as neural networks, decision trees, or vector quantizers are often restricted to purely feedforward settings and simple vectorial data, rather than dynamic environments. Vectorial representations often provide better predictions because of their rich structure. In summary, there are three challenges in developing ML applications for a dynamic environment:

  • How does the data structure take shape in an autonomous environment?
  • How do we deal with input data that is statistically sparse and high dimensional? More specifically, how do we make predictive analyses using online algorithms for large-scale datasets, apply dimensionality reduction, and so on?
  • With only limited reinforcement signals, ill-posed domains, or partially underspecified settings, how do we develop controlled and effective strategies in dynamic environments?

Considering these issues and the promising advances in this research area, in this section we will provide some insights into online learning techniques through statistical and adversarial models. Since learning in a dynamic environment such as streaming will be discussed in Chapter 9, Advanced Machine Learning with Streaming and Graph Data, we will not discuss streaming-based learning in this chapter.

Online learning

Batch learning techniques generate the best predictor by learning on the entire training dataset at once, and are often called static learning. A static learning algorithm takes a batch of training data to train a model, and a prediction is then made using the test sample and the relationship found. Online learning algorithms, on the other hand, take an initial guess model, then pick up one observation at a time from the training population and recalibrate the weights on each input parameter. Data usually becomes available in a sequential order, and the sequential data is used to update the best predictor of the outcome at each step, as outlined in Figure 11; a minimal sketch of such an update loop follows the list below. There are three use cases for online-based learning:

  • Firstly, where it is computationally infeasible to train an ML model over the entire dataset, online learning is commonly used
  • Secondly, it is also used in situations where the algorithm has to dynamically adapt to new patterns in the data
  • Thirdly, it is used when the data itself is generated as a function of time, for example, in stock price prediction
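
To make the contrast concrete, the following is a minimal online-learning sketch in plain Scala, using the classic perceptron rule (the data and names here are hypothetical, for illustration only): the model starts from an initial guess and recalibrates its weights after every single observation, instead of training on the whole batch at once:

    // Minimal online-learning sketch: the model is updated after every
    // single observation, never on the full batch. Hypothetical toy data.
    object OnlinePerceptron {
      def main(args: Array[String]): Unit = {
        // Each observation: (features, label in {-1, +1}).
        val stream = Seq(
          (Array(1.0, 1.0),  1.0),
          (Array(1.0, 0.0), -1.0),
          (Array(0.0, 1.0), -1.0),
          (Array(0.0, 0.0), -1.0)
        )
        var weights = Array(0.0, 0.0) // the initial guess model
        var bias = 0.0
        val learningRate = 0.1

        // A single pass (epoch = 1): pick up one observation at a time.
        for ((x, y) <- stream) {
          val activation = weights.zip(x).map { case (w, xi) => w * xi }.sum + bias
          val prediction = if (activation >= 0.0) 1.0 else -1.0
          if (prediction != y) { // recalibrate the weights only on a mistake
            weights = weights.zip(x).map { case (w, xi) => w + learningRate * y * xi }
            bias += learningRate * y
          }
        }
        println(s"weights = ${weights.mkString(", ")}, bias = $bias")
      }
    }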

Online learning, therefore, requires out-of-core algorithms, that is, algorithms that can work without holding the entire dataset in memory at once. There are two general modeling strategies for online learning models:

  • Statistical learning models: For example, stochastic gradient descent and perceptron
  • Adversarial models: For example, spam filtering falls into this category, as the adversary will dynamically generate new spam based on the current behavior of the spam detector

Although online and incremental learning techniques are similar, they also differ slightly. Online learning generally means a single pass (epoch = 1), or a configurable number of epochs, over the data, whereas incremental means that you already have a model; no matter how it was built, the model can be mutated by new examples. Also, a combination of online and incremental is often what is required.
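
As a minimal sketch of the difference, the following plain Scala snippet (hypothetical names, reusing the perceptron-style rule from the previous sketch) shows the incremental case: training resumes from an already existing model, however it was built, and new examples simply mutate it:

    // Incremental learning sketch: the update takes an existing model and
    // mutates it with new examples, rather than training from scratch.
    case class Model(weights: Array[Double], bias: Double)

    def incrementalUpdate(model: Model,
                          newExamples: Seq[(Array[Double], Double)],
                          learningRate: Double = 0.1): Model = {
      var w = model.weights.clone() // start from the existing weights, not zeros
      var b = model.bias
      for ((x, y) <- newExamples) {
        val score = w.zip(x).map { case (wi, xi) => wi * xi }.sum + b
        val prediction = if (score >= 0.0) 1.0 else -1.0
        if (prediction != y) {
          w = w.zip(x).map { case (wi, xi) => wi + learningRate * y * xi }
          b += learningRate * y
        }
      }
      Model(w, b)
    }

    // Online + incremental combined: feed each new batch of examples into
    // the model produced by the previous update, for example:
    // val updated = incrementalUpdate(previouslyTrainedModel, todaysExamples)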

Data is being generated in an unprecedented way everywhere, every day. This huge amount of data imposes an enormous challenge on building ML tools that can handle data with high volume, velocity, and veracity. In short, data generated online is also big data. Therefore, we need to learn about online learning algorithms, which are meant to handle data of such high volume and velocity on machines with limited performance.


Figure 11: Batch (static) versus online learning, an overview

Statistical learning model

As already outlined, statistically-based learning models such as stochastic gradient descent (SGD) and artificial neural networks or the perceptron assume that the data samples are independent of each other. In addition, they assume that the samples are identically distributed random variables; in other words, the distribution does not change over time. Consequently, the ML algorithm needs only limited access to the data.

In the field of statistical learning models, two interpretations are considered significant:

  • First interpretation: This considers the stochastic gradient descent method as applied to the problem of minimizing the expected risk. Here, an infinite stream of data is assumed, with the examples drawn from the true underlying distribution, and the stochastic gradient descent method is used to bound the deviation. This interpretation is also valid for a finite training set.
  • Second interpretation: This applies to the case of a finite training set and considers the SGD algorithm as an instance of the incremental gradient descent method. In this case, one instead looks at the empirical risk: since the gradients of the per-sample objectives Q_i(w) in the incremental gradient descent iterations are also stochastic estimates of the gradient of the empirical risk Q(w), this interpretation applies to minimizing the empirical risk as opposed to the expected risk. This is why multiple passes through the data are readily allowed and actually lead to tighter bounds on the deviations (see the sketch after this list).
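
The following is a minimal sketch of this second interpretation in plain Scala (the toy data and names are hypothetical): SGD minimizes the empirical risk of a least-squares linear model, and multiple passes (epochs) over the finite training set are allowed:

    // SGD sketch for the second interpretation: minimize the empirical risk
    // Q(w) = (1/n) * sum_i Q_i(w) of a least-squares linear model over a
    // finite training set; each step follows the gradient of a single Q_i(w).
    // Hypothetical toy data: y is roughly 2x + 1 plus a little noise.
    val data = Seq((1.0, 3.1), (2.0, 4.9), (3.0, 7.2), (4.0, 8.8))

    var w = 0.0
    var b = 0.0
    val eta = 0.01    // learning rate
    val epochs = 100  // multiple passes over the finite set are allowed here

    for (_ <- 1 to epochs; (x, y) <- scala.util.Random.shuffle(data)) {
      val error = (w * x + b) - y // residual; drives the gradient of Q_i(w)
      w -= eta * error * x        // stochastic estimate of dQ/dw
      b -= eta * error            // stochastic estimate of dQ/db
    }
    println(f"w = $w%.3f, b = $b%.3f") // should move toward w ~ 2 and b ~ 1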

Adversarial model

Classical machine learning, as it is usually taught in classes, emphasizes a static environment where essentially unchanging data is used to make predictions. It is, therefore, formally easier than statistical or causal inference in a dynamic environment. On the other hand, framing and solving the learning problem as a game between two players, for example, learner versus data generator, in a dynamic environment is an example of an adversarial model. This kind of modeling and predictive analytics is critically tedious, since the world does not know that you are trying to model it formally.

Furthermore, your model does not have any positive or negative effect on the world. Therefore, the ultimate goal of this kind of model is to minimize the losses arising from the circumstances created by the moves of the other player. The opponent can adapt the data it generates based on the output of the learning algorithm at run-time, that is, dynamically. Since no distributional assumptions are made about the data, the goal becomes performing well over the entire sequence, as though it could be viewed ahead of time. Additionally, the regret of the hypothesis at the last step is to be minimized. According to Cathy O'Neil (Weapons of Math Destruction, Crown, September 6, 2016), adversarial machine learning can be defined as follows:

Adversarial machine learning is the formal name for studying what happens when conceding even a slightly more realistic alternative to assumptions of these types (harmlessly called relaxing assumptions).
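
To make the regret-minimization idea concrete, the following is a minimal plain Scala sketch of the classic weighted majority algorithm, a standard algorithm from the adversarial online learning literature (our illustration, not an algorithm from the quoted book): whatever label sequence the adversary generates, the learner's mistake count stays provably close to that of the best expert in hindsight:

    // Weighted majority sketch: a standard regret-minimizing algorithm for
    // the adversarial setting. No distributional assumption is made about
    // the label sequence; the adversary may generate it adaptively.
    def weightedMajority(expertPredictions: Seq[Array[Boolean]], // one row per round
                         outcomes: Seq[Boolean],
                         beta: Double = 0.5): Int = {
      val n = expertPredictions.head.length
      val weights = Array.fill(n)(1.0)
      var mistakes = 0
      for ((preds, outcome) <- expertPredictions.zip(outcomes)) {
        // Predict with the weighted majority vote of the experts.
        val yesWeight = preds.zip(weights).collect { case (true, w)  => w }.sum
        val noWeight  = preds.zip(weights).collect { case (false, w) => w }.sum
        val guess = yesWeight >= noWeight
        if (guess != outcome) mistakes += 1
        // Downweight every expert that was wrong this round; this keeps the
        // learner's total mistakes within a constant factor of the best expert's.
        for (i <- 0 until n if preds(i) != outcome) weights(i) *= beta
      }
      mistakes
    }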

Tip

Up to the Spark 2.0.0 release, no formal adversarial learning algorithm had been implemented in Spark. Therefore, we are unable to provide a concrete Spark-based example that could be explained further; the sketch above is plain Scala rather than Spark code. Interested readers should check the latest Spark release to understand the updates.
