Drift

The second challenge is the non-stationary nature of streaming data. Data is considered stationary if its statistical properties, such as mean and standard deviation, do not vary over time. We cannot make this assumption for streaming data. This non-stationary behavior is called drift, and our algorithms should be able to detect and handle it efficiently.
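To make the idea of detecting a change in a statistical property concrete, here is a minimal sketch that compares the mean of a recent window against the mean of an earlier reference window. The function name detect_mean_drift, the window size, and the threshold are illustrative assumptions, not a method taken from the text, and a fixed-threshold z-score test like this is only one of many possible drift detectors.

```python
from collections import deque
from statistics import mean, stdev


def detect_mean_drift(stream, window=50, threshold=3.0):
    """Yield the indices at which the recent window's mean departs from
    the reference window's mean by more than `threshold` standard errors.

    Hypothetical helper for illustration; `window` must be at least 2.
    """
    reference = deque(maxlen=window)  # earlier data, treated as "normal"
    recent = deque(maxlen=window)     # sliding window over the newest data

    for i, x in enumerate(stream):
        # Fill the reference window first.
        if len(reference) < window:
            reference.append(x)
            continue

        recent.append(x)
        if len(recent) < window:
            continue

        # Standard error of a window mean, estimated from the reference data.
        std_err = stdev(reference) / window ** 0.5
        if std_err == 0:
            std_err = 1e-12  # avoid division by zero on constant data

        z = abs(mean(recent) - mean(reference)) / std_err
        if z > threshold:
            yield i
            # Adopt the recent data as the new reference and start over.
            reference = deque(recent, maxlen=window)
            recent.clear()


# A stream whose mean jumps from 0.5 to 5.5 at index 200.
stream = [0, 1] * 100 + [5, 6] * 100
alarms = list(detect_mean_drift(stream))
```

On this toy stream, no alarm is raised while the mean stays at 0.5, and the first alarm arrives a few items after the jump. Recomputing the mean over the full window on every item keeps the sketch short; a real streaming implementation would maintain running sums instead.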

The paper Open Challenges for Data Stream Mining Research, by Georg Krempl et al. (http://www.kdd.org/exploration_files/16-1-2014.pdf), classifies the various issues involved in processing stream data.

They explain drift in terms of volatility:

  • Volatility corresponds to a dynamic environment with ever-changing patterns. Here, old data is of limited use, even if it could be saved and processed again later. This is due to changes that can affect the induced data mining models in multiple ways: change of the target variable, change in the available feature information, and drift.
  • Drift is a phenomenon that occurs when the distributions of features x and target variables y change over time.
  • In supervised learning, drift can affect the posterior P(y|x), the conditional feature P(x|y), the feature P(x), and the class prior P(y) distributions.
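The distributions named in the last bullet can be estimated separately, which helps to see which of them has actually moved. The following is a minimal sketch, assuming discrete features and labels and two windows of (x, y) pairs; the helper names (empirical, total_variation, posterior, drift_report) and the use of total variation distance as the comparison measure are illustrative choices, not something prescribed by the paper.

```python
from collections import Counter


def empirical(items):
    """Empirical distribution of hashable items as {item: probability}."""
    counts = Counter(items)
    total = sum(counts.values())
    return {k: c / total for k, c in counts.items()}


def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def posterior(pairs):
    """Empirical P(y|x) as {x: {y: probability}}."""
    by_x = {}
    for x, y in pairs:
        by_x.setdefault(x, []).append(y)
    return {x: empirical(ys) for x, ys in by_x.items()}


def drift_report(old, new):
    """Compare two windows of (x, y) pairs, one distribution at a time."""
    post_old, post_new = posterior(old), posterior(new)
    shared_x = set(post_old) & set(post_new)
    return {
        "P(x)": total_variation(empirical(x for x, _ in old),
                                empirical(x for x, _ in new)),
        "P(y)": total_variation(empirical(y for _, y in old),
                                empirical(y for _, y in new)),
        # Largest change in P(y|x) over feature values seen in both windows.
        "P(y|x)": max((total_variation(post_old[x], post_new[x])
                       for x in shared_x), default=0.0),
    }


# Old window: both feature values equally likely.
old = [("a", 0)] * 40 + [("a", 1)] * 10 + [("b", 0)] * 10 + [("b", 1)] * 40
# New window: feature "a" becomes far more common, but the labels
# given each feature value keep the same proportions.
new = [("a", 0)] * 64 + [("a", 1)] * 16 + [("b", 0)] * 4 + [("b", 1)] * 16
report = drift_report(old, new)
```

In this toy data, P(x) and P(y) both shift between the two windows while P(y|x) does not, so a model that has learned P(y|x) would still be valid even though the input distribution has drifted. With continuous features, the counting step would have to be replaced by binning or a density estimate.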