Summary

The chapter started with an overview of data at rest and data in motion, the latter also called streaming data. We then delved into the properties of streaming data and the challenges it poses for processing. We introduced stream clustering algorithms and discussed the well-known online/offline approach to stream clustering. Later, we introduced various classes in the stream package and showed how to use them. Along the way, we discussed several data generators, the DBSTREAM algorithm for finding micro-clusters and macro-clusters, and several metrics for assessing cluster quality. We then introduced our use case and designed a clustering algorithm whose online part is based on reservoir sampling and whose offline part is handled by the k-means algorithm. Finally, we described the steps needed to move this whole setup into a real streaming environment.

In the next chapter, we will explore graph mining algorithms. We will show you how to use the igraph package to create and manipulate graphs. We will also discuss Product Network Analysis and show how graph algorithms can assist in generating micro-categories.
