Analyze and Understand Networks Using R

Network analysis is the study of graphs. A graph is defined by a set of nodes, or vertices, connected by edges. Both the nodes and the edges can have attributes describing them. Most importantly, an edge can carry a weight indicating the importance of the connection. When the directions of the edges are preserved, the graph is called a directed graph; otherwise, it is called an undirected graph. Network analysis, also called network theory or graph theory, provides a rich set of algorithms to analyze and understand graphs.

The famous Königsberg bridge problem (http://mathworld.wolfram.com/KoenigsbergBridgeProblem.html), solved by Euler, is one of the first graph theory problems to be studied. Königsberg was a city in Prussia (modern-day Kaliningrad, Russia). The river Pregel divides the city, and there are two islands in the river. Seven bridges connected the islands and the two river banks. The Königsberg problem was to devise a walk through the city that would cross each of those bridges once and only once. Euler showed that no such walk exists.
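To make these definitions concrete, here is a minimal sketch using the igraph package that this chapter introduces. The first graph is a made-up directed, weighted example; the vertex names and weights are assumptions chosen only for illustration. The second encodes the seven Königsberg bridges, with the hypothetical labels N and S for the two river banks and K and L for the two islands, and checks the standard condition for a walk that uses every edge exactly once.

```r
library(igraph)

# A small directed, weighted graph. Vertex names and weights are made up.
edges <- data.frame(
  from   = c("a", "a", "b"),
  to     = c("b", "c", "c"),
  weight = c(3, 1, 2)   # extra columns become edge attributes
)
g <- graph_from_data_frame(edges, directed = TRUE)

print(is_directed(g))   # edge directions are preserved
print(E(g)$weight)      # weights stored on the edges

# The Königsberg bridges as an undirected multigraph.
# N and S are the two river banks; K and L are the two islands.
bridges <- data.frame(
  from = c("N", "N", "S", "S", "N", "S", "K"),
  to   = c("K", "K", "K", "K", "L", "L", "L")
)
koenigsberg <- graph_from_data_frame(bridges, directed = FALSE)

# Number of bridges touching each land mass.
bridge_counts <- degree(koenigsberg)
print(bridge_counts)

# A walk that uses every edge exactly once exists in a connected graph
# exactly when zero or two vertices have an odd degree.
odd_nodes <- names(bridge_counts)[bridge_counts %% 2 == 1]
walk_exists <- is_connected(koenigsberg) && length(odd_nodes) %in% c(0, 2)
print(walk_exists)
```

All four land masses touch an odd number of bridges, so the degree condition fails and the requested walk cannot be constructed.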

A graph structure that is familiar to most of us today is the social network formed by social media applications such as Facebook and LinkedIn. In these networks, people form the vertices, and when two people are connected to each other, an edge is drawn between their vertices. The whole internet is a graph of connected machines. Examples from biology include protein-protein interaction networks, genetic interaction networks, and so on.

Representing a problem as a graph can give us a different point of view on it, and sometimes makes the problem simpler to solve. One such problem that we are going to see in this chapter is assigning categories to items. More specifically, we will look at the micro-categorization of items in a retail setting and show how we can leverage graphs to assign those categories. Though our example is from retail, the technique is not limited to the retail domain. This technique is called Product Network Analysis.
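As a rough illustration of the idea, and not the implementation developed in this chapter, the sketch below builds a small product graph from hypothetical shopping baskets. Products are vertices, two products are linked when they appear in the same basket, and the edge weight counts how many baskets contain both. The basket contents, the variable names, and the choice of cluster_walktrap as the clustering algorithm are all assumptions made for this example.

```r
library(igraph)

# Hypothetical baskets; every basket holds at least two products.
baskets <- list(
  c("bread", "butter", "jam"),
  c("bread", "butter"),
  c("butter", "jam"),
  c("soap", "shampoo"),
  c("soap", "shampoo", "towel"),
  c("shampoo", "towel"),
  c("jam", "soap")
)

# One row per pair of products bought together in a basket.
# Sorting the items keeps each pair in a consistent order.
pairs <- do.call(rbind, lapply(baskets, function(items) {
  t(combn(sort(unique(items)), 2))
}))
pair_df <- data.frame(from = pairs[, 1], to = pairs[, 2], weight = 1,
                      stringsAsFactors = FALSE)

# Edge weight = number of baskets that contain both products.
edge_df <- aggregate(weight ~ from + to, data = pair_df, FUN = sum)

# Undirected, weighted product graph.
product_graph <- graph_from_data_frame(edge_df, directed = FALSE)

# Group products that are frequently bought together (assumed algorithm).
categories <- cluster_walktrap(product_graph,
                               weights = E(product_graph)$weight)
print(membership(categories))
```

Each group returned by the clustering step can be read as a candidate product category derived from purchase behavior rather than from manually entered product features.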

This chapter is loosely based on the following two papers: "Product Network Analysis – The Next Big Thing in Retail Data Mining", a white paper by Forte Consultancy, and "Extending Market Basket Analysis with Graph Mining Techniques: A Real Case", by Ivan F. Videla-Cavieres and Sebastián A. Ríos, University of Chile, Department of Industrial Engineering, Business Intelligence Research Center (CEINE), Santiago, Chile.

Category management is very important for retailers. Having products grouped into the right categories is the first step for retailers to manage their products. Downstream applications such as up-selling, cross-selling, and loyalty systems can benefit tremendously from the right category assignment. In the absence of a sophisticated product network analysis system, product categorization is done manually and depends heavily on the product features entered by either the suppliers or the procurement teams. This categorization may not be accurate and is heavily biased by human judgment. It is unrealistic to expect two people to agree consistently on a task such as this one.

In this chapter, we will cover the following topics:

  • Introducing the igraph package to create and manipulate graphs in R
  • Going over our use case and data
  • Preparing our data for consumption by the igraph package
  • Applying graph clustering algorithms to identify product categories
  • Building an RShiny application

The code for this chapter was written in RStudio version 0.99.491, using R version 3.3.1. As we work through our example, we will introduce the R packages that we will be using, igraph and arules. As we describe the code, we will refer to some of the output printed to the console. So as not to disturb the flow of the code, that output is shown immediately after the statement that prints it.
