Use case and data

Category management is the practice of treating a discrete set of similar or related items sold by a retailer as a single strategic business unit. The retailer can then evaluate each unit by its turnover and profitability. Brian F. Harris is credited with originating category management. His eight-step process, known as the Brian Harris model, is still widely used today. For more information about category management, refer to http://www.nielsen.com/tw/en/insights/reports/2014/category-management-the-win-win-platform-for-manufacturers-and-retailers.html.

The Nielsen definition of a category is based on product features. Products are placed in the same category when they satisfy the following criteria:

They meet similar end-consumer needs
They are interrelated, for example, substitutable
They can be placed together on a retailer's shelf

When analyzing purchasing behavior, several patterns emerge: some products are sold together in the same transaction, some are sold in a time-sequential pattern, the sale of one product affects the sale of another, and so on. These product interactions and patterns can occur within a single category or across different categories, and they lead to the formation of micro-categories. Unfortunately, there is no simple way to identify these micro-categories. Depending on the products, market conditions, price points, consumer preferences, and many other factors, several such micro-categories may emerge as more retail transactions accumulate.

A retailer has approached us with the problem of micro-categorization. Historically, the procurement team created the categories based on product properties. Over time, this manual process has introduced several inconsistencies in how categories are created and how products are assigned to them. Further, he believes that there exist several micro-categories for his products, which can be unearthed only by analyzing the transaction data. Evaluating profitability and turnover using the existing categories is of limited use to him. All his supplementary systems, including the loyalty system and the online selling platform, can be made more effective with these new micro-categories. He has provided us with his historical transaction data. Each transaction is uniquely identified by an integer called order_id, and each product present in the transaction is recorded in a field called product_id.

The data and source code can be downloaded from the Packt website.

> data <- read.csv('data.csv')
> head(data)
  order_id                       product_id
1   837080           Unsweetened Almondmilk
2   837080                    Fat Free Milk
3   837080                           Turkey
4   837080          Caramel Corn Rice Cakes
5   837080                Guacamole Singles
6   837080 HUMMUS 10OZ  WHITE BEAN EAT WELL

The given data is in a tabular format. Every row is a tuple of order_id, which identifies the transaction, and product_id, which is an item included in that transaction. We need to transform this data so that we have all the pairs of products and the number of transactions in which each pair occurs together. We will leverage the arules package to achieve this. This package provides the necessary infrastructure to store, manipulate, and analyze retail transaction data. We have already covered the arules package in Chapter 1, Association Rule Mining. For more information about the arules package, refer to https://cran.r-project.org/web/packages/arules/index.html.
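As a rough illustration of the transformation described above, the following sketch builds a small made-up data frame with the same two columns and counts, for every pair of products, the number of transactions containing both. The toy values and the names toy, trans, co, and product_pairs are assumptions introduced here for illustration; they are not part of the retailer's data. The sketch uses crossTable from arules, which is one possible way to obtain the counts and not necessarily the approach taken later in the chapter.

```r
library(arules)

# Hypothetical toy data with the same columns as the retailer's file
toy <- data.frame(
  order_id   = c(1, 1, 1, 2, 2, 3, 3),
  product_id = c("Milk", "Bread", "Eggs", "Milk", "Bread", "Milk", "Eggs"),
  stringsAsFactors = FALSE
)

# Group the products by order and coerce the list to a transactions object
trans <- as(split(toy$product_id, toy$order_id), "transactions")

# Item-by-item matrix: each cell counts the transactions containing both items
co <- crossTable(trans, measure = "count")

# Keep each unordered pair once by reading only the upper triangle
idx <- which(upper.tri(co), arr.ind = TRUE)
product_pairs <- data.frame(
  product_1      = rownames(co)[idx[, "row"]],
  product_2      = colnames(co)[idx[, "col"]],
  n_transactions = co[idx],
  stringsAsFactors = FALSE
)

# Drop pairs that never occur together
product_pairs <- product_pairs[product_pairs$n_transactions > 0, ]
print(product_pairs)
```

The diagonal of co holds the number of transactions containing each single product, which is why only the upper triangle is used for the pair list.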

Transaction data used in this chapter is from Instacart's public point of sale data at https://tech.instacart.com/3-million-instacart-orders-open-sourced-d40d29eadj6f2.
