Sequential data analysis

There are known knowns. These are things we know that we know. There are known unknowns. That is to say, there are things that we know we don't know. But there are also unknown unknowns. There are things we don't know we don't know.
                                                          - Donald Rumsfeld, Former Secretary of Defense

The very first business question I came across after the first edition was published revolved around product sequence analysis. The team worked with complicated Excel spreadsheets and pivot tables, along with a good deal of SAS code, to produce insights. After coming across this problem, I explored what could be done with R and was pleasantly surprised to stumble upon the TraMineR package, designed specifically for just such a task. I believe applying R to the problem would have greatly simplified the analysis.

The package was designed for the social sciences, but it can be used in just about any situation where you want to mine and learn how an observation's states evolve over discrete periods or events (longitudinal data). A classic use is the case mentioned above, where you want to understand the order in which customers purchase products. This could support a recommendation engine of sorts that estimates the probability of the next purchase; I've heard this referred to as a next logical product offer. Another example comes from healthcare: examining the order in which a patient receives treatments and/or medications, or even physician prescribing habits. I've worked on such tasks, building simple and complex Markov chains to create models and forecasts. Indeed, TraMineR can produce Markov chain transition matrices to support such models.
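To make the Markov chain idea concrete, here is a minimal sketch in Python rather than R, so it is not TraMineR code. It estimates a first-order transition matrix from purchase histories by counting consecutive pairs, then picks the most probable next product. The product codes and the helper names `transition_matrix` and `next_product` are hypothetical, chosen only for illustration. In TraMineR itself, transition rates come from `seqtrate()`.

```python
from collections import defaultdict


def transition_matrix(sequences):
    """Estimate first-order transition probabilities P(next | current).

    Counts every consecutive (current, next) pair across all sequences,
    then normalises each row so its probabilities sum to 1.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    probs = {}
    for current, row in counts.items():
        total = sum(row.values())
        probs[current] = {nxt: n / total for nxt, n in row.items()}
    return probs


def next_product(probs, current):
    """Return the most probable next state after `current`, or None if
    `current` never appears as the start of a transition."""
    row = probs.get(current)
    if not row:
        return None
    return max(row, key=row.get)


# Hypothetical purchase histories, one list per customer, in purchase order.
purchases = [["A", "B", "C"], ["A", "B", "B"], ["A", "C"], ["B", "C"]]
P = transition_matrix(purchases)
print(next_product(P, "A"))  # → B
```

The example uses a first-order chain, so the next purchase depends only on the most recent one. That keeps the counts dense for small data. A higher-order chain would condition on longer histories at the cost of many more, sparser rows.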

The code we will examine does the hard work of creating, counting, and plotting the various combinations of transitions over time, and it can also incorporate covariates. That will be our focus, but keep in mind that you can also build a dissimilarity matrix for clustering. The practical exercise covers the following core features:

  • Transition rates
  • Duration within each state
  • Sequence frequency
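The last two features reduce to simple counting. The following minimal sketch, again in Python rather than TraMineR's R, assumes each sequence lists one state per period, for example a customer's status each month. It counts how often each complete sequence occurs and averages the number of periods spent in each state. The state labels and the helper names `sequence_frequency` and `mean_time_in_state` are hypothetical. This illustrates the idea only and does not reproduce any particular TraMineR function.

```python
from collections import Counter


def sequence_frequency(sequences):
    """Count how often each complete sequence occurs, most common first.

    Lists are converted to tuples so they can be used as Counter keys.
    """
    return Counter(tuple(seq) for seq in sequences).most_common()


def mean_time_in_state(sequences):
    """Average number of periods a sequence spends in each state.

    A state absent from a sequence contributes 0 periods for that sequence,
    so the values sum to the (common) sequence length.
    """
    totals = Counter()
    for seq in sequences:
        totals.update(seq)
    n = len(sequences)
    return {state: count / n for state, count in totals.items()}


# Hypothetical monthly states for four customers over three periods.
states = [["A", "A", "B"], ["A", "A", "B"], ["A", "B", "B"], ["B", "B", "B"]]
freq = sequence_frequency(states)
mean_time = mean_time_in_state(states)
print(freq[0])  # → (('A', 'A', 'B'), 2)
```

Averaging total time per state is only one reading of "duration". Measuring the length of each unbroken spell in a state is a common alternative and would need run-length counting instead.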

Let's get started.
