Effective CRM Using Predictive Analytics
by Antonios Chorianopoulos
Cover
Title page
Preface
Acknowledgments
1 An overview of data mining: The applications, the methodology, the algorithms, and the data
1.1 The applications
1.2 The methodology
1.3 The algorithms
1.4 The data
1.5 Summary
Part I: The Methodology
2 Classification modeling methodology
2.1 An overview of the methodology for classification modeling
2.2 Business understanding and design of the process
2.3 Data understanding, preparation, and enrichment
2.4 Classification modeling
2.5 Model evaluation
2.6 Model deployment
2.7 Using classification models in direct marketing campaigns
2.8 Acquisition modeling
2.9 Cross-selling modeling
2.10 Offer optimization with next best product campaigns
2.11 Deep-selling modeling
2.12 Up-selling modeling
2.13 Voluntary churn modeling
2.14 Summary of what we’ve learned so far: it’s not about the tool or the modeling algorithm. It’s about the methodology and the design of the process
3 Behavioral segmentation methodology
3.1 An introduction to customer segmentation
3.2 An overview of the behavioral segmentation methodology
3.3 Business understanding and design of the segmentation process
3.4 Data understanding, preparation, and enrichment
3.5 Identification of the segments with cluster modeling
3.6 Evaluation and profiling of the revealed segments
3.7 Deployment of the segmentation solution, design and delivery of differentiated strategies
3.8 Summary
Part II: The Algorithms
4 Classification algorithms
4.1 Data mining algorithms for classification
4.2 An overview of Decision Trees
4.3 The main steps of Decision Tree algorithms
4.4 CART, C5.0/C4.5, and CHAID and their attribute selection measures
4.5 Bayesian networks
4.6 Naïve Bayesian networks
4.7 Bayesian belief networks
4.8 Support vector machines
4.9 Summary
5 Segmentation algorithms
5.1 Segmenting customers with data mining algorithms
5.2 Principal components analysis
5.3 Clustering algorithms
5.4 Summary
Part III: The Case Studies
6 A voluntary churn propensity model for credit card holders
6.1 The business objective
6.2 The mining approach
6.3 The data dictionary
6.4 The data preparation procedure
6.5 Derived fields: the final data dictionary
6.6 The modeling procedure
6.7 Understanding and evaluating the models
6.8 Model deployment: using churn propensities to target the retention campaign
6.9 The voluntary churn model revisited using RapidMiner
6.10 Developing the churn model with Data Mining for Excel
6.11 Summary
7 Value segmentation and cross-selling in retail
7.1 The business background and objective
7.2 An outline of the data preparation procedure
7.3 The data dictionary
7.4 The data preparation procedure
7.5 The data dictionary of the modeling file
7.6 Value segmentation
7.7 The recency, frequency, and monetary (RFM) analysis
7.8 The RFM cell segmentation procedure
7.9 Setting up a cross-selling model
7.10 The mining approach
7.11 The modeling procedure
7.12 Browsing the model results and assessing the predictive accuracy of the classifiers
7.13 Deploying the model and preparing the cross-selling campaign list
7.14 The retail case study using RapidMiner
7.15 Building the cross-selling model with Data Mining for Excel
7.16 Summary
8 Segmentation application in telecommunications
8.1 Mobile telephony: the business background and objective
8.2 The segmentation procedure
8.3 The data preparation procedure
8.4 The data dictionary and the segmentation fields
8.5 The modeling procedure
8.6 Segmentation using RapidMiner and K-means cluster
8.7 Summary
Bibliography
Index
End User License Agreement
List of Tables
Chapter 01
Table 1.1 Data mining models and direct marketing campaigns
Table 1.2 The CRISP-DM phases
Table 1.3 The minimum required data for the mining datamart of retail banking
Table 1.4 The minimum required data for the mining datamart of mobile telephony (residential customers)
Table 1.5 The minimum required data for the mining datamart of retailers
Chapter 02
Table 2.1 The classification methodology
Table 2.2 A modeling dataset partitioned into training and testing samples
Table 2.3 A modeling dataset with fourfold for Cross Validation
Table 2.4 A class-imbalanced modeling file for cross-selling
Table 2.5 The balanced modeling file
Table 2.6 A class-imbalanced modeling file with weights
Table 2.7 Confusion matrix
Table 2.8 The gains, response, and lift table
Table 2.9 The Profit report generated by Data Mining for Excel
Table 2.10 The schema used for testing both the model and the offer of a direct marketing campaign
Table 2.11 Marketing applications and campaigns that can be supported by classification modeling
Chapter 03
Table 3.1 The behavioral segmentation methodology
Table 3.2 The table of cluster centers
Chapter 04
Table 4.1 The modeling dataset for the Decision Tree example
Table 4.2 A cross-tab of the outcome with the profession input
Table 4.3 The Modeler CHAID parameters
Table 4.4 The Modeler C5.0 parameters
Table 4.5 The Modeler CART parameters
Table 4.6 The RapidMiner Decision Tree parameters
Table 4.7 The Microsoft Decision Trees parameters
Table 4.8 The Modeler Bayesian networks parameters
Table 4.9 The Microsoft Naïve Bayes parameters
Table 4.10 The IBM SPSS Modeler SVM parameters
Table 4.11 The RapidMiner SVM parameters
Chapter 05
Table 5.1 Behavioral fields used in the PCA example
Table 5.2 Pairwise correlation coefficients among inputs
Table 5.3 The PCA model parameters in IBM SPSS Modeler
Table 5.4 The PCA model parameters in RapidMiner
Table 5.5 IBM SPSS Modeler K-means parameters
Table 5.6 RapidMiner’s K-means and K-medoid parameters
Table 5.7 The cluster parameters of Data Mining for Excel
Table 5.8 The IBM SPSS Modeler TwoStep cluster parameters
Chapter 06
Table 6.1 Data dictionary (card level) for the voluntary churn model
Table 6.2 Data dictionary (customer level) for the voluntary churn model
Table 6.3 The churn models parameter settings
Table 6.4 The accuracy and the error rate of the churn Decision Tree model
Table 6.5 The Confusion matrix of the Decision Tree model
Table 6.6 The Gains table and the top 20 percentiles of the Microsoft Decision Tree churn model
Chapter 07
Table 7.1 Initial transactional data retrieved
Table 7.2 Pivoting transactional data to generate new fields per product group, day and time zone of transaction
Table 7.3 Aggregating at a transaction (invoice) level
Table 7.4 Aggregating at a customer (Card ID) level
Table 7.5 The transactional input data
Table 7.6 The demographical input data
Table 7.7 The data dictionary of the modeling file
Table 7.8 Value-based segments and total purchase amount
Table 7.9 The parameters of the cross-selling models.
Table 7.10 Decision Tree model parameters
Table 7.11 The accuracy and the error rate for the generated Decision Tree models
Table 7.12 The confusion matrix for the BDE Decision Tree model
Table 7.13 The confusion matrix for the Entropy Decision Tree model
Table 7.14 The Gains table and the top 20 percentiles of the two Decision Tree models
Chapter 08
Table 8.1 Mobile telephony usage aspects that were investigated in the behavioral segmentation
Table 8.2 Mobile telephony segmentation fields
Table 8.3 The PCA model parameter settings
Table 8.4 Deciding the number of extracted components by examining the variance explained table
Table 8.5 Understanding and labeling the components through rotated component matrix
Table 8.6 The interpretation and labeling of the derived components
Table 8.7 The parameter settings of the cluster model
Table 8.8 The means of some original attributes of importance for each cluster
Table 8.9 The mobile telephony segments
List of Illustrations
Chapter 01
Figure 1.1
Data mining and customer life cycle management.
Chapter 02
Figure 2.1
The data setup and time frames in a classification model trained on historical data.
Figure 2.2
The data setup and time frames in a churn model
Figure 2.3
Using multiple time frames in model training
Figure 2.4
The Data Audit node of IBM SPSS Modeler for data exploration
Figure 2.5
The IBM SPSS Modeler’s Partition node for Split validation
Figure 2.6
A Modeler stream with Split validation
Figure 2.7
The parameters of the Split Validation operator in RapidMiner
Figure 2.8
The Split Validation operator and the corresponding subprocesses in RapidMiner
Figure 2.9
The Split step in the Classify Wizard of the Data Mining for Excel
Figure 2.10
The RapidMiner Cross Validation operator
Figure 2.11
The subprocesses created by the RapidMiner Cross-Validation operator
Figure 2.12
The Cross validation wizard in the Data Mining for Excel
Figure 2.13
The Balance node options of IBM SPSS Modeler
Figure 2.14
The Balance node in a Modeler stream
Figure 2.15
Balancing with the Sample operator of RapidMiner
Figure 2.16
Balancing in a RapidMiner process
Figure 2.17
Balancing in Data Mining for Excel
Figure 2.18
Using the Sample wizard in Data Mining for Excel for Split validation and for balancing only the training file
Figure 2.19
Applying class weight in IBM SPSS Modeler Decision Trees
Figure 2.20
The Generate Weight (Stratification) used in RapidMiner for class weighting
Figure 2.21
The Auto Classifier node of IBM SPSS Modeler for simultaneous training of multiple learners
Figure 2.22
The Auto Classifier node of IBM SPSS Modeler in action
Figure 2.23
Selecting the classification algorithm and setting its parameters in Excel
Figure 2.24
Training a classifier with RapidMiner
Figure 2.25
The Bagging operator in RapidMiner
Figure 2.26
The Boosting menu in IBM SPSS Modeler
Figure 2.27
The base Decision Tree models generated by Boosting with 10 iterations
Figure 2.28
Random Forests in RapidMiner
Figure 2.29
The Analysis node in IBM SPSS Modeler for validation with Confusion matrix
Figure 2.30
The Analysis node in a Modeler stream
Figure 2.31
RapidMiner’s Performance operator
Figure 2.32
The Performance operator in a RapidMiner process
Figure 2.33
RapidMiner’s Confusion matrix as generated by the Performance operator
Figure 2.34
The Classification Matrix in Data Mining for Excel
Figure 2.35
The Confusion matrix in Data Mining for Excel
Figure 2.36
Gains chart
Figure 2.37
Response chart
Figure 2.38
Lift chart
Figure 2.39
The Evaluation Modeler node for Gains, Lift, and Response charts
Figure 2.40
A Modeler Gains chart for the comparison of a series of models
Figure 2.41
RapidMiner’s Create Lift Chart operator
Figure 2.42
The Data Mining for Excel Accuracy Chart wizard
Figure 2.43
A sample Data Mining for Excel Gains chart
Figure 2.44
The Compare ROCs operator of RapidMiner
Figure 2.45
Comparing a series of classifiers with the Compare ROCs operator of RapidMiner
Figure 2.46
Example of a ROC curve
Figure 2.47
The IBM SPSS Modeler Evaluation node for building Profit/ROI charts
Figure 2.48
Example of a Profit chart for a classifier
Figure 2.49
Example of a Profit chart for a classifier
Figure 2.50
The Data Mining for Excel Profit chart wizard
Figure 2.51
A Profit chart generated by Data Mining for Excel Profit chart wizard
Figure 2.52
The measured response rate of a cross-selling campaign by group
Figure 2.53
The measured churn rate of a retention campaign by group
Figure 2.54
Scoring with a generated model in IBM SPSS Modeler
Figure 2.55
The classifier’s scores derived by IBM SPSS Modeler
Figure 2.56
The Apply Model RapidMiner operator for scoring with a classifier
Figure 2.57
The RapidMiner prediction fields
Figure 2.58
The Create Threshold operator for setting a propensity cutoff value
Figure 2.59
Setting a propensity threshold with the Create Threshold operator
Figure 2.60
The Query wizard used in Data Mining for Excel for model deployment
Figure 2.61
Selecting the score fields from the Query Wizard menu
Figure 2.62
Deploying a classifier in Data Mining for Excel
Figure 2.63
The pilot campaign approach for acquisition modeling
Figure 2.64
The profiling approach for acquisition modeling
Figure 2.65
The pilot campaign approach for cross-selling modeling
Figure 2.66
The product uptake approach for cross-selling modeling
Figure 2.67
The profiling of owners approach for cross-selling modeling
Figure 2.68
The next best offer approach
Figure 2.69
The pilot campaign approach for deep-selling modeling
Figure 2.70
The product usage increase approach for deep-selling modeling
Figure 2.71
The profiling of customers with high usage approach for deep-selling modeling
Figure 2.72
The pilot campaign approach for up-selling modeling
Figure 2.73
The product upgrade approach for up-selling modeling
Figure 2.74
The profiling of “premium” product owners approach for up-selling modeling
Figure 2.75
The approach for building a voluntary churn model
Chapter 03
Figure 3.1
The Replace Missing Values operator of RapidMiner for handling missing values
Figure 3.2
The Detect Outlier operator of RapidMiner
Figure 3.3
The Auto Data Prep node of IBM SPSS Modeler for data cleansing and preparation
Figure 3.4
The Anomaly node of IBM SPSS Modeler for outlier detection
Figure 3.5
Setting the thresholds of acceptable values in the Clean Data wizard of Excel
Figure 3.6
Handling outliers with the Clean Data wizard of Excel
Figure 3.7
The Normalize operator of RapidMiner
Figure 3.8
The PCA operator of RapidMiner
Figure 3.9
The PCA/Factor node of Modeler
Figure 3.10
The IBM SPSS Modeler Auto Cluster node for training cluster models
Figure 3.11
The RapidMiner operator for K-means clustering
Figure 3.12
The Cluster wizard for building cluster models in Data Mining for Excel
Figure 3.13
The Silhouette coefficient used by IBM SPSS Modeler to assess the cluster solution
Figure 3.14
Comparing cluster models with the Silhouette coefficient in IBM SPSS Modeler
Figure 3.15
Assessing the average centroid distances of a cluster solution in RapidMiner using the Cluster Distance Performance
Figure 3.16
Using the Cluster Distance Performance operator in RapidMiner to assess a cluster solution
Figure 3.17
Graphical representation of the cluster centers or centroids
Figure 3.18
The centroids table in RapidMiner
Figure 3.19
Modeler’s representation of the cluster centroids
Figure 3.20
Cluster comparison with boxplots in IBM SPSS Modeler
Figure 3.21
The Data Mining for Excel Cluster Diagram view
Figure 3.22
The Cluster Profiles view of Data Mining for Excel with the cluster sizes and centroids
Figure 3.23
The Cluster Characteristics view of Data Mining for Excel for cluster profiling
Figure 3.24
The Cluster Discrimination view of Data Mining for Excel for comparing the cluster with its complement
Figure 3.25
Combining data mining and market research-driven segmentations.
Figure 3.26
Assigning instances into clusters with a generated cluster model in IBM SPSS Modeler
Figure 3.27
Segmenting instances using a trained RapidMiner model
Chapter 04
Figure 4.1
A simple Decision Tree model
Figure 4.2
The reduction of impurity for the four possible splits based on the Gini index
Figure 4.3
The first level of an IBM SPSS Modeler CART
Figure 4.4
The Information gain measures for the four possible splits
Figure 4.5
A simple, 2-level C5.0 Decision Tree
Figure 4.6
Selecting predictors for split using the CHAID algorithm and the chi-square statistic
Figure 4.7
A simple, 2-level CHAID Decision Tree
Figure 4.8
The IBM SPSS Modeler CHAID growing algorithm options
Figure 4.9
The Modeler CHAID Advanced options
Figure 4.10
The Modeler CHAID Stopping Rules
Figure 4.11
The Modeler C5.0 Model settings
Figure 4.12
The Modeler CART Basics options
Figure 4.13
The Modeler CART Advanced options
Figure 4.14
The Modeler CART Stopping criteria
Figure 4.15
The RapidMiner Decision Tree options
Figure 4.16
The Microsoft Decision Tree parameters
Figure 4.17
A RapidMiner process for training a Naïve Bayes classification model
Figure 4.18
The prediction and probabilities estimated from a RapidMiner Naïve Bayes classification model
Figure 4.19
The Microsoft Naïve Bayes algorithm in the Classify Wizard of Data Mining for Excel
Figure 4.20
The Microsoft Naïve Bayes Dependency network in Data Mining for Excel
Figure 4.21
The Data Mining for Excel Attribute Profiles report presents the conditional probabilities used by the Naïve Bayes algorithm
Figure 4.22
The Data Mining for Excel Attribute Characteristics report sorts the input categories according to their conditional probabilities for the selected target class
Figure 4.23
The structure of Tree Augmented Naïve Bayes models (TAN Bayesian models)
Figure 4.24
A TAN Bayesian network built in IBM SPSS Modeler
Figure 4.25
The probabilities of the output (response to pilot campaign)
Figure 4.26
The conditional probabilities of the profession input attribute
Figure 4.27
The conditional probabilities of the voice calls input attribute
Figure 4.28
The conditional probabilities of the SMS calls input attribute
Figure 4.29
The conditional probabilities of the gender input attribute
Figure 4.30
The training dataset of the TAN Bayesian network
Figure 4.31
The IBM SPSS Modeler Bayesian networks model options
Figure 4.32
The IBM SPSS Modeler Bayesian networks model Expert options
Figure 4.33
The Microsoft Naïve Bayes parameters
Figure 4.34
A set of separating lines in the input feature space, in the case of linearly separable data
Figure 4.35
A separating hyperplane with the support vectors and the margin in a classification problem with two inputs and linear data
Figure 4.36
An alternative separating hyperplane with its support vectors and its margin for the same linear data
Figure 4.37
The IBM SPSS Modeler SVM settings
Figure 4.38
The RapidMiner RBF Support Vector Machine model options
Chapter 05
Figure 5.1
Linear correlation between two continuous measures
Figure 5.2
The “Total Variance Explained” part of the Modeler PCA output for the fixed telephony data
Figure 5.3
The eigenvalues of the RapidMiner PCA model results
Figure 5.4
An orthogonal rotation of the derived components
Figure 5.5
The rotated component matrix of the Modeler PCA model for the interpretation of the components
Figure 5.6
The PCA model settings in IBM SPSS Modeler
Figure 5.7
The PCA model settings in RapidMiner
Figure 5.8
An illustration of the K-means clustering process
Figure 5.9
The distribution of the five clusters revealed by Modeler’s K-means algorithm on the fixed telephony data
Figure 5.10
The centroids of the fixed telephony clusters
Figure 5.11
IBM SPSS Modeler K-means options
Figure 5.12
RapidMiner’s K-means options
Figure 5.13
The cluster options of Data Mining for Excel
Figure 5.14
The IBM SPSS Modeler TwoStep cluster options
Chapter 06
Figure 6.1
The three different time periods in the model training phase
Figure 6.2
Aggregating usage data per customer
Figure 6.3
Flagging cards which were open within the observation period
Figure 6.4
Filtering out cards closed before the observation period
Figure 6.5
Flagging cards open at the end of the observation period. The REFERENCE DATE is the last day of the observation period (2012-12-31)
Figure 6.6
Calculating a new field denoting the total number of transactions over the 12 months of the observation period
Figure 6.7
Using an IBM SPSS Modeler node for aggregating card data at the customer level
Figure 6.8
Deriving customer tenure
Figure 6.9
Deriving monthly average spending amount for each customer
Figure 6.10
Calculating the spending recency for each customer
Figure 6.11
Capturing changes of spending with spending amount delta
Figure 6.12
Capturing trends in card ownership
Figure 6.13
Selecting the modeling population
Figure 6.14
Discarding short-term churners from the model
Figure 6.15
Defining the target field and population
Figure 6.16
The IBM SPSS Modeler procedure for churn modeling
Figure 6.17
Partitioning the modeling dataset for evaluation purposes
Figure 6.18
Balancing the distribution of the target field
Figure 6.19
The initial and the “balanced” distribution of the CHURN field in the training partition
Figure 6.20
Setting the role of the fields in the propensity model
Figure 6.21
The Auto-Classifier node used for building multiple propensity models
Figure 6.22
The propensity models trained for churn prediction
Figure 6.23
Performance metrics for the candidate models
Figure 6.24
The Gains chart of the candidate models
Figure 6.25
The importance of the predictors
Figure 6.26
The CHAID model (tree format)
Figure 6.27
The CHAID model (rules format)
Figure 6.28
The Gains chart of the ensemble model
Figure 6.29
The Lift chart of the ensemble model
Figure 6.30
The Confusion matrix of the ensemble model
Figure 6.31
The voluntary churn model scoring procedure
Figure 6.32
Scoring active customers and calculating churn propensities
Figure 6.33
Churn propensity-based segments
Figure 6.34
The RapidMiner modeling process
Figure 6.35
The Set Role operator used for defining the role of the attributes in the model
Figure 6.36
The Split validation settings
Figure 6.37
The Split Validation operator for partitioning the modeling dataset
Figure 6.38
The RapidMiner model Confusion matrix
Figure 6.39
The Naïve Bayes model’s ROC curve
Figure 6.40
The Lift chart of the Naïve Bayes model
Figure 6.41
The RapidMiner model predictions and estimated propensities for each customer
Figure 6.42
Selecting the source data for model training in the Classify Wizard of Data Mining for Excel
Figure 6.43
Choosing the predictors and the target in the Classify Wizard of Data Mining for Excel
Figure 6.44
Setting the parameters for the Decision Tree models in Data Mining for Excel
Figure 6.45
Applying a Split validation method in Data Mining for Excel
Figure 6.46
Storing the mining structure and the model
Figure 6.47
The Microsoft decision tree churn model
Figure 6.48
Selecting the model to evaluate in Data Mining for Excel
Figure 6.49
Selecting the target field for validation in Data Mining for Excel
Figure 6.50
Selecting the validation dataset in the Classification Matrix wizard of Data Mining for Excel
Figure 6.51
Requesting a Gains chart through the Accuracy Chart wizard of Data Mining for Excel
Figure 6.52
The Gains chart of the Microsoft Decision Tree churn model
Figure 6.53
The cumulative distribution of churners and nonchurners across the propensity percentiles
Figure 6.54
Using the Query wizard to deploy the model and score customers in Data Mining for Excel
Figure 6.55
The model estimates and the scored file
Chapter 07
Figure 7.1
Categorizing transactions into time zones
Figure 7.2
Pivoting paid amount based on payment type
Figure 7.3
Aggregating transactional data at a customer level
Figure 7.4
Adding the demographics using a Merge node
Figure 7.5
Calculating the relative spending KPIs
Figure 7.6
Deriving the average basket size
Figure 7.7
Constructing recency, the first RFM component
Figure 7.8
Constructing frequency, the second RFM component
Figure 7.9
Constructing the monetary component of RFM
Figure 7.10
Using a Data Audit node to perform an initial exploration of the data
Figure 7.11
An illustration of the value-based segmentation procedure
Figure 7.12
The IBM SPSS Modeler stream for value and RFM segmentation
Figure 7.13
Using a Binning node for grouping customers in groups of 5%
Figure 7.14
Regrouping quantiles into value segments
Figure 7.15
Assignment to the RFM cells.
Figure 7.16
The total RFM cells in the case of binning into quintiles (groups of 20%).
Figure 7.17
The distribution of the constructed RFM cells
Figure 7.18
An RFM scatter plot
Figure 7.19
An outline of the mining procedure followed for the development of the cross-selling model
Figure 7.20
The IBM SPSS Modeler procedure for cross-sell modeling
Figure 7.21
Merging campaign response data with the rest of the information
Figure 7.22
Partitioning the modeling dataset for evaluation purposes
Figure 7.23
Setting the role of the fields in the propensity model
Figure 7.24
Building multiple classification models with Auto Classifier
Figure 7.25
The classification models trained
Figure 7.26
Performance metrics for the individual models
Figure 7.27
The Gains chart for the cross-selling models
Figure 7.28
The first three levels of the generated C5.0 model
Figure 7.29
The TAN Bayesian network
Figure 7.30
The conditional probabilities for the FLAG_GROCERY attribute
Figure 7.31
The ROI chart of the ensemble model
Figure 7.32
The Gains chart of the ensemble model
Figure 7.33
The Modeler deployment stream
Figure 7.34
The model generated fields for the scored customers
Figure 7.35
The RapidMiner process for value segmentation and RFM analysis
Figure 7.36
The Discretize by frequency operator applied for value segmentation
Figure 7.37
The percentage of the total purchase amount for each value segment
Figure 7.38
Constructing RFM cells by concatenating the relevant binned attributes
Figure 7.39
The RapidMiner process file for the cross-selling model
Figure 7.40
The Split validation settings
Figure 7.41
The Split Validation operator for partitioning the modeling dataset
Figure 7.42
The Bagging procedure for building five separate Decision trees
Figure 7.43
The Decision Tree model parameter settings
Figure 7.44
The Decision Tree model in tree format
Figure 7.45
The Confusion matrix
Figure 7.46
The model’s ROC curve
Figure 7.47
The model deployment process
Figure 7.48
The prediction fields derived by the RapidMiner Decision Tree model
Figure 7.49
Selecting the source data for model training in the Classify Wizard of Data Mining for Excel
Figure 7.50
Assigning roles to the model fields in the Classify Wizard of Data Mining for Excel
Figure 7.51
Setting the parameters for the Decision Tree models
Figure 7.52
Applying a Split validation method
Figure 7.53
Storing the mining structure and model
Figure 7.54
The Dependency network of the BDE Decision Tree model
Figure 7.55
The cross-selling BDE Decision Tree model of Data Mining for Excel
Figure 7.56
Selecting the model to evaluate in Data Mining for Excel
Figure 7.57
Selecting the target field for validation in the Classification Matrix wizard of Data Mining for Excel
Figure 7.58
Selecting the validation dataset in the Classification Matrix wizard of Data Mining for Excel
Figure 7.59
Selecting the target class in the Accuracy Chart wizard of Data Mining for Excel
Figure 7.60
The Gains charts for the two Decision tree models in Data Mining for Excel
Figure 7.61
The cumulative distribution of responders and nonresponders across propensity percentiles
Figure 7.62
Using the Query wizard to deploy the model and score customers
Figure 7.63
The model estimates and the scored file
Chapter 08
Figure 8.1
Core segments in mobile telephony.
Figure 8.2
The data preparation Modeler stream
Figure 8.3
The IBM SPSS Modeler procedure for segmentation through clustering
Figure 8.4
An initial comparison of the generated cluster models provided by the Auto Cluster model nugget
Figure 8.5
The Silhouette measure of the cluster model
Figure 8.6
The distribution of the revealed clusters
Figure 8.7
The table of centroids of the five clusters
Figure 8.8
The distribution of factors for Cluster 5
Figure 8.9
Cluster 1 profiling chart
Figure 8.10
Cluster 2 profiling chart
Figure 8.11
Cluster 3 profiling chart
Figure 8.12
Cluster 4 profiling chart
Figure 8.13
Cluster 5 profiling chart
Figure 8.14
The RapidMiner process for clustering
Figure 8.15
The PCA model settings
Figure 8.16
The variance explained by the components
Figure 8.17
The K-means parameter settings
Figure 8.18
The distribution of the K-means clusters
Figure 8.19
Evaluating the cluster solution using within centroid distances
Figure 8.20
A profiling chart of the revealed clusters