Index

Symbols

2-D Gaussian distributions, Profiling: Finding Typical Behavior
“and” operator, Applying Bayes’ Rule to Data Science

A

A Taxonomy of Privacy (Solove), Privacy, Ethics, and Mining Data About Individuals
Aberfeldy single malt scotch, Understanding the Results of Clustering
Aberlour single malt whiskey, Example: Whiskey Analytics
absolute errors, Regression via Mathematical Functions
accuracy (term), Plain Accuracy and Its Problems
accuracy results, From Holdout Evaluation to Cross-Validation
ACM SIGKDD, Superior Data Scientists, Is There More to Data Science?
ad impressions, Example: Targeting Online Consumers With Advertisements
adding variables to functions, Example: Overfitting Linear Functions
advertising, Example: Targeting Online Consumers With Advertisements
agency, Machine Learning and Data Mining
alarms, Evaluating Classifiers
algorithms
    clustering, Nearest Neighbors Revisited: Clustering Around Centroids
    data mining, From Business Problems to Data Mining Tasks
    k-means, Nearest Neighbors Revisited: Clustering Around Centroids
    modeling, A General Method for Avoiding Overfitting
Amazon, The Ubiquity of Data Opportunities, Data Science, Engineering, and Data-Driven Decision Making, From Big Data 1.0 to Big Data 2.0, Data and Data Science Capability as a Strategic Asset, Similarity, Neighbors, and Clusters
    Borders vs., Achieving Competitive Advantage with Data Science
    cloud storage, Thinking Data-Analytically, Redux
    data science services provided by, Thinking Data-Analytically, Redux
    historical advantages of, Formidable Historical Advantage
analysis
    counterfactual, From Business Problems to Data Mining Tasks
    learning curves and, Learning Curves
analytic engineering, Decision Analytic Thinking II: Toward Analytical Engineering – From an Expected Value Decomposition to a Data Science Solution
    churn example, Our Churn Example Revisited with Even More Sophistication – From an Expected Value Decomposition to a Data Science Solution
    expected value decomposition and, From an Expected Value Decomposition to a Data Science Solution
    incentives, assessing influence of, Assessing the Influence of the Incentive
    providing structure for business problem/solutions with, The Expected Value Framework: Decomposing the Business Problem and Recomposing the Solution Pieces
    selection bias, A Brief Digression on Selection Bias
    targeting best prospects with, Targeting the Best Prospects for a Charity Mailing – A Brief Digression on Selection Bias
analytic skills, software skills vs., Implications for Managing the Data Science Team
analytic solutions, Data Mining and Data Science, Revisited
analytic techniques, Other Analytics Techniques and Technologies – Answering Business Questions with These Techniques, Decision Analytic Thinking I: What Is a Good Model? – Summary
    applying to business questions, Answering Business Questions with These Techniques
    baseline performance and, Evaluation, Baseline Performance, and Implications for Investments in Data
    classification accuracy, Plain Accuracy and Its Problems – Generalizing Beyond Classification
    confusion matrix, The Confusion Matrix
    data warehousing, Data Warehousing
    database queries, Database Querying
    expected values, A Key Analytical Framework: Expected Value – Costs and benefits
    generalization methods for, Generalizing Beyond Classification
    machine learning and, Machine Learning and Data Mining
    OLAP, Database Querying
    regression analysis, Regression Analysis
    statistics, Statistics
analytic technologies, Data Preparation
analytic tools, Holdout Data and Fitting Graphs
Angry Birds, Example: Evidence Lifts from Facebook “Likes”
Annie Hall (film), Data Reduction, Latent Information, and Movie Recommendation
Apollo 13 (film), Examine Data Science Case Studies
Apple Computer, Example: Clustering Business News Stories – The news story clusters, The Data
applications, The Ubiquity of Data Opportunities, Decision Analytic Thinking I: What Is a Good Model?
area under ROC curves (AUC), The Area Under the ROC Curve (AUC), Example: Performance Analytics for Churn Modeling
Armstrong, Louis, Example: Jazz Musicians
assessing overfitting, Overfitting
association discovery, Co-occurrences and Associations: Finding Items That Go Together – Associations Among Facebook Likes
    among Facebook Likes, Associations Among Facebook Likes
    beer and lottery example, Example: Beer and Lottery Tickets
    eWatch/eBracelet example, Co-occurrences and Associations: Finding Items That Go Together
    Magnum Opus system for, Associations Among Facebook Likes
    market basket analysis, Associations Among Facebook Likes
    surprisingness, Measuring Surprise: Lift and Leverage
AT&T, From an Expected Value Decomposition to a Data Science Solution
attribute selection, Introduction to Predictive Modeling: From Correlation to Supervised Segmentation, Selecting Informative Attributes – Supervised Segmentation with Tree-Structured Models, Example: Attribute Selection with Information Gain, The Fundamental Concepts of Data Science
attributes, Models, Induction, and Prediction
    finding, Introduction to Predictive Modeling: From Correlation to Supervised Segmentation
    heterogeneous, Dimensionality and domain knowledge, Heterogeneous Attributes
    variable features vs., Models, Induction, and Prediction
Audubon Society Field Guide to North American Mushrooms, Example: Attribute Selection with Information Gain
automatic decision-making, Data Science, Engineering, and Data-Driven Decision Making
average customers, profitable customers vs., Answering Business Questions with These Techniques

B

bag of words approach, Bag of Words
bags, Bag of Words
base rates, Class Probability Estimation and Logistic “Regression”, Holdout Data and Fitting Graphs, Problems with Unbalanced Classes
baseline classifiers, Advantages and Disadvantages of Naive Bayes
baseline methods, of data science, Summary
Basie, Count, Example: Jazz Musicians
Bayes rate, Bias, Variance, and Ensemble Methods
Bayes, Thomas, Bayes’ Rule
Bayesian methods, Bayes’ Rule, Summary
Bayes’ Rule, Bayes’ Rule – A Model of Evidence “Lift”
beer and lottery example, Example: Beer and Lottery Tickets
Beethoven, Ludwig van, Example: Evidence Lifts from Facebook “Likes”
beginning cross-validation, From Holdout Evaluation to Cross-Validation
behavior description, From Business Problems to Data Mining Tasks
Being John Malkovich (film), Data Reduction, Latent Information, and Movie Recommendation
BellKor’s Pragmatic Chaos (Netflix Challenge team), Data Reduction, Latent Information, and Movie Recommendation
benefit improvement, calculating, Costs and benefits
benefits
    and underlying profit calculation, ROC Graphs and Curves
    data-driven decision-making, Data Science, Engineering, and Data-Driven Decision Making
    estimating, Costs and benefits
    in budgeting, Ranking Instead of Classifying
    nearest-neighbor methods, Computational efficiency
bi-grams, N-gram Sequences
bias errors, ensemble methods and, Bias, Variance, and Ensemble Methods
Big Data
    data science and, Data Processing and “Big Data”
    evolution of, From Big Data 1.0 to Big Data 2.0
    on Amazon and Google, Thinking Data-Analytically, Redux
big data technologies, Data Processing and “Big Data”
    state of, From Big Data 1.0 to Big Data 2.0
    utilizing, Data Processing and “Big Data”
Big Red proposal example, Example Data Mining Proposal – Flaws in the Big Red Proposal
Bing, Why Text Is Important, Representation
Black-Scholes model, Models, Induction, and Prediction
blog postings, Why Text Is Important
blog posts, Example: Targeting Online Consumers With Advertisements
Borders (book retailer), Achieving Competitive Advantage with Data Science
breast cancer example, Example: Logistic Regression versus Tree Induction
Brooks, David, What Data Can’t Do: Humans in the Loop, Revisited
browser cookies, Example: Targeting Online Consumers With Advertisements
Brubeck, Dave, Example: Jazz Musicians
Bruichladdich single malt scotch, Understanding the Results of Clustering
Brynjolfsson, Erik, Data Science, Engineering, and Data-Driven Decision Making, Data Processing and “Big Data”
budget, Ranking Instead of Classifying
budget constraints, Profit Curves
building modeling labs, From Holdout Evaluation to Cross-Validation
building models, Data Mining and Its Results, Business Understanding, From Holdout Evaluation to Cross-Validation
Bunnahabhain single malt whiskey, Example: Whiskey Analytics, Hierarchical Clustering
business news stories example, Example: Clustering Business News Stories – The news story clusters
business problems
    changing definition of, to fit available data, Changing the Way We Think about Solutions to Business Problems
    data exploration vs., Stepping Back: Solving a Business Problem Versus Data Exploration
    engineering problems vs., Other Data Science Tasks and Techniques
    evaluating in a proposal, Be Ready to Evaluate Proposals for Data Science Projects
    expected value framework, structuring with, The Expected Value Framework: Structuring a More Complicated Business Problem
    exploratory data mining vs., The Fundamental Concepts of Data Science
    unique context of, What Data Can’t Do: Humans in the Loop, Revisited
    using expected values to provide framework for, The Expected Value Framework: Decomposing the Business Problem and Recomposing the Solution Pieces
business strategy, Data Science and Business Strategy – A Firm’s Data Science Maturity
    accepting creative ideas, Be Ready to Accept Creative Ideas from Any Source
    case studies, examining, Examine Data Science Case Studies
    competitive advantages, Achieving Competitive Advantage with Data Science, Sustaining Competitive Advantage with Data Science – Superior Data Science Management
    data scientists, evaluating, Superior Data Scientists
    evaluating proposals, Be Ready to Evaluate Proposals for Data Science Projects – Flaws in the Big Red Proposal
    historical advantages and, Formidable Historical Advantage
    intangible collateral assets and, Unique Intangible Collateral Assets
    intellectual property and, Unique Intellectual Property
    managing data scientists effectively, Superior Data Science Management
    maturity of the data science, A Firm’s Data Science Maturity
    thinking data-analytically for, Thinking Data-Analytically, Redux

C

Caesars Entertainment, Data and Data Science Capability as a Strategic Asset
call center example, Profiling: Finding Typical Behavior
Capability Maturity Model, A Firm’s Data Science Maturity
Capital One, Data and Data Science Capability as a Strategic Asset, From an Expected Value Decomposition to a Data Science Solution
Case-Based Reasoning, How Many Neighbors and How Much Influence?
cases
    creating, Deployment
    ranking vs. classifying, Visualizing Model Performance – Example: Performance Analytics for Churn Modeling
causal modeling, From Business Problems to Data Mining Tasks
causal analysis, Assessing the Influence of the Incentive
causal explanation, Data-Driven Causal Explanation and a Viral Marketing Example
causal radius, The Task
causation, correlation vs., The news story clusters
cellular churn example
    unbalanced classes in, Problems with Unbalanced Classes
    unequal costs and benefits in, Problems with Unequal Costs and Benefits
Census Bureau Economic Survey, Statistics
centroid locations, Nearest Neighbors Revisited: Clustering Around Centroids
centroid-based clustering, Example: Clustering Business News Stories
centroids, Nearest Neighbors Revisited: Clustering Around Centroids, Example: Clustering Business News Stories – The news story clusters
characteristics, Answering Business Questions with These Techniques
characterizing customers, Answering Business Questions with These Techniques
churn, Example: Predicting Customer Churn, Data Mining and Data Science, Revisited, Problems with Unbalanced Classes
    and expected value, Using Expected Value to Frame Classifier Evaluation
    finding variables, Data Mining and Data Science, Revisited
    performance analytics for modeling, Example: Performance Analytics for Churn Modeling
churn prediction, Thinking Data-Analytically, Redux
Ciccarelli, Francesca, Hierarchical Clustering
class confusion, The Confusion Matrix
class labels, * Logistic Regression: Some Technical Details
class membership, estimating likelihood of, Example: Targeting Online Consumers With Advertisements
class priors, Costs and benefits, ROC Graphs and Curves, Cumulative Response and Lift Curves
class probability, The Ubiquity of Data Opportunities, From Business Problems to Data Mining Tasks, Class Probability Estimation and Logistic “Regression” – Example: Logistic Regression versus Tree Induction, Bias, Variance, and Ensemble Methods
classes
    exhaustive, Conditional Independence and Naive Bayes
    mutually exclusive, Conditional Independence and Naive Bayes
    probability of evidence given, Conditional Independence and Naive Bayes
    separating, Example: Overfitting Linear Functions
classification, The Ubiquity of Data Opportunities, From Business Problems to Data Mining Tasks, Similarity, Neighbors, and Clusters
    Bayes’ Rule for, Applying Bayes’ Rule to Data Science
    building models for, Business Understanding
    ensemble methods and, Bias, Variance, and Ensemble Methods
    neighbors and, Classification
    regression and, From Business Problems to Data Mining Tasks
    supervised data mining and, Supervised Versus Unsupervised Methods
classification accuracy
    confusion matrix, The Confusion Matrix
    evaluating, with expected values, Using Expected Value to Frame Classifier Evaluation
    measurability of, Plain Accuracy and Its Problems
    unbalanced classes, Problems with Unbalanced Classes
    unequal costs/benefit ratios, Problems with Unequal Costs and Benefits
classification function, Linear Discriminant Functions
classification modeling, Generalizing Beyond Classification
classification tasks, From Business Problems to Data Mining Tasks
classification trees, Supervised Segmentation with Tree-Structured Models
    as sets of rules, Trees as Sets of Rules
    ensemble methods and, Bias, Variance, and Ensemble Methods
    in KDD Cup churn problem, Example: Performance Analytics for Churn Modeling
    inducing, Supervised Segmentation with Tree-Structured Models
    logistic regression and, The Churn Dataset Revisited
    predictive models and, Supervised Segmentation with Tree-Structured Models
    visualizing, Visualizing Segmentations
classifier accuracy, Plain Accuracy and Its Problems
classifiers
    and ROC graphs, ROC Graphs and Curves
    baseline, Advantages and Disadvantages of Naive Bayes
    confusion matrix produced by, Ranking Instead of Classifying
    conservative, ROC Graphs and Curves
    cumulative response curves of, Cumulative Response and Lift Curves
    discrete (binary), ROC Graphs and Curves
    inability to obtain accurate probability estimates from, Ranking Instead of Classifying
    lift of, Cumulative Response and Lift Curves
    linear, Classification via Mathematical Functions
    Naive Bayes, Conditional Independence and Naive Bayes
    operating conditions of, ROC Graphs and Curves
    performance de-coupled from conditions for, ROC Graphs and Curves
    permissive, ROC Graphs and Curves
    plus thresholds, Ranking Instead of Classifying
    random, Profit Curves
    scores given to instances by, Ranking Instead of Classifying
classifying cases, ranking vs., Ranking Instead of Classifying
climatology, Evaluation, Baseline Performance, and Implications for Investments in Data
clipping dendrograms, Hierarchical Clustering
cloud labor, Final Example: From Crowd-Sourcing to Cloud-Sourcing
clumps of instances, Example: Overfitting Linear Functions
cluster centers, Nearest Neighbors Revisited: Clustering Around Centroids
cluster distortion, Nearest Neighbors Revisited: Clustering Around Centroids
clustering, From Business Problems to Data Mining Tasks, Clustering – * Using Supervised Learning to Generate Cluster Descriptions, Representing and Mining Text
    algorithm, Nearest Neighbors Revisited: Clustering Around Centroids
    business news stories example, Example: Clustering Business News Stories – The news story clusters
    centroid-based, Example: Clustering Business News Stories
    creating, Hierarchical Clustering
    data preparation for, Data preparation
    hierarchical, Hierarchical Clustering
    indicating, Hierarchical Clustering
    interpreting results of, Understanding the Results of Clustering
    nearest neighbors and, Nearest Neighbors Revisited: Clustering Around Centroids
    profiling and, Profiling: Finding Typical Behavior
    soft, Profiling: Finding Typical Behavior
    supervised learning and, * Using Supervised Learning to Generate Cluster Descriptions
    whiskey example, Example: Whiskey Analytics Revisited – Hierarchical Clustering
clusters, Similarity, Neighbors, and Clusters, Understanding the Results of Clustering
co-occurrence grouping, From Business Problems to Data Mining Tasks, Co-occurrences and Associations: Finding Items That Go Together – Associations Among Facebook Likes
    beer and lottery example, Example: Beer and Lottery Tickets
    eWatch/eBracelet example, Co-occurrences and Associations: Finding Items That Go Together
    market basket analysis, Associations Among Facebook Likes
    surprisingness, Measuring Surprise: Lift and Leverage
Coelho, Paulo, Example: Evidence Lifts from Facebook “Likes”
cognition, Machine Learning and Data Mining
Coltrane, John, Example: Jazz Musicians
combining functions, Nearest Neighbors for Predictive Modeling, * Combining Functions: Calculating Scores from Neighbors
common tasks, From Business Problems to Data Mining Tasks
communication, between scientists and business people, Superior Data Science Management, The Fundamental Concepts of Data Science
company culture, as intangible asset, Unique Intangible Collateral Assets
comparisons, multiple, * Avoiding Overfitting for Parameter Optimization
complex functions, Overfitting in Mathematical Functions, Example: Overfitting Linear Functions
complexity, Learning Curves
complexity control, Overfitting Avoidance and Complexity Control – * Avoiding Overfitting for Parameter Optimization
    ensemble method and, Bias, Variance, and Ensemble Methods
    nearest-neighbor reasoning and, Geometric Interpretation, Overfitting, and Complexity Control
complications, Selecting Informative Attributes
comprehensibility, of models, Evaluation
computing errors, Regression via Mathematical Functions
computing likelihood, * Logistic Regression: Some Technical Details
conditional independence
    and Bayes’ Rule, Bayes’ Rule
    unconditional vs., Conditional Independence and Naive Bayes
conditional probability, Combining Evidence Probabilistically
conditioning bar, Combining Evidence Probabilistically
confidence, in association mining, Co-occurrences and Associations: Finding Items That Go Together
confusion matrix
    and points in ROC space, ROC Graphs and Curves
    evaluating models with, The Confusion Matrix
    expected value corresponding to, Profit Curves
    produced by classifiers, Ranking Instead of Classifying
    true positive and false negative rates for, ROC Graphs and Curves
constraints
    budget, Profit Curves
    workforce, Profit Curves
consumer movie-viewing preferences example, Data Reduction, Latent Information, and Movie Recommendation
consumer voice, From Big Data 1.0 to Big Data 2.0
consumers, describing, Example: Targeting Online Consumers With Advertisements
content pieces, online consumer targeting based on, Example: Targeting Online Consumers With Advertisements
context, importance of, Why Text Is Difficult
control group, evaluating data models with, Flaws in the Big Red Proposal
converting data, Data Preparation
cookies, browser, Example: Targeting Online Consumers With Advertisements
corpus, Representation
correlations, From Business Problems to Data Mining Tasks, Statistics
    causation vs., The news story clusters
    general-purpose meaning, Statistics
    specific technical meaning, Statistics
cosine distance, * Other Distance Functions
cosine similarity, * Other Distance Functions
Cosine Similarity function, Example: Jazz Musicians
cost matrix, Profit Curves
cost-benefit matrix, Costs and benefits
costs
    and underlying profit calculation, ROC Graphs and Curves
    estimating, Costs and benefits
    in budgeting, Ranking Instead of Classifying
    of data, Data Understanding
counterfactual analysis, From Business Problems to Data Mining Tasks
Cray Computer Corporation, The Data
credit-card transactions, Data Understanding, Profiling: Finding Typical Behavior
creditworthiness model, as example of selection bias, A Brief Digression on Selection Bias
CRISP cycle, Implications for Managing the Data Science Team
    approaches and, Implications for Managing the Data Science Team
    strategy and, Implications for Managing the Data Science Team
CRISP-DM, Data Mining and Data Science, Revisited, The Data Mining Process
Cross Industry Standard Process for Data Mining (CRISP), Data Mining and Data Science, Revisited, The Data Mining Process – Deployment
    business understanding, Business Understanding
    data preparation, Data Preparation
    data understanding, Data Understanding
    deployment, Deployment
    evaluation, Evaluation
    modeling, Modeling
    software development cycle vs., Implications for Managing the Data Science Team
cross-validation, From Holdout Evaluation to Cross-Validation, Summary
    beginning, From Holdout Evaluation to Cross-Validation
    datasets and, From Holdout Evaluation to Cross-Validation
    nested, A General Method for Avoiding Overfitting
    overfitting and, From Holdout Evaluation to Cross-Validation
cumulative response curves, Cumulative Response and Lift Curves
curse of dimensionality, Dimensionality and domain knowledge
customer churn example
    analytic engineering example, Our Churn Example Revisited with Even More Sophistication – From an Expected Value Decomposition to a Data Science Solution
    and data firm maturity, A Firm’s Data Science Maturity
customer churn, predicting, Example: Predicting Customer Churn
    with cross-validation, The Churn Dataset Revisited
    with tree induction, Example: Addressing the Churn Problem with Tree Induction
customer retention, Example: Predicting Customer Churn
customers, characterizing, Answering Business Questions with These Techniques

D

data
    as a strategic asset, Data and Data Science Capability as a Strategic Asset
    converting, Data Preparation
    cost, Data Understanding
    holdout, Holdout Data and Fitting Graphs
    investment in, From an Expected Value Decomposition to a Data Science Solution
    labeled, Models, Induction, and Prediction
    objective truth vs., What Data Can’t Do: Humans in the Loop, Revisited
    obtaining, From an Expected Value Decomposition to a Data Science Solution
    training, Introduction to Predictive Modeling: From Correlation to Supervised Segmentation, Models, Induction, and Prediction
data analysis, Example: Predicting Customer Churn, From Business Problems to Data Mining Tasks
data exploration, Stepping Back: Solving a Business Problem Versus Data Exploration
data landscape, Hierarchical Clustering
data mining, Business Problems and Data Science Solutions – Summary
    and Bayes’ Rule, Applying Bayes’ Rule to Data Science
    applying, Answering Business Questions with These Techniques, Supervised Segmentation
    as strategic component, Data-Analytic Thinking
    CRISP codification of, The Data Mining Process – Deployment
    data science and, The Ubiquity of Data Opportunities, Data Mining and Data Science, Revisited
    domain knowledge and, Dimensionality and domain knowledge
    early stages, Supervised Versus Unsupervised Methods
    fundamental ideas, Supervised Segmentation with Tree-Structured Models
    implementing techniques, Data Processing and “Big Data”
    important distinctions, Data Mining and Its Results
    matching analytic techniques to problems, Other Analytics Techniques and Technologies – Answering Business Questions with These Techniques
    process of, The Data Mining Process – Deployment
    results of, Data Mining and Its Results, Deployment
    skills, Implications for Managing the Data Science Team
    software development cycle vs., Implications for Managing the Data Science Team
    stages, Data Mining and Data Science, Revisited
    structuring projects, Business Problems and Data Science Solutions
    supervised vs. unsupervised methods of, Supervised Versus Unsupervised Methods
    systems, Deployment
    tasks, fitting business problems to, From Business Problems to Data Mining Tasks
    techniques, Deployment
Data Mining (field), Machine Learning and Data Mining
data mining algorithms, From Business Problems to Data Mining Tasks
data mining proposal example, Example Data Mining Proposal – Flaws in the Big Red Proposal
data preparation, Data Preparation, Representing and Mining Text
data preprocessing, Data Preprocessing
data processing technologies, Data Processing and “Big Data”
data processing, data science vs., Data Processing and “Big Data”
data reduction, From Business Problems to Data Mining Tasks, Data Reduction, Latent Information, and Movie Recommendation
data requirements, Data Preparation
data science, Introduction: Data-Analytic ThinkingSummary, Data Science and Business StrategyA Firm’s Data Science Maturity, ConclusionFinal Words
and adding value to applications, Decision Analytic Thinking I: What Is a Good Model?
as craft, Superior Data Scientists
as strategic asset, Data and Data Science Capability as a Strategic AssetData and Data Science Capability as a Strategic Asset
baseline methods of, Summary
behavior predictions based on past actions, Example: Hurricane Frances
Big Data and, Data Processing and “Big Data”Data Processing and “Big Data”
case studies, examining, Examine Data Science Case Studies
classification modeling for issues in, Generalizing Beyond Classification
cloud labor and, Final Example: From Crowd-Sourcing to Cloud-SourcingFinal Example: From Crowd-Sourcing to Cloud-Sourcing
customer churn, predicting, Example: Predicting Customer Churn
data mining about individuals, Privacy, Ethics, and Mining Data About IndividualsPrivacy, Ethics, and Mining Data About Individuals
data mining and, The Ubiquity of Data Opportunities, Data Mining and Data Science, RevisitedData Mining and Data Science, Revisited
data processing vs., Data Processing and “Big Data”Data Processing and “Big Data”
data science engineers, Deployment
data-analytic thinking in, Data-Analytic ThinkingData-Analytic Thinking
data-driven business vs., Data Processing and “Big Data”
data-driven decision-making, Data Science, Engineering, and Data-Driven Decision MakingData Science, Engineering, and Data-Driven Decision Making
engineering, Data Science, Engineering, and Data-Driven Decision MakingData Science, Engineering, and Data-Driven Decision Making
engineering and, Chemistry Is Not About Test Tubes: Data Science Versus the Work of the Data Scientist
evolving uses for, From Big Data 1.0 to Big Data 2.0From Big Data 1.0 to Big Data 2.0
fitting problem to available data, Changing the Way We Think about Solutions to Business ProblemsChanging the Way We Think about Solutions to Business Problems
fundamental principles, The Ubiquity of Data Opportunities
history, Machine Learning and Data Mining
human interaction and, What Data Can’t Do: Humans in the Loop, RevisitedWhat Data Can’t Do: Humans in the Loop, Revisited
human knowledge and, What Data Can’t Do: Humans in the Loop, RevisitedWhat Data Can’t Do: Humans in the Loop, Revisited
Hurricane Frances example, Example: Hurricane Frances
learning path for, Superior Data Scientists
limits of, What Data Can’t Do: Humans in the Loop, RevisitedWhat Data Can’t Do: Humans in the Loop, Revisited
mining mobile device data example, Applying Our Fundamental Concepts to a New Problem: Mining Mobile Device DataApplying Our Fundamental Concepts to a New Problem: Mining Mobile Device Data
opportunities for, The Ubiquity of Data OpportunitiesThe Ubiquity of Data Opportunities
principles, Data Science, Engineering, and Data-Driven Decision Making, Business Problems and Data Science Solutions
privacy and ethics of, Privacy, Ethics, and Mining Data About IndividualsPrivacy, Ethics, and Mining Data About Individuals
processes, Data Science, Engineering, and Data-Driven Decision Making
software development vs., A Firm’s Data Science Maturity
structure, Machine Learning and Data Mining
techniques, Data Science, Engineering, and Data-Driven Decision Making
technology vs. theory of, Chemistry Is Not About Test Tubes: Data Science Versus the Work of the Data ScientistChemistry Is Not About Test Tubes: Data Science Versus the Work of the Data Scientist
understanding, The Ubiquity of Data Opportunities, Data Processing and “Big Data”
data science maturity, of firms, A Firm’s Data Science Maturity–A Firm’s Data Science Maturity
data scientists
academic, Attracting and Nurturing Data Scientists and Their Teams
as scientific advisors, Attracting and Nurturing Data Scientists and Their Teams
attracting/nurturing, Attracting and Nurturing Data Scientists and Their Teams–Attracting and Nurturing Data Scientists and Their Teams
evaluating, Superior Data Scientists–Superior Data Scientists
managing, Superior Data Science Management–Superior Data Science Management
Data Scientists, LLC, Attracting and Nurturing Data Scientists and Their Teams
data sources, Evaluation, Baseline Performance, and Implications for Investments in Data
data understanding, Data Understanding–Data Understanding
expected value decomposition and, From an Expected Value Decomposition to a Data Science Solution–From an Expected Value Decomposition to a Data Science Solution
expected value framework and, The Expected Value Framework: Structuring a More Complicated Business Problem–The Expected Value Framework: Structuring a More Complicated Business Problem
data warehousing, Data Warehousing
data-analytic thinking, Data-Analytic Thinking–Data-Analytic Thinking
and unbalanced classes, Problems with Unbalanced Classes
for business strategies, Thinking Data-Analytically, Redux–Thinking Data-Analytically, Redux
data-driven business
data science vs., Data Processing and “Big Data”
understanding, Data Processing and “Big Data”
data-driven causal explanations, Data-Driven Causal Explanation and a Viral Marketing Example–Data-Driven Causal Explanation and a Viral Marketing Example
data-driven decision-making, Data Science, Engineering, and Data-Driven Decision Making–Data Science, Engineering, and Data-Driven Decision Making
benefits, Data Science, Engineering, and Data-Driven Decision Making
discoveries, Data Science, Engineering, and Data-Driven Decision Making
repetition, Data Science, Engineering, and Data-Driven Decision Making
database queries, as analytic technique, Database Querying–Database Querying
database tables, Models, Induction, and Prediction
dataset entropy, Example: Attribute Selection with Information Gain
datasets, Models, Induction, and Prediction
analyzing, Introduction to Predictive Modeling: From Correlation to Supervised Segmentation
attributes of, Overfitting in Mathematical Functions
cross-validation and, From Holdout Evaluation to Cross-Validation
limited, From Holdout Evaluation to Cross-Validation
Davis, Miles, Example: Jazz Musicians, Example: Jazz Musicians
Deanston single malt scotch, Understanding the Results of Clustering
decision boundaries, Visualizing Segmentations, Classification via Mathematical Functions
decision lines, Visualizing Segmentations
decision nodes, Supervised Segmentation with Tree-Structured Models
decision stumps, Evaluation, Baseline Performance, and Implications for Investments in Data
decision surfaces, Visualizing Segmentations
decision trees, Supervised Segmentation with Tree-Structured Models
decision-making, automatic, Data Science, Engineering, and Data-Driven Decision Making
deduction, induction vs., Models, Induction, and Prediction
Dell, Data preparation, Achieving Competitive Advantage with Data Science
demand, local, Example: Hurricane Frances
dendrograms, Hierarchical Clustering, Hierarchical Clustering
dependent variables, Models, Induction, and Prediction
descriptive attributes, Data Mining and Data Science, Revisited
descriptive modeling, Models, Induction, and Prediction
Dictionary of Distances (Deza & Deza), * Other Distance Functions
differential descriptions, * Using Supervised Learning to Generate Cluster Descriptions
Digital 100 companies, Data-Analytic Thinking
Dillman, Linda, Data Science, Engineering, and Data-Driven Decision Making
dimensionality, of nearest-neighbor reasoning, Dimensionality and domain knowledge–Dimensionality and domain knowledge
directed marketing example, Targeting the Best Prospects for a Charity Mailing–A Brief Digression on Selection Bias
discoveries, Data Science, Engineering, and Data-Driven Decision Making
discrete (binary) classifiers, ROC Graphs and Curves
discrete classifiers, ROC Graphs and Curves
discretized numeric variables, Selecting Informative Attributes
discriminants, linear, Linear Discriminant Functions
discriminative modeling methods, generative vs., Summary
disorder, measuring, Selecting Informative Attributes
display advertising, Example: Targeting Online Consumers With Advertisements
distance functions, for nearest-neighbor reasoning, * Other Distance Functions–* Other Distance Functions
distance, measuring, Similarity and Distance
distribution
Gaussian, Regression via Mathematical Functions
Normal, Regression via Mathematical Functions
distribution of properties, Selecting Informative Attributes
Doctor Who (television show), Example: Evidence Lifts from Facebook “Likes”
document (term), Representation
domain knowledge
data mining processes and, Dimensionality and domain knowledge
nearest-neighbor reasoning and, Dimensionality and domain knowledge–Dimensionality and domain knowledge
domain knowledge validation, Associations Among Facebook Likes
domains, in association discovery, Associations Among Facebook Likes
Dotcom Boom, Results, Formidable Historical Advantage
double counting, Costs and benefits
draws, statistical, * Logistic Regression: Some Technical Details

E

edit distance, * Other Distance Functions, * Other Distance Functions
Einstein, Albert, Conclusion
Elder Research, Attracting and Nurturing Data Scientists and Their Teams
Ellington, Duke, Example: Jazz Musicians, Example: Jazz Musicians
email, Why Text Is Important
engineering, Chemistry Is Not About Test Tubes: Data Science Versus the Work of the Data Scientist, Business Understanding
engineering problems, business problems vs., Other Data Science Tasks and Techniques
ensemble method, Bias, Variance, and Ensemble Methods–Bias, Variance, and Ensemble Methods
entropy, Selecting Informative Attributes–Selecting Informative Attributes, Selecting Informative Attributes, Example: Attribute Selection with Information Gain, Summary
and Inverse Document Frequency, * The Relationship of IDF to Entropy
change in, Selecting Informative Attributes
equation for, Selecting Informative Attributes
graphs, Example: Attribute Selection with Information Gain
equations
cosine distance, * Other Distance Functions
entropy, Selecting Informative Attributes
Euclidean distance, Similarity and Distance
general linear model, Linear Discriminant Functions
information gain (IG), Selecting Informative Attributes
Jaccard distance, * Other Distance Functions
L2 norm, * Other Distance Functions
log-odds linear function, * Logistic Regression: Some Technical Details
logistic function, * Logistic Regression: Some Technical Details
majority scoring function, * Combining Functions: Calculating Scores from Neighbors
majority vote classification, * Combining Functions: Calculating Scores from Neighbors
Manhattan distance, * Other Distance Functions
similarity-moderated classification, * Combining Functions: Calculating Scores from Neighbors
similarity-moderated regression, * Combining Functions: Calculating Scores from Neighbors
similarity-moderated scoring, * Combining Functions: Calculating Scores from Neighbors
error costs, ROC Graphs and Curves
error rates, Plain Accuracy and Its Problems, Error rates
errors
absolute, Regression via Mathematical Functions
computing, Regression via Mathematical Functions
false negative vs. false positive, Evaluating Classifiers
squared, Regression via Mathematical Functions
estimating generalization performance, From Holdout Evaluation to Cross-Validation
estimation, frequency based, Probability Estimation
ethics of data mining, Privacy, Ethics, and Mining Data About Individuals–Privacy, Ethics, and Mining Data About Individuals
Euclid, Similarity and Distance
Euclidean distance, Similarity and Distance
evaluating models, Decision Analytic Thinking I: What Is a Good Model?–Summary
baseline performance and, Evaluation, Baseline Performance, and Implications for Investments in Data–Evaluation, Baseline Performance, and Implications for Investments in Data
classification accuracy, Plain Accuracy and Its Problems–Generalizing Beyond Classification
confusion matrix, The Confusion Matrix–The Confusion Matrix
expected values, A Key Analytical Framework: Expected Value–Costs and benefits
generalization methods for, Generalizing Beyond Classification–Generalizing Beyond Classification
procedure, Flaws in the Big Red Proposal
evaluating training data, Holdout Data and Fitting Graphs
evaluation
in vivo, Evaluation
purpose, Evaluation
evaluation framework, Evaluation
events
calculating probability of, Combining Evidence Probabilistically–Combining Evidence Probabilistically
independent, Joint Probability and Independence–Joint Probability and Independence
evidence
computing probability from, Bayes’ Rule, Bayes’ Rule
determining strength of, Example: Targeting Online Consumers With Advertisements
likelihood of, Applying Bayes’ Rule to Data Science
strongly dependent, Advantages and Disadvantages of Naive Bayes
evidence lift
Facebook “Likes” example, Example: Evidence Lifts from Facebook “Likes”–Example: Evidence Lifts from Facebook “Likes”
modeling, with Naive Bayes, A Model of Evidence “Lift”–A Model of Evidence “Lift”
eWatch/eBracelet example, Co-occurrences and Associations: Finding Items That Go Together–Co-occurrences and Associations: Finding Items That Go Together
examining clusters, Understanding the Results of Clustering
examples, Models, Induction, and Prediction
analytic engineering, Targeting the Best Prospects for a Charity Mailing–From an Expected Value Decomposition to a Data Science Solution
associations, Associations Among Facebook Likes–Associations Among Facebook Likes
beer and lottery association, Example: Beer and Lottery Tickets–Example: Beer and Lottery Tickets
biases in data, What Data Can’t Do: Humans in the Loop, Revisited
Big Red proposal, Example Data Mining Proposal–Flaws in the Big Red Proposal
breast cancer, Example: Logistic Regression versus Tree Induction–Example: Logistic Regression versus Tree Induction
business news stories, Example: Clustering Business News Stories–The news story clusters
call center metrics, Profiling: Finding Typical Behavior–Profiling: Finding Typical Behavior
cellular churn, Problems with Unbalanced Classes, Problems with Unequal Costs and Benefits
centroid-based clustering, Nearest Neighbors Revisited: Clustering Around Centroids–Nearest Neighbors Revisited: Clustering Around Centroids
cloud labor, Final Example: From Crowd-Sourcing to Cloud-Sourcing–Final Example: From Crowd-Sourcing to Cloud-Sourcing
clustering, Clustering–* Using Supervised Learning to Generate Cluster Descriptions
consumer movie-viewing preferences, Data Reduction, Latent Information, and Movie Recommendation
cooccurrence/association, Co-occurrences and Associations: Finding Items That Go Together–Co-occurrences and Associations: Finding Items That Go Together, Example: Beer and Lottery Tickets–Example: Beer and Lottery Tickets
cross-validation, From Holdout Evaluation to Cross-Validation–From Holdout Evaluation to Cross-Validation
customer churn, Example: Predicting Customer Churn, Example: Addressing the Churn Problem with Tree Induction–Example: Addressing the Churn Problem with Tree Induction, From Holdout Evaluation to Cross-Validation–From Holdout Evaluation to Cross-Validation, A Firm’s Data Science Maturity
data mining proposal evaluation, Example Data Mining Proposal–Flaws in the Big Red Proposal
data-driven causal explanations, Data-Driven Causal Explanation and a Viral Marketing Example–Data-Driven Causal Explanation and a Viral Marketing Example
detecting credit-card fraud, Profiling: Finding Typical Behavior
directed marketing, Targeting the Best Prospects for a Charity Mailing–A Brief Digression on Selection Bias
evaluating proposals, Scenario and Proposal–Flaws in the GGC Proposal
evidence lift, Example: Evidence Lifts from Facebook “Likes”–Example: Evidence Lifts from Facebook “Likes”
eWatch/eBracelet, Co-occurrences and Associations: Finding Items That Go Together–Co-occurrences and Associations: Finding Items That Go Together
Facebook “Likes”, Example: Evidence Lifts from Facebook “Likes”–Example: Evidence Lifts from Facebook “Likes”, Associations Among Facebook Likes–Associations Among Facebook Likes
Green Giant Consulting, Scenario and Proposal–Flaws in the GGC Proposal
Hurricane Frances, Example: Hurricane Frances
information gain, attribute selection with, Example: Attribute Selection with Information Gain–Example: Attribute Selection with Information Gain
iris overfitting, An Example of Mining a Linear Discriminant from Data, Example: Overfitting Linear Functions–Example: Overfitting Linear Functions
Jazz musicians, Example: Jazz Musicians–Example: Jazz Musicians
junk email classifier, Advantages and Disadvantages of Naive Bayes
market basket analysis, Associations Among Facebook Likes–Associations Among Facebook Likes
mining linear discriminants from data, An Example of Mining a Linear Discriminant from Data–Summary
mining mobile device data, Applying Our Fundamental Concepts to a New Problem: Mining Mobile Device Data–Applying Our Fundamental Concepts to a New Problem: Mining Mobile Device Data
mining news stories, Example: Mining News Stories to Predict Stock Price Movement–Results
mushroom, Example: Attribute Selection with Information Gain–Example: Attribute Selection with Information Gain
Naive Bayes, Evidence in Action: Targeting Consumers with Ads
nearest-neighbor reasoning, Example: Whiskey Analytics–Example: Whiskey Analytics
overfitting linear functions, Example: Overfitting Linear Functions–Example: Overfitting Linear Functions
overfitting, performance degradation and, * Example: Why Is Overfitting Bad?–* Example: Why Is Overfitting Bad?
PEC, Example: Targeting Online Consumers With Advertisements–Example: Targeting Online Consumers With Advertisements
profiling, Profiling: Finding Typical Behavior, Profiling: Finding Typical Behavior–Profiling: Finding Typical Behavior
stock price movement, Example: Mining News Stories to Predict Stock Price Movement–Results
supervised learning to generate cluster descriptions, * Using Supervised Learning to Generate Cluster Descriptions–* Using Supervised Learning to Generate Cluster Descriptions
targeted ad, Example: Targeting Online Consumers With Advertisements–Example: Targeting Online Consumers With Advertisements, Evidence in Action: Targeting Consumers with Ads, Privacy, Ethics, and Mining Data About Individuals
text representation tasks, Example: Jazz Musicians–Example: Jazz Musicians, Example: Mining News Stories to Predict Stock Price Movement–Results
tree induction vs. logistic regression, Example: Logistic Regression versus Tree Induction–Example: Logistic Regression versus Tree Induction
viral marketing, Data-Driven Causal Explanation and a Viral Marketing Example–Data-Driven Causal Explanation and a Viral Marketing Example
whiskey analytics, Example: Whiskey Analytics–Example: Whiskey Analytics
whiskey clustering, Example: Whiskey Analytics Revisited–Hierarchical Clustering
Whiz-bang widget, Example Data Mining Proposal–Flaws in the Big Red Proposal
wireless fraud, What Data Can’t Do: Humans in the Loop, Revisited
exhaustive classes, Conditional Independence and Naive Bayes
expected profit, Profit Curves–Profit Curves
and relative levels of costs and benefits, ROC Graphs and Curves
calculation of, Using Expected Value to Frame Classifier Evaluation
for classifiers, Problems with Unequal Costs and Benefits
uncertainty of, ROC Graphs and Curves
expected value
calculation of, * The Relationship of IDF to Entropy
general form, A Key Analytical Framework: Expected Value
in aggregate, Using Expected Value to Frame Classifier Evaluation
negative, Ranking Instead of Classifying
expected value framework, The Fundamental Concepts of Data Science
providing structure for business problem/solutions with, The Expected Value Framework: Decomposing the Business Problem and Recomposing the Solution Pieces–The Expected Value Framework: Decomposing the Business Problem and Recomposing the Solution Pieces
structuring complicated business problems with, The Expected Value Framework: Structuring a More Complicated Business Problem–The Expected Value Framework: Structuring a More Complicated Business Problem
expected values, A Key Analytical Framework: Expected Value–Costs and benefits
cost-benefit matrix and, Costs and benefits–Costs and benefits
decomposition of, moving to data science solution with, From an Expected Value Decomposition to a Data Science Solution–From an Expected Value Decomposition to a Data Science Solution
error rates and, Error rates
framing classifier evaluation with, Using Expected Value to Frame Classifier Evaluation–Using Expected Value to Frame Classifier Evaluation
framing classifier use with, Using Expected Value to Frame Classifier Use–Using Expected Value to Frame Classifier Use
explanatory variables, Models, Induction, and Prediction
exploratory data mining vs. defined problems, The Fundamental Concepts of Data Science
extract patterns, Data Mining and Data Science, Revisited

F

Facebook, Data and Data Science Capability as a Strategic Asset, Why Text Is Important, Thinking Data-Analytically, Redux
online consumer targeting by, Example: Targeting Online Consumers With Advertisements
“Likes” example, Example: Evidence Lifts from Facebook “Likes”–Example: Evidence Lifts from Facebook “Likes”
Fairbanks, Richard, Data and Data Science Capability as a Strategic Asset
false alarm rate, ROC Graphs and Curves, ROC Graphs and Curves
false negative rate, Costs and benefits
false negatives, Evaluating Classifiers, The Confusion Matrix, Problems with Unequal Costs and Benefits, Costs and benefits
false positive rate, Costs and benefits, ROC Graphs and Curves–ROC Graphs and Curves
false positives, Evaluating Classifiers, The Confusion Matrix, Problems with Unequal Costs and Benefits, Costs and benefits
feature vectors, Models, Induction, and Prediction
features, Models, Induction, and Prediction, Models, Induction, and Prediction
Federer, Roger, Example: Evidence Lifts from Facebook “Likes”
Fettercairn single malt scotch, Understanding the Results of Clustering
Fight Club, Example: Evidence Lifts from Facebook “Likes”
financial markets, The Task
firmographic data, From Business Problems to Data Mining Tasks
first-layer models, Nonlinear Functions, Support Vector Machines, and Neural Networks
fitting, * Logistic Regression: Some Technical Details, Holdout Data and Fitting Graphs–Holdout Data and Fitting Graphs, From Holdout Evaluation to Cross-Validation, Learning Curves, Summary, Example: Performance Analytics for Churn Modeling–Example: Performance Analytics for Churn Modeling
folds, From Holdout Evaluation to Cross-Validation, The Churn Dataset Revisited
fraud detection, Data Understanding, ROC Graphs and Curves, Thinking Data-Analytically, Redux
free Web services, Example: Targeting Online Consumers With Advertisements
frequency, Measuring Sparseness: Inverse Document Frequency
frequency-based estimates, Probability Estimation, Probability Estimation
functions
adding variables to, Example: Overfitting Linear Functions
classification, Linear Discriminant Functions
combining, Nearest Neighbors for Predictive Modeling
complex, Overfitting in Mathematical Functions, Example: Overfitting Linear Functions
kernel, Nonlinear Functions, Support Vector Machines, and Neural Networks
linkage, Hierarchical Clustering
log-odds, * Logistic Regression: Some Technical Details
logistic, * Logistic Regression: Some Technical Details
loss, Regression via Mathematical Functions–Regression via Mathematical Functions
objective, Summary
fundamental ideas, Supervised Segmentation with Tree-Structured Models
fundamental principles, The Ubiquity of Data Opportunities

G

Gaussian distribution, Regression via Mathematical Functions, Profiling: Finding Typical Behavior
Gaussian Mixture Model (GMM), Profiling: Finding Typical Behavior
GE Capital, Stepping Back: Solving a Business Problem Versus Data Exploration
generalization, Overfitting in Tree Induction, The Fundamental Concepts of Data Science
mean of, From Holdout Evaluation to Cross-Validation, Summary
overfitting and, Generalization–Generalization
variance of, From Holdout Evaluation to Cross-Validation, Summary
generalization performance, Holdout Data and Fitting Graphs, From Holdout Evaluation to Cross-Validation
generalizations, incorrect, * Example: Why Is Overfitting Bad?
generative modeling methods, discriminative vs., Summary
generative questions, Applying Bayes’ Rule to Data Science
geometric interpretation, nearest-neighbor reasoning and, Geometric Interpretation, Overfitting, and Complexity Control–Geometric Interpretation, Overfitting, and Complexity Control
Gillespie, Dizzy, Example: Jazz Musicians
Gini Coefficient, The Area Under the ROC Curve (AUC)
Glen Albyn single malt scotch, * Using Supervised Learning to Generate Cluster Descriptions
Glen Grant single malt scotch, * Using Supervised Learning to Generate Cluster Descriptions
Glen Mhor single malt scotch, Understanding the Results of Clustering
Glen Spey single malt scotch, Understanding the Results of Clustering
Glenfiddich single malt scotch, Understanding the Results of Clustering
Glenglassaugh single malt whiskey, Hierarchical Clustering
Glengoyne single malt scotch, * Using Supervised Learning to Generate Cluster Descriptions
Glenlossie single malt scotch, * Using Supervised Learning to Generate Cluster Descriptions
Glentauchers single malt scotch, Understanding the Results of Clustering
Glenugie single malt scotch, Understanding the Results of Clustering
goals, Optimizing an Objective Function
Goethe, Johann Wolfgang von, Introduction: Data-Analytic Thinking
Goodman, Benny, Example: Jazz Musicians
Google, Why Text Is Important, Representation, Attracting and Nurturing Data Scientists and Their Teams
Prediction API, Thinking Data-Analytically, Redux
search advertising on, Example: Targeting Online Consumers With Advertisements
Google Finance, The Data
Google Scholar, Final Example: From Crowd-Sourcing to Cloud-Sourcing
Graepel, Thore, Example: Evidence Lifts from Facebook “Likes”–Example: Evidence Lifts from Facebook “Likes”
graphical user interface (GUI), Database Querying
graphs
entropy, Example: Attribute Selection with Information Gain
fitting, From Holdout Evaluation to Cross-Validation, Summary
Green Giant Consulting example, Scenario and Proposal–Flaws in the GGC Proposal
GUI, Database Querying

H

Haimowitz, Ira, Stepping Back: Solving a Business Problem Versus Data Exploration
Harrah’s casinos, Data Science, Engineering, and Data-Driven Decision Making, Data and Data Science Capability as a Strategic Asset
hashing methods, Computational efficiency
heterogeneous attributes, Dimensionality and domain knowledge
Hewlett-Packard, Similarity, Neighbors, and Clusters, Data preparation, Named Entity Extraction
hierarchical clustering, Hierarchical Clustering–Hierarchical Clustering
Hilton, Perez, The Data
hinge loss, Support Vector Machines, Briefly, Regression via Mathematical Functions
history, Machine Learning and Data Mining
hit rate, ROC Graphs and Curves, Cumulative Response and Lift Curves
holdout data, Holdout Data and Fitting Graphs
creating, Holdout Data and Fitting Graphs
overfitting and, Holdout Data and Fitting Graphs–Holdout Data and Fitting Graphs
holdout evaluations, of overfitting, From Holdout Evaluation to Cross-Validation
holdout testing, From Holdout Evaluation to Cross-Validation
homogeneous regions, Classification via Mathematical Functions
homographs, Why Text Is Difficult
How I Met Your Mother (television show), Example: Evidence Lifts from Facebook “Likes”
Howl’s Moving Castle, Example: Evidence Lifts from Facebook “Likes”
human interaction and data science, What Data Can’t Do: Humans in the Loop, Revisited–What Data Can’t Do: Humans in the Loop, Revisited
Hurricane Frances example, Example: Hurricane Frances
hyperplanes, Visualizing Segmentations, Linear Discriminant Functions
hypotheses, computing probability of, Bayes’ Rule
hypothesis generation, Statistics
hypothesis tests, Avoiding Overfitting with Tree Induction

I

IBM, Similarity, Neighbors, and Clusters, Understanding the Results of Clustering, Attracting and Nurturing Data Scientists and Their Teams, Attracting and Nurturing Data Scientists and Their Teams
IEEE International Conference on Data Mining, Is There More to Data Science?
immature data firms, A Firm’s Data Science Maturity
impurity, Selecting Informative Attributes
in vivo evaluation, Evaluation
in-sample accuracy, Holdout Data and Fitting Graphs
Inception (film), Example: Evidence Lifts from Facebook “Likes”
incorrect generalizations, * Example: Why Is Overfitting Bad?
incremental learning, Advantages and Disadvantages of Naive Bayes
independence
and evidence lift, A Model of Evidence “Lift”
in probability, Joint Probability and Independence–Joint Probability and Independence
unconditional vs. conditional, Conditional Independence and Naive Bayes
independent events, probability of, Joint Probability and Independence–Joint Probability and Independence
independent variables, Models, Induction, and Prediction
indices, Nearest Neighbors Revisited: Clustering Around Centroids
induction, deduction vs., Models, Induction, and Prediction
inferring missing values, Data Preparation
influence, From Business Problems to Data Mining Tasks
information
judging, Supervised Segmentation
measuring, Selecting Informative Attributes
information gain (IG), Selecting Informative Attributes, Summary, Results
applying, Example: Attribute Selection with Information Gain–Example: Attribute Selection with Information Gain
attribute selection with, Example: Attribute Selection with Information Gain–Example: Attribute Selection with Information Gain
defining, Selecting Informative Attributes
equation for, Selecting Informative Attributes
using, Example: Attribute Selection with Information Gain
Information Retrieval (IR), Representation
information triage, Results
informative attributes, finding, Introduction to Predictive Modeling: From Correlation to Supervised Segmentation, Supervised Segmentation with Tree-Structured Models
informative meaning, Introduction to Predictive Modeling: From Correlation to Supervised Segmentation
informative variables, selecting, Supervised Segmentation
instance scoring, Decision Analytic Thinking I: What Is a Good Model?
instances, Models, Induction, and Prediction
clumping, Example: Overfitting Linear Functions
comparing, with evidence lift, A Model of Evidence “Lift”
for targeting online consumers, Example: Targeting Online Consumers With Advertisements
intangible collateral assets, Unique Intangible Collateral Assets
intellectual property, Unique Intellectual Property
intelligence test score, Example: Evidence Lifts from Facebook “Likes”–Example: Evidence Lifts from Facebook “Likes”
intelligent methods, Introduction to Predictive Modeling: From Correlation to Supervised Segmentation
intelligibility, * Using Supervised Learning to Generate Cluster Descriptions
Internet, Why Text Is Important
inverse document frequency (IDF), Measuring Sparseness: Inverse Document Frequency–Measuring Sparseness: Inverse Document Frequency
and entropy, * The Relationship of IDF to Entropy–Summary
in TFIDF, Combining Them: TFIDF
term frequency, combining with, Combining Them: TFIDF
investments in data, evaluating, Evaluation, Baseline Performance, and Implications for Investments in Data–Evaluation, Baseline Performance, and Implications for Investments in Data
iPhone, The news story clusters, From an Expected Value Decomposition to a Data Science Solution
IQ, evidence lifts for, Example: Evidence Lifts from Facebook “Likes”–Example: Evidence Lifts from Facebook “Likes”
iris example
for overfitting linear functions, Example: Overfitting Linear Functions–Example: Overfitting Linear Functions
mining linear discriminants from data, An Example of Mining a Linear Discriminant from Data–Summary
iTunes, From Business Problems to Data Mining Tasks, The news story clusters

L

L2 norm (equation), * Other Distance Functions
labeled data, Models, Induction, and Prediction
labels, Supervised Versus Unsupervised Methods
Ladyburn single malt scotch, Understanding the Results of Clustering
Laphroaig single malt scotch, Understanding the Results of Clustering
Lapointe, François-Joseph, Example: Whiskey Analytics, Hierarchical Clustering, Understanding the Results of Clustering
Latent Dirichlet Allocation, Topic Models
latent information, Data Reduction, Latent Information, and Movie Recommendation–Data Reduction, Latent Information, and Movie Recommendation
consumer movie-viewing preferences example, Data Reduction, Latent Information, and Movie Recommendation
weighted scoring, Data Reduction, Latent Information, and Movie Recommendation
latent information model, Topic Models
Latent Semantic Indexing, Topic Models
learning
incremental, Advantages and Disadvantages of Naive Bayes
machine, Machine Learning and Data Mining–Machine Learning and Data Mining
parameter, Fitting a Model to Data
supervised, Supervised Versus Unsupervised Methods, * Using Supervised Learning to Generate Cluster Descriptions–* Using Supervised Learning to Generate Cluster Descriptions
unsupervised, Supervised Versus Unsupervised Methods
learning curves, From Holdout Evaluation to Cross-Validation, Summary
analytical use, Learning Curves
fitting graphs and, Learning Curves
logistic regression, Learning Curves
overfitting vs., Learning Curves–Learning Curves
tree induction, Learning Curves
least squares regression, Regression via Mathematical Functions, Regression via Mathematical Functions
Legendre, Pierre, Example: Whiskey Analytics, Hierarchical Clustering, Understanding the Results of Clustering
Levenshtein distance, * Other Distance Functions
leverage, Measuring Surprise: Lift and Leverage–Measuring Surprise: Lift and Leverage
Lie to Me (television show), Example: Evidence Lifts from Facebook “Likes”
lift, A Model of Evidence “Lift”, Measuring Surprise: Lift and Leverage–Measuring Surprise: Lift and Leverage, The Fundamental Concepts of Data Science
lift curves, Cumulative Response and Lift Curves–Cumulative Response and Lift Curves, Example: Performance Analytics for Churn Modeling–Example: Performance Analytics for Churn Modeling
likelihood, computing, * Logistic Regression: Some Technical Details
likely responders, Using Expected Value to Frame Classifier Use
Likes, Facebook, Example: Targeting Online Consumers With Advertisements
limited datasets, From Holdout Evaluation to Cross-Validation
linear boundaries, Example: Overfitting Linear Functions
linear classifiers, Classification via Mathematical Functions, Classification via Mathematical Functions
linear discriminant functions and, Linear Discriminant Functions–Linear Discriminant Functions
objective functions, optimizing, Optimizing an Objective Function
parametric modeling and, Classification via Mathematical Functions
support vector machines, Support Vector Machines, Briefly–Support Vector Machines, Briefly
linear discriminants, Linear Discriminant Functions
functions for, Linear Discriminant Functions–Linear Discriminant Functions
mining, from data, An Example of Mining a Linear Discriminant from Data–Support Vector Machines, Briefly
scoring/ranking instances of, Linear Discriminant Functions for Scoring and Ranking Instances
support vector machines and, Support Vector Machines, Briefly–Support Vector Machines, Briefly
linear estimation, logistic regression and, Class Probability Estimation and Logistic “Regression”
linear models, Fitting a Model to Data
linear regression, standard, Regression via Mathematical Functions
linguistic structure, Why Text Is Difficult
link prediction, From Business Problems to Data Mining Tasks, Link Prediction and Social Recommendation–Link Prediction and Social Recommendation
linkage functions, Hierarchical Clustering
Linkwood single malt scotch, * Using Supervised Learning to Generate Cluster Descriptions
local demand, Example: Hurricane Frances
location visitation behavior of mobile devices, Applying Our Fundamental Concepts to a New Problem: Mining Mobile Device Data
log-normal distribution, Profiling: Finding Typical Behavior
log-odds, Class Probability Estimation and Logistic “Regression”
log-odds linear function, * Logistic Regression: Some Technical Details
logistic function, * Logistic Regression: Some Technical Details
logistic regression, Optimizing an Objective Function, Class Probability Estimation and Logistic “Regression”–Example: Logistic Regression versus Tree Induction, Example: Overfitting Linear Functions
breast cancer example, Example: Logistic Regression versus Tree Induction–Example: Logistic Regression versus Tree Induction
classification trees and, The Churn Dataset Revisited
in KDD Cup churn problem, Example: Performance Analytics for Churn Modeling–Example: Performance Analytics for Churn Modeling
learning curves for, Learning Curves
linear estimation and, Class Probability Estimation and Logistic “Regression”
mathematics of, * Logistic Regression: Some Technical Details–* Logistic Regression: Some Technical Details
tree induction vs., Example: Logistic Regression versus Tree Induction–Example: Logistic Regression versus Tree Induction
understanding, Class Probability Estimation and Logistic “Regression”
Lord Of The Rings, Example: Evidence Lifts from Facebook “Likes”
loss functions, Regression via Mathematical Functions–Regression via Mathematical Functions
Lost (television series), Example: Evidence Lifts from Facebook “Likes”

M

machine learning
analytic techniques for, Machine Learning and Data Mining–Machine Learning and Data Mining
methods, Machine Learning and Data Mining
Magnum Opus, Associations Among Facebook Likes
majority classifiers, Evaluation, Baseline Performance, and Implications for Investments in Data
majority scoring function (equation), * Combining Functions: Calculating Scores from Neighbors
majority vote classification (equation), * Combining Functions: Calculating Scores from Neighbors
majority voting, How Many Neighbors and How Much Influence?
Manhattan distance (equation), * Other Distance Functions
Mann-Whitney-Wilcoxon measure, The Area Under the ROC Curve (AUC)
margin-maximizing boundary, Support Vector Machines, Briefly
margins, Support Vector Machines, Briefly
market basket analysis, Associations Among Facebook Likes–Associations Among Facebook Likes
Massachusetts Institute of Technology (MIT), Data Science, Engineering, and Data-Driven Decision Making, Privacy, Ethics, and Mining Data About Individuals
mathematical functions, overfitting in, Overfitting in Mathematical Functions–Overfitting in Mathematical Functions
matrix factorization, Data Reduction, Latent Information, and Movie Recommendation
maximizing objective functions, * Avoiding Overfitting for Parameter Optimization
maximizing the margin, Support Vector Machines, Briefly
maximum likelihood model, Profiling: Finding Typical Behavior
McCarthy, Cormac, Term Frequency
McKinsey and Company, Data-Analytic Thinking
mean generalization, From Holdout Evaluation to Cross-Validation, Summary
Mechanical Turk, Final Example: From Crowd-Sourcing to Cloud-Sourcing
Medicare fraud, detecting, Data Understanding
Michael Jackson’s Malt Whisky Companion (Jackson), Example: Whiskey Analytics
micro-outsourcing, Final Example: From Crowd-Sourcing to Cloud-Sourcing
Microsoft, Term Frequency, Attracting and Nurturing Data Scientists and Their Teams
Mingus, Charles, Example: Jazz Musicians
missing values, Data Preparation
mobile devices
location of, finding, Applying Our Fundamental Concepts to a New Problem: Mining Mobile Device Data
mining data from, Applying Our Fundamental Concepts to a New Problem: Mining Mobile Device Data–Applying Our Fundamental Concepts to a New Problem: Mining Mobile Device Data
model accuracy, Holdout Data and Fitting Graphs
model building, test data and, A General Method for Avoiding Overfitting
model evaluation and classification, Problems with Unbalanced Classes
model induction, Models, Induction, and Prediction
model intelligibility, Intelligibility
model performance, visualizing, Visualizing Model Performance–Example: Performance Analytics for Churn Modeling
area under ROC curves, The Area Under the ROC Curve (AUC)
cumulative response curves, Cumulative Response and Lift Curves–Cumulative Response and Lift Curves
lift curves, Cumulative Response and Lift Curves–Cumulative Response and Lift Curves
profit curves, Profit Curves–Profit Curves
ranking vs. classifying cases, Visualizing Model Performance–Example: Performance Analytics for Churn Modeling
model types, Models, Induction, and Prediction
Black-Scholes option pricing, Models, Induction, and Prediction
descriptive, Models, Induction, and Prediction
predictive, Models, Induction, and Prediction
modelers, Overfitting in Mathematical Functions
modeling algorithms, A General Method for Avoiding Overfitting, Flaws in the Big Red Proposal
modeling labs, From Holdout Evaluation to Cross-Validation
models
comprehensibility, Evaluation
creating, Models, Induction, and Prediction
first-layer, Nonlinear Functions, Support Vector Machines, and Neural Networks
fitting to data, Fitting a Model to Data, The Fundamental Concepts of Data Science
linear, Fitting a Model to Data
parameterizing, Fitting a Model to Data
parameters, Fitting a Model to Data
problems, Probability Estimation
producing, From Holdout Evaluation to Cross-Validation
second-layer, Nonlinear Functions, Support Vector Machines, and Neural Networks
structure, Fitting a Model to Data
table, Generalization
understanding types of, Visualizing Segmentations
worsening, * Example: Why Is Overfitting Bad?
modifiers (of words), Results
Monk, Thelonious, Example: Jazz Musicians
Moonstruck (film), Data Reduction, Latent Information, and Movie Recommendation
Morris, Nigel, Data and Data Science Capability as a Strategic Asset
multiple comparisons, * Avoiding Overfitting for Parameter Optimization–* Avoiding Overfitting for Parameter Optimization
multisets, Bag of Words
mushroom example, Example: Attribute Selection with Information Gain–Example: Attribute Selection with Information Gain
mutually exclusive classes, Conditional Independence and Naive Bayes

N

n-gram sequences, N-gram Sequences
Naive Bayes, Conditional Independence and Naive Bayes–Conditional Independence and Naive Bayes
advantages/disadvantages of, Advantages and Disadvantages of Naive Bayes–Advantages and Disadvantages of Naive Bayes
conditional independence and, Conditional Independence and Naive Bayes–A Model of Evidence “Lift”
in KDD Cup churn problem, Example: Performance Analytics for Churn Modeling–Example: Performance Analytics for Churn Modeling
modeling evidence lift with, A Model of Evidence “Lift”–A Model of Evidence “Lift”
performance of, Advantages and Disadvantages of Naive Bayes
targeted ad example of, Evidence in Action: Targeting Consumers with Ads
Naive-Naive Bayes, A Model of Evidence “Lift”–A Model of Evidence “Lift”
named entity extraction, Named Entity Extraction–Named Entity Extraction
NASDAQ, The Data
National Public Radio (NPR), Example: Evidence Lifts from Facebook “Likes”
nearest neighbors
centroids and, Nearest Neighbors Revisited: Clustering Around Centroids–Nearest Neighbors Revisited: Clustering Around Centroids
clustering and, Nearest Neighbors Revisited: Clustering Around Centroids–Nearest Neighbors Revisited: Clustering Around Centroids
ensemble method as, Bias, Variance, and Ensemble Methods
nearest-neighbor methods
benefits of, Computational efficiency
in KDD Cup churn problem, Example: Performance Analytics for Churn Modeling–Example: Performance Analytics for Churn Modeling
nearest-neighbor reasoning, Nearest-Neighbor Reasoning–* Combining Functions: Calculating Scores from Neighbors
calculating scores from neighbors, * Combining Functions: Calculating Scores from Neighbors–* Combining Functions: Calculating Scores from Neighbors
classification, Classification–Classification
combining functions, * Combining Functions: Calculating Scores from Neighbors–* Combining Functions: Calculating Scores from Neighbors
complexity control and, Geometric Interpretation, Overfitting, and Complexity Control–Geometric Interpretation, Overfitting, and Complexity Control
computational efficiency of, Computational efficiency
determining sample size, How Many Neighbors and How Much Influence?
dimensionality of, Dimensionality and domain knowledge–Dimensionality and domain knowledge
distance functions for, * Other Distance Functions–* Other Distance Functions
domain knowledge and, Dimensionality and domain knowledge–Dimensionality and domain knowledge
for predictive modeling, Nearest Neighbors for Predictive Modeling
geometric interpretation and, Geometric Interpretation, Overfitting, and Complexity Control–Geometric Interpretation, Overfitting, and Complexity Control
heterogeneous attributes and, Heterogeneous Attributes
influence of neighbors, determining, How Many Neighbors and How Much Influence?–How Many Neighbors and How Much Influence?
intelligibility of, Intelligibility–Intelligibility
overfitting and, Geometric Interpretation, Overfitting, and Complexity Control–Geometric Interpretation, Overfitting, and Complexity Control
performance of, Computational efficiency
probability estimation, Probability Estimation
regression, Regression
whiskey analytics, Example: Whiskey Analytics–Example: Whiskey Analytics
negative profit, Profit Curves
negatives, Evaluating Classifiers
neighbor retrieval, speeding up, Computational efficiency
neighbors
classification and, Classification
retrieving, Regression
using, How Many Neighbors and How Much Influence?
nested cross-validation, A General Method for Avoiding Overfitting
Netflix, Data Science, Engineering, and Data-Driven Decision Making, Similarity, Neighbors, and Clusters, Data Reduction, Latent Information, and Movie Recommendation
Netflix Challenge, Data Reduction, Latent Information, and Movie Recommendation–Data Reduction, Latent Information, and Movie Recommendation, Superior Data Scientists
neural networks, Nonlinear Functions, Support Vector Machines, and Neural Networks, Nonlinear Functions, Support Vector Machines, and Neural Networks
parametric modeling and, Nonlinear Functions, Support Vector Machines, and Neural Networks–Nonlinear Functions, Support Vector Machines, and Neural Networks
using, Nonlinear Functions, Support Vector Machines, and Neural Networks
New York Stock Exchange, The Data
New York University (NYU), Data Processing and “Big Data”
Nissenbaum, Helen, Privacy, Ethics, and Mining Data About Individuals
non-linear support vector machines, Support Vector Machines, Briefly, Nonlinear Functions, Support Vector Machines, and Neural Networks
Normal distribution, Regression via Mathematical Functions, Profiling: Finding Typical Behavior
normalization, Term Frequency
North Port single malt scotch, * Using Supervised Learning to Generate Cluster Descriptions
not likely responders, Using Expected Value to Frame Classifier Use
not-spam (target class), Example: Targeting Online Consumers With Advertisements
numbers, Term Frequency
numeric variables, Selecting Informative Attributes
numerical predictions, Supervised Versus Unsupervised Methods

O

Oakland Raiders, Named Entity Extraction
objective functions, Summary
advantages, Regression via Mathematical Functions
creating, Optimizing an Objective Function
drawbacks, Regression via Mathematical Functions
maximizing, * Avoiding Overfitting for Parameter Optimization
optimizing, Optimizing an Objective Function
objectives, Optimizing an Objective Function
odds, Class Probability Estimation and Logistic “Regression”, Class Probability Estimation and Logistic “Regression”
oDesk, Final Example: From Crowd-Sourcing to Cloud-Sourcing
On the Road (Kerouac), Term Frequency
On-line Analytical Processing (OLAP), Database Querying
on-line processing, Database Querying
One Manga, Example: Evidence Lifts from Facebook “Likes”
Orange (French Telecom company), Example: Performance Analytics for Churn Modeling
outliers, Hierarchical Clustering
over the wall transfers, Deployment
overfitting, Data Mining and Data Science, Revisited, Probability Estimation, Overfitting and Its Avoidance–* Avoiding Overfitting for Parameter Optimization, The Fundamental Concepts of Data Science
and tree induction, Overfitting in Tree Induction–Overfitting in Tree Induction, Avoiding Overfitting with Tree Induction
assessing, Overfitting
avoiding, Overfitting, Overfitting in Mathematical Functions, Overfitting Avoidance and Complexity Control–* Avoiding Overfitting for Parameter Optimization
complexity control, Overfitting Avoidance and Complexity Control–* Avoiding Overfitting for Parameter Optimization
cross-validation example, From Holdout Evaluation to Cross-Validation–From Holdout Evaluation to Cross-Validation
ensemble method and, Bias, Variance, and Ensemble Methods
fitting graphs and, Holdout Data and Fitting Graphs–Holdout Data and Fitting Graphs
general methodology for avoiding, A General Method for Avoiding Overfitting–A General Method for Avoiding Overfitting
generalization and, Generalization–Generalization
holdout data and, Holdout Data and Fitting Graphs–Holdout Data and Fitting Graphs
holdout evaluations of, From Holdout Evaluation to Cross-Validation
in mathematical functions, Overfitting in Mathematical Functions–Overfitting in Mathematical Functions
learning curves vs., Learning Curves–Learning Curves
linear functions, Example: Overfitting Linear Functions–Example: Overfitting Linear Functions
nearest-neighbor reasoning and, Geometric Interpretation, Overfitting, and Complexity Control–Geometric Interpretation, Overfitting, and Complexity Control
parameter optimization and, * Avoiding Overfitting for Parameter Optimization–* Avoiding Overfitting for Parameter Optimization
performance degradation and, * Example: Why Is Overfitting Bad?–* Example: Why Is Overfitting Bad?
techniques for avoiding, From Holdout Evaluation to Cross-Validation

P

parabola, Nonlinear Functions, Support Vector Machines, and Neural Networks, Example: Overfitting Linear Functions
parameter learning, Fitting a Model to Data
parameterized models, Fitting a Model to Data
parameterized numeric functions, Profiling: Finding Typical Behavior
parametric modeling, Fitting a Model to Data
class probability estimation, Class Probability Estimation and Logistic “Regression”–Example: Logistic Regression versus Tree Induction
linear classifiers, Classification via Mathematical Functions
linear regression and, Regression via Mathematical Functions–Regression via Mathematical Functions
logistic regression, Class Probability Estimation and Logistic “Regression”–Example: Logistic Regression versus Tree Induction
neural networks and, Nonlinear Functions, Support Vector Machines, and Neural Networks–Nonlinear Functions, Support Vector Machines, and Neural Networks
non-linear functions for, Nonlinear Functions, Support Vector Machines, and Neural Networks–Nonlinear Functions, Support Vector Machines, and Neural Networks
support vector machines and, Nonlinear Functions, Support Vector Machines, and Neural Networks–Nonlinear Functions, Support Vector Machines, and Neural Networks
Parker, Charlie, Example: Jazz Musicians, Example: Jazz Musicians
Pasteur, Louis, Thinking Data-Analytically, Redux
patents, as intellectual property, Unique Intellectual Property
patterns
extract, Data Mining and Data Science, Revisited
finding, Data Mining and Its Results
penalties, * Avoiding Overfitting for Parameter Optimization
performance analytics, for modeling churn, Example: Performance Analytics for Churn Modeling–Example: Performance Analytics for Churn Modeling
performance degradation, * Example: Why Is Overfitting Bad?–* Example: Why Is Overfitting Bad?
performance, of nearest-neighbor reasoning, Computational efficiency
phrase extraction, Named Entity Extraction
pilot studies, Flaws in the GGC Proposal
plunge (stock prices), The Task
polynomial kernels, Nonlinear Functions, Support Vector Machines, and Neural Networks
positives, Evaluating Classifiers
posterior probability, Applying Bayes’ Rule to Data Science–Applying Bayes’ Rule to Data Science
Precision metric, Costs and benefits
prediction, Data Science, Engineering, and Data-Driven Decision Making, Models, Induction, and Prediction
Prediction API (Google), Thinking Data-Analytically, Redux
predictive learning methods, * Using Supervised Learning to Generate Cluster Descriptions
predictive modeling, Introduction to Predictive Modeling: From Correlation to Supervised Segmentation–Introduction to Predictive Modeling: From Correlation to Supervised Segmentation, Fitting a Model to Data
alternative methods, Fitting a Model to Data
basic concepts, Summary
causal explanations and, Data-Driven Causal Explanation and a Viral Marketing Example
classification trees and, Visualizing Segmentations–Trees as Sets of Rules
customer churn, predicting with tree induction, Example: Addressing the Churn Problem with Tree Induction–Example: Addressing the Churn Problem with Tree Induction
focus, Supervised Segmentation
induction and, Models, Induction, and Prediction–Models, Induction, and Prediction
link prediction, Link Prediction and Social Recommendation–Link Prediction and Social Recommendation
nearest-neighbor reasoning for, Nearest Neighbors for Predictive Modeling
parametric modeling and, Fitting a Model to Data
probability estimating and, Probability Estimation–Probability Estimation
social recommendations and, Link Prediction and Social Recommendation–Link Prediction and Social Recommendation
supervised segmentation, Supervised Segmentation–Summary
predictors, Models, Induction, and Prediction
preparation, Data Preparation
principles, Data Science, Engineering, and Data-Driven Decision Making, From Business Problems to Data Mining Tasks
prior beliefs, probability based on, Applying Bayes’ Rule to Data Science
prior churn, Data Mining and Data Science, Revisited
prior probability, class, Applying Bayes’ Rule to Data Science
privacy and data mining, Privacy, Ethics, and Mining Data About Individuals–Privacy, Ethics, and Mining Data About Individuals
Privacy in Context (Nissenbaum), Privacy, Ethics, and Mining Data About Individuals
privacy protection, Privacy, Ethics, and Mining Data About Individuals
probabilistic evidence combination (PEC), Evidence and Probabilities–Summary
Bayes’ Rule and, Bayes’ Rule–A Model of Evidence “Lift”
probability theory for, Combining Evidence Probabilistically–Joint Probability and Independence
targeted ad example, Example: Targeting Online Consumers With Advertisements–Example: Targeting Online Consumers With Advertisements
Probabilistic Topic Models, Topic Models
probability, * Logistic Regression: Some Technical Details–* Logistic Regression: Some Technical Details
and nearest-neighbor reasoning, Probability Estimation
basic rule of, Costs and benefits
building models for estimation of, Business Understanding
conditional, Combining Evidence Probabilistically
joint, Joint Probability and Independence–Joint Probability and Independence
of errors, Error rates
of evidence, Bayes’ Rule
of independent events, Joint Probability and Independence–Joint Probability and Independence
posterior, Applying Bayes’ Rule to Data Science–Applying Bayes’ Rule to Data Science
prior, Applying Bayes’ Rule to Data Science
unconditional, Bayes’ Rule, Applying Bayes’ Rule to Data Science
probability estimation trees, Supervised Segmentation with Tree-Structured Models, Probability Estimation
probability notation, Combining Evidence Probabilistically–Combining Evidence Probabilistically
probability theory, Combining Evidence Probabilistically–Joint Probability and Independence
processes, Data Science, Engineering, and Data-Driven Decision Making
profiling, From Business Problems to Data Mining Tasks, Profiling: Finding Typical Behavior–Profiling: Finding Typical Behavior
consumer movie-viewing preferences example, Data Reduction, Latent Information, and Movie Recommendation
when the distribution is not symmetric, Profiling: Finding Typical Behavior
profit curves, Profit Curves–Profit Curves, Example: Performance Analytics for Churn Modeling–Example: Performance Analytics for Churn Modeling
profit, negative, Profit Curves
profitability, Answering Business Questions with These Techniques
profitable customers, average customers vs., Answering Business Questions with These Techniques
proposals, evaluating, Be Ready to Evaluate Proposals for Data Science Projects–Flaws in the Big Red Proposal, Scenario and Proposal–Flaws in the GGC Proposal
proxy labels, From an Expected Value Decomposition to a Data Science Solution
psychometric data, Associations Among Facebook Likes
publishing, Attracting and Nurturing Data Scientists and Their Teams
purity, Selecting Informative Attributes–Selecting Informative Attributes
Pythagorean Theorem, Similarity and Distance

R

Ra, Sun, Example: Jazz Musicians
ranking cases, classifying vs., Visualizing Model Performance–Example: Performance Analytics for Churn Modeling
ranking variables, Supervised Segmentation
reasoning, Similarity, Neighbors, and Clusters
Recall metric, Costs and benefits
Receiver Operating Characteristics (ROC) graphs, ROC Graphs and Curves–ROC Graphs and Curves
area under ROC curves (AUC), The Area Under the ROC Curve (AUC)
in KDD Cup churn problem, Example: Performance Analytics for Churn Modeling–Example: Performance Analytics for Churn Modeling
recommendations, Similarity, Neighbors, and Clusters
Reddit, Why Text Is Important
regional distribution centers, grouping/associations and, Co-occurrences and Associations: Finding Items That Go Together
regression, From Business Problems to Data Mining Tasks, From Business Problems to Data Mining Tasks, Similarity, Neighbors, and Clusters
building models for, Business Understanding
classification and, From Business Problems to Data Mining Tasks
ensemble methods and, Bias, Variance, and Ensemble Methods
least squares, Regression via Mathematical Functions
logistic, Example: Overfitting Linear Functions
ridge, * Avoiding Overfitting for Parameter Optimization
supervised data mining and, Supervised Versus Unsupervised Methods
supervised segmentation and, Selecting Informative Attributes
regression modeling, Generalizing Beyond Classification
regression trees, Supervised Segmentation with Tree-Structured Models, Bias, Variance, and Ensemble Methods
regularization, * Avoiding Overfitting for Parameter Optimization, Summary
removing missing values, Data Preparation
repetition, Data Science, Engineering, and Data-Driven Decision Making
requirements, Data Preparation
responders, likely vs. not likely, Using Expected Value to Frame Classifier Use
retrieving, Similarity, Neighbors, and Clusters
retrieving neighbors, Regression
Reuters news agency, Example: Clustering Business News Stories
ridge regression, * Avoiding Overfitting for Parameter Optimization
root-mean-squared error, Generalizing Beyond Classification

S

Saint Magdalene single malt scotch, * Using Supervised Learning to Generate Cluster Descriptions
Scapa single malt scotch, Understanding the Results of Clustering
Schwarz, Henry, Stepping Back: Solving a Business Problem Versus Data Exploration
scoring, From Business Problems to Data Mining Tasks
search advertising, display vs., Example: Targeting Online Consumers With Advertisements
search engines, Why Text Is Important
second-layer models, Nonlinear Functions, Support Vector Machines, and Neural Networks
segmentation
creating the best, Selecting Informative Attributes
supervised, Clustering
unsupervised, Stepping Back: Solving a Business Problem Versus Data Exploration
selecting
attributes, Introduction to Predictive Modeling: From Correlation to Supervised Segmentation
informative variables, Supervised Segmentation
variables, Introduction to Predictive Modeling: From Correlation to Supervised Segmentation
selection bias, A Brief Digression on Selection Bias–A Brief Digression on Selection Bias
semantic similarity, syntactic vs., The news story clusters
separating classes, Example: Overfitting Linear Functions
sequential backward elimination, A General Method for Avoiding Overfitting
sequential forward selection (SFS), A General Method for Avoiding Overfitting
service usage, From Business Problems to Data Mining Tasks
sets, Bag of Words
Shannon, Claude, Selecting Informative Attributes
Sheldon Cooper (fictional character), Example: Evidence Lifts from Facebook “Likes”
sign consistency, in cost-benefit matrix, Costs and benefits
Signet Bank, Data and Data Science Capability as a Strategic Asset, From an Expected Value Decomposition to a Data Science Solution
Silver Lake, Term Frequency
Silver, Nate, Evaluation, Baseline Performance, and Implications for Investments in Data
similarity, Similarity, Neighbors, and Clusters–* Using Supervised Learning to Generate Cluster Descriptions
applying, Example: Whiskey Analytics
calculating, The Fundamental Concepts of Data Science
clustering, Clustering–The news story clusters
cosine, * Other Distance Functions
data exploration vs. business problems and, Stepping Back: Solving a Business Problem Versus Data Exploration–Stepping Back: Solving a Business Problem Versus Data Exploration
distance and, Similarity and Distance–Similarity and Distance
heterogeneous attributes and, Heterogeneous Attributes
link recommendation and, Link Prediction and Social Recommendation
measuring, Similarity and Distance
nearest-neighbor reasoning, Nearest-Neighbor Reasoning–* Combining Functions: Calculating Scores from Neighbors
similarity matching, From Business Problems to Data Mining Tasks
similarity-moderated classification (equation), * Combining Functions: Calculating Scores from Neighbors
similarity-moderated regression (equation), * Combining Functions: Calculating Scores from Neighbors
similarity-moderated scoring (equation), * Combining Functions: Calculating Scores from Neighbors
Simone, Nina, Example: Jazz Musicians
skew, Problems with Unbalanced Classes
Skype Global, Term Frequency
smoothing, Probability Estimation
social recommendations, Link Prediction and Social Recommendation–Link Prediction and Social Recommendation
soft clustering, Profiling: Finding Typical Behavior
software development, Implications for Managing the Data Science Team
software engineering, data science vs., A Firm’s Data Science Maturity
software skills, analytic skills vs., Implications for Managing the Data Science Team
Solove, Daniel, Privacy, Ethics, and Mining Data About Individuals
solution paths, changing, Data Understanding
spam (target class), Example: Targeting Online Consumers With Advertisements
spam detection systems, Example: Targeting Online Consumers With Advertisements
specified class value, Supervised Versus Unsupervised Methods
specified target value, Supervised Versus Unsupervised Methods
speech recognition systems, Thinking Data-Analytically, Redux
speeding up neighbor retrieval, Computational efficiency
Spirited Away, Example: Evidence Lifts from Facebook “Likes”
spreadsheet, implementation of Naive Bayes with, Evidence in Action: Targeting Consumers with Ads
spurious correlations, * Example: Why Is Overfitting Bad?
SQL, Database Querying
squared errors, Regression via Mathematical Functions
stable stock prices, The Task
standard linear regression, Regression via Mathematical Functions
Star Trek, Example: Evidence Lifts from Facebook “Likes”
Starbucks, Applying Our Fundamental Concepts to a New Problem: Mining Mobile Device Data
statistical draws, * Logistic Regression: Some Technical Details
statistics
calculating conditionally, Statistics
field of study, Statistics
summary, Statistics
uses, Statistics
stemming, Term Frequency, Example: Jazz Musicians
Stillwell, David, Example: Evidence Lifts from Facebook “Likes”
stock market, The Task
stock price movement example, Example: Mining News Stories to Predict Stock Price Movement–Results
Stoker (movie thriller), Term Frequency
stopwords, Term Frequency, Term Frequency
strategic considerations, Data and Data Science Capability as a Strategic Asset
strategy, Implications for Managing the Data Science Team
strength, in association mining, Co-occurrences and Associations: Finding Items That Go Together, Example: Beer and Lottery Tickets
strongly dependent evidence, Advantages and Disadvantages of Naive Bayes
structure, Machine Learning and Data Mining
Structured Query Language (SQL), Database Querying
structured thinking, Data Mining and Data Science, Revisited
structuring, Business Understanding
subjective priors, Applying Bayes’ Rule to Data Science
subtasks, From Business Problems to Data Mining Tasks
summary statistics, Statistics, Statistics
Summit Technology, Inc., The Data
Sun Ra, Example: Jazz Musicians
supervised data, Introduction to Predictive Modeling: From Correlation to Supervised Segmentation–Introduction to Predictive Modeling: From Correlation to Supervised Segmentation, Summary
supervised data mining
classification, Supervised Versus Unsupervised Methods
conditions, Supervised Versus Unsupervised Methods
regression, Supervised Versus Unsupervised Methods
subclasses, Supervised Versus Unsupervised Methods
unsupervised vs., Supervised Versus Unsupervised Methods–Supervised Versus Unsupervised Methods
supervised learning
generating cluster descriptions with, * Using Supervised Learning to Generate Cluster Descriptions–* Using Supervised Learning to Generate Cluster Descriptions
methods of, * Using Supervised Learning to Generate Cluster Descriptions
term, Supervised Versus Unsupervised Methods
supervised segmentation, Introduction to Predictive Modeling: From Correlation to Supervised Segmentation–Introduction to Predictive Modeling: From Correlation to Supervised Segmentation, Supervised Segmentation–Supervised Segmentation with Tree-Structured Models, Clustering
attribute selection, Selecting Informative Attributes–Example: Attribute Selection with Information Gain
creating, Supervised Segmentation with Tree-Structured Models
entropy, Selecting Informative Attributes–Selecting Informative Attributes
inducing, Supervised Segmentation with Tree-Structured Models
performing, Introduction to Predictive Modeling: From Correlation to Supervised Segmentation
purity of datasets, Selecting Informative Attributes–Selecting Informative Attributes
regression problems and, Selecting Informative Attributes
tree induction of, Supervised Segmentation with Tree-Structured Models–Supervised Segmentation with Tree-Structured Models
tree-structured models for, Supervised Segmentation with Tree-Structured Models–Supervised Segmentation with Tree-Structured Models
support vector machines, Optimizing an Objective Function, Example: Overfitting Linear Functions
linear discriminants and, Support Vector Machines, Briefly–Support Vector Machines, Briefly, Support Vector Machines, Briefly
non-linear, Support Vector Machines, Briefly, Nonlinear Functions, Support Vector Machines, and Neural Networks
objective function, Support Vector Machines, Briefly
parametric modeling and, Nonlinear Functions, Support Vector Machines, and Neural Networks–Nonlinear Functions, Support Vector Machines, and Neural Networks
support, in association mining, Example: Beer and Lottery Tickets
surge (stock prices), The Task
surprisingness, Measuring Surprise: Lift and Leverage–Measuring Surprise: Lift and Leverage
synonyms, Why Text Is Difficult
syntactic similarity, semantic vs., The news story clusters

T

table models, Generalization, Holdout Data and Fitting Graphs
tables, Models, Induction, and Prediction
Tambe, Prasanna, Data Processing and “Big Data”
Tamdhu single malt scotch, * Using Supervised Learning to Generate Cluster Descriptions
Target, Data Science, Engineering, and Data-Driven Decision Making
target variables, Models, Induction, and Prediction, Regression
estimating value, Example: Attribute Selection with Information Gain
evaluating, Flaws in the Big Red Proposal
targeted ad example, Example: Targeting Online Consumers With Advertisements–Example: Targeting Online Consumers With Advertisements
of Naive Bayes, Evidence in Action: Targeting Consumers with Ads
privacy protection in Europe and, Privacy, Ethics, and Mining Data About Individuals
targeting best prospects example, Targeting the Best Prospects for a Charity Mailing–A Brief Digression on Selection Bias
tasks/techniques, Data Science, Engineering, and Data-Driven Decision Making, Other Data Science Tasks and Techniques–Summary
associations, Co-occurrences and Associations: Finding Items That Go Together–Associations Among Facebook Likes
bias, Bias, Variance, and Ensemble Methods–Bias, Variance, and Ensemble Methods
classification, From Business Problems to Data Mining Tasks
co-occurrence, Co-occurrences and Associations: Finding Items That Go Together–Associations Among Facebook Likes
data reduction, Data Reduction, Latent Information, and Movie Recommendation–Data Reduction, Latent Information, and Movie Recommendation
data-driven causal explanations, Data-Driven Causal Explanation and a Viral Marketing Example–Data-Driven Causal Explanation and a Viral Marketing Example
ensemble method, Bias, Variance, and Ensemble Methods–Bias, Variance, and Ensemble Methods
latent information, Data Reduction, Latent Information, and Movie Recommendation–Data Reduction, Latent Information, and Movie Recommendation
link prediction, Link Prediction and Social Recommendation–Link Prediction and Social Recommendation
market basket analysis, Associations Among Facebook Likes–Associations Among Facebook Likes
overlap in, Regression Analysis
principles underlying, From Business Problems to Data Mining Tasks
profiling, Profiling: Finding Typical Behavior–Profiling: Finding Typical Behavior
social recommendations, Link Prediction and Social Recommendation–Link Prediction and Social Recommendation
variance, Bias, Variance, and Ensemble Methods–Bias, Variance, and Ensemble Methods
viral marketing example, Data-Driven Causal Explanation and a Viral Marketing Example–Data-Driven Causal Explanation and a Viral Marketing Example
Tatum, Art, Example: Jazz Musicians
technology
analytic, Data Preparation
applying, Other Analytics Techniques and Technologies
big-data, Data Processing and “Big Data”
theory in data science vs., Chemistry Is Not About Test Tubes: Data Science Versus the Work of the Data Scientist–Chemistry Is Not About Test Tubes: Data Science Versus the Work of the Data Scientist
term frequency (TF), Term Frequency–Term Frequency
defined, Term Frequency
in TFIDF, Combining Them: TFIDF
inverse document frequency, combining with, Combining Them: TFIDF
values for, Example: Jazz Musicians
terms
in documents, Representation
supervised learning, Supervised Versus Unsupervised Methods
unsupervised learning, Supervised Versus Unsupervised Methods
weights of, Topic Models
Terry, Clark, Example: Jazz Musicians
test data, model building and, A General Method for Avoiding Overfitting
test sets, Holdout Data and Fitting Graphs
testing, holdout, From Holdout Evaluation to Cross-Validation
text, Representing and Mining Text
as unstructured data, Why Text Is Difficult–Why Text Is Difficult
data, Representing and Mining Text
fields, varying number of words in, Why Text Is Difficult
importance of, Why Text Is Important
Jazz musicians example, Example: Jazz Musicians–Example: Jazz Musicians
relative dirtiness of, Why Text Is Difficult
text processing, Representing and Mining Text
text representation task, Representation–Combining Them: TFIDF
bag of words approach to, Bag of Words
data preparation, The Data–The Data
data preprocessing, Data Preprocessing–Data Preprocessing
defining, The Task–The Task
inverse document frequency, Measuring Sparseness: Inverse Document Frequency–Measuring Sparseness: Inverse Document Frequency
Jazz musicians example, Example: Jazz Musicians–Example: Jazz Musicians
location mining as, Applying Our Fundamental Concepts to a New Problem: Mining Mobile Device Data
measuring prevalence in, Term Frequency–Term Frequency
measuring sparseness in, Measuring Sparseness: Inverse Document Frequency–Measuring Sparseness: Inverse Document Frequency
mining news stories example, Example: Mining News Stories to Predict Stock Price Movement–Results
n-gram sequence approach to, N-gram Sequences
named entity extraction, Named Entity Extraction–Named Entity Extraction
results, interpreting, Results–Results
stock price movement example, Example: Mining News Stories to Predict Stock Price Movement–Results
term frequency, Term Frequency–Term Frequency
TFIDF value and, Combining Them: TFIDF
topic models for, Topic Models–Topic Models
TFIDF scores (TFIDF values), Data preparation
applied to locations, Applying Our Fundamental Concepts to a New Problem: Mining Mobile Device Data
text representation task and, Combining Them: TFIDF
The Big Bang Theory, Example: Evidence Lifts from Facebook “Likes”
The Colbert Report, Example: Evidence Lifts from Facebook “Likes”
The Daily Show, Example: Evidence Lifts from Facebook “Likes”
The Godfather, Example: Evidence Lifts from Facebook “Likes”
The New York Times, Example: Hurricane Frances, What Data Can’t Do: Humans in the Loop, Revisited
The Onion, Example: Evidence Lifts from Facebook “Likes”
The Road (McCarthy), Term Frequency
The Signal and the Noise (Silver), Evaluation, Baseline Performance, and Implications for Investments in Data
The Sound of Music (film), Data Reduction, Latent Information, and Movie Recommendation
The Stoker (film comedy), Term Frequency
The Wizard of Oz (film), Data Reduction, Latent Information, and Movie Recommendation
Thomson Reuters Text Research Collection (TRC2), Example: Clustering Business News Stories
thresholds
and classifiers, Ranking Instead of Classifying–Ranking Instead of Classifying
and performance curves, Profit Curves
time series (data), The Data
Tobermory single malt scotch, Understanding the Results of Clustering
tokens, Representation
tools, analytic, Holdout Data and Fitting Graphs
topic layer, Topic Models
topic models for text representation, Topic Models–Topic Models
trade secrets, Unique Intellectual Property
training data, Introduction to Predictive Modeling: From Correlation to Supervised Segmentation, Models, Induction, and Prediction, Overfitting
evaluating, Holdout Data and Fitting Graphs, Flaws in the Big Red Proposal
limits on, Bias, Variance, and Ensemble Methods
using, From Holdout Evaluation to Cross-Validation, Learning Curves, Summary
training sets, Holdout Data and Fitting Graphs
transfers, over the wall, Deployment
tree induction, Introduction to Predictive Modeling: From Correlation to Supervised Segmentation
ensemble methods and, Bias, Variance, and Ensemble Methods
learning curves for, Learning Curves
limiting, Avoiding Overfitting with Tree Induction
logistic regression vs., Example: Logistic Regression versus Tree Induction–Example: Logistic Regression versus Tree Induction
of supervised segmentation, Supervised Segmentation with Tree-Structured Models–Supervised Segmentation with Tree-Structured Models
overfitting and, Overfitting in Tree Induction–Overfitting in Tree Induction, Avoiding Overfitting with Tree Induction–Avoiding Overfitting with Tree Induction
problems with, Avoiding Overfitting with Tree Induction
Tree of Life (Sugden et al; Pennisi), Hierarchical Clustering
tree-structured models
classification, Supervised Segmentation with Tree-Structured Models
creating, Supervised Segmentation with Tree-Structured Models
decision, Supervised Segmentation with Tree-Structured Models
for supervised segmentation, Supervised Segmentation with Tree-Structured Models–Supervised Segmentation with Tree-Structured Models
goals, Supervised Segmentation with Tree-Structured Models
probability estimation, Supervised Segmentation with Tree-Structured Models, Probability Estimation
pruning, Avoiding Overfitting with Tree Induction
regression, Supervised Segmentation with Tree-Structured Models
restricting, Overfitting in Tree Induction
tri-grams, N-gram Sequences
Tron, Example: Evidence Lifts from Facebook “Likes”
true negative rate, Costs and benefits
true negatives, Costs and benefits
true positive rate, Costs and benefits, ROC Graphs and Curves–ROC Graphs and Curves, Cumulative Response and Lift Curves
true positives, Costs and benefits
Tullibardine single malt whiskey, Hierarchical Clustering
Tumblr, online consumer targeting by, Example: Targeting Online Consumers With Advertisements
Twitter, Why Text Is Important
Two Dogmas of Empiricism (Quine), What Data Can’t Do: Humans in the Loop, Revisited

W

Wal-Mart, The Ubiquity of Data Opportunities, Example: Hurricane Frances, Data Science, Engineering, and Data-Driven Decision Making
Waller, Fats, Example: Jazz Musicians
Wang, Wally, Example: Evidence Lifts from Facebook “Likes”, Associations Among Facebook Likes
Washington Square Park, Applying Our Fundamental Concepts to a New Problem: Mining Mobile Device Data
weather forecasting, Evaluation, Baseline Performance, and Implications for Investments in Data
Web 2.0, Why Text Is Important
web pages, personal, Why Text Is Important
web properties, as content pieces, Example: Targeting Online Consumers With Advertisements
Web services, free, Example: Targeting Online Consumers With Advertisements
Weeds (television series), Example: Evidence Lifts from Facebook “Likes”
weighted scoring, How Many Neighbors and How Much Influence?, Data Reduction, Latent Information, and Movie Recommendation
weighted voting, How Many Neighbors and How Much Influence?
What Data Can’t Do (Brooks), What Data Can’t Do: Humans in the Loop, Revisited
whiskey example
clustering and, Example: Whiskey Analytics Revisited–Hierarchical Clustering
for nearest-neighbors, Example: Whiskey Analytics–Example: Whiskey Analytics
supervised learning to generate cluster descriptions, * Using Supervised Learning to Generate Cluster Descriptions–* Using Supervised Learning to Generate Cluster Descriptions
Whiz-bang example, Example Data Mining Proposal–Flaws in the Big Red Proposal
Wikileaks, Example: Evidence Lifts from Facebook “Likes”
wireless fraud example, What Data Can’t Do: Humans in the Loop, Revisited
Wisconsin Breast Cancer Dataset, Example: Logistic Regression versus Tree Induction
words
lengths of, Why Text Is Difficult
modifiers of, Results
sequences of, N-gram Sequences
workforce constraint, Profit Curves
worksheets, Models, Induction, and Prediction
worsening models, * Example: Why Is Overfitting Bad?

Y

Yahoo! Finance, The Data
Yahoo!, online consumer targeting by, Example: Targeting Online Consumers With Advertisements