List of Tables

Chapter 2. Introducing recommenders

Table 2.1. An illustration of the average difference and root-mean-square calculation

Chapter 3. Representing recommender data

Table 3.1. Illustration of default table schema for taste_preferences in MySQL

Chapter 4. Making recommendations

Table 4.1. The Pearson correlation between user 1 and other users based on the three items that user 1 has in common with the others

Table 4.2. The Euclidean distance between user 1 and other users, and the resulting similarity scores

Table 4.3. The preference values transformed into ranks, and the resulting Spearman correlation between user 1 and each of the other users

Table 4.4. The similarity values between user 1 and other users, computed using the Tanimoto coefficient. Note that preference values themselves are omitted, because they aren’t used in the computation.

Table 4.5. The similarity values between user 1 and other users, computed using the log-likelihood similarity metric

Table 4.6. Evaluation results under various ItemSimilarity metrics

Table 4.7. Average differences in preference values between all pairs of items. Cells along the diagonal are 0.0. Cells in the bottom left are simply the negative of their counterparts across the diagonal, so these aren’t represented explicitly. Some diffs don’t exist, such as 102-107, because no user expressed a preference for both 102 and 107.

Table 4.8. Summary of available recommender implementations in Mahout, their key input parameters, and key features to consider when choosing an implementation

Chapter 5. Taking recommenders to production

Table 5.1. Average absolute difference in estimated and actual preferences when evaluating a user-based recommender using one of several similarity metrics, and using a nearest-n user neighborhood

Table 5.2. Average absolute difference in estimated and actual preferences when evaluating a user-based recommender using one of several similarity metrics, and using a threshold-based user neighborhood. Some values are “not a number,” or undefined, and are denoted by Java’s NaN symbol.

Table 5.3. Average absolute differences in estimated and actual preferences, when evaluating an item-based recommender using several different similarity metrics

Chapter 6. Distributing recommendation computations

Table 6.1. The co-occurrence matrix for items in a simple example data set. The first row and column are labels and not part of the matrix.

Table 6.2. Multiplying the co-occurrence matrix with user 3’s preference vector (U3) to produce a vector that leads to recommendations, R

Chapter 7. Introduction to clustering

Table 7.1. Result of clustering using various distance measures

Chapter 8. Representing data

Table 8.1. A set of apples of different weights, sizes, and colors converted to vectors

Table 8.2. Important flags for the Mahout dictionary-based vectorizer and their default values

Chapter 9. Clustering algorithms in Mahout

Table 9.1. Top five words in selected topics from LDA topic modeling of Reuters news data

Table 9.2. Top five words in selected topics from LDA topic modeling after increased smoothing is applied

Table 9.3. The different clustering algorithms in Mahout, their entry-point classes, and their properties

Chapter 10. Evaluating and improving clustering quality

Table 10.1. Flags of the Mahout ClusterDumper tool and their default values

Chapter 13. Introduction to classification

Table 13.1. Mahout is most useful with extremely large or rapidly growing data sets where other solutions are least feasible.

Table 13.2. Terminology for the key ideas in classification

Table 13.3. Four common types of values used to represent features

Table 13.4. Sample data that illustrates all four value types. These examples are typical of features of email data.

Table 13.5. Workflow in a typical classification project

Table 13.6. Fields used in the donut.csv data file

Table 13.7. Command-line options for the trainlogistic program

Table 13.8. Command-line options for the runlogistic program

Chapter 14. Training a classifier

Table 14.1. Approaches to encoding classifiable data as a vector

Table 14.2. The most common headers found in the 20 newsgroups articles. A few of the less common headers are included as well.

Table 14.3. Characteristics of the Mahout learning algorithms used for classification

Chapter 15. Evaluating and tuning a classifier

Table 15.1. Data from two hypothetical classifiers to show some of the limitations of just looking at percent correct. The columns show the frequency of each possible model output. Each row contains data for a particular correct value, and the answer with the highest score is in bold. Model 1 is never quite right, but may still be useful, whereas model 2 is like a stopped clock.

Table 15.2. Mahout supports a variety of classifier performance metrics through multiple APIs.

Table 15.3. The Mahout classes that support performance evaluation for classifiers

Table 15.4. How bad tokens can defeat feature extraction

Table 15.5. Configuration methods for SGD learning classes

Appendix A. JVM tuning

Table A.1. Key JVM tuning parameters for recommender engines
