Content-based filtering is out of scope in the Mahout framework, mainly because it is up to you to decide how to define similar items. To compute content-based item-item similarity, we need to implement our own ItemSimilarity. For instance, in our book dataset, we might make up the following rule for book similarity:
- If the genres are the same, add 0.15 to similarity
- If the author is the same, add 0.50 to similarity
We could now implement our own similarity measure as follows:
import org.apache.mahout.cf.taste.similarity.ItemSimilarity;

class MyItemSimilarity implements ItemSimilarity {
  ...
  public double itemSimilarity(long itemID1, long itemID2) {
    MyBook book1 = lookupMyBook(itemID1);
    MyBook book2 = lookupMyBook(itemID2);
    double similarity = 0.0;
    // Same genre: add 0.15
    if (book1.getGenre().equals(book2.getGenre())) {
      similarity += 0.15;
    }
    // Same author: add 0.50
    if (book1.getAuthor().equals(book2.getAuthor())) {
      similarity += 0.50;
    }
    return similarity;
  }
  ...
}
We then pass this ItemSimilarity to a GenericItemBasedRecommender in place of LogLikelihoodSimilarity or another built-in implementation. That's about it. This is as far as we have to go to perform content-based recommendation in the Mahout framework.
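To see how such scores can turn into a recommendation, the following library-free sketch applies the same genre/author rule and estimates a user's rating for an unseen book as a similarity-weighted average of the books the user has already rated. It only illustrates the general idea behind item-based estimation and is not Mahout's code; the Book class, the sample data, and the similarity and estimate methods are hypothetical names introduced for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Library-free sketch; Book, similarity() and estimate() are hypothetical names.
class ContentBasedEstimate {

    static final class Book {
        final String genre;
        final String author;

        Book(String genre, String author) {
            this.genre = genre;
            this.author = author;
        }
    }

    // Same rule as in the text: +0.15 for a shared genre, +0.50 for a shared author.
    static double similarity(Book a, Book b) {
        double s = 0.0;
        if (a.genre.equals(b.genre)) {
            s += 0.15;
        }
        if (a.author.equals(b.author)) {
            s += 0.50;
        }
        return s;
    }

    // Similarity-weighted average of the user's known ratings.
    // Returns NaN when no rated book is similar to the candidate.
    static double estimate(Book candidate, Map<Book, Double> ratings) {
        double weighted = 0.0;
        double total = 0.0;
        for (Map.Entry<Book, Double> e : ratings.entrySet()) {
            double s = similarity(candidate, e.getKey());
            weighted += s * e.getValue();
            total += s;
        }
        return total == 0.0 ? Double.NaN : weighted / total;
    }

    public static void main(String[] args) {
        // Made-up ratings for two books the user has already rated.
        Map<Book, Double> ratings = new LinkedHashMap<>();
        ratings.put(new Book("fantasy", "Author A"), 9.0);
        ratings.put(new Book("fantasy", "Author B"), 4.0);

        Book candidate = new Book("fantasy", "Author A");
        System.out.println(estimate(candidate, ratings));
    }
}
```

In this made-up example the candidate shares both genre and author with the first book (similarity 0.65) and only the genre with the second (0.15), so the estimate of about 8.06 lands much closer to the first book's rating.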
What we saw here is one of the simplest forms of content-based recommendation. Another approach could be to create a content-based profile of users, based on a weighted vector of item features. The weights denote the importance of each feature to the user and can be computed from individually rated content vectors.
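A minimal sketch of this profile-based approach follows, under assumptions the text does not specify: each item is a fixed-length numeric feature vector, the profile is the rating-weighted average of the vectors of the items the user rated, ratings are non-negative, and candidates are scored by cosine similarity against the profile. The class and method names are hypothetical, and other weighting or scoring schemes are equally possible.

```java
// Hypothetical sketch of a content-based user profile; all names are illustrative.
class UserProfileSketch {

    // Profile = rating-weighted average of the feature vectors of rated items.
    // Assumes non-negative ratings and at least one rated item.
    static double[] buildProfile(double[][] itemFeatures, double[] ratings) {
        int n = itemFeatures[0].length;
        double[] profile = new double[n];
        double totalWeight = 0.0;
        for (int i = 0; i < itemFeatures.length; i++) {
            for (int f = 0; f < n; f++) {
                profile[f] += ratings[i] * itemFeatures[i][f];
            }
            totalWeight += ratings[i];
        }
        if (totalWeight > 0.0) {
            for (int f = 0; f < n; f++) {
                profile[f] /= totalWeight;
            }
        }
        return profile;
    }

    // Cosine similarity between the profile and a candidate item vector.
    // Returns 0 if either vector is all zeros.
    static double score(double[] profile, double[] item) {
        double dot = 0.0;
        double normProfile = 0.0;
        double normItem = 0.0;
        for (int f = 0; f < profile.length; f++) {
            dot += profile[f] * item[f];
            normProfile += profile[f] * profile[f];
            normItem += item[f] * item[f];
        }
        if (normProfile == 0.0 || normItem == 0.0) {
            return 0.0;
        }
        return dot / Math.sqrt(normProfile * normItem);
    }

    public static void main(String[] args) {
        // Made-up features: [isFantasy, isCrime, writtenByAuthorA]
        double[][] rated = {{1, 0, 1}, {0, 1, 0}};
        double[] ratings = {5, 1};
        double[] profile = buildProfile(rated, ratings);

        double[] fantasyBook = {1, 0, 0};
        double[] crimeBook = {0, 1, 0};
        System.out.println("fantasy: " + score(profile, fantasyBook));
        System.out.println("crime:   " + score(profile, crimeBook));
    }
}
```

Because the highly rated item is a fantasy book, the profile puts most of its weight on the fantasy and author features, so the unseen fantasy book scores higher than the crime book.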