The utility matrix

In a hybrid recommendation system, there are two classes of entities: users and items (for example, movies or products). Users have preferences for certain items, and these preferences must be extracted from data about items, users, or ratings. This data is often represented as a utility matrix, which gives, for each user-item pair, a value representing what is known about the degree of preference of that user for that item. The entries in the matrix come from an ordered set. For example, the integers 1-5 can represent the number of stars that a user gave as a rating for an item.

We have argued that users often have not rated most items; that is, most entries are unknown, which makes the matrix sparse. An unknown rating implies that we have no explicit information about the user's preference for the item. Figure 2 shows an example utility matrix. The matrix represents users' ratings of movies on a 1-5 scale, with 5 being the highest rating. A blank entry means that the user has not rated that movie.

Here HP1, HP2, and HP3 are acronyms for the movies Harry Potter I, II, and III, respectively; TW is for Twilight; and SW1, SW2, and SW3 represent Star Wars episodes 1, 2, and 3, respectively. The users are represented by capital letters A, B, C, and D:

Figure 2: Utility matrix (user versus movies matrix)
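A utility matrix of this kind can be sketched in code as a nested dictionary, which stores only the known entries and so stays compact when the matrix is sparse. This is a minimal sketch: the specific ratings, the `ratings` layout, and the helper names `get_rating` and `sparsity` are assumptions made for illustration, not values or an API taken from the text.

```python
# Illustrative utility matrix; the ratings are assumed for this sketch.
MOVIES = ["HP1", "HP2", "HP3", "TW", "SW1", "SW2", "SW3"]

# Outer key: user; inner key: movie; value: rating on a 1-5 scale.
# A missing inner key stands for a blank (unknown) entry.
ratings = {
    "A": {"HP1": 4, "TW": 5, "SW1": 1},
    "B": {"HP1": 5, "HP2": 5, "HP3": 4},
    "C": {"TW": 2, "SW1": 4, "SW2": 5},
    "D": {"HP2": 3, "SW3": 3},
}


def get_rating(user, movie):
    """Return the known rating, or None when the entry is blank."""
    return ratings.get(user, {}).get(movie)


def sparsity(ratings, movies):
    """Return the fraction of user-movie pairs that have no rating."""
    total = len(ratings) * len(movies)
    known = sum(len(user_ratings) for user_ratings in ratings.values())
    return 1 - known / total


print(get_rating("A", "SW1"))  # a known entry
print(get_rating("A", "SW2"))  # a blank entry, reported as None
print(round(sparsity(ratings, MOVIES), 2))
```

Storing only the known ratings means a blank is never confused with a low rating, which matters because an unknown entry carries no explicit information about preference.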

There are many blank entries for the user-movie pairs, meaning that the users have not rated those movies. In a real-life scenario, the matrix would be even sparser, with the typical user rating only a tiny fraction of all available movies. The goal is to use this matrix to predict the blank entries. For example, suppose we want to know whether user A would like SW2. This is difficult to determine, since there is little evidence in the matrix in Figure 2.

In practice, we might therefore design a movie recommendation engine to take into account properties of movies, such as the producer, director, lead stars, or even the similarity of their names. This way, we can compute the similarity of the movies SW1 and SW2 and conclude that, since A did not like SW1, they are unlikely to enjoy SW2 either.
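The property-based reasoning above can be sketched with a set similarity measure. The example below uses Jaccard similarity over property sets and a similarity-weighted average of the user's known ratings. Both are one possible choice rather than the method the text prescribes, and the property tokens (`director:d1` and so on), the ratings for user A, and the function names are hypothetical placeholders.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical movie properties; real metadata would come from a catalog.
properties = {
    "SW1": {"franchise:SW", "director:d1", "producer:p1", "star:s1"},
    "SW2": {"franchise:SW", "director:d1", "producer:p1", "star:s2"},
    "HP1": {"franchise:HP", "director:d2", "producer:p2", "star:s3"},
    "TW": {"franchise:TW", "director:d3", "producer:p3", "star:s4"},
}

# Assumed known ratings for user A.
a_ratings = {"HP1": 4, "TW": 5, "SW1": 1}


def predict_rating(target, user_ratings, properties):
    """Average the user's known ratings, weighted by similarity to the target."""
    weighted_sum = 0.0
    weight_total = 0.0
    for movie, rating in user_ratings.items():
        sim = jaccard(properties[target], properties[movie])
        weighted_sum += sim * rating
        weight_total += sim
    # None signals that no rated movie shares any property with the target.
    return weighted_sum / weight_total if weight_total else None


print(jaccard(properties["SW1"], properties["SW2"]))
print(predict_rating("SW2", a_ratings, properties))
```

With these placeholder properties, SW2 shares properties only with SW1, so the estimate for A is driven entirely by A's low rating of SW1.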

However, this approach might not work well on a larger dataset. Alternatively, with much more data, we might observe that the people who rated both SW1 and SW2 tended to give them similar ratings. We could then conclude that A would also give SW2 a low rating, similar to A's rating of SW1.
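The co-rating argument can be sketched as a simple average-offset estimate: take the mean difference between SW2 and SW1 ratings among users who rated both, and add it to A's rating of SW1. This is only one way to turn the observation into a number, and the `co_ratings` pairs and function names below are invented for illustration.

```python
# Hypothetical (SW1 rating, SW2 rating) pairs from users who rated both movies.
co_ratings = [(4, 5), (2, 2), (5, 4), (1, 2), (3, 3)]


def mean_offset(pairs):
    """Average of (second rating - first rating) over all co-rating pairs."""
    return sum(second - first for first, second in pairs) / len(pairs)


def predict_from_pair(known_rating, pairs, low=1, high=5):
    """Shift the known rating by the mean offset, clamped to the rating scale."""
    estimate = known_rating + mean_offset(pairs)
    return max(low, min(high, estimate))


# A small mean offset means co-raters gave the two movies similar ratings.
print(mean_offset(co_ratings))

# User A is assumed to have rated SW1 with 1 star.
print(predict_from_pair(1, co_ratings))
```

Because the assumed co-raters scored the two movies almost identically, the estimate for SW2 stays close to A's low rating of SW1.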
