Recommendation Fits well in Database Systems !

“What users (customers) really want?” a question asked by almost every business and online retail store. The answer to this question helps users find interesting items (products, movies, books) to buy, watch, read, etc… In a pursuit to such an answer, personalized recommender systems analyze users preferences/tastes and purchasing history in order to predict how much a specific user would like an unseen item. Actually, Netflix, an online streaming video service, reported that 75% of movies users watch on Netflix are from recommendation.

Technically speaking, a recommender system takes as input a user-item ratings matrix (movies example is given below) in which rows represent users and columns represent items and each matrix entry represent a rating (e.g., scale from 1 to 10) that a user assigned to an item. To recommend a movie to Alice, a recommender system first predicts how much Alice would like unseen movies (i.e., Inception and Casino) and then returns the movie that is expected to have the maximum predicted rating value.

Offline Recommenders

Existing recommender systems are offline in nature; they pre-compute a set of recommended items offline for every user, store them on disk, and returns the pre-computed recommendation to a user when she logs on to the system. However, such libraries requires loading the whole user/items ratings data from persistent storage to memory, which may represent a performance bottleneck with large-scale data. Moreover, they assume the input data and the generated recommendation model fits in-memory, and hence does not scale to large-scale datasets. Such offline systems include: software libraries that perform the full recommendation process in-memory, e.g., LensKit and MyMediaLight.

Offline recommenders also include large-scale Hadoop-based offline systems like Apache Mahout. These systems are built on-top of the Hadoop ecosystem and run the recommendation generation functionality as a batch processing MapReduce job. Despite the fact that Hadoop-based implementations are scalable, nonetheless, they suffer from the following: (1)~Tremendous overhead of transferring the user/item ratings data from the transactional database system (where the data reside) to HDFS and vice-versa. (2)~Inadequacy of handling online arbitrary recommendation scenarios since the recommended items are pre-computed offline. (3)~Hadoop-based systems require users to set-up a Hadoop cluster and write a MapReduce program rather than using a declarative query which is not appealing for novice users.

Recommendation in Database Systems

At the University of Minnesota Data Management lab, we developed RecDB; an open Source Recommendation Engine built entirely Inside PostgreSQL 9.2. RecDB allows application developers to build recommendation applications in a heartbeat through a wide variety of built-in recommendation algorithms like user-user collaborative filtering, item-item collaborative filtering, Singular value decomposition.

RecDB provides an intuitive interface for application developers to build custom-made recommenders. That allows application developers to implement myriad recommendation applications in a heartbeat through a wide variety of built-in recommendation algorithms. To achieve that, RecDB extends SQL with new statements to create and/or drop recommenders. The system efficiently maintains each created recommender that be queried to generate personalized recommendations to end-users. RecDB proposes a novel-querying paradigm that allows database users to express recommendation as part of the issued SQL queries. The system then optimizes the recommendation-aware SQL query through a set of newly introduced recommendation-aware relational operators to realize a variety of popular data recommendation algorithms inside the database query processor.

Applications powered by RecDB can produce online and flexible personalized recommendations to end-users. RecDB has the following main features:

Usability: RecDB is an out-of-the-box tool for web and mobile developers to implement a myriad of recommendation applications. The system is easily used and configured so that a novice developer can define a variety of recommenders that fits the application needs in few lines of SQL.
Seamless Database Integration: Crafted inside PostgreSQL database engine, RecDB is able to seamlessly integrate the recommendation functionality with traditional database operations, i.e., SELECT, PROJECT, JOIN, in the query pipeline to execute ad-hoc recommendation queries.
Scalability and Performance: The system optimizes incoming recommendation queries (written in SQL) and hence provides near real-time personalized recommendation to a high number of end-users who expressed their opinions over a large pool of items.

Creating a Recommender

Users may create recommenders a-priori so that when a recommendation query is issued may be answered with less latency. The user needs to specify the ratings table in the ON clause and also specify where the user, item, and rating value columns are in that table. Moreover, the user has to designate the recommendation algorithm to be used to predict item ratings in the USING clause. An example of creating an Item-Item Collaborative Filtering recommender on the User/Item ratings table MovieRatings is as follows

CREATE RECOMMENDER MovieRec ON MovieRatings
USERS FROM userid
ITEMS FROM itemid
EVENTS FROM ratingval
USING ItemCosCF

Generating Recommendation

To generate recommendation, RecDB allows users to write their recommendation query using SQL. In the recommendation query, the user needs to specify the ratings table and also specify where the user, item, and rating value columns are in that table. Moreover, the user has to designate the recommendation algorithm to be used to predict item ratings. For example, if MovieRatings(userid,itemid,ratingval) represents the ratings table in a movie recommendation application, then to recommend top-10 movies based on the rating predicted using Item-Item Collaborative filtering (applying cosine similarity measure) algorithm to user 1, the user writes the following SQL:

SELECT * FROM MovieRatings R
RECOMMEND R.itemid TO R.userid ON R.ratingval USING ItemCosCF
WHERE R.userid = 1
ORDER BY R.ratingval
LIMIT 10

The main benefit of implementing the recommendation functionality inside a database engine (PostgreSQL) is to allow for integration with traditional database operations, e.g., selection, projection, join. For example, the following query recommends the top 10 Comedy movies to user 1. In order to do that, the query joins the recommendation with the Movies table and apply a filter on the movies genre column (genre LIKE ‘%Comedy%’).

SELECT * FROM MovieRatings R, Movies M
RECOMMEND R.itemid TO R.userid ON R.ratingval USING ItemCosCF
WHERE R.userid = 1 AND M.movieid = R.itemid AND M.genre LIKE '%Comedy%'
ORDER BY R.ratingval
LIMIT 10

More Applications

For demonstration purposes, we developed a restaurant recommendation application, using RecDB, that generates restaurant recommendation to users based upon their spatial locations. The application analyzes the history of user CheckIns (visits) such that each checkin entry represents whether a user has visited a restaurants before. In such case, the checkin field is set to one if the user visited the restaurant, and zero otherwise. We create RestaurantRec; a recommender that generates personalized recommendation using the singular value decomposition (SVD) recommendation algorithms performed on the CheckIns table as the user/item events matrix.

CREATE RECOMMENDER RestaurantRec ON CheckIns
USERS FROM userid
ITEMS FROM itemid
RATINGS FROM checkin
USING SVD

A user visiting ‘New York City’ asks for top five restaurant (location-aware) recommendation by issuing the following query to RecDB. The query states the current user location using traditional SQL operators (WHERE A.itemid = B.itemid AND B.location = ‘New York City’). RecDB therefore produces a set of five restaurants by using ORDER BY / LIMIT SQL.

SELECT A.itemid FROM CheckIns A , Restaurant B 
RECOMMEND A.rid TO A.userid ON A.checkin USING SVD
WHERE A.uid = 1 AND A.itemid = B.itemid AND B.location = 'New York'
ORDER BY A.checkin LIMIT 5

RecDB source code is available here on GitHub. Since its release on October 3rd 2013, RecDB has been downloaded more than 20000 times.