Mohamed Sarwat

Assistant Professor of Computer Science and Engineering at Arizona State University


The Three Phases of the PhD Timeline



I created this infographic (on the left) for those who are thinking about pursuing a PhD. It is meant to be simple and to the point (WikiHow style). The infographic targets folks who are not familiar with the PhD process; hence, some of these steps may seem obvious to professors and well-established scientists.

As a graduate student, you go through three phases over the course of your PhD studies. The first phase is the “Spoon Feeding” phase. In this phase, you have just started the PhD program and barely know a thing or two about the subject. Your advisor knows much more about your thesis topic than you do, so you need to follow your PhD advisor’s guidelines carefully. The second phase is the “Neck & Neck” phase. By now, you have delved deeper into the subject and acquired the necessary knowledge, so you can hold a mature scientific conversation with your PhD advisor. The third phase is the “Catch me if you can” phase. In this phase, you have become an expert in your field. You have published a couple of articles in good venues, and scientists in your area have started to acknowledge your scientific contributions. By the third phase, your PhD advisor (1) cannot catch up with you, (2) realizes that you have completed your training, and (3) starts to see you as a fellow scientist. Now, you are ready to graduate!

In this infographic, I share my perspective on the PhD timeline. Below is another scholar’s critique of the infographic:

“I am skeptical about the “ready to graduate” part at the end. PhD means “Doctor of Philosophy in X”, where X is a scientific field. Your infographic accurately depicts the X part, but ignores the “Philosophy” part. To graduate a student, the advisor is responsible for ensuring that the student not only has enough knowledge in X, but also has excellent “philosophy”. This includes the ability to decide which research problems to tackle, to judge what is good and bad in research, and to have the appropriate big-picture understanding of how X fits into world knowledge. Not to mention excelling in soft skills like how to write a paper, how to position your research, and communication/presentation skills.” — Mohamed F. Mokbel

Inaugurating the Data Systems Lab


When I joined ASU in Fall 2014, I inaugurated the Data Systems Lab (DataSys@ASU). The main mission of DataSys@ASU is to impact society by boosting education and conducting novel scientific research in the data management area. Members of DataSys@ASU design and develop data management systems that enable emerging applications (e.g., predictive analytics, location-based services, social networking), new data types (e.g., geospatial data, graph data), and new workloads. The lab also develops data management systems that support new computing paradigms (e.g., cloud computing and cluster computing) and new hardware architectures such as flash memory storage devices.

Recommendation Fits Well in Database Systems!

“What do users (customers) really want?” is a question asked by almost every business and online retail store. The answer to this question helps users find interesting items (products, movies, books) to buy, watch, read, etc. In pursuit of such an answer, personalized recommender systems analyze users’ preferences/tastes and purchasing history in order to predict how much a specific user would like an unseen item. In fact, Netflix, an online video streaming service, reported that 75% of the movies users watch on Netflix come from recommendations.

Technically speaking, a recommender system takes as input a user-item ratings matrix (a movie example is given below) in which rows represent users, columns represent items, and each entry represents the rating (e.g., on a scale from 1 to 10) that a user assigned to an item. To recommend a movie to Alice, a recommender system first predicts how much Alice would like the movies she has not seen (i.e., Inception and Casino) and then returns the movie with the maximum predicted rating.
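To make the "predict, then pick the maximum" step concrete, here is a minimal sketch of one common way to do it: item-item collaborative filtering with cosine similarity. Only Alice, Inception, and Casino come from the text. The other users, the other movies, and every rating value are made up for illustration. The helper names `item_cosine` and `predict` are hypothetical.

```python
from math import sqrt

# Hypothetical toy ratings on a 1-10 scale; None marks a movie the user has not rated.
# Only Alice, Inception, and Casino come from the text; every number is made up.
ratings = {
    "Alice": {"Titanic": 9, "Avatar": 7, "Inception": None, "Casino": None},
    "Bob":   {"Titanic": 8, "Avatar": 6, "Inception": 9,    "Casino": 3},
    "Carol": {"Titanic": 2, "Avatar": 3, "Inception": 4,    "Casino": 8},
    "Dave":  {"Titanic": 7, "Avatar": 8, "Inception": 8,    "Casino": 2},
}

def item_cosine(i, j):
    """Cosine similarity of items i and j over the users who rated both."""
    common = [r for r in ratings.values() if r[i] is not None and r[j] is not None]
    dot = sum(r[i] * r[j] for r in common)
    norm = sqrt(sum(r[i] ** 2 for r in common)) * sqrt(sum(r[j] ** 2 for r in common))
    return dot / norm if norm else 0.0

def predict(user, item):
    """Similarity-weighted average of the user's ratings on the items she has rated."""
    rated = {j: v for j, v in ratings[user].items() if v is not None}
    weights = {j: item_cosine(item, j) for j in rated}
    total = sum(abs(w) for w in weights.values())
    return sum(weights[j] * rated[j] for j in rated) / total if total else 0.0

# Predict every unseen movie for Alice, then return the one with the maximum prediction.
unseen = [m for m, v in ratings["Alice"].items() if v is None]
predictions = {m: predict("Alice", m) for m in unseen}
best = max(predictions, key=predictions.get)
print(predictions, "->", best)
```

Because each prediction is a weighted average of Alice's own ratings, every prediction falls between her lowest and highest ratings (7 and 9 here). The similarities only decide where in that range each unseen movie lands.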

Offline Recommenders

Existing recommender systems are offline in nature; they pre-compute a set of recommended items for every user, store them on disk, and return the pre-computed recommendations to a user when she logs on to the system. Such offline systems include software libraries that perform the full recommendation process in memory, e.g., LensKit and MyMediaLite. However, these libraries require loading the whole user/item ratings data from persistent storage into memory, which may become a performance bottleneck with large-scale data. Moreover, they assume that the input data and the generated recommendation model fit in memory, and hence do not scale to large datasets.
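The offline pattern can be illustrated with a small sketch: a batch job pre-computes each user's top-k list once, and logins are served from that stored snapshot. The scoring model here is a deliberately simple stand-in (item popularity), not the algorithm of any library named above. The function names and the event data are hypothetical. The point is that a new interaction is invisible to the served list until the next batch run.

```python
from collections import Counter

def top_k_unseen(events, user, k):
    """Stand-in scorer: rank items by how many users interacted with them,
    skipping items this user has already seen."""
    popularity = Counter(item for _, item in events)
    seen = {item for u, item in events if u == user}
    candidates = [item for item in popularity if item not in seen]
    # Most popular first; ties broken by item name so the output is deterministic.
    return sorted(candidates, key=lambda item: (-popularity[item], item))[:k]

def offline_batch(events, k):
    """Offline job: pre-compute a top-k list for every user in one pass."""
    users = {u for u, _ in events}
    return {u: top_k_unseen(events, u, k) for u in users}

events = [("alice", "A"), ("bob", "A"), ("bob", "B"),
          ("carol", "B"), ("carol", "C"), ("dave", "B")]
snapshot = offline_batch(events, k=2)  # computed once and stored (e.g., on disk)

# Later, Alice consumes item B. The stored snapshot does not know about it.
events.append(("alice", "B"))
served = snapshot["alice"]                  # what the offline system returns at login
fresh = top_k_unseen(events, "alice", k=2)  # what an online computation would return
print(served, fresh)
```

The offline system keeps recommending B to Alice even though she has already consumed it. An online computation reflects the new event immediately.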

Offline recommenders also include large-scale Hadoop-based systems like Apache Mahout. These systems are built on top of the Hadoop ecosystem and run the recommendation generation functionality as a batch MapReduce job. Although Hadoop-based implementations are scalable, they suffer from the following drawbacks: (1) the tremendous overhead of transferring the user/item ratings data from the transactional database system (where the data reside) to HDFS and vice versa; (2) the inability to handle arbitrary online recommendation scenarios, since the recommended items are pre-computed offline; and (3) the need for users to set up a Hadoop cluster and write a MapReduce program rather than a declarative query, which is not appealing to novice users.
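To show what "recommendation generation as a batch MapReduce job" looks like in miniature, here is a pure-Python simulation of the map, shuffle, and reduce phases computing item-item cosine similarities. This is not Mahout's code, and no Hadoop API is used. The function names and the ratings are hypothetical. It does show that the job must sweep over the whole ratings dataset before any recommendation is available.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt

def map_phase(user_ratings):
    """Map: for one user's ratings, emit partial sums for every co-rated item pair."""
    for i, j in combinations(sorted(user_ratings), 2):
        ri, rj = user_ratings[i], user_ratings[j]
        yield (i, j), (ri * rj, ri * ri, rj * rj)

def shuffle(mapped):
    """Shuffle: group intermediate values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: add up the partial sums and turn them into a cosine similarity."""
    dot = sum(v[0] for v in values)
    norm = sqrt(sum(v[1] for v in values)) * sqrt(sum(v[2] for v in values))
    return key, (dot / norm if norm else 0.0)

def batch_job(ratings):
    """Run all three phases over the entire ratings dataset."""
    mapped = (kv for r in ratings.values() for kv in map_phase(r))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(mapped).items())

# Hypothetical ratings: {user: {item: rating}}.
ratings = {"u1": {"A": 1, "B": 2}, "u2": {"A": 2, "B": 1, "C": 3}}
similarities = batch_job(ratings)
print(similarities)
```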

Recommendation in Database Systems

At the University of Minnesota Data Management lab, we developed RecDB, an open-source recommendation engine built entirely inside PostgreSQL 9.2. RecDB allows application developers to build recommendation applications in a heartbeat through a wide variety of built-in recommendation algorithms, such as user-user collaborative filtering, item-item collaborative filtering, and singular value decomposition (SVD).

RecDB provides an intuitive interface for application developers to build custom-made recommenders. To achieve that, RecDB extends SQL with new statements to create and drop recommenders. The system efficiently maintains each created recommender, which can then be queried to generate personalized recommendations for end users. RecDB proposes a novel querying paradigm that allows database users to express recommendation as part of the issued SQL queries. The system then optimizes the recommendation-aware SQL query through a set of newly introduced recommendation-aware relational operators that realize a variety of popular recommendation algorithms inside the database query processor.

Applications powered by RecDB can produce online and flexible personalized recommendations to end-users. RecDB has the following main features:

  • Usability: RecDB is an out-of-the-box tool for web and mobile developers to implement a myriad of recommendation applications. The system is easy to use and configure, so a novice developer can define recommenders that fit the application's needs in a few lines of SQL.
  • Seamless Database Integration: Crafted inside the PostgreSQL database engine, RecDB seamlessly integrates the recommendation functionality with traditional database operations (e.g., selection, projection, and join) in the query pipeline to execute ad-hoc recommendation queries.
  • Scalability and Performance: The system optimizes incoming recommendation queries (written in SQL) and hence provides near real-time personalized recommendations to a large number of end users who have expressed their opinions on a large pool of items.

Creating a Recommender

Users may create recommenders a priori so that recommendation queries issued later can be answered with lower latency. The user needs to specify the ratings table in the ON clause and also specify where the user, item, and rating value columns are in that table. Moreover, the user has to designate the recommendation algorithm used to predict item ratings in the USING clause. An example of creating an Item-Item Collaborative Filtering recommender on the User/Item ratings table MovieRatings is as follows:

CREATE RECOMMENDER MovieRec ON MovieRatings
USERS FROM userid
ITEMS FROM itemid
EVENTS FROM ratingval
USING ItemCosCF
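The latency benefit of creating a recommender ahead of time can be sketched outside the database. In this sketch the expensive item-item cosine similarities are computed once, when the recommender is created. Each later query then only computes a cheap weighted average. The class `ItemCosRecommender`, its methods, and the ratings rows are hypothetical. This is an illustration of the idea, not how RecDB stores or maintains its recommenders internally.

```python
from itertools import combinations
from math import sqrt

class ItemCosRecommender:
    """Hypothetical stand-in for a created recommender: item-item cosine
    similarities are computed once at creation time and reused by every query."""

    def __init__(self, rows):
        # rows: iterable of (userid, itemid, ratingval) tuples, i.e., the ratings table.
        self.by_user = {}
        for user, item, val in rows:
            self.by_user.setdefault(user, {})[item] = val
        items = sorted({item for r in self.by_user.values() for item in r})
        self.sim = {}
        for i, j in combinations(items, 2):  # expensive step, done once
            common = [r for r in self.by_user.values() if i in r and j in r]
            dot = sum(r[i] * r[j] for r in common)
            norm = sqrt(sum(r[i] ** 2 for r in common)) * sqrt(sum(r[j] ** 2 for r in common))
            self.sim[(i, j)] = self.sim[(j, i)] = dot / norm if norm else 0.0

    def predict(self, user, item):
        """Cheap per-query step: similarity-weighted average of the user's ratings."""
        rated = self.by_user.get(user, {})
        weights = [(self.sim.get((item, j), 0.0), v) for j, v in rated.items()]
        total = sum(abs(w) for w, _ in weights)
        return sum(w * v for w, v in weights) / total if total else 0.0

rows = [(1, "A", 8), (1, "B", 2), (2, "A", 7), (2, "B", 3), (2, "C", 9),
        (3, "B", 8), (3, "C", 2)]
rec = ItemCosRecommender(rows)   # "CREATE RECOMMENDER": build the model once
print(rec.predict(1, "C"))       # each query reuses the stored similarities
```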

Generating Recommendation

To generate recommendations, RecDB allows users to write their recommendation queries in SQL. In the recommendation query, the user needs to specify the ratings table and also specify where the user, item, and rating value columns are in that table. Moreover, the user has to designate the recommendation algorithm used to predict item ratings. For example, if MovieRatings(userid, itemid, ratingval) represents the ratings table in a movie recommendation application, then to recommend to user 1 the top-10 movies based on ratings predicted by Item-Item Collaborative Filtering (using the cosine similarity measure), the user writes the following SQL:

SELECT * FROM MovieRatings R
RECOMMEND R.itemid TO R.userid ON R.ratingval USING ItemCosCF
WHERE R.userid = 1
ORDER BY R.ratingval DESC
LIMIT 10

The main benefit of implementing the recommendation functionality inside a database engine (PostgreSQL) is that it integrates with traditional database operations, e.g., selection, projection, and join. For example, the following query recommends the top 10 Comedy movies to user 1. To do so, the query joins the recommendation with the Movies table and applies a filter on the movie genre column (genre LIKE '%Comedy%').

SELECT * FROM MovieRatings R, Movies M
RECOMMEND R.itemid TO R.userid ON R.ratingval USING ItemCosCF
WHERE R.userid = 1 AND M.movieid = R.itemid AND M.genre LIKE '%Comedy%'
ORDER BY R.ratingval DESC
LIMIT 10
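The semantics of the Comedy query can be mimicked in a few lines of Python. Each movie in the Movies table is a candidate item (the join). Movies the user has already rated are skipped, which is an assumption about the query's intent. The genre filter is applied, the survivors are scored, and the result is sorted in descending order and truncated. Applying the filter before scoring means only comedies ever reach the prediction step. This illustrates why integrating recommendation into the query pipeline can save work; it does not claim to reproduce RecDB's operators. The tables, the predicted values, and the function names are all hypothetical.

```python
# Hypothetical Movies table and user 1's already-rated movies (all values made up).
movies = [
    {"movieid": 1, "genre": "Comedy"},
    {"movieid": 2, "genre": "Drama"},
    {"movieid": 3, "genre": "Comedy|Romance"},
    {"movieid": 4, "genre": "Action"},
    {"movieid": 5, "genre": "Comedy"},
]
rated = {1: {1}}  # user 1 has already rated movie 1

# Made-up predicted ratings standing in for a recommender such as ItemCosCF.
PREDICTED = {(1, 2): 9.0, (1, 3): 6.5, (1, 4): 8.0, (1, 5): 7.2}
calls = []

def predict(user, item):
    calls.append(item)  # record how many predictions the query actually needs
    return PREDICTED[(user, item)]

def recommend(user, genre=None, limit=10):
    # Join: every movie in the Movies table is a candidate item (M.movieid = R.itemid),
    # minus the movies this user has already rated.
    candidates = [m for m in movies if m["movieid"] not in rated.get(user, set())]
    # Filter: the equivalent of M.genre LIKE '%<genre>%', applied before predicting.
    if genre is not None:
        candidates = [m for m in candidates if genre in m["genre"]]
    scored = [(m["movieid"], predict(user, m["movieid"])) for m in candidates]
    scored.sort(key=lambda row: row[1], reverse=True)  # ORDER BY R.ratingval DESC
    return scored[:limit]                              # LIMIT 10

comedy = recommend(1, genre="Comedy")
comedy_calls = len(calls)   # only the comedies were scored
everything = recommend(1)   # same query without the genre filter
print(comedy, comedy_calls, everything)
```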


More Applications

For demonstration purposes, we used RecDB to develop a restaurant recommendation application that generates restaurant recommendations for users based on their spatial locations. The application analyzes the history of user check-ins (visits), where each check-in entry records whether a user has visited a restaurant before: the checkin field is set to one if the user visited the restaurant, and zero otherwise. We created RestaurantRec, a recommender that generates personalized recommendations by running the singular value decomposition (SVD) recommendation algorithm on the CheckIns table, treated as the user/item events matrix.


A user visiting New York City asks for the top five (location-aware) restaurant recommendations by issuing the following query to RecDB. The query specifies the user’s current location using traditional SQL predicates (A.itemid = B.itemid AND B.location = 'New York City'), and RecDB then returns five restaurants through the ORDER BY / LIMIT clauses.

SELECT A.itemid FROM CheckIns A, Restaurant B
RECOMMEND A.itemid TO A.userid ON A.checkin USING SVD
WHERE A.userid = 1 AND A.itemid = B.itemid AND B.location = 'New York City'
ORDER BY A.checkin DESC LIMIT 5
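As a rough illustration of what an SVD-based recommender computes for this query, the sketch factorizes a 0/1 check-in matrix and scores each unvisited restaurant by its reconstructed value. It then keeps only restaurants in the requested location and returns the highest scores. For brevity it uses a rank-1 truncated SVD computed by power iteration. This is a deliberate simplification; RecDB's SVD algorithm is not reproduced here and likely uses a higher-rank factorization. The check-in data, restaurant ids, and function names are hypothetical, and user 1 from the query is represented as row index 0.

```python
from math import sqrt

def rank1_svd(a, iters=200):
    """Leading singular triplet (sigma, u, v) of matrix a, via power iteration on A^T A."""
    rows, cols = len(a), len(a[0])
    v = [1.0] * cols
    for _ in range(iters):
        w = [sum(a[r][c] * v[c] for c in range(cols)) for r in range(rows)]  # w = A v
        v = [sum(a[r][c] * w[r] for r in range(rows)) for c in range(cols)]  # v = A^T w
        norm = sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    av = [sum(a[r][c] * v[c] for c in range(cols)) for r in range(rows)]
    sigma = sqrt(sum(x * x for x in av))
    return sigma, [x / sigma for x in av], v

def recommend_nearby(checkins, restaurants, user, location, k=5):
    sigma, u, v = rank1_svd(checkins)
    rows = []
    for idx, (rid, loc) in enumerate(restaurants):
        if loc != location or checkins[user][idx] == 1:
            continue  # keep only unvisited restaurants in the requested location
        rows.append((rid, sigma * u[user] * v[idx]))  # rank-1 reconstructed check-in score
    rows.sort(key=lambda row: row[1], reverse=True)   # ORDER BY A.checkin DESC
    return rows[:k]                                   # LIMIT 5

# Hypothetical data: rows are users, columns are restaurants r0..r4 (1 = visited).
restaurants = [("r0", "New York City"), ("r1", "New York City"), ("r2", "Boston"),
               ("r3", "New York City"), ("r4", "New York City")]
checkins = [[1, 0, 1, 0, 0],
            [1, 1, 0, 1, 0],
            [0, 1, 1, 1, 0],
            [1, 1, 0, 0, 1]]
top = recommend_nearby(checkins, restaurants, user=0, location="New York City")
print(top)
```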

The RecDB source code is available on GitHub. Since its release on October 3, 2013, RecDB has been downloaded more than 20,000 times.

PhD Dissertation Acknowledgement

There are many people that have earned my gratitude for their contribution to my time in graduate school. More specifically, I would like to thank five groups of people, without whom this thesis would not have been possible: my thesis committee members, my lab mates, my industrial collaborators, funding agencies, and my family.

My Advisor

First, I am indebted to my thesis advisor, Mohamed F. Mokbel. Since my first day in graduate school, Mohamed believed in me like nobody else and gave me endless support. It all started in Fall 2009, when he offered me the great opportunity to join the data management lab. On the academic level, Mohamed taught me the fundamentals of conducting scientific research in the database systems area. Under his supervision, I learned how to define a research problem, find a solution to it, and finally publish the results. On a personal level, Mohamed inspired me with his hardworking and passionate attitude. To summarize, I give Mohamed most of the credit for my becoming the kind of scientist I am today.

Thesis Committee Members

Besides my advisor, I would like to thank the rest of my dissertation committee members (Gedas Adomavicius, Shashi Shekhar, and Eric Van Wyk) for their great support and invaluable advice. I am thankful to Prof. Adomavicius, an expert in context-aware recommender systems, for his crucial remarks that shaped my final dissertation. I am also grateful to Prof. Shekhar for his insightful comments and for sharing with me his tremendous experience in the spatial data management field. I am quite appreciative of Prof. Eric Van Wyk for agreeing to serve on my dissertation committee on such short notice as a replacement for John Riedl. I also extend my gratitude to Prof. John Riedl (a former thesis committee member), an excellent teacher and pioneer in the recommender systems area, who unfortunately passed away a few months before the official dissertation defense.

My UMN Data Management Labmates

I would like to thank my lab mates for their continued support. This dissertation would not have been possible without the intellectual contribution of Justin J. Levandoski, an alumnus of the data management lab. Moreover, I am thankful to James Avery and Ahmed Eldawy for their collaboration and contribution in various projects related to this dissertation. I would also like to thank my other lab mates, including Louai Alarabi, Jie Bao, Chi-Yin Chow, Abdeltawab Hendawi, Mohamed Khalefa, Amr Magdy, and Joe Naps, for making my experience in the data management lab and graduate school exciting and fun.

Collaborators from Industry

I am also grateful to my industrial collaborators. I spent two summers at Microsoft Research, where I had the chance to collaborate with fantastic researchers. More specifically, I would like to thank Sameh Elnikety and Yuxiong He for their continuous support and for giving me the great opportunity to work on large-scale systems. I also extend my gratitude to members of the database, cloud systems, and DMX groups at Microsoft Research for the fruitful discussions and for making my internship at Microsoft such an eye-opening experience. I also spent one summer as an intern at NEC Laboratories, where I collaborated with wonderful scientists in the Data Management department. I would like to thank Jagan Sankaranarayanan (my mentor at NEC Labs) and Hakan Hacigumus (my manager at NEC Labs) for their great mentorship and guidance.

Funding Agencies

Thanks are also due to the National Science Foundation (NSF) (under Grants IIS-0952977, IIS-1218168, IIS-0811998, IIS-0811935, and CNS-0708604), the University of Minnesota Digital Technology Center, and Microsoft Research for their financial support, without which I would not have been able to develop my scientific discoveries.

My Family and Friends

Last but not least, I would like to express my deepest gratitude to my family and friends. This dissertation would not have been possible without their warm love, continued patience, and endless support.