Machine Learning: a nice introduction with TripAdvisor examples

This afternoon I attended a seminar given by Prof. Padraig Cunningham. The seminar was organized by Gilt (thanks!).

The seminar was about Machine Learning. First we discussed linear and logistic regression, and at the end we focused on recommenders.

I really liked the part about Weka: it is a nice tool to play with data and compare different models. My friend Giuseppe Rizzo and I have used Weka before when building Automatic Cross-Language Spotters (if you are interested in some code, take a look at the GitHub projects).

It was nice that the Professor presented an example about TripAdvisor:

[Image: the TripAdvisor example presented in the seminar]

The example was taken from the paper Learning to Recommend Helpful Hotel Reviews. It is an interesting read, but currently we are focusing more on suggesting the best hotel for each user (Just for You is the brand of this product, as it appears on our website).

One topic that Prof. Cunningham did not spend much time on is the discovery of latent spaces. He briefly referred to the paper Matrix Factorization Techniques for Recommender Systems. The paper presents a method to identify the relevant dimensions in a very large and sparse matrix, like the ones typically used by recommenders in real-life systems (at least for worldwide market leaders like TripAdvisor, Amazon, Netflix, etc.). To implement collaborative filtering, those recommenders have to deal with n × m matrices, where n is the number of users and m the number of items (books, movies, hotels). In practice this is very difficult, because the matrix is huge and almost all of its entries are missing: each user rates only a tiny fraction of the items. A system that identifies a small number of latent categories (thriller movies, fantasy books, romantic hotels) and describes both users and items in terms of them seems a very nice and elegant solution.
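To make the idea a bit more concrete, here is a minimal sketch of matrix factorization in plain Python. It is only an illustration under my own assumptions, not the implementation from the paper or from the seminar: it uses simple stochastic gradient descent with L2 regularization, and the function names (factorize, predict), the hyperparameters and the toy ratings are all made up for the example. Each user and each item gets a short vector of k latent factors, and a rating is approximated by the dot product of the two vectors.

```python
import random


def factorize(ratings, n_users, n_items, k=2, epochs=2000, lr=0.01, reg=0.02, seed=0):
    """Learn k latent factors per user and per item from (user, item, rating) triples.

    Illustrative sketch: plain SGD on the regularized squared error.
    All defaults are arbitrary choices for the toy data below.
    """
    rng = random.Random(seed)
    # Small random starting values for the user factors P and item factors Q.
    P = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(P[u][f] * Q[i][f] for f in range(k))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on the squared error, with L2 shrinkage.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def predict(P, Q, user, item):
    """Estimated rating: dot product of the user and item factor vectors."""
    return sum(pf * qf for pf, qf in zip(P[user], Q[item]))


# Toy data (hypothetical): users 0-1 like items 0-1, users 2-3 like items 2-3.
# Only the observed ratings are stored; four (user, item) pairs are left out.
ratings = [
    (0, 0, 5), (0, 2, 1),
    (1, 0, 5), (1, 1, 5), (1, 3, 1),
    (2, 1, 1), (2, 2, 5), (2, 3, 5),
    (3, 0, 1), (3, 1, 1), (3, 2, 5), (3, 3, 5),
]

P, Q = factorize(ratings, n_users=4, n_items=4)

# Root mean squared error on the observed ratings.
train_rmse = (sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings) / len(ratings)) ** 0.5

# Estimates for two pairs that user 0 never rated.
unseen_liked = predict(P, Q, 0, 1)
unseen_disliked = predict(P, Q, 0, 3)
print(f"train RMSE: {train_rmse:.3f}")
print(f"user 0, item 1: {unseen_liked:.2f}   user 0, item 3: {unseen_disliked:.2f}")
```

The point of the sketch is that the full n × m matrix is never built: only the observed ratings are visited, and the model size grows with (n + m) × k rather than n × m. A real system would of course need much more than this (biases, implicit feedback, proper evaluation on held-out data).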

If you are interested in learning more about recommender systems, we recently finished reading this book: “Recommender Systems: An Introduction”. Despite the title, it is quite a complete book, touching on all the different aspects involved in building a working, scalable, robust recommender.