My take on the Mahout and Myrrix recommendation algorithms

MahoutWhereas BestComparator has his own recommendation engine based on user profiling, behavior analysis and analysis of product specs, I recently wanted to explore the possibilities of the famous recommendation engine built inside Mahout.

First of all, Mahout is a set of machine learning algorithms which leverage the Hadoop environment, providing powerful and scalable algorithms. One of its main target is the recommendation algorithms also known as taste collaborative filtering.

Recommendation algorithms have been made famous by websites such as Amazon, Youtube or Netflix. They use it to make suggestions based on what you bought, watched or liked.

Myrrix

MyrrixOne of the author of Taste/Mahout recommender engine, Sean Owen decided to give the engine a more formal structure by building Myrrix.

Myrrix is a recommendation engine based on Mahout. It offers an out of the box configuration for a recommendation engine accessible with a Rest API. The good to know are:

  • Scalable as Mahout and Hadoop are scalable, using computing parallelization and a distributed file system
  • Runs an optimized version of Taste (currently Taste 3)
  • Runs in real time
  • Can be efficient even with a relatively small amount of data

Recommendation process

The first thing you want to do is to feed your model (ie. Your algorithm) with current observed data. The models aggregates users, items and the associations between them. These associations are called preferences and are qualified by their value, describing the strength of the association between the user and the item.

Feeding the engine means pushing every observed associations with the user id, the item id and the strength. You are simply giving the engine your current taste graph, linking users to items via their tastes.

When your engine is fed, you have to ask it to refresh. Thus it will re-analyze the given graph and compute an actualized, and thus better, model. This may take some time, but Myrrix has the ability to continue answer your requests during this time.

Finally, with your shinny model you can ask questions and get recommendations. Here are the main queries:

  • Recommend to a user
  • Recommend to a group of users
  • Recommend to an anonymous user
  • Recommend similar items
  • Estimate the strength of the preference between an user and an item

With such a panel of tools you can easily guess that answering the question “What item users like me also liked?” becomes accessible.

Consuming Myrrix from PHP

PHPIn order to integrate Myrrix results to my recommendation engine, I had to build a PHP Myrrix client. I decided to use the Guzzle library that provides a really neat way of building a PHP client for Rest APIs. You can download my library on the open source Github Project: https://github.com/michelsalib/bcc-myrrix.

After installing the library, you can write some very fancy code:

// Get a client
$this->client = MyrrixClient::factory(array(
    'hostname' => 'localhost',
    'port'     => 8080,
));

// Put a user/item assocation, here use #101 as an association of strength 0.5 with item #1000
$command = $this->client->getCommand('PostPref', array(
    'userID' => 101,
    'itemID' => 1000,
    'value'  => (string)0.5,
));
$this->client->execute($command);

// Refresh the index
$command = $this->client->getCommand('Refresh');
$this->client->execute($command);

// Get a recommendation for user #101
$command = $this->client->getCommand('GetRecommendation', array(
    'userID' => 101,
));
$recommendation = $this->client->execute($command)->json();

Here we instantiate a Myrrix client hosted on the localhost on port 8080. We put into the model a preference of 0.5 between the user #101 and the item #1000. We then ask the model to refresh. Finally we get a recommendation for the user #101. The recommendation result is an array of item id with their estimated strength for the given user.

The library is pretty straight forward and help you leverage in a very simple way all the powerfulness of the Myrrix engine from PHP.

I also made a Symfony Bundle that helps you get the client from the dependency container, and offers a cleaner configuration process: https://github.com/michelsalib/BCCMyrrixBundle.

Don’t hesitate to get the code, install it and test it. I would be very happy to get contributions, feedbacks or feature requests.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s