My take on the Mahout and Myrrix recommendation algorithms

While BestComparator has its own recommendation engine based on user profiling, behavior analysis and analysis of product specs, I recently wanted to explore the possibilities of the famous recommendation engine built into Mahout.

First of all, Mahout is a set of machine learning algorithms that leverage the Hadoop environment to provide powerful and scalable processing. One of its main targets is recommendation algorithms, also known as collaborative filtering and implemented in its Taste component.

Recommendation algorithms have been made famous by websites such as Amazon, YouTube or Netflix. They use them to make suggestions based on what you bought, watched or liked.

Myrrix

One of the authors of the Taste/Mahout recommender engine, Sean Owen, decided to give the engine a more formal structure by building Myrrix.

Myrrix is a recommendation engine based on Mahout. It offers an out-of-the-box configuration for a recommendation engine accessible through a REST API. Here are the things worth knowing:

  • Scalable, since Mahout and Hadoop are scalable, using parallel computing and a distributed file system
  • Runs an optimized version of Taste (currently Taste 3)
  • Runs in real time
  • Can be efficient even with a relatively small amount of data

Recommendation process

The first thing you want to do is to feed your model (i.e. your algorithm) with the data you have observed so far. The model aggregates users, items and the associations between them. These associations are called preferences and are qualified by their value, which describes the strength of the association between the user and the item.

Feeding the engine means pushing every observed association with the user id, the item id and the strength. You are simply giving the engine your current taste graph, linking users to items via their tastes.
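To make the idea of a taste graph concrete, here is a minimal sketch in plain PHP, independent of Myrrix. The array layout and the `buildTasteGraph` function are assumptions made for illustration only; they say nothing about how the engine actually stores its data.

```php
<?php
// Hypothetical observed preferences: one (userID, itemID, strength) triple each.
$preferences = array(
    array(101, 1000, 0.5),
    array(101, 1001, 1.0),
    array(102, 1000, 0.8),
);

/**
 * Group the triples into a user => (item => strength) map.
 * Repeated observations of the same pair are summed (an assumption of this sketch).
 */
function buildTasteGraph(array $preferences)
{
    $graph = array();
    foreach ($preferences as $preference) {
        list($userId, $itemId, $strength) = $preference;
        if (!isset($graph[$userId][$itemId])) {
            $graph[$userId][$itemId] = 0.0;
        }
        $graph[$userId][$itemId] += $strength;
    }

    return $graph;
}

$graph = buildTasteGraph($preferences);
print_r($graph[101]); // the items linked to user #101, with their strengths
```

Each triple is exactly what you would push to the engine: who, what, and how strongly.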

Once your engine is fed, you have to ask it to refresh. It will then re-analyze the given graph and compute an updated, and thus better, model. This may take some time, but Myrrix is able to keep answering your requests in the meantime.

Finally, with your shiny model you can ask questions and get recommendations. Here are the main queries:

  • Recommend to a user
  • Recommend to a group of users
  • Recommend to an anonymous user
  • Recommend similar items
  • Estimate the strength of the preference between a user and an item

With such a set of tools, you can easily guess that answering the question “Which items did users like me also like?” becomes accessible.
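To give a feel for what answering that question involves, here is a deliberately naive sketch in plain PHP. It is not the model Myrrix computes; it is a simple co-occurrence stand-in, and the `recommendFor` function and the sample graph are hypothetical.

```php
<?php
// Hypothetical taste graph: user => (item => strength).
$graph = array(
    101 => array(1000 => 1.0, 1001 => 1.0),
    102 => array(1000 => 1.0, 1002 => 1.0),
    103 => array(1003 => 1.0),
);

/**
 * Naive co-occurrence recommender: items liked by users who share
 * preferences with $userId are scored by how similar those users are.
 */
function recommendFor($userId, array $graph)
{
    $mine = $graph[$userId];
    $scores = array();

    foreach ($graph as $otherId => $theirs) {
        if ($otherId === $userId) {
            continue;
        }

        // Similarity: summed strength product over the items both users share.
        $similarity = 0.0;
        foreach (array_intersect_key($mine, $theirs) as $itemId => $strength) {
            $similarity += $strength * $theirs[$itemId];
        }
        if ($similarity <= 0.0) {
            continue; // nothing in common, so this user tells us nothing
        }

        // Items only the other user has become candidates.
        foreach (array_diff_key($theirs, $mine) as $itemId => $strength) {
            if (!isset($scores[$itemId])) {
                $scores[$itemId] = 0.0;
            }
            $scores[$itemId] += $similarity * $strength;
        }
    }

    arsort($scores); // best candidates first, item ids kept as keys

    return $scores;
}

$recommendations = recommendFor(101, $graph);
print_r($recommendations); // item #1002, liked by user #102 who shares item #1000
```

User #103 has nothing in common with user #101, so item #1003 is never suggested. A real engine does far better than this, but the input and the shape of the answer are the same.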

Consuming Myrrix from PHP

In order to integrate Myrrix results into my recommendation engine, I had to build a PHP Myrrix client. I decided to use the Guzzle library, which provides a really neat way of building a PHP client for REST APIs. You can download my library from the open source GitHub project: https://github.com/michelsalib/bcc-myrrix.

After installing the library, you can write some very fancy code:

// Get a client
$this->client = MyrrixClient::factory(array(
    'hostname' => 'localhost',
    'port'     => 8080,
));

// Put a user/item association, here user #101 has an association of strength 0.5 with item #1000
$command = $this->client->getCommand('PostPref', array(
    'userID' => 101,
    'itemID' => 1000,
    'value'  => (string)0.5,
));
$this->client->execute($command);

// Refresh the index
$command = $this->client->getCommand('Refresh');
$this->client->execute($command);

// Get a recommendation for user #101
$command = $this->client->getCommand('GetRecommendation', array(
    'userID' => 101,
));
$recommendation = $this->client->execute($command)->json();

Here we instantiate a client for a Myrrix instance hosted on localhost on port 8080. We put into the model a preference of 0.5 between user #101 and item #1000. We then ask the model to refresh. Finally, we get a recommendation for user #101. The recommendation result is an array of item ids with their estimated strength for the given user.

The library is pretty straightforward and helps you leverage all the power of the Myrrix engine from PHP in a very simple way.

I also made a Symfony Bundle that helps you get the client from the dependency container, and offers a cleaner configuration process: https://github.com/michelsalib/BCCMyrrixBundle.

Don’t hesitate to get the code, install it and test it. I would be very happy to get contributions, feedback or feature requests.

About work, open source and commitment

With this post, I want to take some time to explain my commitment to open source, especially to Symfony, and how it is intertwined with my everyday work.

What do I do?

I am currently developing for the new startup I founded with two of my associates and friends. We met at the engineering school ECE Paris, and were always hungry for entrepreneurship. We have worked together for years now, and had some ideas along the way. Some were bad, some were better, and one was brilliant enough to deserve a real leap into the world of startup creation.

And so we created BestComparator.

BestComparator?

It is a service that helps everyday consumers choose the best product according to their profile, their urges, and the general quality vs. price value of the products available on the market.

Our ultimate goal is to make the best product recommendation in answer to this simple sentence: “I am Michel, I am a 23-year-old nerd, I need a new laptop, and most of my time I develop, listen to podcasts and surf the web.”

We worked really hard, developing the service, creating partnerships with price comparison services, and defining exactly what the UX and the algorithm would look like. We started working full time in September 2011. We released an early private preview in October and a public beta in March 2012.

The service has finally been online since July 2012. We already have some nice recommendations and the business is starting smoothly.

This is just a start and many things could be, and will be, better in the coming months. We also have a heavy roadmap, containing very nice (killer) features, more products, better recommendations, an ecosystem, and some other surprises.

So what do I do in that venture?

BestComparator is a three-man job. While my associates work on the business and the UX, I spend my time coding the foundations of the service. I am the Data, Architecture, Algorithm, Information System guy. AKA the CTO/Lead developer/Cofounder.

The challenge is quite big. We have to aggregate, consolidate and extract meaning from a gigantic amount of data, from product specifications to our members’ behaviors and desires. These come through various funnels: our partners, the social networks, what the users say and don’t say… I needed a solid base with a solid promise.

Three years ago I was already a symfony 1.4 junkie. It was with a tremendous thrill that I started working with Symfony2 nearly two years ago, when the first preview appeared. I saw in Symfony2 a very powerful tool moving at a fast pace, bringing many new core concepts to the PHP world (bundles, annotations, DI…), and gathering a very prolific community. Symfony2 is scalable, fast, flexible and stable. Most of all, it made PHP better with a modern architecture and better practices. Its strength is well known, and since we adopted it, Drupal, eZ Publish, phpBB and many more have followed the same path.

As proof of our commitment, you can have a look at the credits page at BestComparator.

As you can see, we have quite an extensive architecture built around Symfony2.1: we are running on Doctrine2.3, ElasticSearch, Behat, Sonata, Assetic, Twig and many more.

I also committed to a community

Of course we started when Symfony2 was young, and in some ways it still is. Along the way, I was involved in some features of the framework (especially around the translation components), and I contributed to the Sonata project, for instance. I also developed some standalone bundles such as the BCCExtraToolsBundle, the BCCCronManagerBundle… which are fairly well known now.

Developing such bundles requires time. I love to spend it working on open source projects, especially when it matters to people. Thus, I am very happy today when I can work on BCC Bundles, designed for BestComparator and pushed to the open world.

As a bonus, I also develop some C# apps for Windows Phone 7: Gi7 and Readr7. They are open source, and some others might come for the Windows 8 platform.

I try to devote as much energy as I can to following, debugging and improving these projects. That’s my second commitment.

How about a third commitment?

Also, I have been teaching programming at ECE Paris for a year now. Mostly C and C#, but I am also involved in some web development and Java courses.

I do it partly because it allows me to earn enough money to pursue my work at BestComparator, and mostly because I like to share.

I can also tell you that committing to hundreds of students is not such an easy commitment, and it requires time and patience (and patience too).

Follow up

First of all, I encourage you to visit BestComparator. Things are in French for the moment, since we need contracts with local resellers. Here are the usual ways of getting updates on the venture: Twitter, Facebook, G+.

You can watch and fork my work on GitHub, and pack your PHP projects on Packagist. My Windows apps are on the Marketplace.

And if you are interested in a very prolific and fast-growing engineering school (with very talented teachers of course), go have a look at the ECE Paris website.

The BCCEnumerableUtility is out

One of the most frustrating things you may encounter as a PHP developer is definitely the lack of clarity and consistency of many of the core array and string manipulation functions.

What goes wrong

Let’s take for example the function that checks whether a string contains another string. You might want to use strstr, but you should use strpos for performance reasons. You will already have noticed the lack of clarity in the function names, and the fact that whatever you do, the best way of dealing with strings in PHP is never straightforward.
You may also know that the case-insensitive equivalents of these functions add an i in the middle of their names (stristr, stripos). Again, this is not very explicit. Finally, some functions do not follow the same naming format, such as str_split or str_replace.
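As an illustration of the trap, here is a minimal sketch of a "contains" check built on strpos and stripos. The `contains` helper is a hypothetical name for this example; it is not a PHP core function.

```php
<?php
/**
 * Hypothetical helper: true when $needle occurs anywhere in $haystack.
 */
function contains($haystack, $needle, $caseSensitive = true)
{
    $position = $caseSensitive
        ? strpos($haystack, $needle)
        : stripos($haystack, $needle);

    // strpos() returns 0 for a match at the very start of the string,
    // so the result must be compared strictly against false.
    return $position !== false;
}

var_dump(contains('Hello world!', 'Hello'));        // bool(true)
var_dump(contains('Hello world!', 'WORLD'));        // bool(false)
var_dump(contains('Hello world!', 'WORLD', false)); // bool(true)
```

The strict comparison is the part that is easy to get wrong: a loose `if (strpos(...))` treats a match at position 0 as "not found".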

And the same matters occur with the array functions.

As you may know, I also have a .NET background, especially in C#. One of the things I like is the way they deal with this issue. First of all, strings and arrays are objects… They also have the IEnumerable interface, which is implemented by everything that is an enumeration of items (strings are enumerations of characters). They added a bunch of generic methods that extend the IEnumerable interface to provide clear and powerful functionality. So whenever you have something that is enumerable, you automatically get dozens of filtering, ordering, transformation and manipulation functions.

I really miss IEnumerable when I am using PHP.

The BCCEnumerableUtility

When PHP5.4 came out, I saw in traits a way to port IEnumerable from C# to PHP.

So I mimicked the C# interface and made the Enumerable trait, with the Collection and String classes that leverage the trait.

Let’s avoid talking too much; here is an example with the Collection class that leverages the trait:

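// $values is a Collection wrapping an array of numbers
// square each value, then average the results
$values->select(function($item) { return $item*$item; })->average();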

// filter the even numbers and then order
$values->where(function($item) { return $item%2 == 0; })->orderBy();

The Enumerable trait comes with a bunch of nice functions that you can discover on the BCCEnumerableUtility GitHub repository.
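To give a feel for how a trait can carry this kind of method, here is a minimal sketch for PHP 5.4 or later. It is not the library's actual code: the `MiniEnumerable` and `MiniCollection` names, the `items()` hook and the constructor signature are all assumptions made for this example.

```php
<?php
// Hypothetical, simplified trait; not the library's actual implementation.
trait MiniEnumerable
{
    // The host class exposes its items as a plain array.
    abstract protected function items();

    // Keep only the items for which the predicate returns true.
    public function where($predicate)
    {
        return new static(array_values(array_filter($this->items(), $predicate)));
    }

    // Project every item through the selector.
    public function select($selector)
    {
        return new static(array_map($selector, $this->items()));
    }

    // Arithmetic mean of the items (assumes a non-empty collection).
    public function average()
    {
        $items = $this->items();

        return array_sum($items) / count($items);
    }
}

// Any class that can hand over its items as an array can use the trait.
class MiniCollection
{
    use MiniEnumerable;

    private $data;

    public function __construct(array $data)
    {
        $this->data = $data;
    }

    protected function items()
    {
        return $this->data;
    }
}

$values = new MiniCollection(array(1, 2, 3, 4));

// Squares are 1, 4, 9 and 16, so the average is 7.5.
echo $values->select(function ($item) { return $item * $item; })->average(), PHP_EOL;

// The even numbers are 2 and 4, so the average is 3.
echo $values->where(function ($item) { return $item % 2 == 0; })->average(), PHP_EOL;
```

Because `where` and `select` return a new instance of the host class, calls can be chained, which is what makes the C#-like style possible.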

The String class

The library also provides a String class that adds some string-specific functions:

$string = new String('Hello world!');

$string = $string
    ->replace('world', 'pineapple') // replace world by pineapple
    ->toUpper() // to upper case
    ->skip(6) // skip the 6 first letters
    ->takeWhile(function($char) { return $char != '!'; }); // take the rest while the char is different from '!'

echo $string; // PINEAPPLE

I know many people won’t like the idea of a String class, so I provide a StringUtility helper that gives you access to all the functionality with static calls:

<?php
use BCC\EnumerableUtility\StringUtility;

$string = 'Hello world!';

$string = StringUtility::replace($string, 'world', 'pineapple'); // replace world by pineapple
$string = StringUtility::toUpper($string); // to upper case
$string = StringUtility::skip($string, 6); // skip the 6 first letters
$string = StringUtility::takeWhile($string, function($char) { return $char != '!'; }); // take the rest while the char is different from '!'

echo $string; // PINEAPPLE

Wrap up

The BCCEnumerableUtility library will help you manipulate strings and arrays more easily.

As usual, I published this library on GitHub. I welcome remarks, bug reports and contributions 🙂

About me…

As you know, I am a student at ECE Paris. It is an engineering school with a five-year program. I am going through the last year, and I took a specialization in software development and security. My courses also cover some management, software design, user experience, marketing and other useful skills.

I also enjoyed traveling during my studies. I spent a full semester in Canada, where I discovered new techniques and ways to approach management and software development issues. I am also currently packing for a seminar in California which will last a month.

About my skills: I have developed a lot of web applications on two main platforms. I have a big personal project, a community web site that I am developing with some friends of mine. It is based on the symfony framework with the Doctrine ORM. I was already familiar with web development in PHP, but this framework took me to a whole new level. I took advantage of this project to put Extreme Programming into practice, which was a really good experience: not easy at all, but definitely interesting.

Also, I am currently finishing an internship that I voluntarily spent on a completely different technology: .NET. At the beginning I wasn’t used to the Microsoft platform, but I learned, and I finally took part in the development as a regular member of the team. We are developing a fully configurable web application designed for HR services that should fit the needs of many customers. It is a very complex project based on several key technologies of the .NET framework such as ASP.NET MVC, Entity Framework and Unity. The architecture that we are developing also tackles major issues of modern development. Once again we use an agile methodology: this time we are practicing Scrum.

In my personal life I also enjoy playing the guitar, movies, and reading. Well, I just don’t like getting bored.

Welcome to my new blog!

Hi, welcome to my new blog!

Until recently I kept developing my own website based on the symfony framework, but I realized that I didn’t have enough time to make it as good as I wanted. Also, with my current internship on the .NET framework and my big personal project on symfony, I was really starting to become schizophrenic.

I was considering starting a real pro-blogging activity, so I decided to switch to a real tool: WordPress. Less time lost reinventing the wheel, more time to share my experience. Some of you might notice that my old work is still available at perso.michelsalib.com.

With this blog, I intend to write about various subjects across several technologies: Symfony, of course, Doctrine, jQuery, but also Microsoft technologies such as ASP.NET MVC, Entity Framework, and I hope Silverlight and WM7 development…

Yeah, I am schizophrenic!