My take on the Mahout and Myrrix recommendation algorithms

Whereas BestComparator has its own recommendation engine based on user profiling, behavior analysis and analysis of product specs, I recently wanted to explore the possibilities of the famous recommendation engine built into Mahout.

First of all, Mahout is a set of machine learning algorithms that leverage the Hadoop environment to provide powerful and scalable processing. One of its main targets is recommendation algorithms, also known as collaborative filtering, which Mahout provides through its Taste component.

Recommendation algorithms have been made famous by websites such as Amazon, YouTube or Netflix. They use them to make suggestions based on what you bought, watched or liked.

Myrrix

One of the authors of the Taste/Mahout recommender engine, Sean Owen, decided to give the engine a more formal structure by building Myrrix.

Myrrix is a recommendation engine based on Mahout. It offers an out-of-the-box configuration for a recommendation engine accessible through a REST API. Good to know:

  • Scalable as Mahout and Hadoop are scalable, using computing parallelization and a distributed file system
  • Runs an optimized version of Taste (currently Taste 3)
  • Runs in real time
  • Can be efficient even with a relatively small amount of data

Recommendation process

The first thing you want to do is feed your model (i.e. your algorithm) with the data observed so far. The model aggregates users, items and the associations between them. These associations are called preferences and are qualified by a value describing the strength of the association between the user and the item.

Feeding the engine means pushing every observed association as a user id, an item id and a strength. You are simply giving the engine your current taste graph, linking users to items via their tastes.

Once your engine is fed, you have to ask it to refresh. It will then re-analyze the given graph and compute an updated, and thus better, model. This may take some time, but Myrrix is able to keep answering your requests in the meantime.

Finally, with your shiny new model you can ask questions and get recommendations. Here are the main queries:

  • Recommend to a user
  • Recommend to a group of users
  • Recommend to an anonymous user
  • Recommend similar items
  • Estimate the strength of the preference between a user and an item

With such a set of tools, you can easily guess that answering the question “Which items did users like me also like?” becomes possible.
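To make that question concrete, here is a minimal sketch that answers it by hand in plain PHP. It does not use Myrrix or Mahout, and Myrrix builds a much more sophisticated model than this; the sketch only scores unseen items by the preferences of users who share items with the target user. The function name recommendFor and all the sample data are made up for illustration.

```php
<?php
// Toy taste graph: userID => [itemID => preference strength].
// All IDs and values are made-up sample data.
$preferences = [
    101 => [1000 => 0.5, 1001 => 1.0],
    102 => [1000 => 1.0, 1002 => 0.8],
    103 => [1001 => 0.4, 1002 => 0.2, 1003 => 0.9],
    104 => [1004 => 1.0],
];

/**
 * Hypothetical helper: scores the items a user does not have yet, weighting
 * each neighbour's preferences by how much taste they share with the user.
 */
function recommendFor($userId, array $preferences)
{
    $own = $preferences[$userId];
    $scores = [];

    foreach ($preferences as $otherId => $otherPrefs) {
        if ($otherId === $userId) {
            continue;
        }

        // Similarity: sum of products of strengths over the shared items.
        $similarity = 0.0;
        foreach (array_intersect_key($own, $otherPrefs) as $itemId => $strength) {
            $similarity += $strength * $otherPrefs[$itemId];
        }
        if ($similarity <= 0) {
            continue; // nothing in common with this user
        }

        // Credit every item the neighbour likes that the user does not have yet.
        foreach (array_diff_key($otherPrefs, $own) as $itemId => $strength) {
            if (!isset($scores[$itemId])) {
                $scores[$itemId] = 0.0;
            }
            $scores[$itemId] += $similarity * $strength;
        }
    }

    arsort($scores); // best score first, item ids kept as keys
    return $scores;
}

$recommendations = recommendFor(101, $preferences);
print_r($recommendations); // item 1002 comes first, then item 1003
```

User #104 shares nothing with user #101, so item #1004 is never suggested. A real engine also has to cope with sparse data and scale, which is exactly what Myrrix takes care of.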

Consuming Myrrix from PHP

In order to integrate Myrrix results into my recommendation engine, I had to build a PHP Myrrix client. I decided to use the Guzzle library, which provides a really neat way of building a PHP client for REST APIs. You can download my library from the open source GitHub project: https://github.com/michelsalib/bcc-myrrix.

After installing the library, you can write some very fancy code:

// Get a client
$this->client = MyrrixClient::factory(array(
    'hostname' => 'localhost',
    'port'     => 8080,
));

// Put a user/item association: here user #101 gets an association of strength 0.5 with item #1000
$command = $this->client->getCommand('PostPref', array(
    'userID' => 101,
    'itemID' => 1000,
    'value'  => (string)0.5,
));
$this->client->execute($command);

// Refresh the index
$command = $this->client->getCommand('Refresh');
$this->client->execute($command);

// Get a recommendation for user #101
$command = $this->client->getCommand('GetRecommendation', array(
    'userID' => 101,
));
$recommendation = $this->client->execute($command)->json();

Here we instantiate a client for a Myrrix instance hosted on localhost, port 8080. We put into the model a preference of 0.5 between user #101 and item #1000. We then ask the model to refresh. Finally, we get a recommendation for user #101. The recommendation result is an array of item ids with their estimated strength for the given user.

The library is pretty straightforward and helps you leverage all the power of the Myrrix engine from PHP in a very simple way.

I also made a Symfony bundle that helps you get the client from the dependency injection container and offers a cleaner configuration process: https://github.com/michelsalib/BCCMyrrixBundle.

Don’t hesitate to get the code, install it and test it. I would be very happy to get contributions, feedback or feature requests.

The BCCEnumerableUtility is out

One of the most frustrating things you may encounter as a PHP developer is definitely the lack of clarity and consistency of many of the core array and string manipulation functions.

What goes wrong

Let’s take for example the function that checks whether a string contains another string. You might want to use strstr, but you should use strpos for performance reasons. You have already noticed the lack of clarity in the function names, and the fact that, whatever you do, the best way of dealing with strings in PHP is never straightforward.
You may also know that the case-insensitive equivalents of these functions add an i in the middle of their names (stristr, stripos). Again, this is not very explicit. Finally, some functions do not follow the same naming format, such as str_split or str_replace.
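As an illustration of how easy it is to get this wrong, here is a small sketch of a "contains" check built on strpos and stripos. The helper names contains and containsIgnoreCase are hypothetical and chosen for this example; only strpos and stripos come from the PHP core.

```php
<?php
// Hypothetical helper names; they only wrap strpos/stripos from the PHP core.
function contains($haystack, $needle)
{
    // strpos() returns 0 when the match is at the very start, so the
    // result must be compared to false with the strict !== operator.
    return strpos($haystack, $needle) !== false;
}

function containsIgnoreCase($haystack, $needle)
{
    return stripos($haystack, $needle) !== false;
}

var_dump(contains('Hello world!', 'Hello'));           // bool(true)
var_dump((bool) strpos('Hello world!', 'Hello'));      // bool(false): the classic trap
var_dump(containsIgnoreCase('Hello world!', 'WORLD')); // bool(true)
```

The second call shows the trap: a match at position 0 is falsy, so a loose check such as `if (strpos(...))` silently misses it.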

The same issues occur with the array functions.

As you may know, I also have a .NET background, especially in C#. One of the things I like is the way it deals with this issue. First of all, strings and arrays are objects… There is also the IEnumerable interface, which is implemented by everything that is an enumeration of items (strings are enumerations of characters). A bunch of generic methods extend the IEnumerable interface to provide clear and powerful functionality. So whenever you have something that is enumerable, you automatically get dozens of filtering, ordering, transformation and manipulation functions.

I really miss IEnumerable when I am using PHP.

The BCCEnumerableUtility

When PHP 5.4 came out, I saw in traits a way to port IEnumerable from C# to PHP.

So I mimicked the C# interface and made the Enumerable trait, with the Collection and String classes that leverage the trait.

Let’s avoid talking too much; here is an example with the Collection class that leverages the trait:

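<?php
use BCC\EnumerableUtility\Collection;

$values = new Collection(array(1, 2, 3)); // sample values for illustration

$values->select(function($item) { return $item*$item; })->average();

// filter the even numbers and then order
$values->where(function($item) { return $item%2 == 0; })->orderBy();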

The Enumerable trait comes with a bunch of nice functions that you can discover in the BCCEnumerableUtility GitHub repository.
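To show how a trait can carry this kind of behaviour, here is a minimal sketch of the idea. It is not the library's code: MiniEnumerable, MiniCollection, items() and toArray() are hypothetical names invented for this example, and only three of the operations are sketched.

```php
<?php
// Hypothetical names: MiniEnumerable and MiniCollection are not the library's classes.
trait MiniEnumerable
{
    // The class using the trait must expose its items as a plain array.
    abstract protected function items();

    // Project every item through $selector and wrap the result.
    public function select(callable $selector)
    {
        return new static(array_map($selector, $this->items()));
    }

    // Keep only the items for which $predicate returns true.
    public function where(callable $predicate)
    {
        return new static(array_values(array_filter($this->items(), $predicate)));
    }

    // Arithmetic mean of the items (assumes a non-empty list of numbers).
    public function average()
    {
        $items = $this->items();
        return array_sum($items) / count($items);
    }
}

class MiniCollection
{
    use MiniEnumerable;

    private $items;

    public function __construct(array $items)
    {
        $this->items = $items;
    }

    protected function items()
    {
        return $this->items;
    }

    public function toArray()
    {
        return $this->items;
    }
}

$values = new MiniCollection([1, 2, 3, 4]);
echo $values->select(function ($item) { return $item * $item; })->average(), "\n"; // 7.5
print_r($values->where(function ($item) { return $item % 2 == 0; })->toArray());   // 2 and 4
```

Because the trait only relies on items(), any class able to return its content as an array can reuse the same operations.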

The String class

The library also provides a String class that adds some string dedicated functions:

<?php
use BCC\EnumerableUtility\String;

$string = new String('Hello world!');

$string = $string->replace('world', 'pineapple') // replace world by pineapple
    ->toUpper() // to upper case
    ->skip(6) // skip the first 6 letters
    ->takeWhile(function($char) { return $char != '!'; }); // take the rest while the char is different from '!'

echo $string; // PINEAPPLE

I know many people won’t like the idea of a String class, so I provide a StringUtility helper that gives you access to all the functionality with static calls:

<?php
use BCC\EnumerableUtility\StringUtility;

$string = 'Hello world!';

$string = StringUtility::replace($string, 'world', 'pineapple'); // replace world by pineapple
$string = StringUtility::toUpper($string); // to upper case
$string = StringUtility::skip($string, 6); // skip the first 6 letters
$string = StringUtility::takeWhile($string, function($char) { return $char != '!'; }); // take the rest while the char is different from '!'

echo $string; // PINEAPPLE

Wrap up

The BCCEnumerableUtility library will help you manipulate strings and arrays more easily.

As usual, I published this library on GitHub. I welcome remarks, bug reports and contributions 🙂

Welcome the BCCCronManagerBundle, the Symfony2 bundle that helps you manage your crons

One thing I don’t like to do when maintaining a website is having to open an SSH session on a daily basis in order to check that everything is running fine. A basic use case is scheduling and watching crons.

So I recently came up with the idea of building a web interface wrapping the crontab command, with some tools for watching the associated log files.

And thus, the BCCCronManagerBundle was born.

A quick presentation

The BCCCronManagerBundle can already do some nice things:

  • display the cron entries of the cron table, parsing time expression, command, output file, error file and comment
  • guess the last execution time and status
  • display log files
  • add, edit and remove cron entries

The bundle is localized in English and French. The forms also include some shortcuts to easily build common time expressions, launch a Symfony command or log into the Symfony log directory.

Screenshots: the cron list, the cron form, a cron log file.

How it works

Actually, the architecture is quite simple. Everything relies on two classes: CronManager and Cron.

The CronManager launches the [cci]crontab -l[/cci] command in its constructor, then extracts each line in order to build a collection of Cron instances. It has get, add and remove methods to access the Cron collection. A raw method builds the cron table string from the Cron collection, and a write method puts it into a temporary file before launching the [cci]crontab $file[/cci] command.

The Cron class is instantiated using the static parse method. Its job is to parse a line from the cron table and extract the time expression, command, output file, error file and comment (if defined). Based on the output files it can guess whether the cron has already run (one of the output files is present) and whether it was successful (the error file is empty). A getExpression method builds the time expression, and __toString is overridden in order to give the cron’s representation for the cron table.
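The parsing and the status guess described above can be sketched as follows. This is not the bundle's Cron class: parseCronLine and guessStatus are hypothetical functions, and the sketch assumes a line laid out as "five time fields, command, optional > output file, optional 2> error file, optional # comment".

```php
<?php
/**
 * Hypothetical parser, not the bundle's Cron::parse: splits one crontab line
 * into its time expression, command, output file, error file and comment.
 * Assumes the layout "m h dom mon dow command [> out] [2> err] [# comment]".
 */
function parseCronLine($line)
{
    $pattern = '/^\s*(\S+\s+\S+\s+\S+\s+\S+\s+\S+)\s+' // five time fields
             . '([^>#]+?)'                              // command
             . '(?:\s+>\s*(\S+))?'                      // optional output file
             . '(?:\s+2>\s*(\S+))?'                     // optional error file
             . '(?:\s*#\s*(.*))?$/';                    // optional comment

    if (!preg_match($pattern, $line, $m)) {
        return null; // not a line this sketch understands
    }

    return [
        'expression' => preg_replace('/\s+/', ' ', $m[1]),
        'command'    => trim($m[2]),
        'output'     => isset($m[3]) && $m[3] !== '' ? $m[3] : null,
        'error'      => isset($m[4]) && $m[4] !== '' ? $m[4] : null,
        'comment'    => isset($m[5]) ? trim($m[5]) : null,
    ];
}

/**
 * Guess the last run from the log files: the cron has run if one of the
 * files exists, and it succeeded if the error file is empty.
 */
function guessStatus(array $cron)
{
    clearstatcache(); // file sizes are cached by PHP between calls

    $hasRun = ($cron['output'] && file_exists($cron['output']))
           || ($cron['error'] && file_exists($cron['error']));
    if (!$hasRun) {
        return 'never run';
    }

    $failed = $cron['error'] && file_exists($cron['error']) && filesize($cron['error']) > 0;
    return $failed ? 'error' : 'success';
}

$cron = parseCronLine('*/5 * * * * php app/console cache:warmup > /tmp/warmup.log 2> /tmp/warmup.err # warm the cache');
print_r($cron);
echo guessStatus($cron), "\n";
```

A single regular expression keeps the sketch short, but it will not handle every crontab syntax (environment variables, @daily shortcuts, redirections such as 2>&1), which is one reason better parsing is a welcome contribution.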

The interface is quite neat, thanks to Sam. He helped me implement Twitter Bootstrap, which is very powerful and elegant. I also decided to make use of the jQuery plugin, which is quite impressive and can easily replace jQuery UI on some points.

Wrap up

You can download and install the bundle from GitHub: https://github.com/michelsalib/BCCCronManagerBundle.
I also welcome contributions for any improvement, such as better cron table parsing, more options for cron definitions (such as log files), better multi-platform support, or translations.

BCCExtraToolsBundle for Symfony2 now includes new features

The purpose of this blog post is to review the latest features that have been included in the BCCExtraToolsBundle.
As you may know, its current main feature is a command that extracts translation strings from your templates and dumps them into your translation files. It has been so popular that the code was adapted and merged into the core framework a few months ago with the help of the community.
Sadly, I will eventually remove this functionality from the bundle when version 2.1 comes out. Fortunately, I recently added cool new stuff to the bundle that you may find useful.

A date parser

You may know that the bundle currently includes a date formatter that gives you a nice way to localize your dates directly with a Twig filter.
I reviewed the code recently to make the operation work the other way around. You can now do things like this:

[cc_php]
<?php
// $container is assumed to be the Symfony service container
$dateFormatter = $container->get('bcc_extra_tools.date_formatter');

// obtain a datetime instance for a normally formatted string
$date = $dateFormatter->parse('November 1, 2011');

// obtain a datetime instance using a defined locale
$date = $dateFormatter->parse('1 Novembre 2011, 20:14', 'fr');

// obtain a datetime instance from an uncommon string
$date = $dateFormatter->parse('Nov. 2011');

[/cc_php]

Basically, the code tries every parsing it can, using the formats available to the formatter (more information here: http://www.michelsalib.com/2011/07/a-twig-extension-that-translates-countries-and-dates/) and more, so that it can parse almost anything.
Of course it can be very handy when you consume some weird API using nonstandard datetime formatting.
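The "try every format until one works" idea can be sketched in a few lines of plain PHP. This is not the bundle's implementation and it ignores locales: parseLoosely is a hypothetical function, and the list of formats is an arbitrary sample chosen for this example.

```php
<?php
/**
 * Hypothetical sketch, not the bundle's code: try a list of candidate
 * formats in order and return the first DateTime that parses cleanly.
 */
function parseLoosely($text, array $formats)
{
    foreach ($formats as $format) {
        $date = DateTime::createFromFormat($format, $text, new DateTimeZone('UTC'));
        $errors = DateTime::getLastErrors();

        // createFromFormat() can succeed with warnings (an overflowing day,
        // for instance), so only accept a result that produced none.
        if ($date !== false && (!$errors || $errors['warning_count'] === 0)) {
            return $date;
        }
    }

    return null;
}

// The leading "!" resets unspecified fields (time, day) instead of using "now".
$formats = ['!F j, Y', '!Y-m-d', '!M. Y', '!d/m/Y H:i'];

echo parseLoosely('November 1, 2011', $formats)->format('Y-m-d'), "\n"; // 2011-11-01
echo parseLoosely('Nov. 2011', $formats)->format('Y-m'), "\n";         // 2011-11
var_dump(parseLoosely('not a date', $formats));                         // NULL
```

The order of the formats matters: the first one that parses wins, so the most specific formats should come first.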

A unit converter

Speaking of consuming APIs, it can be very annoying to get values in a unit you don’t want to use. The unit converter is here to help you:

[cc_php]
<?php
// $container is assumed to be the Symfony service container
$unitConverter = $container->get('bcc_extra_tools.unit_converter');

// transform a value knowing the source and destination units
echo $unitConverter->convert(1000, 'm', 'km'); // echoes : 1

// transform a value knowing only the destination units
echo $unitConverter->guessConvert('1h', 'm'); // echoes : 60

[/cc_php]

To do this, your value goes through different phases:
– A chain unit converter tries to convert using the unit converters registered through DI
– A ratio unit converter, which converts units that are strictly proportional, takes care of most of your conversions
– The ratio unit converter supports different kinds of units registered using DI, such as length, weight, speed, time… those are called ratio unit providers
– Ratio unit providers define the units and ratios related to a specific unit kind. They also include the locale where the unit is applicable and the prefixes that can be associated with the unit. For instance, the computer capacity unit kind defines the octet, the bit and the byte with their corresponding conversion ratios and the applicable prefixes (k, M, G…).

With such a structure, the converter ensures that any prefix is taken into account, that the conversion is done between coherent units (it won’t try to convert time into money… sadly), and that your current locale is taken into account (the ‘metre’ is the French unit for the ‘meter’).
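Here is a minimal sketch of how a ratio-based conversion with prefixes can work. It is not the bundle's code: the RatioUnitKind and RatioConverter classes, the unit tables and the prefixes are all illustrative assumptions, and locales and the chain converter are left out.

```php
<?php
// Hypothetical sketch of a ratio-based converter; the class names, unit
// tables and prefixes are illustrative and not taken from the bundle.
class RatioUnitKind
{
    private $units;    // unit symbol => ratio to the kind's base unit
    private $prefixes; // prefix => multiplier

    public function __construct(array $units, array $prefixes)
    {
        $this->units = $units;
        $this->prefixes = $prefixes;
    }

    // Returns the ratio of $symbol to the base unit, or null if unknown.
    public function ratioOf($symbol)
    {
        if (isset($this->units[$symbol])) {
            return $this->units[$symbol];
        }
        foreach ($this->prefixes as $prefix => $multiplier) {
            $unit = (string) substr($symbol, strlen($prefix));
            if (strpos($symbol, $prefix) === 0 && isset($this->units[$unit])) {
                return $multiplier * $this->units[$unit];
            }
        }
        return null;
    }
}

class RatioConverter
{
    private $kinds;

    public function __construct(array $kinds)
    {
        $this->kinds = $kinds;
    }

    public function convert($value, $from, $to)
    {
        // Both units must belong to the same kind, otherwise the
        // conversion makes no sense (time into length, for instance).
        foreach ($this->kinds as $kind) {
            $fromRatio = $kind->ratioOf($from);
            $toRatio = $kind->ratioOf($to);
            if ($fromRatio !== null && $toRatio !== null) {
                return $value * $fromRatio / $toRatio;
            }
        }
        throw new InvalidArgumentException("Cannot convert $from to $to");
    }
}

$length = new RatioUnitKind(['m' => 1, 'in' => 0.0254], ['k' => 1000, 'c' => 0.01]);
$time = new RatioUnitKind(['s' => 1, 'min' => 60, 'h' => 3600], []);
$converter = new RatioConverter([$length, $time]);

echo $converter->convert(1000, 'm', 'km'), "\n"; // 1
echo $converter->convert(2, 'h', 'min'), "\n";   // 120
```

Each kind only stores ratios to its own base unit, so adding a unit or a prefix is a one-line change, and converting between two kinds fails loudly instead of returning a meaningless number.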

I’ll try to add more units and documentation in the coming posts.

Wrap up

The BCCExtraToolsBundle continues to provide useful features to developers. Get the code of the datetime parser and the unit converter from the BCCExtraToolsBundle on GitHub: https://github.com/michelsalib/BCCExtraToolsBundle. Don’t hesitate to report issues, give feedback and improve the code 🙂

Introduce the BCCAutoMapperBundle for Symfony2

One of my favorite tools in my .NET developments is definitely AutoMapper. It lets you easily map objects to other objects by generating default maps for graphs of objects based on the names of the different members.

I found it quite a shame that such a tool did not exist on the PHP platform. So I started a new Symfony2 bundle to fill this gap.

How the AutoMapperBundle works

You can find it here: https://github.com/michelsalib/BCCAutoMapperBundle.

Here is a quick example to show how you can map objects together. Here is your model:

[cc_php]
<?php
class SourcePost {
    private $name;
    private $description;
    private $author;

    public function __construct($name, $description, $author) {
        $this->name = $name;
        $this->description = $description;
        $this->author = new SourceAuthor($author);
    }

    public function getName() {
        return $this->name;
    }

    public function getDescription() {
        return $this->description;
    }

    public function getAuthor() {
        return $this->author;
    }
}

class SourceAuthor {
    private $name;

    public function __construct($name) {
        $this->name = $name;
    }

    public function getName() {
        return $this->name;
    }
}

class DestinationPost {
    private $title;
    private $description;
    private $author;

    public function getTitle() {
        return $this->title;
    }

    public function setTitle($title) {
        $this->title = $title;
    }

    public function getDescription() {
        return $this->description;
    }

    public function setDescription($description) {
        $this->description = $description;
    }

    public function getAuthor() {
        return $this->author;
    }

    public function setAuthor($author) {
        $this->author = $author;
    }
}
[/cc_php]

Then, in your application:

[cc_php]
<?php
// $container is assumed to be the Symfony service container
$mapper = $container->get('bcc_auto_mapper.mapper');
// create default map and route members
$mapper->createMap('My\SourcePost', 'My\DestinationPost')
    ->route('title', 'name')
    ->route('author', 'author.name');

// create objects
$source = new SourcePost('AutoMapper Bundle', 'A great bundle', 'Michel');

// map
$destination = $mapper->map($source, new DestinationPost());

echo $destination->getTitle(); // outputs 'AutoMapper Bundle'
echo $destination->getDescription(); // outputs 'A great bundle'
echo $destination->getAuthor(); // outputs 'Michel'
[/cc_php]
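To give an idea of what name-based mapping with routes involves, here is a minimal sketch. It is not the bundle's implementation or API: MiniMapper and the small Author, Post and PostView classes are hypothetical, and the sketch only follows getters and setters, with dots in a route walking down the object graph.

```php
<?php
// Hypothetical sketch of name-based mapping; MiniMapper is not the bundle's API.
class MiniMapper
{
    private $routes = [];

    // Fill destination member $to from the source path $from (dots walk the graph).
    public function route($to, $from)
    {
        $this->routes[$to] = $from;
        return $this;
    }

    public function map($source, $destination)
    {
        foreach (get_class_methods($destination) as $method) {
            if (strpos($method, 'set') !== 0) {
                continue;
            }
            // setTitle => title; the default route is the member of the same name.
            $member = lcfirst(substr($method, 3));
            $path = isset($this->routes[$member]) ? $this->routes[$member] : $member;

            $value = $source;
            foreach (explode('.', $path) as $step) {
                $getter = 'get' . ucfirst($step);
                if (!is_object($value) || !method_exists($value, $getter)) {
                    continue 2; // no matching source member: leave it untouched
                }
                $value = $value->$getter();
            }
            $destination->$method($value);
        }
        return $destination;
    }
}

class Author
{
    private $name;
    public function __construct($name) { $this->name = $name; }
    public function getName() { return $this->name; }
}

class Post
{
    private $name;
    private $author;
    public function __construct($name, Author $author) { $this->name = $name; $this->author = $author; }
    public function getName() { return $this->name; }
    public function getAuthor() { return $this->author; }
}

class PostView
{
    public $title;
    public $author;
    public function setTitle($title) { $this->title = $title; }
    public function setAuthor($author) { $this->author = $author; }
}

$mapper = new MiniMapper();
$mapper->route('title', 'name')->route('author', 'author.name');
$view = $mapper->map(new Post('AutoMapper Bundle', new Author('Michel')), new PostView());
echo $view->title, ' by ', $view->author, "\n"; // AutoMapper Bundle by Michel
```

Members without an explicit route fall back to the getter of the same name, which is what makes default maps possible.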

Read the documentation on GitHub to find more explanations and more functionality: https://github.com/michelsalib/BCCAutoMapperBundle.

This is a very light implementation and some cases are not covered yet (such as dynamic graph creation). Please, don’t hesitate to fork me and provide feedback.

Meet Readr7: A new Google Reader client for Windows Phone 7

As you may know, I am currently developing a GitHub client for Windows Phone 7, named Gi7. It is not complete yet, even though it is functional. During that development, a friend of mine asked me for a Google Reader client.

Since I also use Google Reader and there is no really nice free app that is better than the mobile website, I decided to take a pause in the development of Gi7 to start a new app named Readr7. Fortunately the app is smaller, and I should resume the development of Gi7 soon.

What it does

Screenshot: the stream view.

The Readr7 app does just what a light Google Reader client should do.
– You can log in to and log out of your Google account.
– You then get all your unread items with abstracts. They are automatically marked as read as you scroll through the feed.
– You can of course access a configuration panel to select a folder and choose to also see items marked as read.
– Just tap on an item to open it in IE, or long tap to manually mark it as read/unread.
– Last but not least, if you pin the app to your home screen, the tile is updated every 30 minutes with your unread items count.

About the development

As usual, I published the code on GitHub. It shares some code and concepts with the Gi7 app. You can of course get the code and contribute to the repo.

The code is almost finished. I just need to review some design (get a logo, for instance) and fix bugs. On that last point, it is very important for me to get some external feedback, especially from developers who may run into errors I did not think about and thus help me debug some issues.

About Google Reader, the bad thing is that it doesn’t have any official API. You can find some documentation, but things may change without any advance notice. Anyway, the best documentation I found is here: http://code.google.com/p/pyrfeed/wiki/GoogleReaderAPI.

As usual, I used my favorite tools:
– RestSharp for fast access to the Google Reader API with easy JSON deserialization
– MVVM Light Toolkit for easy use of the MVVM pattern
– Silverlight Toolkit for Windows Phone, which I recompiled for Mango

Last but not least, I added a post-build instruction that copies the xap file to the root of the solution, so that you can directly download the latest version of the app from GitHub.

Meet Gi7 : A new github app for Windows Phone 7

As you may know, I am not only a Symfony2 developer. I also really enjoy coding in C#, especially Silverlight on the Windows Phone platform. Most of the time, I try to keep working with both technologies because they bring me complementary knowledge about good practices and design patterns. It also prevents me from getting bored too fast on a project.
I am currently finishing my internship, where I have a Silverlight 4 project using RIA Services, Prism and other cool stuff. So I am looking into a new Silverlight project, which is a Windows Phone 7 application called Gi7.

Gi7, the project

Gi7 is a GitHub client for Windows Phone 7 that is also hosted on GitHub.
As a matter of fact, I hope to get some help and feedback on some parts of the code. I tried to make use of the best practices I know, but I know things are still not perfect. Also, I am not that good a UI designer, and I could use some help on the subject.

What it does

GitHub 7 intends to be a complete mobile view of GitHub. You should be able to consult everything you need, but also post comments, accept pull requests, follow users, edit a repo or a file…
It should also have a non-logged-in part (which is not the case right now) to browse the repos and users.
For the moment you can do some nice things:
– Log in/log out
– Read your news feed, get your stats, followers, followings, owned and watched repos (screenshots: homepage news feed, homepage profile)
– Consult another user’s profile, with stats, followers, followings and repos (screenshots: user details, user repos, user users)
– Consult a repo with stats, commits, pull requests and issues (screenshots: repo details, repo commits, repo pull requests)
– Read a commit (in heavy dev)

The development

Actually, I started the development last week in my free time, and it is going at a fast pace. I intend to have an almost full-featured application around the end of August.
The application is built for version 7.1 of Windows Phone (aka Mango), so it already supports all the features of the new SDK. I make use of the MVVM Light Toolkit and the RestSharp library.
Technically, I make heavy use of the MVVM design pattern with some dependency injection. I intend to make Gi7 a cutting-edge project while keeping things lightweight.
The Gi7 app already has nice development features, such as:
– automatic caching of images.
– a GitHub client that automatically caches every request and provides a very easy way to deserialize the JSON API responses. The client’s interface is also very easy to bind to a view model.

Get involved, get the app

Again, I am looking for contributors and feedback about the application. Feel free to fork me, report issues, submit pull requests or ask for new functionality: https://github.com/michelsalib/Gi7.
I intend to submit the app to the marketplace when Mango is officially released.