Category Archives: Software Development

Anything to do with web / software development

My experiences taking the Machine Learning course on Coursera

For the past 10 weeks, all my spare time was devoted to taking a course on Coursera…

Coursera is a provider of Massive Open Online Courses (or MOOCs). For the past 10 weeks I’ve taken a course in Machine Learning put together by Professor Andrew Ng of Stanford University, who is one of the founders of Coursera.

Course structure

The course lasts ten weeks. Each week a new set of video materials is released. Typically there are 5-6 video lectures, each lasting around 10-15 minutes. Here’s how each week breaks down:

  1. Introduction (40 mins) Linear Regression (75 mins) Linear Algebra Refresher (61 mins) = ~3hrs
  2. Introduction to Octave 1hr 21mins
  3. Logistic Regression (71 mins) Regularization (30 mins) = 1hr 41 mins
  4. Neural Networks 63 mins
  5. Neural Networks 70 mins
  6. Advice for applying Machine Learning (64 mins) System Design (60 mins) = ~2hrs
  7. Support Vector Machines 1hr 39 mins
  8. Clustering (39 mins) Dimensionality Reduction (68 mins) = 1hr 47 mins
  9. Anomaly Detection (91 mins) Recommender Systems (59 mins) = 2hrs 30 mins
  10. Large Scale Machine Learning (64 mins) Photo OCR Example (52 mins) = ~2hrs

Most of the videos feature a simple multiple choice question to ensure you’ve understood the concepts introduced in the video. I kept a notebook of handwritten notes while watching the videos, which meant I had to pause occasionally, so I found myself spending 2-3 hours a week on this lecture material.

Review questions

Each week also features some review questions on the week’s material. These take the form of a graded multiple choice quiz. If you get some questions wrong, you can make another attempt after a short “cooling off” period, which prevents quick-fire guessing. The questions do vary slightly with each attempt, and I found that they did help cement my understanding of the material.

Programming Exercises

The course features a number of Octave programming exercises which were very well put together. Generally, the exercises involved implementing a particular learning algorithm. You submit your work to an automated test environment, which assesses it for you.

Each exercise included a lot of supporting code to walk through a particular algorithm, with graphed visualizations of the algorithm’s progress to aid understanding. Once the graded part of the exercise is complete, there is plenty of scope for exploring further on your own.

Discussion Forums

These are particularly useful. Because of the scheduled nature of the course, there are hundreds of other students all exposed to the same material. If you have problems, chances are that others have the same struggles. Whenever I had a query, I found the forums were a rapid source of help.

Personal experiences

I found the course very well structured, and enjoyed the experience of having submission deadlines; I eagerly awaited the release of new materials each week! Had the course been self-paced, I may have glossed over some of the early material without appreciating the insights it gives later on.

It may be “old school” but I found keeping a handwritten notebook a very useful aid to staying focussed on the lecture material. It also made it easy to review and reflect on older material. I used an A5-sized spiral bound notepad and now have a handwritten, 150 page Machine Learning textbook!

As for the course content itself: it helps to be comfortable with linear algebra. I was a bit rusty, but there’s some course content aimed at getting you back up to speed. Learning how to vectorize machine learning algorithms is one of the key skills you’ll pick up on this course. It also helps to have some calculus knowledge – the course doesn’t specifically teach this, so some of the explanations can be a bit of Jedi handwaving! It’s enough to remember basic differentiation concepts though.

What next?

I’d like to try some more Coursera courses; I’d highly recommend them! However, my spare time is limited and I’d really like to spend some time playing around with machine learning… maybe I’ll release something here soon!

Point Break or Bad Boys 2?

I’ve always wanted to try “crowdsorting”, that is, using a large number of subjective comparisons to order a large set of data.

People are good at comparing things, but not at assigning a fixed score or estimate. It takes a lot of rigour to be consistent with scoring things, for example, giving movies a score out of 10. If I give Star Wars a score of 10, and I think Spinal Tap is better, where do I go? I can’t go one louder!

So, I wondered what would happen if I took a list of movies, and pulled two of them at random and asked people to choose which they’d prefer to watch.

Point Break or Bad Boys 2 was born – give it a try, then come back to read the rest!

(Incidentally, the name comes from a line in Hot Fuzz, in case you wondered).

I try to learn a few things with each of these toys. Here’s what I covered doing this…

Crowdsorting with an Elo ranking

Elo was originally a method of ranking chess players. All players start with a rank of 1600, and after a match the losing player loses some of their ranking points to the winner. The number of points transferred depends on how unlikely the win was. So, if a low ranked player beats a high ranked one, they gain more points than if they beat someone with a similar rank.

The movie sorting uses exactly the same algorithm. We show two films, and whichever one is picked is the ‘winner’ and rises up the rankings.
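As a rough illustration of that update, here is a minimal sketch of the standard Elo formula in PHP (assuming PHP 7 or later). The function names and the k-value of 32 are assumptions made for this example; they are not taken from the site's actual code.

```php
<?php
// Hypothetical helpers for illustration; K = 32 is an assumed k-value.

/** Probability that a film rated $ratingA is picked over one rated $ratingB. */
function eloExpectedScore(float $ratingA, float $ratingB): float
{
    return 1.0 / (1.0 + pow(10.0, ($ratingB - $ratingA) / 400.0));
}

/** Returns [newWinnerRating, newLoserRating] after one comparison. */
function eloUpdate(float $winner, float $loser, float $k = 32.0): array
{
    // The less likely the win was, the more points change hands.
    $transfer = $k * (1.0 - eloExpectedScore($winner, $loser));

    return [$winner + $transfer, $loser - $transfer];
}

// Two evenly matched films: the winner takes half of K.
list($a, $b) = eloUpdate(1600.0, 1600.0); // 1616 and 1584

// An upset: a 1400-rated film beats an 1800-rated one and gains far more.
list($c, $d) = eloUpdate(1400.0, 1800.0);
```

Whatever the winner gains the loser gives up, so the total number of points in the system stays constant.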

The code records all the comparisons so that I can ‘replay’ games with different k-values to see how it affects the ranking. As I write this, I’ve only got a few hundred comparisons logged. Once I have more than a few thousand I’ll see what analyses can be drawn from the data.
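A replay like this could be sketched as follows, assuming PHP 7 or later. The log format (a list of winner/loser pairs), the sample data and the function name are all hypothetical; the real application's storage is not shown in this post.

```php
<?php
// Hypothetical data: each logged comparison is a [winner, loser] pair.
$log = [
    ['Point Break', 'Bad Boys 2'],
    ['Hot Fuzz', 'Point Break'],
    ['Hot Fuzz', 'Bad Boys 2'],
];

/** Replays a comparison log from scratch and returns ratings, best first. */
function replayComparisons(array $log, float $k, float $initial = 1600.0): array
{
    $ratings = [];
    foreach ($log as list($winner, $loser)) {
        // Films that have not been seen yet start at the initial rating.
        $rw = $ratings[$winner] ?? $initial;
        $rl = $ratings[$loser] ?? $initial;

        // Standard Elo expected score for the eventual winner.
        $expected = 1.0 / (1.0 + pow(10.0, ($rl - $rw) / 400.0));
        $transfer = $k * (1.0 - $expected);

        $ratings[$winner] = $rw + $transfer;
        $ratings[$loser] = $rl - $transfer;
    }
    arsort($ratings); // highest rating first, keys preserved

    return $ratings;
}

// Compare the resulting order for a few candidate k-values.
foreach ([16.0, 32.0, 64.0] as $k) {
    echo "k=$k: ", implode(' > ', array_keys(replayComparisons($log, $k))), "\n";
}
```

Because the ratings are rebuilt from the raw log each time, trying a different k-value never requires touching the stored comparisons.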

In the meantime, the top 100 chart can be viewed here.

Integrating with Facebook

I thought it might be fun to make this a social game, so to play you must log in with Facebook. This kind of integration can be done with JavaScript, but I didn’t want the front end to be too heavy with JS so I opted for a server-side approach.

First, I added the Facebook PHP SDK to composer.json:

{
    "require": {
         "facebook/php-sdk": "dev-master",
         ...
    }
}

Then I ran composer update to fetch the necessary code. Next, I added a facebook service to my Silex application after creating an application ID on the Facebook Developer portal:

$app['facebook_app_id']='....';
$app['facebook_secret']='....';

$app['facebook'] = $app->share(function ($app) {
    return new Facebook(
        array(
            'appId' => $app['facebook_app_id'],
            'secret' => $app['facebook_secret'],
        )
    );
});

If you’re not familiar with Silex, this just ensures the Facebook class is created on demand and is a shared instance.
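To make the "created on demand, shared instance" idea concrete, here is a stand-alone sketch in plain PHP (assuming PHP 7 or later). It is not how Silex or Pimple implement it; shareOnce is a hypothetical helper that only mimics the behaviour.

```php
<?php
// Not Silex's implementation: a hypothetical helper showing the same idea.

/** Wraps a factory so it runs at most once, on first use. */
function shareOnce(callable $factory): callable
{
    $instance = null;

    return function () use ($factory, &$instance) {
        if ($instance === null) {
            $instance = $factory(); // built lazily, on the first call only
        }

        return $instance;
    };
}

$created = 0;
$service = shareOnce(function () use (&$created) {
    $created++; // counts how often the factory really runs

    return new stdClass();
});

// Nothing has been built yet; the factory only runs on first use.
$first = $service();
$second = $service();
```

Both calls return the same object, and the factory runs only once however many times the service is requested.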

Now implementing the login is fairly simple. Here’s a simplified example of the handling for the front page in Silex…

use Symfony\Component\HttpFoundation\Request;

$app->get('/', function (Silex\Application $app, Request $req) {

    $host=$req->getHost();
    
    //are we logged in? 
    $user = $app['facebook']->getUser();
    if ($user) {
        
        //calculate a logout url which will return to /logout
        $logoutUrl = $app['facebook']->getLogoutUrl(array(
            'next' => "http://$host/logout", 
        ));

        //render main game page
        return $app['twig']->render('game.twig', array(
            'logout' => $logoutUrl,
        ));
    } else {

        //we are not logged in - calculate a login url
        $loginUrl = $app['facebook']->getLoginUrl(array(
            'redirect_uri' => "http://{$host}/login"
        ));

        //show page inviting login
        return $app['twig']->render('index.twig', array(
            'fblogin' => $loginUrl,
        ));
    }

});

The /login route is pretty simple. We get bounced back here after logging into Facebook. If the SDK can give us a user id, then the login was successful. Otherwise, it failed…

$app->get('/login', function (Silex\Application $app, Request $req) {
    $user = $app['facebook']->getUser();
    if ($user) {
        return $app->redirect('/');
    } else {
        return 'login failed';
    }
});

Finally, the /logout route just does some cleanup to ensure we are completely logged out:

$app->get('/logout', function (Silex\Application $app, Request $req) {
    $app['facebook']->destroySession();
    return $app->redirect('/');
});

That’s it! The application logs the user_id of each person who makes a comparison, but doesn’t store any names. It might be interesting to offer some feedback based on how closely your choices match those of your friends, but that’s for another day.

Layout using 960.gs

I wanted to begin adopting a common layout for these toys, so I used a fairly lightweight grid layout from 960.gs. This was very easy to implement; it only took a few minutes to pick it up and rework my templates. They also provide some printable templates which are great for sketching rough layouts.

The grid is just a first step – with my next project I’ll see how I can pull in some shared templates and CSS through Composer.

Vertical alignment of text

This is a small thing, but I wanted to vertically align the film titles inside a rectangle. Absolute Horizontal And Vertical Centering In CSS by Stephen Shaw of Smashing Magazine provided a very neat solution.

Summary

A fun experiment, many things learned, and I’ll be keen to explore the data some more once I’ve collected a few thousand comparisons!

Byzantime – a historical snippet for every minute of the day

A couple of days ago, a colleague remarked how his wife was able to relate the time in the morning to a year in the life of the Byzantine Empire…

“0811 [pause]. Year of the Bulgarian massacre of the troops of Nicephorus I”

I needed a little project, and so Byzantime was born – for every minute of the day, it will display a historical event by interpreting the time as a year.

http://byzantime.dixo.net
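Reading a clock time as a year boils down to treating the hours as the century digits and the minutes as the remaining two digits. Here is a minimal sketch, assuming PHP 7 or later; timeToYear is a hypothetical name for illustration and is not taken from the Byzantime source.

```php
<?php
/** Reads a 24-hour clock time as a year, e.g. 08:11 becomes the year 811. */
function timeToYear(int $hour, int $minute): int
{
    return $hour * 100 + $minute;
}

// 'G' is the hour without a leading zero, 'i' is the zero-padded minute.
$now = new DateTime();
$year = timeToYear((int) $now->format('G'), (int) $now->format('i'));
```

Since minutes only run from 00 to 59, the last 40 years of each century can never appear on the clock.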

Under the hood

While it’s fun in itself, what I really wanted to play with was Silex. This is a microframework based on Symfony components, aimed at making single-file apps like this concise and testable.

We’ve been using Symfony at Alexander Street Press for over a year, and I’m impressed at how it’s really helped us raise our game when it comes to quality of engineering. But when you need to make something smaller, it feels like overkill.

So, I gave Silex a try. I’m pretty impressed with the results. It’s certainly concise – to give you an idea, the page needs to request some JSON containing events for the current hour. Here’s how the routing for that is handled using Silex:

$app->get('/event/{year}', function (Silex\Application $app, $year) {

    $start=floor($year/100)*100;
    $end=$start+59;

    $events=$app['historian']->getEvents($start, $end);

    return $app->json($events);

})->assert('year', '\d+');

Only 6 lines of code, but a lot is going on here. Firstly, we’re defining the route for our AJAX request as /event/yyyy, where yyyy is the year we’re interested in. This parameter is passed to our handler closure.

The next two lines just do a little arithmetic.
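That arithmetic rounds the year down to the start of its "hour" and adds 59 to cover every minute in it. Here is a small stand-alone sketch, assuming PHP 7 or later; hourWindow is a hypothetical name, and intdiv is swapped in for floor so the results stay integers.

```php
<?php
/** Returns [start, end] of the "hour" containing $year, e.g. 811 gives [800, 859]. */
function hourWindow(int $year): array
{
    // Drop the last two digits to find the first "minute" of the hour.
    $start = intdiv($year, 100) * 100;

    return [$start, $start + 59];
}

list($start, $end) = hourWindow(811); // 800 and 859
```

So a request for /event/811 would fetch events for the years 800 to 859 inclusive.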

Then we reference $app['historian'], which obtains a previously configured service class from the Pimple dependency injection container provided by $app. Much like a Symfony service, if we never use it, it won’t be created. Having got the service, I obtain an array of events.

We want to return that data as JSON, and Silex provides a handy helper to do just that.

Finally, you’ll see there’s a chained call to assert our year parameter is numeric. If it’s not, the closure will not be executed.

Look at that – it took far longer to describe in English!

Conclusion

I recently dusted off the original pastebin.com code to stick up on GitHub. While PHP gets a fairly bad rap, when I compare old code like that to something taking advantage of current technologies like Silex, Composer and Doctrine, it makes me smile. This is a good time to be working with PHP!

Comments on Byzantime are welcome; I think I’ll probably use it as the basis for further experiments…