Daily Archives: July 10, 2007

Pastebin Reloaded!

Well, I promised it waaaay back in January, but I’ve finally released an update to pastebin.com. A few people have asked for the source over the past few months and have seen some of the updates already, but here’s what’s new…

  • MySQL storage replaced with file-based storage, making it much faster
  • Revamped the colour scheme, which has been pretty much the same for 5 years
  • Added a ‘delete post’ feature
  • Switched to Affero GPL licence

If you’ve drifted away from pastebin due to its lethargic speed, now’s the time to come back! Give it a whirl and if you have any feedback, leave a comment on this post.

Here’s some more detail on the changes…

File-based Storage

Pastebin had used MySQL for storage since it first launched in 2002. It has steadily grown in popularity, but over the past 12 months that popularity began to take its toll on performance.

Pastebin started out just keeping the last 1000 posts, which kept things zippy. Then I added custom domains, which increased the number of posts being retained, but what really hurt it was adding a common request – permanent posts, which meant that over time, the database grew inexorably larger.

In January I began to wonder if I needed a relational database at all. After all, pastebin is really just a single-table application, and there are only two main operations:

  • Fetch post x
  • Get last 10 posts on domain foo

So I refactored the code to allow the storage mechanism to be changed. The new file-based mechanism assigns a random identifier to each new post, e.g. abcdefgh, and stores it in a structured directory:

posts/<d|m|f>/ab/cd/ef/abcdefgh

The top level directory ‘d’, ‘m’, or ‘f’ is chosen based on the desired lifetime of the post (1 day, 1 month or forever). Garbage collection of the 1 day posts in the ‘d’ directory can thus be carried out by performing a find for files older than a day with something like this running from cron every day:

find /path/to/pastebin/posts/d -type f -mtime +0 -exec rm {} \;
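To make the layout concrete, here is a minimal Python sketch of how an identifier could map onto that directory structure, plus a sweep that does roughly what the find command does. It is an illustration only, not the actual pastebin source: the function names, the lowercase-letter alphabet and the 8-character length are all assumptions based on the abcdefgh example.

```python
import os
import secrets
import string
import time
from pathlib import Path

# Hypothetical names throughout; these are assumptions, not pastebin's real code.
LIFETIME_DIRS = {"day": "d", "month": "m", "forever": "f"}
ALPHABET = string.ascii_lowercase


def new_post_id(length: int = 8) -> str:
    """Return a random identifier such as 'abcdefgh'."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def post_path(root: Path, lifetime: str, post_id: str) -> Path:
    """Map an identifier to <root>/<d|m|f>/ab/cd/ef/abcdefgh."""
    top = LIFETIME_DIRS[lifetime]
    return root / top / post_id[0:2] / post_id[2:4] / post_id[4:6] / post_id


def save_post(root: Path, lifetime: str, text: str) -> str:
    """Write a post under a fresh identifier and return that identifier."""
    while True:
        post_id = new_post_id()
        path = post_path(root, lifetime, post_id)
        path.parent.mkdir(parents=True, exist_ok=True)
        try:
            # Mode "x" fails if the file already exists, so a colliding
            # identifier is retried instead of overwriting an existing post.
            with open(path, "x", encoding="utf-8") as handle:
                handle.write(text)
            return post_id
        except FileExistsError:
            continue


def sweep_expired(directory: Path, max_age_seconds: float) -> int:
    """Delete files older than max_age_seconds; return how many were removed."""
    cutoff = time.time() - max_age_seconds
    removed = 0
    for current, _dirs, files in os.walk(directory):
        for name in files:
            candidate = Path(current) / name
            if candidate.stat().st_mtime < cutoff:
                candidate.unlink()
                removed += 1
    return removed
```

Calling sweep_expired(Path("posts") / "d", 86400) would clear out the 1 day posts. Splitting the identifier into two-character directory levels keeps any single directory from accumulating an unmanageable number of entries.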

To maintain the MRU lists of recent posts, the code maintains a serialized array for each domain. Whenever a post is made, this serialized file is locked, updated and unlocked. This is the only time the code can find itself competing for a shared resource, and even then it’s on a per-domain basis, rather than for the entire application as with the MySQL storage.
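The lock, update, unlock cycle could be sketched along these lines in Python. This is an assumption-laden illustration rather than the real code: it uses JSON as a stand-in for the serialized array, advisory locks via fcntl.flock (Unix only), and a hypothetical one-file-per-domain naming scheme.

```python
import fcntl
import json
import os
from pathlib import Path

RECENT_LIMIT = 10  # mirrors "get last 10 posts on domain foo"


def record_recent_post(list_dir: Path, domain: str, post_id: str) -> list:
    """Prepend post_id to the domain's recent list while holding an exclusive lock."""
    list_dir.mkdir(parents=True, exist_ok=True)
    path = list_dir / f"{domain}.json"  # hypothetical file name
    # O_CREAT without O_TRUNC: create the file if it is missing, but never
    # wipe existing data before the lock is held.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    with os.fdopen(fd, "r+", encoding="utf-8") as handle:
        fcntl.flock(handle, fcntl.LOCK_EX)  # waits for other writers on this domain
        try:
            raw = handle.read()
            recent = json.loads(raw) if raw else []
            recent.insert(0, post_id)
            del recent[RECENT_LIMIT:]  # keep only the newest entries
            handle.seek(0)
            handle.truncate()
            json.dump(recent, handle)
            handle.flush()
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)
    return recent


def recent_posts(list_dir: Path, domain: str) -> list:
    """Read the domain's recent list under a shared lock."""
    path = list_dir / f"{domain}.json"
    if not path.exists():
        return []
    with open(path, encoding="utf-8") as handle:
        fcntl.flock(handle, fcntl.LOCK_SH)
        try:
            raw = handle.read()
            return json.loads(raw) if raw else []
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)
```

Because each domain has its own file, two posts on different domains never wait on each other; only simultaneous posts to the same domain contend for the lock.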

As I write, this mechanism has been running for a few hours on the live site, and performance is much improved. At peak times it could take 15-20 seconds to make a post; it’s now much, much zippier!

Revamped Colour Scheme

I thought the old CSS was looking a little tired so I’ve freshened it up a little. I want to avoid adding graphics to the design and just use pure HTML and CSS if possible, which keeps things speedy too.

Comments on it are welcome, and it’s likely I’ll tinker with it some more…

Delete Post

This is quite neat I think – if you choose to hit the “remember me” button, you’ll be assigned a random token which is used to mark your posts. This token is stored in a cookie. When you later view a post, if your cookie token and the post token match, you’ll be offered the opportunity of deleting the post.

I like this as you don’t have to go entering a password or setting up an account – it just works.
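The token check itself is simple enough to sketch in a few lines of Python. Everything here is hypothetical (the Post class, the function names, the 16-byte token size); it only illustrates the idea of matching a cookie token against the token stored with the post, and is not the actual implementation.

```python
import hmac
import secrets
from dataclasses import dataclass
from typing import Optional


@dataclass
class Post:
    text: str
    # None when the poster did not choose "remember me" (assumed representation).
    owner_token: Optional[str] = None


def issue_token() -> str:
    """Create a random token to store in the visitor's cookie."""
    return secrets.token_hex(16)


def can_delete(post: Post, cookie_token: Optional[str]) -> bool:
    """Offer deletion only when the cookie token matches the post's token."""
    if not post.owner_token or not cookie_token:
        return False
    # Constant-time comparison avoids leaking how much of the token matched.
    return hmac.compare_digest(post.owner_token.encode(), cookie_token.encode())
```

A post made without a token can never be deleted this way, since an empty token on either side is treated as no match.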

As always, if you’ve made a post you want removing and this feature doesn’t do it for you, just ask and I’ll take care of it.

Changed to Affero GPL

The last few releases of pastebin used the GPL licence. Trouble is, while the GPL guarantees access to the source if you receive a binary copy of the software, with a website that doesn’t happen. The Affero GPL is a modified version of the GPL which contains an extra clause guaranteeing your access to the source when you interact with the software over a network.

So if you use pastebin on your own site, or adapt it further, you must continue to offer that source to your users. Lovely.

What’s next?

Well, now that pastebin is actually usable again, I’m on a roll. The code has partially complete support for translation, and I’ve an army of volunteers ready to translate, so that’s the next goal…