Pastebin Reloaded!

Well, I promised it waaaay back in January, but I’ve finally released an update to pastebin.com. A few people have asked for the source over the past few months and have seen some of the updates already, but here’s what’s new…

  • MySQL storage replaced with file-based storage, making it much faster
  • Revamped the colour scheme, which has been pretty much the same for 5 years
  • Added a ‘delete post’ feature
  • Switched to Affero GPL licence

If you’ve drifted away from pastebin due to its lethargic speed, now’s the time to come back! Give it a whirl and if you have any feedback, leave a comment on this post.

Here’s some more detail on the changes…

File-Based Storage

Pastebin had used MySQL for storage ever since it first launched in 2002. It has steadily grown in popularity, but that popularity began to take its toll on performance over the past 12 months.

Pastebin started out just keeping the last 1000 posts, which kept things zippy. Then I added custom domains, which increased the number of posts being retained. What really hurt, though, was adding a commonly requested feature, permanent posts, which meant that the database grew inexorably larger over time.

In January I began to wonder if I needed a relational database at all. After all, pastebin is really just a single-table application, and there are only two main operations:

  • Fetch post x
  • Get last 10 posts on domain foo

So I refactored the code to allow the storage mechanism to be changed. The new file-based mechanism assigns a random identifier to a new post, e.g. abcdefgh, and stores it in a structured directory:

posts/<d|m|f>/ab/cd/ef/abcdefgh

The top level directory ‘d’, ‘m’, or ‘f’ is chosen based on the desired lifetime of the post (1 day, 1 month or forever). Garbage collection of the 1 day posts in the ‘d’ directory can thus be carried out by finding files older than a day, with something like this running from cron every day (find rounds a file’s age down to whole days, so -mtime +0 matches anything last modified more than 24 hours ago):

find /path/to/pastebin/posts/d -type f -mtime +0 -exec rm \{\} \;
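
The directory layout described above can be sketched in a few lines. This is an illustration in Python, not pastebin’s actual code: the function names, the lowercase-letter alphabet, and the idea of finding a post by trying each lifetime directory in turn are all assumptions made for the example.

```python
import secrets
import string
from pathlib import Path
from typing import Optional

# Hypothetical names throughout; the real pastebin source is not shown in this post.
LIFETIME_DIRS = {"day": "d", "month": "m", "forever": "f"}
ID_ALPHABET = string.ascii_lowercase  # assumption: ids are lowercase letters
ID_LENGTH = 8


def new_post_id() -> str:
    """Return a random 8-letter identifier such as 'abcdefgh'."""
    return "".join(secrets.choice(ID_ALPHABET) for _ in range(ID_LENGTH))


def is_valid_id(post_id: str) -> bool:
    """Reject anything that is not a well-formed id (e.g. '../' tricks)."""
    return len(post_id) == ID_LENGTH and all(c in ID_ALPHABET for c in post_id)


def post_path(root: Path, lifetime: str, post_id: str) -> Path:
    """Map an identifier to <root>/<d|m|f>/ab/cd/ef/abcdefgh."""
    shards = [post_id[i:i + 2] for i in range(0, 6, 2)]
    return root / LIFETIME_DIRS[lifetime] / Path(*shards) / post_id


def save_post(root: Path, lifetime: str, text: str) -> str:
    """Write a new post under a fresh identifier and return that identifier."""
    while True:
        post_id = new_post_id()
        path = post_path(root, lifetime, post_id)
        path.parent.mkdir(parents=True, exist_ok=True)
        try:
            # Mode "x" fails if the file already exists, so a colliding id is retried.
            with path.open("x", encoding="utf-8") as handle:
                handle.write(text)
            return post_id
        except FileExistsError:
            continue


def fetch_post(root: Path, post_id: str) -> Optional[str]:
    """'Fetch post x': try each lifetime directory; None if the post is absent."""
    if not is_valid_id(post_id):
        return None
    for lifetime in LIFETIME_DIRS:
        path = post_path(root, lifetime, post_id)
        if path.is_file():
            return path.read_text(encoding="utf-8")
    return None
```

Splitting the identifier into two-character directory levels keeps any single directory from accumulating a huge number of entries, and fetching a post needs no index at all: the path is computed straight from the identifier.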

To maintain the MRU lists of recent posts, the code maintains a serialized array for each domain. Whenever a post is made, this serialized file is locked, updated and unlocked. This is the only time the code can find itself competing for a shared resource, and even then it’s on a per-domain basis, rather than for the entire application as with the MySQL storage.
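
The lock, update, unlock cycle for a per-domain list might look something like the following. It is a rough Python sketch under stated assumptions: the list is stored as JSON (the post only says “serialized array”), locking uses the Unix-only fcntl.flock, and the function and file names are invented for the example.

```python
import fcntl
import json
import os
from pathlib import Path

RECENT_LIMIT = 10  # the post mentions the "last 10 posts" on a domain


def record_recent_post(recent_file: Path, post_id: str) -> list:
    """Put post_id at the front of a domain's recent list under an exclusive lock."""
    recent_file.parent.mkdir(parents=True, exist_ok=True)
    # O_CREAT makes the file on first use without truncating an existing one.
    fd = os.open(recent_file, os.O_RDWR | os.O_CREAT, 0o644)
    with os.fdopen(fd, "r+", encoding="utf-8") as handle:
        fcntl.flock(handle, fcntl.LOCK_EX)  # blocks until other writers are done
        try:
            raw = handle.read()
            recent = json.loads(raw) if raw else []
            recent = [post_id] + [p for p in recent if p != post_id]
            recent = recent[:RECENT_LIMIT]
            handle.seek(0)
            handle.truncate()
            handle.write(json.dumps(recent))
            handle.flush()
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)
    return recent


def read_recent_posts(recent_file: Path) -> list:
    """'Get last 10 posts on domain foo': read the list under a shared lock."""
    try:
        with recent_file.open("r", encoding="utf-8") as handle:
            fcntl.flock(handle, fcntl.LOCK_SH)
            try:
                raw = handle.read()
            finally:
                fcntl.flock(handle, fcntl.LOCK_UN)
    except FileNotFoundError:
        return []
    return json.loads(raw) if raw else []
```

Because each domain has its own file, two people posting to different domains never wait on each other; only simultaneous posts to the same domain queue up behind the lock.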

As I write, this mechanism has been running for a few hours on the live site, and performance is much improved. At peak times it could take 15-20 seconds to make a post; it’s now much, much zippier!

Revamped Colour Scheme

I thought the old CSS was looking a little tired so I’ve freshened it up a little. I want to avoid adding graphics to the design and just use pure HTML and CSS if possible, which keeps things speedy too.

Comments on it are welcome, as it’s likely I’ll tinker with it some more…

Delete Post

This is quite neat I think – if you choose to hit the “remember me” button, you’ll be assigned a random token which is used to mark your posts. This token is stored in a cookie. When you later view a post, if your cookie token and the post token match, you’ll be offered the opportunity of deleting the post.
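
The token check described above can be sketched as follows. Again this is illustrative Python rather than pastebin’s own code: the function names and the token length are assumptions, and the cookie handling itself is left out.

```python
import hmac
import secrets
from typing import Optional


def new_owner_token() -> str:
    """Random token to store in the visitor's 'remember me' cookie."""
    return secrets.token_hex(16)  # 32 hex characters; the length is an assumption


def can_delete(cookie_token: Optional[str], post_token: Optional[str]) -> bool:
    """Offer deletion only when both tokens are present and identical."""
    if not cookie_token or not post_token:
        # Visitor has no cookie, or the post was made without "remember me".
        return False
    # Compare as bytes in constant time so the check leaks nothing via timing.
    return hmac.compare_digest(cookie_token.encode(), post_token.encode())
```

A post made without “remember me” carries no token, so it can never be matched and the delete option simply doesn’t appear for it.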

I like this as you don’t have to go entering a password or setting up an account – it just works.

As always, if you’ve made a post you want removing and this feature doesn’t do it for you, just ask and I’ll take care of it.

Changed to Affero GPL

The last few releases of pastebin used the GPL licence. Trouble is, while the GPL guarantees access to the source if you receive a binary copy of the software, with a website that doesn’t happen. The Affero GPL is a modified version of the GPL which contains an extra clause guaranteeing your access to the source when you interact with the software over a network.

So if you use pastebin on your own site, or adapt it further, you must continue to offer that source to your users. Lovely.

What’s next?

Well, now that pastebin is actually usable again, I’m on a roll. The code has partially complete support for translation, and I’ve an army of volunteers ready to translate, so that’s the next goal…

15 thoughts on “Pastebin Reloaded!”

  1. Stan

    Very interesting! I wonder if SQLite would be a reasonable option for you also? Postgres might be worth looking at too; IMO it’s much better for heavy lifting than MySQL. Most production environments I work with use Postgres or SQLite because of MySQL’s less than stellar performance at times.

    I took a look at the source, you have the opportunity to use your backend drivers via a factory pattern – might be worth trying if you continue to expand the driver base.

    Keep up the good work!

  2. lordelph Post author

    I did consider changing to a DB with row level locking, such as MySQL with InnoDB tables or PostgreSQL, but what really threw it was the realisation that I just didn’t need the overhead of an RDBMS.

    I’ve no plans to expand the storage drivers unless people want to contribute some though…

  3. Slepp

    Good to see you finally got pastebin.com back off its knees.. I was starting to wonder if you gave up.

    In regards to the RDBMS, PostgreSQL is serving Pastebin.ca quite well these days (it took some tuning :), and SQLite was a horrible option.. It was about 6 times slower for the load styles a Pastebin endures (and yes, I’m a huge fan of SQLite in general).

    Then again, your bin and my bin are quite different beasts these days; I don’t think I could get by with a flat file storage methodology. That and I’m an SQL junkie.

    Anyhow, like the new look too :>

  4. .Lou

    I’ve sent you a feedback message, but I figured that it’d be good if I commented here as well.

    I think it would be awesome if private pastebins could have owners. These owners could edit, delete, or do whatever they want, to the posts in that pastebin. Though this would probably require a login/registration system and stuff.

    You can always reply to me by email about this, as I may forget that I posted this.

    Thanks!

  5. lordelph Post author

    Download link fixed – thanks for the heads up.

    As for having owners, it’s something I’ve been thinking about – I thought it would be nice to offer CSS customisation and the ability to add extra links and info to the sidebar. I’ll see what I can do!

  6. Pingback: LordElph’s Ramblings » Pastebin - Turbo Boost Success!

  7. DannyIsOnFire

    Glad to see Pastebin up and running again.
    Just made use of the new delete feature, which, I want to add, is an absolutely excellent addition.
    Keep up the good work!

  8. simon day

    Hey, I have a suggestion…
    Why not add support for reverse popups?

    On some forums you cannot paste all you want or embed code, due to limits set by the forum. linkedin.com is a good example. The linkedin.pastebin.com system is nice, but what would be good is a URL something like
    linkedin.pastebin.com/pop/xyzvwq
    where the referring link is detected, the page is popped up as a popup and the referring page is reopened.

    Email me thoughts / questions.

  9. Tack

    Hey,

    I’ve just recently started using pastebin and I love it (just got a private URL 5 minutes ago) but I think I’m going to end up wanting to use it in the same way I use ImageShack.

    For ImageShack, I have an extension to the standard Windows Explorer shell that allows me to right click an image file and do “Send to ImageShack”, and it automatically uploads it and then gives me a URL directly to the image file. I’d like to be able to do the same thing with a snippet of text (slightly different, of course, since I don’t always want to paste an entire 3000 line file if 10 lines of code are in question), perhaps from the same menu used for cut/copy/paste.

    Does anyone know of a Windows shell extension that will do this? Google (for the third time in my entire life and/or the last 7 years I’ve used it) is no help on this, so I’m asking you, the pastebin blog users.

    Also, is anyone skilled enough in C++ or C# to attempt it if there isn’t one?

  10. lordelph Post author

    I could do that fairly easily – I wrote a tool years back called URLMenu which had a nifty feature – if you did a quick double-copy to the clipboard, the text was scanned for URLs and added to your bookmarks.

    I could easily adapt the idea so that a double copy posts to pastebin, then replaces the clipboard contents with the URL.

    Just imagine – highlight code, Ctrl+C, Ctrl+C, wait for a beep, then Ctrl+V into IRC!

  11. Zash

    A suggestion: for the “immortal” pastes, have a cronjob like “@monthly find /path/to/pastes/f/ -type f -atime +365 -delete”
    That would delete files that no one has accessed in the last year.
