Tuesday, November 10, 2009

ZPlanner Technology Brief: Mercurial

If you've been following this blog for any period of time at all, you're probably aware of my little personal project called ZPlanner, an Agile project tracking tool. I've listed some of the technologies I chose for the project in past entries, and figured it might be interesting to devote a blog entry to one of them now and again.

Today, I'll give a very brief view of the revision control system I chose, Mercurial. This is by no means a comprehensive overview; there are plenty already floating around on the web, and Mercurial's official documentation is great. I see no need to tread the same ground, so I'm just going to give a very high-level overview, explain why I think it's worth your time to look at Mercurial in more depth, and offer my own subjective opinions.

In the last few years there has been a dramatic uptick in interest around distributed version control systems. There are a number of competitors in the space, many of which have very similar feature sets. Some of the earlier entries in the field, such as Darcs, seem to be waning in popularity, whereas a few have clearly emerged as leaders in the space. The two biggest names in distributed version control are now Git and Mercurial.

I'll be upfront: I've not used Git myself, so I'll steer clear of too much in the way of comparison. What I do mention comes from Mercurial proponents, the validity and objectivity of which I leave to you to assess. The main reason I mention Git at all is that when I was deciding what system to use, it basically came down to a choice between the two.

That said, one of the big reasons I've made the progress I have on ZPlanner is that I try not to obsess too much about technological decisions. One should always do a little research, but at some point it's much better to just pick something and move on than to agonize endlessly about such things. I'm sure if I'd picked Git instead, I'd be just as far along and it likely would have made little difference in the grand scheme of things.

Pretty much all of my career has been spent using CVS--unless I count an early and unfortunate encounter with Visual SourceSafe at my first job--and most of that time was spent on the command line in Linux. I've certainly heard my share of gripes about it, but to be honest, I never hated it. Part of that, of course, is that most of the time I spent with it, it was being used on relatively small, self-contained projects. There was little need to branch (and consequently merge), there were usually only a few people working on head at any one time, and it just pretty much worked. When something went horribly awry, I could usually Google the commands to fix it. In my current environment (where I'm only managing) there are tons of branches and merges, and CVS is not really well suited to that, since commits are recorded per file. In contrast, other systems (not necessarily distributed ones) have a notion of changesets, which show you *everything* that was done as part of a change in aggregate. This can be a powerful tool when you have lots of concurrent development.
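
To make the per-file versus changeset distinction a bit more concrete, here is a minimal Python sketch. It is a toy model only, not how CVS or any changeset-based tool actually stores data; the class names (FileRevision, Changeset), the file paths, and the revision numbers are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class FileRevision:
    """Per-file record: one file, one revision number, one log message."""
    path: str
    revision: str
    message: str


@dataclass
class Changeset:
    """Changeset record: every file touched by one logical change."""
    ident: int
    message: str
    files: list = field(default_factory=list)


# Per-file model: one logical change becomes three unrelated revisions
# that share nothing but a (hopefully identical) log message.
per_file_log = [
    FileRevision("planner/story.py", "1.4", "Add story points"),
    FileRevision("planner/views.py", "1.9", "Add story points"),
    FileRevision("planner/schema.sql", "1.2", "Add story points"),
]

# Changeset model: the same change is a single record (hypothetical data).
change = Changeset(
    ident=42,
    message="Add story points",
    files=["planner/story.py", "planner/views.py", "planner/schema.sql"],
)


def files_in_change(log, message):
    """Per-file model: reconstruct a change by matching log messages."""
    return sorted(r.path for r in log if r.message == message)


reconstructed = files_in_change(per_file_log, "Add story points")
print(reconstructed == sorted(change.files))
```

In the per-file model, the only way to answer "what else changed along with this file?" is to guess from log messages or timestamps. With a changeset, the answer is simply part of the record.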

But if you have a small team, I think CVS is just fine. And when I first started setting up to do work on ZPlanner, my thought process was pretty much: I'd like to not have to rename files and move them around when I want to try something new. So, my first impulse was to install CVS. Then, after a moment or two, I thought, everyone says Subversion is better, so maybe I'll do that.

I started installing a Subversion server on my laptop, but about 2 minutes in, I suddenly had an epiphany: "What the hell am I doing?" I only need to be able to roll back to older versions. If the server is on my laptop, there's nothing special making the code any more "backed up," so why go to all this effort? For me, a DVCS would work just as well.

So, I did a bit of reading, and most of what I found talked about either Git or Mercurial. Enough of it was in favor of Mercurial that I said 'Hell, Mercurial it is'. The biggest thing in Mercurial's favor for me was the nearly universal opinion that it had better documentation and an easier learning curve. That was important to me. In contrast to some of my other technology choices, I wasn't using Mercurial to learn Mercurial--I was using it because I wanted to manage change safely within my project.

The biggest conceptual difference between Mercurial and something like CVS is the notion that everyone has their own repository and all are equally valid. Like many, I suspect, I found this oddly unsettling at first: there seemed to be no centralized control. But if you think about it, even in CVS there's nothing to stop people from committing bad code in most places. And both Git and Mercurial have the notion of a hub repository to allay such concerns. This repository is not special; it's simply agreed upon by convention, and it's up to the developers to make sure that the code there is what it should be. If it still sounds disconcerting, it shouldn't. It's really not that different.

In day-to-day practice, however, the biggest difference between Mercurial and CVS is that both Bob and Alice can clone (this is the operation performed rather than a checkout) all the code from a centralized repository and then make any changes they want locally. They don't worry about collisions, or about having to update code as they do work. Their repository is whole and complete. It contains the complete history of the cloned repository plus anything they've done subsequently.

Once they want to integrate their work back into the hub, they simply pull the hub's latest changes (or clone it afresh), merge their changes in, and push the result back to the hub.
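
The clone, commit, and integrate cycle can be sketched as a toy Python model. To be clear, this is an illustration under heavy simplifying assumptions, not Mercurial's implementation: the Repo class and its methods are invented here, and history is a flat list of changeset names, whereas real Mercurial history is a graph. The method names loosely mirror the hg clone, hg commit, hg pull, and hg push commands.

```python
class Repo:
    """Toy repository: history is just an ordered list of changeset names."""

    def __init__(self, history=None):
        self.history = list(history or [])

    def clone(self):
        # A clone copies the complete history, not just the latest files.
        return Repo(self.history)

    def commit(self, name):
        # Commits are purely local; no other repository is contacted.
        self.history.append(name)

    def pull(self, other):
        # Bring in any changesets from `other` that we do not have yet.
        for changeset in other.history:
            if changeset not in self.history:
                self.history.append(changeset)

    def push(self, other):
        # Pushing is pulling, seen from the other side.
        other.pull(self)


hub = Repo(["c1", "c2"])
alice = hub.clone()
bob = hub.clone()

alice.commit("a1")         # local only, works offline
bob.commit("b1")           # local only, works offline

alice.push(hub)            # the hub now has Alice's work

bob.pull(hub)              # Bob picks up a1 from the hub
bob.commit("merge a1+b1")  # stand-in for a real merge commit
bob.push(hub)              # the hub now has everything

print(hub.history)
```

Note that nothing in the model makes hub special. It is an ordinary Repo that Alice and Bob have agreed to push to, which is the "hub by convention" idea described above.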

There are a number of advantages to this model, of course. One of the biggest is that for geographically dispersed teams (think offshoring), commit operations are much faster. There is no network latency, and one need not even have network access to do commits, merges, whatever.

Additionally, every contributor has a copy of the repository (though in differing states) on his or her PC. If one is lost, it's probably not the end of the world. Contrast this with a usual CVS setup, where if cvsroot gets hosed you're screwed. One could argue that in such a scenario, people will have a copy of the checked-out code, which is just as good. The difference, however, is that in such a situation they *only* have the code. They don't have the complete history of checkins, old versions of the files, and so on. They only have the last revision. In Mercurial, you have *everything*.

A big plus for Mercurial is its ease of use. In particular, if you've used CVS, many of the commands are nearly identical. Additionally, the documentation around Mercurial is excellent, and although some of the tools lag a bit, they are quickly catching up. There is already a good plugin for Eclipse, which is what I've been using. For local development, it behaves almost identically to using CVS via Eclipse. There is also a standalone client in the Tortoise family, TortoiseHg (though from what I know it is a bit buggy on Windows), among others.

Additionally, Mercurial comes with an extension named 'convert' which allows one to import revision history from any of Subversion, CVS, Git, or Darcs.

When Mercurial is compared to its primary competitor in the distributed RCS arena, Git, it is often noted that Mercurial is easier to learn due to its focus on simplicity. While most of them are not needed, Git has over 139 commands; Mercurial has about half that. Additionally, due to how the data is stored, Git repositories need frequent manual maintenance (called "repacks") to maintain performance and prevent rapid increases in disk usage. Mercurial, in contrast, requires no active maintenance.

Mercurial is currently in use by projects having hundreds of contributors, including high-profile projects such as Mozilla, NetBeans, Python, Symbian, and XEmacs.

The core question, however, is: what is it like to use Mercurial on a day-to-day basis? And for me, it's nearly invisible. Admittedly, I've been using it in a fairly limited context, but to give some sense of its power, here's a quick example:

I installed Mercurial locally to track changes on my laptop. Recently, I decided I'd like to share my code with a few people. Using BitBucket.org (what GitHub is to Git, BitBucket is to Mercurial), I imported my repository--AND ALL OF ITS HISTORY--in about 5 minutes. Now, I can give a URL to anyone to check out the code, and they can see every commit I ever made (with associated comments) since I started. I thought that was pretty damn cool.

If you're curious to see the ZPlanner source or what a Mercurial repository looks like, sign up for an account on BitBucket.org and shoot me your username and I'll add you (this is assuming I know you as ZPlanner isn't open source).

3 comments:

Shaun said...

We used Mercurial at my previous job and I liked it quite a bit, though I could see little difference between it and Darcs (which we used before it) from a usability standpoint. The problem with Darcs seemed to be scalability. As more developers joined and the code base got bigger, it was becoming incredibly slow. Not sure if that's been fixed over the past year or not, but it forced us to switch and I doubt many developers who have made the switch are gonna go back.

I know little of Git but I guess it's the favorite of Linus Torvalds, so I suppose that speaks to its difficulty, though I'm sure it's very powerful in the hands of an experienced user.

Code Monkey said...

Yeah, I've read about the speed issues with Darcs. Git also had some issues with speed over http for some operations and size/speed issues as a repository grows in size, though I think in both cases, the issues have been resolved in newer releases.

Having used Mercurial (albeit for a fairly trivial use case), I can't see any reason to switch to one of the other options. It's easy, quick, and...just works.

I would hope Git was Torvalds' favorite as he was the original author. I guess I'm biased, but I think Mercurial, which was started around the same time, demonstrates a conceptual integrity that Git lacks. The wikipedia entry on Git is fairly interesting actually:
http://en.wikipedia.org/wiki/Git_%28software%29

Shaun said...

Ha! I didn't know he had anything to do with developing Git. I should have known with how much he was praising it.