What Would Picard Do?: November 2009

Tuesday, November 17, 2009

ReviewBoard Woes Or: Build vs. Buy

A couple weeks ago I wrote a bit about ReviewBoard, an open-source web application for doing code reviews. I now have it up and running and have introduced it to my team. Hopefully, over the next weeks and months we'll find it a useful tool for reviewing code.

But it certainly took more than a little work to get it up and running on Windows. I had to install Apache, MySQL, PHP, Django, and any number of Python packages. Add to that the time I spent fiddling with Apache. Of course, I got it working on my own desktop fairly easily, but for some reason when I put it on a dedicated box and started trying to run Apache as a service, the program was having none of it.

Even though I'd configured the thing nearly identically to my successful installation on my own desktop, it wasn't quite working. The first problem I encountered was that even though it said I'd successfully added my CVS repository (which is required to submit reviews), the dropdown displaying the repositories against which it would check submitted reviews would not populate.

Of course, I did the requisite Google-ing but apparently no one else had had the problem.

Soon enough, I was looking at Apache logs, had started opening up the Python code (and mind you I don't know Python), and eventually descended into adding print statements to the source so I could watch the output in the Apache log.

Ultimately, I traced the problem to an included library. For each possible SCM (CVS, SVN, etc.), there is a file in ReviewBoard to manage the interactions. Before populating the dropdown, the appropriate library makes sure it can run the commands it needs to interact with the repository. To do this it uses an included Djblets library which has a function called is_exe_on_path. This function is passed a string literal representing the command to be invoked, in my case 'cvs'. The function then appends '.exe' if the app is running on Windows and checks that the file is on the path. In my case, this resulted in a check for a file named 'cvs.exe'.

The problem is the newest version of Tortoise NO LONGER HAS A FILE CALLED cvs.exe. Oh, it used to...and that's why I didn't have the problem on my desktop where I'd used an older version. But on my new computer I'd installed the latest version of TortoiseCVS which has replaced cvs.exe with a file called TortoiseAct.exe. How it's invoked on the command line when one types 'cvs' I don't know (I think via a registry entry and DLL), but ReviewBoard was having none of it.
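
The check described above boils down to something like the following. This is a minimal sketch of the described behavior, not the actual Djblets source; the function name and its search_path and platform parameters are made up for illustration so the lookup can be tried against any directory.

```python
import os
import sys


def is_exe_on_path_sketch(name, search_path=None, platform=None):
    """Return True if `name` (plus '.exe' on Windows) exists in a PATH directory.

    search_path and platform are hypothetical parameters added for this
    sketch; by default the real PATH and sys.platform are used.
    """
    platform = sys.platform if platform is None else platform
    if search_path is None:
        search_path = os.environ.get("PATH", "")

    # On Windows the lookup is for "<name>.exe", so an install that ships
    # no cvs.exe fails the check even if typing 'cvs' works in a shell.
    if platform.startswith("win") and not name.lower().endswith(".exe"):
        name += ".exe"

    for directory in search_path.split(os.pathsep):
        if directory and os.path.exists(os.path.join(directory, name)):
            return True
    return False
```

Pointed at a directory that holds only TortoiseAct.exe, a call such as is_exe_on_path_sketch('cvs', search_path=that_directory, platform='win32') comes back False, which matches the empty dropdown.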

So, I made a copy of the stupid TortoiseAct.exe, renamed it 'Cvs.exe' and hey, presto, the dropdown was working. I left off for a while and started playing with some other applications for my new tools box assuming everything had been resolved. In fact, subsequently I ended up installing an older version of Tortoise as the newer version didn't work with Hudson, the Java-based continuous integration server I also put on the box. So ultimately all of my effort would have been unnecessary had I installed Hudson first.

After I’d finished setting up Hudson and prior to rolling out ReviewBoard to the team, I decided I should do a quick run through. So, I made a file diff (which is how one submits a review) and clicked 'Upload' and waited.

And waited. The thing just sat there hanging frozen on the submission page.

I quickly descended back into opening Python source, adding print outs in various ReviewBoard libraries until I found the line it was hanging on. It was where ReviewBoard tried to invoke a 'cvs up' command.

Okay, it's CVS again. Great.

But here there was no apparent problem. I spent a few hours looking at environment variables and making sure all my batch files were readable by the Default Windows user (which is who services on Windows run as by default). But nothing worked. I could run the same cvs command the program was running on the command line myself just fine, though. It made no sense.

Finally, I set Apache to run not as the Default User in Windows but as *me*. Hey, presto, it worked.

I tried copying over my ssh.bat and ssh keys into the Default User home directory, tried all sorts of random stuff, but nothing worked. Ultimately, I created a new local user on Windows, gave him administrator rights, and logged into the box to start Apache while actually logged in as the user.

It still hung.

So, I tried running the cvs command myself while logged in as my new user. And a prompt popped up saying the host needed to be added to the 'known hosts' file. I clicked 'Yes' and suddenly the thing was working. Goddammit! It was because the known hosts file was local to my user account and the Default User’s hadn’t had the IP of the CVS server added yet.
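
One way to make this kind of hang easier to spot is to run the command with a timeout and no stdin, so a prompt nobody can see turns into a quick failure instead of a frozen page. What follows is a rough sketch, assuming a current Python 3. The helper run_noninteractive is hypothetical and not something ReviewBoard provides, and a pop-up dialog like the one described above would only be caught by the timeout, not by the closed stdin.

```python
import subprocess
import sys


def run_noninteractive(cmd, timeout_seconds=30):
    """Run cmd with stdin closed; return (status, output) instead of hanging.

    status is "ok", "failed" (non-zero exit code) or "timed out".
    """
    try:
        result = subprocess.run(
            cmd,
            stdin=subprocess.DEVNULL,    # a console prompt gets EOF instead of waiting
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,    # keep error text together with normal output
            timeout=timeout_seconds,     # the child is killed if it runs too long
            universal_newlines=True,
        )
    except subprocess.TimeoutExpired:
        return "timed out", ""
    status = "ok" if result.returncode == 0 else "failed"
    return status, result.stdout
```

Run under the same account the service uses, with whatever cvs command the application issues, a "timed out" result is a strong hint that something is waiting for input that will never come.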

Of course, you may be asking what the point of all this blathering is. Well, I got into a conversation with my co-worker, Jim, during the middle of all of this about the age old 'build vs. buy' question.

There was a very definite cost to all this mucking around. I probably spent the better part of two days mucking around trying to get the goddamn thing to work. I don't blame the ReviewBoard people for this. They used an existing library and the people that wrote that library weren't expecting the application used for CVS access to not have a file called ‘cvs.exe’ (I still think it's dumb code). Why did TortoiseCVS get rid of the cvs.exe file? Seems stupid to me. But again, it's all random decisions by unrelated parties that make installing a piece of software a potential pain.

The argument could be, I could have bought SmartBear code review and been spared all this pain. I don't know that that's true. It probably is. Of course, I see value in the time I spent. Coming out on the other end of all this, I'd learned quite a bit about setting up Apache2, I'd learned a little tiny bit of Python and Django, and I knew exactly how the software was set up. But again, maybe it was easier to have just bought something that *worked*.

Jim mentioned that he'd really had a reversal of the opinion he had years ago, that often it's better to just shell out the money for something that works, so you don't have to waste time or end up using mediocre software...just because it's free. And he also mentioned that, as someone who writes software for a living, he believes in the value of software. That good software is worth paying for. I think that's a noble sentiment. Jim is one of the most principled people I know, meaning he really thinks about his beliefs and he acts in accordance with them at all times.

I'm much more of a contradictory hypocrite: while I write software that I hope maybe to charge people for one day, I mutter 'No goddamn way I'm paying for this!'. But there is a cost to being cheap.

I think I do agree with Jim in that if at the end, you're using shitty software just because it's free, you've made a mistake. By the same token, I think it's a huge mistake to assume that because you've been charged for something that it has value.

Is JBoss that much better than Jetty? MyEclipse vs Eclipse? SmartBear CodeReview better than ReviewBoard? Maybe.

But I think you have to put some time in and choose where you want to spend money. Look at the cost/benefit carefully. And do realize that in cases such as my ReviewBoard installation, sure I did spend a bunch of my time. But I also learned a hell of a lot. And I see definite value in that. That said, if I tried CodeReview and it was miles and miles better than anything open source, I'd like to think I'd pony up the cash.

In my case, I'm working for a company and having to get a VP's approval on any purchase, just means such things will languish in purgatory forever, whereas I already have ReviewBoard up and running and am ready to roll it out to my team.

Of course, my company has no problem in spending 20,000 on an installation of TFS (Team Foundation Server), which has been presented as the second coming, yet I wonder if people have really examined all the inefficiencies in process we have, rather than assuming Microsoft will solve all our problems. It's all a very interesting subject, but that's all I have time for...for now.

Thursday, November 12, 2009

What is the role of a manager?

I take a pretty dim view of 'management'. Hell, I wrote a whole blog about the existential angst I feel being a manager myself. If you're a developer, your contributions are concrete. You're writing the code and in the end the code is all that matters. In a very real sense, everything else, management, QA, client management, etc., is ancillary to what you do.

Once you step into a managerial role, however, things take a decidedly different turn. It's quite likely that nothing you do has such a concrete value. When I get to the end of each day, I'm often left wondering 'What the hell did I even do today?' And if I'm not sure what I did, how can I even assess whether I added value, whether I contributed something to the team or whether, in fact, I'm just along for the ride, contributing nothing of real worth.

That's not true in all situations, of course. If you're a manager at a company where you rose through the ranks--and I suspect, like me, this is how most people make it into management--you probably have a very good understanding of the core code. It's likely you were the one who originally wrote the stuff that others are maintaining. I think that in those cases it's easy to fall into the trap of ignoring what a manager *should* do so you can continue to contribute in the same way you always have, jumping in to write code, troubleshooting issues no one else can figure out. At that point you're not really a manager, you're a senior developer. Which is fine...as long as you don't have manager in your title.

While labels are often just that, with no real concrete meaning, I think there's an important distinction to draw between a "manager" and a "dev".

And if you've taken a managerial position at a company where you're *not* already familiar with the technology, where you haven't spent years traversing the chaos so you can navigate it with ease, you have to ask yourself where you can make the biggest value contribution. You can spend your time learning everything, going deep into the details of a small number of items, or you can trust your coworkers, delegate where necessary and focus on the bigger picture stuff that no one else has much time for.

In a company where they've "figured everything out", where the team is cohesive and happy, where client expectations are managed well, and where there are seldom fire drills, there isn't a need for a manager. To paraphrase Groucho Marx, any company I'd really want to work for, doesn't need someone like me. The developers do all the things a manager does. They manage themselves, they introduce change themselves, and collectively they make sure all the important things happen.

But at most places this isn't the case. And while I often have a lot of self-doubt about *not* being involved in everything at the lowest level, I've decided the most important contributions I can make are to make sure things move forward, that the organization improves, that the team is constantly looking for better ways to do things. And that the developers have an advocate, someone who truly understands their concerns and can speak on their behalf. There are many times when I did not have that and I would have been thrilled had my manager done these things.

And when there is a moment free here or there, I do try to run "cvs co" and get into things. Because despite all of what I've said, in the end, the code is all that matters.

Tuesday, November 10, 2009

ZPlanner Technology Brief: Mercurial

If you've been following this blog for any period of time at all, you're probably aware of my little personal project called ZPlanner, an Agile project tracking tool. I've listed some of the technologies I chose for the project in past entries, and figured it might be interesting to devote a blog entry to one of them now and again.

Today, I'll give a very brief view of the revision control system I chose, Mercurial. This is by no means a comprehensive overview; there are plenty already floating around on the web and Mercurial's official documentation is great. I see no need to tread the same ground, so I'm just going to give a very high level overview, why I think it's worth your time to look at Mercurial in more depth, and my own subjective opinions.

In the last few years there has been a dramatic uptick in interest around distributed version control systems. There are a number of competitors in the space, many of which have very similar feature sets. Some of the earlier entries in the field, such as Darcs, seem to be waning in popularity, whereas a few have clearly emerged as leaders in the space. The two biggest names in distributed version control are now GIT and Mercurial.

I'll be upfront, I've not used GIT myself, so I'll steer clear of too much in the way of comparison. What I do mention comes from Mercurial proponents, the validity and objectivity of which I leave to you to assess. The main reason I mention GIT at all is that when I was deciding what system to use, it basically came down to a choice between the two.

That said, one of the big reasons I've made the progress I have on ZPlanner is that I try not to obsess too much about technological decisions. One should always do a little research, but at some point it's much better to just pick something and move on than to obsess endlessly about such things. I'm sure if I'd picked GIT instead, I'd be just as far along and it likely would have made little difference in the grand scheme of things.

Pretty much all of my career has been spent using CVS--unless I count an early and unfortunate encounter with Visual SourceSafe at my first job--and most of that time was spent on the command line in Linux. I've certainly heard my share of gripes around it, but to be honest, I never hated it. Part of that, of course, is that most of the time I spent with it, it was being used on relatively small, self-contained projects. There was little need to branch (and consequently merge), there were usually only a few people doing stuff on head at any one time, and it just pretty much worked. When something went horribly awry, I could usually Google the commands to fix it. In my current environment (where I'm only managing) there is tons of branching and merging and CVS is not really very well suited since commits are only performed on files. In contrast, other systems (and they are not necessarily distributed) have a notion of changesets which show you *everything* that was done as part of a change in aggregate. This can be a powerful tool when you have lots of concurrent development.

But if you have a small team, I think CVS is just fine. And when I first started setting up to do work on ZPlanner, my thought process was pretty much: I'd like to not have to rename files and move them around when I want to try something new. So, my first impulse was to install CVS. Then, after a moment or two, I thought, everyone says Subversion is better, so maybe I'll do that.

I started installing a Subversion server on my laptop, but about 2 minutes in, I suddenly had an epiphany "What the hell am I doing?" I only need to be able to roll back to older versions. If the server is on my laptop there's nothing special that's making the code any more "backed up" so why go to all this effort? For me, a DVCS would work just as well.

So, I did a bit of reading, and most of what I found talked either about GIT or Mercurial. And enough of it was in favor of Mercurial, that I said 'Hell, Mercurial it is'. The biggest thing for me in Mercurial's favor was the nearly universal opinion that it had better documentation and had an easier learning curve. That was important to me. In contrast to some of my other technology choices, I wasn't using Mercurial to learn Mercurial--I was using it because I wanted to manage change safely within my project.

The biggest difference between Mercurial and something like CVS is the notion that everyone has their own repository and all are equally valid. Like many, I suspect, I found this gave me an oddly unsettled sense that there was no centralized control. But if you think about it, even in CVS there's nothing to stop people from committing bad code in most places. And both GIT and Mercurial have the notion of a hub repository to allay such concerns. This repository is not special, it's simply agreed upon by convention and up to the developers to make sure that the code there is what it should be. If it still sounds disconcerting, it shouldn't. It's really not different.

The biggest practical difference between Mercurial and CVS, however, is that both Bob and Alice can clone (this is the operation performed rather than a checkout) all the code from a centralized repository and then make any changes they want locally. They don't worry about collisions, or having to update code as they do work. Their repository is whole and complete. It contains the complete history of the cloned repository plus anything they've done subsequently.

Once they want to integrate their work back into the hub, they simply clone the repository again, merge their changes in and push it back to the hub.
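
As a rough illustration of that cycle, here is a small Python sketch that lists the hg commands involved and runs them inside an existing clone. It uses the common pull, merge, push variant rather than cloning the hub a second time, and it assumes hg is on the PATH and that the clone was created with hg clone, so the hub is already its default push and pull target. The function names and the commit messages are made up for illustration.

```python
import subprocess


def hub_sync_commands(message):
    """Return the hg commands for one commit-locally-then-share cycle."""
    return [
        ["hg", "commit", "-m", message],           # record the work in the local repository
        ["hg", "pull"],                            # fetch new changesets from the hub
        ["hg", "merge"],                           # combine the hub's head with the local one
        ["hg", "commit", "-m", "Merge with hub"],  # a merge is committed like any other change
        ["hg", "push"],                            # publish the result back to the hub
    ]


def run_in_clone(clone_dir, commands):
    """Run each command inside clone_dir, stopping at the first failure."""
    for argv in commands:
        subprocess.check_call(argv, cwd=clone_dir)
```

Only the pull and the push touch the network; everything else happens against the local repository. A real script would also have to handle the case where the hub has nothing new, since there is then nothing to merge.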

There are a number of advantages to this model of course. One of the biggest is that for geographically dispersed teams (think offshoring), commit operations are much faster. There is no network latency and one need not even have network access to do commits, merges, whatever.

Additionally, every contributor has a copy of the repository (though in differing states) on his PC. If one is lost, it's probably not the end of the world. Contrast this with a usual CVS setup, where if cvsroot gets hosed you're screwed. One could argue that in such a scenario, people will have a copy of the checked out code, which is just as good. The difference, however, is that in such a situation they *only* have the code. They don't have the complete history of checkins, old versions of the files, and so on. They only have the last revision. In Mercurial, you have *everything*.

A big plus for Mercurial is its ease of use. Particularly, if you've used CVS, many of the commands are nearly identical. Additionally, the documentation around Mercurial is excellent and although some of the tools lag a bit, they are quickly catching up. There is already a good plugin for Eclipse, which is what I've been using. For local development, it behaves almost identically to using CVS via Eclipse. There is also a standalone client from Tortoise, TortoiseHg, (though from what I know it is a bit buggy on Windows) among others.

Additionally, Mercurial comes with an extension named 'convert' which allows one to import revision history from any of Subversion, CVS, Git, or Darcs.

When compared to its primary competitor in the distributed RCS arena, GIT, it is often noted that Mercurial is easier to learn due to its focus on simplicity. While most of them are not needed, GIT has over 139 commands. Mercurial has about half that. Additionally, due to how the data is stored, GIT repositories need frequent manual maintenance (called "repacks") to maintain performance and prevent rapid increases in disk usage. Mercurial, in contrast, requires no active maintenance.

Mercurial is currently in use by projects having hundreds of contributors, including such high-profile projects as Mozilla, NetBeans, Python, Symbian, and XEmacs.

The core question, however, is what is it like to use Mercurial on a day-to-day basis? And for me, it's nearly invisible. Admittedly, I've been using it in a fairly limited context, but to give some sense of its power, here's a quick example:

I installed Mercurial locally to track changes on my laptop. Recently, I decided I'd like to share my code with a few people. Using BitBucket.org (what GitHub is to Git, BitBucket is to Mercurial), I imported my repository--AND ALL OF ITS HISTORY in about 5 minutes. Now, I can give a URL to anyone to check out the code and they can see every commit I ever made (with associated comments) since I started. I thought that was pretty damn cool.

If you're curious to see the ZPlanner source or what a Mercurial repository looks like, sign up for an account on BitBucket.org and shoot me your username and I'll add you (this is assuming I know you as ZPlanner isn't open source).

Tuesday, November 3, 2009

Using software to do code reviews (ReviewBoard)

It's fairly universally accepted that all code should undergo some type of review. The problem is, while it's easy in theory, in practice it tends to be kinda tough. I've had various experiences participating in code reviews ranging from relatively decent to horribly painful and I still don't know that I have a perfect answer as to the "right way to do things" (tm).

If I grab my "Software Engineering" book back from my college days, I can read about doing print outs of the code, having an official reader, while others follow along to make comments, and so on. No one I know has ever done anything quite so formal.

More often, it's grabbing a bunch of developers, and having the guy who wrote the code put it up on the projector, while people make various comments about his variable naming, or why he wrote

if(!boolOne && !boolTwo)

rather than

if(!(boolOne || boolTwo))

or things equally silly.

In one particularly painful code review, I remember an "architect" giving a discourse on why magic numbers were bad. Fine, fair point. But after the first fifteen minutes, the point was made...yet we kept going for another 20 minutes. Yay!

The other problem, of course, is that while one should leave his ego at the door, it's kinda tough. While I think there's a lot of value in having devs *know* someone else will look at their code (The "Crap, if I do this hacky thing, someone will see" effect), sometimes code reviews can be incredibly demoralizing if they just turn into a 'beat up on the developer' exercise.

Even with these caveats, though, I think code reviews are absolutely essential. But as challenging as code reviews are even with a bunch of motivated, full-time employees, all located onsite, it becomes even more challenging when your resources are located remotely.

What do you do then? Have everyone call in from India at 11:20pm their time using a web-ex and a conference bridge that intermittently goes in and out, while they walk through the code?

I've tried this and it's never worked out very well. It still had value, but man, it was *not* fun.

Given the fact that 80% of the devs writing software for my team are remotely located, I've struggled a lot with this question over the past few months. In fact, I've let things kind of just go on as they have, which entails the onsite dev leads waiting for CVS check-in emails, or notes from the offshore guys, then passing back comments via email.

The problem with this, of course, is that it's ad hoc. Even putting some type of process around it (a checklist?), maybe an item on a Wiki page for the project to be struck-through, it still doesn't say anything qualitatively about the review. And there's no real audit trail. It's all via email or the phone.

So, recently I started to look around for software that might be able to help facilitate this process. I've known about tools like SmartBear CodeReview for some time, but it's always tough to get approval for software that costs money. Mind you, I'm not saying it's impossible, but if you want to put something in place quickly at my company, it's not the way to go. Add to that the fact, that the company is looking at putting a company-wide installation of Microsoft TFS (Team Foundation Server) which includes some code review facilities and I have little hope of getting approval.

So, instead, I started looking at open source projects. There are a few options, but I ended up installing ReviewBoard.

ReviewBoard was created internally by the guys at VMware so it's definitely more aimed at the Linux crowd. But with a little effort, I was able to (mostly successfully) get it up and running on a Windows box. It's not without its warts, but I think it might be quite useful in my current situation.

I can post all sorts of screenshots and so on, but instead I'll just post up what I did to get it running (in case you want to try it out). Hopefully this will be useful to someone out there (Keep in mind I'm not a systems admin, so please forgive any blunders in what follows):

Download Xampp

Xampp is an easy-to-install Apache distribution with MySQL, PHP, and Perl already included
To install, simply unzip the file to your directory (I use C:\opt) and you're good to go

Download Python v2.5.4

Subsequent releases are not currently supported in mod_python unless you want to build your own binary (And I didn't)
Add a new system variable named PYTHON_HOME and set it to your install dir location, which should be something like: C:\python25

Add the following to your PATH:

%PYTHON_HOME%;%PYTHON_HOME%\Scripts

Download mod_python-3.3.1
1. Make sure the version you download corresponds to your version of Python (2.5.4 if you followed the instructions above), to your processor type (for most people it'll be the 32-bit binary), and to the version of Apache you'll be using.
2. Edit the Apache conf (which should be located somewhere like C:/xampp/apache/conf/httpd.conf), find where the other LoadModule lines are and add this line
LoadModule python_module modules/mod_python.so
3. Create a simple test directory within the XAMPP htdocs directory (which is where any deployed apps go by default) to test your install of mod_python. If you have any difficulty with the steps as detailed below, see the instructions on the mod_python site.
  1. In my case I created a directory C:\xampp\htdocs\test
  2. Update your XAMPP install so it can recognize Python code, by updating httpd.conf. You'll want something like this (note this is specific to the test directory I created in the previous step)
    <Directory "C:\xampp\htdocs\test">
    AddHandler mod_python .py
    PythonHandler mptest
    PythonDebug On
    </Directory>
  3. Add a file named mptest.py (to match the PythonHandler line above) containing the following to the newly created 'test' dir (Keep in mind this is Python, so spacing matters; check the indentation if you copy and paste what I have below here)
    from mod_python import apache

    def handler(req):
        req.content_type = 'text/plain'
        req.write("Hello World!")
        return apache.OK
Download the tarball of Django v1.1.1
1. Extract the tarball to your directory of choice
  1. In my case this was C:\opt
2. Go to the directory to which you extracted Django and run the following command:
  1. python setup.py install

Download GNU patch.exe
1. Run install (by clicking on .exe)
2. Add the bin directory to your PATH system variable (i.e. C:\Program Files\GnuWin32\bin)

Download the Python imaging library (Make sure you get the version appropriate to your version of Python. In my case, this was Python 2.5)
Download PyCrypto
Install SetupTools for Python.
1. This includes EasyInstall which will make installing additionally needed Python components for ReviewBoard easy

Use EasyInstall (installed with SetupTools in the previous step) to install memcache from the command prompt:
1. easy_install python-memcache

Install ReviewBoard using EasyInstall
1. easy_install ReviewBoard

Install MySQL connector using EasyInstall
1. easy_install mysql-python
2. While ReviewBoard comes with an embedded SQLite install, you probably want to use MySQL.
3. This step actually failed for me as it requires Visual Studio 2003 (I have VS 2005, which doesn't work for this), so I'm using SQLite for the moment

Create your 'site' for your reviewboard installation, naming it as you'd like the site to be accessed. I used reviews.com and updated my localhost file so this would work (i.e. I access ReviewBoard on *my* computer via reviews.com)
1. rb-site install C:\xampp\htdocs\reviews.com
2. Source the ReviewBoard specific conf file in your global httpd.conf file
  1. The ReviewBoard conf file should be here: C:\xampp\htdocs\reviews.com\conf\
  2. The Apache conf will be here: C:\xampp\apache\conf (if you used the default install location)
  3. Here's the line you'll want to add:
    1. Include <install dir for ReviewBoard site>/apache-modpython.conf
Create a .reviewboardrc file in C:\Documents and Settings\<username>\Local Settings\Application Data that points at your site (in my case "reviews.com")
REVIEWBOARD_URL = "http://reviews.com/"
Use TortoiseCVS to configure access to CVS via the command line (I'm going to leave this out unless someone complains about it, at which point I can provide more details)
1. Download TortoiseCVS
2. Generate SSH keys using PuttyGen
3. Install your SSH public key on your CVS server
4. Create the following file and name it "ssh.bat"
Create a new Environment variable named CVS_RSH with a value of the full path to the "ssh.bat" file
Add the CVS_RSH environment variable as well as the path to the TortoiseCVS install to your PATH variable
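
Before moving on, a quick sanity check that the Python pieces installed above can actually be imported may save some log-diving later. This is a small sketch, not part of ReviewBoard; the helper name is made up, and the module names in the note below are assumptions about what each package installs.

```python
def find_missing_modules(module_names):
    """Return the names from module_names that cannot be imported."""
    missing = []
    for name in module_names:
        try:
            __import__(name)       # import by name without binding the module
        except ImportError:
            missing.append(name)   # remember it and keep checking the rest
    return missing
```

For the setup above, the list to check would presumably be something like ["django", "mod_python", "Crypto", "memcache", "MySQLdb"]. Anything that comes back in the result still needs installing; if the MySQL connector step failed as it did here, "MySQLdb" would be expected to show up.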

Pheww! Now, you're ready to start using ReviewBoard, which maybe I'll cover later.
