Thursday, December 10, 2009

ZPlanner Technology Brief: Maven2

A few weeks ago, I devoted a blog to talking about Mercurial, the distributed revision control system I decided to use for ZPlanner, my Agile project tracking tool. This week, I figured I'd briefly talk about one of the other choices I made, namely to use Maven2.

For most of my career as a hands-on developer, I was trapped in the world of Perl and Apache. Our proprietary system didn't require doing builds. You just created a new file, put it in an appropriate directory, and voila, it worked. All of the libraries we used were global to our servers, and installation of new packages (from CPAN) was a rare thing. When it did occur, it usually just meant shooting a request to one of our admins and asking him to install it. So, for all intents and purposes, we never really thought much about dependency management or builds or any of that stuff.

A bit later, I was assigned as technical lead to a new, large scale (for our company) project and we decided, for various reasons, to use Java. Immediately, we had to start thinking about how we'd do our builds. Given that I was the first person to start coding and was relatively inexperienced with Java at that time, I didn't think about it very much. I started coding in Eclipse, created a directory structure that I thought was reasonable and did builds solely via Eclipse. Dependencies were managed by plopping a jar in the lib directory, then clicking 'Add to classpath'. I didn't think about it much after that. It seemed to work.

When we brought other, more experienced Java developers on board, they immediately started talking about Ant. My boss (whom I still regard as the most brilliant programmer I've ever worked with) steadfastly maintained that dependencies and builds should be managed via Eclipse. To use Ant (or Maven) was a duplication of what we'd be doing in Eclipse anyway. Why did we need to do builds via the command line? We should be able to build our artifacts directly in Eclipse, which would produce an artifact (in the case of the service I was writing, a jar), deploy it any which way we wanted, and create simple scripts to set the classpath when we ran it. Why muck around with byzantine Ant files, when Eclipse managed all of this for us? And I really do think he had a good point.

Eventually, though, we did end up with an Ant file. I can't recall if it was for any great reason. Thankfully, due primarily to the efforts of my boss, who whittled down the initial file one of our contractors created to about 1/3 its size, it wasn't too hard to understand. Most Ant files are not so nice. I've seen more than my share of horribly long, incomprehensible Ant files that copy files arbitrarily from one location to another, rename directories, and do all sorts of other crazy, random stuff.

And the thing is, when you start creating an Ant file, you almost invariably end up reinventing the wheel. Deploying a jar or a war generally consists of pretty much the same steps every time. But because someone decided to structure the project a little differently than the last one, or made some other random decision, there are all these one-off, unique things that the Ant file has to do.

And Ant *only* really manages builds--and maybe starting up your app. If you're doing a build in Eclipse, you'll still have to go into the lib and make sure all the jars you need are on the classpath. Some may be assumed to be provided by your web container at runtime, but Eclipse doesn't know about those, so make sure you go and add those to the classpath too. It just ends up being a pain in the ass.

And that's where Maven comes in. Instead of recreating the wheel every time you start a project, you agree to some conventions. You'll always structure your directories in a certain, canonical way. This is facilitated by the notion of archetypes in Maven. Archetypes are basically just templates for how projects are structured. They're available for most of the common (and even uncommon) types of applications you're likely to build.
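As a quick illustration (the group and artifact IDs here are made-up placeholders, and the exact plugin goals may vary by Maven version), generating a new project from the standard quickstart archetype looks something like this:

```shell
# Generate a project skeleton from the quickstart archetype
mvn archetype:generate \
    -DgroupId=com.example.myapp \
    -DartifactId=myapp \
    -DarchetypeArtifactId=maven-archetype-quickstart \
    -DinteractiveMode=false

# The result is the canonical Maven layout:
#   myapp/pom.xml
#   myapp/src/main/java/...      <- production code
#   myapp/src/test/java/...     <- tests
```

Because every Maven project shares that layout, `mvn package` or `mvn test` works the same way in any of them, with no project-specific build script to decipher.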

In return for this compliance with convention (with which you may or may not totally agree), Maven provides the basic things you always need to do: build, deploy, and run tests, without your having to hand-code them. So, in my mind, it's already got Ant beat at this point. You don't have to tell it to copy all the jars in your lib into some other dir, or copy some random config files you shoved in the 'etc' directory into the other directories. But then Maven adds something much, much more useful: dependency management.

Too often I've checked out some Java project, added all the jars in the lib path to the classpath, then tried to do a build…and stacktrace.

Crap, okay, so this lib I added has a dependency on another lib. Time to go out on Google and search around. Then you download that, add it to the classpath, then blam.

Dammit, there's some other dependency. And through trial and error (and a bunch of time), you eventually figure out what you were missing to start with. Often, these are libraries that are only needed for compilation, or they're assumed to be provided by your web container, so they haven't been included in lib. But you need them to do anything. And then you have the problem that it's quite likely that none of your jars are versioned.

Okay, I have hibernate.jar from two years ago that has no version information. What the hell version is it? Can I upgrade it or will that break everything? These are all lovely questions to wrestle with.

Instead, Maven manages this stuff for you. The POM (Project Object Model) file that you write, instead of a build.xml as in Ant, is not a procedural set of steps. Maven knows about doing builds via plugins (which I won't go into here), so really all you end up needing to declare are which plugins you use (i.e., what types of activities you're performing, rather than the explicit steps you'd spell out in Ant) and which dependencies your project relies on. You can even note that a certain dependency is only necessary for the build step or the test step. If one of the libraries you're using requires some other jars of which you were unaware, Maven will figure this out and download them without you ever having to muck around.

And you can specify that *only* version 1.1.2 of a library should be used. You explicitly tell it what version to use, so later on down the line, you can easily make the decision of whether it's safe to move to a new version. If you use the Maven2 Eclipse plugin, you can even reuse the POM to set your classpath, so that whole laborious process of adding jar after jar to your classpath…you don't need to do that. Just check out the code and bam, ready to run.
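To make that concrete, here's a minimal sketch of the dependencies section of a POM (the specific artifacts and versions are just illustrative). The scope element is how you mark a dependency as test-only or as provided by your container:

```xml
<dependencies>
  <!-- Pinned to an exact version, on the classpath at compile and runtime -->
  <dependency>
    <groupId>org.hibernate</groupId>
    <artifactId>hibernate-core</artifactId>
    <version>3.3.2.GA</version>
  </dependency>
  <!-- Only needed to compile and run tests, never shipped -->
  <dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>4.7</version>
    <scope>test</scope>
  </dependency>
  <!-- Needed to compile, but supplied by the web container at runtime -->
  <dependency>
    <groupId>javax.servlet</groupId>
    <artifactId>servlet-api</artifactId>
    <version>2.5</version>
    <scope>provided</scope>
  </dependency>
</dependencies>
```

That one block replaces the "hunt down jars, guess versions, fiddle with the classpath" ritual: Maven resolves these (and their transitive dependencies) from a repository, and the versions are right there in the file for anyone who checks out the project.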

Of course, a lot of this happens by magic. But magic, to paraphrase Arthur C. Clarke, just means that something is going on that is more advanced than you understand. And recently I've seen some complaints that when you have to delve into the magic, things can get messy.

But while using it on ZPlanner, I've not encountered any real problems. A few little hiccups here and there, but honestly, I can't imagine going back to futzing around with 200-line build.xmls for Ant. Or not having the dependency information readily available so I can make intelligent decisions about what my project should or shouldn't be relying on.

If you haven't played with Maven before, I highly recommend taking a look at the excellent online Maven book, available freely from Sonatype. It clearly explains the basic concepts of Maven and let me get my project up and running in a matter of hours.

Thursday, December 3, 2009

Timetracking, Funny Math, and the Accountants

I joined my current company about 2.5 years ago. And since then the one constant has been my frustration with how inefficient the company is at times. This includes an overreliance on obscenely expensive contractors (the first team of five developers I managed cost somewhere around one million a year and contained no full-time employees), endless meetings in which nothing is resolved, and various hopelessly complicated processes that amount to nothing more than busywork.

Of course, when the company was bought out a year ago and a new executive team was brought onboard, they immediately began trying to reduce costs. They jettisoned the 70% contractor workforce (something a few of us had been complaining about for years), moved most of the real development offshore, and made cuts in various departments across the board.

But they're not done. And now, apparently, some of them have turned their gaze on our time tracking system.

Now, I've always seen corporate time tracking as a curious thing. Time tracking is almost always something mandated by the accountants and managers, so they can get a handle on cost. Like many things, it's an abstraction, and in every implementation I've seen it's an abstraction meant to give the financial people the information they want.

For the people doing the real work, meaning development and QA, the abstraction is generally far less meaningful. They don't see it as something valuable, because financials and project codes aren't how they think about their lives. They think about the project or task they're doing. Sometimes these overlap, but just as often they don't.

Of course, I firmly believe in tracking estimates, how close actuals are to estimates, and so on. That's a big part of what I'm trying to tackle with ZPlanner. The difference, though, is that in tools like ZPlanner, the numbers come from the actual tasks and work that people have to do. Entering time in such a system gives them value, because they can look at a burndown and see how much work remains for their team, and they can look at how many hours and tasks are assigned to them and see what's left.

Time tracking systems don't give any of this valuable feedback to a developer. So, people enter their time because they have to. As such, its validity is very questionable. I have only to open up my manager's interface in our time tracking system, and see timecard after timecard filled in with an exact 8 hours every day, to know this is not "real" data. It's constructed for my benefit and the benefit of the accountants. My reports fill it out because if they don't, I send them an email chiding them for not having entered their time--not because they really *care* about it.

When people are doing something just because they have to, more often than not they do a half-assed job. That's just human nature. If you have a problem with that, take it up with evolution.

And the more complicated you make the rules and the more difficult you make it, the more carelessly the task will be performed. Which brings us back to the attempts of our executives to make our company more efficient.

For 2010, we now have a series of new rules for entering time. The two most impactful changes being:
  • Every time card must have 8 hours entered per day
  • Buckets for management will be eliminated, which means all time must be charged against a specific project

Currently, when I have to file a timecard, I really try my utmost to charge the time I spent to the appropriate project. But frequently, there will be a big chunk of time on any given day for which I can't really account. The time was usually spent in random conversation about our technology or other ongoing projects, answering email, and so on. It comes in such small, discrete portions, though, that it's impossible to recollect exactly how it was spent. Currently, I log this to the "management" bucket. Now I will have to charge it against a project. Most likely this means I'll spread it evenly across all the projects for which I have oversight, which is everything in my division. Trying to count it accurately would take 50% of my overall time and would be an exercise in futility.

Even the time I spend with my reports in one-on-ones needs to be charged to a project, I'm told. I no longer have a 'management' code against which to charge it, so it will likely go to whatever project that particular person is assigned to. The fact that I probably spent the time discussing stuff having nothing to do with the project is immaterial. As flawed and inaccurate as the data is now, I'm pretty convinced these rules will cause it to be even more skewed.

My initial reaction to this was that it was a terrible thing. I thought of it in an absolute sense: time logged should correspond to reality, and that's the only way the data can lead to sane decisions. I thought the intention was to log time as accurately as possible, but I now understand that's not really the case.

Last night, I brought up my concerns about the new time tracking rules to a VP. He explained that the intention was to either pass along the cost or use these tools to find inefficiency. I told him I thought it was dumb to mandate that everyone log a minimum of 8 hours per day. There are probably days they work less than that. And I guarantee you, no one is "productive" eight hours out of the day at this company. If they work 8 hours, they take a coffee break here, discuss some random thing with their colleagues that's completely non-work related, and so on.

He countered that the new system would serve two goals. To make sure the cost is passed along to the customer and to make sure people are being efficient. He said that right now the buckets of things like "Management" allow people to hide things. If they try to just amortize this time across their projects, the people who are in charge of those projects will see through it.

But that presumes people know exactly (or even remotely) how long "managerial" stuff takes. I don't think the problem in our company is (primarily) inefficiency on the part of the developers or QA or anyone doing real work. It's the amount of managerial overhead. I can't imagine what requires my team to have 6 project managers for the number of projects we have. Often a project manager has only 1-2 assigned projects. How can there be *that* much to manage? I just don't buy it.

The VP's contention is that this will come out in the wash. I suspect something different, however. People will learn other ways to game the system, and because the data will be even less accurate (if that's possible) than it is now, the decisions the executives make based upon it will be ever more divorced from reality.

But maybe it all works out. I don't know. This VP did bring up some good points, insofar as "reality" doesn't matter if the cost is passed along appropriately to the customer and the margins are right. I guess my incredulity is because I don't have an MBA. Reality doesn't matter.

But I like to think it does. And the *real* way to answer these questions, to make sure that costs are calculated appropriately, is to start with how the estimates are generated. To track them in a real tool that captures how much real work is done and how much "management" is logged. To ask hard questions, based on empirical observation, about why so much *management* time is needed, and to cut the cord in some cases and see what happens.

Managers will always justify their existence. They create work for each other. It's not out of ill will; it's just their nature. There's a great quote from Northcote Parkinson, after whom Parkinson's law is named, which goes:

"But the day came when the air vice marshal went on leave. Shortly afterwards, as it happened, the colonel fell sick. The wing commander was attending a course, and I found I was the group. And I also found that, while the work had lessened as each of my superiors had disappeared, by the time it came to me, there was nothing to do at all. There never had been anything to do. We'd been making work for each other."

So to me using a tool that most people don't care about, with an abstraction meaningful only to a few who aren't actually doing any of the "real" work, and which the smart people will find simple ways to game is not the way to fix these problems.

But what the hell do I know, I don't have an MBA.

Tuesday, November 17, 2009

ReviewBoard Woes Or: Build vs. Buy

A couple weeks ago I wrote a bit about ReviewBoard, an open-source web application for doing code reviews. I now have it up and running and have introduced it to my team. Hopefully, over the next weeks and months we'll find it a useful tool for reviewing code.

But it certainly took more than a little work to get it up and running on Windows. I had to install Apache, MySQL, Python, Django, and any number of Python packages. Add to that the time I spent fiddling with Apache. Of course, I got it working on my own desktop fairly easily, but for some reason when I put it on a dedicated box and started trying to run Apache as a service, the program was having none of it.

My first issue was that even though I'd configured the thing nearly identically to my successful installation on my own desktop, it wasn't quite working. Even though it said I'd successfully added my CVS repository (which is required to submit reviews), the dropdown displaying the repositories against which it would check submitted reviews would not populate.

Of course, I did the requisite Google-ing but apparently no one else had had the problem.

Soon enough, I was looking at Apache logs, had started opening up the Python code (and mind you I don't know Python), and eventually descended into putting in print outs in the source to the Apache log.

Ultimately, I traced the problem to an included library. For each possible SCM (CVS, SVN, etc.), there is a file in ReviewBoard to manage the interactions. Before populating the dropdown, the appropriate library makes sure it can run the commands it needs to interact with the repository. To do this it uses an included Djblets library, which has a function called is_exe_on_path. This function is passed a string literal representing the command to be invoked, in my case 'cvs'. The function then appends '.exe' if the app is running on Windows and checks that the file is on the path. In my case, this resulted in a check for a file named 'cvs.exe'.
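My paraphrase of what the check does (a sketch from memory, not the actual Djblets source, so the real function surely differs in details):

```python
import os
import sys


def is_exe_on_path(name):
    """Sketch of a Djblets-style executable check: append '.exe'
    on Windows, then look for the file in each PATH directory."""
    if sys.platform.startswith('win'):
        name += '.exe'
    for directory in os.environ.get('PATH', '').split(os.pathsep):
        if os.path.exists(os.path.join(directory, name)):
            return True
    return False
```

You can see why this bit me: the check looks for a literal file named 'cvs.exe' on disk, so anything that makes 'cvs' invocable without such a file existing (a registry shim, a shell alias, whatever) passes on the command line but fails here.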

The problem is the newest version of Tortoise NO LONGER HAS A FILE CALLED cvs.exe. Oh, it used to...and that's why I didn't have the problem on my desktop where I'd used an older version. But on my new computer I'd installed the latest version of TortoiseCVS which has replaced cvs.exe with a file called TortoiseAct.exe. How it's invoked on the command line when one types 'cvs' I don't know (I think via a registry entry and DLL), but ReviewBoard was having none of it.

So, I made a copy of the stupid TortoiseAct.exe, renamed it 'Cvs.exe', and hey presto, the dropdown was working. I left off for a while and started playing with some other applications for my new tools box, assuming everything had been resolved. In fact, I subsequently ended up installing an older version of Tortoise, as the newer version didn't work with Hudson, the Java-based continuous integration server I also put on the box. So ultimately all of my effort would have been unnecessary had I installed Hudson first.

After I’d finished setting up Hudson and prior to rolling out ReviewBoard to the team, I decided I should do a quick run through. So, I made a file diff (which is how one submits a review) and clicked 'Upload' and waited.

And waited. The thing just sat there hanging frozen on the submission page.

I quickly descended back into opening Python source, adding print-outs in various ReviewBoard libraries, until I found the line it was hanging on. It was where ReviewBoard tried to invoke a 'cvs up' command.

Okay, it's CVS again. Great.

But here there was no apparent problem. I spent a few hours looking at environment variables and making sure all my batch files were readable by the Default Windows user (which is who Windows services run as by default). But nothing worked. I could run the same cvs command the program was running on the command line myself just fine, though. It made no sense.

Finally, I set Apache to run not as the Default User in Windows but as *me*. Hey, presto, it worked.

I tried copying over my ssh.bat and ssh keys into the Default User home directory, tried all sorts of random stuff, but nothing worked. Ultimately, I created a new local user on Windows, gave him administrator rights, and logged into the box to start Apache while actually logged in as the user.

It still hung.

So, I tried running the cvs command myself while logged in as my new user. And a prompt popped up saying the host needed to be added to the 'known hosts' file. I clicked 'Yes' and suddenly the thing was working. Goddammit! It was because the known hosts file was local to my user account and the Default User’s hadn’t had the IP of the CVS server added yet.
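For what it's worth, if you're running a service that shells out to cvs-over-ssh, you can sidestep the interactive prompt entirely by pre-populating known_hosts for the account the service runs as. This assumes a standard OpenSSH client (the hostname is a placeholder); plink and some other Windows ssh wrappers store host keys elsewhere, such as the registry:

```shell
# Record the CVS server's host key ahead of time, so the first
# connection from the service account never prompts
ssh-keyscan cvs.example.com >> ~/.ssh/known_hosts
```

Had I done that for the Default User up front, the 'cvs up' call would never have hung waiting on a prompt nobody could see.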

Of course, you may be asking what the point of all this blathering is. Well, I got into a conversation with my co-worker, Jim, during the middle of all of this about the age old 'build vs. buy' question.

There was a very definite cost to all this mucking around. I probably spent the better part of two days trying to get the goddamn thing to work. I don't blame the ReviewBoard people for this. They used an existing library, and the people who wrote that library weren't expecting the application used for CVS access to not have a file called 'cvs.exe' (I still think it's dumb code). Why did TortoiseCVS get rid of the cvs.exe file? Seems stupid to me. But again, it's all random decisions by unrelated parties that make installing a piece of software a potential pain.

The argument could be made that I could have bought SmartBear's code review tool and been spared all this pain. I don't know that that's true. It probably is. Of course, I see value in the time I spent. Coming out on the other end of all this, I'd learned quite a bit about setting up Apache2, I'd learned a tiny bit of Python and Django, and I knew exactly how the software was set up. But again, maybe it would have been easier to just buy something that *worked*.

Jim mentioned that he'd had a real reversal of the opinion he held years ago: that often it's better to just shell out the money for something that works, so you don't waste time or end up using mediocre software...just because it's free. And he also mentioned that, as someone who writes software for a living, he believes in the value of software. That good software is worth paying for. I think that's a noble sentiment. Jim is one of the most principled people I know, meaning he really thinks about his beliefs and he acts in accordance with them at all times.

I'm much more of a hypocrite, full of contradictions, and while I write software that I hope to charge people for one day, I mutter 'No goddamn way I'm paying for this!' But there is a cost to being cheap.

I think I do agree with Jim in that if at the end, you're using shitty software just because it's free, you've made a mistake. By the same token, I think it's a huge mistake to assume that because you've been charged for something that it has value.

Is JBoss that much better than Jetty? MyEclipse vs Eclipse? SmartBear CodeReview better than ReviewBoard? Maybe.

But I think you have to put some time in and choose where you want to spend money. Look at the cost/benefit carefully. And do realize that in cases such as my ReviewBoard installation, sure I did spend a bunch of my time. But I also learned a hell of a lot. And I see definite value in that. That said, if I tried CodeReview and it was miles and miles better than anything open source, I'd like to think I'd pony up the cash.

In my case, I'm working for a company and having to get a VP's approval on any purchase, just means such things will languish in purgatory forever, whereas I already have ReviewBoard up and running and am ready to roll it out to my team.

Of course, my company has no problem spending $20,000 on an installation of TFS (Team Foundation Server), which has been presented as the second coming. Yet I wonder if people have really examined all the inefficiencies in our processes, rather than assuming Microsoft will solve all our problems. It's all a very interesting subject, but that's all I have time for...for now.

Thursday, November 12, 2009

What is the role of a manager?

I take a pretty dim view of 'management'. Hell, I wrote a whole blog about the existential angst I feel being a manager myself. If you're a developer, your contributions are concrete. You're writing the code, and in the end the code is all that matters. In a very real sense, everything else--management, QA, client management, etc.--is ancillary to what you do.

Once you step into a managerial role, however, things take a decidedly different turn. It's quite likely that nothing you do has such a concrete value. When I get to the end of each day, I'm often left wondering 'What the hell did I even do today?' And if I'm not sure what I did, how can I even assess whether I added value, whether I contributed something to the team, or whether, in fact, I'm just along for the ride, contributing nothing of real worth.

That's not true in all situations, of course. If you're a manager at a company where you rose through the ranks--and I suspect, like me, this is how most people make it into management--you probably have a very good understanding of the core code. It's likely you were the one who originally wrote the stuff that others are maintaining. I think in those cases it's easy to fall into the trap of ignoring what a manager *should* do so you can continue to contribute in the same way you always have: jumping in to write code, troubleshooting issues no one else can figure out. But at that point you're not really a manager, you're a senior developer. Which is fine...as long as you don't have manager in your title.

While labels are often just that, with no real concrete meaning, I think there's an important distinction to draw between a "manager" and a "dev".

And if you've taken a managerial position at a company where you're *not* already familiar with the technology, where you haven't spent years traversing the chaos so you can navigate it with ease, you have to ask yourself where you can make the biggest value contribution. You can spend your time learning everything, going deep into the details of a small number of items, or you can trust your coworkers, delegate where necessary, and focus on the bigger-picture stuff that no one else has much time for.

In a company where they've "figured everything out", where the team is cohesive and happy, where client expectations are managed well, and where there are seldom fire drills, there isn't a need for a manager. To paraphrase Groucho Marx, any company I'd really want to work for, doesn't need someone like me. The developers do all the things a manager does. They manage themselves, they introduce change themselves, and collectively they make sure all the important things happen.

But at most places this isn't the case. And while I often have a lot of self-doubt about *not* being involved in everything at the lowest level, I've decided the most important contributions I can make are to make sure things move forward, that the organization improves, and that the team is constantly looking for better ways to do things. And that the developers have an advocate, someone who truly understands their concerns and can speak on their behalf. There were many times when I did not have that and would have been thrilled had my manager done these things.

And when there is a moment free here or there, I do try to run 'cvs co' and get into things. Because despite all of what I've said, in the end, the code is all that matters.

Tuesday, November 10, 2009

ZPlanner Technology Brief: Mercurial

If you've been following this blog for any period of time at all, you're probably aware of my little personal project called ZPlanner, an Agile project tracking tool. I've listed some of the technologies I chose for the project in past entries, and figured it might be interesting to devote a blog entry to one of them now and again.

Today, I'll give a very brief view of the revision control system I chose, Mercurial. This is by no means a comprehensive overview; there are plenty already floating around on the web, and Mercurial's official documentation is great. I see no need to tread the same ground, so I'm just going to give a very high-level overview, why I think it's worth your time to look at Mercurial in more depth, and my own subjective opinions.

In the last few years there has been a dramatic uptick in interest around distributed version control systems. There are a number of competitors in the space, many of which have very similar feature sets. Some of the earlier entries in the field, such as Darcs, seem to be waning in popularity, whereas a few have clearly emerged as leaders. The two biggest names in distributed version control are now Git and Mercurial.

I'll be upfront: I've not used Git myself, so I'll steer clear of too much in the way of comparison. What I do mention comes from Mercurial proponents, the validity and objectivity of which I leave to you to assess. The main reason I mention Git at all is that when I was deciding what system to use, it basically came down to a choice between the two.

That said, one of the big reasons I've made the progress I have on ZPlanner is that I try not to obsess too much about technological decisions. One should always do a little research, but at some point it's much better to just pick something and move on than to obsess endlessly about such things. I'm sure if I'd picked Git instead, I'd be just as far along, and it likely would have made little difference in the grand scheme of things.

Pretty much all of my career has been spent using CVS--unless I count an early and unfortunate encounter with Visual SourceSafe at my first job--and most of that time was spent on the command line in Linux. I've certainly heard my share of gripes around it, but to be honest, I never hated it. Part of that, of course, is that most of the time it was being used on relatively small, self-contained projects. There was little need to branch (and consequently merge), there were usually only a few people doing stuff on head at any one time, and it just pretty much worked. When something went horribly awry, I could usually Google the commands to fix it. In my current environment (where I'm only managing) there are tons of branches and merging, and CVS is not really well suited, since commits are performed only on individual files. In contrast, other systems (and they are not necessarily distributed) have a notion of changesets, which show you *everything* that was done as part of a change, in aggregate. This can be a powerful tool when you have lots of concurrent development.

But if you have a small team, I think CVS is just fine. And when I first started setting up to do work on ZPlanner, my thought process was pretty much: I'd like to not have to rename files and move them around when I want to try something new. So, my first impulse was to install CVS. Then, after a moment or two, I thought, everyone says Subversion is better, so maybe I'll do that.

I started installing a Subversion server on my laptop, but about 2 minutes in, I suddenly had an epiphany: "What the hell am I doing?" I only need to be able to roll back to older versions. If the server is on my laptop, there's nothing special making the code any more "backed up", so why go to all this effort? For me, a DVCS would work just as well.

So, I did a bit of reading, and most of what I found talked about either Git or Mercurial. And enough of it was in favor of Mercurial that I said 'Hell, Mercurial it is'. The biggest thing in Mercurial's favor was the nearly universal opinion that it had better documentation and an easier learning curve. That was important to me. In contrast to some of my other technology choices, I wasn't using Mercurial to learn Mercurial--I was using it because I wanted to manage change safely within my project.

The biggest difference between Mercurial and something like CVS is the notion that everyone has their own repository, and all of them are equally valid. Like many people, I suspect, I found this gave me an oddly unsettled sense that there was no centralized control. But if you think about it, even in CVS there's usually nothing to stop people from committing bad code. And both Git and Mercurial have the notion of a hub repository to allay such concerns. The hub is not technically special; it's simply agreed upon by convention, and it's up to the developers to make sure the code there is what it should be. If that still sounds disconcerting, it shouldn't. It's really no different.

In practice, the difference is that both Bob and Alice can clone (that's the operation performed, rather than a checkout) all the code from the hub repository and then make any changes they want locally. They don't worry about collisions, or about having to update code as they work. Their repository is whole and complete: it contains the entire history of the cloned repository, plus anything they've done subsequently.

Once they want to integrate their work back into the hub, they simply pull the latest changes from it, merge their own work in, and push the result back.

There are a number of advantages to this model, of course. One of the biggest is that for geographically dispersed teams (think offshoring), commit operations are much faster. There is no network latency, and one doesn't even need network access to commit, merge, and so on.

Additionally, every contributor has a copy of the repository (though in differing states) on his or her PC. If one is lost, it's probably not the end of the world. Contrast this with a usual CVS setup, where if cvsroot gets hosed, you're screwed. One could argue that in such a scenario people will have copies of the checked-out code, which is just as good. The difference, however, is that they *only* have the code. They don't have the complete history of check-ins, old versions of the files, and so on. They have only the last revision. In Mercurial, you have *everything*.

A big plus for Mercurial is its ease of use. In particular, if you've used CVS, many of the commands are nearly identical. Additionally, the documentation around Mercurial is excellent, and although some of the tools lag a bit, they are quickly catching up. There is already a good plugin for Eclipse, which is what I've been using. For local development, it behaves almost identically to using CVS via Eclipse. There is also a standalone client from Tortoise, TortoiseHg (though from what I know it is a bit buggy on Windows), among others.

Additionally, Mercurial comes with an extension named 'convert' which allows one to import revision history from Subversion, CVS, Git, or Darcs.

When compared to its primary competitor in the distributed RCS arena, Git, it is often noted that Mercurial is easier to learn due to its focus on simplicity. Git has well over a hundred commands (most of them rarely needed); Mercurial has roughly half that. Additionally, due to how its data is stored, Git repositories need periodic maintenance (called "repacks") to maintain performance and prevent rapid increases in disk usage. Mercurial, in contrast, requires no such active maintenance.

Mercurial is currently in use by projects with hundreds of contributors, including such high-profile projects as Mozilla, NetBeans, Python, Symbian, and XEmacs.

The core question, however, is: what is it like to use Mercurial on a day-to-day basis? For me, it's nearly invisible. Admittedly, I've been using it in a fairly limited context, but to give some sense of its power, here's a quick example:

I installed Mercurial locally to track changes on my laptop. Recently, I decided I'd like to share my code with a few people. Using BitBucket.org (what GitHub is to Git, BitBucket is to Mercurial), I imported my repository--AND ALL OF ITS HISTORY--in about 5 minutes. Now, I can give a URL to anyone to check out the code, and they can see every commit I ever made (with associated comments) since I started. I thought that was pretty damn cool.

If you're curious to see the ZPlanner source or what a Mercurial repository looks like, sign up for an account on BitBucket.org and shoot me your username and I'll add you (this is assuming I know you as ZPlanner isn't open source).

Tuesday, November 3, 2009

Using software to do code reviews (ReviewBoard)

It's fairly universally accepted that all code should undergo some type of review. The problem is, while it's easy in theory, in practice it tends to be kinda tough. I've had various experiences participating in code reviews, ranging from relatively decent to horribly painful, and I still don't know that I have a perfect answer as to the "right way to do things"(tm).

If I grab my "Software Engineering" book from my college days, I can read about doing printouts of the code, having an official reader while others follow along to make comments, and so on. No one I know has ever done anything quite so formal.

More often, it's grabbing a bunch of developers and having the guy who wrote the code put it up on the projector, while people make various comments about his variable naming, or why he wrote

if(!boolOne && !boolTwo)

rather than

if(!(boolOne || boolTwo))

or things equally silly.

In one particularly painful code review, I remember an "architect" giving a discourse on why magic numbers were bad. Fine, fair point. But after the first fifteen minutes, the point was made...yet we kept going for another 20 minutes. Yay!

The other problem, of course, is that while one should leave one's ego at the door, that's kinda tough. While I think there's a lot of value in having devs *know* someone else will look at their code (the "Crap, if I do this hacky thing, someone will see it" effect), code reviews can be incredibly demoralizing if they just turn into a 'beat up on the developer' exercise.

Even with these caveats, though, I think code reviews are absolutely essential. But as challenging as code reviews are with a bunch of motivated, full-time employees all located onsite, they become even more challenging when your resources are located remotely.

What do you do then? Have everyone call in from India at 11:20pm their time, using a WebEx and a conference bridge that intermittently cuts in and out, while they walk through the code?

I've tried this and it's never worked out very well. It still had value, but man, it was *not* fun.

Given that 80% of the devs writing software for my team are remotely located, I've struggled a lot with this question over the past few months. In fact, I've let things kind of just go on as they have, which entails the onsite dev leads waiting for CVS check-in emails or notes from the offshore guys, then passing back comments via email.

The problem with this, of course, is that it's ad hoc. Even putting some type of process around it (a checklist? an item on the project's wiki page to be struck through?), it still doesn't say anything qualitatively about the review. And there's no real audit trail. It's all via email or the phone.

So, recently I started to look around for software that might help facilitate this process. I've known about tools like SmartBear's CodeCollaborator for some time, but it's always tough to get approval for software that costs money. Mind you, I'm not saying it's impossible, but if you want to put something in place quickly at my company, it's not the way to go. Add to that the fact that the company is looking at a company-wide installation of Microsoft TFS (Team Foundation Server), which includes some code review facilities, and I had little hope of getting approval.

So, instead, I started looking at open source projects. There are a few options, but I ended up installing ReviewBoard.

ReviewBoard was created internally by the guys at VMware, so it's definitely more aimed at the Linux crowd. But with a little effort, I was able to (mostly successfully) get it up and running on a Windows box. It's not without its warts, but I think it might be quite useful in my current situation.

I could post all sorts of screenshots and so on, but instead I'll just post what I did to get it running (in case you want to try it out). Hopefully this will be useful to someone out there. (Keep in mind I'm not a systems admin, so please forgive any clumsiness in what follows.)


  1. Download Xampp
    1. Xampp is an easy-to-install Apache distribution with MySQL, PHP, and Perl already included
    2. To install, simply unzip the file to your directory of choice (I use C:\opt) and you're good to go
  2. Download Python v2.5.4
    1. Subsequent releases are not currently supported in mod_python unless you want to build your own binary (And I didn't)
    2. Add a new system variable named PYTHON_HOME and set it to your install dir location, which should be something like: C:\python25
      1. Add the following to your PATH:
        1. %PYTHON_HOME%;%PYTHON_HOME%\Scripts
  3. Download mod_python-3.3.1
    1. Make sure the version you download corresponds to your version of Python (2.5.4 if you followed the instructions above), to your processor type (for most people that's the 32-bit binary), and to the version of Apache you'll be using.
      1. The version I used (32-bit, Python 2.5, Apache2.2) is here
    2. Edit the Apache conf (which should be located somewhere like C:/xampp/apache/conf/httpd.conf), find where the other LoadModule lines are, and add this line

    LoadModule python_module modules/mod_python.so


    1. Create a simple test directory within the XAMPP htdocs directory (which is where any deployed apps go by default) to test your install of mod_python. If you have any difficulty with the steps as detailed below, see the instructions on the mod_python site here:
      1. In my case I created a directory C:\xampp\htdocs\test
      2. Update your XAMPP install so it can recognize Python code by updating httpd.conf. You'll want something like this (note this is specific to the test directory I created in the previous step)

        <Directory "C:\xampp\htdocs\test">
        AddHandler mod_python .py
        PythonHandler mptest
        PythonDebug On
        </Directory>

      3. Add a file named mptest.py containing the following to the newly created 'test' dir (keep in mind this is Python, so the indentation matters):
        from mod_python import apache

        def handler(req):
            req.content_type = 'text/plain'
            req.write("Hello World!")
            return apache.OK
  4. Download the tarball of Django v1.1.1
    1. Extract the tarball to your directory of choice
      1. In my case this was C:\opt

    2. Go to the directory to which you extracted Django and run the following command:
      1. python setup.py install

  5. Download GNU patch.exe
    1. Run install (by clicking on .exe)
    2. Add the bin directory to your PATH system variable (i.e. C:\Program Files\GnuWin32\bin)

  6. Download the Python imaging library (Make sure you get the version appropriate to your version of Python. In my case, this was Python 2.5)
  7. Download PyCrypto
  8. Install SetupTools for Python.

    1. This includes EasyInstall which will make installing additionally needed Python components for ReviewBoard easy

  9. Use EasyInstall (installed via step 8) to install the memcached bindings from the command prompt:
    1. easy_install python-memcached

  10. Install ReviewBoard using EasyInstall
    1. easy_install ReviewBoard

  11. Install MySQL connector using EasyInstall
    1. easy_install mysql-python
    2. While ReviewBoard comes with embedded SQLite support, you probably want to use MySQL.
    3. This step actually failed for me as it requires Visual Studio 2003 (I have VS 2005, which doesn't work for this), so I'm using SQLite for the moment

  12. Create your 'site' for your ReviewBoard installation, naming it as you'd like the site to be accessed. I used reviews.com and updated my hosts file so this would work (i.e. I access ReviewBoard on *my* computer via reviews.com)
    1. rb-site install C:\xampp\htdocs\reviews.com
    2. Source the ReviewBoard specific conf file in your global httpd.conf file
      1. The ReviewBoard conf file should be here: C:\xampp\htdocs\reviews.com\conf\
      2. The Apache conf will be here: C:\xampp\apache\conf (if you used the default install location)
      3. Here's the line you'll want to add:
        1. Include <install dir for ReviewBoard site>/apache-modpython.conf
  13. Create a .reviewboardrc file in C:\Documents and Settings\<username>\Local Settings\Application Data with the same name as your site (in my case "reviews.com")

    REVIEWBOARD_URL = http://reviews.com/

  14. Use TortoiseCVS to configure access to CVS via the command line (I'm going to leave this out unless someone complains about it, at which point I can provide more details)
    1. Download TortoiseCVS
    2. Generate SSH keys using PuttyGen
    3. Install your SSH public key on your CVS server
    4. Create the following file and name it "ssh.bat"
    5. @echo off

      call "C:\Program Files\TortoiseCVS\TortoisePlink.exe" -2 -i "C:\Documents and Settings\yourusername\sshidentity.key" %*

  15. Create a new environment variable named CVS_RSH with a value of the full path to the "ssh.bat" file
  16. Add the CVS_RSH environment variable as well as the path to the TortoiseCVS install to your PATH variable

Phew! Now you're ready to start using ReviewBoard, which maybe I'll cover later.

Thursday, October 29, 2009

Steve Yegge's Rant on Languages

I stumbled on the following blogpost originally made in 2004 (updated in 2006) by Steve Yegge:
http://steve.yegge.googlepages.com/tour-de-babel

I vaguely recall the name, and from the page that linked to his post, he apparently worked at Amazon for a time before moving on to Google. That said, I highly suggest taking a look at the original blog post, as it's a pretty fascinating (and opinionated) read. I'm amazed by how much of it I agree with, though I didn't like his C++ bashing. But then, his comments about only liking C++ when he was in college are true of me as well. I suspect if I actually had to do something real in C++ (after having used Java) I'd want to blow my brains out.

In any event, a few highlights from my perspective:

C
You just have to know C. Why? Because for all practical purposes, every computer in the world you'll ever use is a von Neumann machine, and C is a lightweight, expressive syntax for the von Neumann machine's capabilities.
...
You also have to know C because it's the language that Unix is written in, and happens also to be the language that Windows and virtually all other operating systems are written in, because they're OSes for von Neumann machines, so what else would you use? Anything significantly different from C is going to be too far removed from the actual capabilities of the hardware to perform well enough, at least for an OS — at least in the last century, which is when they were all written.


C++

C++ is the dumbest language on earth, in the very real sense of being the least sentient.
...
Stuff takes forever to do around here. An Amazon engineer once described our code base as "a huge mountain of poop, the biggest mountain you've ever seen, and your job is to crawl into the very center of it, every time you need to fix something."
...
It's all C++'s fault. Don't argue. It is. We're using the dumbest language in the world. That's kind of meta-dumb, don't you think? The original brilliant guys and gals here only allowed two languages in Amazon's hallowed source repository: C and Lisp

Lisp/Emacs
All of the greatest engineers in the world use Emacs. The world-changer types. Not the great gal in the cube next to you. Not Fred, the amazing guy down the hall. I'm talking about the greatest software developers of our profession, the ones who changed the face of the industry. The James Goslings, the Donald Knuths, the Paul Grahams2, the Jamie Zawinskis, the Eric Bensons. Real engineers use Emacs. You have to be way smart to use it well, and it makes you incredibly powerful if you can master it. Go look over Paul Nordstrom's shoulder while he works sometime, if you don't believe me. It's a real eye-opener for someone who's used Visual Blub .NET-like IDEs their whole career.
...
Now C++, Java and Perl are all we write in. The elders have moved on to greener pastures too.
...
Religion isn't the opiate of the masses anymore, Karl. IDEs are.


Java
Java is simultaneously the best and the worst thing that has happened to computing in the past 10 years.
...
But Java's missing some nice features from C++, such as pass-by-reference(-to-stack-object), typedefs, macros, and operator overloading. Stuff that comes in handy now and again.
...
Oh, and multiple inheritance, which now I've come to appreciate in my old age. If you think my Opinionated Elf was a good counterpoint to polymorphism dogma, I've got several brilliant examples of why you need multiple inheritance, or at least Ruby-style mixins or automatic delegation. Ask me about the Glowing Sword or Cloak of Thieving sometime. Interfaces suck.

Gosling even said, a few years ago, that if he had to do it all over again, he wouldn't have used interfaces.

But that's just exactly what the problem with Java is. When James said that, people were shocked. I could feel the shock waves, could feel the marketing and legal folks at Sun maneuvering to hush him up, brush it off, say it wasn't so.

The problem with Java is that people are blinded by the marketing hype. That's the problem with C++, with Perl, with any language that's popular, and it's a serious one, because languages can't become popular without hype. So if the language designer suggests innocently that the language might not have been designed perfectly, it's time to shoot the language designer full of horse tranquilizers and shut down the conference.
...
Bad developers, who constitute the majority of all developers worldwide, can write bad code in any language you throw at them.
...
When in doubt, hire Java programmers who are polyglots, who detest large spongy frameworks like J2EE and EJB, and who use Emacs. All good rules of thumb.


Perl

There are "better" languages than Perl — hell, there are lots of them, if you define "better" as "not being insane". Lisp, Smalltalk, Python, gosh, I could probably name 20 or 30 languages that are "better" than Perl, inasmuch as they don't look like that Sperm Whale that exploded in the streets of Taiwan over the summer. Whale guts everywhere, covering cars, motorcycles, pedestrians. That's Perl. It's charming, really.
...
But Perl has many, many things going for it that, until recently, no other language had, and they compensated for its exo-intestinal qualities. You can make all sorts of useful things out of exploded whale, including perfume. It's quite useful. And so is Perl.
...
Now you can't read a book or tutorial or PowerPoint on Perl without spending at least a third of your time learning about "references", which are Larry's pathetic, broken, Goldbergian fix for his list-flattening insanity. But Perl's marketing is so incredibly good that it makes you feel as if references are the best thing that ever happened to you. You can take a reference to anything! It's fun! Smells good, too!
...
Like I said, though — until recently, nothing could get the job done like Perl could.


Ruby

Anyway, Ruby stole everything good from Perl; in fact, Matz, Ruby's author (Yukihiro Matsumoto, if I recall correctly, but he goes by "Matz"), feels he may have stolen a little too much from Perl, and got some whale guts on his shoes. But only a little.
...
And he somehow made it all work together so well that you don't even notice that it has all that stuff. I learned Ruby faster than any other language, out of maybe 30 or 40 total; it took me about 3 days before I was more comfortable using Ruby than I was in Perl, after eight years of Perl hacking. It's so consistent that you start being able to guess how things will work, and you're right most of the time. It's beautiful. And fun. And practical.
...
The leap from Perl to Ruby is as significant as the leap from C++ to Java, but without any of the downsides, because Ruby's essentially a proper superset of Perl's functionality, whereas Java took some things away that people missed, and didn't offer real replacements for them.


Python

Python would have taken over the world, but it has two fatal flaws: the whitespace thing, and the permafrost thing.
...
The whitespace thing is simply that Python uses indentation to determine block nesting. It forces you to indent everything a certain way, and they do this so that everyone's code will look the same. A surprising number of programmers hate this, because it feels to them like their freedom is being taken away; it feels as if Python is trampling their constitutional right to use shotgun formatting and obfuscated one-liners.
...
What's the frost thing, you ask? Well, I used to have a lot of exceptionally mean stuff written here, but since Python's actually quite pleasant to work with (if you can overlook its warts), I no longer think it's such a great idea to bash on Pythonistas. The "frost thing" is just that they used to have a tendency to be a bit, well, frosty. Why?

Because they were so tired of hearing about the whitespace thing!

Wednesday, October 21, 2009

StackOverflow DevDays Seattle


Back when I was at my former company, it was always a sore point with me that "continued education" seemed to be a foreign word. Even as I climbed the ranks of the company I felt like I was losing all the hard-won skills and knowledge that I'd accumulated over my years in misery at university. Sure, I knew our codebase, but I didn't feel like I knew very much else. With my skills becoming increasingly narrowed, and focused around *only* what my company did, I wondered if I'd ever be able to find a job *outside* the company.

So, when I became a dev manager at my current company, one of the things I hoped to do was be an advocate for continued education for the guys under me. Now, for the first couple of years all I had were contractors...and it's fairly hard to justify spending money on continued education for people who aren't even FTEs (full-time employees).

So having taken over a new group of devs, comprised entirely of FTEs, I was thrilled to learn Joel Spolsky and Jeff Atwood were organizing the Stack Overflow DevDays set of conferences. Given the extremely low cost for a ticket (A good conference is often $500-700, whereas DevDays was a mere $100 a head) I lobbied to get my entire dev team to go. And, to his credit, my Director agreed.

So, today, my whole development group and I got to spend the day listening to the speakers at the Stack Overflow DevDays conference in Seattle. On the whole I was extremely impressed and if they do it again next year, I highly recommend attending (or if there are still tickets in one of the remaining cities on the tour and you're nearby, snatch one up).

The keynote speech from Joel was about the dichotomy of power and simplicity when it comes to creating software. He made his way through a bunch of examples demonstrating the way we're constantly assailed by choices, many of them meaningless. These, he said, are simply an example of the designer being unable to make a decision. One example he gave was the Options dialog in IE, which has an option named 'Enable Caret Browsing'.

What does 'Enable Caret Browsing' mean, he asked? And even if you know, is it really an important choice? Shouldn't the designer have figured out the right default? Instead, it's one more in a series of bewildering decisions that slow you down and, rather than letting you do what you want with your software, force you to be its slave.

So, he asked, does this mean the answer is simplicity? Companies like 37signals have gone this direction with their Basecamp software (http://basecamphq.com/). Their "manifesto" is about keeping the feature set small: do one thing, but do it incredibly well. Keep the feature set to a minimum even if your customers are clamoring for new features. But the thing is, he went on, the features you deem "absolutely essential" and the "minimum" requirements may not be exactly what the customer needs.

So, he said somewhat disappointedly, the answer is not simplicity. What then, is the answer?

In his mind, it's to only offer the choices that matter. He used the quote:

"A designer knows he has achieved perfection not when there is nothing left to add, but when there is nothing left to take away."
—Antoine de Saint-Exupéry

You need to listen to the customer and give them the choices that are *important*, but not assail them with things they don't care about. 'Enable Caret Browsing' is one example; another is the constant reminders in Windows Vista asking whether you want to proceed, even after you've just clicked a very similar dialog box. Why would it ask you twice?

It's a reasonable answer, I suppose, but also somewhat of a non-answer in my view. I personally think the 37signals people are closer to the truth, but that's a topic for a whole other blog.

The next speaker was Scott Hanselman, who hosts the podcast Hanselminutes, and he spoke about ASP.NET MVC and some of its new features. In some ways, I thought his presentation was the most entertaining; my face hurt from how much I was laughing and smiling. To give an example, while he mentioned that he worked for Microsoft and that it 'really isn't so evil anymore', a huge picture of the Death Star was projected behind him. He even showed a language some at Microsoft were working on implementing in .NET that I thought hilarious, called LOLCODE. See it here:
http://lolcode.com/examples/hai-world

The content of Hanselman's talk was actually a bit light, his focus seeming to be more on entertainment than education, but having sat through years of boring Comp Sci lectures in school, I wasn't about to complain. He gave a brief peek into how the new MVC features of .NET worked, and it appeared to me that Microsoft is going along the currently popular route of convention over configuration, which is something I was happy to see.

The next speaker, Rory Blyth, then was introduced and gave a talk about developing for the iPhone. My favorite part was when he listed the prerequisites and one of them was:
* A Mac
* "But are you sure?"
* "Yes"
* "But I have a friend who said..."
* "They heard wrong"
He talked about how the iPhone development platform had grown out of the IDE Steve Jobs had put together while at NeXT around 1986, and that "it was cutting edge about 20 years ago...and it's pretty much the same now". As he went through the examples, it was interesting to see a bit of what Objective-C looks like, along with some of the goofiness of Interface Builder. He then showed how doing the same thing in MonoDevelop (which allows one to use C# rather than Objective-C) simplified development a great deal. MonoDevelop, however, is still under development, and his example actually crashed while he was trying to run it. One thing Rory stressed again and again about iPhone development was to read Apple's style "guidelines". He made huge air quotes around "guidelines", saying that essentially if you don't follow them, your app will not be published. He continued, "If you ever find yourself wondering how the UI should look, go look at an existing app on your phone (since presumably it made it past Apple's style-dictators) and do that".

Next, Spolsky came back up and gave a rundown of FogBugz 7 and its new features. Being a "competitor" to ZPlanner (ha), I was quite interested to see some of them. Thinking back to his keynote, in which he somewhat dismissed the notion of "simplicity", I feared some of the flashier features around his "evidence-based scheduling" might just be confusing and unnecessary, but I'd really have to look at it more closely. Of course, some may have been put off by what amounted to a 30-minute sales pitch, but whatever else one can say about Joel, he is a shrewd businessman. It's very rare for a conference of this caliber to be given for this price, and I suspect a large part of the motivation is that Fog Creek makes tools for developers; this is the best way to reach them. So, I view it as a symbiotic relationship and didn't mind listening to his spiel. And, like I said, it was interesting to see how his company had approached some of the same problems I have been trying to solve (far more simplistically) with ZPlanner. I'll probably write an entire blog about Joel and some of my thoughts around his approach and FogBugz, but that will have to wait until next time.

Next up, Cody Lindley talked a bit about jQuery. I've played around with it myself while working on ZPlanner, but was amazed to learn that something like 35% of all websites now use jQuery. He mentioned that many people only think of using jQuery to manipulate things already in the DOM, but it can also add things to the DOM. He went through some basic examples, reinforcing what I already knew: that jQuery seems to be the preeminent choice for doing UI work these days. He also gave some useful websites to use when experimenting:
http://codylindley.com/jqueryselectors/
http://jsbin.com/
http://jqueryenlightenment.com/

After Cody, Daniel Rocha came up to speak about Qt. I'd not been previously familiar with it, but it's a framework that allows one to compile code (including UI work) to native binaries across platforms. It currently supports, or soon will support, compilation to the following OSes:
* Embedded Linux
* Mac OS X
* Windows
* Linux/X11
* Windows CE/Mobile
* Symbian
* Maemo

Nokia purchased the technology from a company called Trolltech, which had been selling it with a hefty license fee ($3,500 or so), and open-sourced the SDK and tools in the hope of getting more developers doing work for their phones. Daniel showed an example of creating a simple Windows app, then recompiled the same thing on Ubuntu and, hey presto, it worked.

And unlike Java, which has the overhead of running in a JVM, Qt compiles to native code, so it can do some pretty impressive things. In one example, he showed an OpenGL game (OpenGL support is built in) written with Qt and compiled to a Nokia device. Very impressive.

After Daniel finished, Ted Leung spoke a bit about Python. His talk was the driest of the bunch by far, and he'd unfortunately chosen to build his slides from Vim screenshots, not realizing his highlighting scheme (purple for keywords) was essentially unreadable to the audience. Even after they dimmed the lights, my eyes hurt from straining to see what he was showing, and because his presentation had a bit less pizzazz (it was essentially just showing the constructs of the language a la a university course), I kind of tuned out. The biggest failure, I think, was that he never showed me why I'd want to *learn* Python...other than academic curiosity.

Next, Dan Sanderson spoke about Google App Engine and how it allows one to write apps that scale as needed. It can run code on both the JVM and Python. It guarantees that whether you're serving the 1st user or the 100,000,000th concurrent user, the experience will be the same, but it imposes some restrictions on developers as a result, one being that it doesn't use a traditional RDBMS. There is a video on the App Engine code page, and I'd recommend watching it, as I found the whole topic fascinating.

Last up was Steve Seitz, a professor here at the University of Washington, and while his talk was the least "usable", it was also the most fascinating. He spoke about Photo Tourism and his group's work to reconstruct three-dimensional models from photos found on the internet. You have to watch the video for yourself, because it simply blew me away:
http://phototour.cs.washington.edu/
(Watch the "Long Video")

Apparently, this technology is in commercial use on
http://photosynth.net/
Seemed to be old news to some, but I'd never heard of it.

Currently, Steve is working on recreating Rome from pictures downloaded from Flickr. You can find his work here:
http://grail.cs.washington.edu/rome

That's it. It was a fairly exhausting day, but a ton of fun. Other than during the Python presentation, I was never bored, and though what I learned was fairly superficial, it was all fascinating. Now I have a bunch of new, shiny things to play around with in my (meager) spare time.

Thursday, October 15, 2009

ZPlanner: Making Things as Dumb as Possible

In my entry last week, I wrote about my experiences using TDD for the little personal project I've been working on, called ZPlanner. At that point I was just beginning to refactor. I think refactoring is one of the steps most often forgotten when people try some variant of TDD, or any programming really. The thought of starting to code without having spent hours and hours designing your solution down to the last method signature is a really scary prospect for some.

If you're anything like me, much of your formal education tried to ensure you produced all sorts of documentation and thought everything out thoroughly, and it tends to be difficult to break out of that frame of mind. The thing is, TDD *does* work. But TDD assumes that after you've written your test and gotten your code to pass it (by hook or by crook), you go back and refactor. The problem is that many people skip the refactoring step, and it's absolutely essential.

That's where you fix all the crappy stuff you did to pass your tests, and where you tighten up your design. Otherwise, you end up with just what TDD detractors say you will: crappily designed and implemented code.

In any event, TDD isn't really the subject of this blog. It's more about my experiences refactoring ZPlanner and something I noticed about my own coding (at least for ZPlanner) in the process.

If I were to sum up the result of this refactoring in a single sentence, it would be this:
"Make the code as stupid as possible".

For a long time, I've claimed I like to keep things simple. That complexity is the enemy of maintainability and that it's important to "do the simplest thing possible" per TDD. When I started looking at my ZPlanner code, though, there was all sorts of complexity. My primary class, EstimatedItem, was an abstract base class which represented both stories and tasks (neither had a more concrete representation). I'd subclassed it to create my Iteration object, used a recursive Hibernate relationship (which took some time to figure out), and because the object was recursive, I had a bunch of recursive functions and complex logic to figure out how to sum estimates and where a given EstimatedItem might sit in a node-like structure. Okay, great! But the thing is, all of that likely made my code pretty damn difficult for anyone coming to it for the first time.
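To make that concrete, here's a minimal sketch of roughly what that recursive design looked like (a reconstruction with hypothetical names, minus all the Hibernate plumbing):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical reconstruction of the original recursive design:
// one class represents stories, tasks, and anything else estimable,
// and each item can contain more items of the same type.
class EstimatedItem {
    private final String name;
    private final double estimate;
    private final List<EstimatedItem> subItems = new ArrayList<>();

    EstimatedItem(String name, double estimate) {
        this.name = name;
        this.estimate = estimate;
    }

    void addSubItem(EstimatedItem child) {
        subItems.add(child);
    }

    // Recursive rollup: a leaf contributes its own estimate;
    // a parent's estimate is the sum of its subtree (its own
    // number is ignored once it has children).
    double totalEstimate() {
        if (subItems.isEmpty()) {
            return estimate;
        }
        double sum = 0;
        for (EstimatedItem child : subItems) {
            sum += child.totalEstimate();
        }
        return sum;
    }
}
```

Nothing in the type system tells you whether a given EstimatedItem is a story, a task, or a task nested three levels deep; that's exactly the kind of context a newcomer has to reconstruct from comments.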

I'd also created an abstract base class in an attempt to represent all form actions. Of course, that wasn't good enough for me, so I tossed in Java Generics, with the notion that every item modifiable via the interface had a parent and child type, each passed in as a type argument. In some cases this abstraction really had to be hammered together to fit: for example, a Project has no parent entity, so I passed in Object as the Parent type of Project (which really makes absolutely no sense).
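A hedged sketch of what that over-generic design looked like (the class and method names here are my reconstruction for illustration, not the actual ZPlanner code):

```java
// Hypothetical reconstruction of the over-generic base action.
// Every form action was parameterized by a Parent and a Child type --
// even when no sensible parent existed.
abstract class BaseAction<P, C> {
    protected P parent;
    protected C item;

    void setParent(P parent) { this.parent = parent; }
    void setItem(C item) { this.item = item; }

    abstract String execute();
}

class Project { }
class Iteration { }

// The abstraction fits Iteration reasonably (its parent is a Project)...
class IterationAction extends BaseAction<Project, Iteration> {
    @Override
    String execute() { return "saved iteration"; }
}

// ...but has to be hammered into place for Project, whose "parent"
// is Object because there is nothing sensible to put there.
class ProjectAction extends BaseAction<Object, Project> {
    @Override
    String execute() { return "saved project"; }
}
```

The `Object` parent compiles just fine, which is part of the problem: the type system happily accepts an abstraction that means nothing.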

Of course, I was quite happy with myself for all of this at the time! Oh, look at me, I have a recursive Hibernate relationship! Oh, look at my cool use of Generics! Oh, look at all this recursive logic I know!

I think that's often the case with us programmers: we want to prove to others that we know our shit. All that complexity was almost like a badge of honor to show that I, too, was worth my mettle. Of course, I didn't justify it this way. No, it all makes sense, I told myself. It's the simplest thing possible, after all, because I'm using the least code. But using the least code is often not the same thing as doing the simplest thing possible. Just look at any gob of hacked-up Perl where the goal seems to be to put as much logic on one line as possible and obfuscate it to the greatest degree possible. "If it was hard to write, it should be hard to debug!", goes the old saying.

When I stepped back, though, I realized that all my cleverness really hadn't gotten me anything meaningful. Maybe I had a few fewer classes, but the ones I did have could only be understood via huge swaths of comments. Nothing was clear.

So, I started to try to make my code as un-clever as possible. Rather than using a recursive object relationship with an abstract base class, I copied and pasted my EstimatedItem into separate classes for each of Project, Iteration, Story, and Task. I eliminated the recursive nature. After all, did I really need infinitely nestable tasks? When the hell would anyone use that, clever as it may be? Sure, I had a bit more code, but it was obvious. Suddenly, I *had* a Story class. And guess what: it has a private member variable called "iteration", which, go figure, is the parent iteration to which the story belongs. That's not very clever, is it?
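Sketched out, the un-clever version looks something like this (again a hypothetical reconstruction, not the literal ZPlanner classes):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the de-abstracted design: each concept
// gets its own mundane, obvious class.
class Iteration {
    private final String name;
    Iteration(String name) { this.name = name; }
    String getName() { return name; }
}

class Task {
    private final double estimate;
    Task(double estimate) { this.estimate = estimate; }
    double getEstimate() { return estimate; }
}

class Story {
    private final String name;
    private final Iteration iteration;           // the parent iteration
    private final List<Task> tasks = new ArrayList<>();

    Story(String name, Iteration iteration) {
        this.name = name;
        this.iteration = iteration;
    }

    void addTask(Task task) { tasks.add(task); }

    // One level of summing, no recursion: a story's estimate
    // is simply the sum of its tasks' estimates.
    double totalEstimate() {
        double sum = 0;
        for (Task t : tasks) {
            sum += t.getEstimate();
        }
        return sum;
    }
}
```

More lines of code, yes, but anyone can read Story and know exactly what it is without a single comment.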

It also meant that rather than having a single table for all my objects (because in Hibernate, that's the preferred strategy when using inheritance) with join tables for the data of subclasses, I had an 'iterations' table with just the iterations, a 'projects' table with just the projects, and so on. I also got rid of my BaseAction class, which was poorly representing the abstract notion of an HTTP request. Sure, my action classes ended up having a couple of extra private members and a few extra getters and setters, but it also meant you could just look at a class and pretty much know what it was doing.

Thankfully, all of this was relatively easy, because I had a lot of test code. It made the refactoring straightforward and gave me some assurance that things actually worked when I made changes. I probably "rewrote" 50-75% of the code, and it only took 20-25 hours. I still have around 85-90% test coverage, and everything pretty much works.

All of this means that when I look at the code now, there's nothing really to pat myself on the back about. There's nothing particularly clever. There is a bit more code now, but not that much more (maybe 25%), and what's there is all quite mundane and obvious.

And that's the whole point.

Tuesday, October 13, 2009

Interviewing Project Managers

Recently, I've been asked by my Director to help interview a number of Program Manager candidates. In the last two days, I've interviewed four different people. There was another flurry of candidates a few months ago in which I also participated, but we didn't end up hiring any of them.

Now, I have a tremendous amount of experience hiring developers. I've been responsible for staffing large teams in short order on at least three different occasions. My rough guess is that if you combined the people I've phone screened or interviewed in person, the number is somewhere around the three hundred mark. I've probably greenlit upwards of forty or fifty developers. This is just in the past three years, mind you. Needless to say, I have formed some very definite thoughts about the interview process for developers in this time.

What's interesting, though, is that as concrete and refined as my thoughts are when it comes to what constitutes a good developer interview (I'll leave those thoughts to a subsequent blog entry), I don't really have any similar set of defined criteria for other positions. It ends up being much more 'intuition' based, which frankly bothers me.

When interviewing a developer, I have expectations that the candidate understands basic data structures, that when presented with one or more problems he can work through them logically and methodically, even if he can't quite arrive at the right answer. If, for example, he doesn't know what a linked list is or a hash table, and the situations in which one of the two data structures would be used over the other, he's going to have a really difficult time getting a thumbs up from me. If he can't solve some simple (and I mean SIMPLE) exercises in his preferred language, then he really is not getting through.
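For the curious, the distinction I'm probing for with that data structure question is roughly this (a toy Java illustration, not a question I ask verbatim):

```java
import java.util.HashMap;
import java.util.LinkedList;
import java.util.Map;

// The tradeoff I expect candidates to articulate:
// - HashMap: O(1) average lookup by key, no meaningful ordering.
// - LinkedList: O(1) insertion/removal at either end, O(n) search,
//   preserves insertion order -- a natural fit for queues.
class DataStructureChoice {
    static String lookupDemo() {
        Map<String, Integer> estimates = new HashMap<>();
        estimates.put("login-story", 5);
        // Direct lookup by key -- no scan of the whole collection.
        return "estimate=" + estimates.get("login-story");
    }

    static String queueDemo() {
        LinkedList<String> workQueue = new LinkedList<>();
        workQueue.addLast("task-1");
        workQueue.addLast("task-2");
        // FIFO processing: cheap removal from the head.
        return workQueue.removeFirst();
    }
}
```

A candidate who can explain when each shape is the right tool clears the bar; one who can't is going to struggle.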

These past few days, though, have really highlighted that I don't have any similar questions for other positions. What I've done for these past few interviews with program managers is to look over the resume, look for things with which I'm familiar, and spot check the candidate's knowledge there. Yesterday, for example, I had a Program Manager candidate with an extremely impressive resume. It listed his work at numerous huge companies, with all sorts of awards.

I also noticed that his education was in programming and that the first six years of his career were spent as a Java developer. So, I asked him to explain inheritance in a few sentences. I don't even recall exactly what his answer was, as it was mostly gibberish. He certainly didn't mention the idea of using the 'is a' criterion to determine whether a particular inheritance scheme is appropriate, or that inheritance, along with encapsulation and polymorphism, is one of the fundamental concepts of object-oriented programming. Nor did he say anything about using inheritance to reduce code duplication by centralizing shared functionality in a base class from which other classes can be derived. Any of these would have been acceptable. I wasn't looking for a textbook answer, just something that gave me an indication he knew what inheritance was at some basic level.
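The kind of answer I was fishing for fits in a few lines of Java (a textbook toy example, nothing more):

```java
// The 'is a' test: a Car IS A Vehicle, so inheritance is appropriate,
// and shared functionality lives once in the base class.
class Vehicle {
    private final int wheels;
    Vehicle(int wheels) { this.wheels = wheels; }
    int getWheels() { return wheels; }  // written once, reused by all subclasses
}

class Car extends Vehicle {
    Car() { super(4); }
}

class Motorcycle extends Vehicle {
    Motorcycle() { super(2); }
}
```

Anything in the neighborhood of "a Car is a Vehicle, so it inherits what all vehicles share" would have gotten a pass from me.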

He couldn't do it.

He also listed Scrum on his resume, so I gently asked him to give a basic explanation of that as well. Where one might expect him to mention the various roles within a Scrum team, such as Product Owner and Scrum Master, or things like Sprint planning sessions, the daily Scrum, and retrospectives, he gave me a vague answer about how it 'helped focus on past and future requirements' and some other drivel.

In fact, I was happy that he had these things on his resume. He chose to list them, thereby making them fair game, and he could provide a basic explanation of neither. I felt quite justified in excluding him from future consideration.

More often, however, there is nothing so concrete on a PGM's resume that I can spot check. If the person doesn't have a development background, and doesn't have any experience with Scrum (or another program management methodology with which I'm familiar), I end up having to rely almost solely on hypotheticals and questions about his past roles, which I still don't like much at all. The problem is that such questions are (relatively) easy to answer.

That's not to say that many candidates don't still do a miserable job with them. But if you ask what a PGM would do when the client is making unreasonable demands, it's all well and good for him to say, "I'd stick to my guns and refer them to the SOW, but see if we could reasonably accommodate their request with minimal risk."

When someone at the client company is yelling at them, saying how they missed some requirement or that the company promised this feature, though, it's much less likely they'll act that way. Like many things, it's easy in theory but horribly difficult in practice.

I may ask the person what was the most challenging project he worked on, and why. What are the characteristics of a well-run project (beyond hitting milestones), and what are the characteristics of a poorly run one? I may pose questions based on current difficulties faced by the team and ask him what he would do.

But the problem is, none of this really gives me any great sense the person is qualified. At best, it disqualifies him if he can't give a plausible, well-reasoned answer, or if he veers totally off topic. But a "good" answer guarantees almost nothing. I'm still struggling with the question of whether there are "good" ways to evaluate a potential PGM beyond a reference from someone who's previously worked with the person (which is the best thing) or hypotheticals like this, which really aren't all that useful.

The thing is, I think a *good* PGM can contribute immense value to a project. A poor one, though, simply creates work and stress, derails process, and is a huge liability to a team. And unfortunately, most PGMs in my experience are poor ones: bureaucrats more concerned with action items and holding status meetings than with thinking critically about the projects to which they are assigned and how to intelligently manage scope and the customer.

That's what makes the good ones even more remarkable and valuable. I just wish I had a better way to find them.

Tuesday, October 6, 2009

ZPlanner, TDD, and Over abstraction

As I've written once or twice before, I've been working on a lightweight Scrum/Agile piece of project management software. It's called ZPlanner and I've probably put all of 200-300 hours into it, though it's hard for me to say precisely. I started writing it when I thought I was going to be laid off from my job as development manager and wanted to brush up a bit on my Java programming skills. I somewhat arbitrarily chose to write a piece of project management software as it was a domain I knew and I've used XPlanner fairly frequently over the years. It's a decent tool, but has some serious shortcomings. But again, the motivation was mainly to try a few new technologies and do things "the right way" since I had no timelines or customers chomping at the bit.

Though I've been a big proponent of unit tests for some time, and even of TDD, my formal computer science education instilled in me the absolute need to "architect" everything I do prior to writing a line of code. Ever since I came into contact with TDD, I've questioned this approach and done my best to move away from big upfront design (BUFD), but if I'm honest with myself, I'd never really done TDD proper. With ZPlanner, however, there was nothing on the line, so for the first time I decided to make a legitimate attempt at using TDD from the beginning.

I didn't agonize over a line of code or spend days upon days trying to come up with some perfect object hierarchy. Instead, for once, I just started coding:
  1. Write a failing unit test for new functionality
  2. Do the simplest possible thing to make the code pass the test
  3. Refactor
  4. Repeat
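As a concrete (and hypothetical) illustration of one turn of that loop, using plain Java asserts to stand in for a real JUnit test:

```java
// Step 1: the "test" -- written before any implementation existed,
// it fails to compile until Task is created.
//     Task task = new Task("Write blog", 2.5);
//     assert task.getEstimate() == 2.5;
//
// Step 2: the simplest possible thing that makes it pass.
class Task {
    private final String name;
    private final double estimate;

    Task(String name, double estimate) {
        this.name = name;
        this.estimate = estimate;
    }

    String getName() { return name; }
    double getEstimate() { return estimate; }
}
// Step 3 (refactor) and step 4 (repeat) follow once the assert passes.
```

The point isn't the code, which is trivial; it's that the test existed before the class did, and the class does nothing the test didn't demand.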


What's interesting was that in an amazingly short period of time, I actually had something that worked at some basic level. My first milestone was a simple web page in which one could enter a task with a name, description, and an estimate and save it to the database. As trivial as that may seem, using my old approach I would have spent days and days (or longer) agonizing over some design, thinking of every possible use case, going back and forth on which technology to use, and so on. It probably would have been several weeks before I was even at the stage of writing a line of code.

Instead, within 40 hours I had set up Maven2 for dependency management and to do builds, Hibernate Annotations to manage database persistence, Cobertura to do unit test coverage reports, and Mercurial to manage my revisioning. And I even had a working web application to boot!

And it did something meaningful, which was allow me to save tasks with estimates.

It was basically a slightly glorified Excel sheet (minus all the crazy macros and editing and so on).
As I kept moving forward, I tried to repeat the TDD mantra:

  1. Write a failing unit test for new functionality
  2. Do the simplest possible thing to make the code pass the test
  3. Refactor
  4. Repeat

Of course, "doing the simplest possible thing" is a subjective assessment. Sometimes what seems the "simplest thing" is not at all the simplest thing. And though I'd overcome my education with regard to BUFD, my need to be a "good computer scientist" and abstract things to reduce duplication reared its head.

Unit test by unit test, I built up the application, refactoring, removing duplication, and doing "the simplest thing possible", until a week or two ago I actually had a system that could store a hierarchical tree of projects, iterations, stories, and infinitely recursed tasks. It could roll up estimates using some fairly complex logic, and it had validations for the web app, a clean, easy build process, the ability to create users, an Ajax drag-and-drop interface for moving stories and tasks around, and about 80-90% code coverage. All with about 20 files and only a few thousand lines of code.


Talking to one of my colleagues (thanks, Jim!), though, he pointed out something obvious. Part of my interpretation of doing the "simplest thing possible" had been trying to reuse code as much as possible. This in turn resulted in an object model in which Iterations, Stories, and Tasks had all been abstracted into the notion of an EstimatedItem.

This meant most of my functionality could be captured in only a few lines of code, but it also meant it was pretty non-intuitive. The comments were strewn with explanations about how an EstimatedItem in one case represented a Story and in another case represented a Task.

I'd also oddly (and wrongly) decided to subclass the Iteration class from EstimatedItem as well, reasoning that the fundamental characteristics of an estimated item were that it had a name, a description, and an estimate. Well, I figured at the time, an iteration has all of these things; it only differs in that it has a start date and an end date. It seemed to make sense at the time, and it also seemed the "simplest thing possible".


The problem of course is that an Iteration IS NOT a story.
A task ISN'T even a story.

One can abstract things sufficiently such that all things are the same. I blame Java and its ilk for this to some degree, with its notion of everything being an Object. Okay, great! Everything's an object. What does that MEAN? But Java isn't really to blame either; I've seen plenty of other examples of over-abstraction. The problem is we're taught in school and by our peers that duplication is bad and reuse is good.

But then our abstraction makes the code much harder to read. Rather than looking at the code and seeing Stories and Tasks and things that have concrete meaning, we're dealing with abstractions that take real thought to understand. Maybe it's a little less code, but it's much harder to understand.

I don't really blame TDD for my mistake. I think it could just as easily (perhaps even more likely) have happened with a traditional approach. In any event, I'm now rewriting a large portion of the code, copying and pasting even, and where I had three classes I now have 9-12. I guess this could be seen as a bad thing, but suddenly what was abstruse and explained in comments makes perfect sense just looking at the code.

Whereas one used to add a task like this:

EstimatedItem story = new EstimatedItem(...);
story.addSubItem(new EstimatedItem(...));


it is now replaced with this:

Story story = new Story(...);
story.addTask(new Task(...));

The latter is infinitely easier to understand, in my view. And hopefully, once I've exploded my classes, I might be able to see some commonalities that *are* meaningful, rather than my flawed notion of abstraction. Now, since I have unit tests for all this stuff, it's pretty easy to change the code and make sure I'm still passing the same tests. So, TDD is what makes this *better* design possible. And now we're back to TDD and we've come full circle.

Until next time.

Friday, October 2, 2009

Thoughts on Offshoring, Part IV

This is the fourth and final installment I'm going to write about offshoring. I've already written about the two implementations of offshoring I've been part of, what I thought worked, and what didn't.

Let me recap the different approaches and my thoughts about what worked and what didn't.

The first approach:
  • Use Scrum with teams spanning geographical boundaries. Each onshore team was augmented with a few offshore resources.
  • The offshore location was chosen so as to minimize the time difference. Using resources based in Costa Rica reduced this difference to 3 hours, even with us being on the West coast.
  • Offshore resources were screened so as to try and ensure competency and language fluency.
  • Direction for these offshore resources came from onshore leads and all management of the project was done onshore.

The problems:

  • Offshore team members did not feel like "full-fledged team members", frequently waiting for onshore leads to tell them what to do, despite our attempts to involve them in Sprint Planning and retrospectives.
  • Because the offshore developers did not take ownership, it fell to the onshore devs to "assign" them work. Often the onshore devs did not feel confident in giving the offshore resources critical work items, and so they ended up being assigned low-priority work that did not contribute significantly to the business value of our project
  • Even with our daily Scrum being held via conference call, due to communication difficulties, cultural differences, or something else, the offshore resources frequently wouldn't give much context to what they were doing, and so these Scrums were of relatively low value

What we finally did to make this approach work well:

  • We rolled-off all the offshore developers who had been seen as contributing marginal value, keeping only the best resources
  • We reduced the team sizes, so the offshore resources were not "extra capacity" but were absolutely essential to the team's ability to deliver on its commitments.
  • We started using online collaborative tools (namely the Wiki) to ensure the offshore team members knew exactly what their priorities were and so that onshore leads/managers could effectively communicate the highest priority work items to them.
  • We removed the members of the onshore management team who had failed to give clear direction to the offshore team members and consolidated the management of the project in the hands of a few senior team members

The second approach:

  • More traditional Waterfall development schedule
  • Onsite architects and leads largely flesh out the initial design and implementation for new features, which are then handed off to largely self-contained offshore teams
  • The offshore company provides "embedded" team members who are onsite. This includes technical program managers as well as development staff. There are also managers who are co-located with the offshore developers
  • Onshore developers serve largely to give direction to the offshore developers and to review the work they have done

The problems we currently face:

  • The offshore developers were not directly screened. Instead, we chose the leads, who were then allowed to choose their developers. In practice, who knows if they screened them; I suspect they were simply "given" resources by their offshoring company. This meant that the quality of the offshore resources was somewhat inconsistent at times.
  • Projects estimated to require a single dev for a few days are assigned to multiple offshore developers, since there is so much capacity offshore. This ends up negating the cost savings of using offshore in the first place.
  • Related to the previous point, there are no tools to easily know what the offshore members are working on.
  • Given the ebb and flow of project work and the massive size of the offshore team (almost 30 developers), we often end up with a bunch of developers with nothing to do. We are rate limited in our ability to farm work out to them, because it tends to need analysis by onsite leads first.


What is working and current improvements we are attempting to make:

  • Some of the offshore developers are extremely competent.
  • The onsite technical PMs are also extremely competent, and better than many of our FTE managers, as they have technical backgrounds. It should be noted there are only a few of them and they run over $10-15K per month per head, so the cost savings are debatable.
  • Having onsite technical PMs and resource managers also helps to overcome the pain associated with a significant time difference between locations. Of course, it only "transfers" the pain to the onsite representatives.

Concluding thoughts:

I have done something of an about-face with regard to offshoring. I'm now convinced that there is some great talent out there. I still, however, have yet to be shown that there is anything better than a motivated staff of full-time, onsite developers, both in terms of the quality of what they produce and the cost at which they can produce it. The communication overhead and the low visibility onsite members have into what offshore team members are doing are severe impediments.

These impediments can be overcome by using onsite representatives to "bridge the gap", but the cost of these people is significant and negates much of the value proposition of offshoring.

If *I* were the CEO, the way I would use offshoring would be:

  • Recruit offshore developers using the same criteria as used for an FTE. In fact, they have to be *better* than an equivalent onshore developer, as they have significant geographical and language barriers to overcome.
  • Build your offshore talent pool slowly; add one or two here and there, augmenting existing staff. Just as you could not expect to build a great programming staff by hiring 30 people all at once, don't expect that you're going to end up with a "great" team by replacing all of your staff with offshore devs.
  • Ensure you have the tools/communication vectors in place to communicate work to your offshore devs. It can be as simple as a Wiki, but you absolutely need to know *what* the offshore resources are doing on any given day.
  • Prefer resources located in a timezone as close to your own as possible. Costa Rica has some great talent.
  • Ensure your onsite program management team is top notch. Having mediocre PGMs can be worked through onshore, with onsite people picking up the slack. If you're using offshore, it ends up being a disaster.
  • Continue to recruit onshore developers and do the majority of your work via these resources. Use offshore only as an augmentation to this staff rather than a replacement.

I'm not sure that totally captures my thoughts, but it will have to do for now.

Monday, September 28, 2009

Best Job Application Ever

Usually, I try to use this space to talk about deep, meaningful subjects (or something). But every once in a while, I think there's room for a little levity. I was digging around some files and happened upon a real gem I figured I should share. I almost submitted it to thedailywtf.com, but decided:

"Why the hell should I give them *my* content for free?"

Anyway, a few years back, I was still just a lowly developer. We were swamped with work and desperate to find Perl developers. There was an alias set up for incoming job applications. One of the applications that came in had this cover letter:

I am interested in a position in Computers and Technology in Research and Development with your company. I do possess many technological positions with scientific advancements and discoveries and skills that are not listed on my application and resume with some older employers. I am interested in developing novel technology for your corporation that is outranking outside of every text book and that is not taught at any university, company, or federal agency. My computer is the first computer on the Internet powering the entire Universe over thin air, which consists of optical motherboards, optical RAM, optical hard drives, optical processors, and automorphing anti-viral biological-plasmas replacing the solid-state nuclear magnetic computer. Some of my inventions are:

Optical Biological Digital Computers & Components Hyper Drives [FTL] [Faster Than Light] Warp Drives [FTL] [Faster Than Light] Time Machines [Direct & Real-Time] Biological Atomic Clocks [Direct & Real-Time] Liquid-Crystal Quantum Mechanics & Biological Continuum Liquid-Crystal Astrophysics Liquid-Crystal Physics Liquid-Crystal Mathematics [Differential Equations] Liquid-Crystal Science Liquid-Crystal Astronomy Liquid-Crystal Programming Liquid-Crystal Data Storage Liquid-Crystal Universe Mapping Liquid-Crystal Time Travel Liquid-Crystal High-Energy Cold Fusion [######] Liquid-Crystal Civil Defense Computer Systems Liquid-Crystal Cyber Health Care Liquid-Crystal Experimental Computers & Components Liquid-Crystal Nanobots and Nanotechnology

I am interested in starting as soon as possible. Please don't hesitate to contact myself at (###) ###-####. It will certainly be a pleasure in meeting as well as working with you. Please consider me a candidate for employment with your company.

Sadly, we did *not* interview this guy. Some thought it was a joke. I'm more inclined to think it was someone who was completely delusional and off his meds. But, hey, maybe the guy was totally legit. We may have missed out on a computer 'powering the entire Universe over thin air'. Crap!!!