Sunday 9 November 2008

Java language change - Unique identifier strings

This last week I've been refactoring some old code in my day job to simplify some very crusty code. One of the parts of the refactor has made me write up why language features often affect more than is immediately obvious.

The case of the Unique Identifier String

The specific code I've been working on isn't that significant - its a set of pools that manage resources. The important feature for this discussion is that there are several pool instances each of which has a unique identifier.

When the code started out many years ago, the identifier was simple - the unique name of the pool. As a result, the unique identifier was the pool name, and that was defined as a String:

// example code - hugely simplified from the real thing...
public class PoolManager {
  public static Pool getPool(String poolName) { ... }
  ...
}
public class Pool {
  private String poolName;
  ...
}

Internally, the manager consists of maybe 25 classes (its over-complex, no IoC, and needs refactoring, remember...). Most of the 25 classes have some kind of reference to the pool name, whether to access configuration, logging or some other reason.

At some point in the past, a new development was commissioned that affected the whole system. The new development - maintenance code - was to allow multiple sets of configuration throughout the system.

To achieve this, everywhere that accessed configuration needed a unique key for the configuration it needed to access. Again, as this was a simple lookup, a String was used. And, since the pooling component was affected, a second unique key was added:

// example code - still hugely simplified from the real thing...
public class PoolManager {
  public static Pool getPool(String poolName, String configName) { ... }
  ...
}
public class Pool {
  private String poolName;
  private String configName;
  ...
}

Now, in order to complete the change, the config name was rolled out to most of the 25 classes alongside the poolId. In effect, the true 'unique id' for the pool became the combination of the two separate keys of poolName and configName.

Now, we could debate lots about this design, but thats not the point. The point, in case you missed it, is that we now have up to 25 classes with two 'unique ids' that are really one. In addition, this creates confusion in what things mean. After all, with two keys we now need a map within a map to lookup the actual pool, right? (again, I know the alternatives - this is a blog about what maintenance code does over time, and how to tackle it...)

OK, so how might we improve this using Java?

A better design

If the original developer had coded a PoolId class then the overall design would have been a lot better:

// pre-maintenance:
public class PoolId {
  private String poolName;
  ...
}
public class PoolManager {
  public static Pool getPool(PoolId poolId) { ... }
  ...
}
public class Pool {
  private PoolId poolId;
  ...
}

Now, the maintenance coder would have had a much easier task:

// post-maintenance:
public class PoolId {
  private final String poolName;
  private final String configName;   // NEW CODE ADDED
  ...
}
// NOTHING ELSE CHANGES! PoolManager and Pool stay the same!

Wow! Thats a lot clearer. We've properly encapsulated the concept of the unique pool identifier. This allowed us to change the definition to add the configName during the later maintenance. This isn't rocket science of course, and there isn't anything new in this blog so far...

Now, what I want to do is ask the awkward question - Why wasn't the PoolId class written originally?

Its vital that we understand that question. Its the root cause as to why the code now needs refactoring, and why it is hard to understand and change. (And bear in mind this is just an example scenario - You should be able to think of many similar examples in your own code)

Well, lets look at the PoolId class in more detail. In particular, lets look at the code I omitted above with some '...'.

// real version of PoolId in Java - pretty boring...
public final class PoolId {
  private final String poolName;
  
  public PoolId(String poolName) {
    if (poolName == null) {
      throw new IllegalArgumentException();
    }
    this.poolName = poolName;
  }
  public String getPoolName() {
    return poolName;
  }
  public boolean equals(Object obj) {
    if (obj == this) {
      return true;
    }
    if (obj instanceof PoolId == false) {
      return false;
    }
    PoolId other = (PoolId) obj;
    return poolName.equals(other.poolName);
  }
  public int hashCode() {
    return poolName.hashCode();
  }
  public String toString() {
    return poolName;
  }
}

Now we know why the original developer didn't write the class PoolId. Very, very few of us would - the effort required is simply too great. Its verbose, boring, and probably has enough potential for bugs that it might need its own test.

But the way we write this class - what it actually looks like - is a language design issue!

Quick composites

It is perfectly possible to design a language that makes such classes really easy to write. For example, here is a psuedo-syntax of an imaginary language a bit like Java:

// made up language, based on Java
public class PoolId {
  property state public final String! poolName;
}

The 'property' keyword adds the get/set methods (no need for set in this case, as the field is final). The 'state' keyword indicates that this is part of the main state of the class. Adding the keyword generates the constructor, equals(), hashCode() and toString() methods. And finally, the '!' character means that the string cannot be null.

Adding another item of state is really simple:

public class PoolId {
  property state public final String! poolName;
  property state public final String! configName;
}

Suddenly, adding a new class for things like PoolId doesn't seem a hardship. In fact, we've changed implementing the right design to doing the easy thing. Basically, its about as easy as its ever going to get.

My real point is that if Java had a language feature like this, then there would a much greater chance for the better design to be written. After all, most developers will always take the lazy option - and in Java that is way too many String identifiers.

So, does this imaginary language exist? Well, some get a lot closer than Java, but I don't think any language achieves quite this kind of brevity (prove me wrong!).

In addition, I'm arguing for Fan to encompass 'quick composites' like this. After all, I'd argue that most (80%?) of the classes we write could have auto-generated equals() and hashCode() based on a 'state' keyword.

Summary

As a community of Java developers, we need sometimes to realise that the language we develop in can actually hold us back. A language design feature like this is not just about saving a few keystrokes. It can fundamentally change the way lots of code gets developed simply by changing the better design from very hard/verbose to really easy. And the knock-on effects in maintenance could be huge.

Finally, I want to be clear though that I am NOT advocating a change like this in Java. Java is probably too mature now to handle big changes like this. But new languages should definitely be thinking about it.

Opinions

What languages come close to this design?
What percentage of classes in your codebase could have their equals()/hashCode() methods generated by a 'state' keyword?
Opinions welcome!