Sunday, 9 November 2008

Java language change - Unique identifier strings

This last week I've been refactoring some old code in my day job to simplify some very crusty code. One of the parts of the refactor has made me write up why language features often affect more than is immediately obvious.

The case of the Unique Identifier String

The specific code I've been working on isn't that significant - its a set of pools that manage resources. The important feature for this discussion is that there are several pool instances each of which has a unique identifier.

When the code started out many years ago, the identifier was simple - the unique name of the pool. As a result, the unique identifier was the pool name, and that was defined as a String:

// example code - hugely simplified from the real thing...
public class PoolManager {
  public static Pool getPool(String poolName) { ... }
  ...
}
public class Pool {
  private String poolName;
  ...
}

Internally, the manager consists of maybe 25 classes (its over-complex, no IoC, and needs refactoring, remember...). Most of the 25 classes have some kind of reference to the pool name, whether to access configuration, logging or some other reason.

At some point in the past, a new development was commissioned that affected the whole system. The new development - maintenance code - was to allow multiple sets of configuration throughout the system.

To achieve this, everywhere that accessed configuration needed a unique key for the configuration it needed to access. Again, as this was a simple lookup, a String was used. And, since the pooling component was affected, a second unique key was added:

// example code - still hugely simplified from the real thing...
public class PoolManager {
  public static Pool getPool(String poolName, String configName) { ... }
  ...
}
public class Pool {
  private String poolName;
  private String configName;
  ...
}

Now, in order to complete the change, the config name was rolled out to most of the 25 classes alongside the poolId. In effect, the true 'unique id' for the pool became the combination of the two separate keys of poolName and configName.

Now, we could debate lots about this design, but thats not the point. The point, in case you missed it, is that we now have up to 25 classes with two 'unique ids' that are really one. In addition, this creates confusion in what things mean. After all, with two keys we now need a map within a map to lookup the actual pool, right? (again, I know the alternatives - this is a blog about what maintenance code does over time, and how to tackle it...)

OK, so how might we improve this using Java?

A better design

If the original developer had coded a PoolId class then the overall design would have been a lot better:

// pre-maintenance:
public class PoolId {
  private String poolName;
  ...
}
public class PoolManager {
  public static Pool getPool(PoolId poolId) { ... }
  ...
}
public class Pool {
  private PoolId poolId;
  ...
}

Now, the maintenance coder would have had a much easier task:

// post-maintenance:
public class PoolId {
  private final String poolName;
  private final String configName;   // NEW CODE ADDED
  ...
}
// NOTHING ELSE CHANGES! PoolManager and Pool stay the same!

Wow! Thats a lot clearer. We've properly encapsulated the concept of the unique pool identifier. This allowed us to change the definition to add the configName during the later maintenance. This isn't rocket science of course, and there isn't anything new in this blog so far...

Now, what I want to do is ask the awkward question - Why wasn't the PoolId class written originally?

Its vital that we understand that question. Its the root cause as to why the code now needs refactoring, and why it is hard to understand and change. (And bear in mind this is just an example scenario - You should be able to think of many similar examples in your own code)

Well, lets look at the PoolId class in more detail. In particular, lets look at the code I omitted above with some '...'.

// real version of PoolId in Java - pretty boring...
public final class PoolId {
  private final String poolName;
  
  public PoolId(String poolName) {
    if (poolName == null) {
      throw new IllegalArgumentException();
    }
    this.poolName = poolName;
  }
  public String getPoolName() {
    return poolName;
  }
  public boolean equals(Object obj) {
    if (obj == this) {
      return true;
    }
    if (obj instanceof PoolId == false) {
      return false;
    }
    PoolId other = (PoolId) obj;
    return poolName.equals(other.poolName);
  }
  public int hashCode() {
    return poolName.hashCode();
  }
  public String toString() {
    return poolName;
  }
}

Now we know why the original developer didn't write the class PoolId. Very, very few of us would - the effort required is simply too great. Its verbose, boring, and probably has enough potential for bugs that it might need its own test.

But the way we write this class - what it actually looks like - is a language design issue!

Quick composites

It is perfectly possible to design a language that makes such classes really easy to write. For example, here is a psuedo-syntax of an imaginary language a bit like Java:

// made up language, based on Java
public class PoolId {
  property state public final String! poolName;
}

The 'property' keyword adds the get/set methods (no need for set in this case, as the field is final). The 'state' keyword indicates that this is part of the main state of the class. Adding the keyword generates the constructor, equals(), hashCode() and toString() methods. And finally, the '!' character means that the string cannot be null.

Adding another item of state is really simple:

public class PoolId {
  property state public final String! poolName;
  property state public final String! configName;
}

Suddenly, adding a new class for things like PoolId doesn't seem a hardship. In fact, we've changed implementing the right design to doing the easy thing. Basically, its about as easy as its ever going to get.

My real point is that if Java had a language feature like this, then there would a much greater chance for the better design to be written. After all, most developers will always take the lazy option - and in Java that is way too many String identifiers.

So, does this imaginary language exist? Well, some get a lot closer than Java, but I don't think any language achieves quite this kind of brevity (prove me wrong!).

In addition, I'm arguing for Fan to encompass 'quick composites' like this. After all, I'd argue that most (80%?) of the classes we write could have auto-generated equals() and hashCode() based on a 'state' keyword.

Summary

As a community of Java developers, we need sometimes to realise that the language we develop in can actually hold us back. A language design feature like this is not just about saving a few keystrokes. It can fundamentally change the way lots of code gets developed simply by changing the better design from very hard/verbose to really easy. And the knock-on effects in maintenance could be huge.

Finally, I want to be clear though that I am NOT advocating a change like this in Java. Java is probably too mature now to handle big changes like this. But new languages should definitely be thinking about it.

Opinions

What languages come close to this design?
What percentage of classes in your codebase could have their equals()/hashCode() methods generated by a 'state' keyword?
Opinions welcome!

14 comments:

  1. Java has the equivalent of a state keyword. Any variable not marked transient can be considered a state variable. All that java would need to get a free default implementation of equals() and hashCode() would be an implementation in Object.

    I'd also like to see the default toString modified so that it prints out the address of the object along with all state objects.

    ReplyDelete
  2. Common Lisp's DEFSTRUCT macro comes pretty close. And if you want to come closer, it'd be pretty trivial to add your own macro.

    In my Java codebase, we have a rule against using String as anything but a String. There's a base class called AbstractValueHolder (which is generic) and implements not only a getter, equals and hashCode, but also toString, compareTo (maybe), and formatTo. People that need a new "PoolId" can start by extending that class. It's not perfect, but it makes the barrier to entry a lot lower.

    ReplyDelete
  3. Saying
    PoolId=Struct.new(:poolName)
    will get you everything except for the non-null constraint in Ruby. Ruby, like Java, doesn't have any built-in mechanism for non-null constraints, though one could more easily be metaprogrammed into Ruby than Java.

    ReplyDelete
  4. It might be worthy to provide language constructs to implement the bean component model, I actually see a some downsides:

    Going down this route, many people would always define classes for every kind of string value just in case (i.e. instead of first and last name as a string, make a Name class, so you can accomodate multi-part Arabic family names). Here you have to carefully balance flexibility against complexity. While you gain stronger type checking and abstraction, you introduce another layer of moving parts you have to keep in mind (your classes usually have more complicated contracts than the automatically generated equals and hashCode).

    For a language like Java, one way to achieve 90% of the described is to use a factory generate mostly-bean-compliant (no default constructor) value object implementations from annotated bean interfaces (including bound, non-null and vetoable properties.) I've done it with dynamic proxies and it could be done with ASM as well. The approach is convenient and gets the job done, but promotes anemic domain layer, which I came to consider a bad design.

    ReplyDelete
  5. Sounds like a Scala case class:
    case class PoolId(poolName:String, configName:String)

    equals() and hashCode() are automatically defined, and final fields are created for the two class parameters you defined.

    scala> case class PoolId(poolName:String, configName:String)
    defined class PoolId

    scala> PoolId("pool-1", "config-1") == PoolId("pool-1", "config-1")
    res6: Boolean = true

    scala> PoolId("pool-1", "config-1") == PoolId("pool-2", "config-1")
    res7: Boolean = false

    (== is defined to call .equals())

    Patrick

    ReplyDelete
  6. If you use the code generation options of an IDE like IDEA, then such classes become easier to write without any change to the language being required. My (out of date) version of IDEA will generate the getters/setters and equals/hashCode methods.

    ReplyDelete
  7. I wrote a simple replacement for Object that uses an annotation to provide equals, hashcode and toString implementations. Of course it doesn't help with the other aspects you identify, and it complicates things a little by requiring you to subclass this Object class. Might be a little long to post in a comment...

    If I win the Lottery I'd love to create YAJSL (Yet Another Java Substitute Language) - I've looked at Fan, Ruby and Scala and programmed quite a lot in Groovy, but none of them quite hit the sweet spot for me. Sounds like you have quite similar notions of what you'd like from a language.

    ReplyDelete
  8. One simple way to avoid having to write getter and setter is to just not write them. Unless you have a library which won't work without them I see them as entirely optional. You can inherit a generic definition for equals/hashCode/toString from a parent class. No changes needed to the language.
    PS: however I would suggest that the default constructor should take all final fields without a value, but that might be dangerous.

    public class PoolId extends AbstractPojo {
    public final String poolName;
    public final String configName;

    public PoolId(String poolName, String configName) {
    this.poolName = poolName;
    this.configName = configName;
    }
    }

    ReplyDelete
  9. This seems like a rather bizarre problem that only came up because a project was refactored in a strange way. I can't say how it should have gone, since I don't know how the whole project works. Are you hard-coding poolnames and config names? If so, why?

    It makes much more sense to have a Config object, with a getPool(String s) function

    ReplyDelete
  10. Stephen Colebourne10 November 2008 at 07:49

    Thanks for all the good summaries of other languages. Its always useful to document equivalent ideas. I certainly think the Java ideas (subclass/proxies/code generation) are more clunky than what is out there, and I strongly suspect that the absence of the transient keyword isn't the same as the presence of the state keyword.

    Robert, any hints as to what isn't present in the languages that you would add to your YAJSL?

    ReplyDelete
  11. It's more a matter of style or philosophy - for instance most of them have some degree of dynamism, and I'm not a fan of dynamic languages, because it's too easy for poor coders to make an unnavigable mess (you can do it with Java too, of course, but at least static code analysis gets you a bit further into working out what on earth is going on...). I also like a lot of type safety for the same reason. I see ease of maintenance and trying to coax poor coders in the right direction as more important than power, by and large, because poor coders will be always with us and I seem to spend most of my billable hours working with utter rubbish code.

    Scala's probably the closest to what I want, but I find the syntax, as with Ruby, a turn-off - which might sound a little silly, but for me syntax matters quite a lot.

    ReplyDelete
  12. Is some of this achievable with annotations and apt (http://java.sun.com/j2se/1.5.0/docs/guide/apt/) ? Like a @NotNull ? Or a method generation task ?

    ReplyDelete
  13. @Ed Bowler: Essentially, yes, to some extent. Nullity is a very complex subject, due to generics (for a full treatment on the topic, see my blog at http://www.zwitserloot.com/2008/10/23/non-null-in-static-languages/ - but the other stuff; generating toString, getters, setters, hashCode, equals, and, while we're at it, why not go all the way and toss addChangeListener in there as well.

    Unfortunately, APT cannot change existing classes - it can only add new ones. The solution, then, becomes something like this:

    @ValueClass
    public class PoolIdTemplate {
    String poolName;
    }

    The code is extremely short (a ValueClass is immutable with getters and a constructor as well as the expected hashCode and equals method, where all fields not marked transient are assumed to be state, so this would produce a private final String poolName, a getPoolName, and a constructor that takes a String. If you want more flexibility, which is what Stephen was getting at, I think, you can make this scheme just about as complex as you want it to be.

    However.

    You can't name your class the same as the target class (well, you can, but then you'd have to have a funky package name instead), because you can't 'overwrite' it. This is rather ugly, unfortunately. So, that's why the class is called PoolIdTemplate instead of PoolId. You can make the annotation smart-ish, by having a single parameter for the name of the real class, which is by default demunged from the template class name by removing 'Template' off the end, and if thats not possible, erroring during compilation. Still, that's a small nicety that doesn't really help the central problem of having a weird class name.

    ReplyDelete
  14. You can also consider this in Groovy:

    @Immutable
    final class PoolId { String poolName, configName }

    this gives you both a positional and named-arg constructor as well as equals, hashCode, toString etc., plus it will do defensive copy in and out if you specify a mutable type for your properties, e.g. List, Date.

    It does this through compile-time metaprogramming (AKA AST Macros) so that it produces byte code that can be used anywhere in Java without even needing Groovy runtime bootstrapping.

    ReplyDelete