Friday, 21 March 2014

VALJOs - Value Java Objects

The term "value object" causes confusion in Java circles because it has several different interpretations. When I think about value objects I have my own very specific interpretation, which I'm christening VALJOs.

Introduction

The Wikipedia article on value objects gives this definition:

In computer science, a value object is a small object that represents a simple entity whose equality is not based on identity: i.e. two value objects are equal when they have the same value, not necessarily being the same object.

Classic examples of values are numbers, amounts, dates, money and currency. Martin Fowler also gives a good definition, and there is a related wiki page.

I agree with the Wikipedia definition; however, there is still a lot of detail to be added. What does it take to write a value object in Java - is there a set of rules?

Recently, life has got more complicated with talk of adding value types to a future Java release:

"A future version of Java" will almost certainly have support for "value types", "user-defined primitives", "identity-less aggregates", "structs", or whatever you would like to call them.

Work on value types is focussed on extending the JVM, language and libraries to support values. Key benefits revolve around memory usage and performance, particularly in enabling the JVM to be much more efficient. Since this post isn't about value types, I'll direct curious readers to the JEP and various articles on John Rose's blog.

JVM supported value types may be the future, but how far can we go today?

VALue Java Objects - VALJOs

What I want to achieve is a set of criteria for well-written value objects in Java today. Although written for today, the criteria have one eye on what value types of the future may look like.

As the term "value object" is overloaded, I'm giving these the name VALJO. Obviously this is based on the popular POJO naming.

  • The class must be immutable, which implies the final keyword and thread-safety.
  • The class name must be simple and direct, focussed on the value.
  • Instances must be obtained using static factory methods. All constructors must be private.
  • The state that defines the value must be clearly specified. It will consist of one or more elements.
  • The elements of the state must be other values, which includes primitive types.
  • The equals() and hashCode() methods must check all the elements of the state and nothing else.
  • If the class implements Comparable, then it must check all the elements of the state and nothing else.
  • If the class implements Comparable, the compareTo() method must be consistent with equals.
  • The toString() method must return a formally-defined string fully exposing the state and nothing else.
  • The toString() for two equal values must be the same. For two non-equal values it must be different.
  • There must be a static factory method capable of creating an instance from the formal string representation. It is strongly recommended to use the method name parse(String) or of(String).
  • The clone() method should not be public.
  • Provide methods, typically simple getters, to get the elements of the state.
  • Consider providing with() methods to obtain a copy of the original with different state.
  • Other methods should be pure functions, depending only on the arguments and the state or derived state.
  • Consider providing a link to the JDK 8 value-based classes documentation.
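
As a rough illustration of how these rules fit together, here is a minimal sketch of a VALJO. The class name Version, its "major.minor" string format and the method names are all assumptions made up for this example, not taken from any library.

```java
import java.util.Objects;

/** Hypothetical VALJO for illustration: a two-part version number such as "1.4". */
final class Version implements Comparable<Version> {

    // The state: two primitive elements.
    private final int major;
    private final int minor;

    // Private constructor, so instances can only come from the factories.
    private Version(int major, int minor) {
        this.major = major;
        this.minor = minor;
    }

    /** Obtains an instance from its elements, validating them. */
    public static Version of(int major, int minor) {
        if (major < 0 || minor < 0) {
            throw new IllegalArgumentException("Version parts must not be negative");
        }
        return new Version(major, minor);
    }

    /** Parses the formal string form produced by toString(), such as "1.4". */
    public static Version parse(String text) {
        Objects.requireNonNull(text, "text");
        int dot = text.indexOf('.');
        if (dot < 0) {
            throw new IllegalArgumentException("Invalid version: " + text);
        }
        try {
            return of(Integer.parseInt(text.substring(0, dot)),
                    Integer.parseInt(text.substring(dot + 1)));
        } catch (NumberFormatException ex) {
            throw new IllegalArgumentException("Invalid version: " + text, ex);
        }
    }

    public int getMajor() {
        return major;
    }

    public int getMinor() {
        return minor;
    }

    /** Returns a copy with a different minor part; this instance is unchanged. */
    public Version withMinor(int newMinor) {
        return of(major, newMinor);
    }

    // Compares all elements of the state, so it is consistent with equals().
    @Override
    public int compareTo(Version other) {
        int cmp = Integer.compare(major, other.major);
        return cmp != 0 ? cmp : Integer.compare(minor, other.minor);
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (!(obj instanceof Version)) {
            return false;
        }
        Version other = (Version) obj;
        return major == other.major && minor == other.minor;
    }

    @Override
    public int hashCode() {
        return 31 * major + minor;
    }

    // Formal string form: exposes the whole state and nothing else.
    @Override
    public String toString() {
        return major + "." + minor;
    }
}
```

With this sketch, Version.parse("1.4") and Version.of(1, 4) are equal, share a hash code and print as "1.4", while withMinor(5) returns a new value and leaves the original untouched.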

It is important to note that the state of a VALJO is not necessarily identical to the instance variables of the class. This happens in two different ways:

Firstly, the state could be stored using different instance variables to those in the definition. For example, the state definition and public methods may expose an int but the instance variable might be a byte, because the validated number only needs 8 bits of storage, not 32. Similarly, the state definition might expose an enum, but store the instance variable as an int. In this case, the difference is merely an implementation detail - the VALJO rules apply to the logical state.

Secondly, there may be more instance variables than those in the state definition. For example, a Currency is represented in ISO 4217 with two codes, a three letter string and a three digit number. Since it is possible to derive the numeric code from the string code (or vice versa), the state should consist only of the string. However, rather than looking up the number each time it is needed, it is perfectly acceptable to store the numeric code in an instance variable that is not part of the state. What would not be acceptable for Currency is including numeric code in the toString (as the numeric code is not part of the state, only the string code is).

In effect, the previous paragraph simply permits caching of related data within a VALJO so long as it can be derived from the state. Extending this logic further, it is acceptable to cache the toString() or hashCode().
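
The currency case can be sketched roughly as follows. CurrencyCode is a hypothetical class written for this example (it is not java.util.Currency), and the four-entry lookup is only a stand-in for the full ISO 4217 list.

```java
import java.util.Objects;

/** Hypothetical currency VALJO, unrelated to java.util.Currency. */
final class CurrencyCode {

    private final String code;      // the state: the ISO 4217 three-letter code
    private final int numericCode;  // derived from the state and cached; not part of it

    private CurrencyCode(String code, int numericCode) {
        this.code = code;
        this.numericCode = numericCode;
    }

    public static CurrencyCode of(String code) {
        Objects.requireNonNull(code, "code");
        return new CurrencyCode(code, lookupNumericCode(code));
    }

    // Tiny illustrative table; a real implementation would cover every currency.
    private static int lookupNumericCode(String code) {
        switch (code) {
            case "USD": return 840;
            case "EUR": return 978;
            case "GBP": return 826;
            case "JPY": return 392;
            default: throw new IllegalArgumentException("Unknown currency: " + code);
        }
    }

    public String getCode() {
        return code;
    }

    /** Served from the cached field rather than looked up on every call. */
    public int getNumericCode() {
        return numericCode;
    }

    // equals, hashCode and toString use the state (the string code) only.
    @Override
    public boolean equals(Object obj) {
        return obj instanceof CurrencyCode && code.equals(((CurrencyCode) obj).code);
    }

    @Override
    public int hashCode() {
        return code.hashCode();
    }

    @Override
    public String toString() {
        return code;
    }
}
```

The numeric code is available through a getter, but it never appears in toString(), equals() or hashCode().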

On parsing, I recommend using of(String) if the state of the VALJO consists entirely of a single element which is naturally a string, such as Currency. I recommend parse(String) where the state consists of more than one element, or where the single element is not a string, such as Money.

It is acceptable for the parse method to accept alternate string representations. Thus, a Currency parsing method might accept both the standard three letter code and the string version of the numeric code.
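
To make the parsing guidance concrete, here is a sketch of a two-element value that uses parse(String). MoneyAmount, its "GBP 12.50" format and the lenient handling of case and whitespace are assumptions chosen for the example; the point is that parse may accept alternate spellings while toString() always emits the single formal one.

```java
import java.math.BigDecimal;
import java.util.Locale;
import java.util.Objects;

/** Hypothetical money VALJO with two state elements, so it offers parse(String). */
final class MoneyAmount {

    private final String currencyCode;  // three upper-case letters
    private final BigDecimal amount;

    private MoneyAmount(String currencyCode, BigDecimal amount) {
        this.currencyCode = currencyCode;
        this.amount = amount;
    }

    public static MoneyAmount of(String currencyCode, BigDecimal amount) {
        Objects.requireNonNull(currencyCode, "currencyCode");
        Objects.requireNonNull(amount, "amount");
        if (!currencyCode.matches("[A-Z]{3}")) {
            throw new IllegalArgumentException("Invalid currency code: " + currencyCode);
        }
        return new MoneyAmount(currencyCode, amount);
    }

    /**
     * Parses the formal form "GBP 12.50". As alternate representations it also
     * accepts lower-case codes and extra whitespace, such as " gbp   12.50".
     */
    public static MoneyAmount parse(String text) {
        Objects.requireNonNull(text, "text");
        String[] parts = text.trim().split("\\s+");
        if (parts.length != 2) {
            throw new IllegalArgumentException("Invalid money: " + text);
        }
        try {
            return of(parts[0].toUpperCase(Locale.ROOT), new BigDecimal(parts[1]));
        } catch (NumberFormatException ex) {
            throw new IllegalArgumentException("Invalid money: " + text, ex);
        }
    }

    public String getCurrencyCode() {
        return currencyCode;
    }

    public BigDecimal getAmount() {
        return amount;
    }

    // BigDecimal.equals() is scale-sensitive, so "12.5" and "12.50" are different
    // values here, and their toString() forms differ accordingly.
    @Override
    public boolean equals(Object obj) {
        if (!(obj instanceof MoneyAmount)) {
            return false;
        }
        MoneyAmount other = (MoneyAmount) obj;
        return currencyCode.equals(other.currencyCode) && amount.equals(other.amount);
    }

    @Override
    public int hashCode() {
        return 31 * currencyCode.hashCode() + amount.hashCode();
    }

    // The single formal representation, which parse() can always read back.
    @Override
    public String toString() {
        return currencyCode + " " + amount;
    }
}
```

A single-string value such as a currency would instead expose of(String), since its formal string form and its only state element are the same thing.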

On immutability, it is allowed to have an abstract VALJO class provided the constructor is private and all the implementations are immutable private static final nested classes. The need for this is rare, however.
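
For the rare abstract case, a sketch might look like this. Limit, its two nested implementations and the "unbounded" keyword are invented for the example; the private constructor means nothing outside the class can add another subclass.

```java
import java.util.Objects;

/** Hypothetical abstract VALJO: an upper limit that is either finite or unbounded. */
abstract class Limit {

    private static final Limit UNBOUNDED = new Unbounded();

    // Private constructor: only the nested classes below can extend Limit.
    private Limit() {
    }

    public static Limit of(int max) {
        if (max < 0) {
            throw new IllegalArgumentException("Limit must not be negative");
        }
        return new Finite(max);
    }

    public static Limit unbounded() {
        return UNBOUNDED;
    }

    /** Parses the formal string form: a non-negative number or "unbounded". */
    public static Limit parse(String text) {
        Objects.requireNonNull(text, "text");
        if (text.equals("unbounded")) {
            return UNBOUNDED;
        }
        try {
            return of(Integer.parseInt(text));
        } catch (NumberFormatException ex) {
            throw new IllegalArgumentException("Invalid limit: " + text, ex);
        }
    }

    /** Pure function of the state and the argument. */
    public abstract boolean allows(int count);

    private static final class Finite extends Limit {
        private final int max;

        private Finite(int max) {
            this.max = max;
        }

        @Override
        public boolean allows(int count) {
            return count <= max;
        }

        @Override
        public boolean equals(Object obj) {
            return obj instanceof Finite && max == ((Finite) obj).max;
        }

        @Override
        public int hashCode() {
            return max;
        }

        @Override
        public String toString() {
            return Integer.toString(max);
        }
    }

    private static final class Unbounded extends Limit {
        @Override
        public boolean allows(int count) {
            return true;
        }

        @Override
        public boolean equals(Object obj) {
            return obj instanceof Unbounded;
        }

        @Override
        public int hashCode() {
            return -1;
        }

        @Override
        public String toString() {
            return "unbounded";
        }
    }
}
```

Callers only ever see the type Limit and its factories; the nested classes stay an implementation detail.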

Using VALJOs

It is best practice to use VALJOs in a specific way tailored for potential future conversion to JVM value types:

  • Do not use ==, only compare using equals().
  • Do not rely on System.identityHashCode().
  • Do not synchronize on instances.
  • Do not use the wait() and notify() methods.

The first of these rules is stronger than absolutely necessary given we are talking about normal Java objects in a current JVM. Basically, so long as your VALJO implementation promises to provide a singleton cached instance for each distinct state then using == is acceptable, because == and equals() will always return the same result. The real issue to be stopped here is using == to distinguish between two objects that are equal via equals().
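
The singleton-per-state promise could be implemented along these lines. CountryCode and its two-letter validation are assumptions for the sketch; the relevant part is that the factory hands back the same cached instance for the same state, so == and equals() cannot disagree.

```java
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical VALJO that promises one cached instance per distinct state. */
final class CountryCode {

    // One shared instance per code. The cache only grows, which is tolerable
    // here because at most 26 * 26 two-letter codes can ever pass validation.
    private static final ConcurrentMap<String, CountryCode> CACHE = new ConcurrentHashMap<>();

    private final String code;

    private CountryCode(String code) {
        this.code = code;
    }

    /** Returns the single cached instance for the given two-letter code. */
    public static CountryCode of(String code) {
        Objects.requireNonNull(code, "code");
        if (!code.matches("[A-Z]{2}")) {
            throw new IllegalArgumentException("Invalid country code: " + code);
        }
        return CACHE.computeIfAbsent(code, CountryCode::new);
    }

    public String getCode() {
        return code;
    }

    // Still value-based, so callers who follow the "equals() only" rule are fine.
    @Override
    public boolean equals(Object obj) {
        return obj instanceof CountryCode && code.equals(((CountryCode) obj).code);
    }

    @Override
    public int hashCode() {
        return code.hashCode();
    }

    @Override
    public String toString() {
        return code;
    }
}
```

Without such a promise from the class, two values that are equal via equals() may or may not be the same object, so the result of == is not something to rely on.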

Separately from these restrictions, the intention of VALJOs is that users should refer to the concrete type, not an interface. For example, users refer to the JSR-310 LocalDate, not the Temporal interface that it implements. The reason for this is that VALJOs are, of necessity, simple basic types, and hiding them behind an interface is not helpful.

Tooling

Actually writing VALJO classes is boring and tedious. The rules and implications need to be carefully thought through on occasion. Some projects provide tools which may assist to some degree.

Joda-Convert provides annotations for @ToString and @FromString which can be used to identify the round-trippable string format.

Joda-Beans provides a source code generation system that includes immutable beans. They are not instant VALJOs, but can be customised.

Auto-Value provides an annotation-processor-based source code generation system that converts abstract classes into immutable value objects. They are not instant VALJOs, but can be customised.

Project Lombok provides a customised compiler and IDE plugin which includes value object generation. They are not instant VALJOs, but can be customised.

In general, however, if you want to write a really good VALJO, you need to do it by hand. And if that VALJO will be used by many others, such as in JSR-310, it is worth doing correctly.

Summary

The aim of this blog was to define VALJOs, a specific set of rules for value objects in Java. The rules are intended to be on the strict side - all VALJOs are value objects, but not all value objects are VALJOs.

Did I miss anything? Get something wrong? Let me know!

23 comments:

  1. Similar to Auto-Value, Wire generates value objects. It generates from proto schemas rather than .java files.
    http://corner.squareup.com/2013/08/introducing-wire.html

  2. What would be the reason to force all "value objects" to be immutable? Immutability is a separate property and can be applied to any kind of object when needed, but forcing it everywhere is obsessive.

    1. A VALJO is by design a simple immutable type, like java.lang.String. But not all value objects are or should be VALJOs. I would note that Martin Fowler believes all value objects should be immutable - http://martinfowler.com/bliki/ValueObject.html . In other words, I would suggest that the simple mutable object you're thinking of is probably better described as a POJO, not a value object.

    2. If a VALJO is not immutable, it is possible to change its hash code. This would be bad, because in this case it wouldn't be safe to use VALJOs as keys for a HashMap.

    3. A mutable object is a "place" to hold data or state, or a "place" to put a value. A value, is a value, and by definition has no state.

      You can't change 1 to 2. You can change a "place" that holds numbers to hold 1, then 2.

      This is a critical distinction for many properties. Values are composable -- a complex value made of values is a value. It's referentially transparent. All copies are indistinguishable.

    4. C# architect Eric Lippert basically says that allowing mutable value types was a mistake, "This is yet another reason why mutable value types are evil.". He explains why here:

      http://blogs.msdn.com/b/ericlippert/archive/2008/05/14/mutating-readonly-structs.aspx

  3. I agree with all points, but I'm not getting what is the reason for the rule "Instances must be obtained using static factory methods. All constructors must be private."
    If I follow all the rules but this one, what consequences would it have?

    1. Should the class Boolean have a constructor? These days we see that as a mistake. Values are indistinguishable by identity, so creating a new one is simply the wrong message. Factory methods also enable caching of instances, as per the Boolean example. When looking forward to JVM value types, Oracle have indicated that factory methods are a requirement. See also the John Rose links for more info.

    2. Well, Boolean is not an ideal example, because it's more like an enum and can only have two values, whereas LocalDate can have a practically unlimited number of values.
      My point is - if another rule says "never rely on VALJOs identity, always use equals() instead of ==", and taking into account there is no obligation for the factory method to always return the same instance for the same data, I don't see any practical difference between constructor and factory - I mean, having a constructor does not lead to any practical problems (even though I agree that a factory method seems more logical in this case; but it's more about "taste" than anything else - right?)

    3. The constructor provides a mechanism to make identity visible, and it's a mechanism which is closed down using a factory method. While this may be taste now, in a future version of Java it probably won't be - you'll have to use factories when writing a value type. i.e. VALJOs are intended to be well-placed for future conversion to JVM value types, but you have to follow all the rules above, not pick and choose.

  4. Another great initiative, Stephen! Esp. agreeing on the requirement for factory methods is a step forward IMHO, as it's become a common pattern in modern Java frameworks. E.g. [JAX-RS](https://jsr311.java.net/nonav/releases/1.1/spec/spec3.html#x3-220003.2) supports factory methods for conversion of bits of URIs to typed arguments. It supports _valueOf_ and _fromString_, which works quite nicely with a lot of common value object classes like Integers, etc. So I'd prefer the more common _valueOf_ name over _of_ or _parse_.

  5. Could you kindly fix the link to the "JDK 8 value-based classes documentation"? It's 404.

    I wonder what VALJO property is missing when using Lombok's `@Value(staticConstructor="of") @Wither`?

  6. Why should clone not be public?

    If an object is really immutable and thread-safe, I should be able to make as many copies (clones) as I want to.

    1. But why? Cloning serves no useful purpose, it simply fills up memory with identical instances. Just share the original object instead of cloning it.

    2. You're right. Clone doesn't matter.

      But if you make it public, your VO will still be a VO in Fowler/DDD sense. Not exposing it is just a matter of resource optimisation, but is not a criterion that, if not fulfilled, distinguishes a VO from a non-VO.

      I would just put this criterion as 'nice-have', but not mandatory.

      Anyway, you compiled a very complete list of VO attributes. Thank you!

  7. How does this play well with JAXB?

  8. Hi Stephen,

    thank you very much for this definition. I contrasted it with the definition of value-based classes and found one interesting difference: While you do not remark on serialization, the docs explicitly forbid it.

    Since I don't precisely understand why this concerns serialization (I also just asked about that on StackOverflow) I'm interested in your opinion.

    Thanks,
    Nicolai

    1. The stack overflow answer explains the issue, that serialization uses identity hash code internally. However, I doubt it will cause much of an issue, because immutable objects can't contain object cycles.

    2. Serialization may circumvent the constructor method, which can result in several instances of the same value. As the instances are defined to be indistinguishable (cycles or not), only some optimization may not work. Wouldn't be a problem in most cases, would it? Compared to not being able to serialize the value (e.g., for distributed caches), this is a lesser evil. And if you really do need that optimization, you could implement readObject to take care.

      I don't see the necessity of this restriction.

  9. I have created an open source tool, VALJOGen, for generating highly customizable value objects, inspired partly by your blog posting here. You can try it out at "http://valjogen.41concepts.com". Hope you like it.
    /Morten

    1. Thanks for the link and interesting project. Code generation from interfaces is indeed an alternative approach to Joda-Beans for a simple VALJO with no need for properties. (personally I've not yet used annotation generation and haven't yet found an ideal use case for it.)

    2. There's also http://immutables.org, which is inspired more by Guava in its current form. It all started as experimentation after some experience gained while working with generated Java beans on an airline e-commerce platform (omg, it was 5 years ago ;), from which, I guess, Joda-Beans also derived.
      As for AutoValue, Immutables and VALJOGen: an important practice is to never mix user-written code with generated code in one file. Annotation processing does the best job with tool integration. IDE integration still sometimes fails to auto-configure and manual intervention is needed in some cases, but almost any build tool can process it out of the box, as annotation processing is part of the standard Java compiler.

    3. Thanks for the link to Immutables (the airline bean system was developed by me). Joda-Beans does generate code in the source file (but in a way where the generator can be run again and again). This works very well in practice and has few negatives. It is particularly important to get Javadoc generation of the resulting class, allowing users to perceive no difference between the generated class and one written by hand. With annotation generated systems coders have to know about the generated class in order to create it.

      Being able to access the properties of the Joda-Bean, loop around them and pass them about is a level of power significantly above similar projects.
