Friday, 21 March 2014

VALJOs - Value Java Objects

The term "value object" gets confusing in Java circles with various different interpretations. When I think about these I have my own very specific interpretation, which I'm christening VALJOs.

Introduction

The Wikipedia article on value objects gives this definition:

In computer science, a value object is a small object that represents a simple entity whose equality is not based on identity: i.e. two value objects are equal when they have the same value, not necessarily being the same object.

Classic examples of values are numbers, amounts, dates, money and currency. Martin Fowler also gives a good definition, and related wiki.

I agree with the Wikipedia definition, however there is still a lot of detail to be added. What does it take to write a value object in Java - is there a set of rules?

Recently, life has got more complicated with talk of adding value types to a future Java release:

"A future version of Java" will almost certainly have support for "value types", "user-defined primitives", "identity-less aggregates", "structs", or whatever you would like to call them.

Work on value types is focussed on extending the JVM, language and libraries to support values. Key benefits revolve and memory usage and performance, particularly in enabling the JVM to be much more efficient. Since this post isn't about value types, I'll direct curious readers to the JEP and various articles on John Rose's blog.

JVM supported value types may be the future, but how far can be go today?

VALue Java Objects - VALJOs

What I want to achieve is a set of criteria for well-written value objects in Java today. Although written for today, the criteria have one eye on what value types of the future may look like.

As the term "value object" is overloaded, I'm giving these the name VALJO. Obviously this is based on the popular POJO naming.

  • The class must be immutable, which implies the final keyword and thread-safety.
  • The class name must be simple and direct, focussed on the value.
  • Instances must be obtained using static factory methods. All constructors must be private.
  • The state that defines the value must be clearly specified. It will consist of one or more elements.
  • The elements of the state must be other values, which includes primitive types.
  • The equals() and hashCode() methods must check all the elements of the state and nothing else.
  • If the class implements Comparable, then it must check all the elements of the state and nothing else.
  • If the class implements Comparable, the compareTo() method must be consistent with equals.
  • The toString() method must return a formally-defined string fully exposing the state and nothing else.
  • The toString() for two equal values must be the same. For two non-equal values it must be different.
  • There must be a static factory method capable of creating an instance from the formal string representation. It is strongly recommended to use the method name parse(String) or of(String).
  • The clone() method should not be public.
  • Provide methods, typically simple getters, to get the elements of the state.
  • Consider providing with() methods to obtain a copy of the original with different state.
  • Other methods should be pure functions, depending only on the arguments are the state or derived state.
  • Consider providing a link to the JDK 8 value-based classes documentation.

It is important to note that the state of a VALJO is not necessarily identical to the instance variables of the Class. This happens in two different ways:

Firstly, the state could be stored using different instance variables to those in the definition. For example, the state definition and public variables may expose an int but the instance variable might be a byte, because the validated number only needs 8 bit storage, not 32 bit. Similarly, the state definition might expose an enum, but store the instance variable as an int. In this case, the difference is merely an implementation detail - the VALJO rules apply to the logical state.

Secondly, there may be more instance variables than those in the state definition. For example, a Currency is represented in ISO 4217 with two codes, a three letter string and a three digit number. Since it is possible to derive the numeric code from the string code (or vice versa), the state should consist only of the string. However, rather than looking up the number each time it is needed, it is perfectly acceptable to store the numeric code in an instance variable that is not part of the state. What would not be acceptable for Currency is including numeric code in the toString (as the numeric code is not part of the state, only the string code is).

In effect, the previous paragraph simply permits caching of related data within a VALJO so long as it can be derived from the state. Extending this logic further, it is acceptable to cache the toString() or hashCode().

On parsing, I recommend using of(String) if the state of the VALJO consists entirely of a single element which is naturally a string, such as Currency. I recommend parse(String) where the state consists of more than one element, or where the single element is not a string, such as Money.

It is acceptable for the parse method to accept alternate string representations. Thus, a Currency parsing method might accept both the standard three letter code and the string version of the numeric code.

On immutability, it is allowed to have an abstract VALJO class providing the constructor is private scoped and all the implementations are immutable private static final nested classes. The need for this is rare however.

Using VALJOs

It is best practice to use VALJOs in a specific way tailored for potential future conversion to JVM value types:

  • Do not use ==, only compare using equals().
  • Do not rely on System.identityHashCode().
  • Do not synchronize on instances.
  • Do not use the wait() and notify() methods.

The first of these rules is stronger than absolutely necessary given we are talking about normal Java objects in a current JVM. Basically, so long as your VALJO implementation promises to provide a singleton cached instance for each distinct state then using == is acceptable, because == and equals() will always return the same result. The real issue to be stopped here is using == to distinguish between two objects that are equal via equals().

Separately from these restrictions, the intention of VALJOs is that users should refer to the concrete type, not an interface. For example, users refer to the JSR-310 LocalDate, not the Temporal interface that it implements. The reason for this is that VALJOs are, of necessity, simple basic types, and hiding them behind an interface is not helpful.

Tooling

Actually writing VALJO classes is boring and tedious. The rules and implications need to be carefully thought through on occasion. Some projects provide tools which may assist to some degree.

Joda-Convert provides annotations for @ToString and @FromString which can be used to identify the round-trippable string format.

Joda-Beans provides a source code generation system that includes immutable beans. They are not instant VALJOs, but can be customised.

Auto-Value provides a bytecode generation system that converts abstract classes into immutable value objects. They are not instant VALJOs, but can be customised.

Project Lombok provides a customised compiler and IDE plugin which includes value object generation. They are not instant VALJOs, but can be customised.

In general however, if you want to write a really good VALJO, you need to do it by hand. And if that VALJO will be used by many others, such as in JSR-310, it is worth doing correctly.

Summary

The aim of this blog was to define VALJOs, a specific set of rules for value objects in Java. The rules are intended to be on the strict side - all VALJOs are value objects, but not all value objects are VALJOs.

Did I miss anything? Get something wrong? Let me know!

Monday, 10 February 2014

New project: ThreeTen-Extra for JDK 8

JDK 8 includes JSR-310, a new date and time library. But what about functionality that didn't make it into the JDK?

ThreeTen-Extra

The main ThreeTen project is now essentially complete. The project was developed and delivered via JSR-310 into OpenJDK and JDK 8.

However, as part of that process, certain pieces of functionality were rejected and/or excluded from the JDK. This was sometimes due to scope management and sometimes due to whether something was appropriate for the JDK.

The TheeTen-Extra project provides a home for that functionality.

ThreeTen-Extra is a project on GitHub that provides additional functionality. It is delivered via Maven Central and is dependent on JDK 8.

The following functionality is currently provided:

  • DayOfMonth temporal value type
  • DayOfYear temporal value type
  • AmPm temporal enum
  • Quarter temporal enum
  • YearQuarter temporal value type
  • Days, Months and Years amount value types
  • Next/previous day adjusters that skip the weekend (Saturday/Sunday)
  • Coptic calendar system
  • Support for the TAI and UTC time-scales

The project has spare space to add more functionality, so long as it is generally applicable. For example, additional calendar systems would be a good fit. Feel free to raise a pull request with your ideas.

Summary

The ThreeTen-Extra project is now available, providing an additional jar file of date/time code that builds on java.time (JSR-310) in JDK 8.

Comments welcome!

Saturday, 8 February 2014

Turning off doclint in JDK 8 Javadoc

JDK 8 includes many updates, but one is I suspect going to cause quite a few complaints - doclint for Javadoc.

Javadoc doclint

Documentation is not something most developers like writing. With Java, we were fortunate to have the Javadoc toolset built in and easy to access from day one. As such, writing Javadoc is a standard part of most developers life.

The Javadoc comments in source code use a mixture of tags, starting with @, and HTML to allow the developer to express their comment and format it nicely.

Up to JDK 7, the Javadoc tool was pretty lenient. As a developer, you could write anything that vaguely resembled HTML and the tool would rarely complain beyond warnings. Thus you could have @link references that were inaccurate (such as due to refactoring) and the tool would simply provide a warning.

With JDK 8, a new part has been added to Javadoc called doclint and it changes that friendly behaviour. In particular, the tool aim to get conforming W3C HTML 4.01 HTML (despite the fact that humans are very bad at matching conformance wrt HTML).

With JDK 8, you are unable to get Javadoc unless your tool meets the standards of doclint. Some of its rules are:

  • no self-closed HTML tags, such as <br /> or <a id="x" />
  • no unclosed HTML tags, such as <ul> without matching </ul>
  • no invalid HTML end tags, such as </br>
  • no invalid HTML attributes, based on doclint's interpretation of W3C HTML 4.01
  • no duplicate HTML id attribute
  • no empty HTML href attribute
  • no incorrectly nested headers, such as class documentation must have <h3>, not <h4>
  • no invalid HTML tags, such as List<String> (where you forgot to escape using &lt;)
  • no broken @link references
  • no broken @param references, they must match the actual parameter name
  • no broken @throws references, the first word must be a class name

Note that these are errors, not warnings. Break the rules and you get no Javadoc output.

In my opinion, this is way too strict to be the default. I have no problem with such a tool existing in Javadoc, but given the history of Javadoc, errors like this should be opt-in, not opt-out. Its far better to get slightly broken Javadoc than no Javadoc.

I also haven't been able to find a list of the rules, which makes life hard. At least we can see the source code to reverse engineer them.

Turning off doclint

The magic incantation you need is -Xdoclint:none. This goes on the command line invoking Javadoc.

If you are running from maven, you need to use the additionalparam setting, as per the manual. Either add it as a global property:

  <properties>
    <additionalparam>-Xdoclint:none</additionalparam>
  </properties>

or add it to the maven-javadoc-plugin:

  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-javadoc-plugin</artifactId>
      <configuration>
        <additionalparam>-Xdoclint:none</additionalparam>
      </configuration>
    </plugin>
  </plugins>

Ant also uses additionalparam to pass in -Xdoclint:none, see the manual.

Gradle does not expose additionalparam but Tim Yates and Cedric Champeau advise of this solution:

  if (JavaVersion.current().isJava8Compatible()) {
    allprojects {
      tasks.withType(Javadoc) {
        options.addStringOption('Xdoclint:none', '-quiet')
      }
    }
  }

See also the Gradle manual.

Summary

I don't mind doclint existing, but there is no way that it should be turned on to error mode by default. Getting some Javadoc produced without hassle is far more important than pandering to the doclint style checks. In addition, it is very heavy handed with what it defines to be errors, rejecting plenty of HTML that works perfectly fine in a browser.

I've asked the maven team to disable doclint by default, and I'd suggest the same to Ant and Gradle. Unfortunately, the Oracle team seem convinced that they've made the right choice with errors by default and their use of strict HTML.

Comments welcome, but please note that non-specific "it didn't work for me" comments should be at Stack Overflow or Java Ranch, not here!

Wednesday, 5 February 2014

Exiting the JVM

You learn something new about the JDK every day. Apparantly, System.exit(0) does not always stop the JVM!

System.exit()

This is a great Java puzzler from Peter Lawrey:

  public static void main(String... args) {
    Runtime.getRuntime().addShutdownHook(new Thread(new Runnable() {
      @Override
      public void run() {
        System.out.println("Locking");
        synchronized (lock) {
          System.out.println("Locked");
        }
      }
    }));
    synchronized (lock) {
      System.out.println("Exiting");
      System.exit(0);
    }
  }

What does the code do?

  1. Our code registers the shutdown hook
  2. Our code acquires the lock
  3. Our code prints "Exiting"
  4. Our code calls System.exit(0)
  5. System.exit(0) calls our shutdown hook
  6. Our shutdown hook prints "Locking"
  7. Our shutdown hook tries to acquire the lock
  8. Deadlock - Code never exits

Clearly, calling System.exit(0) and not exiting is a Bad Thing, although hopefully badly written shutdown hooks are rare. And there are also deprecated runFinalizersOnExit, another potential source of problems.

What are the alternatives?

The System.exit(0) call simply calls Runtime.getRuntime().exit(0), so that makes no difference.

The main alternative is Runtime.getRuntime().halt(0), described as "Forcibly terminates the currently running Java virtual machine". This does not call shutdown hooks or exit finalizers, it just exits.

But what if you want to try and exit nicely first, and only halt if that fails?

Well that seems like its a missing JDK method. However, a delay timer can be used to get a reasonable approximation:

  /**
   * Exits the JVM, trying to do it nicely, otherwise doing it nastily.
   * 
   * @param status  the exit status, zero for OK, non-zero for error
   * @param maxDelay  the maximum delay in milliseconds
   */
  public static void exit(final int status, long maxDelayMillis) {
    try {
      // setup a timer, so if nice exit fails, the nasty exit happens
      Timer timer = new Timer();
      timer.schedule(new TimerTask() {
        @Override
        public void run() {
          Runtime.getRuntime().halt(status);
        }
      }, maxDelayMillis);
      // try to exit nicely
      System.exit(status);
      
    } catch (Throwable ex) {
      // exit nastily if we have a problem
      Runtime.getRuntime().halt(status);
    } finally {
      // should never get here
      Runtime.getRuntime().halt(status);
    }
  }

All things being equal, that really should exit the JVM in the best way it can. In the puzzler, if you replace System.exit(0) with a call to this method, the deadlock will be broken and the JVM will exit.

Summary

System.exit(0) does not always stop the JVM. Runtime.getRuntime().halt(0) should always stop the JVM.

Final thought - if you're writing code to exit the JVM, have you considered whether you even need that code? After all, exiting the JVM doesn't play well in embedded or cloud environments...

Sunday, 15 September 2013

Speaking at JavaOne2013

Just a quick post to note that I'm speaking at JavaOne 2013.

I have three talks:

The first date and time talk is a full introduction to the concepts and classes in the API. The second date and time talk is example based, focussed on conversion from the older APIs. The money BOF is designed to get everyone talking around what we need for a Money class.

Looking forward to seeing you there!

Monday, 18 March 2013

Speaking at DevoxxUK

Next week I'm looking forward to speaking at a new conference - DevoxxUK.

I'll be speaking about JSR-310 in JDK 1.8. They'll be lots of other interesting talks too over the two days. So plenty of good stuff for real developers to get their teeth into.

And I'm fortunate enough to be able to offer my blog readers a discount on the already incredible price!

Looking forward to seeing you next week!

Tuesday, 4 December 2012

Annotating JDK default data

A common issue in Java development is use of the default Locale, default TimeZone and default CharSet.

JDK defaults

The JDK has a number of defaults that apply to a running JVM. The most well known are the default Locale, default TimeZone and default CharSet.

These defaults are very useful for getting systems up and running quickly. When a developer new to Java writes some code, they should get the localized answer they expect. However, in any larger environment, especially code intended to run on a server, the defaults cause a problem.

The classic example of this is the String.toLowerCase() method. Many developers use str1.toLowerCase().equals(str2.toLowerCase()) to check if two strings are equal ignoring case. But this code is not valid is all Locales! It turns out that in Turkey, there are two different upper case versions of the letter I and two different lower case versions. This causes lots of problems.

The problem applies more generally to server side applications. Any server-side code that relies on a JDK method using one of the defaults will alter its behaviour based on where and how the server is setup. This is clearly undesirable.

As is often the case, there are two competing forces - ease of use for newcomers, and bugs in larger applications. There is also the issue of backwards compatibility, meaning that the JDK methods depending on the defaults cannot be removed.

One approach has been taken by the Lucene project, scanning code using ASM. The link/tool also provides a useful list of JDK methods affected by this problem:

  java.lang.String#<init>(byte[])
  java.lang.String#<init>(byte[],int)
  java.lang.String#<init>(byte[],int,int)
  java.lang.String#<init>(byte[],int,int,int)
  java.lang.String#getBytes()
  java.lang.String#getBytes(int,int,byte[],int) 
  java.lang.String#toLowerCase()
  java.lang.String#toUpperCase()
  java.lang.String#format(java.lang.String,java.lang.Object[])

  java.io.FileReader
  java.io.FileWriter
  java.io.ByteArrayOutputStream#toString()
  java.io.InputStreamReader#(java.io.InputStream)
  java.io.OutputStreamWriter#(java.io.OutputStream)
  java.io.PrintStream#(java.io.File)
  java.io.PrintStream#(java.io.OutputStream)
  java.io.PrintStream#(java.io.OutputStream,boolean)
  java.io.PrintStream#(java.lang.String)
  java.io.PrintWriter#(java.io.File)
  java.io.PrintWriter#(java.io.OutputStream)
  java.io.PrintWriter#(java.io.OutputStream,boolean)
  java.io.PrintWriter#(java.lang.String)
  java.io.PrintWriter#format(java.lang.String,java.lang.Object[])
  java.io.PrintWriter#printf(java.lang.String,java.lang.Object[])

  java.nio.charset.Charset#displayName()

  java.text.BreakIterator#getCharacterInstance()
  java.text.BreakIterator#getLineInstance()
  java.text.BreakIterator#getSentenceInstance()
  java.text.BreakIterator#getWordInstance()
  java.text.Collator#getInstance()
  java.text.DateFormat#getTimeInstance()
  java.text.DateFormat#getTimeInstance(int)
  java.text.DateFormat#getDateInstance()
  java.text.DateFormat#getDateInstance(int)
  java.text.DateFormat#getDateTimeInstance()
  java.text.DateFormat#getDateTimeInstance(int,int)
  java.text.DateFormat#getInstance()
  java.text.DateFormatSymbols#()
  java.text.DateFormatSymbols#getInstance()

  java.text.DecimalFormat#()
  java.text.DecimalFormat#(java.lang.String)
  java.text.DecimalFormatSymbols#()
  java.text.DecimalFormatSymbols#getInstance()
  java.text.MessageFormat#(java.lang.String)
  java.text.NumberFormat#getInstance()
  java.text.NumberFormat#getNumberInstance()
  java.text.NumberFormat#getIntegerInstance()
  java.text.NumberFormat#getCurrencyInstance()
  java.text.NumberFormat#getPercentInstance()
  java.text.SimpleDateFormat#()
  java.text.SimpleDateFormat#(java.lang.String)

  java.util.Calendar#()
  java.util.Calendar#getInstance()
  java.util.Calendar#getInstance(java.util.Locale)
  java.util.Calendar#getInstance(java.util.TimeZone)
  java.util.Currency#getSymbol()
  java.util.GregorianCalendar#()
  java.util.GregorianCalendar#(int,int,int)
  java.util.GregorianCalendar#(int,int,int,int,int)
  java.util.GregorianCalendar#(int,int,int,int,int,int)
  java.util.GregorianCalendar#(java.util.Locale)
  java.util.GregorianCalendar#(java.util.TimeZone)

  java.util.Scanner#(java.io.InputStream)
  java.util.Scanner#(java.io.File)
  java.util.Scanner#(java.nio.channels.ReadableByteChannel)
  java.util.Formatter#()
  java.util.Formatter#(java.lang.Appendable)
  java.util.Formatter#(java.io.File)
  java.util.Formatter#(java.io.File,java.lang.String)
  java.util.Formatter#(java.io.OutputStream)
  java.util.Formatter#(java.io.OutputStream,java.lang.String)
  java.util.Formatter#(java.io.PrintStream)
  java.util.Formatter#(java.lang.String)
  java.util.Formatter#(java.lang.String,java.lang.String)

Thinking about the two competing forces, it seems to me that the best option would be a new annotation.

Imagine the JDK adds an annotation @DependsOnJdkDefaults. This would be used to annotate all methods in the JDK that directly or indirectly rely on a default, including all of those above. Developers outside the JDK could of course also use the annotation to mark their methods relying on defaults.

  @DependsOnJdkDefaults
  public String toLowerCase() {
    return toLowerCase(Locale.getDefault());
  }

Tooling like Checkstyle, IDEs and perhaps even the JDK compiler could then use the annotation to warn developers that they should not use the method. It would be possible to even envisage an IDE setting to hide the methods from auto-complete.

This would seem to balance the competing forces by providing information that could be widely used and very valuable.

Summary

I think marking methods that rely on JDK defult Locale/TimeZone/CharSet with an annotation would be a powerful tool. What do you think?