Monday 26 November 2012

Six months in

I've been busier than usual over the last few months. Thats because I now have a new focus in my life beyond ThreeTen/JSR-310 and OpenGamma.

He is our first son, now a little over six months old, and definitely into his food. A wild ride, but definitely worth it!

Friday 23 November 2012

Javadoc coding standards

Javadoc is a key part of coding in Java, yet there is relatively little discussion of what makes good Javadoc style - a coding standard.

Javadoc coding standard

These are the standards I tend to use when writing Javadoc. Since personal tastes differ, I've tried to explain some of the rationale for some of my choices. Bear in mind that this is more about the formatting of Javadoc, than the content of Javadoc.

There is an Oracle guide which is longer and more detailed than this one. The two agree in most places, however these guidelines are more explicit about HTML tags, two spaces in @param and null-specification, and differ in line lengths and sentence layout.

Each of the guidelines below consists of a short description of the rule and an explanation, which may include an example:

Write Javadoc to be read as source code

When we think of "Javadoc" we often think of the online Javadoc HTML pages. However, this is not the only way that Javadoc is consumed. A key way of absorbing Javadoc is reading source code, either of code you or your team wrote, or third party libraries. Making Javadoc readable as source code is critical, and these standards are guided by this principal.

Public and protected

All public and protected methods should be fully defined with Javadoc. Package and private methods do not have to be, but may benefit from it.

If a method is overridden in a subclass, Javadoc should only be present if it says something distinct to the original definition of the method. The @Override annotation should be used to indicate to source code readers that the Javadoc is inherited in addition to its normal meaning.

Use the standard style for the Javadoc comment

Javadoc only requires a '/**' at the start and a '*/' at the end. In addition to this, use a single star on each additional line:

  /**
   * Standard comment.
   */
  public ...

  /** Compressed comment. */
  public ...

Do not use '**/' at the end of the Javadoc.

Use simple HTML tags, not valid XHTML

Javadoc uses HTML tags to identify paragraphs and other elements. Many developers get drawn to the thought that XHTML is necessarily best, ensuring that all tags open and close correctly. This is a mistake. XHTML adds many extra tags that make the Javadoc harder to read as source code. The Javadoc parser will interpret the incomplete HTML tag soup just fine.

Use a single <p> tag between paragraphs

Longer Javadoc always needs multiple paragraphs. This naturally results in a question of how and where to add the paragraph tags. Place a single <p> tag on the blank line between paragraphs:

  /**
   * First paragraph.
   * <p>
   * Second paragraph.
   * May be on multiple lines.
   * <p>
   * Third paragraph.
   */
  public ...

Use a single <li> tag for items in a list

Lists are useful in Javadoc when explaining a set of options, choices or issues. These standards place a single <li> tag at the start of the line and no closing tag. In order to get correct paragraph formatting, extra paragraph tags are required:

  /**
   * First paragraph.
   * <p><ul>
   * <li>the first item
   * <li>the second item
   * <li>the third item
   * </ul><p>
   * Second paragraph.
   */
  public ...

Define a punchy first sentence

The first sentence, typically ended by a dot, is used in the next-level higher Javadoc. As such, it has the responsibility of summing up the method or class to readers scanning the class or package. To achieve this, the first sentence should be clear and punchy, and generally short.

While not required, it is recommended that the first sentence is a paragraph to itself. This helps retain the punchiness for readers of the source code.

It is recommended to use the third person form at the start. For example, "Gets the foo", "Sets the "bar" or "Consumes the baz". Avoid the second person form, such as "Get the foo".

Use "this" to refer to an instance of the class

When referring to an instance of the class being documented, use "this" to reference it. For example, "Returns a copy of this foo with the bar value updated".

Aim for short single line sentences

Wherever possible, make Javadoc sentences fit on a single line. Allow flexibility in the line length, favouring between 80 and 120 characters to make this work.

In most cases, each new sentence should start on a new line. This aids readability as source code, and simplifies refactoring re-writes of complex Javadoc.

  /**
   * This is the first paragraph, on one line.
   * <p>
   * This is the first sentence of the second paragraph, on one line.
   * This is the second sentence of the second paragraph, on one line.
   * This is the third sentence of the second paragraph which is a bit longer so has been
   * split onto a second line, as that makes sense.
   * This is the fourth sentence, which starts a new line, even though there is space above.
   */
  public ...

Use @link and @code wisely

Many Javadoc descriptions reference other methods and classes. This can be achieved most effectively using the @link and @code features.

The @link feature creates a visible hyperlink in generated Javadoc to the target. The @link target is one of the following forms:

  /**
   * First paragraph.
   * <p>
   * Link to a class named 'Foo': {@link Foo}.
   * Link to a method 'bar' on a class named 'Foo': {@link Foo#bar}.
   * Link to a method 'baz' on this class: {@link #baz}.
   * Link specifying text of the hyperlink after a space: {@link Foo the Foo class}.
   * Link to a method handling method overload {@link Foo#bar(String,int)}.
   */
  public ...

The @code feature provides a section of fixed-width font, ideal for references to methods and class names. While @link references are checked by the Javadoc compiler, @code references are not.

Only use @link on the first reference to a specific class or method. Use @code for subsequent references. This avoids excessive hyperlinks cluttering up the Javadoc.

Never use @link in the first sentence

The first sentence is used in the higher level Javadoc. Adding a hyperlink in that first sentence makes the higher level documentation more confusing. Always use @code in the first sentence if necessary. @link can be used from the second sentence/paragraph onwards.

Do not use @code for null, true or false

The concepts of null, true and false are very common in Javadoc. Adding @code for every occurrence is a burden to both the reader and writer of the Javadoc and adds no real value.

Use @param, @return and @throws

Almost all methods take in a parameter, return a result or both. The @param and @return features specify those inputs and outputs. The @throws feature specifies the thrown exceptions.

The @param entries should be specified in the same order as the parameters. The @return should be after the @param entries, followed by @throws.

Use @param for generics

If a class or method has generic type parameters, then these should be documented. The correct approach is an @param tag with the parameter name of <T> where T is the type parameter name.

Use one blank line before @param

There should be one blank line between the Javadoc text and the first @param or @return. This aids readability in source code.

Treat @param and @return as a phrase

The @param and @return should be treated as phrases rather than complete sentences. They should start with a lower case letter, typically using the word "the". They should not end with a dot. This aids readability in source code and when generated.

Treat @throws as an if clause

The @throws feature should normally be followed by "if" and the rest of the phrase describing the condition. For example, "@throws IllegalArgumentException if the file could not be found". This aids readability in source code and when generated.

@param should two spaces after the parameter name

When reading the Javadoc as source code, a single space after the parameter name is a lot harder to read than two spaces. Avoid aligning the parameters in a column, as it is prone to difficulty in refactoring where parameter names are changed or added.

  /**
   * Javadoc text.
   * 
   * @param foo  the foo parameter
   * @param bar  the bar parameter
   * @return the baz content
   */
  public String process(String foo, String bar) {...}

Define null-handling for all parameters and return types

Whether a method accepts null on input, or can return null is critical information for building large systems. All non-primitive methods should define their null-tolerance in the @param or @return. Some standard forms expressing this should be used wherever possible:

  • "not null" means that null is not accepted and passing in null will probably throw an exception , typically NullPointerException
  • "may be null" means that null may be passed in. In general the behaviour of the passed in null should be defined
  • "null treated as xxx" means that a null input is equivalent to the specified value
  • "null returns xxx" means that a null input always returns the specified value

When defined in this way, there should not be an @throws for NullPointerException.

  /**
   * Javadoc text.
   * 
   * @param foo  the foo parameter, not null
   * @param bar  the bar parameter, null returns null
   * @return the baz content, null if not processed
   */
  public String process(String foo, String bar) {...}

While it may be tempting to define null-handling behaviour in a single central location, such as the class or package Javadoc, this is far less useful for developers. The Javadoc at the method level appears in IDEs during normal coding, whereas class or package level Javadoc requires a separate "search and learn" step.

Other simple constraints may be added as well if applicable, for example "not empty, not null". Primitive values might specify their bounds, for example "from 1 to 5", or "not negative".

Specifications require implementation notes

If you are writing a more formal specification that will be implemented by third parties, consider adding an "implementation notes" section. This is an additional section, typically at the class level, that specifies any behaviours required by implementations that are not otherwise specified, or not of general interest. See this example.

Avoid @author

The @author feature can be used to record the authors of the class. This should be avoided, as it is usually out of date, and it can promote code ownership by an individual. The source control system is in a much better position to record authors.

Examples

The ThreeTen repository has some more complete examples

Summary

Hopefully these suggestions will help you to write better Javadoc. Feel free to disagree or point to some alternative standards.

Tuesday 6 November 2012

Pitfalls of "consistent with equals"

The equals() method and the compareTo() method on Comparable are two of the most basic in Java. Yet their definitions have an interesting wrinkle around the concept of "consistent with equals".

The equals() method

The equals() method is both well defined and unclear in Java. It is well-defined in that it specifies exactly how the check for equal must work. They must be reflexive, symmetric, transitive, consistent and handle null.

However the equals() is also unclear. The Javadoc says that the method "Indicates whether some other object is "equal to" this one". Note the "equal to" part is in quotes. The key here is that it does not define how the equality should be determined. There are three standard choices:

  • the identity of the object (==), which is the default inherited from Object
  • the entire observable state of the object, such that if two objects are equal then one is substitutable for the other in other parts of the application
  • some subset of the information that makes logical sense to check equals on, such as an ID

The compareTo() method

The Comparable interface defines the concept of comparability. The Javadoc specifies that it "imposes a total ordering on the objects of each class that implements it".

Classes that implement Comparable have a natural ordering and can be sorted easily and used in collections like TreeSet and TreeMap without a separate Comparator.

The interface is well-defined, in that it specifies that the implementation must ensure a form of symmetry and transitive behaviour, just like equals(). It does not specify exactly how the comparison should be made, but defines the concept of consistency with equals.

Consistent/Inconsistent with equals

The Comparable interface says the following:

The natural ordering for a class C is said to be consistent with equals if and only if e1.compareTo(e2) == 0 has the same boolean value as e1.equals(e2) for every e1 and e2 of class C.

Basically, this requires that the concept of equals defined by compareTo is the same as that defined by equals() (apart from nulls). This is at first glance a simple requirement but, as I'm going to discuss, it has complications.

This kind of definition is especially useful when considering operator overloading. If we consider a Java-like language where == is not object identity, but comparison using a method, and the greater-than/less-than operators are also available the question is what method to call. Greater-than/less-than would in this Java-like language naturally be based on compareTo() and == would call equals().

  // our new Java-like language
  if (a < b) return "Less";      // translation ignoring nulls: if (a.compareTo(b) < 0)
  if (a > b) return "Greater";   // translation ignoring nulls: if (a.compareTo(b) > 0)
  if (a == b) return "Equal";    // translation ignoring nulls: if (a.equals(b))
  throw new Exception("Impossible assuming no nulls?");

But if compareTo() is "inconsistent with equals" then this code can throw the exception, because a.compareTo(b) can return zero when a.equals(b) is false.

Other problems can occur in collections like TreeMap:

  // Foo class is "inconsistent with equals"
  assert foo1.equals(foo2) == false;
  assert foo1.compareTo(foo2) == 0;
  
  TreeMap<Foo, String> map = ...
  map.put(foo1, "a");
  map.put(foo2, "b");

Here the two objects are not equal using equals() but are equal using compareTo(). In this case, the map will have size one, not size two!

Because of the problems with being "inconsistent with equals", the Javadoc says It is strongly recommended (though not required) that natural orderings be consistent with equals.

Many JDK classes implementing Comparable do so in a manner "consistent with equals". These include Byte, Short, Integer, Long, Character and String.

Some others are more interesting:

  • BigDecimal - well-known to be "inconsistent with equals", as 4.00 is not equal to 4.0, but do compare the same
  • Double/Float - provides an explicit sort order and equals check for positive and negative zero and NaN to ensure compareTo is "consistent with equals"
  • CharSet - based on the ID/name. equals is case-senstitive, while compareTo is not, although as the names are normalized it tends not to matter - dubiously "consistent"
  • *Buffer (nio) - based on remaining content of the buffer. equals and compareTo appear to be "consistent" under my testing
  • Rdn (ldap) - based on the normalized form of the state, thus is "consistent with equals"
  • ObjectStreamField (serialization) - based on the name, but with primitives sorted first. equals is not overriden, so is "inconsistent with equals"
  • ObjectName (jmx) - equals based on the canonical name, compareTo defined as "not completely specified but is intended to be such that a sorted list of ObjectNames will appear in an order that is convenient for a person to read"! "inconsistent with equals"
  • FileTime (io) - implements equals using compareTo, definitely "consistent with equals"
  • File (io) - OS-specific, implements equals using compareTo, definitely "consistent with equals"
  • URI - source code appears to be "consistent with equals"
  • UUID - source code for equals includes variant that compareTo doesn't, but it appears to be unessesary code with no effect such that it is "consistent with equals"
  • Date (java.util) - source code difficult to follow because equals changes the state of the object (normalizing the deprecated setters), whereas compareTo does not change the state. As both ultimately so the same calculation it is "consistent with equals"
  • Date/Time (sql) - subclass java.util.Date and add nothing new, so are also "consistent with equals"
  • Timestamp (sql) - horrible compareTo and equals implementations that break the contracts, but is "consistent with equals" when comparing two Timestamp objects
  • Calendar/GregorianCalendar - equals compares the state, while compareTo compares along the time-line. Definitely "inconsistent with equals"

Note: In almost all cases, I had to examine the source code or write test code to determine whether a class was or was not "consistent with equals". There is a good Adopt-a-JDK task here to clarify the Javadoc and check UUID equals.

JSR-310

Having seen too many issues around BigDecimal, the plan has been for JSR-310 classes to be "consistent wth equals". However, a recent thread shows how controversial this can get.

Basically, it seems really simple to define equals/compareTo for some classes. LocalDate is just a date in a single calendar system, so has an obvious ordering and equals. LocalTime is just a time, so has an obvious ordering and equals. Instant is just an instantaeous point on the time-line, so has an obvious ordering and equals.

However other cases are not so obvious. Consider OffsetDateTime:

  dt1 = OffsetDateTime.parse("2012-11-05T06:00+01:00");
  dt2 = OffsetDateTime.parse("2012-11-05T07:00+02:00");

These two date-time objects represent the same instantaneous point on the time-line. But they have different local times and different offsets from UTC/Greenwich.

So, one question to readers.... which of these options do you prefer...

  1. dt1 not equal to dt2, compareTo() takes into account the local time and offset, "consistent with equals" (with a separate Comparator to sort by instant)
  2. dt1 not equal to dt2, compareTo() based on the instantaneous point on the time-line, "inconsistent with equals"
  3. dt1 equal to dt2, compareTo() based on the instantaneous point on the time-line, "consistent with equals"
  4. dt1 equal to dt2 and do not implement Comparable
  5. dt1 not equal to dt2 and do not implement Comparable

Personally I struggle with the idea that dt1.equals(dt2) would be true, but I'm open to opinions.

BTW, the question could also be posed with respect to BigDecimal. If you could to make that class "consistent with equals" would you change the equals() method or the compareTo() method?